Selection of PTF Sources Based on Light-curve Variability

Research output: Book/Report, Ph.D. thesis, Research

The Universe is a vast and diverse place. Fortunately, we have access to a wide array of tools for astronomy and data analysis to explore it, and the efficient study of astrophysical objects depends on choosing the right ones. The easiest and cheapest source of information is light. The light we gather tells us, for example, the brightness and colours of objects, and repeated observations let us analyse how that brightness varies over time. With this information, we can select specific types of light sources and study them and their connection to the rest of the Universe. For large surveys, such as the Vera C. Rubin Observatory Legacy Survey of Space and Time, this selection must be automated, which new methods in statistics and machine learning make possible.
We aim to characterise and identify astrophysical sources using variability and colours. In Chapter 4, we study the properties of quasars, stars and galaxies in the Palomar Transient Factory (PTF). We do so by selecting objects of each of these three classes using simple criteria and then comparing their variability and colours. In Chapter 5, we build a machine learning model for efficient classification and compare the roles of variability and colours.
To quantify variability, we model how much an object's magnitude is expected to change as a function of the time difference between observations. This relation, the structure function, is described by a simple power-law model from which we extract two variability parameters. The 71 million fits to PTF light curves, most of them matched to optical and infrared colours from Pan-STARRS1 and the Wide-field Infrared Survey Explorer, provide a large data set for determining the common properties of different object types. We select objects of each class using colour and variability, study their photometric properties, and identify inconsistencies with the spectroscopic classifications ("labels") from the Sloan Digital Sky Survey. For automatic classification, we use a histogram-based gradient-boosted decision tree classifier, which learns decision boundaries that separate the classes in a high-dimensional parameter space. We implement efficient model selection using random search with successive halving, and we combine input parameters for more efficient learning; for example, we subtract magnitudes in different bands to create colours.
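The structure-function fit can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis pipeline: it assumes the power law takes the form V(Δt) = A (Δt / Δt₀)^γ with Δt₀ = 1 year (a common convention, assumed here), estimates V as the root-mean-square magnitude difference in logarithmic time-lag bins without any correction for photometric noise, and fits a straight line in log-log space rather than whatever likelihood the thesis may use. The function names `pairwise_differences`, `binned_structure_function` and `fit_power_law` are hypothetical.

```python
import numpy as np


def pairwise_differences(t, m):
    """Time lags (days) and absolute magnitude differences for all pairs i < j."""
    i, j = np.triu_indices(len(t), k=1)
    dt = np.abs(t[j] - t[i])
    dm = np.abs(m[j] - m[i])
    keep = dt > 0  # drop simultaneous epochs, which cannot be placed on a log axis
    return dt[keep], dm[keep]


def binned_structure_function(dt, dm, n_bins=10, min_pairs=5):
    """RMS magnitude difference per logarithmic lag bin (assumption: no noise term)."""
    edges = np.logspace(np.log10(dt.min()), np.log10(dt.max()), n_bins + 1)
    idx = np.clip(np.digitize(dt, edges) - 1, 0, n_bins - 1)
    centres, sf = [], []
    for b in range(n_bins):
        sel = idx == b
        if sel.sum() >= min_pairs:  # skip sparsely populated bins
            centres.append(np.sqrt(edges[b] * edges[b + 1]))  # geometric bin centre
            sf.append(np.sqrt(np.mean(dm[sel] ** 2)))
    return np.array(centres), np.array(sf)


def fit_power_law(dt, sf, dt0=365.25):
    """Least-squares fit of log10 V = log10 A + gamma * log10(dt / dt0)."""
    gamma, log_a = np.polyfit(np.log10(dt / dt0), np.log10(sf), 1)
    return 10.0 ** log_a, gamma


if __name__ == "__main__":
    # Synthetic random-walk light curve (hypothetical data, not PTF photometry).
    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0.0, 1500.0, 200))
    steps = rng.normal(0.0, 0.01, 199) * np.sqrt(np.diff(t))
    m = 19.0 + np.cumsum(np.r_[0.0, steps])
    lags, sf = binned_structure_function(*pairwise_differences(t, m))
    amplitude, gamma = fit_power_law(lags, sf)
    print(f"A = {amplitude:.3f} mag, gamma = {gamma:.2f}")
```

For a pure random walk the magnitude difference grows as the square root of the lag, so the fitted γ should scatter around 0.5; the two numbers (A, γ) are the kind of compact variability parameters the text describes.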
The automatic classification model performs well, with a quasar completeness of 92.49 % and a purity of 95.64 %. It is fast to train and easy to implement, handles missing values automatically, and needs neither scaling of inputs nor calibration of outputs. We create a catalogue of the 71 million objects, including their predicted classes, the probabilities of belonging to each class, structure function parameters and magnitudes. Training on subsets of the data shows similar performance down to 100 000 labelled samples, a size we recommend for similar future studies, although the algorithm is also well suited to large data sets. With both manual and automatic selection techniques, we find that for PTF sources, selection by seven-band colour information outperforms selection by single-band variability in both completeness and purity. Fitting structure functions is cheaper than obtaining spectra, and in the future, structure function fitting might be prioritised for the most relevant sources, depending on the resources available. Experiments with different feature engineering and additional data might yield further performance improvements. For large future data sets, optimising computational resources is key, and in that context we recommend histogram-based gradient boosting for astrophysical object classification.
Original language: English
Publisher: Niels Bohr Institute, Faculty of Science, University of Copenhagen
Number of pages: 189
Publication status: Published - 2024

ID: 382757508