Machine-learning

A review of unsupervised learning in astronomy

Fotopoulou 2024 put together an overview of the unsupervised learning landscape in astronomy.

An interactive dashboard for visualisation, integration and classification of data using Active Learning

Stevens et al. 2021 presented a flexible and interactive dashboard for the classification of tabular data, combining local files and web services.

astronomicAL uses scikit-learn as its backend and is easily extensible to support TensorFlow. It can be used to train models, label data, and visualise an active learning workflow.
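The query step of an active learning loop can be sketched as follows. This is a minimal illustration of least-confidence sampling in plain Python, not astronomicAL's own code; the function name `least_confident` and the toy probabilities are assumptions made for the example.

```python
def least_confident(probabilities, n_queries=1):
    """Return indices of the samples whose top class probability is lowest.

    `probabilities` is a list of per-class probability lists, for example
    the output of a classifier's predict_proba on the unlabelled pool.
    """
    # Confidence of a sample = probability of its most likely class.
    confidence = [max(p) for p in probabilities]
    # Sort pool indices from least to most confident and keep the first few.
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:n_queries]


# Toy pool of four unlabelled sources with (star, galaxy, QSO) probabilities.
pool = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.10, 0.85, 0.05],
    [0.50, 0.45, 0.05],
]
to_label = least_confident(pool, n_queries=2)
print(to_label)
```

The selected sources would be passed to a human for labelling, added to the training set, and the model retrained before the next query.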

[Get the software, Read the docs]

Detecting neutral hydrogen at z ≳ 3 in large spectroscopic surveys of quasars

Fumagalli, Fotopoulou & Thomson 2020 presented a pipeline for identification of spectra with Lyman Limit Systems (LLS) present along the line of sight of quasars.

Using Random Forest, the algorithm was trained first to identify the presence of an LLS and then, as a second processing step, to estimate its redshift and column density.
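A two-step scheme of this kind might look like the sketch below, written with scikit-learn. It is not the published pipeline: the synthetic "spectra", the way the break position encodes redshift, and all variable names are assumptions chosen only to make the example run. Only the redshift is regressed here; column density could be handled by a second regressor in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for spectra: 400 objects, 20 flux bins each.
n_spectra, n_bins = 400, 20
X = rng.normal(1.0, 0.1, size=(n_spectra, n_bins))
has_lls = rng.random(n_spectra) < 0.5
z_lls = np.full(n_spectra, np.nan)

# For spectra with an LLS, suppress the flux blueward of a break whose
# position encodes the (hypothetical) absorber redshift.
break_bin = rng.integers(5, 15, size=n_spectra)
for i in np.flatnonzero(has_lls):
    X[i, : break_bin[i]] *= 0.05
    z_lls[i] = 3.0 + 0.1 * break_bin[i]

train = np.arange(n_spectra) < 300
test = ~train

# Step 1: classify each spectrum as containing an LLS or not.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[train], has_lls[train])
pred_lls = clf.predict(X[test])

# Step 2: regress the redshift, trained only on spectra that contain an LLS
# and applied only to spectra flagged in step 1.
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X[train & has_lls], z_lls[train & has_lls])
pred_z = np.full(test.sum(), np.nan)
if pred_lls.any():
    pred_z[pred_lls] = reg.predict(X[test][pred_lls])

accuracy = (pred_lls == has_lls[test]).mean()
print(f"step 1 accuracy: {accuracy:.2f}")
```

Training the regressor only on positive examples keeps the second step from having to model spectra in which there is nothing to measure.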

The figure on the left shows median stacked spectra: quasars with an LLS, in the rest frame of the quasar (blue curve); quasars without an LLS, in the rest frame of the quasar (orange curve); and spectra with an LLS, in the rest frame of the LLS (green curve).
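Stacking in a chosen rest frame relies on the relation λ_rest = λ_obs / (1 + z), where z is the redshift of the quasar or of the LLS depending on the frame. The sketch below illustrates the idea in plain Python; the helper names and toy spectra are assumptions, and any flux normalisation that a real analysis would apply before stacking is omitted.

```python
from statistics import median


def to_rest_frame(wavelengths, z):
    """Shift observed wavelengths to the rest frame of an object at redshift z."""
    return [w / (1.0 + z) for w in wavelengths]


def interpolate(x_new, x, y):
    """Linearly interpolate y(x) at x_new; x must be increasing.

    Returns None outside the covered range so that the point is skipped.
    """
    if x_new < x[0] or x_new > x[-1]:
        return None
    for i in range(len(x) - 1):
        if x[i] <= x_new <= x[i + 1]:
            t = (x_new - x[i]) / (x[i + 1] - x[i])
            return y[i] + t * (y[i + 1] - y[i])


def median_stack(spectra, grid):
    """Median-combine (wavelength, flux, z) spectra on a common rest-frame grid."""
    stacked = []
    for w0 in grid:
        values = []
        for wavelengths, flux, z in spectra:
            rest = to_rest_frame(wavelengths, z)
            f = interpolate(w0, rest, flux)
            if f is not None:
                values.append(f)
        stacked.append(median(values) if values else None)
    return stacked


# Three toy spectra with flat flux, observed at different redshifts.
spectra = [
    ([4000.0, 5000.0, 6000.0], [1.0, 1.0, 1.0], 3.0),
    ([4500.0, 5625.0, 6750.0], [2.0, 2.0, 2.0], 3.5),
    ([5000.0, 6250.0, 7500.0], [3.0, 3.0, 3.0], 4.0),
]
stack = median_stack(spectra, grid=[1100.0, 1400.0])
print(stack)
```

Passing the LLS redshift instead of the quasar redshift as `z` gives the stack in the rest frame of the absorber.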

Unsupervised classification - application of HDBSCAN

Supervised machine-learning methods are limited by the quality of the training sample. In astronomy, this usually means that the training sample is restricted in redshift and magnitude.

Logan & Fotopoulou 2020 used Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to classify the CPz dataset into stars, galaxies and QSOs without the need for an extensive training set. The method was then applied to 2.7 million sources from the KiDS and VIKING surveys (KiDSVW sample).

The four panels on the right show the impact of supervised vs unsupervised learning. In a supervised setting, the algorithm is bound to reproduce the limitations of its labels. The panels show (a) spectroscopic labels from SDSS DR14, (b) the HDBSCAN classification, (c) the vetoed SDSS Quasar catalog (Paris et al., 2018), and (d) a Random Forest classification based on spectroscopic labels (Nakoneczny et al., 2019).

[Get the data, get the code]

Classification-aided photometric redshift estimation [CPz]

The identification of galaxy populations in an extragalactic survey is crucial not only for any further scientific analysis, but also for the accurate processing of the data according to their nature.

Fotopoulou & Paltani 2018 showed that a pre-classification step using a Random Forest classifier can be used to split the sample into stars and galaxy types (passive, star-forming, starburst, AGN, QSO), and also to identify catastrophic photometric redshift outliers, resulting in more accurate photometric redshift estimates.
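To make "catastrophic outlier" concrete, the sketch below computes an outlier fraction using a widely used convention, |z_phot - z_spec| / (1 + z_spec) > 0.15. The threshold and the function name are assumptions for illustration and may differ from the definition adopted in the paper; the redshifts are made up.

```python
def outlier_fraction(z_phot, z_spec, threshold=0.15):
    """Fraction of sources with |z_phot - z_spec| / (1 + z_spec) > threshold."""
    n_outliers = sum(
        abs(zp - zs) / (1.0 + zs) > threshold
        for zp, zs in zip(z_phot, z_spec)
    )
    return n_outliers / len(z_spec)


# Made-up redshifts for five sources; the last two are catastrophic outliers.
z_spec = [0.50, 1.00, 2.00, 0.30, 3.00]
z_phot = [0.52, 0.95, 2.10, 1.50, 0.40]
print(outlier_fraction(z_phot, z_spec))
```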

The figure on the left shows the outcome of the classification on 40,000 sources drawn from 200 sq.deg. Notice how the stars (black) and the quasars (blue) separate themselves from the galaxy population.

[Get the data]