Deep Learning the Deep Sky: Recovering low surface brightness objects with machine learning
- Missing Satellite Problem
- Around 38 dwarf galaxies have been observed in the Local Group, and only around 11 orbiting the Milky Way, yet dark matter simulations predict that there should be around 500 dwarf satellites for the Milky Way alone.
- Potential Resolutions:
- Smaller-sized clumps of dark matter may be unable to obtain or retain the baryonic matter needed to form stars in the first place.
- After they form, dwarf galaxies may be quickly "eaten" by the larger galaxies that they orbit.
- Non-standard DM models could modify the expected number of low mass halos.
- How to proceed? We need data!
- Optical Sky Surveys
- Typically detect compact bright sources.
- Survey telescope optics are not optimised for faint and extended objects.
- We see the tops of the mountains.
- But small hills deep in the valleys are difficult to detect.
- These small hills are where the dwarf galaxies lie.
- Paudel et al. 2022 (submitted)
- Using 7,000+ square degrees of the Legacy Survey DR9 (g, r, z-band imaging)
- Visual search and classification of faint early-type dwarf galaxies (dEs)
- With an army of students in Nepal, they detected 5,405 dEs
- Machine Learning Approach
- We now have a training set of 5,000+ images.
- Each image also contains many stars and galaxies.
- How will the machine know what we want to look for?
- We set two target classes.
- 1 = Image containing dE
- 0 = Image containing Random Sky (including galaxies, stars, etc.)
- How to generate Random Sky images fairly?
- For each dE, we move 1 degree away in a random direction and take an image.
- So we end up with the same number of random sky images as dE images.
- ~10,000 images in total for training, validation and testing.
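The random sky sampling above could be sketched as follows. This is a minimal illustration, not the pipeline used for the talk: the function names `offset_position` and `angular_separation` are hypothetical, and drawing the direction uniformly in position angle is an assumption (the slides only say "randomly move 1 degree away"). It also does not check that the new position falls inside the survey footprint.

```python
import math
import random


def offset_position(ra_deg, dec_deg, sep_deg=1.0, rng=random):
    """Return (ra, dec) in degrees, sep_deg away from the input in a random direction."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    sep = math.radians(sep_deg)

    # Assumption: the direction (position angle, east of north) is uniform in [0, 2*pi).
    theta = rng.uniform(0.0, 2.0 * math.pi)

    # Spherical "destination point" formulae, so the offset is a true
    # great-circle distance rather than a flat shift in RA/Dec.
    sin_dec2 = (math.sin(dec) * math.cos(sep)
                + math.cos(dec) * math.sin(sep) * math.cos(theta))
    sin_dec2 = max(-1.0, min(1.0, sin_dec2))  # guard against rounding
    dec2 = math.asin(sin_dec2)
    d_ra = math.atan2(math.sin(theta) * math.sin(sep) * math.cos(dec),
                      math.cos(sep) - math.sin(dec) * sin_dec2)
    ra2 = (ra + d_ra) % (2.0 * math.pi)
    return math.degrees(ra2), math.degrees(dec2)


def angular_separation(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    hav = (math.sin((dec2 - dec1) / 2.0) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(hav)))
```

Calling `offset_position(ra, dec)` once per catalogued dE gives one random sky position per dE, which keeps the two classes balanced.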
- Random Sky Image Sampling
- Preprocessing Pipeline...
- Load image
- Split into RGB colours, Crop, De-centre
- We should de-centre the dE because we don't want the centre of the image to be 'special'. New images passed to the ML algorithm will not have a dE at the centre of the image (we don't know where they are!)
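One way to crop and de-centre in a single step is to pick the crop window at random, subject to the dE staying inside it. The sketch below only computes the window; `decentred_crop_box`, its `margin` parameter, and the example sizes are hypothetical and not taken from the talk.

```python
import random


def decentred_crop_box(image_size, target_xy, crop_size, margin=0, rng=random):
    """Return the top-left corner (x0, y0) of a square crop that contains the target.

    image_size: (width, height) of the full image in pixels
    target_xy:  (x, y) pixel position of the dE in the full image
    crop_size:  side length of the square crop in pixels
    margin:     minimum number of pixels kept between the target and the crop edge
    """
    width, height = image_size
    tx, ty = target_xy

    def pick_start(t, full):
        # Start offsets that keep the target inside the crop (with the margin)
        # and keep the crop inside the full image.
        lo = max(0, t - (crop_size - 1 - margin))
        hi = min(full - crop_size, t - margin)
        if lo > hi:
            raise ValueError("crop window does not fit around the target")
        return rng.randint(lo, hi)  # inclusive on both ends

    return pick_start(tx, width), pick_start(ty, height)
```

With an image stored as a NumPy-style array, the crop itself would then be `image[y0:y0 + crop_size, x0:x0 + crop_size]`, for example with `x0, y0 = decentred_crop_box((256, 256), (128, 128), 64, margin=8)`.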
- CNN model
- Data augmentation (flips)
- 6-layer CNN (3×3 filters)
- With Max Pooling (2, 2)
- 3 fully connected NN layers
- 2 outputs (softmax activation)
- dE or random/nothing
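Three of the building blocks named above (flip augmentation, 2×2 max pooling, and the softmax output) are simple enough to write out in plain Python. This is a toy sketch on nested lists, intended only to show what each operation does; a real model would use a deep learning framework. The choice of flips (horizontal, vertical, and both) is an assumption, since the slides only say "flips".

```python
import math


def flip_horizontal(img):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror a 2D image top to bottom."""
    return [row[:] for row in img[::-1]]


def augment_with_flips(img):
    """Return the original image plus its three flipped variants."""
    h = flip_horizontal(img)
    return [img, h, flip_vertical(img), flip_vertical(h)]


def max_pool_2x2(img):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[max(img[2 * r][2 * c], img[2 * r][2 * c + 1],
                 img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1])
             for c in range(cols)]
            for r in range(rows)]


def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With two outputs, `softmax` returns one probability for "random sky" and one for "dE"; the predicted class is whichever is larger.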
- Training (80/20 train/validate)
- Model accuracy: reaches around 95%
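An 80/20 train/validation split can be sketched with a seeded shuffle. The function name `train_validation_split` and the fixed seed are illustrative assumptions; the sketch also assumes the test images have already been set aside, since the slides do not state the test fraction.

```python
import random


def train_validation_split(items, train_fraction=0.8, seed=42):
    """Shuffle items reproducibly and split them into (train, validation) lists."""
    shuffled = list(items)                 # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # private RNG keeps the split repeatable
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]
```

Shuffling before the split matters here because the dE and random sky images are generated in pairs; splitting an unshuffled list could put mostly one class in the validation set.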
- Testing Data Result
- Detailed inspection
- Label: predicted class % (true class)
- Test Data Confusion Matrix
- Predictions - Actuals graph
- But wait a moment...
- Confusion matrix: false-positive box = 63
- This box tells us how many random sky images were predicted to contain a dE.
- We did not check whether the random sky patches contain dEs: they could!
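The four boxes of a two-class confusion matrix, and the list of random sky images flagged as dE, can be computed as in the sketch below. The function names and dictionary keys are hypothetical, and any labels passed in for illustration would be made-up values, not results from the talk.

```python
def confusion_counts(truth, predicted):
    """Count the four boxes of a binary confusion matrix (1 = dE, 0 = random sky)."""
    counts = {"dE_found": 0, "dE_missed": 0, "sky_flagged_as_dE": 0, "sky_correct": 0}
    for t, p in zip(truth, predicted):
        if t == 1 and p == 1:
            counts["dE_found"] += 1           # true positive
        elif t == 1 and p == 0:
            counts["dE_missed"] += 1          # false negative
        elif t == 0 and p == 1:
            counts["sky_flagged_as_dE"] += 1  # false positive: possible new candidate
        else:
            counts["sky_correct"] += 1        # true negative
    return counts


def new_candidate_indices(truth, predicted):
    """Indices of random sky images that the model labelled as containing a dE."""
    return [i for i, (t, p) in enumerate(zip(truth, predicted)) if t == 0 and p == 1]
```

The indices returned by `new_candidate_indices` are the images worth inspecting by eye, since a "false positive" may be a real dE that is missing from the catalogue.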
- Let's visually inspect some new candidates...
- New Candidates...
- Next:
- We are planning to update the catalogue of low mass dwarf galaxies to create the most complete and comprehensive survey of the local universe.
- A dataset that will be useful not only for galaxy physics but perhaps also for cosmology.