While reading about GPT models, a paper with the interesting term “AstroPT” (Astro Pretrained Transformer) caught my eye by chance. I started reading about it and found a few more papers on the topic as well, so I thought, why not share on the web what I have understood so far?
AstroPT represents a pivotal advancement in leveraging machine learning for astronomical data processing. By harnessing large datasets and fostering community collaboration, this initiative aims to bridge gaps between observational sciences and artificial intelligence, paving the way for future innovations in both fields.
Researchers have developed a new artificial intelligence model called AstroPT that can learn meaningful information about galaxies just by looking at images. This breakthrough could help solve the “token crisis” facing large language models and lead to more powerful AI systems that can understand both text and visual data from scientific observations. The AstroPT model, developed by a team led by Michael J. Smith, was trained on over 8 million galaxy images from the DESI Legacy Survey. Unlike traditional computer vision approaches that require labeled data, AstroPT learned in an unsupervised way by predicting patches of galaxy images.
“We demonstrated that simple generative autoregressive models can learn scientifically useful information when pre-trained on the surrogate task of predicting the next 16 × 16 pixel patch in a sequence of galaxy image patches,” the researchers write. Importantly, the team found that AstroPT’s performance on downstream tasks like estimating a galaxy’s mass or classifying its shape improved predictably as the model was trained on more data. This mirrors the “scaling laws” seen in large language models, suggesting astronomical data could be combined with text data to create more capable multimodal AI systems.
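To make the "predict the next 16 × 16 patch" idea concrete, here is a minimal sketch of how an image can be turned into a patch sequence with next-patch targets. It is only an illustration, not the authors' code: the function names `patchify` and `next_patch_pairs`, the 64 × 64 × 3 random image, and the simple row-by-row patch order are all my assumptions, and the actual model may order its patches differently.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into a sequence of flattened patches.

    Patches are read row by row (raster order); this ordering is an
    assumption made for illustration only.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)

def next_patch_pairs(patches):
    """Pair each patch with the one that follows it in the sequence."""
    inputs = patches[:-1]   # patches 0 .. N-2
    targets = patches[1:]   # patches 1 .. N-1
    return inputs, targets

# Hypothetical stand-in for a galaxy cutout: random pixels, 64 x 64, 3 bands.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))

patches = patchify(image)
inputs, targets = next_patch_pairs(patches)
print(patches.shape, inputs.shape, targets.shape)
```

An autoregressive model would then be trained so that, given the patches seen so far, its output is close to the next entry in `targets`. No labels are needed because the image itself supplies the training signal.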
The researchers tested versions of AstroPT ranging from 1 million to 2.1 billion parameters. They found performance improvements started to saturate around 89 million parameters for their dataset. Larger models were able to learn more nuanced information – for example, the ability to estimate stellar mass and classify tight spiral galaxies emerged suddenly in models with 12 million parameters or more. To evaluate what AstroPT learned, the researchers used a technique called linear probing. This involves training a simple linear model to predict galaxy properties from AstroPT’s internal representations. They found AstroPT could extract meaningful information about properties like a galaxy’s color, redshift, star formation rate, and morphological classification.
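Linear probing is easy to sketch in a few lines. The example below is a toy version under stated assumptions: the "embeddings" are random vectors standing in for the frozen representations a pretrained model would produce, the target called `stellar_mass` is synthetic, and the helper names (`fit_linear_probe`, `predict`, `r2_score`) are hypothetical. It shows the mechanics only, not the paper's actual evaluation.

```python
import numpy as np

def fit_linear_probe(embeddings, targets):
    """Fit a least-squares linear model (with bias) on frozen embeddings."""
    X = np.hstack([embeddings, np.ones((len(embeddings), 1))])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights

def predict(weights, embeddings):
    """Apply the fitted probe to new embeddings."""
    X = np.hstack([embeddings, np.ones((len(embeddings), 1))])
    return X @ weights

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual / total variance."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins (assumptions): 500 galaxies, 32-dimensional embeddings,
# and a target that depends linearly on the embedding plus a little noise.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 32))
true_w = rng.normal(size=32)
stellar_mass = embeddings @ true_w + 0.1 * rng.normal(size=500)

# Train the probe on 400 galaxies, evaluate on the remaining 100.
weights = fit_linear_probe(embeddings[:400], stellar_mass[:400])
score = r2_score(stellar_mass[400:], predict(weights, embeddings[400:]))
print(f"held-out R^2: {score:.3f}")
```

The point of keeping the probe linear is that it has very little capacity of its own, so a good score suggests the information was already present in the representations rather than learned by the probe.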
Interestingly, AstroPT also seemed to pick up on subtle differences between telescopes used in the DESI Legacy Survey. The researchers suggest this instrumental variation could be removed through additional fine-tuning if desired. The AstroPT architecture was deliberately designed to be similar to leading language models like GPT-2. The researchers hope this will allow astronomers to benefit from rapid progress in natural language AI. They’ve released their code, model weights, and dataset openly to encourage further research. “We believe that collaborative community development paves the fastest route towards realising an open source web-scale large observation model,” they write, inviting other researchers to join the project.
While this study focused on galaxy images, the researchers expect similar models could be applied to other types of scientific observations like stellar spectra or climate data. This could help train more powerful “Large Observation Models” that understand multiple scientific domains. The development of AstroPT required significant computing power, generating an estimated 120 kg of CO2 emissions. To avoid unnecessary recomputation, the researchers have made their trained models freely available. With the ability to extract scientific meaning from raw observational data, models like AstroPT could accelerate research across many fields. They may also help address the limited supply of high-quality text data for training large language models. By incorporating the vast amounts of observational data collected by scientists, future AI systems may develop a deeper understanding of both human knowledge and the natural world.
Reference:
AstroPT: Scaling Large Observation Models for Astronomy