LLM for natural science | Evgeny Smirnov

Recently, my article on the use of LLM in astronomy was in The Astrophysical Journal. If you want to read more about the study, there is a link to the article at the end. Here I want to tell you why I think it is important for everyone involved in science.

We all know that large language models (LLMs) have excelled at working with texts and images. That’s why it is usually assumed that LLMs are suitable mostly for social sciences and humanities. But can they be useful for researchers in natural sciences? My experience says — yes.

We deal with a massive volume of data in scientific research, especially in fields like astronomy. Traditionally, we use machine learning (ML) techniques to analyse this data, which involves complex algorithms and extensive training. This process can be incredibly time-consuming and resource-intensive. Moreover, it often requires astronomers to design, develop, and implement complex algorithms. And, as you might know, there are not too many astronomers. The challenge is to find a way to make this process more efficient and accessible.

This is where large language models come into play. Unlike traditional ML methods, LLMs like GPT-4 can understand and process natural language inputs without needing much training or fine-tuning. Essentially, they can learn and adapt quickly, making them incredibly versatile tools for researchers.

I was particularly interested in seeing how GPT-4 could help with classifying asteroids to determine whether or not they have chaotic behaviour. These are essentially patterns in the way asteroids orbit the sun, and accurately classifying them is crucial for our understanding of the solar system.

The Experiment and Results

I had the following problem: if you want to find out whether an asteroid has chaotic behaviour, you have to analyse an evolution of a special variable called ‘resonant angle’. Sometimes, it behaves like sinus (period low-amplitude) oscillations. Sometimes, there is no pattern. Therefore, the overall goal is simple: distinguish these two possible scenarios.

You can imagine that for a human being, it’s pretty easy to do: all you need is to briefly assess the image. However, automatic procedure is complex (digital filtering, FFT, Lomb-Scargle periodograms, etc.). Fortunately, it is not a problem anymore. I could use LLMs.

I fed GPT-4 images of resonant angles from asteroids and asked it to classify them. The results were nothing short of amazing. GPT-4 was able to classify the asteroids with near-perfect accuracy. It distinguished between different resonant angle’s behaviour, achieving accuracy rates close to 100%.

To put this in perspective, traditional methods typically achieve around 80-90% accuracy and require a lot of preprocessing and specialised algorithms. GPT-4 did all this without the need for extensive training or complex algorithms.

What This Means for Science

These findings have huge implications for scientific research. The success of GPT-4 in classifying asteroids highlights the broader potential of LLMs across various scientific fields.

The traditional approach requires developing complex algorithms, which require a lot of efforts by astronomers. With LLMs, it can be done within 15 minutes.
The regular algorithms are not accurate: the real images have multiple random factors. Humans can easily remove them. Automatic removal of artefacts is challenging. However, LLMs act as human beings. Thus, they can achieve 100% accuracy.

Besides that, using AI reduces the need for large datasets and heavy computational power, making advanced research tools more accessible to scientists everywhere. This could lead to faster, more accurate discoveries not just in astronomy, but in areas like biology, environmental science, and more.

If you want to dive deeper into this study, check out my full article: “Fast, Simple, and Accurate Time Series Analysis with Large Language Models: An Example of Mean-motion Resonances Identification”