Tag: benchmark
All the articles with the tag "benchmark".
New paper in Nature's Scientific Reports: benchmarking multimodal LLMs on dynamical astronomy
Posted on: April 5, 2026 at 10:00 AM
A new paper in Scientific Reports: a systematic benchmark of multimodal LLMs (commercial and open-source, large and small) on a real astronomical classification problem. Commercial models hit F1 = 100% on simple cases; even small local open-source models reach surprisingly useful accuracy.