Categories: FAANG

SO-Bench: A Structural Output Evaluation of Multimodal LLMs

Multimodal large language models (MLLMs) are increasingly deployed in real-world, agentic settings where outputs must not only be correct, but also conform to predefined data schemas. Despite recent progress in structured generation in textual domain, there is still no benchmark that systematically evaluates schema-grounded information extraction and reasoning over visual inputs. In this work, we conduct a comprehensive study of visual structural output capabilities for MLLMs with our carefully designed SO-Bench benchmark. Covering four visual domains, including UI screens, natural images…
AI Generated Robotic Content

Recent Posts

RELEASE – The model you’ve all been waiting for – Smartphone Snapshot Photo Reality v13 – OMEGA

This is a LoRA for FLUX Klein Base 9b. **Link: https://civitai.red/models/2381927/flux2-klein-base-9b-smartphone-snapshot-photo-reality-style** All infos on how…

22 hours ago

Asus Zenbook A16 (2026) Review: Savor the Power, Ignore the Beige

This $2,000 Asus laptop delivers breathtaking performance thanks to Qualcomm's Snapdragon X2 Elite Extreme, but…

23 hours ago

The realism is getting out of hand

ComfyUI with ZIT submitted by /u/Ferwien [link] [comments]

2 days ago

Tovala Family Meals Review: Good Food, Lots of Salt

Tovala is a meal kit that comes with a smart oven, or a smart oven…

2 days ago

Open weight (and closed) Models with character sheet inputs

Now that we have some open weight models available to us that work with character…

3 days ago

Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents

This paper was accepted at the Fifth Workshop on Natural Language Generation, Evaluation, and Metrics…

3 days ago