
SO-Bench: A Structural Output Evaluation of Multimodal LLMs

Multimodal large language models (MLLMs) are increasingly deployed in real-world agentic settings, where outputs must not only be correct but also conform to predefined data schemas. Despite recent progress in structured generation in the textual domain, there is still no benchmark that systematically evaluates schema-grounded information extraction and reasoning over visual inputs. In this work, we conduct a comprehensive study of the visual structural output capabilities of MLLMs using SO-Bench, a carefully designed benchmark. Covering four visual domains, including UI screens, natural images…
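To make "conform to predefined data schemas" concrete, here is a minimal sketch of a schema-conformance check on a model's raw text output. The schema, its field names, and the `conforms` helper are hypothetical illustrations, not part of SO-Bench; the excerpt does not describe the benchmark's actual schemas or scoring protocol. The check uses only the Python standard library and treats a schema as a flat mapping from field name to expected type.

```python
import json

# Hypothetical schema for a UI-screen extraction task (illustrative only,
# not taken from SO-Bench): field name -> expected Python type.
SCHEMA = {"screen_title": str, "button_labels": list, "has_search_bar": bool}


def conforms(raw_output: str, schema: dict) -> bool:
    """Return True if raw_output is a JSON object whose keys exactly match
    the schema's keys and whose values have the expected types."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # output is not valid JSON at all
    if not isinstance(obj, dict) or set(obj) != set(schema):
        return False  # wrong top-level type, or missing/extra fields
    return all(isinstance(obj[key], expected) for key, expected in schema.items())


# Usage: a correct output, one with a missing field and a wrong type, and non-JSON text.
good = '{"screen_title": "Settings", "button_labels": ["Save", "Cancel"], "has_search_bar": false}'
bad = '{"screen_title": "Settings", "button_labels": "Save"}'
print(conforms(good, SCHEMA), conforms(bad, SCHEMA), conforms("not json", SCHEMA))
```

A check like this only captures structural validity; an output can pass it while still containing wrong values, which is why the abstract distinguishes being correct from conforming to a schema.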