Categories: FAANG

Training Software Engineering Agents and Verifiers with SWE-Gym

We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language model based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets. We also experiment with inference-time scaling through verifiers trained on agent trajectories sampled from SWE-Gym. When combined with our fine-tuned SWE…
AI Generated Robotic Content

Recent Posts

We can finally watch TNG in 16:9

Somone posted an example of LTX 2.3 outpainting to expand 4:3 video to 16:9. I…

8 hours ago

The Complete Guide to Inference Caching in LLMs

Calling a large language model API at scale is expensive and slow.

8 hours ago

The Human Infrastructure: How Netflix Built the Operations Layer Behind Live at Scale

By: Brett Axler, Casper Choffat, and Alo LowryIn the three years since our first Live show,…

8 hours ago

Introducing granular cost attribution for Amazon Bedrock

As AI inference grows into a significant share of cloud spend, understanding who and what…

8 hours ago

OpenAI Executive Kevin Weil Is Leaving the Company

The former Instagram VP is departing the ChatGPT-maker, which is folding the AI science application…

9 hours ago