Training Software Engineering Agents and Verifiers with SWE-Gym

We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language-model-based SWE agents, achieving up to 19% absolute gains in …

Iterative fine-tuning on Amazon Bedrock for strategic model improvement

Organizations often face challenges when implementing single-shot fine-tuning approaches for their generative AI models. The single-shot fine-tuning method involves selecting training data, configuring hyperparameters, and hoping the results meet expectations without the ability to make incremental adjustments. Single-shot fine-tuning frequently leads to suboptimal results and requires starting the entire process from scratch when improvements are …

Announcing prompt management in the Vertex AI SDK

As generative AI applications grow in sophistication, development workflows become more fragmented. Although AI can be a force multiplier, teams may design prompts in one environment, manage versions in spreadsheets or text files, and then manually integrate them into their code. This leads to inefficiencies, versioning chaos, and collaboration bottlenecks. Vertex AI Studio is designed …

How Anthropic’s ‘Skills’ make Claude faster, cheaper, and more consistent for business workflows

Anthropic launched a new capability on Thursday that allows its Claude AI assistant to tap into specialized expertise on demand, marking the company’s latest effort to make artificial intelligence more practical for enterprise workflows as it chases rival OpenAI in the intensifying competition over AI-powered software development. The feature, called Skills, enables users to create …