Training Software Engineering Agents and Verifiers with SWE-Gym

We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language-model-based SWE agents, achieving up to 19% absolute gains in …

Iterative fine-tuning on Amazon Bedrock for strategic model improvement

Organizations often face challenges when implementing single-shot fine-tuning approaches for their generative AI models. The single-shot fine-tuning method involves selecting training data, configuring hyperparameters, and hoping the results meet expectations without the ability to make incremental adjustments. Single-shot fine-tuning frequently leads to suboptimal results and requires starting the entire process from scratch when improvements are …

Announcing prompt management in the Vertex AI SDK

As generative AI applications grow in sophistication, development workflows become more fragmented. Although AI can be a force multiplier, teams may design prompts in one environment, manage versions in spreadsheets or text files, and then manually integrate them into their code. This leads to inefficiencies, versioning chaos, and collaboration bottlenecks. Vertex AI Studio is designed …

How Anthropic’s ‘Skills’ make Claude faster, cheaper, and more consistent for business workflows

Anthropic launched a new capability on Thursday that allows its Claude AI assistant to tap into specialized expertise on demand, marking the company’s latest effort to make artificial intelligence more practical for enterprise workflows as it chases rival OpenAI in the intensifying competition over AI-powered software development. The feature, called Skills, enables users to create …

An experiment in “realism” with Wan2.2: safe-for-work images

Got bored seeing the usual women pics every time I opened this sub, so decided to make something a little friendlier for the workplace. I was loosely working to a theme of “Scandinavian Fishing Town” and wanted to see how far I could get making them feel “realistic”. Yes I am aware there’s all …

Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration

We present an approach to software testing automation using Agentic Retrieval-Augmented Generation (RAG) systems for Quality Engineering (QE) artifact creation. We combine autonomous AI agents with hybrid vector-graph knowledge systems to automate test plan, case, and QE metric generation. Our approach addresses traditional software testing limitations by leveraging LLMs such as Gemini and Mistral, multi-agent …

Transforming enterprise operations: Four high-impact use cases with Amazon Nova

Since the launch of Amazon Nova at AWS re:Invent 2024, we have seen adoption trends across industries, with notable gains in operational efficiency, compliance, and customer satisfaction. With its capabilities in secure, multimodal AI and domain customization, Nova is enhancing workflows and enabling cost efficiencies across core use cases. In this post, we share four …