Categories: FAANG

TASER: Translation Assessment via Systematic Evaluation and Reasoning

We introduce TASER (Translation Assessment via Systematic Evaluation and Reasoning), a metric that uses Large Reasoning Models (LRMs) for automated translation quality assessment. TASER harnesses the explicit reasoning capabilities of LRMs to conduct systematic, step-by-step evaluation of translation quality. We evaluate TASER on the WMT24 Metrics Shared Task across both reference-based and reference-free scenarios, demonstrating state-of-the-art performance. In system-level evaluation, TASER achieves the highest soft pairwise accuracy in both reference-based and reference-free settings…
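For readers unfamiliar with soft pairwise accuracy (SPA), the system-level meta-evaluation measure named above, here is a minimal sketch. It assumes the commonly cited definition: for every pair of systems, compare the p-value from the human scores with the p-value from the metric scores, and average `1 - |p_human - p_metric|` over all pairs. The function name, the dictionary layout, and the p-values are hypothetical illustrations, not taken from the TASER paper, and the p-values are assumed to have been computed beforehand (for example, with a paired permutation test).

```python
def soft_pairwise_accuracy(human_p, metric_p):
    """Average agreement between human and metric p-values over system pairs.

    Both arguments map a system pair, e.g. ("sysA", "sysB"), to the p-value
    for "the first system is better than the second". Computing those
    p-values is outside the scope of this sketch.
    """
    if not human_p:
        raise ValueError("at least one system pair is required")
    if set(human_p) != set(metric_p):
        raise ValueError("human and metric p-values must cover the same pairs")

    total = 0.0
    for pair, p_h in human_p.items():
        # A pair contributes 1.0 when the two p-values agree exactly
        # and 0.0 when they are maximally different.
        total += 1.0 - abs(p_h - metric_p[pair])
    return total / len(human_p)


# Hypothetical p-values for three systems (illustration only).
human = {("A", "B"): 0.01, ("A", "C"): 0.40, ("B", "C"): 0.90}
metric = {("A", "B"): 0.03, ("A", "C"): 0.10, ("B", "C"): 0.95}
spa = soft_pairwise_accuracy(human, metric)
```

A metric scores high under this measure only when it is as confident as the human raters about each system ranking, which is what separates it from plain pairwise accuracy, where only the direction of each ranking counts.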


AI Generated Robotic Content


Published by

AI Generated Robotic Content

Tags: ai/ml, faang

10 months ago
