Zohreh Norouz

Run Generative AI inference with Amazon Bedrock in Asia Pacific (New Zealand)

Kia ora! Customers in New Zealand have been asking for access to foundation models (FMs) on Amazon Bedrock from their local AWS Region. Today, we’re excited to announce that Amazon Bedrock is now available in the Asia Pacific (New Zealand) Region (ap-southeast-6). Customers in New Zealand can now access Anthropic Claude models (Claude Opus 4.5, …

The new AI literacy: Insights from student developers

AI has made it easier than ever for student developers to work efficiently, tackle harder problems, and pursue ambitious projects. But for students earning technical degrees, these new capabilities also create genuine tensions around learning.  How much should I use AI? What should I use it for?  As 90% of technology professionals now use AI …

Exclusive Self Attention

We introduce exclusive self attention (XSA), a simple modification of self attention (SA) that improves Transformer’s sequence modeling performance. The key idea is to constrain attention to capture only information orthogonal to the token’s own value vector (thus excluding information of self position), encouraging better context modeling. Evaluated on the standard language modeling task, XSA …

ml 20634 image 1

Unlocking video insights at scale with Amazon Bedrock multimodal models

Video content is now everywhere, from security surveillance and media production to social platforms and enterprise communications. However, extracting meaningful insights from large volumes of video remains a major challenge. Organizations need solutions that can understand not only what appears in a video, but also the context, narrative, and underlying meaning of the content. In …

DRA Blog Diagrammax 1000x1000 1

DRA: A new era of Kubernetes device management with Dynamic Resource Allocation

The explosion of large language models (LLMs) has increased demand for high-performance accelerators like GPUs and TPUs. As organizations scale their AI capabilities, the scarcity of compute resources is sometimes the primary bottleneck. Efficiently managing every GPU and TPU cycle is no longer just a recommendation — it’s an operational necessity. Kubernetes is becoming the …