Categories: FAANG

AXLearn: Modular Large Model Training on Heterogeneous Infrastructure

We design and implement AXLearn, a production deep learning system that facilitates scalable and high-performance training of large deep learning models. Compared to other state-of-the-art deep learning systems, AXLearn has a unique focus on modularity and support for heterogeneous hardware infrastructure. AXLearn’s internal interfaces between software components follow strict encapsulation, allowing different components to be assembled to facilitate rapid model development and experimentation on heterogeneous compute infrastructure. We introduce a novel method of quantifying modularity via…
AI Generated Robotic Content
