Categories: FAANG

Scalable Pre-training of Large Autoregressive Image Models

This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. These models are inspired by their textual counterparts, i.e., Large Language Models (LLMs), and exhibit similar scaling properties. Specifically, we highlight two key findings: (1) the performance of the visual features scale with both the model capacity and the quantity of data, (2) the value of the objective function correlates with the performance of the model on downstream tasks. We illustrate the practical implication of these findings by pre-training a 7 billion parameter AIM on 2…

Multimodal Autoregressive Pre-Training of Large Vision Encoders

*Equal Contributors A dominant paradigm in large multimodal models is to pair a large language de- coder with a vision encoder. While it is well-known how to pre-train and tune language decoders for multimodal tasks, it is less clear how the vision encoder should be pre-trained. A de facto standard…

November 23, 2024

In "FAANG"

Non-Autoregressive Neural Machine Translation: A Call for Clarity

Non-autoregressive approaches aim to improve the inference speed of translation models by only requiring a single forward pass to generate the output sequence instead of iteratively producing each predicted token. Consequently, their translation quality still tends to be inferior to their autoregressive counterparts due to several issues involving output token…

October 28, 2022

In "FAANG"

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models

Speech foundation models, such as HuBERT and its variants, are pre-trained on large amounts of unlabeled speech data and then used for a range of downstream tasks. These models use a masked prediction objective, where the model learns to predict information about masked input segments from the unmasked context. The…

March 14, 2025

In "FAANG"

AI Generated Robotic Content

Next Conversion Marketing Best Practices and Strategies Explained »

Previous « 5 signs you need a premium DNS service

Published by

AI Generated Robotic Content

Tags: ai/mlfaang

2 years ago

Anima – Sharing Some Prompts and Results

Been experimenting with Anima lately and ended up spending way too much time refining prompts.…

20 hours ago

AI/ML News

Keychron K2 HE Concrete Edition Review: Rock-Solid Typing

Keychron's K2 HE Concrete Edition sounds like a cute gimmick, but as I discovered, there's…

21 hours ago

AI/ML News

AI generates full battery electrolyte recipes, matching top lithium metal battery performance

Battery electrolytes aren't just one chemical, but a complex mixture of salts, solvents, and additives…

21 hours ago

Image

Nava – A 6.3B audio-video model .

Page: https://ernie-research.github.io/NAVA/ Model: https://huggingface.co/ernie-research/NAVA Github: https://github.com/ernie-research/NAVA NAVA is a 6.3 B-parameter joint audio-video generator that…

2 days ago