Categories: FAANG

Matrix3D: Large Photogrammetry Model All-in-One

We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis using just the same model. Matrix3D utilizes a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D’s large-scale multi-modal training lies in the incorporation of a mask learning strategy. This enables full-modality model training even with partially complete data, such as bi-modality data of image-pose and image-depth pairs…
AI Generated Robotic Content

Recent Posts

We can finally watch TNG in 16:9

Somone posted an example of LTX 2.3 outpainting to expand 4:3 video to 16:9. I…

7 hours ago

The Complete Guide to Inference Caching in LLMs

Calling a large language model API at scale is expensive and slow.

7 hours ago

The Human Infrastructure: How Netflix Built the Operations Layer Behind Live at Scale

By: Brett Axler, Casper Choffat, and Alo LowryIn the three years since our first Live show,…

7 hours ago

Introducing granular cost attribution for Amazon Bedrock

As AI inference grows into a significant share of cloud spend, understanding who and what…

7 hours ago

OpenAI Executive Kevin Weil Is Leaving the Company

The former Instagram VP is departing the ChatGPT-maker, which is folding the AI science application…

8 hours ago