Categories: FAANG

Matrix3D: Large Photogrammetry Model All-in-One

We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis using just the same model. Matrix3D utilizes a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D’s large-scale multi-modal training lies in the incorporation of a mask learning strategy. This enables full-modality model training even with partially complete data, such as bi-modality data of image-pose and image-depth pairs…
AI Generated Robotic Content

Recent Posts

Nava – A 6.3B audio-video model .

Page: https://ernie-research.github.io/NAVA/ Model: https://huggingface.co/ernie-research/NAVA Github: https://github.com/ernie-research/NAVA NAVA is a 6.3 B-parameter joint audio-video generator that…

21 hours ago

Enterprise Business Software and the Mixed-Up Chameleon Problem

Editor’s Note: This blog post was written by Greg Little, Senior Counselor at Palantir, with…

21 hours ago

High-Throughput Graph Abstraction at Netflix: Part I

By Oleksii Tkachuk, Kartik Sathyanarayanan, Rajiv ShringiIntroductionNetflix has a diverse range of graph use cases, each…

21 hours ago

Comprehensive observability for Amazon SageMaker AI LLM inference: From GPU utilization to LLM quality

Deploying large language models (LLMs) at scale on Amazon SageMaker AI Inference makes observability a…

21 hours ago

Cloud CISO Perspectives: How to build an AI-ready security program for the public sector

Welcome to the second Cloud CISO Perspectives for May 2026. Today, Usman Chaudhary, Field CISO,…

21 hours ago

24 Best Father’s Day Gifts for Dads (2026)

Dads are traditionally tough to shop for—let me help with these handpicked gift ideas for…

22 hours ago