
A method to turn a video into a 360° 3D VR panorama video

I started working on this with the goal of eventually producing an FMV VR video game. At first I thought that training a WAN panorama LoRA would be the easy solution, but the very high resolution required for VR means it cannot be the whole answer. Also, almost all new models are designed for perspective videos: for example, if you try to animate a character's mouth on a panorama, it will not work properly unless the model was trained on panoramic images. So, to be able to use any existing model in the workflow, the best technical solution was to work with a normal video first and only then convert it to VR.

I thought this would be simple, but the obvious ideas very quickly hit hard limits with the models that are currently available. What I describe below is the result of weeks of research to get something that actually works in the current technical ecosystem.

Step 1: Convert the video to a spherical mapping with a mask for outpainting.

The first step converts the video into a spherical mapping and adds a mask around it so the missing areas can be outpainted. To make this work, you need to know the camera intrinsics. I tested every repo I could find for estimating them, and the best so far is GeoCalib: you input the first frame and it returns fairly accurate camera settings. I have not turned that repo into a node yet, because the online demo is already well done.

Using these camera intrinsics, I created a custom node that converts the video into a spherical projection, which becomes part of a larger panorama. Depending on the intrinsics, the size of the projected video can vary a lot. You can already find this node on the Patreon I just created. Since this part is pretty straightforward, the node is basically ready to go and should adapt to all videos.
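The projection in this step can be sketched in plain NumPy. This is a minimal, hypothetical version of the node, not its actual implementation: it inverse-maps each equirectangular pixel through pinhole intrinsics (fx, fy, cx, cy, e.g. as estimated by GeoCalib) and marks everything the camera never saw as the outpainting mask. Nearest-neighbour sampling keeps it short; a real node would sample bilinearly.

```python
import numpy as np

def perspective_to_equirect(frame, fx, fy, cx, cy, pano_w=2048, pano_h=1024):
    """Place one perspective frame into an equirectangular panorama.

    Returns (pano, mask) where mask is 255 in the regions to be outpainted.
    """
    h, w = frame.shape[:2]
    # Longitude/latitude of every panorama pixel center.
    lon = ((np.arange(pano_w) + 0.5) / pano_w - 0.5) * 2 * np.pi   # [-pi, pi]
    lat = (0.5 - (np.arange(pano_h) + 0.5) / pano_h) * np.pi       # [pi/2, -pi/2]
    lon, lat = np.meshgrid(lon, lat)
    # Unit direction vector for each pixel (y up, z forward).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    # Only directions in front of the camera can project into the frame.
    valid = z > 1e-6
    safe_z = np.where(valid, z, 1.0)
    u = np.where(valid, fx * x / safe_z + cx, -1.0)
    v = np.where(valid, -fy * y / safe_z + cy, -1.0)  # image v grows downward
    inside = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    pano = np.zeros((pano_h, pano_w, 3), dtype=frame.dtype)
    ui = np.clip(u.astype(int), 0, w - 1)
    vi = np.clip(v.astype(int), 0, h - 1)
    pano[inside] = frame[vi[inside], ui[inside]]
    mask = (~inside).astype(np.uint8) * 255
    return pano, mask
```

The mask is exactly the complement of the projected footprint, which is why its size varies with the intrinsics: a wider field of view covers more of the sphere and leaves less to outpaint.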

Step 2: Panorama outpainting for fixed‑camera videos (work in progress).

This is where it gets tricky, and for now I will not release this part of the workflow because it is not yet ready to adapt to all kinds of videos. The input must not be shaky; camera shake has no real purpose in a VR context anyway, so you want the input perfectly stable. The method below is only for a fixed camera; if the camera moves through space, it will require training a WAN LoRA. Hopefully this LoRA/paper will be released at some point to help here.

For a fixed camera, you can in theory just take the panoramic video and mask from Step 1 and run them through a VACE inpainting workflow. But in my tests the results were not perfect, and stabilizing them would need a proper fixed-camera panorama video LoRA, which does not exist yet. So instead, what I do is:

  • Inpaint the first frame only (with Qwen Edit or Flux Fill) and make sure this first frame is perfect.
  • Use this new first frame as the first-frame input to a VACE inpainting workflow for the whole video.
  • Do one or two extra passes, re-inputting the source video/mask in the middle of each upscaling pass to keep things faithful to the original footage.
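The re-inputting in the last bullet can be sketched as a simple composite (a hypothetical helper, assuming frames as float arrays): the original projected footage is pasted back over its known region between passes, so only the outpainted ring keeps evolving while the source area stays faithful.

```python
import numpy as np

def reblend(generated, source, known_mask):
    """Paste the source footage back into its region of the generated panorama.

    generated, source: (T, H, W, 3) float arrays.
    known_mask: (H, W), 1 where the original footage lives, 0 elsewhere.
    """
    m = known_mask[None, :, :, None].astype(np.float32)
    return source * m + generated * (1.0 - m)
```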

At the moment, this step does not yet work “off the shelf” for every video (for example, when many background elements are moving), so I plan to keep working on it; the goal is to release a one‑click workflow. I will also add a way to handle longer videos (with SVI or Painter‑LongVideo).

Step 3: Compute depth for the panorama.

Next, we need to compute depth for the panorama video. A panorama is basically many images stitched together, so you cannot just run Depth Anything on it directly and expect good results. In my case, the best solution was to wrap MoGe-2 in a custom node and modify it to work with panoramas, following a method originally described for MoGe-1.

This worked well overall, but there were big frame-to-frame differences. Taking inspiration from the Video Depth Anything paper, I implemented something to improve temporal consistency. It does not feel completely perfect yet, but it is getting there. I will release this node as soon as possible.
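A simple form of that temporal stabilisation can be sketched as follows. This is a toy version of the idea, not the Video Depth Anything method itself: assuming each frame's depth differs mainly by an affine scale/shift (the usual ambiguity of monocular depth), align every frame to its stabilised predecessor by least squares, then smooth with an exponential moving average.

```python
import numpy as np

def stabilize_depth(depths, ema=0.8):
    """depths: list of (H, W) float arrays, one per frame.

    Returns a list of stabilised depth maps of the same shapes.
    """
    out = [depths[0].astype(np.float64)]
    for d in depths[1:]:
        d = d.astype(np.float64)
        # Least-squares fit of a*d + b to the previous stabilised frame.
        A = np.stack([d.ravel(), np.ones(d.size)], axis=1)
        (a, b), *_ = np.linalg.lstsq(A, out[-1].ravel(), rcond=None)
        aligned = a * d + b
        # Exponential moving average damps remaining flicker.
        out.append(ema * out[-1] + (1.0 - ema) * aligned)
    return out
```

The EMA trades responsiveness for stability; for a fixed camera with mostly static geometry, a fairly high `ema` is usually safe.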

Step 4: Generate stereoscopic 360° from panorama + depth.

Now that we have a monoscopic panoramic video and its depth map, we can create the final stereoscopic video for VR. The custom node I created distorts the video spherically, in a way adapted to panoramas, which leaves holes in a few regions. At first, I output masks for these holes (as shown at the end of the example video), ready to be filled by inpainting. But so far, I have not found any inpainting workflow that works well here, as the holes are too small and change too much between frames.
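The distortion itself can be sketched as a depth-driven longitude shift. This is a minimal forward warp, not the actual node: it assumes depth is roughly in the same units as the eye separation, shifts each pixel by the angular disparity of a half-baseline at its depth, and reports unfilled target pixels, which are exactly the holes described above. The 64 mm default IPD and all names are illustrative.

```python
import numpy as np

def equirect_eye(pano, depth, ipd=0.064, sign=+1):
    """Forward-warp one eye view from an equirectangular panorama + depth.

    pano: (H, W, 3), depth: (H, W); sign=+1 for one eye, -1 for the other.
    Returns (eye, hole) where hole marks pixels no source pixel landed on.
    """
    h, w = depth.shape
    # Angular disparity of a half-baseline at each pixel's depth.
    dphi = sign * np.arctan2(ipd / 2.0, np.maximum(depth, 1e-3))
    shift = (dphi / (2 * np.pi) * w).astype(int)      # radians -> pixel columns
    cols = np.arange(w)[None, :]
    rows = np.repeat(np.arange(h)[:, None], w, axis=1)
    tgt = (cols + shift) % w                          # wrap around the sphere
    eye = np.zeros_like(pano)
    hole = np.ones((h, w), dtype=bool)
    eye[rows, tgt] = pano     # forward splat; colliding pixels overwrite
    hole[rows, tgt] = False
    return eye, hole
```

Near pixels shift more than far ones, so holes open up exactly at depth discontinuities, which is why they are small, scattered, and frame-dependent.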

So for the moment, what I do is:

  • Mask the strongly foreground element (the character, in my example) and remove it from the video to get a background-only video.
  • Recalculate the depth for this background-only video.
  • Merge everything back together in a custom node, using the full video, the full-video depth, the background depth, and the character mask.
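The final merge can be sketched as a two-layer composite per eye (a hypothetical helper, assuming the background video and the masked character have each already been warped to that eye with their own depth): the warped character is pasted over the warped background, so the background layer fills whatever the character dis-occludes.

```python
import numpy as np

def merge_layers(bg_eye, fg_eye, fg_mask_warped):
    """bg_eye, fg_eye: (H, W, 3) warped layers for one eye.
    fg_mask_warped: (H, W) bool, True where the warped character is visible."""
    out = bg_eye.copy()
    out[fg_mask_warped] = fg_eye[fg_mask_warped]
    return out
```

Because the background layer has no holes behind the character (it was recomputed without it), this composite avoids the tiny flickering holes that defeated inpainting.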

This worked great for my test video, but it feels limited to this specific type of scene, and I still need to work on handling all kinds of scenarios.

Right now this is a proof of concept. It works great for my use case, but it will not yet work well for everyone or for every type of video. So what I have done is upload the first step (which works 100%) to this new Patreon page: https://patreon.com/hybridworkflow.

If many people are interested, I will do my best to release the next steps as soon as possible. I do not want to release anything that does not work reliably across scenarios, so it might take a bit of time, but we'll get there, especially if people bring new ideas here to help bypass the current limitations!

submitted by /u/supercarlstein

Published by AI Generated Robotic Content
