Categories: Image

LTX-2 is amazing: LTX-2 in ComfyUI on RTX 3060 12GB

My setup: RTX 3060 12GB VRAM + 48GB system RAM.

I spent the last couple of days messing around with LTX-2 inside ComfyUI and had an absolute blast. I created short sample scenes for a loose spy story set in a neon-soaked, rainy Dhaka (cyberpunk/Bangla vibes with rainy streets, umbrellas, dramatic reflections, and a mysterious female lead).

Workflow: https://drive.google.com/file/d/1VYrKf7jq52BIi43mZpsP8QCypr9oHtCO/view
I forgot the username of the person who shared it under a post, but this workflow worked really well!

Each 8-second scene took about 12 minutes to generate (with synced audio). I queued up 70+ scenes total, often trying 3-4 prompt variations per scene to get the mood right. Some scenes were pure text-to-video, others image-to-video starting from Midjourney stills I generated for consistency.
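For a sense of scale, the queue math works out roughly like this (back-of-envelope; the 3.5 variations figure is just an assumed midpoint of the 3-4 tries per scene):

    # Rough total render time from the numbers above
    scenes = 70
    variations = 3.5        # assumed midpoint of 3-4 prompt tries per scene
    minutes_per_clip = 12   # ~12 min per 8-second clip on the 3060

    total_hours = scenes * variations * minutes_per_clip / 60
    print(f"~{total_hours:.0f} hours of generation")  # ~49 hours, i.e. roughly two full days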

Here’s a compilation of some of my favorite clips (rainy window reflections, coffee steam morphing into faces, walking through crowded neon markets, intense close-ups in the downpour):

I cleaned up the audio; it had some squeaky sounds.
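If you hit the same thing, a simple band-limit pass is one way to tame squeaky high-frequency artifacts. A minimal sketch, assuming ffmpeg is installed; the filenames and cutoff frequencies are placeholders to tune per clip, not the exact settings used here:

    import subprocess

    # Band-limit the audio to cut rumble and harsh highs; leave the video stream untouched.
    # Filenames and cutoffs are illustrative placeholders.
    subprocess.run([
        "ffmpeg", "-i", "scene.mp4",
        "-af", "highpass=f=80,lowpass=f=9000",  # audio filter chain: high-pass, then low-pass
        "-c:v", "copy",                         # copy video as-is, re-encode audio only
        "scene_clean.mp4",
    ], check=True)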

Strengths that blew me away:

  1. Speed – Seriously fast for what it delivers, especially compared to other local video models.
  2. Audio sync is legitimately impressive. I tested illustration styles, anime-ish looks, realistic characters, and even puppet/weird abstract shapes – lip sync, ambient rain, subtle SFX/music all line up way better than I expected. Achieving this level of quality on just 12GB VRAM is wild.
  3. Handles non-realistic/abstract content extremely well – illustrations, stylized/puppet-like figures, surreal elements (like steam forming faces or exaggerated rain effects) come out coherent and beautiful.

Weaknesses / Things to avoid:

  1. Weird random zoom-in effects pop up sometimes – not sure if they're prompt-related or a model quirk.
  2. Action- and motion-heavy scenes just don't work reliably yet. Keep it to subtle movements, expressions, atmosphere, rain, steam, slow walking, etc. – anything dynamic tends to break coherence.

Overall verdict: I literally couldn’t believe how two full days disappeared – I was having way too much fun iterating prompts and watching the queue. LTX-2 feels like a huge step forward for local audio-video gen, especially if you lean into atmospheric/illustrative styles rather than high-action.

submitted by /u/tanzim31

Published by AI Generated Robotic Content
Tags: ai images
