Categories: Image

LTX-2 is amazing: LTX-2 in ComfyUI on RTX 3060 12GB

My setup: RTX 3060 12GB VRAM + 48GB system RAM.

I spent the last couple of days messing around with LTX-2 inside ComfyUI and had an absolute blast. I created short sample scenes for a loose spy story set in a neon-soaked, rainy Dhaka (cyberpunk/Bangla vibes with rainy streets, umbrellas, dramatic reflections, and a mysterious female lead).

Workflow: https://drive.google.com/file/d/1VYrKf7jq52BIi43mZpsP8QCypr9oHtCO/view
I forget the username of the person who shared it under a post, but this workflow worked really well!

Each 8-second scene took about 12 minutes to generate (with synced audio). I queued up 70+ scenes total, often trying 3-4 prompt variations per scene to get the mood right. Some scenes were pure text-to-video, others image-to-video starting from Midjourney stills I generated for consistency.
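Queueing dozens of prompt variations by hand gets tedious; ComfyUI exposes an HTTP endpoint (`POST /prompt` on its local server, port 8188 by default) that accepts API-format workflow JSON, so variations can be scripted. A minimal sketch below, assuming a hypothetical text-encode node ID (`"6"`) and `inputs.text` field; the actual node ID and field name depend on the exported workflow:

```python
import copy
import json
import urllib.request


def make_variants(workflow, node_id, prompts):
    """Return one deep-copied workflow per prompt variation,
    with the text input of the given node swapped out."""
    variants = []
    for text in prompts:
        wf = copy.deepcopy(workflow)
        # Hypothetical node id/field; inspect your exported
        # API-format JSON to find the real ones.
        wf[node_id]["inputs"]["text"] = text
        variants.append(wf)
    return variants


def queue_prompt(workflow, host="127.0.0.1:8188"):
    """Submit one workflow to a running ComfyUI instance."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"http://{host}/prompt", data=data)
    urllib.request.urlopen(req)


if __name__ == "__main__":
    base = json.load(open("ltx2_workflow_api.json"))  # assumed filename
    for wf in make_variants(base, "6", [
        "rainy neon street, Dhaka, umbrellas, reflections",
        "close-up, mysterious female lead, downpour",
    ]):
        queue_prompt(wf)
```

Each queued job then shows up in the ComfyUI queue exactly as if it had been submitted from the UI.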

Here’s a compilation of some of my favorite clips (rainy window reflections, coffee steam morphing into faces, walking through crowded neon markets, intense close-ups in the downpour):

I cleaned up the audio; it had some squeaky sounds.
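For squeaky (high-frequency) artifacts, the basic idea is low-pass filtering. A toy one-pole low-pass in pure Python, just to illustrate the principle (real cleanup would use an audio editor or a proper DSP library, not this):

```python
def one_pole_lowpass(samples, alpha=0.1):
    """Smooth a sample stream with a one-pole low-pass filter.

    alpha in (0, 1]: smaller values cut high frequencies harder.
    High-frequency content (e.g. squeaks) is attenuated while
    slowly varying content passes through nearly unchanged.
    """
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)  # move a fraction of the way toward x
        out.append(y)
    return out
```

Feeding it a rapidly alternating signal (high frequency) yields a strongly attenuated output, while a constant signal passes through almost intact.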

Strengths that blew me away:

  1. Speed – Seriously fast for what it delivers, especially compared to other local video models.
  2. Audio sync is legitimately impressive. I tested illustration styles, anime-ish looks, realistic characters, and even puppet/weird abstract shapes – lip sync, ambient rain, subtle SFX/music all line up way better than I expected. Achieving this level of quality on just 12GB VRAM is wild.
  3. Handles non-realistic/abstract content extremely well – illustrations, stylized/puppet-like figures, surreal elements (like steam forming faces or exaggerated rain effects) come out coherent and beautiful.

Weaknesses / Things to avoid:

  1. Weird random zoom-in effects pop up sometimes – not sure if prompt-related or model quirk.
  2. Actions/motion-heavy scenes just don’t work reliably yet. Keep it to subtle movements, expressions, atmosphere, rain, steam, walking slowly, etc. – anything dynamic tends to break coherence.

Overall verdict: I literally couldn’t believe how two full days disappeared – I was having way too much fun iterating prompts and watching the queue. LTX-2 feels like a huge step forward for local audio-video gen, especially if you lean into atmospheric/illustrative styles rather than high-action.

submitted by /u/tanzim31

Published by AI Generated Robotic Content
Tags: ai images
