Categories: Image

LTX 2 is amazing : LTX-2 in ComfyUI on RTX 3060 12GB

My setup: RTX 3060 12GB VRAM + 48GB system RAM.

I spent the last couple of days messing around with LTX-2 inside ComfyUI and had an absolute blast. I created short sample scenes for a loose spy story set in a neon-soaked, rainy Dhaka (cyberpunk/Bangla vibes with rainy streets, umbrellas, dramatic reflections, and a mysterious female lead).

Workflow : https://drive.google.com/file/d/1VYrKf7jq52BIi43mZpsP8QCypr9oHtCO/view
i forgot the username who shared it under a post. This workflow worked really well!

Each 8-second scene took about 12 minutes to generate (with synced audio). I queued up 70+ scenes total, often trying 3-4 prompt variations per scene to get the mood right. Some scenes were pure text-to-video, others image-to-video starting from Midjourney stills I generated for consistency.

Here’s a compilation of some of my favorite clips (rainy window reflections, coffee steam morphing into faces, walking through crowded neon markets, intense close-ups in the downpour):

i cleaned up the audio. it had some squeaky sounds.

Strengths that blew me away:

  1. Speed – Seriously fast for what it delivers, especially compared to other local video models.
  2. Audio sync is legitimately impressive. I tested illustration styles, anime-ish looks, realistic characters, and even puppet/weird abstract shapes – lip sync, ambient rain, subtle SFX/music all line up way better than I expected. Achieving this level of quality on just 12GB VRAM is wild.
  3. Handles non-realistic/abstract content extremely well – illustrations, stylized/puppet-like figures, surreal elements (like steam forming faces or exaggerated rain effects) come out coherent and beautiful.

Weaknesses / Things to avoid:

  1. Weird random zoom-in effects pop up sometimes – not sure if prompt-related or model quirk.
  2. Actions/motion-heavy scenes just don’t work reliably yet. Keep it to subtle movements, expressions, atmosphere, rain, steam, walking slowly, etc. – anything dynamic tends to break coherence.

Overall verdict: I literally couldn’t believe how two full days disappeared – I was having way too much fun iterating prompts and watching the queue. LTX-2 feels like a huge step forward for local audio-video gen, especially if you lean into atmospheric/illustrative styles rather than high-action.

submitted by /u/tanzim31
[link] [comments]

AI Generated Robotic Content

Share
Published by
AI Generated Robotic Content
Tags: ai images

Recent Posts

For very low resolution videos restoration, SeedVR2 is better than FlashVSR+ like 256px to 1024px

HD version is here since Reddit downscaled massively : https://youtube.com/shorts/WgGN2fqIPzo submitted by /u/CeFurkan [link] [comments]

14 hours ago

Can LLM Embeddings Improve Time Series Forecasting? A Practical Feature Engineering Approach

Using large language models (LLMs) — or their outputs, for that matter — for all…

14 hours ago

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find…

14 hours ago

Image upscale with Klein 9B

Prompt: upscale image and remove jpeg compression artifacts. Added few hours later: Please note that…

2 days ago

KV Caching in LLMs: A Guide for Developers

Language models generate text one token at a time, reprocessing the entire sequence at each…

2 days ago

Learnings from COBOL modernization in the real world

There’s a lot of excitement right now about AI enabling mainframe application modernization. Boards are…

2 days ago