My setup: RTX 3060 12GB VRAM + 48GB system RAM. I spent the last couple of days messing around with LTX-2 inside ComfyUI and had an absolute blast. I created short sample scenes for a loose spy story set in a neon-soaked, rainy Dhaka (cyberpunk/Bangla vibes with rainy streets, umbrellas, dramatic reflections, and a mysterious female lead).

Workflow: https://drive.google.com/file/d/1VYrKf7jq52BIi43mZpsP8QCypr9oHtCO/view

Each 8-second scene took about 12 minutes to generate (with synced audio). I queued up 70+ scenes total, often trying 3-4 prompt variations per scene to get the mood right. Some scenes were pure text-to-video; others were image-to-video, starting from Midjourney stills I generated for consistency.

Here's a compilation of some of my favorite clips (rainy window reflections, coffee steam morphing into faces, walking through crowded neon markets, intense close-ups in the downpour). I cleaned up the audio, which had some squeaky sounds.

Strengths that blew me away:
Weaknesses / Things to avoid:
Overall verdict: I literally couldn't believe how two full days disappeared; I was having way too much fun iterating on prompts and watching the queue. LTX-2 feels like a huge step forward for local audio-video generation, especially if you lean into atmospheric/illustrative styles rather than high-action scenes.
submitted by /u/tanzim31
The HD version is here, since Reddit downscaled the upload massively: https://youtube.com/shorts/WgGN2fqIPzo
submitted by /u/CeFurkan
Using large language models (LLMs), or their outputs for that matter, for all…
Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find…
Prompt: upscale image and remove JPEG compression artifacts. Added a few hours later: Please note that…
Language models generate text one token at a time, reprocessing the entire sequence at each…
There’s a lot of excitement right now about AI enabling mainframe application modernization. Boards are…