Last week I built a local pipeline where a state machine + LLM watches my security cam and yells at Amazon drivers peeing on my house. The state machine is the magic: it flips the system from passive (just watching) to active (video/audio ingest + ~1s TTS out) only when a trigger hits, which keeps things deterministic and way more reliable than letting the LLM run solo. The LLM handles the fuzzy stuff (vision + reasoning) while the state machine handles control flow. Together it’s solid, and it could just as easily be swapped to spot trespassing, log deliveries, or recognize gestures. TL;DR: gave my camera a brain and a mouth + a state machine to keep it focused. Repo in comments to see how it’s wired up. submitted by /u/Weary-Wing-6806
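The passive/active gating described above can be sketched as a tiny state machine wrapping a stubbed-out LLM and TTS. This is a minimal illustration, not the poster's actual code: the `Mode` states, the `classify` callable (standing in for the vision LLM), the `speak` callable (standing in for TTS), and the `"delivery_driver"` / `"clear"` labels are all assumptions for the sketch.

```python
from enum import Enum, auto

class Mode(Enum):
    PASSIVE = auto()   # just watching; no LLM calls, no TTS
    ACTIVE = auto()    # ingest frames, run the LLM, speak via TTS

class CameraAgent:
    """Deterministic state machine gating a fuzzy LLM vision step (toy sketch)."""

    def __init__(self, classify, speak):
        self.mode = Mode.PASSIVE
        self.classify = classify   # stand-in for the vision LLM: frame -> label
        self.speak = speak         # stand-in for the TTS output: text -> None

    def on_frame(self, frame, motion_detected):
        if self.mode is Mode.PASSIVE:
            if motion_detected:            # trigger flips passive -> active
                self.mode = Mode.ACTIVE
            return                         # the LLM never runs while passive
        label = self.classify(frame)       # fuzzy reasoning only while active
        if label == "delivery_driver":
            self.speak("This area is under surveillance.")
        if label == "clear":               # scene empty again -> back to passive
            self.mode = Mode.PASSIVE
```

The point of the split is visible in `on_frame`: the expensive, nondeterministic call (`classify`) is reachable only from one state, so control flow stays auditable even if the model misbehaves.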
None of the video gen models do a real CRT terminal animation look. Weights +…
Zero-shot text classification is a way to label text without first training a classifier on…
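The core idea (scoring text against candidate label descriptions instead of training a classifier) can be shown with a deliberately toy similarity function. Real zero-shot systems use a pretrained sentence encoder or an NLI model; the bag-of-words "embedding" here is only a stand-in so the sketch stays self-contained, and the label descriptions are made up.

```python
import math
from collections import Counter

def embed(text):
    # toy bag-of-words vector; a real system would use a pretrained encoder
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot_classify(text, labels):
    # no training step: just score the text against each label description
    vec = embed(text)
    return max(labels, key=lambda name: cosine(vec, embed(labels[name])))

labels = {
    "sports": "game team player score match",
    "finance": "stock market price earnings bank",
}
```

With these descriptions, `zero_shot_classify("the team won the match with a late score", labels)` picks `"sports"` purely from word overlap; swapping in real embeddings keeps the same structure.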
GRASP is a new gradient-based planner for learned dynamics (a “world model”) that makes long-horizon…
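The general shape of gradient-based planning over a dynamics model (not GRASP's specific method, which the blurb doesn't detail) is: roll actions through the model, measure final-state loss, and descend on the action sequence. This sketch uses trivially simple known dynamics `x_{t+1} = x_t + a_t` with a hand-derived gradient in place of a learned, autodiff-backed world model; `rollout`, `plan`, and all parameters are illustrative names.

```python
def rollout(x0, actions):
    # stand-in for a learned world model: x_{t+1} = x_t + a_t
    x = x0
    for a in actions:
        x = x + a
    return x

def plan(x0, goal, horizon=5, steps=200, lr=0.1):
    # optimize the action sequence by gradient descent on final-state error
    actions = [0.0] * horizon
    for _ in range(steps):
        x_final = rollout(x0, actions)
        # for these dynamics, d/da_t of (x_final - goal)^2 is 2*(x_final - goal)
        g = 2.0 * (x_final - goal)
        actions = [a - lr * g for a in actions]
    return actions
```

With a real learned model, the analytic gradient line is replaced by backpropagation through the rollout, but the outer loop is the same.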
Recent work has shown that probing model internals can reveal a wealth of information not…
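The standard tool for this kind of work is a linear probe: freeze the model, collect hidden activations, and train a small classifier to predict a property from them. As a self-contained stand-in, this sketch probes synthetic "activations" whose first coordinate encodes the property; the dimensions, learning rate, and step count are arbitrary choices for the toy.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic "hidden states": 2-D activations whose first coord encodes a property
acts = rng.normal(size=(200, 2))
labels = (acts[:, 0] > 0).astype(float)   # the information we probe for

# logistic-regression probe trained by gradient descent; activations stay frozen
w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))   # probe's predicted probability
    w -= acts.T @ (p - labels) / len(labels)    # gradient step on weights
    b -= (p - labels).mean()                    # gradient step on bias

acc = (((acts @ w + b) > 0) == (labels > 0.5)).mean()
```

High probe accuracy is the evidence that the representation linearly encodes the property; with a real model you would swap `acts` for layer activations and `labels` for the annotation of interest.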
As the demand for generative AI continues to grow, developers and enterprises seek more flexible,…
An autonomous robot from the company Honor ran a half marathon in 50:26, beating the…