I unintentionally scared myself by using the I2V generation model

While experimenting with the video generation model, I had the idea of taking a picture of my room and using it in the ComfyUI workflow. I thought it could be fun. So, I decided to take a photo with my phone and transfer it to my computer. Apart from the furniture and walls, nothing else …

Benchmarking hallucinations: New metric tracks where multimodal reasoning models go wrong

Over the past few decades, computer scientists have introduced increasingly sophisticated machine learning models that perform remarkably well on a variety of tasks. These include multimodal large language models (MLLMs), systems that can process and generate different types of data, predominantly text, images and video.