Cutting-edge vision chip brings human eye-like perception to machines
With the rapid advancement of artificial intelligence, unmanned systems such as autonomous vehicles and embodied intelligent agents are increasingly being deployed in real-world settings, driving a new wave of technological and industrial transformation. Visual perception, a core means of acquiring information about the environment, plays a crucial role in these intelligent systems. However, achieving efficient, precise, and robust visual perception in dynamic, diverse, and unpredictable environments remains an open challenge.