From Data to Map: Visualizing Ames House Prices with Python

Geospatial visualization has become an essential tool for understanding and representing data in a geographical context. It plays a pivotal role in various real-world applications, from urban planning and environmental studies to real estate and transportation. For instance, city planners might use geospatial data to optimize public transportation routes, while real estate professionals could leverage …

Mixed-input matrix multiplication performance optimizations

Posted by Manish Gupta, Staff Software Engineer, Google Research

AI-driven technologies are weaving themselves into the fabric of our daily routines, with the potential to enhance our access to knowledge and boost our overall productivity. The backbone of these applications lies in large language models (LLMs). LLMs are memory-intensive and typically require specialized hardware accelerators …

Co-ML: Collaborative Machine Learning Model Building for Developing Dataset Design Practices

Machine learning (ML) models are fundamentally shaped by data, and building inclusive ML systems requires significant considerations around how to design representative datasets. Yet, few novice-oriented ML modeling tools are designed to foster hands-on learning of dataset design practices, including how to design for data diversity and inspect for data quality. To this end, we …

How does data deduplication work?

Recent years have witnessed an explosion in the proliferation of self-storage units. These large warehouse units have sprung up nationally as a booming industry for one reason: the average person now has more possessions than they know what to do with. The same basic situation also plagues the world of IT. We’re in the midst …

Benchmark and optimize endpoint deployment in Amazon SageMaker JumpStart 

When deploying a large language model (LLM), machine learning (ML) practitioners typically care about two measurements for model serving performance: latency, defined by the time it takes to generate a single token, and throughput, defined by the number of tokens generated per second. Although a single request to the deployed endpoint would exhibit a throughput …

Researchers harness large language models to accelerate materials discovery

Princeton researchers have created an artificial intelligence (AI) tool to predict the behavior of crystalline materials, a key step in advancing technologies such as batteries and semiconductors. Although computer simulations are commonly used in crystal design, the new method relies on a large language model, similar to those that power text generators like ChatGPT.