LLMs have become more powerful at smaller sizes, but deploying them to edge devices like smartphones remains a massive challenge. Today, developers have to optimize across a sprawling combination of accelerators, operating systems, and countless System-on-a-Chip (SoC) configurations, often relying on manual testing with just a handful of devices. Google AI Edge Portal helps solve these challenges.
By letting developers test ML workloads across a fleet of over 120 representative Android device types, Google AI Edge Portal provides deep insight into latency and performance across all CPU, GPU, and NPU backends.
Today, we are excited to announce two new capabilities that extend Google AI Edge Portal for the generative AI era: benchmarking and debugging on-device LLMs. These new services give developers what they need to optimize generative AI performance accurately and efficiently across the entire Android ecosystem.
When a user interacts with an LLM-enabled experience in your app, they expect fast and consistent performance on their device. A long initialization time can make your app appear to freeze, and in the worst case the app can crash completely if the model consumes all available memory.
With the latest release of Google AI Edge Portal, you can now run automated gen AI benchmarks directly on a physical lab of over 120 diverse Android devices and test for these scenarios specifically. Portal natively supports CPU and GPU benchmarking for LLMs in the LiteRT-LM format.
Customers can benchmark gen AI models on over 120 Android devices and view metrics including initialization time, prefill speed, decode speed, and peak memory usage.
When you trigger a gen AI benchmarking job with Portal, it profiles the critical metrics that dictate your end-users’ experience when interacting with your AI application on-device:
| Metric | What it measures | Why it matters to you |
| --- | --- | --- |
| Initialization time | Measures how long it takes to load your model into memory. | A high initialization time can cause delays or freeze the user interface when your application starts up. |
| Prefill speed | Captures how fast the device processes prompt tokens to generate the first output token. | Dictates the initial delay before the user sees the first response. |
| Decode speed | Captures how fast the model generates tokens during a response. | Dictates the speed at which output is generated. |
| Peak memory | Monitors maximum RAM usage. | Flags potential “out of memory” crash risk, which is especially prevalent on memory-constrained devices. |
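
To make the table concrete, here is a minimal Python sketch of how prefill and decode speed could translate into user-facing delays. It assumes both speeds are reported in tokens per second and that latency scales linearly with token count, which is a simplification. The `BenchmarkResult` class, its field names, and the sample numbers are hypothetical illustrations, not Portal's output schema or real measurements.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """One benchmark run. Field names are hypothetical, not Portal's schema."""
    init_time_s: float            # time to load the model into memory
    prefill_tokens_per_s: float   # prompt tokens processed per second
    decode_tokens_per_s: float    # output tokens generated per second
    peak_memory_mb: float         # maximum RAM used during the run


def time_to_first_token_s(result: BenchmarkResult, prompt_tokens: int) -> float:
    """Approximate delay before the first output token appears."""
    return prompt_tokens / result.prefill_tokens_per_s


def response_time_s(result: BenchmarkResult, prompt_tokens: int,
                    output_tokens: int) -> float:
    """Approximate time for a full response, excluding model loading."""
    decode_s = output_tokens / result.decode_tokens_per_s
    return time_to_first_token_s(result, prompt_tokens) + decode_s


# Illustrative numbers only, not measurements from any real device.
example = BenchmarkResult(init_time_s=3.0, prefill_tokens_per_s=500.0,
                          decode_tokens_per_s=20.0, peak_memory_mb=1500.0)
print(f"{time_to_first_token_s(example, 1000):.1f} s")  # 2.0 s
print(f"{response_time_s(example, 1000, 200):.1f} s")   # 12.0 s
```

Under these assumptions, a 1,000-token prompt waits about 2 seconds for the first token, and a 200-token answer takes about 12 seconds in total, so decode speed dominates for long responses.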
With these insights, you can confidently decide which devices are ready to host your model and adjust or better optimize your LLMs for device targeting before shipping.
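One way to act on these metrics is a simple readiness check per device. The sketch below is only an illustration under stated assumptions: the `DeviceMetrics` class, the device names, the sample values, and the default thresholds are all invented for the example and do not reflect Portal's export format or any recommended limits.

```python
from dataclasses import dataclass


@dataclass
class DeviceMetrics:
    """Per-device benchmark summary. Hypothetical structure for illustration."""
    device: str
    init_time_s: float
    decode_tokens_per_s: float
    peak_memory_mb: float
    device_ram_mb: float


def is_ready(m: DeviceMetrics, max_init_s: float = 5.0,
             min_decode_tps: float = 10.0,
             max_memory_fraction: float = 0.5) -> bool:
    """Return True if the device meets every (arbitrary) threshold."""
    return (
        m.init_time_s <= max_init_s
        and m.decode_tokens_per_s >= min_decode_tps
        and m.peak_memory_mb <= max_memory_fraction * m.device_ram_mb
    )


def ready_devices(results: list[DeviceMetrics], **thresholds: float) -> list[str]:
    """Names of devices that pass the readiness check, in input order."""
    return [m.device for m in results if is_ready(m, **thresholds)]


# Made-up sample data; device names and values are placeholders.
SAMPLE = [
    DeviceMetrics("device-a", 2.1, 18.0, 1800.0, 8192.0),
    DeviceMetrics("device-b", 7.5, 14.0, 1800.0, 8192.0),  # slow to initialize
    DeviceMetrics("device-c", 3.0, 12.0, 2600.0, 4096.0),  # uses too much RAM
]

print(ready_devices(SAMPLE))  # ['device-a']
```

The thresholds are keyword arguments so that each team can substitute its own latency and memory budgets, for example `ready_devices(SAMPLE, max_init_s=8.0)`.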
Benchmarking is only useful if you can fix the discovered performance issues. When an LLM performs poorly, finding the root cause within the complex graph of multiple layers and thousands of nodes is a daunting task for developers, involving tedious and time-consuming searching that can take hours if not days.
To bridge this gap, we have added the ability to visualize and compare model graphs in Portal with ease. Through the natively integrated Model Explorer, our graph visualization tool, you can search for and locate specific nodes, compare models side by side in the same tab, view tensor shapes, trace inputs and outputs, and more. To further speed up debugging for teams, we also added the ability to take screenshots and share specific views directly with your collaborators in Google Cloud.
These visualizations are one of the most effective ways to identify targets for optimization.
With Model Explorer, you can view model graphs, search for specific layers, and compare models side-by-side to debug performance.
With the era of on-device LLMs here, we are excited to help close the critical gap in benchmarking and bring the power of AI to the thousands of types of smartphones on the market today. To use these latest features, please complete our sign-up form to express interest.
Google AI Edge Portal is currently available in private preview for allowlisted Google Cloud customers. During this private preview period, access is provided at no charge, subject to the preview terms. All current allowlisted customers will receive access to these new features automatically.
We can’t wait to see what gen AI capabilities you are able to deploy across the full spectrum of devices with Google AI Edge Portal!
Thank you to the team members and collaborators whose contributions made the advancements in this release possible: Akshat Sharma, Ami Kubota, Charlie Xu, Chunlei Niu, Cormac Brick, Derek Bekebrede, Eric Yang, Jing Jin, Kathleen Low, Matthias Grundmann, Marissa Ikonomidis, Na Li, Ram Iyengar, Sachin Kotwani, Sommayah Soliman, Tenghui Zhu, Xiaoming Hu, and Zi Yuan.