Current artificial intelligence models rely on billions of trainable parameters to tackle challenging tasks. However, this large parameter count comes with a hefty cost: training and deploying such models demands immense memory and compute, which can only be provided by hangar-sized data centers in processes that consume as much energy as a midsized city.
Scaling the capacity of language models has consistently proven to be a reliable approach for improving performance and unlocking new capabilities. Capacity is defined primarily along two dimensions: the number of model parameters and the compute per example. While scaling typically involves increasing both, the precise interplay between these…
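As a rough illustration of these two dimensions (not taken from the passage above), a common rule of thumb is that a dense Transformer spends roughly 2N FLOPs per token on a forward pass and roughly 6N per token during training, where N is the parameter count. The sketch below assumes those rule-of-thumb constants and shows how, for a dense model, compute per example grows in lockstep with parameters.

```python
# Minimal sketch, assuming the common approximations
# FLOPs_forward ~= 2 * N and FLOPs_train ~= 6 * N per token,
# where N is the (non-embedding) parameter count of a dense Transformer.

def flops_per_token(num_params: int, training: bool = False) -> float:
    """Rough FLOPs spent processing a single token."""
    return (6 if training else 2) * num_params

# For a dense model, compute per example scales directly with parameter count:
for n in (1e9, 10e9, 70e9):
    gflops = flops_per_token(int(n)) / 1e9
    print(f"{n / 1e9:>4.0f}B params -> ~{gflops:.0f} GFLOPs per token (forward)")
```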
Recent works have shown a surprising result: a small fraction of Large Language Model (LLM) parameters, the outliers, are disproportionately important to model quality. Because LLMs contain billions of parameters, even a fraction as small as 0.01% translates to hundreds of thousands of parameters. In this work, we present…
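To make the arithmetic concrete, 0.01% of a 7-billion-parameter model is 700,000 parameters. The sketch below is a hypothetical illustration of selecting such a fraction by weight magnitude; the works referenced above may identify outliers by other criteria (for example, activation statistics), so the toy model and the magnitude criterion here are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: mark the top 0.01% of weights by absolute magnitude
# as candidate "outlier" parameters. The model and the criterion are
# illustrative assumptions, not the method of any particular paper.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

all_weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
total = all_weights.numel()
k = max(1, int(total * 1e-4))                      # 0.01% of all parameters
threshold = torch.topk(all_weights, k).values.min()

print(f"{total:,} parameters; keeping the {k:,} largest-magnitude ones")
print(f"magnitude threshold: {threshold.item():.4f}")
```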
The “weights” of a neural network are referred to as “parameters” in PyTorch code, and they are tuned by the optimizer during training. In contrast, hyperparameters are the settings of a neural network that are fixed by design and not tuned by training. Examples are the number of hidden layers and…
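A minimal PyTorch sketch of this distinction follows; the layer sizes and learning rate are illustrative choices, not values from the text. The tensors returned by model.parameters() are the trainable weights and biases the optimizer updates at every step, while the hyperparameters are chosen by hand and never changed by training.

```python
import torch
import torch.nn as nn

hidden_layers = 2          # hyperparameter: fixed by design, not trained
hidden_size = 128          # hyperparameter
learning_rate = 1e-3       # hyperparameter

# Build a small feed-forward network from the hyperparameters above.
layers = [nn.Linear(784, hidden_size), nn.ReLU()]
for _ in range(hidden_layers - 1):
    layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
layers.append(nn.Linear(hidden_size, 10))
model = nn.Sequential(*layers)

# The trainable parameters (weights and biases) are registered on the module
# and handed to the optimizer, which updates them during training.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
print(sum(p.numel() for p in model.parameters()), "trainable parameters")
```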