Improve training time of distributed machine learning with NCCL Fast Socket
Large machine learning (ML) models, such as large language models, generative AI models, and vision models, are dramatically increasing their number of trainable parameters and achieving state-of-the-art results. As the parameter count grows, a model becomes too large to fit on a single VM instance and therefore demands distributed compute to spread …