Train Your Large Model on Multiple GPUs with Fully Sharded Data Parallelism

This article is divided into five parts; they are: • Introduction to Fully Sharded Data Parallel • Preparing Model for FSDP Training • Training Loop with FSDP • Fine-Tuning FSDP Behavior • Checkpointing FSDP Models Sharding is a term originally used in database management systems, where it refers to dividing a database into smaller units, …

Tools for this?

What tools are used for these type of videos?I was thinking face fusion or some kind of face swap tool in stable diffusion.Could anybody help me? submitted by /u/vasthebus [link] [comments]