14 & 16 Sept - 10am CET
NVIDIA A100 GPU Boot Camp: Use Cases and Best Practices
Since its debut in 2020, the NVIDIA A100 Tensor Core GPU has quickly established its place in the AI development world as the most versatile, flexible and powerful GPU technology on the market. Featuring Multi-Instance GPU (MIG) partitioning, TensorFloat-32 (TF32) and BFloat16 (BF16)/FP32 mixed-precision Tensor Core operations, and many other performance enhancements, NVIDIA A100 defines a whole new category of GPU compute and scalability.
Join the webinar and learn about NVIDIA A100 use cases and best practices from NVIDIA GPU experts.
Receive technical consultation, hardware sizing suggestions and software stack optimization tips from S3S GPU server experts throughout your proof-of-concept experience with NVIDIA A100.
Get in touch with S3S to set up your test environment.
Whether using MIG to partition an A100 GPU into smaller instances or NVLink to connect multiple GPUs and speed up large-scale workloads, A100 can readily handle acceleration needs of any size, from the smallest job to the biggest multi-node workload. A100’s versatility means IT managers can maximize the utility of every GPU in their data center, around the clock.
NVIDIA A100 delivers 312 teraFLOPS (TFLOPS) of deep learning performance. That’s 20X the Tensor floating-point operations per second (FLOPS) for deep learning training and 20X the Tensor tera operations per second (TOPS) for deep learning inference compared to NVIDIA Volta GPUs.
NVIDIA NVLink in A100 delivers 2X higher throughput compared to the previous generation. When combined with NVIDIA NVSwitch™, up to 16 A100 GPUs can be interconnected at up to 600 gigabytes per second (GB/sec), unleashing the highest application performance possible on a single server. NVLink is available in A100 SXM GPUs via HGX A100 server boards and in PCIe GPUs via an NVLink Bridge for up to 2 GPUs.
An A100 GPU can be partitioned into as many as seven GPU instances, fully isolated at the hardware level with their own high-bandwidth memory, cache, and compute cores. MIG gives developers access to breakthrough acceleration for all their applications, and IT administrators can offer right-sized GPU acceleration for every job, optimizing utilization and expanding access to every user and application.
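As a rough illustration of right-sizing, the sketch below packs jobs onto GPUs that each offer seven slices, using a greedy first-fit rule. It is a simplified model, not the MIG API: the function name `pack_jobs`, the job names, and the idea of requesting a plain slice count are assumptions made for illustration, and real MIG profiles come with placement rules that this sketch ignores.

```python
# Simplified capacity model: each A100 offers at most seven GPU instances
# (figure from the text). Real MIG profiles and placement rules are ignored.
MAX_SLICES = 7


def pack_jobs(requests):
    """Greedy first-fit packing of (job_name, slices) requests onto GPUs.

    Returns a list of GPUs, each a list of the (job_name, slices) placed on it.
    """
    gpus = []  # jobs placed on each GPU
    free = []  # free slices remaining on each GPU
    for name, slices in requests:
        if not 1 <= slices <= MAX_SLICES:
            raise ValueError(f"{name}: slice request must be 1..{MAX_SLICES}")
        for i, remaining in enumerate(free):
            if slices <= remaining:
                gpus[i].append((name, slices))
                free[i] -= slices
                break
        else:
            # No existing GPU has room, so bring another one into use.
            gpus.append([(name, slices)])
            free.append(MAX_SLICES - slices)
    return gpus


if __name__ == "__main__":
    # Hypothetical job mix, chosen only to exercise the packing rule.
    jobs = [("train-a", 3), ("infer-b", 4), ("notebook-c", 2),
            ("train-d", 7), ("infer-e", 1)]
    for index, placed in enumerate(pack_jobs(jobs)):
        print(f"GPU {index}: {placed}")
```

With this hypothetical job mix, five jobs fit on three GPUs instead of occupying five, which is the utilization argument the paragraph above makes.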
With up to 80 gigabytes of HBM2e, A100 delivers the world’s fastest GPU memory bandwidth of over 2 TB/s, as well as a dynamic random-access memory (DRAM) utilization efficiency of 95%. A100 delivers 1.7X higher memory bandwidth over the previous generation.
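To put these figures in perspective, here is a back-of-the-envelope calculation using only the numbers quoted above. It assumes decimal units (1 TB = 1000 GB) and treats "over 2 TB/s" as exactly 2 TB/s, so the results are approximations rather than measured values; the function names are illustrative.

```python
# Figures quoted in the text; "over 2 TB/s" is rounded down to 2.0 here.
HBM_CAPACITY_GB = 80
BANDWIDTH_TB_S = 2.0
GENERATIONAL_SPEEDUP = 1.7


def sweep_time_ms(capacity_gb, bandwidth_tb_s):
    """Time in milliseconds to read the whole memory once at peak bandwidth."""
    bandwidth_gb_s = bandwidth_tb_s * 1000  # decimal units: 1 TB = 1000 GB
    return capacity_gb / bandwidth_gb_s * 1000


def implied_previous_bandwidth(bandwidth_tb_s, speedup):
    """Previous-generation bandwidth implied by the quoted speedup factor."""
    return bandwidth_tb_s / speedup


if __name__ == "__main__":
    print(f"{sweep_time_ms(HBM_CAPACITY_GB, BANDWIDTH_TB_S):.0f} ms per full sweep")
    print(f"{implied_previous_bandwidth(BANDWIDTH_TB_S, GENERATIONAL_SPEEDUP):.2f} TB/s implied")
```

Under these assumptions, a full pass over 80 GB takes about 40 ms, and the 1.7X claim implies a previous-generation figure of roughly 1.18 TB/s.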
AI networks have millions to billions of parameters. Not all of these parameters are needed for accurate predictions, and some can be converted to zeros, making the models “sparse” without compromising accuracy. Tensor Cores in A100 can provide up to 2X higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also improve the performance of model training.
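The text above does not name the sparsity pattern. A100's sparse Tensor Cores work on 2:4 fine-grained structured sparsity, where at most two values in every group of four are nonzero. The sketch below shows what such a pattern looks like on a flat list of weights. Choosing which values to keep by magnitude is an assumption for illustration, and `prune_2_of_4` is a hypothetical helper, not an NVIDIA API.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four weights.

    Illustrative only: magnitude-based selection is an assumed pruning rule.
    """
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


if __name__ == "__main__":
    # Hypothetical weights, chosen only to show the pattern.
    dense = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
    print(prune_2_of_4(dense))
    # → [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because exactly half of each group is zero, the hardware can skip those positions, which is where the quoted "up to 2X" figure comes from. In practice a pruned model is usually fine-tuned afterwards to recover accuracy.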