Optimizing DLRM by using PyTorch with oneCCL Backend

Introduction

Intel® oneAPI Collective Communications Library (oneCCL)

  • Built on top of lower-level communication middleware (MPI and libfabric), which transparently supports many interconnects, such as Intel® Omni-Path Architecture, InfiniBand*, and Ethernet.
  • Optimized for high performance on Intel® CPUs and GPUs.
  • Allows trading some compute resources for communication performance, which improves the scalability of communication patterns.
  • Enables efficient implementations of collectives that are heavily used for neural network training, including all-gather, all-reduce, and reduce-scatter.
Fig. 1 Software stacks for PyTorch DistributedDataParallel. CCL is one of the communication backend options.
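
The collectives listed above have simple semantics that can get lost behind the transport layers. The following plain-Python simulation is a minimal sketch and does not call oneCCL. It assumes that every rank holds a buffer of the same length and that this length divides evenly by the number of ranks. It shows the common decomposition of a sum all-reduce into a reduce-scatter followed by an all-gather, which is the pattern ring all-reduce uses, for example. The function names are hypothetical.

```python
from typing import List

Buffer = List[float]


def reduce_scatter(buffers: List[Buffer]) -> List[Buffer]:
    """Rank r ends up with the element-wise sum of chunk r across all ranks."""
    world = len(buffers)
    n = len(buffers[0])
    assert n % world == 0, "sketch assumes the buffer splits evenly across ranks"
    chunk = n // world
    result = []
    for r in range(world):
        lo, hi = r * chunk, (r + 1) * chunk
        # Sum the r-th chunk contributed by every rank.
        result.append([sum(b[i] for b in buffers) for i in range(lo, hi)])
    return result


def all_gather(chunks: List[Buffer]) -> List[Buffer]:
    """Every rank receives the concatenation of all ranks' chunks."""
    full = [x for c in chunks for x in c]
    return [list(full) for _ in chunks]


def all_reduce(buffers: List[Buffer]) -> List[Buffer]:
    """Sum all-reduce expressed as reduce-scatter followed by all-gather."""
    return all_gather(reduce_scatter(buffers))


if __name__ == "__main__":
    # Three simulated ranks, each holding a local gradient of length 6.
    grads = [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
        [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
    ]
    print(all_reduce(grads)[0])  # [111.0, 222.0, 333.0, 444.0, 555.0, 666.0]
```

In a real job, each rank holds only its own buffer, and the chunk exchanges travel over the interconnect. Here all ranks live in one process, so the "communication" is just list indexing.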

DLRM: a new era of deep learning workloads from Facebook

Fig. 2 Schematic of the DLRM topology.
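
The sketch below is a rough illustration of the topology in Fig. 2. It assumes the dot-product feature interaction described in the DLRM paper. Each sparse feature is pooled from its embedding table, similar to PyTorch's `nn.EmbeddingBag` in sum mode. The pooled vectors and the bottom-MLP output are then combined by taking the dot product of every distinct pair. The MLPs are omitted, all vectors are assumed to share one dimension, and the helper names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def embedding_bag_sum(table: Sequence[Vector], indices: Sequence[int]) -> Vector:
    """Sum-pool the embedding-table rows selected by one sparse feature's indices."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for idx in indices:
        for k in range(dim):
            pooled[k] += table[idx][k]
    return pooled


def dot_interaction(dense: Vector, embeddings: List[Vector]) -> Vector:
    """Dot product of every distinct pair of feature vectors (each pair once),
    concatenated with the bottom-MLP output to form the top-MLP input."""
    feats = [dense] + embeddings
    pairs = []
    for i in range(len(feats)):
        for j in range(i):
            pairs.append(sum(a * b for a, b in zip(feats[i], feats[j])))
    return dense + pairs


if __name__ == "__main__":
    dense = [1.0, 2.0]  # stand-in for the bottom-MLP output
    emb = [[3.0, 4.0], [5.0, 6.0]]  # stand-ins for pooled embeddings
    print(dot_interaction(dense, emb))  # [1.0, 2.0, 11.0, 17.0, 39.0]
```

With F feature vectors of dimension d, the top MLP receives d + F(F-1)/2 values. The interaction output therefore grows quadratically with the number of sparse features.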

Multi-Socket and Multi-Node DLRM

Multi-socket/multi-node DLRM results and the related performance benefits of oneCCL

  • The Small variant is identical to the model problem used in DLRM’s release paper [2].
  • The Large variant is the Small problem scaled up in every aspect for scale-out runs. It best represents production workloads in terms of actual compute and memory-capacity requirements.
  • The MLPerf configuration was recently proposed as a benchmark configuration for evaluating the performance of recommendation-system training [3].
Fig. 4 DLRM strong scaling performance comparison.
Fig. 5 DLRM weak scaling performance comparison.
Fig. 6 Compute-communication time breakdown for the Large configuration
Fig. 7 Compute-communication time breakdown for the MLPerf configuration
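
Figs. 4 to 7 are usually read with the standard scaling definitions below. This is a sketch based on textbook formulas, not the method used to produce the figures. Strong scaling keeps the global problem fixed, so the ideal time on N ranks is T1/N. Weak scaling keeps the per-rank problem fixed, so the ideal time stays at T1. The communication share assumes that compute and communication are reported as non-overlapping parts of an iteration. The function names and sample timings are hypothetical.

```python
def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Fixed global problem size: the ideal time on n ranks is t1 / n."""
    return t1 / (n * tn)


def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Fixed per-rank problem size: the ideal time on n ranks equals t1."""
    return t1 / tn


def comm_fraction(t_compute: float, t_comm: float) -> float:
    """Share of an iteration spent in communication, assuming no overlap."""
    return t_comm / (t_compute + t_comm)


if __name__ == "__main__":
    # Hypothetical per-iteration timings in milliseconds, not values from the figures.
    print(f"strong: {strong_scaling_efficiency(100.0, 15.0, 8):.2f}")  # 0.83
    print(f"weak:   {weak_scaling_efficiency(100.0, 125.0):.2f}")  # 0.80
    print(f"comm:   {comm_fraction(75.0, 25.0):.2f}")  # 0.25
```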

BFLOAT16 training supported by the oneCCL backend on Intel® Xeon® Scalable processors

Fig. 8 Split-SGD BF16 performance
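
This sketch assumes that Split-SGD works as described in Intel's work on DLRM training on CPUs. Under that assumption, each FP32 master weight is stored as two 16-bit halves. The upper half is exactly a BF16 value (the FP32 weight truncated) and serves as the model weight. The lower half is kept only for the optimizer, so the SGD update can be applied at FP32 precision. The pure-Python code below uses `struct` for bit reinterpretation to illustrate the idea. It is not the PyTorch or oneCCL implementation, and the helper names are hypothetical.

```python
import struct
from typing import Tuple


def f32_to_bits(x: float) -> int:
    """IEEE-754 binary32 bit pattern of x (rounded to FP32 first)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def split(w: float) -> Tuple[int, int]:
    """Split an FP32 weight into (hi, lo) 16-bit halves; hi is a BF16 bit pattern."""
    bits = f32_to_bits(w)
    return bits >> 16, bits & 0xFFFF


def merge(hi: int, lo: int) -> float:
    """Reassemble the full FP32 master weight from its two halves."""
    return bits_to_f32((hi << 16) | lo)


def bf16_value(hi: int) -> float:
    """The value the BF16 model sees: the hi half with the low bits zeroed."""
    return bits_to_f32(hi << 16)


def split_sgd_step(hi: int, lo: int, grad: float, lr: float) -> Tuple[int, int]:
    """One SGD step on the recombined weight. The update is computed in Python's
    double precision and then rounded to FP32 by split()."""
    w = merge(hi, lo)
    return split(w - lr * grad)
```

Compared with keeping a separate FP32 master copy next to the BF16 weights, this layout adds only 2 bytes per weight. Updates too small for BF16 alone to represent still accumulate in the lower half.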

Conclusion

Reference

PyTorch is an open source machine learning platform that provides a seamless path from research prototyping to production deployment.
