Optimizing DLRM by using PyTorch with oneCCL Backend

Introduction

Intel® oneAPI Collective Communications Library

  • Built on top of lower-level communication middleware: MPI and libfabric transparently support many interconnects, such as Intel® Omni-Path Architecture, InfiniBand*, and Ethernet.
  • Optimized for high performance on Intel® CPUs and GPUs.
  • Allows the tradeoff of compute for communication performance to drive scalability of communication patterns.
  • Enables efficient implementations of collectives that are heavily used for neural network training, including all-gather, all-reduce, and reduce-scatter.
Fig. 1 Software stack for PyTorch DistributedDataParallel. CCL is one of the communication backend options.
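
To make the collectives named above concrete, here is a minimal pure-Python simulation of what all-reduce, reduce-scatter, and all-gather compute across ranks. It is only a sketch of the semantics: the function names and the list-of-lists representation (one inner list per rank) are assumptions for illustration, not the oneCCL or PyTorch API, and it assumes the buffer length is divisible by the number of ranks.

```python
def all_reduce(rank_buffers):
    """Every rank ends up with the element-wise sum of all ranks' buffers."""
    total = [sum(vals) for vals in zip(*rank_buffers)]
    return [list(total) for _ in rank_buffers]


def reduce_scatter(rank_buffers):
    """Rank r ends up with only the r-th chunk of the element-wise sum."""
    world = len(rank_buffers)
    total = [sum(vals) for vals in zip(*rank_buffers)]
    chunk = len(total) // world  # assumes len(total) % world == 0
    return [total[r * chunk:(r + 1) * chunk] for r in range(world)]


def all_gather(rank_chunks):
    """Every rank ends up with the concatenation of all ranks' chunks."""
    gathered = [x for chunk in rank_chunks for x in chunk]
    return [list(gathered) for _ in rank_chunks]


# Two simulated ranks, each holding a 4-element gradient buffer.
buffers = [[1, 2, 3, 4], [10, 20, 30, 40]]
reduced = all_reduce(buffers)
# A reduce-scatter followed by an all-gather yields the same result.
composed = all_gather(reduce_scatter(buffers))
```

In this simulation, a reduce-scatter followed by an all-gather produces the same result as a single all-reduce, which is one common way an all-reduce is decomposed.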

DLRM: a new era of deep learning workloads from Facebook

Fig. 2 Schematic of the DLRM topology.

Multi-Socket and Multi-Node DLRM

Multi-Socket / Multi-Node DLRM results and the related performance benefit from oneCCL

  • The Small variant is identical to the model problem used in DLRM’s release paper [2].
  • The Large variant is the Small problem scaled up in every aspect for scale-out runs, and it is the best representative of production workloads in terms of actual compute and memory capacity requirements.
  • The MLPerf configuration was recently proposed as a benchmark configuration for evaluating the performance of recommendation system training [3].
Fig. 4 DLRM strong scaling performance comparison.
Fig. 5 DLRM weak scaling performance comparison.
Fig. 6 Compute-communication time breakdown for the Large config.
Fig. 7 Compute-communication time breakdown for the MLPerf config.
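
For readers comparing the strong-scaling and weak-scaling figures, the two notions of efficiency can be sketched as below. The function names are hypothetical and the timings are made-up placeholders, not values read from the figures. Strong scaling keeps the total problem size fixed as ranks are added. Weak scaling keeps the per-rank problem size fixed.

```python
def strong_scaling_efficiency(t_base, t_n, n_base, n):
    """Fixed total problem size: achieved speedup divided by ideal speedup."""
    speedup = t_base / t_n
    ideal = n / n_base
    return speedup / ideal


def weak_scaling_efficiency(t_base, t_n):
    """Fixed per-rank problem size: ideal is a constant time per iteration."""
    return t_base / t_n


# Placeholder timings in ms per iteration (illustrative only).
strong_eff = strong_scaling_efficiency(t_base=100.0, t_n=30.0, n_base=1, n=4)
weak_eff = weak_scaling_efficiency(t_base=100.0, t_n=125.0)
print(f"strong scaling efficiency: {strong_eff:.2f}")
print(f"weak scaling efficiency:   {weak_eff:.2f}")
```

An efficiency of 1.0 is ideal in both cases. Communication time that does not shrink with the number of ranks is the usual reason the value falls below that.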

BFLOAT16 training supported by the oneCCL backend on Intel® Xeon® Scalable processors

Fig. 8 Split-SGD BF16 performance
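
The idea behind Split-SGD, as it is usually described, is to keep each FP32 weight as two 16-bit halves: the upper half is a valid BF16 number used for the forward and backward passes, and the lower half is kept alongside the optimizer so the update can still be applied in full FP32 precision. The following is a minimal pure-Python sketch of that bit-splitting under this assumption. The helper names are hypothetical and this is not the implementation benchmarked in Fig. 8.

```python
import struct


def split_fp32(x):
    """Split an FP32 value into (upper 16 bits, lower 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16, bits & 0xFFFF


def join_fp32(hi, lo):
    """Recombine the two 16-bit halves into the original FP32 value."""
    return struct.unpack("<f", struct.pack("<I", (hi << 16) | lo))[0]


def bf16_value(hi):
    """Value seen by the BF16 forward/backward pass (lower bits treated as zero)."""
    return join_fp32(hi, 0)


def split_sgd_step(hi, lo, grad, lr):
    """Hypothetical SGD step: update in FP32, then store the halves separately."""
    w = join_fp32(hi, lo)
    return split_fp32(w - lr * grad)


hi, lo = split_fp32(1.0)
hi, lo = split_sgd_step(hi, lo, grad=1.0, lr=2.0 ** -12)
full_precision_weight = join_fp32(hi, lo)  # retains the small update
compute_weight = bf16_value(hi)            # truncated BF16 view of the weight
```

Because the lower half is preserved between steps, small updates that a BF16-only weight would drop are still present in the recombined FP32 value.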

Conclusion

Reference
