Loading…
Monday, April 8 • 4:50pm - 5:20pm
Automated GPU Kernel Fusion with XLA

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

XLA (Accelerated Linear Algebra) is an optimizing compiler for linear algebra that accelerates TensorFlow computations. The XLA compiler lowers to LLVM IR and relies on LLVM for low-level optimization and code generation. XLA achieves significant performance gains on TensorFlow models. We observed speedups of up to 3x on internal models. The popular image classification model ResNet-50 trains 1.6x faster.

A key optimization performed by XLA is automated GPU kernel fusion. The idea is to combine multiple linear algebra operators into a single GPU kernel to reduce memory bandwidth requirements and kernel launch overhead. TensorFlow with XLA demonstrated competitive performance on MLPerf benchmarks (mlperf.org) compared to ML frameworks that rely on manually fused, hand-tuned GPU kernels.

Speakers

Monday April 8, 2019 4:50pm - 5:20pm CEST
Theatre