Name: Automated GPU Kernel Fusion with XLA
Start: 2019-04-08T16:50:00+0200
End: 2019-04-08T17:20:00+0200

Automated GPU Kernel Fusion with XLA

XLA (Accelerated Linear Algebra) is an optimizing compiler for linear algebra that accelerates TensorFlow computations. The XLA compiler lowers to LLVM IR and relies on LLVM for low-level optimization and code generation. XLA achieves significant performance gains on TensorFlow models: we observed speedups of up to 3x on internal models, and the popular image classification model ResNet-50 trains 1.6x faster.

A key optimization performed by XLA is automated GPU kernel fusion. The idea is to combine multiple linear algebra operators into a single GPU kernel to reduce memory bandwidth requirements and kernel launch overhead. TensorFlow with XLA demonstrated competitive performance on MLPerf benchmarks (mlperf.org) compared to ML frameworks that rely on manually fused, hand-tuned GPU kernels.
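
The fusion idea described above can be illustrated with a small conceptual sketch. It is not XLA code and does not reflect how XLA decides what to fuse. Plain Python lists stand in for GPU buffers, each list comprehension stands in for one kernel launch, and the function names (`run_unfused`, `run_fused`) and the example expression `relu(x * y + 1.0)` are hypothetical, chosen only for illustration. The sketch counts how many elements each variant reads and writes, as a rough proxy for memory bandwidth.

```python
# Conceptual sketch only: Python lists stand in for GPU buffers and each
# list comprehension stands in for one GPU kernel launch. This is not XLA code.

def run_unfused(x, y):
    """Compute relu(x * y + 1.0) as three separate elementwise 'kernels'."""
    n = len(x)
    prod = [a * b for a, b in zip(x, y)]    # kernel 1: reads 2n, writes n
    shifted = [p + 1.0 for p in prod]       # kernel 2: reads n, writes n
    out = [max(s, 0.0) for s in shifted]    # kernel 3: reads n, writes n
    launches = 3
    elements_moved = 3 * n + 2 * n + 2 * n  # total reads + writes
    return out, launches, elements_moved


def run_fused(x, y):
    """Compute the same expression in a single 'kernel' with no temporaries."""
    n = len(x)
    # One pass: intermediate values stay in local variables ("registers")
    # instead of being written to and re-read from memory.
    out = [max(a * b + 1.0, 0.0) for a, b in zip(x, y)]  # reads 2n, writes n
    launches = 1
    elements_moved = 3 * n
    return out, launches, elements_moved


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0]
    y = [2.0, 2.0, -1.0]
    print(run_unfused(x, y))
    print(run_fused(x, y))
```

Both variants produce the same result, but in this toy model the fused version moves 3n elements in one launch where the unfused version moves 7n elements across three launches. Savings on a real GPU depend on the operators involved and on the hardware.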

Speakers

Thomas Joerg

Google

Monday April 8, 2019 4:50pm - 5:20pm CEST
Theatre

Technical Talk

2019 EuroLLVM Developers' Meeting
