2019 EuroLLVM Developers' Meeting: Full Schedule

9:00am CEST

Welcome

Welcome to EuroLLVM 2019!

Speakers

Arnaud de Grandmaison

LLVM Foundation

Monday April 8, 2019 9:00am - 9:15am CEST
Theatre

Opening/Closing

9:15am CEST

MLIR: Multi-Level Intermediate Representation for Compiler Infrastructure

This talk will give an overview of Multi-Level Intermediate Representation (MLIR), a new intermediate representation designed to be unified, flexible, extensible and language-agnostic, and to serve as the base of a compiler infrastructure. MLIR shares similarities with traditional CFG-based three-address SSA representations (including LLVM IR or SIL), but it also introduces notions from the polyhedral domain as first-class concepts. The notion of dialects is a core concept of MLIR extensibility, allowing multiple levels in a single representation. MLIR supports the continuous lowering from dataflow graphs to high-performance target-specific code through partial specialization between dialects. We will illustrate in this talk how MLIR can be used to build an optimizing compiler infrastructure for deep learning applications.

MLIR supports multiple front- and back-ends and uses LLVM IR as one of its primary code generation targets. MLIR also relies heavily on design principles and practices developed by the LLVM community. For example, it depends on LLVM APIs and programming idioms to minimize IR size and maximize optimization efficiency. MLIR uses LLVM testing utilities such as FileCheck to ensure robust functionality at every level of the compilation stack, TableGen to express IR invariants, and it leverages LLVM infrastructure such as dominance analysis to avoid implementing all the necessary compiler functionalities from scratch. At the same time, it is a brand new IR, both more restrictive and more general than LLVM IR in different aspects of its design. We believe that the LLVM community will find in MLIR a useful tool for developing new compilers, especially in machine learning and other high-performance domains.

Speakers

Chris Lattner

Google

Tatiana Shpeisman

Google

Monday April 8, 2019 9:15am - 10:00am CEST
Theatre

Keynote

10:05am CEST

Safely Optimizing Casts between Pointers and Integers

In this talk, we will present a list of optimizations that soundly remove casts between pointers and integers. In LLVM, a pointer is more than just an integer: LLVM allows a pointer to track its underlying object, and the rule for finding that object is defined by the based-on relation. This allows LLVM to aggressively optimize loads and stores, but it makes the meaning of pointer-integer casts complicated. The result is a conflict between existing optimizations, causing long-standing miscompilation bugs such as bug 34548.

To fix this, we suggest disabling the folding of inttoptr(ptrtoint(p)) to p and using a safe workaround to remove such pairs instead. The folding is important because it removes a significant portion of these cast pairs. We will show that even with the folding disabled, the majority of casts can be removed by carefully adding new optimizations and modifying existing ones. After these updates, performance remains comparable to the original LLVM.
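For readers less familiar with the IR-level pattern, here is a minimal C++ sketch of where such a cast pair typically comes from (the function name round_trip is hypothetical): a pointer converted to an integer and back. The address is preserved, so folding the pair back to p looks harmless; the subtlety described above is that the folded pointer also inherits p's based-on object.

```cpp
#include <cassert>
#include <cstdint>

// Round trip through an integer type. At the IR level this typically becomes
// a ptrtoint followed by an inttoptr. The address survives unchanged; the
// open question is which object the resulting pointer is "based on".
int* round_trip(int* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);  // ptrtoint
    return reinterpret_cast<int*>(bits);                        // inttoptr
}
```

The C++ standard guarantees that the resulting pointer compares equal to p; what an optimizer may assume about it (aliasing, underlying object) is the part the talk addresses.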

Speakers

Juneyoung Lee

Seoul National University

Monday April 8, 2019 10:05am - 10:30am CEST
Theatre

Student Research Competition

10:30am CEST

An alternative OpenMP Backend for Polly

LLVM’s polyhedral optimization framework Polly can automatically exploit thread-level parallelism through OpenMP. Currently, the user can only influence the number of utilized threads, while other OpenMP parameters such as the scheduling type and chunk size are set to fixed values. This, in turn, limits the user’s ability to adapt the optimization process to a given problem.
In this work, we present an alternative OpenMP backend for Polly, which is based on the LLVM OpenMP runtime and provides additional customization options to the user. We evaluate our new backend and the influence of the new customization options on performance, and compare it with Polly's existing OpenMP backend.
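To make the two fixed parameters concrete: under OpenMP's static schedule with a chunk size, iterations are grouped into chunks of consecutive iterations that are dealt out to threads round-robin, in thread-number order. The C++ sketch below models only that rule; the helper names are hypothetical, and it is not Polly's generated code or its runtime interface.

```cpp
#include <cassert>
#include <vector>

// Thread that executes iteration `iter` under schedule(static, chunk):
// chunks of `chunk` consecutive iterations go to threads round-robin.
int static_owner(int iter, int chunk, int nthreads) {
    assert(chunk > 0 && nthreads > 0);
    return (iter / chunk) % nthreads;
}

// Number of iterations each thread receives for a loop of n iterations.
std::vector<int> iterations_per_thread(int n, int chunk, int nthreads) {
    std::vector<int> counts(nthreads, 0);
    for (int i = 0; i < n; ++i)
        ++counts[static_owner(i, chunk, nthreads)];
    return counts;
}
```

With 10 iterations and 3 threads, a chunk size of 2 gives the threads 4, 4 and 2 iterations, while a chunk size of 1 gives 4, 3 and 3. Which split balances the load best depends on the loop, which is one reason to expose these parameters to the user.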

Speakers

Michael Halkenhäuser

Student, TU Darmstadt

Monday April 8, 2019 10:30am - 10:55am CEST
Theatre

Student Research Competition

10:55am CEST

Implementing SPMD control flow in LLVM using reconverging CFGs

Compiling programs for an SPMD execution model, e.g. for GPUs or for whole-program vectorization on CPUs, requires transforming the thread-level input program into a vectorized wave-level program in which the values of the original threads are stored in corresponding lanes of vectors. The main challenge of this transform is handling divergent control flow, where threads take different paths through the original CFG. A common approach, which is currently taken by the AMDGPU backend in LLVM, is to first structurize the program as a simplification for subsequent steps.

However, structurization is overly conservative. It can be avoided when control flow is uniform, i.e. not divergent. Even where control flow is divergent, structurization is often unnecessary. Moreover, LLVM's StructurizeCFG pass relies on region analysis, which limits the extent to which it can be evolved.

We propose a new approach to SPMD vectorization based on the notion of a reconverging CFG: a CFG is reconverging if, for every divergent branch, one of the successors post-dominates the branch. This property is weaker than structuredness, and we show that it can be achieved while preserving uniform branches and inserting fewer new basic blocks than structurization requires. It is also sufficient for code generation, because it guarantees that threads which "leave" a wave at divergent branches will be able to rejoin it later.
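The property can be checked directly from post-dominator sets. The C++ sketch below is a simplified illustration, not the implementation from the talk: it assumes a CFG with a single exit block that every block can reach, and it takes the set of divergent branches as an input instead of computing it with a divergence analysis. Function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // successor lists, nodes 0..n-1

// pd[v][m] is true iff m post-dominates v. Assumes exactly one exit node
// (the only node without successors), reachable from every node.
std::vector<std::vector<bool>> post_dominators(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<std::vector<bool>> pd(n, std::vector<bool>(n, true));
    for (int v = 0; v < n; ++v)
        if (g[v].empty()) {          // the exit is post-dominated only by itself
            pd[v].assign(n, false);
            pd[v][v] = true;
        }
    bool changed = true;
    while (changed) {                // iterate to the greatest fixed point
        changed = false;
        for (int v = 0; v < n; ++v) {
            if (g[v].empty()) continue;
            std::vector<bool> next(n, true);
            for (int s : g[v])       // intersect the sets of all successors
                for (int m = 0; m < n; ++m)
                    next[m] = next[m] && pd[s][m];
            next[v] = true;          // every node post-dominates itself
            if (next != pd[v]) { pd[v] = next; changed = true; }
        }
    }
    return pd;
}

// Reconverging: every divergent branch has a successor that post-dominates it.
bool is_reconverging(const Graph& g, const std::vector<bool>& divergent) {
    const auto pd = post_dominators(g);
    for (int v = 0; v < static_cast<int>(g.size()); ++v) {
        if (!divergent[v] || g[v].size() < 2) continue;
        bool ok = false;
        for (int s : g[v]) ok = ok || pd[v][s];
        if (!ok) return false;
    }
    return true;
}
```

In an if/else diamond neither successor post-dominates the branch, so a divergent diamond is not reconverging, while the same diamond with a uniform branch is left alone. An if-then triangle, whose join block is a direct successor of the branch, already satisfies the property.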

Speakers

Fabian Wahlster

Technical University Munich

Monday April 8, 2019 10:55am - 11:20am CEST
Theatre

Student Research Competition

11:40am CEST

Function Merging by Sequence Alignment

Resource-constrained devices for embedded systems are becoming increasingly important. In such systems, memory is highly restrictive, making code size in most cases even more important than performance. Compared to more traditional platforms, memory is a larger part of the cost and code occupies much of it. Despite that, compilers make little effort to reduce code size. One key technique attempts to merge the bodies of similar functions. However, production compilers only apply this optimization to identical functions, while research compilers improve on that by merging the few functions with identical control-flow graphs and signatures. Overall, existing solutions are insufficient and we end up having to either increase cost by adding more memory or remove functionality from programs.

We introduce a novel technique that can merge arbitrary functions through sequence alignment, a bioinformatics algorithm for identifying regions of similarity between sequences. We combine this technique with an intelligent exploration mechanism to direct the search towards the most promising function pairs. Our approach is more than 2.4x better than the state of the art, reducing code size by up to 25%, with an overall average of 6%, while introducing an average compilation-time overhead of only 15%. When aided by profiling information, this optimization can be deployed without any significant impact on the performance of the generated code.
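The core idea can be illustrated with a textbook global alignment (Needleman-Wunsch) over opcode sequences. This C++ sketch is a simplification, not the authors' implementation: real instructions would be compared for full equivalence (types, operands), and the scores (+1 for identical opcodes, -1 per gap, non-identical opcodes never paired) are assumptions chosen for illustration. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Globally aligns two instruction sequences (given as opcode names) and
// returns how many instruction pairs end up aligned, i.e. could be merged.
int count_mergeable(const std::vector<std::string>& a,
                    const std::vector<std::string>& b) {
    const int n = static_cast<int>(a.size()), m = static_cast<int>(b.size());
    const int kMatch = 1, kGap = -1, kNegInf = -1000000;
    std::vector<std::vector<int>> score(n + 1, std::vector<int>(m + 1, 0));
    for (int i = 1; i <= n; ++i) score[i][0] = i * kGap;
    for (int j = 1; j <= m; ++j) score[0][j] = j * kGap;
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= m; ++j) {
            // Pair two instructions only if their opcodes are identical.
            int diag = a[i - 1] == b[j - 1] ? score[i - 1][j - 1] + kMatch : kNegInf;
            score[i][j] = std::max({diag, score[i - 1][j] + kGap, score[i][j - 1] + kGap});
        }
    // Trace back from the bottom-right cell, counting aligned pairs.
    int matches = 0;
    int i = n, j = m;
    while (i > 0 && j > 0) {
        if (a[i - 1] == b[j - 1] && score[i][j] == score[i - 1][j - 1] + kMatch) {
            ++matches; --i; --j;
        } else if (score[i][j] == score[i - 1][j] + kGap) {
            --i;                     // gap in b
        } else {
            --j;                     // gap in a
        }
    }
    return matches;
}
```

For two five-instruction functions that differ in a single opcode, four pairs align and could share code in a merged function; the unmatched instructions would have to be kept separately for each original function.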

Speakers

Rodrigo Rocha

University of Edinburgh

Monday April 8, 2019 11:40am - 12:05pm CEST
Theatre

Student Research Competition

12:05pm CEST

Compilation and optimization with security annotations

Program analysis and program transformation systems need to express additional program properties, to specify test and verification goals, and to enhance their effectiveness. Such annotations are typically inserted into the representation on which the tool operates; e.g., source level for establishing compliance with a specification, and binary level for the validation of secure code. While several annotation languages have been proposed, these typically target the expression of functional properties. For the purpose of implementing secure code, there has been little effort to support non-functional properties about side-channels or faults. Furthermore, analyses and transformations making use of such annotations may target different representations encountered along the compilation flow.

We extend an annotation language to express a wider range of functional and non-functional properties, enabling security-oriented analyses and influencing the application of code transformations along the compilation flow. We translate this language to the different compiler representations from abstract syntax down to binary code. We explore these concepts through the design and implementation of an optimizing, annotation-aware compiler, capturing annotations from the program source, propagating and emitting them in the binary, so that binary-level analysis tools can use them.

Speakers

Son Tuan Vu

Sorbonne University

Monday April 8, 2019 12:05pm - 12:30pm CEST
Theatre

Student Research Competition

12:30pm CEST

Adding support for C++ contracts to Clang

A language that supports contract checking makes it possible to detect programming errors. Making this information available to the compiler may also enable it to perform additional optimizations.

This paper presents our implementation of the P0542R5 proposal (now part of the C++20 working draft).

Speakers

Javier López-Gómez

Computer Science PhD student, University Carlos III of Madrid

I am a software engineer and PhD student passionate about computer architecture, embedded systems, electronics, compilers (and, in general, anything low-level).

Monday April 8, 2019 12:30pm - 12:55pm CEST
Theatre

Student Research Competition

2:00pm CEST

Lightning Talks

Does the win32 clang compiler executable really need to be over 21MB in size?
Russell Gallop
The title of this lightning talk is from a bug filed in the early days of the PS4 compiler. It noted that the LLVM-based PS4 compiler was more than 3 times larger than the PS3 compiler. Since then it has almost doubled to over 40MB. For a compiler which targets one system this seems excessive. A large executable can hurt cache performance, and it costs time when it has to be transferred for distributed builds.
In this lightning talk I will look at where this comes from and how it can be managed.

LLVM IR Timing Predictions: Fast Explorations via lli
Alessandro Cornaglia
Many applications, especially in the embedded domain, have to be executed on different hardware target platforms. For these applications, it is necessary to evaluate both functional and non-functional properties, such as software execution time, for every hardware/software combination. Especially in the context of software product line engineering, it is not feasible to test all variants one by one. The intermediate representation of the source code offers an attractive opportunity for a single-run analysis, because it covers the software variability while omitting hardware-dependent optimizations. We present an extension for the LLVM IR execution engines, which are part of the LLVM lli tool. The extension evaluates functional and non-functional properties for all hardware variants on the fly, during a single lli execution. In particular, our extension is designed to evaluate the execution time of a program on multiple target platforms while considering different software variants. Both the interpreter and JIT execution modes are supported. In the future, our approach will be enriched with further analysis techniques. With our approach, it is now possible to evaluate software variants with regard to multiple hardware platforms in a single lli run.
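The single-run idea can be pictured as charging each executed instruction against several per-target cost tables at once. The C++ sketch below is an assumption-laden simplification, not the lli extension itself: the cost tables, opcode names, trace-based interface and function name are all hypothetical, and a realistic timing model would likely need more than a fixed cost per opcode.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-target cost model: cycles charged for each IR opcode.
using CostTable = std::map<std::string, int>;

// One pass over an executed instruction trace produces a cycle estimate for
// every target simultaneously. Opcodes missing from a table cost default_cost.
std::vector<long> estimate_cycles(const std::vector<std::string>& trace,
                                  const std::vector<CostTable>& targets,
                                  int default_cost = 1) {
    std::vector<long> totals(targets.size(), 0);
    for (const std::string& op : trace)
        for (std::size_t t = 0; t < targets.size(); ++t) {
            auto it = targets[t].find(op);
            totals[t] += it != targets[t].end() ? it->second : default_cost;
        }
    return totals;
}
```

Because all targets are updated while the same trace is processed, adding a platform adds a table lookup per instruction rather than a whole additional run.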

Simple Outer-Loop-Vectorization == LoopUnroll-And-Jam + SLP
Dibyendu Das
In this brief talk I will show how Outer-Loop-Vectorization (OLV), which is of great interest to the LLVM community, can be viewed as a combination of two transformations applied to a loop nest of interest: LoopUnrollAndJam and SLP. LoopUnrollAndJam is a fairly new addition to the LLVM loop-optimization repertoire. Combined with the fairly powerful SLP vectorizer that LLVM supports today, it lets us vectorize the outer loop of several important kernels automatically, without the support of any pragma. At present our implementation is a proof of concept and does not use any rigorous cost model. While we understand that OLV is being implemented in the LoopVectorizer using the VPlan technique, this talk highlights a quick and cheap way to solve the same problem differently, using two existing transforms.
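The decomposition can be illustrated at source level. In the C++ sketch below (function names are hypothetical; it applies the effect of the two transformations by hand and is not LLVM's LoopUnrollAndJam or SLP pass), unroll-and-jam by a factor of 2 turns the outer loop into pairs of isomorphic statements on adjacent memory, which is the shape SLP looks for.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = sum over j of a[j][i] * x[j], where `a` is a flat row-major array
// with x.size() rows and n columns. The inner j loop strides by n, while
// consecutive i iterations touch adjacent elements: the outer loop is the
// one worth vectorizing.
std::vector<int> colsums(const std::vector<int>& a, const std::vector<int>& x,
                         std::size_t n) {
    assert(a.size() == x.size() * n);
    std::vector<int> y(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += a[j * n + i] * x[j];
    return y;
}

// Same computation after unroll-and-jam of the outer loop by 2: the two
// inner-loop copies are fused, leaving two isomorphic statements on adjacent
// elements (y[i], y[i + 1] and a[j * n + i], a[j * n + i + 1]) that an SLP
// vectorizer can pack into 2-wide vector operations.
std::vector<int> colsums_unroll_jam2(const std::vector<int>& a,
                                     const std::vector<int>& x, std::size_t n) {
    assert(a.size() == x.size() * n);
    std::vector<int> y(n, 0);
    std::size_t i = 0;
    for (; i + 1 < n; i += 2)
        for (std::size_t j = 0; j < x.size(); ++j) {
            y[i]     += a[j * n + i]     * x[j];
            y[i + 1] += a[j * n + i + 1] * x[j];
        }
    for (; i < n; ++i)  // remainder iteration when n is odd
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += a[j * n + i] * x[j];
    return y;
}
```

Both versions compute the same result; the second simply exposes the outer-loop parallelism to a vectorizer that only looks within a loop body.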

Clacc 2019: An Update on OpenACC Support for Clang and LLVM
Joel E. Denny
We are developing production-quality, standard-conforming OpenACC [1] compiler and runtime support in Clang and LLVM for the US Exascale Computing Project [2][3]. A key strategy of Clacc’s design is to translate OpenACC to OpenMP in order to leverage Clang’s existing OpenMP compiler and runtime support and to minimize implementation divergence. To maximize reuse of the OpenMP implementation and to facilitate research and development into new source-level tools involving both the OpenACC and OpenMP levels, Clacc implements this translation in the Clang AST using Clang’s TreeTransform facility. However, we are also following LLVM IR parallel extensions being developed by the community as a path to improve compiler optimizations and analyses.

The purpose of this talk is to provide an update on Clacc progress over the preceding year including early performance results, to present the plan for the year ahead, and to invite participation from others. Clacc’s OpenACC support is still maturing and we have not yet offered it upstream. However, we have already upstreamed many mutually beneficial improvements from the Clacc project, including improvements to LLVM’s testing infrastructure and to Clang and its OpenMP support. This talk will summarize those contributions as well.

[1]: OpenACC standard: https://www.openacc.org/
[2]: Clacc: Translating OpenACC to OpenMP in Clang. Joel E. Denny, Seyong Lee, and Jeffrey S. Vetter. 2018 IEEE/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), Dallas, TX, USA, (2018).
[3]: Clacc: OpenACC Support for Clang and LLVM. Joel E. Denny, Seyong Lee, and Jeffrey S. Vetter. 2018 European LLVM Developers Meeting (EuroLLVM 2018).

Targeting a statically compiled program repository with LLVM
Russell Gallop
Following on from the 2016 talk "Demo of a repository for statically compiled programs", this lightning talk will present a brief overview of how LLVM was modified to target a program repository. This includes adding a new target output format and a new optimization pass to skip program elements already present in the repository. Reference: https://github.com/SNSystems/llvm-project-prepo

Speakers

Dibyendu Das

Senior Fellow, AMD

Russell Gallop

Software Engineer, Sony Interactive Entertainment

Alessandro Cornaglia

FZI Forschungszentrum Informatik

Joel E. Denny

Oak Ridge National Laboratory

Monday April 8, 2019 2:00pm - 2:30pm CEST
Theatre

Lightning Talks

2:35pm CEST

Lightning Talks

Resolving the almost decade old checker dependency issue in the Clang Static Analyzer
Kristóf Umann
As the number of checkers in the Static Analyzer grew, it became inevitable that some checkers would depend on one another. For example, a checker called MallocChecker, which despite its name performs all sorts of checks related to memory allocation, deallocation and reallocation, depends on CStringChecker to model calls to strcmp. While these checkers are completely separate entities, the Static Analyzer also contains large checker classes that in fact expose multiple checkers to the user: for example, IteratorChecker has a modeling part and exposes three iterator-related checkers, and enabling any of the three also enables the unexposed modeling part. Having both of these structures makes it difficult to find a solution where the developer (or the experienced user) can easily see which checkers are enabled, as these dependencies are expressed only in the implementation.

This talk will discuss how these rather fragile checker structures can be preserved by declaring the dependencies in TableGen files, and how checker developers (and users) can ensure that only the requested checkers are enabled when the analyzer is invoked. It will also take a brief look at other features the analyzer gained thanks to resolving these issues.
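Once dependencies are declared rather than buried in the implementation, deciding what to run becomes a transitive-closure computation. The C++ sketch below illustrates only that idea; the DependencyMap type and checkers_to_enable function are hypothetical and are not the Static Analyzer's API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical declarative table: checker name -> checkers it depends on.
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// Returns the requested checkers plus all of their transitive dependencies,
// i.e. everything that has to be registered for the request to work.
std::set<std::string> checkers_to_enable(const std::vector<std::string>& requested,
                                         const DependencyMap& deps) {
    std::set<std::string> enabled;
    std::vector<std::string> worklist(requested);
    while (!worklist.empty()) {
        std::string checker = worklist.back();
        worklist.pop_back();
        if (!enabled.insert(checker).second) continue;  // already visited
        auto it = deps.find(checker);
        if (it != deps.end())
            for (const std::string& dep : it->second) worklist.push_back(dep);
    }
    return enabled;
}
```

With an entry stating that MallocChecker depends on CStringChecker, requesting MallocChecker enables exactly those two; a shared modeling part requested by several checkers is enabled once, and the visited set also keeps the walk finite if the table ever contains a cycle.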

Adopting LLVM Binary Utilities in Toolchains
Jordan Rupprecht
Although many projects have migrated from GCC-based toolchains to Clang-based ones, tools from the GNU Binutils collection are still widely used despite having equivalents in the LLVM project. The problems faced when attempting to use LLVM tools range anywhere from simple command line syntax differences to unimplemented or buggy features. In this talk, I will describe some of the types of challenges we faced when adopting LLVM tools, as well as some of the strategies we used to test the toolchain.

Multiplication and Division in the Range-Based Constraint Manager
Ádám Balogh
The default constraint manager of the Clang Static Analyzer is a simple range-based constraint manager: it stores and manages the valid ranges for the values of symbolic expressions. Upon new assumptions it further constrains these ranges, often resulting in an empty range, which tells the analyzer that the assumption is impossible. Until now the constraint manager could handle only basic assumptions: A <rel> m, A + n <rel> m and A - n <rel> m, where A is a symbolic expression, n and m are integer constants and <rel> is a relational operator. In the latter two cases, where a constant is added to or subtracted from the symbolic expression, the range of the additive expression is calculated by shifting the range circularly by the constant. However, it could not cope with division and multiplication, so not even the range of A*2 could be deduced from the range of A. This shortcoming led to both false positives and missed true positives.

To improve the true positive/false positive ratio of the analyzer, we extended the range-based constraint manager to handle expressions of the form A <mul> k <add> n <rel> m, where A is a symbolic expression, k, m and n are integer constants, <mul> is a multiplicative operator (* or /), <add> is an additive operator (+ or -) and <rel> is a relational operator. The main challenge in our work was to correctly scale the ranges in circular arithmetic: for example, for signed 8-bit types, in A * 2 == 56 the value of A could be not only 28 but also -100. Similarly, in A / 3 == 4 the value of A is not necessarily 12, but anything in the range [12..14]. To ensure full correctness we also proved our solution: first, we generated every range for every constant in both 8-bit signed and unsigned arithmetic; then we tested whether the scaling algorithm calculates exactly the same ranges. Finally, we extrapolated this algorithm to wider integer types and ported it to the range-based constraint manager. According to our measurements there is no significant change in performance, and in the talk we will present the number of eliminated false positives and new true positives.
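The two examples can be reproduced by exhaustive enumeration, which mirrors the 8-bit validation strategy described above. This C++ sketch is only the brute-force oracle, not the range-scaling algorithm from the talk; the function names are hypothetical, and it assumes two's-complement wraparound for signed 8-bit values.

```cpp
#include <cassert>
#include <vector>

// Wraps an integer into the signed 8-bit range [-128, 127] (two's complement).
int wrap_i8(int v) {
    int r = ((v % 256) + 256) % 256;  // 0..255
    return r > 127 ? r - 256 : r;
}

// All signed 8-bit values A with wrap(A * k) == m.
std::vector<int> solve_mul(int k, int m) {
    std::vector<int> out;
    for (int a = -128; a <= 127; ++a)
        if (wrap_i8(a * k) == m) out.push_back(a);
    return out;
}

// All signed 8-bit values A with wrap(A / k) == m.
// Division truncates toward zero, as in C and C++.
std::vector<int> solve_div(int k, int m) {
    assert(k != 0);
    std::vector<int> out;
    for (int a = -128; a <= 127; ++a)
        if (wrap_i8(a / k) == m) out.push_back(a);
    return out;
}
```

Here solve_mul(2, 56) returns -100 and 28, and solve_div(3, 4) returns 12, 13 and 14, matching the examples above. A scaling algorithm can be checked against such an oracle for every constant in the 8-bit domain.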

Statistics Based Checkers in the Clang Static Analyzer
Ádám Balogh
In almost every development project there are conventions that the return value of certain functions in an external library must be compared to some extremal value, such as zero. For example, many integer functions return a negative number in case of error, just as pointer functions return null pointers. In a large project with many external functions it is virtually impossible to formalize all these rules explicitly: they are either unwritten or exist only in natural language. To help enforce these rules, we created checkers in the Clang Static Analyzer that discover these rules statistically and check the code against them. We currently support two kinds of extremal values: negative numbers for functions returning integers and null pointers for functions returning pointers.

Example:
int i = may_return_negative();
v[i]; // error: negative indexing

Exploration and checking of these rules happen in two phases. In the first phase we check every function call and create a summary for each function, recording the percentage of calls whose return value is checked for negativeness (integer functions) or nullness (pointer functions). If this percentage is above a defined threshold (85% by default), we assume that the rule for the function exists. The second phase is the usual execution of the analyzer, where a checker checks the code for violations of the rule: it splits the execution path into two branches at each call to a listed function, where the return value is an extremal value (negative for integers or null for pointers) in one branch and a non-extremal value in the other. Other checkers (e.g. the null-pointer dereference checker) are expected to find errors on the extremal-value branch unless the code terminates that branch by checking for the extremal value. The performance impact of the state split is low: in at least 85% of the cases the extremal-value branch is terminated quickly, and in the remaining cases we expect another checker to create a sink node because of an error. The new checker is under evaluation on open-source projects. We found some false positives; however, their number can be reduced by including the arguments in the statistics.
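The first-phase decision boils down to a per-function ratio compared against the threshold. A minimal C++ sketch of that step, assuming the call and check counts have already been collected; the CallStats type and inferred_rules function are hypothetical, not the checker's code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Phase 1 summary for one function.
struct CallStats {
    int calls;    // call sites observed
    int checked;  // call sites whose result is compared against the extremal value
};

// Functions whose result is checked in more than `threshold_percent` percent
// of their calls are assumed to follow a "must check" rule; phase 2 would
// then split the state at calls to these functions.
std::set<std::string> inferred_rules(const std::map<std::string, CallStats>& stats,
                                     int threshold_percent = 85) {
    assert(threshold_percent >= 0 && threshold_percent <= 100);
    std::set<std::string> rules;
    for (const auto& entry : stats) {
        const CallStats& s = entry.second;
        // Integer comparison avoids floating-point rounding at the threshold.
        if (s.calls > 0 && s.checked * 100 > threshold_percent * s.calls)
            rules.insert(entry.first);
    }
    return rules;
}
```

A function whose result is checked at 19 of 20 call sites (95%) would be listed, while one checked at 5 of 10 sites (50%) would not.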

Flang Update
Steve Scalpone
An update about the current state of Flang, including a report on OpenMP 4.5 target offload, Fortran performance and the new f18 front end.

Speakers

Ádám Balogh

Master Developer, Ericsson

Steve Scalpone

NVIDIA

Flang, F18, and NVIDIA C, C++, and Fortran for high-performance computing.

Kristóf Umann

Eötvös Loránd University, Ericsson

Jordan Rupprecht

Software Engineer, Google

Monday April 8, 2019 2:35pm - 3:05pm CEST
Theatre

Lightning Talks

3:10pm CEST

Lightning Talks

Swinging Modulo Scheduling together with Register Allocation
Lama Saba
VLIW architectures rely heavily on Modulo Scheduling to optimize ILP in loops. Modulo Scheduling can be achieved today in LLVM using the MachinePipeliner pass, which implements a Swing Modulo Scheduler prior to register allocation [1]. For some VLIW architectures, such as those lacking hardware interlocks or the ability to spill registers onto a stack, the MachinePipeliner's decisions become crucial for the success of the register allocation phase, since they affect the latter’s decisions to generate splits or spills, which in turn can result in an inefficient or even an unsuccessful resource allocation.
Nevertheless, although the MachinePipeliner aims to schedule with a minimal Initiation Interval, its structure facilitates trying larger Initiation Intervals or a different ordering. This lends itself to alternative, possibly less aggressive, scheduling retries after more aggressive attempts have failed in register allocation.
This talk introduces this issue and explores how we can achieve successful modulo scheduling and register allocation for such architectures in LLVM by introducing a repetitive rollback-and-retry mechanism for altering scheduling decisions based on the register allocator’s outcome, and how we can leverage such an approach to improve the scheduling of VLIW architectures in general.
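The rollback-and-retry idea can be sketched as a driver loop around scheduling and register allocation. In this C++ sketch the callback schedule_and_allocate is hypothetical and stands in for running the MachinePipeliner and the register allocator; res_mii is the usual resource-based lower bound on the Initiation Interval for identical functional units. Only retries with larger IIs are modeled, not retries with a different ordering.

```cpp
#include <cassert>
#include <functional>

// Resource-based lower bound on the Initiation Interval when `num_units`
// identical functional units can each start one operation per cycle.
int res_mii(int num_ops, int num_units) {
    assert(num_ops >= 0 && num_units > 0);
    return (num_ops + num_units - 1) / num_units;  // ceil(num_ops / num_units)
}

// Starting from the minimal II, try to schedule and register-allocate the
// loop; on failure, discard the attempt and retry with the next larger II.
// Returns the II that succeeded, or -1 if no II up to max_ii worked.
int pipeline_with_retry(int min_ii, int max_ii,
                        const std::function<bool(int)>& schedule_and_allocate) {
    for (int ii = min_ii; ii <= max_ii; ++ii)
        if (schedule_and_allocate(ii)) return ii;
    return -1;
}
```

Starting at the minimal II keeps the aggressive schedule whenever the register allocator can cope with it, and backs off only as far as necessary when it cannot.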

[1] An Implementation of Swing Modulo Scheduling in a Production Compiler - Brendon Cahoon - http://llvm.org/devmtg/2015-10/slides/Cahoon-SwingModuloScheduling.pdf

LLVM for the Apollo Guidance Computer
Lewis Revill
Nearly 50 years ago on the 20th of July 1969 humans set foot on the
moon for the first time. Among the many extraordinary engineering feats
that made this possible was the Apollo Guidance Computer, an innovative
processor for its time with an instruction set that was thought up well
before the advent of C. So 50 years later, why not implement support
for it in a modern compiler such as LLVM?

This talk will give a brief overview of some of the architectural
features of the Apollo Guidance Computer followed by an account of my
implementation of an LLVM target so far. The shortcomings of LLVM when
it comes to implementing such an unusual architecture will be
discussed along with the workarounds used to overcome them.

Catch dangling inner pointers with the Clang Static Analyzer
Réka Kovács
C++ container classes provide methods that return a raw pointer to the container's inner buffer. When the container is destroyed, the inner buffer is deallocated. A common bug is to use such a raw pointer after deallocation, which may lead to crashes or other unexpected behavior.

This lightning talk will present a new Clang Static Analyzer checker designed to address the problem described above, implemented last year as a Google Summer of Code project. The checker has found serious problems in popular open-source projects with a negligible false positive rate. Future plans include adding support for view-like constructs and non-STL containers.
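A minimal C++ illustration of the bug class (function names are hypothetical). The dangerous variant appears only as a comment, so that the block itself stays free of undefined behavior.

```cpp
#include <cassert>
#include <string>

// The bug class: a raw pointer into a container's inner buffer outlives the
// container. Not compiled here, since using the result is undefined behavior:
//
//     const char* label_ptr(int id) {
//         std::string label = "item-" + std::to_string(id);
//         return label.c_str();  // label is destroyed on return, so the
//     }                          // caller receives a dangling pointer
//
// A safe variant returns the container itself: ownership of the buffer moves
// to the caller, and the data stays alive for as long as it is needed.
std::string make_label(int id) {
    std::string label = "item-" + std::to_string(id);
    return label;
}
```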

Cross translation unit test case reduction
Réka Kovács
C-Reduce, released by Regehr et al. in 2012, is an excellent tool designed to generate a minimal test case from a C/C++ file that has some specific property (e.g. triggers a bug). One of the most interesting parts of C-Reduce is Clang Delta, a set of compiler-like transformations implemented using Clang libraries, such as turning a function parameter into a global variable.

With the introduction of the experimental cross translation unit analysis feature in the Clang Static Analyzer, there arose a need to investigate crashes, bugs, or false positive reports that spread across different translation units. Unfortunately, C-Reduce was designed to minimize one translation unit at a time, and some of the Clang Delta transformations cannot be applied to multiple TUs in their original form.

This talk/poster is a status report about a work in progress that aims to make it possible to use C-Reduce for cross translation unit test case reduction.

Speakers

Reka Kovacs

Software Engineer, Microsoft

Lama Saba

Intel Israel

Lewis Revill

Embecosm

Monday April 8, 2019 3:10pm - 3:40pm CEST
Theatre

Lightning Talks

4:15pm CEST

Handling massive concurrency: Development of a programming model for GPU and CPU

For efficient parallel execution it is necessary to write massively concurrent algorithms and to optimize memory access.
In this session we present our approach: a programming model that can execute the same concurrent algorithm efficiently on both GPUs and CPUs.
Similar to OpenMP, it allows the programmer to describe concurrency and memory access declaratively, but hides complexity such as memory transfers between the CPU and the GPU. Compared to OpenMP, our model provides a higher level of expressiveness, which enables us to reach performance comparable to OpenCL/CUDA.

Speakers

Matthias Liedtke

SAP

Monday April 8, 2019 4:15pm - 4:45pm CEST
Theatre

Technical Talk

4:50pm CEST

Automated GPU Kernel Fusion with XLA

XLA (Accelerated Linear Algebra) is an optimizing compiler for linear algebra that accelerates TensorFlow computations. The XLA compiler lowers to LLVM IR and relies on LLVM for low-level optimization and code generation. XLA achieves significant performance gains on TensorFlow models. We observed speedups of up to 3x on internal models. The popular image classification model ResNet-50 trains 1.6x faster.

A key optimization performed by XLA is automated GPU kernel fusion. The idea is to combine multiple linear algebra operators into a single GPU kernel to reduce memory bandwidth requirements and kernel launch overhead. TensorFlow with XLA demonstrated competitive performance on MLPerf benchmarks (mlperf.org) compared to ML frameworks that rely on manually fused, hand-tuned GPU kernels.
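The bandwidth argument can be seen in a CPU analogue of two element-wise operations. This C++ sketch is illustrative only (function names are hypothetical): XLA performs fusion on its own graph representation and emits a single GPU kernel through LLVM, which this code does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: two separate passes ("kernels"). The intermediate t is written to
// memory by the first pass and read back by the second.
std::vector<float> scale_add_unfused(const std::vector<float>& x,
                                     const std::vector<float>& b, float s) {
    assert(x.size() == b.size());
    std::vector<float> t(x.size()), y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) t[i] = s * x[i];     // kernel 1
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = t[i] + b[i];  // kernel 2
    return y;
}

// Fused: a single pass applies both operations per element, so the
// intermediate never goes through memory and only one loop ("launch") runs.
std::vector<float> scale_add_fused(const std::vector<float>& x,
                                   const std::vector<float>& b, float s) {
    assert(x.size() == b.size());
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = s * x[i] + b[i];
    return y;
}
```

Both functions compute the same values; the fused version avoids writing and re-reading the intermediate buffer, which is the memory-bandwidth saving described above.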

Speakers

Thomas Joerg

Google

Monday April 8, 2019 4:50pm - 5:20pm CEST
Theatre

Technical Talk

5:25pm CEST

Compiler Optimizations for (OpenMP) Target Offloading to GPUs

The support of OpenMP target offloading in Clang is steadily increasing. However, when it comes to optimizing such code, LLVM still does a poor job. Early separation into different modules and state machine generation are only two reasons why the middle end and back end have a hard time generating efficient code.

In this talk, we focus on code offloading to GPUs (through OpenMP), an increasingly important part of modern programming. We will first highlight different reasons for missing optimizations and poor code quality before we introduce new, practical solutions. While our implementation is still experimental, early results suggest that there is enormous optimization potential in both manually written and automatically generated target offloading code.

In addition to the talk, we will, closer to the conference date, initiate a discussion on the LLVM mailing list and publish our implementation.

Speakers

Johannes Doerfert

Argonne National Laboratory

Monday April 8, 2019 5:25pm - 5:55pm CEST
Theatre

Technical Talk

10:00am CEST

The Future of AST Matcher-based Refactoring

In the last few years, Clang has opened up new possibilities in C++ tooling for the masses. Tools such as clang-tidy and clazy offer ready-to-use source-to-source transformations. Available transformations can be used to modernize (use newer C++ language features), improve readability (remove redundant constructs), or improve adherence to the C++ Core Guidelines.

However, when special needs arise, maintainers of large codebases need to learn some of the Clang APIs to create their own porting aids. The Clang APIs necessarily form a more-exact picture of the structure of C++ code than most developers keep in their heads, and bridging the conceptual gap can be a daunting task.

This talk will show tools and features which make this task easier for developers, ranging from

* Improvements to the clang-query interpreter
* Improvements to the AST Matcher API
* Information essential to creating clang-tidy-checks
* Debugging and profiling of AST Matchers
* Advanced tooling

These features are at various stages of being upstreamed to Clang. By solving problems of API discovery and guiding users in creating working refactorings, they enable new possibilities for large-scale refactoring in a reasonable timeframe.

Speakers

Stephen Kelly

Energid

Tuesday April 9, 2019 10:00am - 10:30am CEST
Theatre

Technical Talk

10:35am CEST

clang-scan-deps: Fast dependency scanning for explicit modules

The dependency information provided by Clang can be used to implement a pre-scanning phase for a build system that uses Clang modules explicitly, by discovering the required modules before compiling. However, the traditional approach of preprocessing all sources to find the required modular dependencies is typically not fast enough for a pre-scanning phase that must run for every build. This talk introduces clang-scan-deps, an optimized dependency discovery service that can provide a speedup of up to 10x over regular preprocessor-based scanning. The talk goes into the details of how this service is implemented and how a build system can leverage it to implement a fast pre-scanning phase for explicit Clang modules.

Speakers

Alex Lorenz

Tuesday April 9, 2019 10:35am - 11:05am CEST
Theatre

Technical Talk

11:25am CEST

Changes to the C++ standard library for C++20

The next version of the C++ standard will almost certainly be approved next year, and be called C++20. There will be many new features in the standard library in C++20. Things like ranges, concepts, calendar support, and many others. In this talk, I'll give an overview of the new features, and an update on the status of their implementation in libc++.

Speakers

Marshall Clow

Tuesday April 9, 2019 11:25am - 11:55am CEST
Theatre

Technical Talk

12:00pm CEST

Implementing the C++ Core Guidelines’ Lifetime Safety Profile in Clang

This is an experience report of the Clang-based implementation of Herb Sutter’s Lifetime safety profile for the C++ Core Guidelines, available online at cppx.godbolt.org.
We will cover the kinds of diagnostics supported by the checker and how they are implemented using Clang's control flow graph. We will discuss the main problems of the current prototype and what we are going to do to fix them. We also plan to discuss the upstreaming process. Some parts of the analysis might end up improving existing Clang warnings, some of which are on by default. We will also summarize early experience with performance on real-world code bases, including compile-time performance for LLVM sources with the checker.

Speakers

Matthias Gehre

Senior Software Architect, Silexica GmbH

Matthias co-maintains the Clang-based implementation of Herb Sutter’s lifetime checks, available online at https://github.com/mgehre/llvm-project and godbolt.org. He is currently working as a Senior Software Architect at Silexica. With its headquarters in Germany and offices in...

Gábor Horváth

Software Engineer, Microsoft

Gabor started a Ph.D. in 2016. He has contributed to research projects related to static analysis since 2012. He is a Clang contributor, participated in Google Summer of Code twice as a student and many times as a mentor, and interned at Apple, Microsoft and Google. He taught C++ and...

Tuesday April 9, 2019 12:00pm - 12:30pm CEST
Theatre

Technical Talk

12:35pm CEST

DOE Proxy Apps: Compiler Performance Analysis and Optimistic Annotation Exploration

The US Department of Energy proxy applications are simplified models of the key components of various scientific computing workloads. These proxy applications are useful for research and exploration in many areas, including software technology. We have conducted performance analysis of these proxy applications using Clang, GCC and some vendor compilers. These results identified and motivated our work on modelling the memory access of math functions in Clang. We will discuss our design and our work to expose this ability to encode function information to the developer. Additionally, I will discuss my collaboration on a development tool designed to explore both the potential performance lost when the developer could have encoded information (but did not) and the extent to which LLVM is able to profitably make use of this information.
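As an illustration of the kind of function information a developer can already encode, GCC and Clang accept `__attribute__((const))`, which promises that a function's result depends only on its arguments and that it does not access memory. This is only a sketch with a hypothetical `halve` helper; it does not show the math-function modelling work described in the talk.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper. __attribute__((const)) (a GCC/Clang extension)
// promises the result depends only on the argument and that the function
// neither reads nor writes memory, so calls with the same argument can be
// merged or hoisted.
__attribute__((const)) double halve(double x) { return x * 0.5; }

// halve(k) is loop-invariant. With the attribute, the optimizer may compute
// it once before the loop even if halve() were defined in another file.
double sumScaled(const double *v, std::size_t n, double k) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    sum += v[i] * halve(k);
  return sum;
}
```

Without such an annotation, a call to an opaque function inside the loop must be assumed to possibly read or write memory, which blocks hoisting and many other transformations.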

Speakers

Brian Homerding

Argonne National Laboratory

Tuesday April 9, 2019 12:35pm - 1:05pm CEST
Theatre

Technical Talk

2:00pm CEST

LLVM Numerics Improvements

Some LLVM-based compilers currently provide two modes of floating-point code generation. The first mode, called fast-math, puts performance ahead of numerical precision and accuracy. This mode does not strictly follow the IEEE-754 standard, but has proven useful for applications that do not require this level of precision. The second mode, called precise-math, is where the compiler carefully follows the subset of behavior defined in the IEEE standard that is applicable to conforming hardware targets. This mode is primarily used for compute workloads and wherever fast-math precision is inadequate; however, it runs much slower because it generally requires more instructions. In practice, neither of these modes is particularly desirable. The fast-math mode ignores a significant portion of the standard pertaining to the handling of special values such as Not a Number (NaN) and infinity (INF), which causes difficulties for workloads where the hardware target computes these values correctly and performance remains critical.
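A minimal illustration of why fast-math is called numerically unsafe, written as ordinary C++ and assuming the file is compiled with strict IEEE-754 semantics (no `-ffast-math`). `sumRight` is the reassociated form that LLVM's `reassoc` fast-math flag would permit the optimizer to substitute for `sumLeft`, and `isNotANumber` is a check that the `nnan` flag lets the optimizer treat as always false. The function names are hypothetical.

```cpp
#include <cassert>
#include <limits>

// Evaluated left to right, exactly as written.
double sumLeft(double a, double b, double c) { return (a + b) + c; }

// The reassociated form a fast-math compiler may use instead.
double sumRight(double a, double b, double c) { return a + (b + c); }

// True only for NaN under IEEE-754. If the compiler may assume there are
// no NaNs (the nnan flag), it is allowed to fold this to false.
bool isNotANumber(double x) { return x != x; }
```

With `a = 1e20`, `b = -1e20` and `c = 1.0`, the left-to-right sum is `1.0`, but in the reassociated form `b + c` rounds back to `-1e20` and the result becomes `0.0`. Per-instruction flags let a front-end allow such rewrites only where the difference is acceptable.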

Until recently, these two modes were mutually exclusive, but with the addition of IR flags they need not be. For instance, when the FastMath metadata module flag is enabled, it drives numerically unsafe behavior by indiscriminately enabling optimizations. With IR flags, this behavior can be enabled at much finer granularity, allowing fast and precise code to coexist in one module. We call this mixed-mode compilation. IR flags can be used individually or in combination to produce the desired floating-point behavior under specified constraints. Optimization passes have been modified to respect this new kind of control. This talk will describe the recent numerics work and discuss the implications for front-ends and back-ends built with LLVM.

Speakers

Steve Canon

Apple

Michael Berg

GPU Compiler Engineer, Apple

Tuesday April 9, 2019 2:00pm - 2:30pm CEST
Theatre

Technical Talk

2:35pm CEST

A Tale of Two ABIs: ILP32 on AArch64

We faced the challenge of seamlessly running 32-bit application binaries on a new 64-bit S4 chip, which has no hardware support for running 32-bit binaries. Translating the ARM binaries directly to the new hardware would be hard, but when an application is available in bitcode format, the task is much more feasible. This talk offers an inside look at the decisions and steps taken to translate 32-bit bitcode for the new 64-bit hardware. It will discuss the many design, implementation and verification challenges of introducing a new ABI, arm64_32, which guarantees that the binaries for the new S4 chip are compatible with the original 32-bit applications.
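One reason the data model matters, sketched under simplifying assumptions: bitcode produced for a 32-bit target typically already has 32-bit type sizes and struct offsets baked in. An ILP32 ABI keeps `int`, `long` and pointers at 32 bits while using the 64-bit instruction set, so those offsets remain valid. The hypothetical `layoutExample` below computes the layout of one struct under ILP32 and LP64; it is a teaching aid that assumes every field is aligned to its own size, not how LLVM's DataLayout actually computes offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a C data model: sizes of pointer and long.
struct DataModel {
  std::size_t pointerSize;
  std::size_t longSize;
};

constexpr DataModel kILP32{4, 4}; // 32-bit int, long and pointers
constexpr DataModel kLP64{8, 8};  // 64-bit long and pointers

struct Layout {
  std::vector<std::size_t> offsets;
  std::size_t size;
};

// Lays out `struct { char c; void *p; long l; }`, assuming every field is
// aligned to its own size (a simplification of real ABI rules).
Layout layoutExample(const DataModel &dm) {
  const std::size_t fieldSizes[] = {1, dm.pointerSize, dm.longSize};
  Layout out{{}, 0};
  std::size_t offset = 0;
  std::size_t maxAlign = 1;
  for (std::size_t sz : fieldSizes) {
    offset = (offset + sz - 1) / sz * sz; // round up to the field's alignment
    out.offsets.push_back(offset);
    offset += sz;
    if (sz > maxAlign)
      maxAlign = sz;
  }
  // Pad the total size to a multiple of the largest alignment.
  out.size = (offset + maxAlign - 1) / maxAlign * maxAlign;
  return out;
}
```

Under ILP32 the fields sit at offsets 0, 4 and 8 (size 12); under LP64 they move to 0, 8 and 16 (size 24). Code compiled for one layout and run with the other would read the wrong bytes, which is why a 64-bit ABI with 32-bit pointers is attractive for retargeting existing 32-bit bitcode.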

Speakers

Tim Northover

Apple Inc.

Tuesday April 9, 2019 2:35pm - 3:05pm CEST
Theatre

Technical Talk

3:10pm CEST

Loop Fusion, Loop Distribution and their Place in the Loop Optimization Pipeline

Loop fusion and loop distribution are two key optimizations that typically feature prominently in a loop optimization pipeline. They are used both to improve the performance of applications and to enable other loop optimizations. For example, loop fusion can improve performance by increasing temporal data cache locality. It can also increase the scope of other optimizations by creating larger loop nests for intra-loop-nest optimizations to work on. Similarly, loop distribution is often used to improve performance directly by distributing loops that exceed hardware resources (e.g., register pressure). It is also frequently used to split a loop containing loop-carried dependencies into two loops, one with the loop-carried dependencies and one without; this enables other optimizations (e.g., vectorization) on the independent loop. Furthermore, these two optimizations work well together, as each can "undo" the transformations performed by the other. Thus, both must be implemented robustly, since both can play an important role in a loop optimization pipeline.
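The two transformations can be sketched at the source level. The function names below are hypothetical, and a compiler performs these rewrites on IR rather than on C++ source; the sketch only shows what each transformation does and why it is legal in these particular cases.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop fusion, before. Assumes b and c have at least a.size() elements.
void unfused(const std::vector<int> &a, std::vector<int> &b,
             std::vector<int> &c) {
  for (std::size_t i = 0; i < a.size(); ++i)
    b[i] = a[i] * 2;
  for (std::size_t i = 0; i < a.size(); ++i)
    c[i] = b[i] + 1;
}

// Loop fusion, after. Legal because iteration i of the second loop reads
// only b[i], which iteration i of the first loop has already written; b[i]
// is now consumed while it is still in a register or cache line.
void fused(const std::vector<int> &a, std::vector<int> &b,
           std::vector<int> &c) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    b[i] = a[i] * 2;
    c[i] = b[i] + 1;
  }
}

// Loop distribution, before. Assumes y and z have at least x.size() elements.
void original(std::vector<int> &x, const std::vector<int> &y,
              std::vector<int> &z) {
  for (std::size_t i = 1; i < x.size(); ++i) {
    x[i] = x[i - 1] + y[i]; // loop-carried: uses the previous iteration's x
    z[i] = y[i] * 3;        // no dependence between iterations
  }
}

// Loop distribution, after. The dependence-free statement gets its own
// loop, which a vectorizer can then handle independently.
void distributed(std::vector<int> &x, const std::vector<int> &y,
                 std::vector<int> &z) {
  for (std::size_t i = 1; i < x.size(); ++i)
    x[i] = x[i - 1] + y[i];
  for (std::size_t i = 1; i < x.size(); ++i)
    z[i] = y[i] * 3;
}
```

Fusing the two loops of `distributed` gives back `original`, which is the sense in which each transformation can undo the other; a pipeline therefore has to decide which form is more profitable at each point.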

This talk will be a follow-on to "Revisiting Loop Fusion, and its place in the loop transformation framework", presented at the 2018 LLVM Developers' Meeting. The patch implementing the basic loop fusion described in that talk is currently under review on Phabricator (https://reviews.llvm.org/D55851). We have prototypes that make loop fusion more aggressive by moving code from between two loops (making them adjacent); these will be posted for review once the basic loop fusion patch is accepted. We also plan to peel loops (to make their bounds conform) and to improve the dependence analysis between the two loop bodies. This talk will also include findings from our current analysis of the loop distribution pass in LLVM. It will summarize the strengths and limitations of loop distribution, along with any improvements made prior to EuroLLVM 2019. Finally, the presentation will discuss how loop fusion and loop distribution can fit into the existing loop optimization pipeline in LLVM.

Speakers

Kit Barton

Technical lead for LLVM on Power and XL Compilers, IBM Canada

Tuesday April 9, 2019 3:10pm - 3:40pm CEST
Theatre

Technical Talk

4:00pm CEST

A compiler approach to Cyber-Security

STMicroelectronics is developing LLVM-based compilation tools for its proprietary processors and also for ARM cores. Applications, including a growing number of IoT developments, require more and more security, implemented in hardware, software, or both. To implement complex and reliable software countermeasures that can be deployed in a timely manner, we are adding specific cybersecurity code-generation features to our production LLVM compiler, which we present in this talk.

We give implementation details on how we worked within Clang and LLVM to implement these techniques, and we explain how they help reinforce software protection. We also detail how we can restrict these transformations to specific safety-critical regions of a program to meet the industrial constraints on performance and code size of our applications.

Speakers

François de Ferrière

STMicroelectronics

Tuesday April 9, 2019 4:00pm - 4:30pm CEST
Theatre

Technical Talk

4:35pm CEST

Clang tools for implementing cryptographic protocols like OTRv4

OTRv4 is the newest version of the Off-The-Record protocol. It is a protocol where the newest academic research intertwines with real-world implementations: it provides end-to-end encryption, and offline and online deniability for interactive and non-interactive applications. As a real-world protocol, it needs an implementation that works for real-world users. For this, the OTRv4 team decided to implement it in C. But as we know, working in C can be challenging for several reasons.

In order to make OTRv4's implementation safer and more usable, we decided to use several Clang tools, such as clang-format, clang-tidy and AddressSanitizer. By using these tools, we uncovered bugs, issues and problems. In this talk, we aim to highlight the most interesting bugs we uncovered with these tools, comparing the results of static analysis with those of a fast memory error detector. We also aim to highlight the importance of using a specific code formatting style, as it makes an implementation much clearer and bugs easier to find. Finally, we want to emphasize the importance of using these tools on real-world implementations that are going to be used by millions of users and that aim to provide the best security properties available.

Speakers

Sofía Celi

Cryptography Researcher, Cloudflare

Tuesday April 9, 2019 4:35pm - 5:05pm CEST
Theatre

Technical Talk

5:10pm CEST

Testing and Qualification of Optimizing Compilers for Functional Safety

In the development of embedded applications, the compiler plays a crucial role in the translation from source to machine code. If the application is safety-critical, functional safety standards such as ISO 26262 for the automotive industry require that the user of the compiler develop confidence in the compiler's correct operation. In this presentation we will discuss the requirements of ISO 26262 on tools such as LLVM compilers and how they can be met with a testing procedure that works well with the V-Model of engineering.

As the name implies, functional safety standards deal with the specified functionality of components. But what about the optimizations that an LLVM-based compiler applies to the program, sometimes even silently? Optimizations are not even mentioned in the language standards for C and C++; they are "non-functional" behavior of the compiler. As we will demonstrate, ignoring optimizations leads to significant holes in the compiler's test coverage. We will show how we have developed a technique that achieves good results in optimization testing, and we will show some errors in Intel's well-regarded Clang-based compiler. To show the completeness of our method with respect to the requirements of functional safety, we have analyzed how the tests map to the various LLVM IR-level transformation passes that they go through.

Speakers

José Luis March Cabrelles

Solid Sands

Tuesday April 9, 2019 5:10pm - 5:40pm CEST
Theatre

Technical Talk

5:45pm CEST

Closing

Closing session

Speakers

Arnaud de Grandmaison

LLVM Foundation

Tuesday April 9, 2019 5:45pm - 6:00pm CEST
Theatre

Opening/Closing