Thinking Fast and Correct: Automated Rewriting of Numerical Code through Compiler Augmentation
Floating-point numbers are finite-precision approximations to real numbers and are ubiquitous in computer applications in nearly every field. Selecting a floating-point representation that achieves a good balance of performance and numerical accuracy is a difficult task, which has become even more critical as hardware has trended toward high-performance, low-precision operations. Although the common wisdom around changing floating-point precision implies that accuracy and performance are inversely correlated, more advanced techniques can often circumvent this tradeoff. Applying complex numerical optimizations to real-world code, however, is an arduous engineering task that requires expertise in numerical analysis, performance engineering, and the specific computational context. While there is a plethora of existing tools that partially automate this process, they are limited in the scope of their optimization techniques or still require substantial human intervention. We present Poseidon, a modular and extensible framework that fully automates floating-point analyses and optimizations for real-world applications within a production compiler. Poseidon operates as a two-phase compiler. In the first compilation, Poseidon captures computational context through small surrogate profiled executions. In the second compilation, Poseidon consumes the profiled data, generates and evaluates candidate rewrites such as precision changes and algebraic rewrites, and solves for optimal performance/accuracy tradeoffs and rewrite sets. Poseidon’s interoperability with standard compiler analyses and optimizations grants it analysis and optimization advantages unavailable to existing source- and binary-level approaches. We evaluate Poseidon on multiple large-scale applications and perform ablations on each component of Poseidon’s design.
We find that performing profile-guided algebraic rewrites and precision tuning leads to outsized benefits in performance without substantially changing accuracy, and outsized accuracy benefits without diminishing performance. On a quaternion differentiator, Poseidon enables a 1.46× speedup with a relative error of 1e-7. In DOE’s 64-bit LULESH hydrodynamics application, Poseidon improves program accuracy to exactly match a 512-bit simulation run without substantially reducing runtime performance.
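To make concrete the claim that an algebraic rewrite can improve accuracy without raising precision, the following is a minimal sketch in Python. It is not taken from Poseidon and does not reflect its implementation. The function names (`naive`, `rewritten`, `reference`, `relative_error`) and the input value are illustrative assumptions. The example uses the textbook identity sqrt(x+1) - sqrt(x) = 1 / (sqrt(x+1) + sqrt(x)), which avoids subtracting two nearly equal values.

```python
import math
from decimal import Decimal, getcontext

# Hypothetical illustration only: these helpers are not part of Poseidon.
getcontext().prec = 60  # high-precision reference arithmetic


def naive(x: float) -> float:
    """Direct form: suffers catastrophic cancellation for large x."""
    return math.sqrt(x + 1.0) - math.sqrt(x)


def rewritten(x: float) -> float:
    """Algebraically equivalent form with no subtraction of close values."""
    return 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))


def reference(x: float) -> Decimal:
    """Same expression evaluated with 60 significant decimal digits."""
    d = Decimal(x)  # exact conversion of the double
    return (d + 1).sqrt() - d.sqrt()


def relative_error(approx: float, exact: Decimal) -> float:
    """Relative error of a double result against the reference value."""
    return float(abs(Decimal(approx) - exact) / abs(exact))


if __name__ == "__main__":
    x = 1e15  # assumed sample input where cancellation is severe
    ref = reference(x)
    print("naive     relative error:", relative_error(naive(x), ref))
    print("rewritten relative error:", relative_error(rewritten(x), ref))
```

Both functions use only 64-bit arithmetic and a comparable number of operations, yet for large inputs the rewritten form stays close to the reference while the direct form loses most of its significant digits. Whether such a rewrite pays off depends on the input range, which is the kind of computational context the abstract says profiling captures.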