Performance/accuracy trade-offs of floating-point arithmetic on NVidia GPUs : from a characterization to an auto-tuner
Metadata[+] Show full item record
Floating-point computations produce approximate results, possibly leading to inaccuracy and reproducibility problems. Existing work addresses two issues: first, the design of high precision floating-point representations, and second, the study of methods to support a trade-off between accuracy and performance of central processing unit (CPU) applications. However, a comprehensive study of trade-offs between accuracy and performance on modern graphic processing units (GPUs) is missing. This thesis covers the use of different floating-point precisions (i.e., single and double floating-point precision) in the IEEE 754 standard, the GNU Multiple Precision Arithmetic Library (GMP), and composite floating-point precision on a GPU using a variety of synthetic and real-world benchmark applications. First, we analyze the support for a single and double precision floating-point arithmetic on the considered GPU architectures, and we characterize the latencies of all floating-point instructions on GPU. Second, a study is presented on the performance/accuracy tradeoffs related to the use of different arithmetic precisions on addition, multiplication, division, and natural exponential function. Third, an analysis is given on the combined use of different arithmetic operations on three benchmark applications characterized by different instruction mixes and arithmetic intensities. As a result of this analysis, a novel auto tuner was designed in order to select the arithmetic precision of a GPU program leading to a better performance and accuracy tradeoff depending on the arithmetic operations and math functions used in the program and the degree of multithreading of the code.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.