Design of runtime libraries to improve programmability and efficiency of heterogeneous CPU-GPU nodes
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] As computers began to reach the limit on how fast a single processor could execute code, computer developers began to devise methods to execute more code simultaneously. Since then, parallel systems have become commonplace and increasingly necessary for processing large quantities of data quickly. These parallel architectures come in many different flavors, ranging from multi-core CPUs to many-core general-purpose graphics processing units (GPGPUs) all the way up to large clusters of computers and the cloud. Unfortunately, whereas code in the past was accelerated automatically by upgrading to a better processor, these new parallel systems require advanced compiler directives, libraries, and in some cases entirely new languages to be used effectively, and expertise on each system is often required to fully utilize these devices. My research focuses on removing some of the difficulty of programming heterogeneous CPU-GPU nodes by transparently optimizing application execution at runtime. The first part of this thesis describes Sync-Free GPU (SF-GPU), a node-level runtime system that automatically and transparently improves application concurrency on GPUs by managing implicit and explicit synchronizations. The second part describes FPU, a functional programming API that generates highly optimized code for both the CPU and any number of GPUs and automatically performs load balancing between all compute devices at runtime, all with only a few lines of code from the user.