
dc.contributor.advisor: Becchi, Michela
dc.contributor.author: Li, Da (Scientist)
dc.date.issued: 2016
dc.date.submitted: 2016 Summer
dc.description: Supervisor: Dr. Michela Becchi.
dc.description: Includes vita.
dc.description.abstract: Over the last decade, many-core Graphics Processing Units (GPUs) have been widely used to accelerate a variety of applications. Meanwhile, Intel has released its Xeon Phi coprocessor, which is equipped with more than fifty x86 cores, each supporting four hardware threads. Despite their widespread use, many-core processors are still considered relatively difficult to program, in that they require the programmer to be familiar with both parallel programming and the hardware features of these devices. Due to their massive computational power, many-core processors have been successfully used to parallelize a wide variety of applications based on dense matrices and vectors. These extensively investigated problems come mainly from linear algebra, stencil computations, and image processing. However, many established and emerging problems have not yet been fully explored. Some of these applications use irregular algorithms and operations (e.g., dynamic programming), while others are based on irregular data structures, such as graphs. It has been shown that these emerging applications do exhibit a certain degree of static and runtime parallelism, but they are relatively hard to parallelize. My research focuses on addressing important issues related to the deployment of emerging applications on many-core processors. In particular, we proposed efficient GPU implementations for large-scale pairwise sequence alignment and integrated the proposed GPU kernels into a hybrid MPI-CUDA framework for CPU-GPU clusters. We also targeted graph- and tree-based applications and proposed: (1) unifying programming interfaces for many-core processors, (2) runtime support for efficient execution on irregular datasets, and (3) compiler support for efficient mapping of applications onto hardware. Finally, we conducted a comprehensive study of performance, memory footprint, and power consumption on various platforms, and extended existing CPU-only or GPU-only convolutional neural network (CNN) learning methods to CPU-GPU cooperative approaches. We also implemented a virtual memory mechanism and integrated it into Caffe to facilitate training large CNN models with limited GPU memory.
dc.description.bibref: Includes bibliographical references (pages 176-190).
dc.format.extent: 1 online resource (xiii, 191 pages) : illustrations
dc.identifier.merlin: b118866138
dc.identifier.oclc: 990269130
dc.identifier.uri: https://hdl.handle.net/10355/57236
dc.identifier.uri: https://doi.org/10.32469/10355/57236
dc.language: English
dc.publisher: University of Missouri--Columbia
dc.relation.ispartofcommunity: University of Missouri--Columbia. Graduate School. Theses and Dissertations
dc.rights: OpenAccess.
dc.rights.license: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.
dc.subject.FAST: Graphics processing units
dc.subject.FAST: Application software
dc.subject.FAST: Neural networks (Computer science)
dc.title: Facilitating emerging applications on many-core processors
dc.type: Thesis
thesis.degree.discipline: Electrical and computer engineering (MU)
thesis.degree.grantor: University of Missouri--Columbia
thesis.degree.level: Doctoral
thesis.degree.name: Ph. D.

