In-memory distributed indexing for large-scale media data retrieval
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Multimedia data spans several media types, such as text, images, and video. Recent research has shown that media data retrieval plays a critical role in the development of multimedia applications. However, with the exponential growth of multimedia data, fast and efficient indexing is becoming more difficult than ever. In this thesis, we propose an approach that speeds up retrieval by adopting a distributed computing paradigm through the Apache Spark framework. Building search trees on the Apache Spark ecosystem yields fast and cost-effective media database retrieval by caching index structures in memory and aggregating ranked results, while letting users specify the relative importance of each search cue. We conducted computational experiments on large-scale biomedical image and protein 3D structure datasets to demonstrate the effectiveness and scalability of our system, which maintains reasonably high accuracy.
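The retrieval idea described in the abstract (search each in-memory partition, then aggregate the ranked results under user-chosen cue weights) can be illustrated with a minimal, single-process Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the names `search_partition` and `search`, the cue names, the use of a weighted sum of Euclidean distances as the combined score, and a linear scan standing in for the per-partition search tree. In a Spark setting, the partitions would presumably be cached in memory and the per-partition step run in parallel.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical layout: a partition is a list of (item_id, {cue: feature_vector}).
Features = Dict[str, List[float]]
Partition = List[Tuple[str, Features]]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def search_partition(partition: Partition, query: Features,
                     weights: Dict[str, float], k: int) -> List[Tuple[float, str]]:
    """Return the k lowest-scoring (score, item_id) pairs in one partition.

    A linear scan is used here for brevity; a search tree would replace it.
    The score is an assumed weighted sum of per-cue distances.
    """
    scored = []
    for item_id, features in partition:
        score = sum(w * euclidean(features[cue], query[cue])
                    for cue, w in weights.items())
        scored.append((score, item_id))
    return heapq.nsmallest(k, scored)


def search(partitions: List[Partition], query: Features,
           weights: Dict[str, float], k: int) -> List[Tuple[float, str]]:
    """Merge per-partition top-k lists into one global top-k ranking."""
    partial = [search_partition(p, query, weights, k) for p in partitions]
    return heapq.nsmallest(k, (hit for hits in partial for hit in hits))
```

Because each partition returns only its own top k candidates, the final merge handles at most k results per partition, which keeps the aggregation step cheap regardless of how large the partitions are. Changing the weights changes the ranking without rebuilding any index.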
Access is limited to the campuses of the University of Missouri.