Compression for Machine Vision and Beyond
Compression has been one of the most fundamental and elusive challenges in both academia and industry. With the rapid growth of high-definition video content on the internet, developing improved compression algorithms has become an urgent necessity. This thesis tackles the problem of visual content compression: how to reduce the volume of transmitted data under specific application scenarios. A core step is removing redundancy to obtain a compact latent representation. We approach the problem from two directions: prediction and transform. A typical prediction process removes the statistical redundancy between the reference and current image blocks, and the transform then removes the inter-pixel redundancy among the residual samples. We address compression for machine vision and related topics. In compression for machine vision, machines communicate among themselves to perform tasks without a human in the loop, which requires a separate pipeline to achieve optimal coding performance. We investigate how to transmit image features efficiently in low-latency scenarios, and focus on developing a multiple-transform solution that yields a more compact data representation for the image retrieval task. A multiple-transform solution is shown to be more effective than a single transform at preserving distinguishable properties in a large-scale dataset. However, an oversized list of transform candidates strains the bit-rate budget. We therefore develop a merge scheme that searches for the optimal transforms among the available candidates. We also present our contributions to the development of the next-generation video coding standard, Versatile Video Coding (VVC), and our exploration of improved intra prediction schemes beyond the High Efficiency Video Coding (HEVC) standard. 1) Based on observations of the properties of DST-7 and DCT-8, a dual-implementation support solution is developed to reduce software run-time complexity.
The (anti-)symmetric properties of these transforms are leveraged to reduce the number of arithmetic operations needed to derive the transform coefficients from the residual block. The scheme was adopted by the MPEG VVC standardization group and integrated into the VVC reference software. 2) For prediction, we explore both traditional and convolutional neural network (CNN)-based schemes. Multiple linear regression is used to further exploit the spatial correlation between the reference pixels and the existing intra prediction. A CNN-based scheme is developed that combines local and non-local information for intra prediction. We demonstrate the effectiveness of these approaches.
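The division of labour between prediction and transform can be illustrated with a toy example. The sketch assumes a simplified horizontal intra mode (each row copies its left reference pixel) and an orthonormal floating-point DCT-II; the function names are hypothetical and this is not the thesis's codec. Prediction removes most of the sample values, and the transform then packs the remaining residual energy into a few coefficients.

```python
import numpy as np


def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th basis function."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses the smaller normalization
    return c


def horizontal_prediction(left_ref: np.ndarray, width: int) -> np.ndarray:
    """Simplified horizontal intra mode: copy each left reference pixel across its row."""
    return np.repeat(left_ref.reshape(-1, 1), width, axis=1)


def predict_and_transform(block: np.ndarray, left_ref: np.ndarray):
    """Return (prediction, residual, coefficients) for a square block."""
    n = block.shape[0]
    assert block.shape == (n, n), "sketch assumes a square block"
    pred = horizontal_prediction(left_ref, n)
    residual = block - pred          # prediction: remove redundancy w.r.t. the reference
    c = dct2_matrix(n)
    coeffs = c @ residual @ c.T      # separable 2-D transform: decorrelate the residual
    return pred, residual, coeffs


# Toy 4x4 block: each row equals its left reference pixel plus a shared horizontal ramp.
left_ref = np.array([100.0, 110.0, 120.0, 130.0])
block = left_ref.reshape(-1, 1) + 2.0 * np.arange(1, 5).reshape(1, -1)
pred, residual, coeffs = predict_and_transform(block, left_ref)
print(np.round(coeffs, 2))
```

Here every residual row is `[2, 4, 6, 8]`, so the residual energy is 480. Because the transform is orthonormal that energy is preserved, but it lands almost entirely in `coeffs[0, 0]` (20) and `coeffs[0, 1]` (about -8.92), while rows 1 to 3 are zero up to floating-point error.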
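The abstract does not specify how the merge scheme selects transforms. The following is a hypothetical illustration of the general idea only. It assumes each candidate is an orthonormal matrix, measures compactness as the energy captured by the first `k` coefficients, lets each feature use whichever kept transform compacts it best, and greedily drops candidates until a size budget is met. The names `greedy_merge` and `index_bits` are invented for this sketch.

```python
import math
import numpy as np


def compacted_energy(transform: np.ndarray, x: np.ndarray, k: int) -> float:
    """Energy captured by the first k coefficients of transform @ x."""
    y = transform @ x
    return float(np.sum(y[:k] ** 2))


def gain_table(candidates, features, k) -> np.ndarray:
    """gains[i, j]: energy kept when feature j is coded with candidate transform i."""
    return np.array([[compacted_energy(t, f, k) for f in features] for t in candidates])


def retained_energy(gains: np.ndarray, subset) -> float:
    """Total energy kept when every feature picks its best transform from `subset`."""
    return float(gains[subset].max(axis=0).sum())


def greedy_merge(candidates, features, k: int, budget: int) -> list:
    """Hypothetical merge: repeatedly drop the candidate whose removal costs the least energy."""
    gains = gain_table(candidates, features, k)
    kept = list(range(len(candidates)))
    while len(kept) > budget:
        drop = max(kept, key=lambda i: retained_energy(gains, [j for j in kept if j != i]))
        kept.remove(drop)
    return kept


def index_bits(num_transforms: int) -> int:
    """Fixed-length side information per feature to signal the chosen transform."""
    return math.ceil(math.log2(num_transforms)) if num_transforms > 1 else 0


eye = np.eye(4)
swap = eye[[3, 1, 2, 0]]               # permutation that moves index 3 to the front
candidates = [eye, eye.copy(), swap]   # the second identity is redundant
features = [eye[0], eye[3]]
kept = greedy_merge(candidates, features, k=1, budget=2)
print(kept, index_bits(len(kept)))  # -> [1, 2] 1
```

The duplicate identity is merged away, and signalling the remaining two transforms costs one bit per feature. This reflects the trade-off the abstract describes: more candidates can compact features better, but each one adds index overhead.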
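The abstract credits the dual-implementation scheme to (anti-)symmetric properties of DST-7 and DCT-8 but gives no details. One well-known relation of this kind is that each DCT-8 basis function is a reversed DST-7 basis function with alternating sign, $C_8[k,i] = (-1)^k S_7[k, N-1-i]$. As a result, a single DST-7 kernel can compute both transforms. The floating-point sketch below only illustrates that identity. It is not the implementation adopted into the VVC reference software, and VVC itself specifies integer transform matrices, which this sketch does not model.

```python
import numpy as np


def dst7_matrix(n: int) -> np.ndarray:
    """Orthonormal DST-VII basis (row k is the k-th basis function)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    return np.sqrt(4.0 / (2 * n + 1)) * np.sin(np.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))


def dct8_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-VIII basis (row k is the k-th basis function)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    return np.sqrt(4.0 / (2 * n + 1)) * np.cos(np.pi * (2 * k + 1) * (2 * i + 1) / (4 * n + 2))


def dct8_via_dst7(x: np.ndarray) -> np.ndarray:
    """Forward 1-D DCT-VIII computed with the DST-VII kernel.

    Reverse the input, apply DST-VII, then negate every odd-indexed coefficient.
    """
    n = x.shape[0]
    y = dst7_matrix(n) @ x[::-1]
    y[1::2] = -y[1::2]
    return y


x = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0])
print(np.allclose(dct8_via_dst7(x), dct8_matrix(8) @ x))  # -> True
```

For a 2-D residual block the same trick applies separably along rows and columns. Only the DST-7 matrix then needs to be stored and multiplied, plus cheap index reversal and sign flips.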
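The abstract states that multiple linear regression exploits the correlation between the reference pixels and the existing intra prediction, but it does not list the regressors. This sketch is hypothetical. It assumes each pixel's refined value is a weighted sum of four terms: its existing intra prediction, the top reference pixel in its column, the left reference pixel in its row, and a bias. The weights are fitted by ordinary least squares with `numpy.linalg.lstsq`. The training data are synthetic and follow a known linear model, so the fit can be checked.

```python
import numpy as np


def design_matrix(pred, top, left) -> np.ndarray:
    """One row per pixel (row-major): [intra prediction, top[c], left[r], 1] (hypothetical regressors)."""
    h, w = pred.shape
    return np.column_stack([
        pred.ravel(),
        np.tile(top, h),      # top reference of each pixel's column
        np.repeat(left, w),   # left reference of each pixel's row
        np.ones(h * w),       # bias term
    ])


def fit_mlr(samples) -> np.ndarray:
    """Least-squares weights from (pred, top, left, original) training blocks."""
    a = np.vstack([design_matrix(pr, tp, lf) for pr, tp, lf, _ in samples])
    b = np.concatenate([og.ravel() for _, _, _, og in samples])
    weights, *_ = np.linalg.lstsq(a, b, rcond=None)
    return weights


def refine_prediction(weights, pred, top, left) -> np.ndarray:
    """Apply the learned weights to produce a refined prediction block."""
    return (design_matrix(pred, top, left) @ weights).reshape(pred.shape)


rng = np.random.default_rng(0)
true_w = np.array([0.5, 0.25, 0.25, 0.0])   # synthetic ground-truth model


def make_block(n: int = 4):
    """Random synthetic block whose original pixels follow true_w exactly."""
    pred = rng.uniform(0, 255, (n, n))
    top = rng.uniform(0, 255, n)
    left = rng.uniform(0, 255, n)
    orig = (design_matrix(pred, top, left) @ true_w).reshape(n, n)
    return pred, top, left, orig


samples = [make_block() for _ in range(10)]
w = fit_mlr(samples)
print(np.round(w, 3))
```

With real video content, the weights would be trained on actual prediction and original blocks, and the fit would no longer be exact. The regression simply learns how much to trust the existing prediction relative to the neighbouring reference samples.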
Table of Contents
Introduction
Mobile visual search compression
Fast transform for VVC
Improved intra prediction beyond HEVC
Conclusion
Ph.D. (Doctor of Philosophy)