Multi-scale deep-learning approaches for visual coding and processing
Date
2024
Abstract
Visual information can be captured in both 2D and 3D formats. For example, an image is a 2D representation, while a point cloud is a 3D representation. A sequence of images captured to represent a certain duration of an event is called a video; similarly, a sequence of point clouds is called a dynamic point cloud. With the advancement of sensor technology, it is now possible to capture extremely high-resolution visual information in both the spatial and temporal dimensions. The resulting raw visual data are enormous in size. Therefore, an efficient compression technology, which enables efficient transmission and storage of large amounts of visual data, is crucial for powering various 2D and 3D video applications such as streaming, conferencing, surveillance, and augmented reality (AR)/virtual reality (VR).
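To make the scale of the raw data concrete, a back-of-the-envelope calculation for uncompressed 4K video (illustrative parameters only, not figures drawn from the thesis) shows why compression is indispensable:

```python
# Back-of-the-envelope data rate for uncompressed video
# (illustrative parameters, not taken from the thesis).
width, height = 3840, 2160   # 4K UHD resolution
bytes_per_pixel = 3          # 8-bit RGB, no chroma subsampling
fps = 60                     # frames per second

bits_per_second = width * height * bytes_per_pixel * 8 * fps
print(f"Raw bitrate: {bits_per_second / 1e9:.1f} Gbit/s")  # ~11.9 Gbit/s
```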
Over the years, various compression standards for video and point clouds have been developed to meet the quality-of-experience (QoE) demand. These compression technologies essentially divide a video frame or point cloud into blocks and then perform transform, prediction, and quantization operations on the color and attribute values, respectively. This inherently introduces compression artifacts such as blocking, ringing, and blurring into the color and attributes of the reconstructed frames, compromising the QoE. Similarly, in applications such as 3D video streaming, a point cloud is often down-sampled to satisfy transmission-bandwidth limitations, resulting in a loss of frame quality. The receiver then needs to perform geometry and, optionally, attribute upsampling to improve the QoE.
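A minimal sketch of this block-based transform-and-quantize pipeline (toy 8x8 DCT blocks with a hypothetical scalar quantizer `step`; not the actual procedure of any particular standard) illustrates how independent per-block quantization gives rise to blocking artifacts:

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64)).astype(np.float64)  # toy luma "frame"
step = 40.0                                                     # coarse quantizer step (hypothetical)

recon = np.empty_like(frame)
for i in range(0, 64, 8):                # process the frame in 8x8 blocks
    for j in range(0, 64, 8):
        coeffs = dctn(frame[i:i+8, j:j+8], norm="ortho")  # block transform
        coeffs = np.round(coeffs / step) * step           # scalar quantization
        recon[i:i+8, j:j+8] = idctn(coeffs, norm="ortho")

# Each block is quantized independently, so reconstruction errors are
# discontinuous at block boundaries; these are the blocking artifacts
# that in-loop filters aim to suppress.
print("mean abs error:", np.abs(frame - recon).mean())
```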
In video compression standards, in-loop filters have been developed to alleviate compression artifacts. These in-loop filters are hand-crafted and often sub-optimal in performance; moreover, no such in-loop filters are specified in point cloud compression standards. Similarly, various optimization- and learning-based geometry upsampling technologies have been proposed for point clouds, but few effective upsampling solutions exist for attributes. In this thesis, various deep learning methods are studied to reduce compression artifacts in video and point clouds, by proposing an in-loop filter for Versatile Video Coding (VVC/H.266) and a post-processing filter for Geometry-based Point Cloud Compression (G-PCC). Additionally, we propose a deep learning-based solution that effectively performs color upsampling for point clouds. Furthermore, we explore multi-scale deep learning architectures to develop solutions to the aforementioned challenges.
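As a rough illustration of the multi-scale idea (a generic restoration sketch; the class name `MultiScaleFilter` and all layer names and sizes are assumptions, not the thesis's actual networks), a network can extract features from a decoded frame at several resolutions and fuse them to predict a corrective residual:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFilter(nn.Module):
    """Toy multi-scale restoration network: extract features at full,
    1/2, and 1/4 resolution, upsample, fuse, and predict a residual
    that is added back to the degraded input."""

    def __init__(self, channels=32):
        super().__init__()
        self.encode = nn.Conv2d(3, channels, 3, padding=1)
        self.branch = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3)]
        )
        self.fuse = nn.Conv2d(3 * channels, 3, 3, padding=1)

    def forward(self, x):
        feat = F.relu(self.encode(x))
        outs = []
        for scale, conv in zip([1, 2, 4], self.branch):
            f = F.avg_pool2d(feat, scale) if scale > 1 else feat
            f = F.relu(conv(f))
            if scale > 1:  # bring coarse features back to full resolution
                f = F.interpolate(f, size=feat.shape[-2:], mode="bilinear",
                                  align_corners=False)
            outs.append(f)
        residual = self.fuse(torch.cat(outs, dim=1))
        return x + residual  # residual learning: refine the decoded frame

decoded = torch.rand(1, 3, 64, 64)    # stand-in for a decoded frame
restored = MultiScaleFilter()(decoded)
print(restored.shape)                 # torch.Size([1, 3, 64, 64])
```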
Table of Contents
Introduction -- Learned in-loop filter for VVC -- Point cloud attribute compression artifacts removal -- Point cloud color upsampling
Degree
Ph.D. (Doctor of Philosophy)