Visual place recognition in aerial imagery

Format

Thesis

Abstract

[EMBARGOED UNTIL 12/01/2026] Visual localization within wide-area urban settings presents unique challenges due to building occlusions, substantial perspective variations, and large operational scales. A fundamental difficulty arises from the drastic scale discrepancy between the comprehensive aerial imagery used for database construction and off-center, oblique bird's-eye-view drone imagery. While current Visual Place Recognition methods match whole images using aggregated global descriptors, we address the distinct challenge of localizing specific buildings within large-scale aerial imagery using patch-level feature matching. In this thesis, we introduce the Landmark Matching Network (LMNet) family for city-scale aerial image localization. LMNet employs a Siamese architecture with Multi-Patch matching to handle off-center landmarks and occlusions. LMNet++ incorporates multi-head attention mechanisms for improved computational efficiency. WS-LMNet extends this into a fully convolutional architecture for direct landmark detection in high-resolution Wide-Area Motion Imagery (WAMI). Building on these CNN-based methods, we introduce LMDNet, which leverages DINOv3 Vision Transformer features with a patch-level similarity algorithm. LMDNet-C extends this with hierarchical representations that merge semantic features with attention-weighted discriminative patches. LMDNet-VR implements a coarse-to-fine retrieval pipeline with SIFT-based geometric verification. Extensive experiments across four cities (Albuquerque, Berkeley, Los Angeles, Syracuse) using 10,000 query images demonstrate robust performance. For detection, WS-LMNet achieves 76.2% Top@1 accuracy in localizing buildings across city orbits. For view retrieval, LMDNet-VR reaches 83% Top@1 in identifying exact matching frames. The proposed methods offer practical benefits, including ninefold storage reduction and real-time operation.
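
As a rough illustration of the general idea of patch-level matching inside a coarse-to-fine retrieval loop, the sketch below ranks database images first by a pooled global descriptor and then re-ranks a shortlist by patch-to-patch cosine similarity. It is not the thesis's implementation: the function names (`patch_similarity`, `coarse_to_fine`), the mean-pooled global descriptor, the best-match scoring rule, and the random arrays standing in for Vision Transformer patch features are all assumptions made for this example, and the SIFT-based geometric verification stage is omitted.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def patch_similarity(query_patches, ref_patches):
    """Hypothetical score: mean over query patches of the best cosine match
    among the reference patches. Inputs have shape (num_patches, dim)."""
    q = l2_normalize(query_patches)
    r = l2_normalize(ref_patches)
    sim = q @ r.T  # (num_query_patches, num_ref_patches) cosine similarities
    return float(sim.max(axis=1).mean())


def coarse_to_fine(query_patches, database, shortlist_size=3):
    """Coarse stage: shortlist by cosine similarity of mean-pooled descriptors.
    Fine stage: re-rank the shortlist with the patch-level score.
    Returns database indices, best match first."""
    q_global = l2_normalize(query_patches.mean(axis=0))
    db_globals = np.stack([l2_normalize(p.mean(axis=0)) for p in database])
    coarse_scores = db_globals @ q_global
    shortlist = np.argsort(-coarse_scores)[:shortlist_size]
    fine_scores = np.array(
        [patch_similarity(query_patches, database[i]) for i in shortlist]
    )
    order = np.argsort(-fine_scores)
    return [int(shortlist[i]) for i in order]


# Toy data (assumption): random vectors stand in for real patch features.
rng = np.random.default_rng(0)
database = [rng.normal(size=(16, 64)) for _ in range(5)]
# The query sees only half of image 2's patches, with slight noise,
# loosely mimicking a partially occluded, off-center landmark.
query = database[2][:8] + 0.01 * rng.normal(size=(8, 64))
ranking = coarse_to_fine(query, database, shortlist_size=3)
print(ranking)
```

Scoring each query patch by its best match, rather than comparing whole-image descriptors alone, lets a partially visible landmark still score highly against its reference image. That is the intuition behind the patch-level matching the abstract describes, although the thesis's actual scoring rule may differ.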

Degree

Ph. D.
