An image-text model for effective retrieval of seventeenth-century Spanish American notary records

Format

Thesis

Abstract

Historical manuscripts are rich in information but challenging to search because of their physical condition (e.g., poor image quality) and varied handwriting. Existing work on handwritten documents has predominantly relied on traditional optical character recognition (OCR) techniques, which struggle with varied and complex handwriting styles. In this thesis, we investigate the potential of an image-text model that captures the semantic similarity between image and text pairs for effective retrieval of documents from the seventeenth-century Spanish American notary records, which contain diverse writing styles. We propose a new technique based on OpenAI's Contrastive Language-Image Pretraining (CLIP) model, which uses contrastive learning to match images and text through high-dimensional embeddings. We fine-tuned CLIP on a specialized dataset of handwritten Spanish words, curated and annotated by paleography experts, to maximize the alignment between image and text embeddings and to better capture the nuances of historical handwriting. We employed the fine-tuned OCR model from KGSAR, a previous retrieval technique for the notary records, to detect word patches in the documents and extract the corresponding text. The extracted words were then converted into embeddings with the fine-tuned CLIP model and stored in a database. At retrieval time, the query keyword was transformed into an embedding with the same model, and the database of embeddings was queried with similarity search to find the top-k matching documents. We compared our technique with KGSAR, which builds a knowledge graph for retrieval. A paleography expert reviewed the documents returned by both techniques and marked each as relevant or irrelevant for every query keyword. Our technique outperformed KGSAR on the top-5 matching documents, achieving a mean average precision (mAP) of 0.80 compared to 0.26 for KGSAR. Our findings demonstrate that an image-text embedding model can enable effective retrieval of the Spanish American notary records.
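
For concreteness, the sketch below illustrates the embedding and retrieval steps described in the abstract, assuming a CLIP checkpoint fine-tuned on the handwritten Spanish word dataset is available locally. The checkpoint path "clip-notary-finetuned", the helper functions, and the patch-to-document mapping are illustrative assumptions, not artifacts of the thesis; the Hugging Face CLIP API is used here as a stand-in for whatever training stack the author employed.

```python
# Minimal sketch of CLIP-based retrieval over extracted word patches.
# Assumes a fine-tuned checkpoint at "clip-notary-finetuned" (placeholder path).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("clip-notary-finetuned")       # hypothetical checkpoint
processor = CLIPProcessor.from_pretrained("clip-notary-finetuned")
model.eval()

@torch.no_grad()
def embed_word_patches(image_paths):
    """Embed extracted word-patch images with the image encoder (L2-normalized)."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def embed_query(keyword):
    """Embed a query keyword with the text encoder (L2-normalized)."""
    inputs = processor(text=[keyword], return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def top_k_documents(query_embedding, patch_embeddings, patch_to_doc, k=5):
    """Rank documents by the best cosine similarity among their word patches."""
    sims = (patch_embeddings @ query_embedding.T).squeeze(-1)     # cosine, unit vectors
    doc_scores = {}
    for score, doc_id in zip(sims.tolist(), patch_to_doc):
        doc_scores[doc_id] = max(score, doc_scores.get(doc_id, float("-inf")))
    return sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

At larger scale, the brute-force matrix product above would typically be replaced by an approximate nearest-neighbor index; the abstract only states that a database of embeddings is queried with similarity search, so the choice of index is left open here.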

Degree

M.S.
