From Pixels to Metadata: Reviving the Music Index with AI-Powered Archival Structuring

Conference:

ARCHIVES*RECORDS 2025

Session Type:

Pop-Up Session 

Session Chair:

Vijay Singh  

Abstract:

How do we recover structured, searchable metadata from pages of mid-century typewritten music bibliographies? In this session, we present a collaboration between Doxie.AI and EBSCO to digitally restore and enrich "The Music Index," an iconic reference for music scholarship spanning decades. Using AI-assisted data processing and custom visual parsing models, our team is transforming scanned images of densely printed bibliographic entries into structured, queryable metadata in a modern format.

The project combines intelligent OCR preprocessing, machine-learning-based layout analysis, and robust post-processing pipelines to extract fields such as subject, title, author, source, volume, date, and pagination. In this Pop-Up session, we will share how we built a custom AI system to parse complex, inconsistently formatted entries; how we collaborated with domain experts to define field logic; and how we tuned quality control processes to ensure accuracy and transparency in the digitized record.
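The post-processing step described above can be sketched in miniature. The snippet below is an illustrative sketch only: the entry format, the sample line, and the regular expression are assumptions for demonstration, not the project's production parser, which must handle far more inconsistent layouts.

```python
import re

# Hypothetical Music Index-style entry line (invented for illustration):
#   "SMITH, Jane. A study of jazz harmony. Downbeat v42 Mar 1975 p12-15"
# The pattern splits one cleanly OCR'd line into the fields named in the
# abstract: author, title, source, volume, date, and pagination.
ENTRY_PATTERN = re.compile(
    r"(?P<author>[A-Z][A-Za-z'-]+, [A-Z][a-z]+)\.\s+"  # "SMITH, Jane."
    r"(?P<title>.+?)\.\s+"                             # article title
    r"(?P<source>.+?)\s+"                              # journal name
    r"v(?P<volume>\d+)\s+"                             # volume number
    r"(?P<date>[A-Z][a-z]{2} \d{4})\s+"                # e.g. "Mar 1975"
    r"p(?P<pagination>[\d-]+)"                         # page range
)

def parse_entry(line: str):
    """Return a dict of named fields for one entry line, or None on failure."""
    m = ENTRY_PATTERN.match(line.strip())
    return m.groupdict() if m else None
```

In practice, a rule-based pass like this serves as one stage among several: entries that fail the pattern fall through to model-based parsing or human quality-control review.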

We will demo how source images, such as those containing brief, densely packed bibliographic lines, are processed into structured records suitable for integration into EBSCO databases and archival metadata systems.
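A structured record suitable for such integration might look like the sketch below. The field names mirror those listed earlier in this abstract; the schema itself is a hypothetical illustration, not EBSCO's actual ingest format.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative target schema for one parsed bibliographic entry.
# Field names follow the abstract; the layout is an assumption.
@dataclass
class IndexRecord:
    subject: str
    title: str
    author: str
    source: str
    volume: str
    date: str
    pagination: str

def to_json(record: IndexRecord) -> str:
    """Serialize one parsed entry as a JSON document for ingestion."""
    return json.dumps(asdict(record), ensure_ascii=False)
```

Emitting each entry as a self-describing JSON document keeps the output queryable and easy to load into a modern metadata system.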

Short Description:

We highlight the recovery and stewardship of cultural records that include diverse musical voices and contributors across time, geography, and identity. The Music Index spans decades of global music literature, offering a rich archive of underrepresented artists, genres, and researchers. Our work ensures these entries are not lost to obscurity but made accessible to scholars worldwide. Our team represents a blend of technologists, archivists, and institutions collaborating across geographies.

Pop-Up Format:

Lightning Talks