Mazoni, Alysson Fernandes; Borges, Luís Fabiano Farias; Macedo, Estevao Fernandes; Tuesta, Esteban Fernandez
SciELO Preprints
Numerous initiatives are currently underway to disambiguate databases worldwide. In this paper, we propose a methodology for disambiguating research entities using big data techniques, adopting an approach that goes from local to global databases. Our objective is to enhance the quality of data in the OpenAlex database by leveraging information from Brazilian databases, particularly data from the Lattes Platform and the Brazilian Federal Agency for Support and Evaluation of Graduate Education. We compare similar names of authors and institutions, employing Digital Object Identifiers to link entities, along with an adaptation of the Levenshtein distance algorithm. The proposed method is straightforward to implement in tabular databases and facilitates disambiguation, thereby contributing to open science practices and providing an effective solution for research information systems. The findings indicate the potential for integrating local and global databases to address issues related to ambiguous names and incomplete metadata.
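The linking step described in the abstract can be illustrated with a minimal sketch, under stated assumptions. Records from a local source and a global source are first grouped by Digital Object Identifier, and author names sharing a DOI are then compared with a length-normalized Levenshtein similarity. The abstract does not specify the authors' adaptation of the Levenshtein distance, so the plain algorithm is used here. The record fields (`doi`, `name`, `id`), the function names, and the 0.8 threshold are hypothetical choices made for illustration only.

```python
import unicodedata
from collections import defaultdict


def normalize(name):
    """Strip accents, lowercase, and collapse whitespace in a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def levenshtein(a, b):
    """Plain Levenshtein edit distance (not the authors' adaptation)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1 minus the distance over the longer length."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def link_by_doi(local_records, global_records, threshold=0.8):
    """Pair local and global author records that share a DOI and whose
    names are similar enough. The threshold is an assumed value."""
    by_doi = defaultdict(list)
    for record in global_records:
        by_doi[record["doi"].lower()].append(record)

    links = []
    for local in local_records:
        for candidate in by_doi.get(local["doi"].lower(), []):
            score = similarity(local["name"], candidate["name"])
            if score >= threshold:
                links.append((local["id"], candidate["id"], score))
    return links


# Hypothetical records; the DOIs and identifiers are placeholders.
local_records = [{"doi": "10.1000/ABC", "name": "Luís Borges", "id": "L1"}]
global_records = [
    {"doi": "10.1000/abc", "name": "Luis Borges", "id": "G1"},
    {"doi": "10.1000/abc", "name": "Esteban Tuesta", "id": "G2"},
]
links = link_by_doi(local_records, global_records)
```

Restricting name comparison to records that share a DOI keeps the number of pairwise comparisons small, which is one reason such an approach suits tabular data; a production pipeline would likely need additional handling for abbreviated names and missing DOIs.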