About Deduplication in Interfolio Web Profiles
Generally speaking, we expect that scholars only want records to show once on the web profile (on the user page, the unit page, and in search results). However, there are many reasons why duplicates could occur including, but not limited to:
- User enters the record twice in FAR.
- User collaborates with coauthors at the same institution.
Our technology runs several jobs to prevent duplicate records from showing on the web profile. This article explains our deduplication rules so that you can understand why a record was deduplicated (and therefore does not show) or why it was not deduplicated (and therefore shows, but perhaps “looks” like a duplicate).
Deduplication Rules: Identifying and Removing Same Content
- Deduplication checks occur within a scholarship type (e.g., we compare conference proceedings to conference proceedings, not conference proceedings to journal articles). Scholars often present their work before publication, so a presentation and a publication with the same title may appear to be duplicates, but they are actually separate activities.
- DOI Check: Records with the same Digital Object Identifier (DOI) are deduplicated if there is also an 80% match on Title.
- When records do not match on DOI, we compare several fields. All of the following must be true for a record to be identified as a duplicate. Publication B must match publication A on:
  - Title and subtitle (80% similarity)
  - Number of pages
  - Persons
  - Publication date (within a 1-year difference)
  - Pages
  - ISBN
  - Host publication title (e.g., journal title)
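The rules above can be read as a single decision function. The following is a minimal sketch only, not the actual web profile implementation: the `Record` field names are hypothetical, `difflib.SequenceMatcher` stands in for the unspecified 80% similarity measure, "within a 1-year difference" is assumed to mean at most 365 days apart, and two blank values are assumed to count as a match.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional

TITLE_THRESHOLD = 0.80  # the 80% title similarity named in the rules


@dataclass
class Record:
    # Hypothetical field names; the real FAR / web profile schema is not shown here.
    scholarship_type: str
    title: str  # title and subtitle combined
    doi: Optional[str] = None
    persons: frozenset = frozenset()
    publication_date: Optional[date] = None
    number_of_pages: Optional[int] = None
    pages: Optional[str] = None
    isbn: Optional[str] = None
    host_title: Optional[str] = None


def title_similarity(a: str, b: str) -> float:
    # Assumption: difflib's ratio stands in for the unspecified similarity measure.
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def within_one_year(a: Optional[date], b: Optional[date]) -> bool:
    # Assumption: at most 365 days apart; two blank dates count as a match.
    if a is None or b is None:
        return a is b
    return abs((a - b).days) <= 365


def is_duplicate(a: Record, b: Record) -> bool:
    # Records are only compared within the same scholarship type.
    if a.scholarship_type != b.scholarship_type:
        return False
    # Both the DOI path and the field path require the title match.
    if title_similarity(a.title, b.title) < TITLE_THRESHOLD:
        return False
    # DOI check: the same DOI plus the title match is sufficient.
    if a.doi and b.doi and a.doi.lower() == b.doi.lower():
        return True
    # No DOI match: every remaining field must match.
    return (
        a.number_of_pages == b.number_of_pages
        and a.persons == b.persons
        and within_one_year(a.publication_date, b.publication_date)
        and a.pages == b.pages
        and a.isbn == b.isbn
        and a.host_title == b.host_title
    )
```

Under these assumptions, a conference proceeding and a journal article with identical titles and DOIs are never flagged, because the scholarship type check runs first.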
Merge Different Variants of the Same Scholarship
The web profile takes the first received version of the record and fills in blank fields with data from coauthors' versions when available. In the case of institutional coauthorship (the second reason listed at the top of this article), when scholars use FAR to identify coauthors who are faculty members at their current institution, every coauthor who is a FAR user has their own version of that scholarship activity in FAR. This could result in at least three problems:
- Inconsistency in representation of the same scholarship activity across coauthor profile pages.
- Unit and institutional activity counts being artificially inflated by counting all the versions as unique scholarship activities.
- Search results being less useful given they will also show duplicates.
We aim to prevent these problems by merging the records. Our merging process includes the following steps:
- The first record received by the web profile from FAR is identified as the basis for what is going to show on the web profile.
- Other scholars' entries of that scholarship activity in FAR are allowed to fill in blank fields.
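The two merge steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the actual merge job: records are represented as plain dictionaries, the list is assumed to be ordered by when the web profile received each version, and `merge_versions` and `is_blank` are hypothetical names. A field counts as blank here when it is missing, `None`, or an empty or whitespace-only string.

```python
from typing import Any, Dict, List


def is_blank(value: Any) -> bool:
    # Assumption: None and empty/whitespace-only strings are treated as blank.
    return value is None or (isinstance(value, str) and not value.strip())


def merge_versions(versions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge coauthor versions of one activity, ordered by time received."""
    if not versions:
        raise ValueError("need at least one version")
    # Step 1: the first record received is the basis for what is shown.
    merged = dict(versions[0])
    # Step 2: later versions may only fill in fields that are still blank.
    for other in versions[1:]:
        for field, value in other.items():
            if is_blank(merged.get(field)) and not is_blank(value):
                merged[field] = value
    return merged


first = {"title": "A Study", "abstract": "", "volume": None}
second = {"title": "A study (draft)", "abstract": "We examine...", "volume": "12"}
merged = merge_versions([first, second])
# The title stays as first received; abstract and volume are filled in.
```

In this sketch a later version never overwrites a non-blank value, so the first received title is kept even when a coauthor entered it differently.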
Enrichment: Fetch from outside of FAR to add metadata to the record
After deduplication and merge processes, we attempt to match all records sent to the web profile to records in Scopus to enrich them with key metadata.
For more information about our enrichment processes and metrics, please check out these help articles: