Q: Where is this data from?
A: This data is a processed form of the September 2019 release of the Global Network of Biomedical Relationships (GNBR) dataset.
Q: Where can I read more about the methods?
A: The full paper is Percha, Bethany, and Russ B. Altman. "A global network of biomedical relationships derived from text." Bioinformatics 34.15 (2018): 2614-2624.
Q: Where are the associations extracted from?
A: All associations are extracted from PubMed abstracts. At the moment, full-text articles aren't included in the dataset.
Q: How has the data been processed?
A: The GNBR data has been processed to enable viewing in this form. Each candidate association between two biomedical entities in a sentence is given a score for each applicable theme. We normalize the scores using percentile rank to make them comparable, and then keep the theme with the highest normalized score for each association. This percentile score is shown for each sentence in the lower table.
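The processing described above can be sketched roughly as follows. This is only an illustration, not the actual pipeline code: it assumes that percentile ranks are computed separately within each theme, and the function name `top_theme_by_percentile`, the theme labels `theme_a` and `theme_b`, and the scores are all made up for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def top_theme_by_percentile(associations):
    """For each association (a dict of theme -> raw score), return the
    (theme, percentile) pair with the highest percentile rank.

    Assumption: percentiles are computed within each theme, as the
    percentage of that theme's scores that are <= the given score.
    """
    # Gather and sort every raw score seen for each theme.
    scores_by_theme = defaultdict(list)
    for scores in associations:
        for theme, score in scores.items():
            scores_by_theme[theme].append(score)
    for values in scores_by_theme.values():
        values.sort()

    results = []
    for scores in associations:
        best = None  # stays None if the association has no themes
        for theme, score in scores.items():
            ordered = scores_by_theme[theme]
            percentile = 100.0 * bisect_right(ordered, score) / len(ordered)
            if best is None or percentile > best[1]:
                best = (theme, percentile)
        results.append(best)
    return results


# Hypothetical raw scores for three candidate associations.
example = [
    {"theme_a": 10.0, "theme_b": 1.0},
    {"theme_a": 2.0, "theme_b": 5.0},
    {"theme_a": 4.0},
]
results = top_theme_by_percentile(example)
```

In this toy input, the first association keeps `theme_a` and the second keeps `theme_b`, even though the raw scores of the two themes are on different scales. That is the point of normalizing before comparing.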
Q: Some of the entity names don't match what's in the text. Why is that?
A: We've normalized the entities back to the corresponding ontology and are using the canonical name for them. For example, mentions of the gene HER2 are normalized to the ERBB2 gene in NCBI genes. The ontologies used are NCBI genes, MeSH, ChEBI and OMIM.
Q: What does gene species mean?
A: Each gene extracted from text is linked to a specific species. GNormPlus, the tool that extracts gene names, uses species predictions on the text to choose the matching gene identifier. This means the (hopefully) correct identifier is given for a gene, so that EGFR in a human cancer paper is matched to EGFR (human, id=1956) instead of EGFR (mouse, id=13649). You can filter the gene associations so that they only contain genes from a specific species.
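As a rough illustration of what the species filter does, here is a minimal sketch. The record layout, the field names (`gene`, `gene_id`, `species`, `partner`) and the function `filter_by_species` are hypothetical and are not the viewer's actual data format; only the two EGFR identifiers come from the answer above.

```python
def filter_by_species(associations, species):
    """Keep only the associations whose gene belongs to the given species."""
    return [a for a in associations if a["species"] == species]


# Hypothetical records: the same gene symbol resolved to different
# identifiers depending on the species predicted for the text.
associations = [
    {"gene": "EGFR", "gene_id": 1956, "species": "human", "partner": "chemical_x"},
    {"gene": "EGFR", "gene_id": 13649, "species": "mouse", "partner": "chemical_x"},
]

human_only = filter_by_species(associations, "human")
```

Filtering on the species field rather than the gene symbol matters here, because both records share the symbol EGFR and differ only in species and identifier.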