Introduction

Pharos is the user interface to the Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) program, funded by the National Institutes of Health (NIH) Common Fund (Grant No. 1U24CA224370-01). The goal of the KMC is to develop a comprehensive, integrated knowledge base for the Druggable Genome (DG) that illuminates its uncharacterized and poorly annotated portions, focusing on three of the most commonly drug-targeted protein families:

  • G-protein-coupled receptors (GPCRs)
  • Ion channels (ICs)
  • Kinases

The Pharos interface provides easy access to most data types collected by the KMC. Because the data surrounding any target is complex, efficient and intuitive visualization has been a high priority, so that users can quickly navigate and summarize search results and identify patterns. A critical feature of the interface is flexible search followed by drill-down into the search results. Underlying the interface is a GraphQL API that provides programmatic access to all KMC data for use in user applications.
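
As a rough illustration of programmatic access, the sketch below builds a standard GraphQL-over-HTTP POST request using only the Python standard library. The endpoint URL, the query shape, and the field names (`target`, `sym`, `name`, `tdl`, `fam`) are assumptions made for this example and are not taken from this page; check them against the Pharos API documentation before relying on them.

```python
import json
import urllib.request

# ASSUMPTION: the endpoint URL and the query's field names are illustrative
# placeholders; verify both against the Pharos API documentation.
GRAPHQL_ENDPOINT = "https://pharos-api.ncats.io/graphql"

TARGET_QUERY = """
query TargetSummary($sym: String) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
  }
}
"""


def build_graphql_request(gene_symbol, endpoint=GRAPHQL_ENDPOINT):
    """Build (but do not send) a JSON POST request carrying a GraphQL query."""
    payload = {"query": TARGET_QUERY, "variables": {"sym": gene_symbol}}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request("DRD2")
```

Passing `request` to `urllib.request.urlopen` and decoding the response body with `json.load` would send the query; the request is only constructed here so that the example runs without network access.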

Our Collaborators

Pharos is developed at NCATS, together with collaborators from the University of New Mexico, the Icahn School of Medicine at Mount Sinai, EMBL-EBI, the Novo Nordisk Foundation Center for Protein Research (U. Copenhagen), and the University of Miami.

The DRGC network is the experimental side of IDG. DRGC research focuses on illuminating the druggable genome through a two-pronged approach: empirical screening of drugs, followed by computational screening against modeled structures of the GPCR to produce optimized lead compounds. This work provides high-value data and knowledge for the KMC. The KMC, in turn, is tasked with providing guidance on research priorities based on knowledge gaps and druggability likelihood analyses.

To find more details on our collaborators, visit the IDG Consortium website.

How to Link to Pharos

Link to target details pages with a UniProt ID or gene symbol

https://pharos.nih.gov/targets/uniprot_id
https://pharos.nih.gov/targets/gene_symbol

Link to a list of targets with a comma-separated list of UniProt IDs or gene symbols

https://pharos.nih.gov/targets?collection=gene_symbol1,uniprot_id,gene_symbol2
https://pharos.nih.gov/analyze/targets?collection=gene_symbol1,uniprot_id,gene_symbol2
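
The two list-link patterns above can be generated from a list of identifiers. The following is a minimal sketch; the function name `target_list_url` and the example identifiers are hypothetical, and percent-encoding each identifier is a defensive assumption rather than a documented Pharos requirement.

```python
from urllib.parse import quote

PHAROS_BASE = "https://pharos.nih.gov"


def target_list_url(identifiers, analyze=False):
    """Build a target-list link from gene symbols and/or UniProt IDs.

    With analyze=True the link uses the /analyze/targets path instead of /targets.
    """
    if not identifiers:
        raise ValueError("at least one identifier is required")
    path = "/analyze/targets" if analyze else "/targets"
    # Encode each identifier separately so the separating commas stay literal.
    collection = ",".join(quote(item, safe="") for item in identifiers)
    return f"{PHAROS_BASE}{path}?collection={collection}"


list_link = target_list_url(["DRD2", "P28223", "HTR1A"])
analyze_link = target_list_url(["DRD2", "P28223", "HTR1A"], analyze=True)
```

Gene symbols and UniProt IDs can be mixed freely in one list, as in the URL patterns above.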

Link to disease details pages with disease name or MONDO ID

https://pharos.nih.gov/diseases/disease_name
https://pharos.nih.gov/diseases/mondo_id

Link to ligand details pages with drug name or ChEMBL ID

https://pharos.nih.gov/ligands/drug_name
https://pharos.nih.gov/ligands/chembl_ID
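
The details-page links all share one shape: a section name followed by an identifier. A small helper, sketched below, covers the disease and ligand patterns. The helper name `details_url` and the example identifiers are hypothetical; percent-encoding spaces in names while leaving the colon in a MONDO ID untouched is an assumption about what the site accepts.

```python
from urllib.parse import quote

PHAROS_BASE = "https://pharos.nih.gov"
SECTIONS = {"targets", "diseases", "ligands"}


def details_url(section, identifier):
    """Build a details-page link such as /diseases/<name> or /ligands/<id>."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    # Keep ':' literal so IDs like MONDO:0005180 stay readable; encode the rest.
    return f"{PHAROS_BASE}/{section}/{quote(identifier, safe=':')}"


disease_by_name = details_url("diseases", "Parkinson disease")
disease_by_id = details_url("diseases", "MONDO:0005180")
ligand_by_id = details_url("ligands", "CHEMBL25")
```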

Render structures with a URL-encoded SMILES string

https://ncatsidg.appspot.com/render?standardize=true&size=150&structure=SMILES
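
SMILES strings often contain characters such as `=`, `#`, `+`, and parentheses that have special meaning in a URL, so the string must be percent-encoded before it is placed in the `structure` parameter. The sketch below does this with the Python standard library; the function name `render_url` is hypothetical, and only the three query parameters shown in the URL above are used.

```python
from urllib.parse import urlencode

RENDER_BASE = "https://ncatsidg.appspot.com/render"


def render_url(smiles, size=150, standardize=True):
    """Build a structure-rendering link with a percent-encoded SMILES string."""
    params = {
        "standardize": "true" if standardize else "false",
        "size": size,
        "structure": smiles,  # urlencode percent-encodes reserved characters
    }
    return f"{RENDER_BASE}?{urlencode(params)}"


aspirin_link = render_url("CC(=O)Oc1ccccc1C(=O)O")
```

Here `aspirin_link` is `https://ncatsidg.appspot.com/render?standardize=true&size=150&structure=CC%28%3DO%29Oc1ccccc1C%28%3DO%29O`.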

Available Data

The data available in Pharos is obtained from the Target Central Resource Database (TCRD), which integrates data from a variety of sources, including the Harmonizome, Jensen Lab datasets, EBI datasets (such as ChEMBL), and the Drug Target Ontology (DTO) from U. Miami. TCRD integrates external sources either by value or by reference, depending on performance, provenance, and other design criteria. See below for a full listing of the datasets incorporated into TCRD.

Data Types

The key data types represented in Pharos are listed below:

  • Small molecule data, including approved drug data and bioassay data
  • Protein data, including protein-protein interaction data
  • Disease data from OMIM and Disease Ontology
  • Genomic data, including expression (protein, RNA), transcription factors, and epigenomic associations
  • Phenotypic data, including mouse phenotypes, mouse/human orthologs, and GWAS results
  • Text data, including GeneRIFs and text-mined publications
  • Ontologies, including the Drug Target Ontology, Mondo Disease Ontology, Disease Ontology, PANTHER, and GO

Data Sources

The goal of the IDG KMC is to integrate a variety of data sources to shed light on unstudied and understudied targets. To achieve this, we have pulled together data on protein targets, small molecule activity, genomic behavior, and disease links. We are continually researching and incorporating other relevant data sources.

[Table: data sources, with columns Source, Targets, Diseases, and Ligands. Sources listed: GTEx, HPA Protein, HPA RNA, HPM Protein, JensenLab TISSUES. Values shown: 9233, 5614, 8960.]

Data Download

In addition to the CSV download links available on all the List and Details pages, users can also download the entire SQL database underlying Pharos. Versions of the Target Central Resource Database (TCRD) are available here, and Pharos' version is available from NCATS. Pharos' version has some updated data sources, such as publications and expression data, while TCRD includes some data that is not displayed in Pharos, such as ClinVar and RDO.

Pharos License

Data accessed from Pharos and TCRD is publicly available from the primary sources listed above. Please respect their individual licenses regarding proper use and redistribution.

Pharos Code

The source code for Pharos is available from https://github.com/ncats/pharos_frontend and https://github.com/ncats/pharos-graphql-server. Each repository's README provides build and installation instructions.

Attribution

If you use Pharos, please consider citing it as:

Kelleher, K., Sheils, T. et al., "Pharos 2023: an integrated resource for the understudied human proteome", Nucl. Acids Res., 2023. DOI: 10.1093/nar/gkac1033

Other References

Sheils, T., Mathias, S. et al., "TCRD and Pharos 2021: mining the human proteome for disease biology", Nucl. Acids Res., 2021. DOI: 10.1093/nar/gkaa993
Sheils, T., Mathias, S. et al., "How to Illuminate the Druggable Genome Using Pharos", Curr. Protoc. Bioinformatics, 2020, 69(1). DOI: 10.1002/cpbi.92
Nguyen, D.-T., Mathias, S. et al., "Pharos: Collating Protein Information to Shed Light on the Druggable Genome", Nucl. Acids Res., 2017, 45(D1), D995-D1002. DOI: 10.1093/nar/gkw1072

Help

For help in Pharos, click the help icon or check the FAQ page.

For feedback and comments, please contact us at pharos@mail.nih.gov