Pharos : About

Introduction

Pharos is the user interface to the Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) program funded by the National Institutes of Health (NIH) Common Fund (Grant No. 1U24CA224370-01). The goal of the KMC is to develop a comprehensive, integrated knowledge base for the Druggable Genome (DG) that illuminates the uncharacterized and/or poorly annotated portion of the DG, focusing on three of the most commonly drug-targeted protein families:

G-protein-coupled receptors (GPCRs)
Ion channels (ICs)
Kinases

The Pharos interface provides easy access to most data types collected by the KMC. Given the complexity of the data surrounding any target, efficient and intuitive visualization has been a high priority, so that users can quickly navigate and summarize search results and rapidly identify patterns. A critical feature of the interface is the ability to perform flexible searches and then drill down into the results. Underlying the interface is a GraphQL API that provides programmatic access to all KMC data, allowing for easy consumption in user applications.
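As an illustration of programmatic access, the following is a minimal Python sketch of a GraphQL request for a single target. The endpoint URL, the `target` query, its `q: {sym: ...}` argument, the `String` variable type, and the field names (`name`, `sym`, `tdl`, `fam`) are assumptions that should be checked against the published schema before use. The helper names `build_request` and `fetch_target` are hypothetical.

```python
import json
import urllib.request

# Assumption: public endpoint of the Pharos GraphQL server; verify before use.
PHAROS_GRAPHQL_URL = "https://pharos-api.ncats.io/graphql"

# Assumption: query shape and field names; confirm against the live schema.
TARGET_QUERY = """
query targetDetails($sym: String) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
  }
}
"""


def build_request(symbol):
    """Build (but do not send) a GraphQL POST request for one gene symbol."""
    body = json.dumps(
        {"query": TARGET_QUERY, "variables": {"sym": symbol}}
    ).encode("utf-8")
    return urllib.request.Request(
        PHAROS_GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_target(symbol, timeout=30):
    """Send the request and return the decoded JSON response (needs network)."""
    with urllib.request.urlopen(build_request(symbol), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_target("ACE2")` would return the decoded JSON response as a dictionary, provided the assumptions above hold. Passing the gene symbol as a GraphQL variable, rather than formatting it into the query string, avoids quoting problems.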

Our Collaborators

Pharos is developed at NCATS, together with collaborators from the University of New Mexico, the Icahn School of Medicine at Mount Sinai, EMBL-EBI, the Novo Nordisk Foundation Center for Protein Research (U. Copenhagen), and the University of Miami.

The Data and Resource Generation Center (DRGC) network is the experimental side of IDG. DRGC research focuses on illuminating the druggable genome through a two-pronged approach: empirical screening of drugs, followed by computational screening against modeled structures of the GPCR to produce optimized lead compounds, thereby providing high-value data and knowledge for the KMC. The KMC, in turn, is tasked with providing guidance on research priorities based on knowledge gaps and druggability likelihood analyses.

To find more details on our collaborators, visit the IDG Consortium website.

How to Link to Pharos

Link to target details pages with a UniProt ID or gene symbol

https://pharos.nih.gov/targets/uniprot_id
https://pharos.nih.gov/targets/gene_symbol

Link to a list of targets with a comma-separated list of UniProt IDs or gene symbols

https://pharos.nih.gov/targets?collection=gene_symbol1,uniprot_id,gene_symbol2
https://pharos.nih.gov/analyze/targets?collection=gene_symbol1,uniprot_id,gene_symbol2

Link to disease details pages with a disease name or MONDO ID

https://pharos.nih.gov/diseases/disease_name
https://pharos.nih.gov/diseases/mondo_id

Link to ligand details pages with a drug name or ChEMBL ID

https://pharos.nih.gov/ligands/drug_name
https://pharos.nih.gov/ligands/chembl_ID
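
The link patterns above can be assembled programmatically. The following is a minimal Python sketch. The helper names `details_url` and `target_list_url` are hypothetical and not part of Pharos, and percent-encoding identifiers that contain spaces or punctuation is an assumption about what the site accepts.

```python
from urllib.parse import quote

PHAROS_BASE = "https://pharos.nih.gov"


def details_url(kind, identifier):
    """Build a details-page link; kind is 'targets', 'diseases', or 'ligands'."""
    if kind not in ("targets", "diseases", "ligands"):
        raise ValueError(f"unsupported page type: {kind}")
    # Assumption: percent-encoding keeps names with spaces valid in the URL path.
    return f"{PHAROS_BASE}/{kind}/{quote(identifier, safe='')}"


def target_list_url(identifiers, analyze=False):
    """Build a link to a target list from gene symbols and/or UniProt IDs."""
    path = "analyze/targets" if analyze else "targets"
    # Encode each identifier separately so the separating commas stay literal.
    collection = ",".join(quote(i, safe="") for i in identifiers)
    return f"{PHAROS_BASE}/{path}?collection={collection}"
```

For example, `target_list_url(["ACE2", "P00533", "DRD2"])` mixes gene symbols and a UniProt ID in one collection, as the pattern above allows.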

Render structures with a URL-encoded SMILES string

https://ncatsidg.appspot.com/render?standardize=true&size=150&structure=SMILES
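
SMILES strings often contain characters such as `=`, `#`, `(`, and `+` that are not safe in a query string, so the encoding step matters. The following is a minimal Python sketch of building the render URL shown above. The function name `structure_image_url` is hypothetical, and the accepted values for `size` and `standardize` beyond those in the example URL are assumptions.

```python
from urllib.parse import urlencode

RENDER_BASE = "https://ncatsidg.appspot.com/render"


def structure_image_url(smiles, size=150, standardize=True):
    """Return a render URL with the SMILES string percent-encoded."""
    params = {
        "standardize": "true" if standardize else "false",
        "size": size,
        "structure": smiles,
    }
    # urlencode percent-encodes reserved characters in each value.
    return f"{RENDER_BASE}?{urlencode(params)}"


# Aspirin as a worked example.
print(structure_image_url("CC(=O)Oc1ccccc1C(=O)O"))
```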

Available Data

The data available in Pharos is obtained from the Target Central Resource Database (TCRD), which integrates data from a variety of sources including the Harmonizome, Jensen Lab datasets, EBI datasets (such as ChEMBL), and the Drug Target Ontology (DTO) from U. Miami. The TCRD integration methodology involves importing data by value or by reference to external sources, as informed by performance, provenance, and other design criteria. See below for a full listing of datasets incorporated into TCRD.

Data Types

The key data types represented in Pharos are listed below:

Small molecule data including approved drug data, bioassay data

Protein data including protein-protein interaction data

Disease data from OMIM and Disease Ontology

Genomic data including expression (protein, RNA), transcription factors and epigenomic associations

Phenotypic data including mouse phenotypes, mouse/human orthologs and GWAS results

Text data including GeneRIFs and text-mined publications

Ontologies including the Drug Target Ontology, Mondo Disease Ontology, Disease Ontology, PANTHER, and GO

Data Sources

The goal of the IDG KMC is to integrate a variety of data sources to shed light on unstudied and understudied targets. To achieve this, we have pulled together data on protein targets, small molecule activity, genomic behavior, and disease links. We are continually researching and incorporating other relevant data sources.

Source	Targets	Diseases	Ligands
Animal TFDB	1630

Antibodypedia	18496

ARCHS4	20238

BioPlex Protein-Protein Interactions	12005

CCLE	18750

Cell Surface Protein Atlas	1038

ChEMBL	2288		353872

ClinVar	2947

Consensus Expression Values	19008

CTD	7837	5748

Dark Kinase Knowledgebase	161

Disease Ontology		9233

DisGeNET	9025	10193

Drug Central - ChEMBL	886		1095

Drug Central - Drug Label	324		200

Drug Central - GtoPdb	157		156

Drug Central - Kegg Drug	19		17

Drug Central - Scientific Literature	302		277

Drug Central Indication	1118	1452

Drug Target Ontology IDs and Classifications	9232

EBI Patent Counts	1710

Ensembl Gene IDs	19452

eRAM	5139	1362

Expression Atlas	16784	107

Gene Ontology	7107

GENEVA	18883

GlyGen	20175

GTEx	19241

Guide to Pharmacology	1321		5136

GWAS Catalog	13116

Harmonizome	18789

HGNC	20206

HomoloGene	18806

HPA Protein	11023

HPA RNA	19203

HPM Protein	16567

Human Cell Atlas Compartments	11166

Human Cell Atlas Expression	19070

Human Protein Atlas	10513

Human Proteome Map	16855

IDG Eligible Targets List	1301

IDG Families	8147

IMPC Mouse Clones	270

IMPC Phenotypes	5787

JAX/MGI Mouse/Human Orthology Phenotypes	10204

JensenLab COMPARTMENTS	18491

JensenLab Knowledge UniProtKB-KW	2418	118

JensenLab PubMed Text-mining Scores	19052

JensenLab Text Mining	2785	1268

JensenLab TISSUES	17987

KEGG Distances	4896

KEGG Nearest Tclins	2977

KEGG Pathways	7686

LINCS	980

LINCS L1000 XRefs	978

LocSigDB	18916

Monarch	3825	5096

Monarch Ortholog Disease Associations	3827	5614

NCBI Gene	20153

NCBI GI Numbers	20402

OMIM	13856

Orthologs	18056

P-HIPSTer Viral PPIs	5719

PANTHER Protein Classes	8070

PathwayCommons	5001

ProKinO	544

PubChem CIDs			355932

PubMed	19790

PubTator Text-mining Scores	18310

RCSB Protein Data Bank	6510

Reactome Pathways	10781

Reactome Protein-Protein Interactions	4465

RESOLUTE	451

RGD	431

STRING IDs	19121

STRINGDB	19057

Target Illumination GWAS Analytics (TIGA)	18005	326

TIN-X Data	18982	8960

TMHMM Predictions	5350

Transcription Factor Flags	1630

UniProt	20412

UniProt Disease	3766	4972

WikiPathways	6411

Data Download

In addition to the CSV download links available on all list and details pages, users can download the entire SQL database underlying Pharos. Versions of the Target Central Resource Database (TCRD) are available here, and Pharos' version is available from NCATS. Pharos' version has some updated data sources, such as publications and expression data, while TCRD includes some data that is not displayed in Pharos, such as ClinVar and RDO.

Pharos License

Data accessed from Pharos and TCRD is publicly available from the primary sources listed above. Please respect their individual licenses regarding proper use and redistribution.

Pharos Code

The source code for the Pharos web interface and its GraphQL server is available from https://github.com/ncats/pharos_frontend and https://github.com/ncats/pharos-graphql-server, respectively. Each repository's README provides build and installation instructions.

Attribution

If you use Pharos, please consider citing it as:

Kelleher, K., Sheils, T. et al., "Pharos 2023: an integrated resource for the understudied human proteome", Nucl. Acids Res., 2023. DOI: 10.1093/nar/gkac1033

Other References

Sheils, T., Mathias, S. et al., "TCRD and Pharos 2021: mining the human proteome for disease biology", Nucl. Acids Res., 2021. DOI: 10.1093/nar/gkaa993

Sheils, T., Mathias, S. et al., "How to Illuminate the Druggable Genome Using Pharos", Curr. Protoc. Bioinformatics, 2020, 69(1). DOI: 10.1002/cpbi.92

Nguyen, D.-T., Mathias, S. et al., "Pharos: Collating Protein Information to Shed Light on the Druggable Genome", Nucl. Acids Res., 2017, 45(D1), D995-D1002. DOI: 10.1093/nar/gkw1072

Help

For help with Pharos, click the help icon or check the FAQ page.

For feedback and comments, please contact us at pharos@mail.nih.gov