Human Gene Coexpression Analysis

What does HGCA do?

Human Gene Correlation Analysis (HGCA) is used for the identification of transcriptionally correlated (coexpressed) genes in Homo sapiens.

How do I cite HGCA?

Zogopoulos, V.L., Malatras, A., Kyriakidis, K., Charalampous, C., Makrygianni, E.A., Duguez, S., Koutsi, M.A., Pouliou, M., Vasileiou, C., Duddy, W.J., Agelopoulos, M., Chrousos, G.P., Iconomidou, V.A., and Michalopoulos, I. (2023). HGCA2.0: An RNA-Seq Based Webtool for Gene Coexpression Analysis in Homo sapiens. Cells 12, 388.
Michalopoulos, I., Pavlopoulos, G.A., Malatras, A., Karelas, A., Kostadima, M.A., Schneider, R., and Kossida, S. (2012). Human gene correlation analysis (HGCA): A tool for the identification of transcriptionally coexpressed genes. BMC Res Notes 5, 265.

Further reading on coexpression analysis:

Zogopoulos, V.L., Saxami, G., Malatras, A., Papadopoulos, K., Tsotra, I., Iconomidou, V.A., and Michalopoulos, I. (2022). Approaches in Gene Coexpression Analysis in Eukaryotes. Biology 11, 1019.

What data are stored and where do they come from?

The HGCA database mainly contains:

Expression data for 55431 genes in 3500 RNA-Seq samples from GTEx
Annotation data from Ensembl and GeneCards
Gene Ontology data from Gene Ontology
Pathway data from KEGG Pathway
Biological Pathway data from WikiPathways
Transcription factor-target gene interaction data from ENCODE through Harmonizome website
Genetic disease data from OMIM
Gene-disease association data from DisGeNET
Protein family data from Pfam
Gene chromosome bands and coordinates from NCBI

What is the input and output of HGCA?

The Ensembl gene ID (ENSG) or the Gene Symbol of a gene of interest can be used as input. The output shows the genes most closely coexpressed with the driver gene as a coexpression subtree, together with their Ensembl gene IDs, Gene Symbols and descriptions. A biological term category can be picked from the drop-down menu to perform an Over-representation analysis.

What is the Over-representation analysis?

When one of the available Enrichment Analyses is selected from the drop-down menu, HGCA performs an over-representation analysis for each term in that category that annotates the list of coexpressed genes. The statistical significance (p-value) of the over-representation of each term is based on the hypergeometric distribution. P-values are adjusted using the Benjamini–Hochberg procedure. The Enrichment Summary only outputs terms whose over-representation p-value is below the 0.05 cut-off.
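The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HGCA's actual implementation: it assumes the usual one-sided hypergeometric test (probability of seeing at least the observed number of annotated genes), the function names and toy counts are hypothetical, and applying the 0.05 cut-off to the adjusted p-values is an assumption.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of finding at least k annotated genes in a list
    of n genes drawn from N background genes, K of which carry the term."""
    upper = min(K, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)
    )
    return favourable / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical toy counts: term -> (genes in list with term, genes in background with term)
LIST_SIZE = 20      # size of the coexpressed gene list (assumed)
BACKGROUND = 1000   # size of the background gene set (assumed)
term_counts = {"term_A": (8, 40), "term_B": (2, 100), "term_C": (1, 500)}

terms = list(term_counts)
raw = [
    hypergeom_pvalue(term_counts[t][0], LIST_SIZE, term_counts[t][1], BACKGROUND)
    for t in terms
]
adj = benjamini_hochberg(raw)

# Assumption: the 0.05 cut-off is applied to the adjusted p-values.
significant = [t for t, p in zip(terms, adj) if p < 0.05]
print(significant)
```

In this toy example only term_A passes the cut-off, because 8 of the 20 listed genes carry a term that annotates just 40 of the 1000 background genes.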

What are the Ensembl Gene Annotation, Gene Ontology: Biological Process, Gene Ontology: Cellular Component, Gene Ontology: Molecular Function, KEGG Pathway, WikiPathways, ENCODE, OMIM, DisGeNET, Pfam and Chromosome Band lists useful for?

The following sorts of analyses can be performed:

Ensembl Gene Annotation: This is the default output of the tool. A description of each gene is shown. There is no Over-representation analysis at this stage.
Biological Process: Biological Process is one of the Gene Ontology aspects. The Biological Process GO Terms for each gene are shown. Over-representation analysis shows the most over-represented GO Terms of the collection of GO Terms. This analysis is useful when a repeated term can imply the biological process the gene is likely to participate in.
Molecular Function: Molecular Function is one of the Gene Ontology aspects. The Molecular Function GO Terms for each gene are shown. Over-representation analysis shows the most over-represented GO Terms of the collection of GO Terms. This analysis is useful when a repeated term can imply the molecular function the gene is likely to have.
Cellular Component: Cellular Component is one of the Gene Ontology aspects. The Cellular Component GO Terms for each gene are shown. Over-representation analysis shows the most over-represented GO Terms of the collection of GO Terms. This analysis is useful when a repeated term can imply the cellular component the gene is likely to be part of.
KEGG Pathway: The KEGG Pathway terms and their descriptions for each gene are shown. Over-representation analysis shows the most over-represented KEGG terms. This analysis is useful when a repeated term can imply the pathway the gene is likely to participate in.
WikiPathways: The WikiPathways terms and their descriptions for each gene are shown. Over-representation analysis shows the most over-represented WikiPathways terms. This analysis is useful when a repeated term can imply the pathway the gene is likely to participate in.
ENCODE: The transcription factors and their target genes are shown. Over-representation analysis shows the most over-represented transcription factors. This analysis is useful when a repeated transcription factor can point to the transcription factors that drive the coexpression.
OMIM: The OMIM genetic disease IDs and their descriptions are shown. This analysis is useful to discover whether coexpressed genes play a vital role in the same genetic disease.
DisGeNET: The DisGeNET disease IDs and their descriptions are shown. Over-representation analysis shows the most over-represented diseases in which the coexpressed genes in the list are involved.
Pfam: The Pfam terms and their descriptions for each gene are shown. Over-representation analysis shows the most over-represented Pfam terms of the collection of Pfam terms. This analysis is useful when a repeated term can imply the protein family the gene is likely to belong to.
Chromosome Band: The Chromosome Bands of each gene are shown. Each band redirects to the specific genomic location in the Ensembl genome browser.

How can I navigate through the lists?

The user can change the enrichment analysis by selecting another category. Alternatively, the user can select a different driver gene by clicking on a different ENSG ID. The user can also visit external sources that are related to the terms shown in the analysis.

What is the gene list useful for?

The user can download the gene list of the current tree, which can be used for further analyses in external websites. Automatic redirections to several websites, such as STRING and g:Profiler, are already provided.

How can I navigate through the trees?

In addition to the list navigation, the user can choose to see more or fewer nodes of the subtree. The Newick-formatted subtree can also be downloaded. The tree can also be viewed externally in the iTOL tree viewer.
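A downloaded Newick subtree can be processed with very little code. The following Python sketch extracts the leaf labels from a Newick string. The function name and the example tree (labels and branch lengths) are hypothetical placeholders, not real HGCA output, and the sketch assumes unquoted labels and a tree with at least one pair of parentheses.

```python
import re

def newick_leaf_labels(newick):
    """Return leaf labels of a Newick string, in order of appearance.

    A leaf label directly follows "(" or ","; internal node labels follow ")"
    and branch lengths follow ":", so neither is captured.
    Assumes unquoted labels without whitespace.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)

# Hypothetical example tree; labels and branch lengths are made up.
example = "((GENE_A:0.10,GENE_B:0.20):0.05,GENE_C:0.30);"
print(newick_leaf_labels(example))
```

For anything beyond simple label extraction, such as quoted labels or tree traversal, a dedicated phylogenetics library is likely a better choice.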

Is there an API available?

HGCA2.0 coexpression results are available through a public JSON-based API endpoint, which is keyed on an Ensembl gene stable ID, a node number and, optionally, an enrichment category. For example: https://www.michalopoulos.net/hgca2.0/api/ENSG00000114391/5/bp provides the results of the Gene Ontology: Biological Process enrichment analysis for the coexpression subtree of 5 ancestral nodes for the driver gene ENSG00000114391. Instructions and an API parser are available.
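As a rough illustration of how such a request might be built and sent from Python, the sketch below uses only the standard library. The function names are hypothetical, "bp" is the only category code taken from the example above, and the behaviour when the category is omitted (simply dropping the last path segment) is an assumption. The structure of the returned JSON is not described here, so the sketch returns the parsed object as-is; the official instructions and API parser remain the authoritative reference.

```python
import json
from urllib.request import urlopen

# Base URL taken from the example request above.
BASE_URL = "https://www.michalopoulos.net/hgca2.0/api"

def build_hgca_url(gene_id, nodes, category=None):
    """Compose the endpoint URL: gene ID / node number / optional category."""
    parts = [BASE_URL, gene_id, str(nodes)]
    if category:
        # Assumption: leaving the category out just omits this path segment.
        parts.append(category)
    return "/".join(parts)

def fetch_hgca(gene_id, nodes, category=None, timeout=30):
    """Request the endpoint and return the parsed JSON without interpreting it."""
    with urlopen(build_hgca_url(gene_id, nodes, category), timeout=timeout) as resp:
        return json.load(resp)

# Example call (requires network access):
# result = fetch_hgca("ENSG00000114391", 5, "bp")
print(build_hgca_url("ENSG00000114391", 5, "bp"))
```

Keeping URL construction separate from the network call makes it easy to check the request before sending it.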

What are your contact details?

Contact Dr Ioannis Michalopoulos

The “ELIXIR-GR: Managing and Analysing Life Sciences Data (MIS: 5002780)” Project is co-financed by Greece and the European Union - European Regional Development Fund.