NetGenes

NetGenes is a web database of essential genes for 2711 bacterial organisms, predicted using protein-protein interaction networks. These interaction networks were retrieved from STRING-DB v11.0.

Essential genes are genes that an organism needs in order to grow and reproduce in a given environment. In general, deleting such a gene either compromises the organism's viability or results in a profound loss of fitness.

Gene essentiality is challenging to predict because it depends on many factors. Nevertheless, many studies have sought to predict essentiality using machine learning approaches. For this purpose, essentiality was defined as the indispensability of a gene under rich-media conditions. [ref]

We retrieved all the interactomes available in STRING-DB v11.0 and extracted the bacterial ones; this filter yielded 2711 organisms. The list of organisms, together with the number of predicted essential genes in each, is given in the Species List file, which can be downloaded from the Downloads page.

Each gene is represented as a feature vector of size 283. Of these features, 267 come from a graph mining method called Recursive Feature Extraction (ReFeX); learn more about ReFeX in this paper. The remaining 16 features are centrality measures calculated for each node in the interactome. The names of all features are listed in the Training Dataset file on the Downloads page.
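To illustrate how ReFeX-style recursive features are built, the sketch below computes three local features per node (degree, the number of edges inside the node's egonet, and the number of edges leaving it) and then, at each iteration, appends the mean and sum of every existing feature over the node's neighbours. This is a simplified, hypothetical illustration: the function names and the adjacency-dict input are assumptions, the feature-pruning step of ReFeX is omitted, and it does not reproduce the exact 267 ReFeX features or the 16 centrality features used in NetGenes.

```python
def base_features(adj):
    """Local features per node: [degree, egonet internal edges, egonet boundary edges].

    `adj` maps each node to the set of its neighbours (undirected graph).
    """
    feats = {}
    for node, nbrs in adj.items():
        ego = nbrs | {node}  # the node plus its direct neighbours
        internal = 0
        boundary = 0
        for u in ego:
            for v in adj[u]:
                if v in ego:
                    internal += 1  # counted once from each endpoint
                else:
                    boundary += 1  # counted once, from the endpoint inside the egonet
        feats[node] = [float(len(nbrs)), internal / 2, float(boundary)]
    return feats


def recursive_features(adj, iterations=2):
    """Append neighbour means and sums of all current features, `iterations` times."""
    feats = base_features(adj)
    for _ in range(iterations):
        new = {}
        for node, nbrs in adj.items():
            current = feats[node]
            sums = [sum(feats[v][i] for v in nbrs) for i in range(len(current))]
            means = [s / len(nbrs) if nbrs else 0.0 for s in sums]
            new[node] = current + means + sums
        feats = new  # use the previous iteration's values for every node
    return feats


if __name__ == "__main__":
    # Hypothetical toy interactome: a triangle a-b-c with d attached to c.
    adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    for node, vec in recursive_features(adj).items():
        print(node, len(vec), vec[:3])
```

With two iterations, the three base features grow to 9 and then 27 per node, which shows how quickly recursive features multiply and why ReFeX prunes redundant ones.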

NetGenes provides the functional annotation and STRING ID of each gene. It also gives an essentiality score for each gene, which is the predicted probability that the gene is essential. This information is available on each organism's page.

For a gene to be predicted as essential by our model, its predicted essentiality probability must exceed 70%. Only genes that crossed this threshold were classified as essential and added to NetGenes. Since the essentiality score is this predicted probability expressed as a percentage, all scores in NetGenes lie between 70 and 100.
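A minimal sketch of this thresholding rule, assuming the model's output is available as a mapping from gene identifiers to probabilities in [0, 1]. The function name, constant name, and gene identifiers are hypothetical. The comparison is strict, so a gene with a probability of exactly 0.70 is not reported.

```python
ESSENTIALITY_THRESHOLD = 0.70  # probability cut-off (70%)


def essential_genes(probabilities):
    """Return {gene: essentiality score} for genes predicted as essential.

    `probabilities` maps a gene identifier to a predicted probability in [0, 1].
    The returned score is that probability expressed as a percentage.
    """
    return {
        gene: 100.0 * p
        for gene, p in probabilities.items()
        if p > ESSENTIALITY_THRESHOLD  # strictly greater than 70%
    }


if __name__ == "__main__":
    # Hypothetical predictions for four genes.
    print(essential_genes({"geneA": 0.95, "geneB": 0.50, "geneC": 0.70, "geneD": 0.71}))
```

Comparing the raw probability against 0.70 before converting to a percentage avoids floating-point rounding deciding borderline cases.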

You can download individual organism files either from the organism table on the Home page or from the individual organism page. To download data for all organisms at once, use the Complete Genes List file on the Downloads page.

We trained the model on the 27 organisms used in our original paper and then applied it to the 2711 organisms. The training data are provided in the Training Dataset file. The Feature Matrices link provides the feature matrices for all 2711 organisms, which were used as test data.
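The train-then-apply workflow (fit on labelled genes pooled from the training organisms, then score every gene in each test organism's feature matrix) can be sketched as below. Logistic regression trained by plain batch gradient descent is a stand-in chosen only to keep the example dependency-free; it is not necessarily the classifier used in the original paper. The data layout (`training_organisms`, `test_organisms`) and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights and bias by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    """Predicted probability of the positive (essential) class for each row."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def train_then_apply(training_organisms, test_organisms):
    """Train on pooled labelled genes, then score each test organism separately.

    training_organisms: {organism: (feature_rows, labels)}, labels 1 = essential
    test_organisms:     {organism: {gene_id: feature_row}}
    Returns {organism: {gene_id: probability}}.
    """
    X, y = [], []
    for rows, labels in training_organisms.values():
        X.extend(rows)
        y.extend(labels)
    model = train_logistic(X, y)
    predictions = {}
    for organism, genes in test_organisms.items():
        ids = list(genes)
        probs = predict_proba(model, [genes[g] for g in ids])
        predictions[organism] = dict(zip(ids, probs))
    return predictions
```

Each organism's probabilities could then be filtered with the 70% cut-off described above to obtain its list of predicted essential genes.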

Cite us at: Azhagesan, K., Ravindran, B., & Raman, K. (2018). Network-based features enable prediction of essential genes across diverse organisms. PLOS ONE, 13(12), e0208722. DOI

Frequently Asked Questions

If you have any further questions or issues, feel free to reach us at our GitHub repo.