Q: What is KeyGenes?
A: KeyGenes is an algorithm that predicts the identity of queried samples and assigns each of them an identity score. It takes the transcriptional profiles of the queried data (test set) and matches them to a chosen set of transcriptional profiles (training set). The idea is that you select data sets from the tissues/organs of interest, and from tissues/organs in general, as the training set, and that you provide data sets of cells differentiated towards one of the tissues/organs included in the training set as the test set. KeyGenes then gives each sample in the test set an identity score against the samples included in the training set. It is therefore very important to choose your training set carefully. Moreover, KeyGenes uses the top 500 most variably expressed genes, and this top 500 list should also be selected carefully, depending on the training set used.
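To illustrate what "most variably expressed genes" means, here is a minimal sketch in Python. It is not the KeyGenes code: it assumes that variability is measured as plain sample variance across the training samples, which the text above does not specify, and the function name, gene names and expression values are made up for the example.

```python
from statistics import variance


def top_variable_genes(expression, n=500):
    """Return the n gene IDs with the highest sample variance.

    `expression` maps a gene ID to its expression values across samples.
    Ranking by plain variance is an assumption made for this sketch.
    """
    ranked = sorted(expression, key=lambda gene: variance(expression[gene]), reverse=True)
    return ranked[:n]


# Toy data: three hypothetical genes measured in three samples.
expression = {
    "GENE_A": [5.0, 5.1, 4.9],    # nearly constant across samples
    "GENE_B": [0.0, 10.0, 20.0],  # strongly variable
    "GENE_C": [1.0, 3.0, 2.0],    # moderately variable
}

top2 = top_variable_genes(expression, n=2)
print(top2)
```

With a real training set, `n` would be 500 and the list would change whenever samples are added to or removed from the training set, which is why the list has to be reconsidered for each training set.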
Q: I do not know R. Can I still use KeyGenes?
A: Yes, a Web App is provided, but at the moment it is limited to “fixed” training sets of human transcriptional data. In the Web App, you can upload a test set and select one of the provided “fixed” training sets. At the moment the Web App is also limited to NGS-derived data. Choosing a “flexible” training set is not yet possible in the Web App.
Q: Can I use KeyGenes on a microarray-derived test set?
A: Although KeyGenes was designed for NGS data, it can be applied to a microarray-derived test set by using Script 3. However, microarray-derived data of tissues/organs from Affymetrix and Illumina platforms have been tested only with the training set “fetal”. Microarray-derived data of differentiated cells have not been tested with other training sets, and the results should therefore be interpreted with care. See section 2.3 for more information. For microarray-derived data, we also suggest using http://cellnet.hms.harvard.edu/.
Q: My data is annotated with identifiers other than Ensembl Gene IDs. Can I still use the “fixed” training set?
A: When one of the provided training sets or the script for microarray-derived data is used, the genes must be annotated with Ensembl Gene IDs. If a new training set of NGS-derived data is used, alternative annotations are possible, but the training set and the test set must both use the same annotation. See section 2.4 for more information.
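A quick way to check this requirement before running an analysis is to compare the gene identifiers of the two sets. The Python sketch below is only an illustration and not part of KeyGenes: the function name and the report fields are hypothetical, and the pattern assumes human Ensembl Gene IDs of the form ENSG followed by 11 digits, with an optional version suffix.

```python
import re

# Human Ensembl Gene IDs: "ENSG" + 11 digits, optionally followed by ".version".
ENSEMBL_HUMAN_GENE = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def annotation_report(training_ids, test_ids):
    """Summarize how well the gene IDs of a test set match a training set."""
    training, test = set(training_ids), set(test_ids)
    shared = training & test
    return {
        # Number of gene IDs present in both sets.
        "shared": len(shared),
        # Share of the training genes that can be found in the test set.
        "fraction_of_training": len(shared) / len(training) if training else 0.0,
        # Test IDs that do not look like human Ensembl Gene IDs.
        "test_ids_not_ensembl": sorted(g for g in test if not ENSEMBL_HUMAN_GENE.match(g)),
    }


# Toy example: one test gene is annotated with a gene symbol instead of an Ensembl ID.
training_ids = ["ENSG00000141510", "ENSG00000012048", "ENSG00000139618"]
test_ids = ["ENSG00000141510", "ENSG00000012048", "TP53"]

report = annotation_report(training_ids, test_ids)
print(report)
```

A low shared fraction, or many identifiers flagged as non-Ensembl, suggests that the test set needs to be re-annotated before it is used with one of the provided training sets.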
Q: I would like to use my own training set. Is this possible?
A: Yes, this is possible, as described in section 2.4. However, if the training set is assembled from data from different sources, selecting an adequate top 500 list is crucial to obtain meaningful predictions and identity scores. We are working on a Web App to allow this as well.
Q: Can I analyse mouse data using KeyGenes?
A: Yes, this is possible. Provided that the training set and the test set are from the same organism, KeyGenes can analyze the data regardless of the organism. See section 2.4 for more information.