The goal of this project is to build a high-quality gene set library from the Gene Expression Omnibus (GEO) by identifying studies that perturbed a single gene in a mammalian system (cells or tissue) and then measured gene expression. The recommended method for extracting gene sets from these studies is GEO2Enrichr, a new tool developed by the Ma'ayan Lab. GEO2Enrichr is a Chrome extension that adds functionality to the GEO website to make gene set extraction and downstream analysis easy. Your task is to identify such studies and then use GEO2Enrichr to create signatures and submit them through the form on this website. You will receive 1 point for each unique entry into the database. If we find a mistake in any of your entries, we will subtract 30 points from your overall score for each such mistake. The hashtag for this project is #GENES_BD2K_LINCS_DCIC_COURSERA.
We used the collected single-gene perturbation signatures as positive training samples and other signatures as negative samples to train a gradient boosting classifier that identifies GEO series studying single-gene perturbations from the textual description of each series. The classifier was then applied to all human and mouse microarray studies on GEO to prioritize GEO series that study single-gene perturbations. The results of the classifier are listed below.
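To illustrate the idea of ranking series by their text, here is a minimal sketch of gradient boosting over word-presence features, written in plain Python. It is not the classifier used for this project: the training descriptions are invented toy examples, the names `tokenize`, `fit_stump`, `train`, `predict_proba`, and `rank` are hypothetical, and each boosting round fits a one-word decision stump to the logistic-loss residuals using mean residuals as leaf values (a simplification of what full libraries do).

```python
import math
import re

def tokenize(text):
    """Lowercase a description and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(docs, residuals, vocab):
    """Find the word whose presence/absence best splits the residuals
    (lowest squared error). Returns (word, value_if_present, value_if_absent)."""
    best = None
    for word in vocab:
        present = [r for d, r in zip(docs, residuals) if word in d]
        absent = [r for d, r in zip(docs, residuals) if word not in d]
        if not present or not absent:
            continue  # word does not split the training set
        mean_p = sum(present) / len(present)
        mean_a = sum(absent) / len(absent)
        sse = (sum((r - mean_p) ** 2 for r in present)
               + sum((r - mean_a) ** 2 for r in absent))
        if best is None or sse < best[0]:
            best = (sse, word, mean_p, mean_a)
    return best[1:] if best else None

def train(texts, labels, n_rounds=20, learning_rate=0.5):
    """Boost one-word stumps on the gradient of the logistic loss.
    Assumes both classes are present in `labels` (values 0 or 1)."""
    docs = [tokenize(t) for t in texts]
    vocab = sorted(set().union(*docs))  # sorted for deterministic tie-breaking
    pos_rate = sum(labels) / len(labels)
    bias = math.log(pos_rate / (1.0 - pos_rate))
    scores = [bias] * len(docs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss: label minus predicted probability.
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(docs, residuals, vocab)
        if stump is None:
            break
        word, val_p, val_a = stump
        val_p, val_a = learning_rate * val_p, learning_rate * val_a
        stumps.append((word, val_p, val_a))
        scores = [s + (val_p if word in d else val_a) for s, d in zip(scores, docs)]
    return bias, stumps

def predict_proba(model, text):
    """Estimated probability that a description is a single-gene perturbation study."""
    bias, stumps = model
    doc = tokenize(text)
    return sigmoid(bias + sum(vp if w in doc else va for w, vp, va in stumps))

def rank(model, texts):
    """Order candidate descriptions from most to least likely."""
    return sorted(texts, key=lambda t: predict_proba(model, t), reverse=True)

# Invented toy descriptions (NOT real GEO series): 1 = single-gene perturbation.
texts = [
    "Knockdown of STAT3 in human breast cancer cells",
    "Gene expression in Pten knockout mouse liver",
    "Overexpression of MYC in mouse fibroblasts",
    "siRNA knockdown of EZH2 in human prostate cells",
    "Time course of drug treatment in human lung cells",
    "Comparison of tumor and normal human colon tissue",
    "Mouse liver expression across developmental stages",
    "Expression profiling of human blood from patients with sepsis",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model = train(texts, labels)
```

Calling `rank(model, candidates)` on a list of unseen descriptions then gives a prioritized list in the spirit of the one described above. A real application would need far more training data and richer text features than this toy setup.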