Megatasks

Develop a function that can transform L1000 data to P100 data

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
The challenge is to construct a regression model of the characteristic directions, with L1000 transcriptomic data as the input variables and P100 phosphoproteomic data as the output variables. Submitted models will be evaluated with an angular distance metric and on their ability to predict the relationship between future P100 and L1000 experiments. Attached are two heatmaps of the complete training set, clustered by cosine distance, that illustrate the difficulty of this task. The column colors represent the small-molecule perturbagens (similar colors do not indicate related perturbagens). As the clustering of directions shows, many of the P100 perturbation directions depend on the cell line. Successful approaches may therefore need to explicitly model differences between cell lines. The use of external data is encouraged if it helps in this regard. Note that the row and column metadata are considered part of the input data.
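The angular distance formula is not spelled out here. One common definition, assumed in the minimal Python sketch below, takes the angle between a predicted and an observed characteristic direction vector (the arccosine of their cosine similarity) and divides it by π, so 0 means identical directions and 1 means opposite ones. The function names are illustrative and are not the official scoring code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def angular_distance(u, v):
    """Assumed metric: angle between u and v scaled to [0, 1] (0 = same direction, 1 = opposite)."""
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(c) / math.pi


def mean_angular_distance(predicted, observed):
    """Average angular distance over paired predicted/observed P100 direction vectors."""
    distances = [angular_distance(p, o) for p, o in zip(predicted, observed)]
    return sum(distances) / len(distances)
```

Because the metric depends only on direction, rescaling a predicted vector does not change its score; only its orientation relative to the observed P100 direction matters.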
Submit your contribution here...

Predict protein-protein interactions from a collection of gene sets

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
You are given a large collection of non-annotated gene sets. Your goal is to construct a network from these gene sets that predicts protein-protein interactions. The algorithm you develop must be self-contained: the executable program cannot use any prior knowledge about known protein-protein interactions. The program must be a UNIX-based command-line executable that takes as input the file containing the gene sets and outputs the top 5000 predicted protein-protein interactions. Your code must be made open source once the task deadline has passed, and documentation of the inner workings of the code must be provided as part of the submission process for this task.
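As a minimal illustration of a self-contained approach, the sketch below ranks gene pairs by how many gene sets contain both genes, using no external interaction data. It assumes the input is a GMT-style file (set name, description, then genes, all tab-separated); the file format, the argument names, and the co-occurrence scoring are assumptions for illustration, not a prescribed method.

```python
#!/usr/bin/env python3
"""Hypothetical baseline: rank gene pairs by how many gene sets contain both genes."""
import argparse
import sys
from collections import Counter
from itertools import combinations


def parse_gmt(lines):
    """Parse GMT-style lines (assumed: name, description, then genes, tab-separated)."""
    gene_sets = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Deduplicate and sort so every pair is later emitted in one canonical order.
        gene_sets.append(sorted({g.strip() for g in fields[2:] if g.strip()}))
    return gene_sets


def predict_interactions(gene_sets, top_n=5000):
    """Return the top_n ((gene_a, gene_b), count) pairs, ties broken alphabetically."""
    counts = Counter()
    for genes in gene_sets:
        counts.update(combinations(genes, 2))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Rank gene pairs by co-occurrence across gene sets.")
    parser.add_argument("gene_sets", nargs="?", help="path to the gene set file (GMT-style)")
    parser.add_argument("--top", type=int, default=5000,
                        help="number of predicted interactions to print (default: 5000)")
    args = parser.parse_args(argv)
    if args.gene_sets is None:
        parser.print_usage(sys.stderr)
        return
    with open(args.gene_sets) as handle:
        gene_sets = parse_gmt(handle)
    # One tab-separated interaction per line: gene_a, gene_b, co-occurrence count.
    for (a, b), count in predict_interactions(gene_sets, args.top):
        print(f"{a}\t{b}\t{count}")


if __name__ == "__main__":
    main()
```

Saved as a hypothetical `cooccur.py`, it could be run as `python3 cooccur.py gene_sets.gmt > predictions.tsv`. Raw co-occurrence counts favor genes that appear in many sets, so a stronger submission would likely normalize the scores, for example by each gene's set frequency.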
Submit your contribution here...

Prediction of Adverse Drug Reactions (ADRs) with gene expression data

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
The goal of this challenge is to predict ADRs from gene expression data. The gene expression data were generated with the L1000 technology by the Connectivity Map team at the Broad Institute. Expression of 978 genes was directly measured in human cancer cell lines before and after drug treatment. The expression data provided for this mega-challenge consist of gene expression signatures of drug treatments, calculated using the Characteristic Direction method. These data should be considered the feature set, or attributes. The class matrix contains the ADRs for the corresponding drugs that were profiled for gene expression. ADRs were extracted from the SIDER database and coded into their High Level Group Terms (HLGT) using MedDRA.
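One simple baseline for this multi-label problem, not prescribed by the challenge, is k-nearest-neighbor scoring: a query drug's score for each ADR is the similarity-weighted fraction of its most similar training drugs that carry that ADR. The minimal Python sketch below assumes each signature is an equal-length numeric vector (one per drug) and that the class matrix is given as rows of 0/1 values aligned with those signatures; `k` and the choice of cosine similarity are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two numeric vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_adr_scores(query, train_signatures, train_labels, k=5):
    """Score every ADR for one query drug signature.

    train_labels[i][j] is 1 if training drug i is annotated with ADR j, else 0.
    Each score is the similarity-weighted mean of the labels of the k most similar
    training drugs; negatively correlated neighbours get zero weight.
    """
    ranked = sorted(
        ((cosine_similarity(query, sig), i) for i, sig in enumerate(train_signatures)),
        reverse=True,
    )[:k]
    weights = [(max(sim, 0.0), i) for sim, i in ranked]
    total = sum(w for w, _ in weights)
    n_adrs = len(train_labels[0])
    if total == 0:
        return [0.0] * n_adrs
    return [sum(w * train_labels[i][j] for w, i in weights) / total for j in range(n_adrs)]
```

The scores fall in [0, 1] and can be thresholded or ranked per ADR; a learned model would likely do better, but a neighbor baseline gives a useful reference point.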
Submit your contribution here...

Clustering of an unspecified set of gene lists

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
This project provides participants with 5000 gene sets that have no annotations. The goal is to identify the global structure of this dataset based on gene set similarity and content. The project is open-ended, with no specific solution or objective evaluation metric. A two-page report describing the analysis is required for evaluation.
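Since no evaluation metric is prescribed, one possible starting point is to measure pairwise gene set similarity with the Jaccard index and group sets whose similarity reaches a threshold. The Python sketch below performs single-linkage grouping with a union-find structure; the threshold value and function names are assumptions. The all-pairs loop over 5000 sets (about 12.5 million comparisons) is slow in pure Python but workable for exploration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_gene_sets(gene_sets, threshold=0.3):
    """Single-linkage clustering: link two sets whenever their Jaccard index >= threshold.

    Returns clusters as sorted lists of indices into gene_sets.
    """
    parent = list(range(len(gene_sets)))

    def find(i):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(gene_sets)), 2):
        if jaccard(gene_sets[i], gene_sets[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(gene_sets)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Single linkage can chain loosely related sets into one large cluster, so sweeping the threshold and reporting how cluster sizes change is one way to describe the global structure in the report.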
Submit your contribution here...

Microtasks

Extract gene expression signatures collected from before and after perturbations of MCF7 cells

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
MCF7 cells are a widely studied breast cancer cell line that the LINCS program also profiles with several assays under many perturbations. To complement the LINCS consortium's MCF7 data collection, the BD2K-LINCS DCIC is interested in accumulating gene expression signatures from prior studies that profiled gene expression before and after any perturbation of MCF7 cells. Such a collection of signatures can be used to answer questions such as: How similar are MCF7 cells across labs and across platforms? Do perturbations of MCF7 cells converge into a few prototypical responses? Using GEO2Enrichr and the crowdsourcing portal, you can contribute signatures to this collection. The hashtag for this project is #MCF7_BD2K_LINCS_DCIC_COURSERA.
Submit your contributions here...

Extract gene expression signatures to enhance our understanding of the aging process

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
The Gene Expression Omnibus (GEO) has many studies in which gene expression in young human or mouse tissues was compared with that in old tissues. Such studies did not always focus on understanding the aging process, but the data they collected can be used for that purpose. Collecting signatures from young vs. old tissues can shed light on common alterations in pathway activity during aging and ultimately lead to the identification of small molecules that could slow down aging. Note that the “young” sample should come from mature young adults, not from organisms that are still undergoing development and maturation. Using GEO2Enrichr and the crowdsourcing portal, you can contribute signatures to this collection. The hashtag for this project is #AGING_BD2K_LINCS_DCIC_COURSERA.
Submit your contributions here...

Extract gene expression signatures from studies that examined how endogenous ligand perturbations affect mammalian cells

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
To complement the LINCS efforts to profile human cell responses to small-molecule and gene knockdown perturbations, it would be useful to have a collection of gene expression signatures collected before and after treatment of human or mouse cells with endogenous ligands. Endogenous ligands are extracellular molecules found naturally in the body. These ligands bind to receptors at the cell surface, or travel into the cell and bind to transcription factors. They signal to the cell about the status of the extracellular environment and whether the cell should take action and change its phenotypic behavior. Endogenous ligands include hormones, growth factors, cytokines, and chemokines. A complete list of such molecules can be found here. Using GEO2Enrichr and the crowdsourcing portal, you can contribute signatures to this collection. The hashtag for this project is #LIGANDS_BD2K_LINCS_DCIC_COURSERA.
Submit your contributions here...

Extract gene expression signatures from studies that examined how viral or bacterial infections of human cells modify gene expression

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
The Gene Expression Omnibus (GEO) has many studies in which gene expression in human or mouse cells was measured before and after viral or bacterial infection of those cells. Such a collection of signatures could be used to identify similarities between responses to different pathogens, and could help identify the molecular mechanisms of novel pathogens based on their global molecular effects upon infection. Using GEO2Enrichr and the crowdsourcing portal, you can contribute signatures to this collection. The hashtag for this project is #PATHOGENS_BD2K_LINCS_DCIC_COURSERA.
Submit your contributions here...

Building a gene set library of disease signatures in GEO

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
For this project the community will build a gene set library from the Gene Expression Omnibus (GEO) by identifying studies that measured gene expression in diseased and normal mammalian cells or tissue. The recommended method for extracting gene sets from these studies is GEO2Enrichr, a new tool developed by the Ma'ayan Lab. GEO2Enrichr is a Chrome extension that adds functionality to the GEO website to make gene set extraction and downstream analysis easy.
Submit your contributions here...

Building a gene set library from single drug perturbations in GEO

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
For this project the community will build a gene set library from the Gene Expression Omnibus (GEO) by identifying studies that treated a mammalian system (cells or tissue) with a single drug and then measured gene expression. The recommended method for extracting gene sets from these studies is GEO2Enrichr, a new tool developed by the Ma'ayan Lab. GEO2Enrichr is a Chrome extension that adds functionality to the GEO website to make gene set extraction and downstream analysis easy.
Submit your contributions here...

Building a gene set library from single gene perturbations in GEO

Deadline:
December 31st, 2017 11:59 PM EST.
Status: Open
Instructions:
For this project the community will build a gene set library from the Gene Expression Omnibus (GEO) by identifying studies that perturbed a single gene in a mammalian system (cells or tissue) and then measured gene expression. The recommended method for extracting gene sets from these studies is GEO2Enrichr, a new tool developed by the Ma'ayan Lab. GEO2Enrichr is a Chrome extension that adds functionality to the GEO website to make gene set extraction and downstream analysis easy.
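Gene set libraries such as those used by Enrichr are commonly distributed as GMT files: one gene set per line, with the set name, a description, and then the genes, all tab-separated. Assuming each contributed signature is stored as a pair of up- and down-regulated gene lists, the hypothetical helper below serializes a library in that layout; the `_up`/`_down` naming convention and the empty description column are illustrative, not the DCIC's official format.

```python
def to_gmt_lines(signatures):
    """Serialize {signature_name: (up_genes, down_genes)} as GMT lines.

    Hypothetical convention: each signature becomes two gene sets, "<name>_up" and
    "<name>_down". GMT columns are tab-separated: set name, description (left empty
    here), then one gene per column.
    """
    lines = []
    for name, (up, down) in sorted(signatures.items()):
        for direction, genes in (("up", up), ("down", down)):
            # Deduplicate and sort genes so the output is stable across runs.
            lines.append("\t".join([f"{name}_{direction}", ""] + sorted(set(genes))))
    return lines


def write_gmt(signatures, path):
    """Write the library to a GMT file, one gene set per line."""
    with open(path, "w") as handle:
        for line in to_gmt_lines(signatures):
            handle.write(line + "\n")
```

Keeping up- and down-regulated genes as separate sets matches how enrichment tools typically query each direction independently.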
Submit your contributions here...