Data Sources

The data held in EMAGE comes from three main sources: the literature, large-scale projects and individual labs.

Literature

Data from the literature is entered into EMAGE in collaboration with our sister database, the Gene Expression Database (GXD) at Mouse Genome Informatics (MGI). GXD curation staff locate mouse gene expression data in the literature and compile a Gene Expression Literature Index, which draws on a very wide range of journals (nearly 600). For each paper, the Gene Expression Literature Index records the citation, the authors, the genes/proteins assayed, whether the samples were whole-mount or sectioned material, and the age of the specimens involved.

GXD staff then fully annotate a proportion of these data, describing in detail the specimen, the detection reagent and methods used, and the sites of expression (a text-based description produced by annotating with terms from the EMAP anatomy ontology). This allows the GXD database to be queried by many different criteria.

EMAGE imports some fully annotated data from GXD. EMAGE curators also use the GXD Gene Expression Literature Index to locate data from about 150 journals for full indexing (including spatial annotation). Where a journal does not license its material under a suitable Creative Commons licence, we have arranged individual legal agreements with publishers. These agreements cover 41 journals (which collectively house over 80% of published in situ gene expression images in the mouse) and allow us to reproduce copyrighted images from these journals on the EMAGE website. Where we do not have permission to reproduce the original data image, our policy is to display a generic image showing the copyright symbol, together with a link to the original data: either the PubMed entry or a DOI link directly to the data on the journal website.

Data originating in the literature that is fully annotated in the EMAGE and GXD databases sometimes overlaps and sometimes does not:

[Figure: overlap between literature data fully annotated in EMAGE and in GXD]


We are committed to better integration of EMAGE and the GXD in the future to produce a resource (MGEIR) that will unify the annotated data.

Large-scale projects

Some of the data in EMAGE originates from large-scale gene expression screening projects. A notable current example is EURExpress.


The EURExpress consortium is generating mRNA in situ hybridisation data for ~20,000 mouse genes on sagittal sections at E14.5 (~24 evenly spaced sections for each gene), and performing a text-based annotation of the sites of expression seen in all 480,000 images. The text annotation is performed manually, with annotation staff visually assessing each image and then using the EMAP anatomy ontology to describe sites of expression.
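
The image total quoted above follows directly from the number of genes and the number of sections per gene. Both inputs are approximate, so the total is an estimate:

```latex
\[
  20{,}000 \ \text{genes} \times 24 \ \text{sections per gene} \approx 480{,}000 \ \text{images}
\]
```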

[Figure: EURExpress strategy]


EURExpress data will be imported into EMAGE in early 2009. In addition to the information already compiled by the EURExpress consortium, EMAGE has developed automated signal extraction and alignment methods so that spatial annotation and analyses can be applied to this dataset.


Data from other large-scale screens incorporated into EMAGE include:


Individual Labs

We encourage all mouse embryologists and geneticists to deposit their in situ gene expression data in the EMAGE database.

We have received direct data submissions from many labs including those of Brigid Hogan, Janet Rossant, Patrick Tam, Virginia Papaioannou, Carol Wicking, Marianne Bronner-Fraser, Paula Murphy, Yasuhide Furuta, David FitzPatrick, Mike Dixon and Salvador Martinez.