Frequently Asked Questions
How do I .....?
Contents of the database
General Background and Theory (biological and technical)
Technical issues with interfaces
Q. How do I perform a search of EMAGE?
A. There are currently three main ways to search EMAGE - using a gene/protein name, a spatial region or a named anatomical structure as the query term. We have made some short demo movies showing how to search.
Coming in 2009 will be several more query options including searching for members of a biological pathway, using a biological sequence as the query term (i.e. searching probe or antigen sequences used in EMAGE by BLAST) or searching for members of a GO biological functional class.
Q. How do I mine the data in EMAGE to find possible genetic associations?
A. One major option currently exists: use our CLUSTER algorithm to hierarchically cluster entries according to the spatial similarity of their spatial annotations. This allows genes that are expressed in a similar spatial pattern to be identified, and can be used to identify members of a synexpression group.
Coming in 2009 will be (a) added functionality to perform hierarchical clustering on text-annotated data, which will provide another avenue to identify genes with similar expression patterns (albeit based on text annotations and not the spatial patterns themselves) and (b) a BioMart interface for EMAGE.
Q. How do I save the results from a search?
Q. How do I perform Boolean operations between the result sets of several searches?
A. You can currently perform Boolean operations on two groups of EMAGE entries. One must be saved to your clipboard (see above - How do I save the results from a search?) and the other must be the results of your current search. They can currently only be compared using EMAGE:ID. In 2009 we plan to release functionality that will allow the storage of many collections (cf. a clipboard) of EMAGE entries. This will require user log-in as the collections will be stored on our central server. It will be possible to compare the contents of collections using many aspects of the data (e.g. ID, gene/protein, detection reagent, stage of development).
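Comparing two groups by EMAGE:ID amounts to set operations on two collections of IDs. The following is a minimal sketch in Python, assuming each group is available as a list of ID strings; the function name and the sample IDs are hypothetical and are not part of the EMAGE interface.

```python
def boolean_ops(clipboard_ids, search_ids):
    """Compare two collections of EMAGE:IDs with set operations."""
    clipboard = set(clipboard_ids)
    current = set(search_ids)
    return {
        "AND": sorted(clipboard & current),  # entries in both groups
        "OR": sorted(clipboard | current),   # entries in either group
        "NOT": sorted(clipboard - current),  # on the clipboard but not in the current search
    }


# Hypothetical IDs, for illustration only.
saved = ["EMAGE:101", "EMAGE:102", "EMAGE:103"]
found = ["EMAGE:102", "EMAGE:103", "EMAGE:104"]
result = boolean_ops(saved, found)
```

Here `result["AND"]` holds the entries common to the clipboard and the current search, while `result["NOT"]` holds those found only on the clipboard.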
Q. What types of expression data are found in EMAGE?
A. EMAGE contains in situ gene/protein expression data, assayed using in situ hybridisation (ISH), immunohistochemistry (IHC) and in situ reporter (ISR) assays. The emphasis is on endogenous expression data: for ISH and IHC data, specimens are usually wild-type or occasionally heterozygous mutants (but with an apparently wild-type phenotype). For ISR (knock-in reporters or gene trap data), specimens are usually heterozygous mutants (but with an apparently wild-type phenotype) and occasionally homozygous mutants (again, with an apparently wild-type phenotype). Very occasionally we incorporate data from transient transgenic specimens under the proviso that the staining pattern reflects the endogenous gene expression pattern.
Q. Who generates the data in EMAGE?
A. The data in EMAGE is generated by researchers outside the EMAGE project. It arrives at EMAGE via the literature or via direct submission from individual labs or consortia.
Q. Who enters the data into EMAGE?
A. Data is entered into EMAGE primarily by full-time EMAGE curators who make entries based on information supplied by the data submitter. Alternatively, data entries are sometimes completed by researchers off-site and electronically submitted to EMAGE using a data submission tool. EMAGE curators assess such entries for consistency and accuracy before adding them to the public database.
Q. What constitutes one EMAGE entry?
A. One EMAGE entry = one specimen stained for the expression of one gene/protein at one point in embryonic development. Associated with each entry are details of the detection reagent, the specimen, sites of expression (denoted by both text annotation and spatial annotation), the names and details of the data submitter/source and relevant links to other data either in EMAGE or several external databases.
Q. What do EMAGE:IDs without an R in them signify (such as EMAGE:1234)?
A. These IDs represent full EMAGE entries - i.e. entries from the EMAGE 'repository' (see below) that have been scrutinised and added to by an EMAGE curator. They can be accessed by all search methods. The minimum amount of information held for these entries is: Data Source; Gene/Protein assayed; Detection Reagent used; Theiler Stage (based on actual morphological criteria and not approximated from the dpc value as it is in the 'repository'); the stage given by the data submitter (usually as a dpc value, sometimes as a number of somites); Assay type (ISH/ISR/IHC); Specimen type (WM/section); whether the specimen is wild-type or a mutant (and if a mutant, the allele is indicated); an image (or set of images) of the specimen; and most importantly a spatial and a text annotation, both describing the sites of gene expression.
Q. What do EMAGE:IDs with an R in them signify (such as EMAGE:R12345)?
A. These IDs represent entries in our data warehouse, the EMAGE 'repository'. These data have been externally sourced and have not yet been scrutinised by an EMAGE curator. They can be accessed by Quick Search only. The minimum amount of information held for these entries is: Data Source; Gene/Protein assayed; Detection Reagent used; Theiler Stage (usually approximated from the dpc value and not based on actual morphological criteria); the stage given by the data submitter (usually as a dpc value, sometimes as a number of somites); Assay type (ISH/ISR/IHC); Specimen type (WM/section); whether the specimen is wild-type or a mutant (and if a mutant, the allele is indicated); and an image (or set of images) of the specimen.
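If you handle lists of IDs in your own scripts, the two kinds of entry can be told apart from the ID alone. The sketch below assumes the format seen in the examples above ("EMAGE:" followed by an optional R and a run of digits); the pattern and the function name are illustrative assumptions, not an official specification.

```python
import re

# Assumed format: "EMAGE:" followed by an optional "R" and one or more digits.
EMAGE_ID = re.compile(r"^EMAGE:(R?)(\d+)$")


def classify_emage_id(emage_id):
    """Return 'repository' for R-prefixed IDs and 'full' for all others."""
    match = EMAGE_ID.match(emage_id.strip())
    if match is None:
        raise ValueError(f"not a recognised EMAGE:ID: {emage_id!r}")
    return "repository" if match.group(1) else "full"
```

For example, `classify_emage_id("EMAGE:1234")` gives `'full'` and `classify_emage_id("EMAGE:R12345")` gives `'repository'`.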
Q. How many entries are in the EMAGE database?
A. The most up to date answer is available by performing a database search.
The number of fully annotated entries in the database can be found by performing a gene search for * (i.e. a wild-card search for all genes). The total number of fully annotated entries in EMAGE will be the number given in the ID column (as of late 2008, ~5,500 entries). These graphs show the spread of annotated entries across the 4D spatio-temporal space of the EMAGE framework (correct as of late 2008).
The number of entries in the 'repository' can be found by performing a quick search for * (i.e. a wild-card search for all genes). The total number of entries in the repository will be the number given in the ID column (as of late 2008, ~30,000 entries).
Q. How is expression data annotated in EMAGE?
A. Expression data is annotated using two methods to denote sites of expression in the embryo: spatial annotation and text annotation. Additionally, many aspects of the detection reagent and specimen are also annotated during this process (assignment of IDs, nucleotide sequences for probes etc). Further information on the curation process.
Q. How much data is annotated?
A. The most up to date answer is available by performing a database search.
The number of fully annotated entries in the database can be found by performing a gene search for * (i.e. a wild-card search for all genes). The total number of fully annotated entries in EMAGE will be the number given in the ID column (as of late 2008, ~5,500 entries). These graphs show the spread of annotated entries across the 4D spatio-temporal framework of EMAGE (correct as of late 2008).
Q. At what stages is most data annotated?
A. These graphs show the spread of annotated entries across the 4D spatio-temporal framework of EMAGE (correct as of late 2008). Our current manual data entry policy (as suggested by our Advisory Board) is to focus on increasing gene coverage for WM data between TS15 and TS18. We will also be adding automatically spatially annotated data for TS23 embryo (from the EURExpress dataset) in 2009.
Q. Why don't I find any expression data at stages of development that I'm interested in?
A. There are several possible reasons:
1) If you're searching 3D data in a spatial search, it is possible that your plane of section and subsequent query region may not have intersected with any available spatial annotations. If this happens, try a Quick Search for the embryo stage you are interested in, and then peruse the 'section' data (in the 'specimen' column) to locate data.
2) If you're searching for sites of expression based on named anatomical structures, you may have chosen a term that is too specific. Often the text annotations (as based on the descriptions supplied by the data submitters) are broad as opposed to specific. Try a broader term for query.
3) We currently do not have appropriately annotated data in EMAGE at that stage. These graphs show the spread of text- and spatial- annotated entries across the 4D spatio-temporal framework of EMAGE (correct as of late 2008). Our current manual data entry policy (as suggested by our Advisory Board) is to focus on increasing gene coverage for WM data between TS15 and TS18. We will also be adding automatically spatially annotated data for TS23 embryo (from the EURExpress dataset) into EMAGE in 2009 (with text annotations as performed by EURExpress data annotators). In the meantime it is possible to access this data by Quick Search of the 'repository'.
Q. Why don't all entries have an original data image?
A. Some data images are not shown in EMAGE because we do not have permission to reproduce certain copyrighted images that have appeared in the literature. In this case we use a generic image showing the copyright symbol whilst providing a direct link to the relevant paper. More about EMAGE and copyrighted images. Occasionally, the data submitter does not provide images documenting a gene expression pattern where they feel no expression is detected. Whilst we prefer photographic documentation, we will sometimes add an image entitled 'DATA NOT SHOWN' with associated notes explaining why.
Q. Why do I see lots of instances of a generic image showing the copyright symbol?
A. A generic image showing the copyright symbol is shown in EMAGE when we do not have permission to reproduce certain copyrighted images that have appeared in the literature. In these cases we always provide a direct link to the relevant paper. More about EMAGE and copyrighted images.
Q. What is a Theiler Stage and how does it relate to dpc values?
A. A Theiler Stage is a stage of mouse embryo development based on the external and/or internal morphological development of the embryo, that is not directly dependent on either age or size. The mouse embryonic period proper is divided into 26 Theiler stages.
As a rough guide, Theiler stages and the approximate dpc values in (C57BL/6 x CBA) F1 embryos are:
TS01 ~0.5 dpc
TS02 ~1.0 dpc
TS03 ~2.0 dpc
TS04 ~3.0 dpc
TS05 ~4.0 dpc
TS06 ~4.5 dpc
TS07 ~5.0 dpc
TS08 ~6.0 dpc
TS09 ~6.5 dpc
TS10 ~7.0 dpc
TS11 ~7.5 dpc
TS12 ~8.0 dpc
TS13 ~8.5 dpc
TS14 ~9.0 dpc
TS15 ~9.5 dpc
TS16 ~10.0 dpc
TS17 ~10.5 dpc
TS18 ~11.0 dpc
TS19 ~11.5 dpc
TS20 ~12.0 dpc
TS21 ~13.0 dpc
TS22 ~14.0 dpc
TS23 ~15.0 dpc
TS24 ~16.0 dpc
TS25 ~17.0 dpc
TS26 ~18.0 dpc
More information about Theiler staging is available from the EMAP mouse embryo anatomy atlas.
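The table above can be turned into a simple lookup for estimating a stage from a dpc value. This is only a sketch: the nearest-value rule is an assumption made for illustration and is not necessarily how stages are approximated in the 'repository', and a stage estimated this way is no substitute for staging by morphological criteria.

```python
# Approximate dpc per Theiler stage, copied from the table above.
TS_TO_DPC = {
    1: 0.5, 2: 1.0, 3: 2.0, 4: 3.0, 5: 4.0, 6: 4.5, 7: 5.0, 8: 6.0, 9: 6.5,
    10: 7.0, 11: 7.5, 12: 8.0, 13: 8.5, 14: 9.0, 15: 9.5, 16: 10.0, 17: 10.5,
    18: 11.0, 19: 11.5, 20: 12.0, 21: 13.0, 22: 14.0, 23: 15.0, 24: 16.0,
    25: 17.0, 26: 18.0,
}


def approximate_stage(dpc):
    """Return the Theiler stage whose tabulated dpc is closest to `dpc`."""
    return min(TS_TO_DPC, key=lambda ts: abs(TS_TO_DPC[ts] - dpc))


def stage_label(ts):
    """Format a stage number in the TSnn style used in the table."""
    return f"TS{ts:02d}"
```

For example, `stage_label(approximate_stage(10.5))` gives `'TS17'`. A dpc value that falls exactly halfway between two tabulated values resolves to the earlier stage.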
Q. How does EMAGE relate to the GXD database at MGI?
A. The nature of the collaboration between EMAGE and GXD is discussed on our MGEIR page.
Q. How does the LOSSST algorithm work?
A. Briefly, by comparing shape similarities between all the spatial annotation patterns in the database to your query region, calculating a value of spatial similarity, and then ranking the results. Detailed explanation.
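The general idea of scoring and ranking can be illustrated with a short sketch. It assumes each spatial annotation is a set of voxel coordinates and uses the Jaccard index as the similarity value; both are assumptions made for illustration and are not a description of the actual LOSSST implementation. The entry IDs and patterns are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two voxel sets: |a & b| / |a | b| (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(query, patterns):
    """Score every stored pattern against the query region, best match first."""
    scores = [(jaccard(query, voxels), entry_id)
              for entry_id, voxels in patterns.items()]
    return [(entry_id, score) for score, entry_id in sorted(scores, reverse=True)]


# Hypothetical entries; each pattern is a set of (x, y, z) voxels.
patterns = {
    "EMAGE:1": {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    "EMAGE:2": {(1, 0, 0), (2, 0, 0)},
    "EMAGE:3": {(9, 9, 9)},
}
query = {(1, 0, 0), (2, 0, 0)}
ranking = rank_by_similarity(query, patterns)
```

In this toy example the pattern identical to the query region ranks first, the partially overlapping pattern second, and the non-overlapping pattern last.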
Q. How does the CLUSTER algorithm work?
A. Briefly, by comparing spatial similarities between all the patterns in the input set to each other, calculating a value indicating how spatially similar the patterns are, and then hierarchically clustering these values. Detailed explanation.
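As a rough illustration of pairwise comparison followed by hierarchical clustering, the sketch below clusters a handful of patterns. The Jaccard distance and average linkage used here are assumptions chosen for simplicity and may differ from what the CLUSTER algorithm actually uses; the pattern names and values are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0.0 means the two voxel sets are identical."""
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 1.0)


def cluster(patterns):
    """Average-linkage agglomerative clustering; returns the merges in order."""
    ids = list(patterns)
    # Distance between every pair of input patterns.
    dist = {frozenset(pair): jaccard_distance(patterns[pair[0]], patterns[pair[1]])
            for pair in combinations(ids, 2)}

    def linkage(pair):
        # Mean distance between the members of two clusters.
        a, b = pair
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in ids]  # each cluster is a tuple of pattern names
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical patterns; integers stand in for voxel indices.
patterns = {"A": {1, 2, 3}, "B": {1, 2, 3, 4}, "C": {7, 8}, "D": {8, 9}}
merges = cluster(patterns)
```

Each item in `merges` records the two clusters joined and the distance at which they were joined, which is the information a dendrogram viewer displays.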
Q. When I try to use JavaTree View to view the results of a spatial clustering, it crashes my web-browser. What can I do to stop this?
A. You most probably need to increase Java memory on your computer.