
Analyze your microarray data in 4 steps

Step 1 of 4:

Data Submission and Parameter Selection

First, select or type the following parameters (default values are given):
Analysis Method: Choose one of the two threshold dependent methods (based on the hypergeometric test or Fisher's exact test) or one of the three threshold independent methods (Kolmogorov-Smirnov test, Student's t-test, Wilcoxon's test).
Level of Significance: Choose the alpha error for the statistical test, which serves to identify the significant GO nodes. The specified value will be adjusted according to the Bonferroni correction.
Threshold Value: Only if a threshold dependent method was chosen: specify your threshold. When using ratios, for example, enter 2.0 to consider only genes that are at least two-fold up-regulated, or 0.5 for genes that are at least two-fold down-regulated. When using probabilities of differential expression, enter for example 0.90 or 0.95.
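The idea behind a threshold dependent method can be sketched as follows: genes passing the threshold are counted inside and outside a GO node, and a hypergeometric tail probability is computed from these counts. This is only a minimal illustration under assumptions; the function name, gene labels and numbers are hypothetical, and the tool's actual implementation may differ.

```python
from math import comb

def hypergeom_upper_tail(k, n_node, n_selected, n_total):
    """P(X >= k): probability that at least k of the n_node genes of a GO node
    are among the n_selected genes passing the threshold, given n_total
    measured genes (hypergeometric distribution)."""
    max_k = min(n_node, n_selected)
    favourable = sum(
        comb(n_node, i) * comb(n_total - n_node, n_selected - i)
        for i in range(k, max_k + 1)
    )
    return favourable / comb(n_total, n_selected)

# Toy data (hypothetical): expression ratios for 10 genes, 4 of them in the node.
ratios = {"g1": 3.1, "g2": 2.4, "g3": 2.2, "g4": 0.9, "g5": 1.1,
          "g6": 1.0, "g7": 0.8, "g8": 2.5, "g9": 1.2, "g10": 0.7}
node_genes = {"g1", "g2", "g3", "g4"}
threshold = 2.0  # at least two-fold up-regulated

selected = {gene for gene, ratio in ratios.items() if ratio >= threshold}
k = len(selected & node_genes)  # up-regulated genes inside the node
p_value = hypergeom_upper_tail(k, len(node_genes), len(selected), len(ratios))
print(f"{k} of {len(node_genes)} node genes pass the threshold, p = {p_value:.4f}")
```

For down-regulated genes the comparison would be reversed (ratio <= 0.5 in this sketch).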

Alternative Hypothesis: Only if a threshold independent method was chosen: choose the alternative hypothesis depending on your assumption about the change in the genes' expression values.
Assume that the distributions of expression values inside and outside the GO node are similarly shaped and each have one peak. Then choose:
  • less, if you expect the peak of the GO node to lie to the left of its environment (more genes with lower expression in the node)
  • greater, if you expect the peak of the GO node to lie to the right of its environment (more genes with higher expression in the node)
  • two.sided, if you expect the peak of the GO node to lie either to the left or to the right of its environment (more genes with changed expression / p-values in the node)
If you have no assumption, try two.sided in order to detect any differences between the nodes and their environments.
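To illustrate how the three alternatives lead to different p-values, here is a rough sketch of a Wilcoxon rank-sum test using a normal approximation. It omits tie and continuity corrections and is not the tool's implementation; the function name and the toy values are assumptions made for this example.

```python
from math import erf, sqrt

def rank_sum_p(node, env, alternative="two.sided"):
    """Wilcoxon rank-sum test (normal approximation, no tie or continuity
    correction). node/env: expression values inside/outside the GO node."""
    pooled = sorted(node + env)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    n1, n2 = len(node), len(env)
    u = sum(ranks[v] for v in node) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    cdf = 0.5 * (1 + erf(z / sqrt(2)))  # P(Z <= z)
    if alternative == "less":      # node shifted to the left
        return cdf
    if alternative == "greater":   # node shifted to the right
        return 1 - cdf
    return 2 * min(cdf, 1 - cdf)   # two.sided

# Toy values (hypothetical): the node genes are clearly higher expressed.
node_values = [2.5, 3.0, 2.8, 3.4]
env_values = [1.0, 0.9, 1.2, 1.1, 0.8, 1.3]
p_less = rank_sum_p(node_values, env_values, "less")
p_greater = rank_sum_p(node_values, env_values, "greater")
p_two = rank_sum_p(node_values, env_values, "two.sided")
print(p_less, p_greater, p_two)
```

In this toy case only greater and two.sided give small p-values, which is why a wrong one-sided choice can hide a real shift.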

Organism: Choose the prokaryotic organism from which the analyzed microarray data are derived. Currently, more than 20 prokaryotic species are supported, including important model organisms (Escherichia coli and Bacillus subtilis) and, in addition, many medically relevant bacteria (e.g. Pseudomonas aeruginosa, Listeria monocytogenes, Clostridium tetani, Mycobacterium tuberculosis).

Microarray File: Upload your microarray data (text file), for example an expression matrix containing ratios or test statistics such as probabilities of differential expression. As usual for expression matrices, the measured genes should be represented by the rows. The file must contain at least one column with the gene short names and one column with the measured or calculated values for each gene.

Specify the column delimiter in your text file, for example \t for tabulator. Additionally, specify the number of the column containing the gene names (e.g. 1 for the first column) and the number of the column containing the expression data.
It is therefore possible to submit microarray files with several columns (for example if you tested more than one condition), as long as you specify the gene name column and the corresponding data column correctly.
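The role of the delimiter and the two column numbers can be illustrated with a small sketch. It assumes a file without a header line and 1-based column numbers as on the form; the function name and the sample rows are hypothetical, and the server's own parser may behave differently.

```python
import csv
import io

def read_expression_column(handle, delimiter="\t", gene_col=1, data_col=2):
    """Return ({gene: value}, [genes whose value is not a valid number]).
    Column numbers are 1-based."""
    values, invalid = {}, []
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < max(gene_col, data_col):
            continue  # skip empty or too short rows
        gene = row[gene_col - 1].strip()
        try:
            values[gene] = float(row[data_col - 1])
        except ValueError:
            invalid.append(gene)
    return values, invalid

# Hypothetical tab-delimited file: gene name, condition 1, condition 2.
sample = io.StringIO("dnaA\t1.8\t0.4\nrecA\t2.6\t3.1\nlacZ\tn/a\t0.9\n")
values, invalid = read_expression_column(sample, gene_col=1, data_col=2)
print(values)   # values taken from the second column
print(invalid)  # genes with a non-numeric entry in that column
```

Selecting data_col=3 instead would analyze the second condition from the same file.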

Type of Microarray Data: Choose the type of microarray data to be analyzed, for example ratios, log ratios or probabilities of differential expression.

After selecting and/or typing these parameters, you can proceed with the analysis by pressing the Query button.
Your submitted data and parameters will then be checked for consistency. This includes, for example, checking that the expression data have a valid number format.

IF YOU HAVE NO MICROARRAY DATA AT HAND BUT WOULD LIKE TO GET AN IMPRESSION OF THIS TOOL, FEEL FREE TO DOWNLOAD THE SAMPLE DATA FROM THE TABLE AT THE BOTTOM OF THE START SITE ("Example Input File"). AFTER THAT YOU CAN UPLOAD THEM AS DESCRIBED ABOVE.

Step 2 of 4:

Data Check

The outcome of checking the data and parameters is an overview of the genes on your microarray, specifying the proportion of genes with valid expression data. The proportion of genes which could be mapped to the selected organism is also shown.
If no severe error occurred (e.g. you forgot to submit a microarray file or specified a negative gene column number), you are asked whether to proceed with the GO-based analysis
or to return to the parameter selection and file upload procedure of Step 1 (press the corresponding button in each case).


Step 3 of 4:

Analysis

The GO-based functional analysis has started and will gradually output the results. The analysis can take from one to several minutes depending on the method chosen (threshold dependent methods take less time), the length of your expression matrix, and the current processor load of our server.
After the analysis has finished, a button will appear at the bottom of the table. If you press this button, you will switch to the sorted table view of the GO nodes (Step 4 of 4) with assigned p-values. Several filtering and sorting functions are provided, for example filtering for GO nodes which are below a certain p-value. From this Sorted View you can download the analysis results.


Step 4 of 4:

Table View and Download of Results

After switching to the Sorted Table View, you see a page which is split into three parts:
  1. Analysis Parameters:
    All parameters you selected on the first form (e.g. method used, alternative hypothesis) and the name of the analyzed microarray file are given.
  2. Query form:
    Here you can choose the query terms, sorting and filtering options for the results table (details see below).
  3. Results:
    This is the most important of the three parts, since it presents the results obtained for the analyzed GO nodes (with p-values). Two result representations are available: a) a table view and b) a visualization of the GO subgraph containing the significant nodes.
    a) Table View: By default, significant GO nodes are marked in yellow and shown at the top of the table, because GO nodes are sorted by p-value in ascending order. GO nodes are classified as significant when their p-value is below the corrected level of significance (see Analysis Parameters). The corrected level is the selected alpha error (e.g. 0.05) divided by the number of statistically tested GO nodes (for example about 0.05/1000 = 0.00005 = 5E-5 for E. coli K12).
    b) Subgraph Visualization: Since an important advantage of GO is its organization as a directed acyclic graph, results are also visualized as a subgraph of GO in addition to the table view. The subgraph contains the GO nodes marked in the table ("yellow nodes") and all of their parent nodes up to the root node. So, by default, the significant GO nodes and their paths up to the root are shown.
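The Bonferroni rule described for the table view can be written down in a few lines. This is a sketch with hypothetical node labels and p-values, only meant to make the arithmetic concrete.

```python
def bonferroni_threshold(alpha, n_tests):
    """Corrected level of significance: alpha divided by the number of tests."""
    return alpha / n_tests

def significant_nodes(p_values, alpha=0.05):
    """Return (node, p) pairs below the corrected level, sorted by p-value."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    hits = [(node, p) for node, p in p_values.items() if p < cutoff]
    return sorted(hits, key=lambda item: item[1])

# Hypothetical p-values for four tested GO nodes.
p_values = {"node_a": 0.001, "node_b": 0.04, "node_c": 0.012, "node_d": 0.2}
hits = significant_nodes(p_values, alpha=0.05)  # cutoff is 0.05 / 4 = 0.0125
print(hits)
print(bonferroni_threshold(0.05, 1000))  # the order of magnitude quoted for E. coli K12
```

Note that node_b has p < 0.05 but is not significant after correction, which is the behaviour described above.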

Part 1: Analysis Parameters

Here you find the detailed settings (selected parameters and microarray data set) of your analysis.
Tip: You can bookmark or save the URL of this page. So, in addition to downloading your results, you can also continue your online analysis later. But keep in mind that all data stored on our server will be erased at certain time points (end of each month) in order to free hard disk space.

Part 2: Query Form

In this form you can select parameters for searching or filtering the result table.
Possible scenarios:
  • Highlight only GO nodes whose names contain certain words or a certain phrase (regular expressions and wildcards are supported). For example, search for GO nodes containing "amino acid" or "transcription", sort the GO nodes by p-value or alphabetically, and hide all other GO nodes.
    Here you find detailed examples for such queries.
  • Search for GO Terms with p-values below or above a certain value and sort them by p-value
    e.g. mark significant nodes with p-values below 2.0E-5 for E.coli (see below)
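The two scenarios above correspond to a name filter and a p-value filter. The following sketch shows both on a small in-memory table; the rows, p-values and function name are hypothetical, and the regular expression syntax accepted by the web form may differ from Python's.

```python
import re

# Hypothetical result rows: (GO node name, p-value).
results = [
    ("amino acid metabolism", 3.0e-6),
    ("transcription", 1.5e-4),
    ("hexose metabolism", 8.0e-6),
    ("regulation of transcription", 4.0e-2),
]

def query(rows, pattern=None, max_p=None):
    """Keep rows whose name matches pattern and whose p-value is below max_p
    (either filter may be omitted); sort the result by p-value."""
    regex = re.compile(pattern, re.IGNORECASE) if pattern else None
    kept = [
        (name, p) for name, p in rows
        if (regex is None or regex.search(name))
        and (max_p is None or p < max_p)
    ]
    return sorted(kept, key=lambda row: row[1])

by_name = query(results, pattern=r"amino acid|transcription")
by_p = query(results, max_p=2.0e-5)
print(by_name)
print(by_p)
```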

Part 3a: Result Table


By clicking on the arrowheads in a column header you can sort the table by that column.
Column descriptions (from left to right):
1) The column GO Category denotes the sub-ontology the GO node belongs to, which is either Molecular Function (MF), Biological Process (BP) or Cellular Component (CC).
2) The column GO Accession specifies the official GO Accession Number assigned to this node by the Gene Ontology consortium. Clicking on this number directs you to further information on the selected GO node provided by the AmiGO! web interface.
3) The column GO name specifies the official name of this GO node. Clicking on the icon next to it directs you to a complete list of the genes assigned to this GO node and their expression profile as reflected by the microarray data investigated.
4) The column p-value shows the p-value (alpha error) which was calculated for this node using the selected statistical test.
You can download your results as a tabulator-delimited text file by clicking on the corresponding download link.

Part 3b: GO Subgraph Visualization


The three main sub-ontologies of GO 1) molecular function, 2) biological process and 3) cellular component are marked with three distinct colors:
  1. Molecular Function (MF): nodes are filled in red
  2. Biological Process (BP): nodes are filled in green
  3. Cellular Component (CC): nodes are filled in blue
By default, significant GO nodes have a thicker border than non-significant nodes. In addition, the p-value of a GO node is reflected by its size and brightness: the lower the p-value, the larger the node and the brighter its fill color. The user has several options to customize the appearance of the subgraph. For example, "gradual brightness coding" can be suppressed in favor of "two-color coding" in order to distinguish only between significant and non-significant nodes. Similarly, the "size coding" can be suppressed so that all nodes have the same size. In addition to the subgraph visualization, all nodes shown in the subgraph are listed together with their p-values in a supplementary table (see below).
In the example shown below, the significant GO node hexose metabolism from the sub-ontology biological process (see picture above) is bright green compared to the dark green GO node primary metabolism, which is not significant at this level of significance.

You can download the calculated GO subgraph either as a PDF or as a PNG image:
Press the corresponding button for PDF or for PNG.

Additional Functionalities available from the Sorted Table View (see Step 4):

1) Showing gene expression values and their distributions:

If you click on the icon located to the left of the GO node's name, a new window will open. It contains two main features:

1a) It lists all genes assigned to this GO node directly or indirectly (via its child nodes, according to the "true path rule"). Where available, the corresponding expression values are shown in square brackets. By clicking on a gene name you will be redirected to the corresponding entry of the web-accessible PRODORIC database, which provides comprehensive information on this gene.
Example:


1b) The distributions of expression values are shown as histograms a) for the genes belonging to the current GO node and b) for all remaining genes in the genome. So you can visually inspect and compare the distributions of expression values. Below the first two histograms you find two more. Broadly speaking, these are zooms into the two distributions, achieved by masking the 10% most extreme expression values. The resolution of the X axis is therefore better (larger scale), and the two distributions can be inspected and compared visually much more easily.
Example:
Number of genes, that are annotated to the GO Node: 34 out of it 32 measured
Number of genes NOT annotated to the GO node: 4256 out of it 2777 measured

Zoom into Distributions by masking the 10.0% genes with the highest expression.

Genes within Node
The most extreme 10.0% of the values were excluded for better visual representation (higher magnification on X-Axis) [only 29 considered]
Genes outside of the Node (Environment)
The most extreme 10.0% of the values were excluded for better visual representation (higher magnification on X-Axis) [only 2500 considered]
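The masking step can be sketched as follows. Two assumptions are made here that the text does not settle: "most extreme" is taken to mean farthest from the median, and the number of masked values is rounded down. The rounding assumption is at least consistent with the counts in the example above (32 measured, 29 considered; 2777 measured, 2500 considered). The function name is hypothetical.

```python
from statistics import median

def mask_extremes(values, fraction=0.10):
    """Drop the given fraction of values lying farthest from the median
    (number of dropped values rounded down) and return the rest, sorted."""
    n_drop = int(len(values) * fraction)
    center = median(values)
    by_distance = sorted(values, key=lambda v: abs(v - center))
    return sorted(by_distance[: len(values) - n_drop])

# Hypothetical expression values with one strong outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
zoomed = mask_extremes(values)
print(zoomed)  # the outlier is masked, so a histogram spans a narrower range

# Counts only: how many values remain for 32 and 2777 measured genes.
print(len(mask_extremes(list(range(32)))), len(mask_extremes(list(range(2777)))))
```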

2) Show GO subgraph from current node back to the root node:

If you click on the accession number in the GO Accession column of the result table, you will be forwarded to the AmiGO! web interface. The current node (represented by the GO accession number) will be automatically selected as the seed for calculating a subgraph back to the root node ("all"). AmiGO represents the subgraph as a denormalized tree, with each branch representing one path up to the root node.

Example: Node: Cellular Respiration (GO:0045333)
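What "a denormalized tree with one branch per path to the root" means can be shown on a tiny directed acyclic graph. The node labels and the parent table below are hypothetical and do not reflect the real GO structure; the function is a sketch, not AmiGO's algorithm.

```python
def paths_to_root(node, parents):
    """All paths from node up to the root in a DAG given as
    {child: [parent, ...]}; a node without parents is the root."""
    if not parents.get(node):
        return [[node]]
    return [
        [node] + path
        for parent in parents[node]
        for path in paths_to_root(parent, parents)
    ]

# Hypothetical DAG: D has two parents, which share the root "all".
parents = {"D": ["B", "C"], "B": ["all"], "C": ["all"], "all": []}

paths = paths_to_root("D", parents)
subgraph_nodes = {n for path in paths for n in path}
print(paths)           # one branch per path, shared ancestors are repeated
print(subgraph_nodes)  # the same information as a set of subgraph nodes
```

Because a GO node may have several parents, the root appears once per path in the tree representation but only once in the subgraph.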

Additional Functionalities available from the Extended GO Node View (see above):

If you click on the name of a gene in the table which lists all genes assigned to the selected GO node, you will be redirected to the corresponding entry in the web interface of PRODORIC, a database on gene regulation, signal transduction and metabolic pathways in prokaryotes.
Example (atpF chosen):