GO-MAC: Gene Ontology based MicroArray Clustering
The story starts with DNA. Human and animal DNA contains tens of thousands of genes, and different genes encode different proteins. If we know the function of each protein, we can try to control the activity (expression) of the corresponding gene, which puts the production of that protein under our control, and we can use that to fight diseases such as stroke, heart disease and cancer. Yes, this is a big topic in Bio-Informatics.
Using MicroArray technology, DNA probes for thousands of genes are spotted onto a small chip, one gene per grid cell. The cells under study are exposed to a changing environment, for example a rising temperature. As the temperature increases, some genes become more active (more highly expressed), so more of their protein is produced, while the expression of other genes stays the same because those genes and their proteins have nothing to do with temperature. Samples taken under each condition are applied to the chip, we record the expression level of every gene, and we study the shape of each gene's changing curve. From that we can learn (a little bit about) the function of the gene when the environment changes.
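To make "study the pattern of the changing curve" a bit more concrete, here is a minimal sketch. The gene names, the expression values and the two-fold cut-off are all hypothetical; the log2 ratio against the first condition is just one common way to express how much a gene changes.

```python
import math

# Hypothetical expression levels (arbitrary units) for three genes,
# measured at three temperatures: 37.0 C, 37.5 C and 38.0 C.
expression = {
    "geneA": [100.0, 200.0, 400.0],  # rises with temperature
    "geneB": [150.0, 148.0, 152.0],  # roughly flat
    "geneC": [300.0, 150.0, 75.0],   # falls with temperature
}

def log2_fold_changes(levels):
    """Log2 ratio of each measurement to the first (baseline) one."""
    baseline = levels[0]
    return [math.log2(x / baseline) for x in levels]

def responds_to_temperature(levels, threshold=1.0):
    """True if any condition differs from the baseline by at least
    `threshold` on the log2 scale (1.0 means a two-fold change).
    The threshold is an illustrative choice, not a standard."""
    return any(abs(fc) >= threshold for fc in log2_fold_changes(levels))

for gene, levels in expression.items():
    print(gene, log2_fold_changes(levels), responds_to_temperature(levels))
```

With these made-up numbers, geneA and geneC are flagged as temperature-responsive and geneB is not.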
The creation of MicroArray technology was a revolutionary event: it enables high-throughput study of gene function and greatly increases research productivity. Earlier methods only allowed us to study 2-3 genes at a time, compared to thousands of genes on a single MicroArray.
After the lab experiment, we have the expression level of every gene under every condition. For example, measuring 6,000 genes at 37°C, 37.5°C and 38°C gives 6,000 × 3 = 18,000 numbers to study. The traditional way to analyze them is to cluster the genes by their expression profiles using k-means or hierarchical clustering. After clustering, genes with similar changing curves end up in the same group, and if we know the function of ONE gene in the group, we can hypothesize that the other genes have a similar function.
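The traditional clustering step can be sketched in plain Python as below. This is a generic k-means (Lloyd's algorithm), not the GO-MAC method. Two choices here are assumptions of the sketch: each profile is standardized (mean 0, standard deviation 1) so that genes are grouped by the shape of their curve rather than by raw magnitude, and the initial centroids are picked deterministically by farthest-first traversal. The gene names and values are hypothetical.

```python
import math

def standardize(profile):
    """Rescale a profile to mean 0 and standard deviation 1 so that
    genes are compared by the shape of their curve, not by magnitude."""
    mean = sum(profile) / len(profile)
    std = math.sqrt(sum((x - mean) ** 2 for x in profile) / len(profile))
    if std == 0:
        return [0.0 for _ in profile]  # a perfectly flat curve
    return [(x - mean) / std for x in profile]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(profiles, k, iterations=100):
    """Lloyd's k-means on standardized expression profiles.

    profiles: dict gene name -> expression values (one per condition).
    Returns dict gene name -> cluster index in range(k).
    Centroids are seeded deterministically by farthest-first traversal.
    """
    data = {g: standardize(p) for g, p in profiles.items()}
    names = list(data)
    centroids = [data[names[0]]]
    while len(centroids) < k:
        # Next seed: the gene farthest from its nearest existing centroid.
        far = max(names, key=lambda g: min(euclidean(data[g], c) for c in centroids))
        centroids.append(data[far])
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each gene joins its nearest centroid.
        new = {g: min(range(k), key=lambda c: euclidean(data[g], centroids[c]))
               for g in names}
        if new == assignment:
            break  # converged: no gene changed cluster
        assignment = new
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [data[g] for g in names if assignment[g] == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return assignment

# Hypothetical profiles at 37.0, 37.5 and 38.0 C.
profiles = {
    "up1": [1.0, 2.0, 3.0],
    "up2": [10.0, 20.0, 30.0],
    "down1": [3.0, 2.0, 1.0],
    "down2": [30.0, 20.0, 10.0],
    "flat": [5.0, 5.0, 5.0],
}
clusters = kmeans(profiles, k=3)
print(clusters)  # the two rising genes share a cluster, as do the two falling ones
```

Note that the algorithm only ever sees the numbers: up1 and up2 land together because their curves have the same shape, whatever the genes actually do. Pearson correlation is a common alternative to standardized Euclidean distance for this purpose.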
The disadvantage of this method is that k-means and hierarchical clustering treat the genes purely as numbers. The program knows nothing about what the genes are or what they do. In other words, the biological meaning is lost.
Yes, BIOLOGICAL MEANING. You can see mist above both a cup of hot water and a cup of ice cream, but the two are very different!
That's why I adopted the Gene Ontology in the study of MicroArray data. What is an Ontology? Basically, an Ontology is a dictionary of a knowledge domain: a controlled vocabulary of terms together with the relationships between them. For example, an Ontology of a software company can be defined as:
____
The Gene Ontology (GO) is defined by biologists around the world. I can give a sample of the Gene Ontology here:
_____
Before applying the Gene Ontology, we annotate genes using GO terms. ____ is annotated as _____, ______, ____. From the annotation alone we already have a basic idea of what this gene does. There are automatic annotation tools that extract terms from existing theses, papers and databases all over the world to annotate the genes we study. We are not starting from nothing: we use all the knowledge humans already have.
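Once genes carry GO annotations, we can compare them by meaning instead of by numbers. The sketch below shows one simple way to do that; it is an illustration, not necessarily how GO-MAC measures similarity. It relies on two facts about GO: the ontology is a directed acyclic graph (a term can have several parents), and a gene annotated to a term is implicitly annotated to all of that term's ancestors. The ontology fragment, its edges, the gene names and their annotations are all hypothetical, and Jaccard similarity is a swapped-in choice; more refined measures exist.

```python
# Hypothetical, hand-made ontology fragment: each term maps to its parent terms.
# The edges are illustrative only; check real relations against GO itself.
PARENTS = {
    "biological_process": [],
    "response to stimulus": ["biological_process"],
    "metabolic process": ["biological_process"],
    "protein folding": ["biological_process"],
    "response to temperature stimulus": ["response to stimulus"],
    "response to heat": ["response to temperature stimulus"],
    "response to cold": ["response to temperature stimulus"],
}

# Hypothetical annotations: gene -> directly assigned terms.
ANNOTATIONS = {
    "geneA": {"response to heat", "protein folding"},
    "geneB": {"response to cold"},
    "geneC": {"metabolic process"},
}

def with_ancestors(terms):
    """Return the terms plus every ancestor of each term."""
    result = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(PARENTS[term])
    return result

def go_similarity(gene1, gene2):
    """Jaccard index of the two genes' ancestor-expanded term sets."""
    t1 = with_ancestors(ANNOTATIONS[gene1])
    t2 = with_ancestors(ANNOTATIONS[gene2])
    return len(t1 & t2) / len(t1 | t2)

print(go_similarity("geneA", "geneB"))  # share "response to temperature stimulus"
print(go_similarity("geneA", "geneC"))  # share only the root term
```

Expanding to ancestors matters: geneA ("response to heat") and geneB ("response to cold") have no direct term in common, but both are temperature responses, and the expanded sets reveal that.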