Like books in a library, genes store information. Each gene is a book containing the information required to make a protein, or in some cases a non-coding RNA. In the same way that books may be taken off a shelf and read, genes are expressed to produce functional RNA and protein molecules in the cell.
Gene expression occurs in two major stages (see Figure). The first is transcription. In this process, the gene is copied to produce an RNA molecule (a primary transcript) with essentially the same sequence as the gene, except that RNA contains uracil where DNA contains thymine. Most human genes are divided into exons and introns, and only the exons carry information required for protein synthesis (see Gene structure). Most primary transcripts are therefore processed by splicing, which removes the intron sequences and generates a mature transcript, or messenger RNA (mRNA), that contains only exons.
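The two steps described above, copying and splicing, can be illustrated with a minimal Python sketch. The toy gene sequence, the exon coordinates, and the function names `transcribe` and `splice` are all invented for illustration; real genes are far longer, and real splicing is carried out by cellular machinery that recognizes sequence signals rather than fixed coordinates.

```python
def transcribe(coding_strand: str) -> str:
    """Copy the coding strand of a gene into RNA: same sequence, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def splice(primary_transcript: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon segments, given as (start, end) half-open index pairs."""
    return "".join(primary_transcript[start:end] for start, end in exons)


# Hypothetical toy gene: exon 1, one intron, exon 2.
exon_1 = "ATGGCC"
intron = "GTAAGTTTTCAG"
exon_2 = "TTTTAA"
gene = exon_1 + intron + exon_2

# Exon positions within the gene (assumed known for this sketch).
exon_positions = [(0, 6), (18, 24)]

primary_transcript = transcribe(gene)
mrna = splice(primary_transcript, exon_positions)

print(primary_transcript)  # AUGGCCGUAAGUUUUCAGUUUUAA
print(mrna)                # AUGGCCUUUUAA
```

The primary transcript still carries the intron, whereas the mature mRNA contains only the two exons joined end to end.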
The second stage is protein synthesis. This stage is also known as translation, so called because the information in the mRNA must be converted from one chemical language (a sequence of nucleotides) into another (a sequence of amino acids). There is no one-to-one correspondence between nucleotides and amino acids: three nucleotides, together called a codon, are required to specify one amino acid. The resulting chain of amino acids must then fold up to generate the final tertiary structure of the protein (see Protein structure).
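The three-nucleotides-per-amino-acid rule can be sketched in Python as follows. Only a handful of codons from the standard genetic code are included, to keep the example short; the function name `translate` and the example mRNA are assumptions made for illustration, and the sketch ignores details such as locating the start codon.

```python
# A small excerpt of the standard genetic code (the full table has 64 codons).
# None marks a stop codon, which ends protein synthesis.
CODON_TABLE = {
    "AUG": "Met",
    "GCC": "Ala",
    "UUU": "Phe",
    "GGA": "Gly",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def translate(mrna: str) -> list[str]:
    """Read an mRNA three nucleotides at a time and return the amino acid chain."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"codon {codon!r} is not in this excerpt of the table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop codon reached
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGGCCUUUUAA"))  # ['Met', 'Ala', 'Phe']
```

Twelve nucleotides yield only three amino acids here: nine nucleotides encode the chain and the final three form a stop codon.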
Not all the books in a library are equally popular. Some will be well thumbed and others consulted only rarely. Similarly, not all genes in the human genome are expressed in the same way. Some are expressed in all cells all of the time. These so-called housekeeping genes are essential for very basic cellular functions.
Other genes are expressed only in particular cell types or at particular stages of development. For example, the genes that encode the muscle-specific forms of proteins such as actin and myosin are expressed in muscle cells but not in the cells of the brain. Still other genes can be activated or inhibited by signals circulating in the body, such as hormones.
This differential gene expression is achieved by regulating transcription and translation. All genes are surrounded by DNA sequences that control their expression. Proteins called transcription factors bind to these sequences and can switch the genes on or off. Gene expression is therefore controlled by the availability and activity of different transcription factors.
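The on/off logic described above can be sketched as a toy model in Python. This is a deliberate simplification, assuming that a gene is expressed when all of its activating factors are present and none of its repressing factors are; real regulation is quantitative and combinatorial. The `Gene` class, the `is_expressed` function, and every gene and factor name below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    activators: frozenset = frozenset()  # factors that must all be present
    repressors: frozenset = frozenset()  # factors that switch the gene off


def is_expressed(gene: Gene, active_factors: set) -> bool:
    """Toy rule: on if every activator is present and no repressor is."""
    has_activators = gene.activators <= active_factors
    has_repressor = bool(gene.repressors & active_factors)
    return has_activators and not has_repressor


# Hypothetical genes and the transcription factors available in two cell types.
housekeeping_gene = Gene("housekeeping_gene")
muscle_gene = Gene("muscle_gene", activators=frozenset({"muscle_tf"}))

muscle_cell_factors = {"general_tf", "muscle_tf"}
brain_cell_factors = {"general_tf", "neural_tf"}

for gene in (housekeeping_gene, muscle_gene):
    print(gene.name,
          is_expressed(gene, muscle_cell_factors),
          is_expressed(gene, brain_cell_factors))
```

In this model the housekeeping gene is on in both cell types, while the muscle gene is on only where its activating factor is available, which mirrors the idea that expression depends on the availability of transcription factors.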
As transcription factors are proteins themselves, they must also be produced by genes, and these genes must in turn be regulated by other transcription factors. In this way, all genes and proteins can be linked into a regulatory hierarchy starting with the transcription factors present in the egg at the beginning of development. A number of human diseases are known to result from the absence or malfunction of transcription factors and the resulting disruption of gene expression.
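The idea of a regulatory hierarchy can be sketched as a simple cascade in Python. The model assumes that each gene product appears once all the factors it requires are present, and it repeats this step until nothing new switches on. The function name `propagate` and all factor and product names are hypothetical; the sketch ignores repression, timing, and dosage.

```python
def propagate(initial_factors: set, requirements: dict) -> set:
    """Return every product that is eventually made, starting from initial_factors.

    requirements maps each product to the set of factors needed to switch on its gene.
    """
    active = set(initial_factors)
    changed = True
    while changed:
        changed = False
        for product, needed in requirements.items():
            if product not in active and needed <= active:
                active.add(product)
                changed = True
    return active


# Hypothetical hierarchy: a factor present in the egg switches on tf_1,
# which switches on tf_2, and both are needed for a structural protein.
requirements = {
    "tf_1": {"maternal_tf"},
    "tf_2": {"tf_1"},
    "structural_protein": {"tf_1", "tf_2"},
}

normal = propagate({"maternal_tf"}, requirements)

# Simulate the absence of tf_1 by removing its gene from the model.
without_tf_1 = {p: n for p, n in requirements.items() if p != "tf_1"}
disrupted = propagate({"maternal_tf"}, without_tf_1)

print(sorted(normal))     # ['maternal_tf', 'structural_protein', 'tf_1', 'tf_2']
print(sorted(disrupted))  # ['maternal_tf']
```

Removing a single factor near the top of the hierarchy prevents everything downstream from being made, which illustrates how the loss of one transcription factor can disrupt the expression of many genes.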