Deoxyribonucleic acid (DNA) is the material that carries genetic information in all organisms, except for some families of viruses that use ribonucleic acid (RNA). The set of DNA molecules that contains all genetic information for an organism is called its genome. DNA is found primarily in the nuclei of eukaryotic cells and in the nucleoid of bacteria. Small amounts of DNA are also found in mitochondria and chloroplasts and in autonomously maintained DNAs called plasmids. See Nucleic acid
DNA is composed of two long polymer strands of the sugar 2-deoxyribose, phosphate, and purine and pyrimidine bases. The backbone of each strand is composed of alternating 2-deoxyribose and phosphate linked together through phosphodiester bonds. A DNA strand has directionality; each phosphate is linked to the 3′ position of the preceding deoxyribose and to the 5′ position of the following deoxyribose (Fig. 1). The four bases found in DNA are adenine, thymine, guanine, and cytosine. Each 2-deoxyribose is linked to one of the four bases via a covalent glycosidic bond, forming a nucleotide. The sequence of these four bases allows DNA to carry genetic information. Bases can form hydrogen bonds with each other. Adenine forms two hydrogen bonds with thymine, and cytosine forms three hydrogen bonds with guanine. These two sets of base pairs have the same geometry, allowing DNA to maintain the same structure regardless of the specific sequence of base pairs. See Deoxyribose, Purine, Pyrimidine
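The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the names `PARTNER`, `HYDROGEN_BONDS`, and `count_hydrogen_bonds` are hypothetical, and the function assumes a strand paired along its full length with a perfectly complementary partner.

```python
# Pairing partner of each base, and hydrogen bonds per base pair, as stated in the text.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # A-T pairs: 2 bonds; G-C pairs: 3 bonds


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds between `strand` and a fully complementary partner strand."""
    total = 0
    for base in strand.upper():
        if base not in PARTNER:
            raise ValueError(f"not a DNA base: {base!r}")
        total += HYDROGEN_BONDS[base]
    return total


print(count_hydrogen_bonds("ATGC"))  # 2 + 2 + 3 + 3 = 10
```

A sequence rich in guanine and cytosine therefore has more hydrogen bonds per base pair than one rich in adenine and thymine.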
DNA is composed of two strands that wrap around each other to form a double helix. The two strands are held together by base pairing and are antiparallel. Thus if one strand is oriented in the 5′ to 3′ direction, the other strand will be 3′ to 5′. This double-helical structure of DNA was first proposed in 1953 by J. D. Watson and F. H. C. Crick. The most common form of DNA is the B-form, which is a right-handed double helix with 10.4 base pairs per turn. Less common forms of DNA include A-form, which is a right-handed double helix that has 11 base pairs per turn and a wider diameter than B-form, and Z-form, which is a narrow, irregular left-handed double helix.
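Because the strands are antiparallel, the partner of a strand written 5′ to 3′ is its reverse complement when it too is written 5′ to 3′. The following minimal sketch shows this, along with the number of helical turns implied by the B-form figure of 10.4 base pairs per turn. The function names are hypothetical, and the helix calculation assumes ideal B-form DNA along the whole length.

```python
# Translation table mapping each base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'."""
    complemented = strand_5_to_3.upper().translate(COMPLEMENT)
    return complemented[::-1]  # reverse, because the partner runs antiparallel


def helical_turns(n_base_pairs: int, bp_per_turn: float = 10.4) -> float:
    """Approximate number of helical turns, assuming ideal B-form DNA."""
    return n_base_pairs / bp_per_turn


print(reverse_complement("ATGCC"))   # GGCAT
print(round(helical_turns(104), 2))  # 10.0
```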
For cells to live and grow, the genetic information in DNA must be (1) propagated and maintained from generation to generation, and (2) expressed to synthesize the components of a cell. These two functions are carried out by the processes of DNA replication and transcription, respectively. See Genetic code
Each of the two strands of a DNA double helix contains all of the information necessary to make a new double-stranded molecule (Fig. 2). During replication the two parental strands are separated, and each is used as a template for the synthesis of a new strand of DNA. Synthesis of the nascent DNA strands is carried out by a family of enzymes called DNA polymerases. Base incorporation is directed by the existing DNA strand; nucleotides that base-pair with the template are added to the nascent DNA strand. The product of replication is two complete double-stranded helices, each of which contains all of the genetic information (has the identical base sequence) of the parental DNA. Each progeny double helix is composed of one parental and one nascent strand. DNA replication is very accurate. In bacteria the mutation rate is about 1 error per 1000 bacteria per generation, or about 1 error in 10⁹ base pairs replicated. This low error rate is due to a combination of the high accuracy of the replication process and cellular pathways that repair misincorporated bases. See Mutation
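The template-directed copying described above can be sketched as follows. This is a simplified model that assumes error-free copying and ignores the enzymology entirely; the duplex representation (a pair of strands, each written 5′ to 3′) and the function names are hypothetical choices made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Strand that base-pairs with `strand`, written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def replicate(duplex):
    """Return two daughter duplexes, each with one parental and one nascent strand."""
    parental_top, parental_bottom = duplex
    nascent_bottom = reverse_complement(parental_top)   # copied from the top template
    nascent_top = reverse_complement(parental_bottom)   # copied from the bottom template
    daughter_1 = (parental_top, nascent_bottom)
    daughter_2 = (nascent_top, parental_bottom)
    return daughter_1, daughter_2


top = "ATGGCA"
parent = (top, reverse_complement(top))
daughter_1, daughter_2 = replicate(parent)
print(daughter_1 == parent and daughter_2 == parent)  # True
```

Both daughters carry the parental sequence, and each keeps exactly one of the original strands.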
In transcription, DNA acts as a template directing the synthesis of RNA. RNA is a single-stranded polymer similar to DNA except that it contains the sugar ribose instead of 2-deoxyribose and the base uracil instead of thymine. The two strands of DNA separate transiently, and one of the two single-stranded regions is used as a template to direct the synthesis of an RNA strand. As in DNA replication, base pairing between the incoming ribonucleotide and the template strand determines the sequence of bases incorporated into the nascent RNA. Thus, genetic information in the form of a specific sequence of bases is directly transferred from DNA to RNA in transcription. After the RNA is synthesized, the DNA reverts to double-stranded form. Transcription is carried out by a family of enzymes called RNA polymerases. Following transcription, newly synthesized RNA is often processed prior to being used to direct protein synthesis by ribosomes in a process called translation. See Protein, Ribonucleic acid (RNA), Ribosomes
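As a rough illustration of template-directed RNA synthesis, the sketch below builds the RNA that pairs with a DNA template strand, using uracil wherever the template contains adenine. The function name `transcribe` and the convention of writing both template and product 5′ to 3′ are assumptions made for this example; promoters, termination, and processing are not modeled.

```python
# Ribonucleotide that pairs with each template base (uracil pairs with adenine).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """RNA complementary to the DNA template strand, written 5' to 3'."""
    # The RNA is antiparallel to its template, so read the template from its 3' end.
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5_to_3.upper()))


print(transcribe("GGCAT"))  # AUGCC
```

The product has the same sequence as the non-template DNA strand (here ATGCC), with uracil in place of thymine.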
There is a great deal of variation in the DNA content and sequences in different organisms. Because of base pairing, the ratios of adenine to thymine and of cytosine to guanine are always 1:1 in double-stranded DNA. However, the proportion of adenine plus thymine relative to guanine plus cytosine differs between organisms, with the adenine-thymine share of the bases ranging from about 25 to 75%. There is also large variation in the amount of DNA in the genome of various organisms. The simplest viruses have genomes of only a few thousand base pairs, while complex eukaryotic organisms have genomes of billions of base pairs. This variation partially reflects the increasing number of genes necessary to encode more complex organisms, but mainly reflects an increase in the amount of DNA that does not encode proteins. A large percentage of the DNA in multicellular eukaryotes is in introns (noncoding sequences within genes) or is repetitive DNA (sequences that are repeated many times). In most eukaryotes the DNA sequences that encode proteins (known as exons) are not continuous but have introns interspersed within them. The initial transcript synthesized by RNA polymerase contains both exons and introns and can be many times the length of the actual coding sequence. The RNA is then processed and the introns are removed through a mechanism called RNA splicing to yield messenger RNA (mRNA), which is translated to make protein.
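Two of the ideas in this paragraph, base composition and splicing, lend themselves to a short sketch. The function names and the toy transcript are hypothetical, and the exon coordinates are simply given as input; real splicing locates intron boundaries from sequence signals, which is not modeled here.

```python
def at_fraction(strand: str) -> float:
    """Fraction of bases that are adenine or thymine."""
    strand = strand.upper()
    return sum(base in "AT" for base in strand) / len(strand)


def splice(pre_mrna: str, exons) -> str:
    """Join the exon segments of a transcript, discarding everything in between.

    `exons` is a list of (start, end) positions, 0-based with `end` excluded.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


print(at_fraction("ATGC"))  # 0.5

# Toy transcript: exon (6 bases) + intron (12 bases) + exon (6 bases).
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
print(splice(pre_mrna, [(0, 6), (18, 24)]))  # AUGGCCUUUUAA
```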
Techniques have been developed to allow DNA to be manipulated in the laboratory. These techniques have led to a revolution in biotechnology. This revolution began when methods were developed to cleave DNA at specific sequences and to join pieces of DNA together. Another major component of this technology is the ability to determine the sequence of the bases in DNA. There are two general approaches for determining DNA sequence. Either chemical reactions are carried out that specifically cleave the sugar-phosphate bond at sites that contain a certain base, or DNA is synthesized in the presence of modified nucleotides that cause termination of synthesis after the incorporation of a certain base. These methods can now be automated so that it is practical to determine the DNA sequences of the entire genome of an organism. Currently, the complete sequences of several bacterial and fungal genomes are known, drafts exist for the complete mouse and rat genomes, and 99% of the gene-containing part of the human sequence has been determined. See Human Genome Project
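The logic of the second approach, termination of synthesis after a particular base, can be sketched as follows. This is an idealized model with hypothetical function names: it assumes that every possible termination product is recovered and that fragment lengths are measured exactly, which real experiments only approximate.

```python
def termination_lengths(new_strand: str) -> dict:
    """For each base, the lengths of the fragments that end with that base.

    Models four reactions, each terminating synthesis after one kind of base.
    """
    lengths = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand.upper(), start=1):
        lengths[base].append(position)
    return lengths


def read_sequence(lengths: dict) -> str:
    """Reconstruct the sequence by ordering all fragments from shortest to longest."""
    base_at_length = {n: base for base, ns in lengths.items() for n in ns}
    return "".join(base_at_length[n] for n in sorted(base_at_length))


fragments = termination_lengths("GATTACA")
print(fragments["A"])            # [2, 5, 7]
print(read_sequence(fragments))  # GATTACA
```

Each fragment length identifies the base at that position, so sorting the fragments by length recovers the sequence of the synthesized strand.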
An organism's full genome must be substantially compacted to fit into a cell. For example, the full human genome has a total length of about 3 m (10 ft). This DNA must fit into a nucleus with a diameter of 10⁻⁵ m. This immense reduction in length is accomplished in eukaryotes via multiple levels of compaction in a nucleoprotein structure termed chromatin. The first level involves spooling about 200 base pairs of DNA onto a complex of basic proteins called histones to form a nucleosome. Nucleosomes are connected like beads on a string (Fig. 3) to form a 10-nanometer-diameter fiber, and this is further coiled to form a 30-nm fiber. The 30-nm fibers are further coiled and organized into loops formed by periodic attachments to a protein scaffold. This scaffold organizes the complex into the shape of the metaphase chromosome seen at mitosis. See Nucleoprotein
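The scale of the problem can be made concrete with a little arithmetic on the figures above. In this sketch the function names are hypothetical, and the genome size of 3 billion base pairs passed to `nucleosome_count` is an illustrative assumption, not a figure from the text. Dividing DNA length by nuclear diameter gives only a rough measure of linear compaction.

```python
def compaction_ratio(dna_length_m: float, nucleus_diameter_m: float) -> float:
    """How many times longer the DNA is than the nucleus is wide."""
    return dna_length_m / nucleus_diameter_m


def nucleosome_count(genome_bp: int, bp_per_nucleosome: int = 200) -> int:
    """Approximate number of nucleosomes if all DNA is spooled at ~200 bp each."""
    return genome_bp // bp_per_nucleosome


ratio = compaction_ratio(3.0, 1e-5)  # 3 m of DNA, nucleus 10^-5 m across
print(f"about {ratio:.0f}-fold")     # about 300000-fold

# Assumed genome size of 3 billion base pairs, for illustration only.
print(nucleosome_count(3_000_000_000))  # 15000000
```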
The nucleosome is the fundamental structural unit of DNA in all eukaryotes. Nucleosomes reduce the accessibility of the DNA to DNA-binding proteins such as polymerases and other protein factors essential for transcription and replication. Consequently, nucleosomes tend to act as general repressors of transcription. See Nucleosome