The Genetic Code, Part I

You may be familiar with the cryptogram puzzles that appear in the comics sections of newspapers. They take a famous or amusing quote and replace every letter in it with a different letter. For example, all the A’s may be replaced with W’s, all the V’s with P’s, and so on. The substitutions are always consistent, so the puzzle is solved simply by working out which letter stands for which. This code is very simple, because each symbol is swapped for exactly one other symbol, and both come from the same alphabet. Unfortunately, the code needed to direct protein synthesis has to be more complex than that.
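To make the contrast concrete, here is a minimal Python sketch of such a one-for-one substitution. The key `CIPHER` (letters in keyboard order) is an arbitrary, hypothetical choice; any rearrangement of the alphabet would work the same way. Encoding and solving are both single-letter lookups within one alphabet.

```python
import string

# Hypothetical key: any one-to-one rearrangement of the 26 letters would do.
PLAIN = string.ascii_uppercase
CIPHER = "QWERTYUIOPASDFGHJKLZXCVBNM"

ENCODE = str.maketrans(PLAIN, CIPHER)
DECODE = str.maketrans(CIPHER, PLAIN)  # solving the puzzle = inverting the table

def encode(text: str) -> str:
    """Replace every letter consistently; spaces and punctuation pass through."""
    return text.upper().translate(ENCODE)

def decode(text: str) -> str:
    """Undo the substitution by looking each letter up in the inverse table."""
    return text.translate(DECODE)

secret = encode("Old men are fun")
print(secret)          # GSR DTF QKT YXF
print(decode(secret))  # OLD MEN ARE FUN
```

Because each letter maps to exactly one other letter, the whole "code" is just a 26-entry table.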

The first reason our code needs to be more complex is that DNA and proteins are very different kinds of molecules. Even if we could imagine a code that establishes a relationship between the bases in a nucleic acid and amino-acids, how could this correspondence work at the molecular level? Is there some molecular mechanism that could produce a specific interaction between a particular amino-acid and one or more particular nucleic acid bases?

The fact that proteins and nucleic acids have such different biochemical properties suggests that a direct molecular correspondence is unlikely. What we need is a translator that speaks the language of both and can pass messages between nucleic acids and proteins. Francis Crick was the first to propose this solution.

Crick suggested that there must be a molecule with two functionally different ends. At one end there must be a mechanism for attaching a specific amino-acid, and at the other end a mechanism for interacting with a specific sequence of nucleotide bases. Crick was indeed correct: such molecules exist. They are small, highly specialized strings of RNA called “transfer RNA” (tRNA).

Three-Letter Words

The second reason the genetic code needs to be more complex than a newspaper puzzle is that there are 20 kinds of amino-acids, but only four kinds of bases in nucleic acids. We have A, G, T and C, and that’s it. A simple one-for-one substitution could therefore specify only four different amino-acids, yet there are 20 of them to account for.

This suggests that a sequence of more than one nucleotide must code for a single amino-acid. It would be like combining the bases of a nucleic acid into code-words, where each base is a single letter. How many letters long must each word be?

Imagine that our code-words were two letters long. That would not be enough to code for all the amino-acids, and the reason is simple math. With four letters, each position in a two-letter word can be filled in four ways, so we can make 4 squared, or sixteen, different words: fewer than the 20 we need. With three-letter words, we get 4 cubed, or 64 combinations, which is more than enough. In fact, because 64 is so much larger than 20, a three-letter code suggests that the code has some redundancy, with more than one code-word for at least some amino-acids.
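The counting argument is easy to check by brute force. This Python sketch simply enumerates every possible code-word of a given length from the four bases; the names `words_of_length` and `AMINO_ACIDS_NEEDED` are hypothetical, chosen for illustration.

```python
from itertools import product

BASES = "AGTC"            # the four DNA bases
AMINO_ACIDS_NEEDED = 20

def words_of_length(n: int) -> list[str]:
    """All possible n-letter code-words built from the four bases (repeats allowed)."""
    return ["".join(letters) for letters in product(BASES, repeat=n)]

for n in (1, 2, 3):
    count = len(words_of_length(n))          # equals 4 ** n
    verdict = "enough" if count >= AMINO_ACIDS_NEEDED else "too few"
    print(f"{n}-letter words: {count} ({verdict})")
# 1-letter words: 4 (too few)
# 2-letter words: 16 (too few)
# 3-letter words: 64 (enough)

# Shortest word length that yields at least 20 distinct code-words.
shortest = next(n for n in range(1, 10) if len(BASES) ** n >= AMINO_ACIDS_NEEDED)
print("shortest workable length:", shortest)                                      # 3
print("spare three-letter words:", len(words_of_length(3)) - AMINO_ACIDS_NEEDED)  # 44
```

The 44 spare three-letter words are where the room for redundancy comes from.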

The logic behind a three-letter code seems pretty obvious, but on its own it only provides a testable hypothesis, which then had to be demonstrated. Once again, it was Francis Crick, together with colleagues, who showed experimentally that the genetic code must involve sequences of three bases. They used a technique that caused a very particular kind of mutation in the DNA of a virus: the removal of just one base-pair from the DNA double helix, or alternatively the addition of just one base-pair. By applying this treatment in the appropriate fashion, they could be confident that either one base-pair had been deleted or one had been added.

Consider the simple sentence “Old men are fun”, written without spaces as a single string of characters: “oldmenarefun”. The sentence is made of four words, each three letters long. If we delete the first letter and try to read the string three letters at a time, we get “ldmenarefun”. If we delete the first two letters, we get “dmenarefun”. Neither string makes any sense, because we have shifted the place where we start reading by one or two positions, and everything that follows becomes nonsense.

Now, if we delete three letters, the words make sense again: “menarefun”. It is not the sentence we started with, but that is not the point. The point is that, because the words are three letters long, the rest of the string only makes sense again if we remove three letters. The same works in reverse: if we add three letters, the part of the string after the insertion still makes sense.
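The same experiment can be run in a few lines of Python. The helper `read_in_threes` is a hypothetical name; it just chops a string into consecutive three-letter words starting from its first character, and `"xyz"` stands in for three arbitrary inserted letters.

```python
def read_in_threes(text: str) -> list[str]:
    """Split a string into consecutive three-letter words from position 0.
    A leftover fragment shorter than three letters is kept so it stays visible."""
    return [text[i:i + 3] for i in range(0, len(text), 3)]

sentence = "oldmenarefun"

for deleted in (0, 1, 2, 3):
    print(deleted, read_in_threes(sentence[deleted:]))
# 0 ['old', 'men', 'are', 'fun']
# 1 ['ldm', 'ena', 'ref', 'un']
# 2 ['dme', 'nar', 'efu', 'n']
# 3 ['men', 'are', 'fun']

# Adding three letters at the front leaves every following word readable.
print("+3", read_in_threes("xyz" + sentence))
# +3 ['xyz', 'old', 'men', 'are', 'fun']
```

Only shifts by a multiple of three keep the reader in step with the original word boundaries.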

Crick and his colleagues applied the same logic, deleting or adding single base-pairs in a viral gene, to show that there must be a three-letter code. If they introduced one such mutation in a gene, then all of the amino-acids coded downstream of the mutation would be changed. If they deleted two base-pairs, they got the same result. If they deleted three base-pairs, however, they found that the remaining portion of the gene made sense again, in that most of the amino-acid sequence was intact.
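Here is a rough Python sketch of that reasoning applied to DNA rather than letters. The 18-base sequence `GENE` is made up for illustration and the helper names are hypothetical. A codon downstream of the deletion counts as intact only if the mutant reads exactly the bases that the original gene read as one codon. This models the logic of the argument, not the laboratory method itself.

```python
# A hypothetical 18-base stretch of a gene, made up for this illustration.
GENE = "ATGGCTTCAGGAAAGTTC"

def read_codons(seq: str) -> list[tuple[int, str]]:
    """Read a sequence three bases at a time from its first base.
    Returns (start position, codon) pairs; an incomplete tail is ignored."""
    return [(i, seq[i:i + 3]) for i in range(0, len(seq) - 2, 3)]

def intact_codons_after_deletion(gene: str, site: int, n: int) -> tuple[int, int]:
    """Delete n bases starting at `site`, then count how many codons read after
    the deletion are exactly the codons the original gene had at those bases.
    Returns (intact, total) for the codons downstream of the deletion."""
    original = set(read_codons(gene))
    mutant = gene[:site] + gene[site + n:]
    downstream = [(s, c) for s, c in read_codons(mutant) if s >= site]
    # A mutant codon starting at s (s >= site) came from original position s + n.
    intact = sum((s + n, c) in original for s, c in downstream)
    return intact, len(downstream)

for n in (1, 2, 3):
    intact, total = intact_codons_after_deletion(GENE, site=3, n=n)
    print(f"delete {n}: {intact} of {total} downstream codons read as in the original")
# delete 1: 0 of 4 downstream codons read as in the original
# delete 2: 0 of 4 downstream codons read as in the original
# delete 3: 4 of 4 downstream codons read as in the original
```

Deleting one or two bases throws every downstream codon out of step, while deleting three removes one whole codon and leaves the rest in their original frame.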

It was now pretty clear that the genetic code is made of words three bases long. Many experiments since then have used a variety of techniques to confirm this three-letter code. The three-base sequences that serve as the fundamental units of the code have come to be known as codons. Each codon specifies a single amino-acid (a few instead mark the end of a protein), although, as 64 codons for 20 amino-acids suggests, several codons can specify the same amino-acid.

The next step was to establish what the correspondence is between particular codons and particular amino-acids. I will leave that for next time.

Pablo's Origins Blog
