DNA seen through the eyes of a coder

Friday, March 28th, 2008

Bert Hubert describes DNA seen through the eyes of a coder — and, frankly, I would expect coders to have a better grasp of DNA than most biologists:

DNA is not like C source but more like byte-compiled code for a virtual machine called the nucleus. It is very doubtful that there is a source to this byte compilation — what you see is all you get.

The language of DNA is digital, but not binary. Where binary encoding has two values to work with, 0 and 1 (hence "binary"), DNA has four: T, C, G and A.

Whereas a digital byte is usually 8 binary digits, a DNA "byte" (called a codon) has three digits. Because each digit can take 4 values instead of 2, a DNA codon has 4 × 4 × 4 = 64 possible values, compared with 2⁸ = 256 for a binary byte.
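To make the arithmetic concrete, here is a minimal sketch that reads a codon as a three-digit base-4 number. The T=0, C=1, G=2, A=3 ordering is an arbitrary assumption for illustration; the cell does not treat codons as numbers, this only shows why there are exactly 64 of them.

```python
# Assumed (arbitrary) mapping of bases to base-4 digits, for illustration only.
BASE_TO_DIGIT = {"T": 0, "C": 1, "G": 2, "A": 3}

def codon_value(codon: str) -> int:
    """Read a three-letter codon as a base-4 number in the range 0..63."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly three bases")
    value = 0
    for base in codon.upper():
        value = value * 4 + BASE_TO_DIGIT[base]  # shift one base-4 digit left, add the next
    return value

# A codon has 3 base-4 digits; a byte has 8 base-2 digits.
CODON_VALUES = 4 ** 3  # 64
BYTE_VALUES = 2 ** 8   # 256

print(CODON_VALUES, BYTE_VALUES)  # 64 256
print(codon_value("GCC"))         # 2*16 + 1*4 + 1 = 37
```

Since each base carries two bits of information, a codon fits in 6 bits, which is why it can only distinguish 64 values rather than a byte's 256.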

A typical example of a DNA codon is GCC, which encodes the amino acid alanine. A chain of many amino acids is called a polypeptide; one or more polypeptides fold into a protein, and proteins do most of the chemical work of a living being.
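The codon-to-amino-acid step works like a lookup table. The sketch below uses a deliberately partial table (a handful of entries from the standard genetic code, written in the DNA alphabet); the real code covers all 64 codons, and the function name `translate` is just an illustrative choice.

```python
# Deliberately partial codon table (standard genetic code, DNA alphabet).
# The full table maps all 64 codons; only a few entries are included here.
CODON_TABLE = {
    "ATG": "Met",  # also the usual start codon
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "TTT": "Phe", "TTC": "Phe",
    "AAA": "Lys", "AAG": "Lys",
    "TGG": "Trp",
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}

def translate(dna: str) -> list[str]:
    """Read dna three bases at a time and return amino acids up to the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # KeyError for codons missing from this toy table
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide

print(translate("ATGGCCTTTAAGTGA"))  # ['Met', 'Ala', 'Phe', 'Lys']
```

Note that several codons map to the same amino acid (four for alanine alone), so the code is redundant: 64 codons encode only 20 amino acids plus "stop".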

That’s all pretty basic. Let’s move along to position independent code and conditional compilation:

Code in dynamically linked libraries (.so under Unix, .dll on Windows) cannot rely on fixed addresses internally, because the library may be loaded at different places in memory in different situations. DNA has this too, in the form of transposable elements:
Nearly half of the human genome is composed of transposable elements or jumping DNA. First recognized in the 1940s by Dr. Barbara McClintock in studies of peculiar inheritance patterns found in the colors of Indian corn, jumping DNA refers to the idea that some stretches of DNA are unstable and “transposable,” i.e., they can move around, on and between chromosomes.
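The coder's analogy can be sketched as a toy model: if a movable element refers to its own parts only by offsets from its start, those references still resolve after the element is moved. Everything here is hypothetical (the sequences, `FEATURE_OFFSETS`, the feature names), and the cut-and-paste move is only one transposition mechanism; many elements instead copy themselves to a new location.

```python
# Hypothetical toy model: a movable element stores its internal "addresses"
# as offsets from its own start, like position-independent code.
ELEMENT = "GGGTTTAAACCC"
FEATURE_OFFSETS = {"promoter": 0, "payload": 3}  # relative offsets, not absolute positions

def transpose(genome: str, start: int, length: int, target: int) -> tuple[str, int]:
    """Cut genome[start:start+length] out and paste it at index `target` of the
    remaining sequence. Returns the new genome and the element's new start."""
    element = genome[start:start + length]
    remainder = genome[:start] + genome[start + length:]
    return remainder[:target] + element + remainder[target:], target

def feature_at(genome: str, element_start: int, name: str, size: int = 3) -> str:
    """Resolve a feature as base address + relative offset."""
    pos = element_start + FEATURE_OFFSETS[name]
    return genome[pos:pos + size]

genome = "ACGTACGT" + ELEMENT + "TTTTTTTT"
moved, new_start = transpose(genome, start=8, length=len(ELEMENT), target=2)
print(feature_at(genome, 8, "payload"))         # TTT
print(feature_at(moved, new_start, "payload"))  # TTT, same result after the move
```

The design point is the same as with a shared library: only the base address changes when the element moves, so nothing inside it needs to be rewritten.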

Of the 20,000 to 30,000 genes now thought to be in the human genome, most cells express only a very small part — which makes sense; a liver cell has little need for the DNA code that makes neurons.

But as almost all cells carry around a full copy (distribution) of the genome, a system is needed to #ifdef out stuff not needed. And that is just how it works. The genetic code is full of #if/#endif statements.
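A minimal sketch of the "#ifdef" idea, assuming a drastically simplified rule: every cell carries the same genome, and a gene is switched on only when all the transcription factors it requires are present. Gene and factor names are hypothetical placeholders; real regulation also involves repressors, enhancers, chromatin state and graded levels rather than simple on/off flags.

```python
# Hypothetical toy genome: each gene lists the factors it needs, like "#if A && B".
# Every cell carries this same table; only the set of factors present differs.
GENOME = {
    "liver_gene": {"LIVER_TF"},
    "neuron_gene": {"NEURON_TF"},
    "housekeeping_gene": set(),  # no condition: always "compiled in"
}

def expressed_genes(genome: dict[str, set[str]], factors: set[str]) -> set[str]:
    """Return the genes whose required factors are all present in the cell."""
    return {gene for gene, required in genome.items() if required <= factors}

liver_cell = expressed_genes(GENOME, {"LIVER_TF"})
print(sorted(liver_cell))  # ['housekeeping_gene', 'liver_gene']
```

The liver cell still carries `neuron_gene`; it is simply never "compiled in" because its condition is false.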

This is why stem cells are so hot right now: these cells have the ability to differentiate into any cell type. The code hasn’t been #ifdeffed out yet, so to speak.

Stated more exactly, stem cells do not have everything turned on — they are not at once liver cells and neurons. Cells can be likened to state machines, starting out as a stem cell. Over the lifetime of the cell, during which time it may clone (fork()) many times, it specializes. Each specialization can be regarded as choosing a branch in a tree.

Each cell can make (or be induced to make) decisions about its future, each of which makes it more specialized. These decisions persist across cell divisions, maintained by transcription factors and by changes in how the DNA is packed spatially (steric effects).
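The state-machine view can be sketched as a tree of allowed transitions. This is a heavily simplified, hypothetical lineage (real differentiation has many more intermediate states), and `Cell`, `fork` and `specialize` are illustrative names only. The point it shows: `fork()` copies the current state, so decisions persist across divisions, and `specialize()` only moves down the tree.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified lineage tree: each state lists the states
# it may specialize into. Leaves are fully differentiated cell types.
LINEAGE = {
    "stem": ["endoderm", "ectoderm"],
    "endoderm": ["liver"],
    "ectoderm": ["skin", "neuron"],
    "liver": [],
    "skin": [],
    "neuron": [],
}

@dataclass(frozen=True)
class Cell:
    state: str = "stem"

    def fork(self) -> "Cell":
        """Cell division: the daughter inherits the current state."""
        return Cell(self.state)

    def specialize(self, choice: str) -> "Cell":
        """Take one branch down the tree; moving up or sideways is rejected."""
        if choice not in LINEAGE[self.state]:
            raise ValueError(f"{self.state} cannot become {choice}")
        return Cell(choice)

liver = Cell().specialize("endoderm").specialize("liver")
daughter = liver.fork()
print(daughter.state)  # liver
# daughter.specialize("skin") raises ValueError: liver cannot become skin
```

Reprogramming a cell back toward pluripotency would correspond to a transition this toy model deliberately forbids.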

A liver cell, although it carries the genes to do so, will generally not be able to function as a skin cell. There are some indications that it is possible to push cells back up the hierarchy, making them pluripotent again.

From a coder’s perspective, so-called junk DNA is just dead code, bloat, and comments:

The genome is littered with old copies of genes and experiments that went wrong somewhere in the recent past, say, the last half a million years. This code is there but inactive; these copies are called pseudogenes.
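One common way a gene copy becomes "dead code" is a mutation that introduces a premature stop codon, so the reading frame ends before the protein is complete. The sketch below is a toy heuristic built on that single assumption; real pseudogenes are also disabled by frameshifts, lost regulatory regions and other changes this check ignores, and the example sequences are made up.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop(dna: str) -> Optional[int]:
    """Return the index of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return i
    return None

def looks_like_pseudogene(copy: str) -> bool:
    """Toy heuristic: a stop codon before the final codon truncates the protein."""
    stop = first_stop(copy)
    return stop is not None and stop < len(copy) - 3

gene = "ATGGCCAAGTGA"        # Met-Ala-Lys, then stop
broken_copy = "ATGGCCTAGTGA"  # one base changed: AAG -> TAG, an early stop
print(looks_like_pseudogene(gene))         # False
print(looks_like_pseudogene(broken_copy))  # True
```

A single-letter change is enough: like an accidental `return` early in a function, everything after it becomes unreachable.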

Furthermore, roughly 97% of your DNA is never used directly as code. DNA is linear and read from start to end, and the parts that should not be decoded are marked very clearly, much like C comments. The roughly 3% that is used directly forms the so-called exons. Within a gene, the comments that come in between the exons are called introns.

These comments are fascinating in their own right. Like C comments they have a start marker, like /*, and a stop marker, like */. But they have some more structure. Remember that DNA is like a tape: the comments need to be snipped out physically! The start of a comment is almost always indicated by the letters GT, which thus corresponds to /*; the end is signalled by AG, which is then like */.

However, because of the snipping, some glue is needed to connect the code before the comment to the code after it, which makes the comments more like HTML comments, whose markers are longer: <!-- signifies the start, --> the end.
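The snipping can be sketched in a few lines, assuming the intron positions are already known (they are hypothetical inputs here). The sketch checks the GT...AG boundaries described above, cuts each intron out, and joins the remaining exons. In a real cell the spliceosome finds introns using additional sequence signals beyond these two-letter markers, and this toy version does not handle overlapping ranges.

```python
def splice(gene: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as half-open (start, end) ranges and join the exons."""
    exons, pos = [], 0
    for start, end in sorted(introns):
        intron = gene[start:end]
        # GT-AG rule: the intron starts with GT ("/*") and ends with AG ("*/")
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}:{end} does not follow the GT...AG rule")
        exons.append(gene[pos:start])  # keep the code before the comment
        pos = end                      # skip over the comment
    exons.append(gene[pos:])
    return "".join(exons)  # the "glue": rejoin the exons into one sequence

# Hypothetical gene: exon, intron (GT...AG), exon
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAGTGA"
print(splice(gene, [(6, 18)]))  # ATGGCCAAGTGA
```

Unlike a compiler, which simply skips comments while reading, the cell produces a new, shorter molecule with the introns physically removed.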

If code and DNA interest you, definitely read the whole thing.


