Overview of the Genome

WSP Rhodes
Feb 6, 2022

Today, I’d like to give a brief, high-level overview of the genome, its parts, and how it works. Given how often I talk about genetics, I believe a basic walkthrough that could work as a reference for other posts would be of benefit. This will be a bit more complicated or technical than my previous posts, so I will present this in two sections, each being a step up in technicality. So let’s start with…

A Tortured Metaphor

Imagine a library containing a little over 3 million pages. It’s not a well-organized library; the vast majority of these pages are just sheets of paper sitting unbound on shelves. Each page is full of randomly-typed gibberish, but if one page happens to contain the word “start”, and further down the shelf there’s a page that happens to contain the word “end”, the librarians will bind these pages and all the pages on the shelf between them into a book. Only about 1% of the pages in this library are in books.

Every day, the librarians patrol up and down the stacks of shelves. When they happen to notice a book, they will hastily copy its contents onto sheets of scratch paper and carry these sheets to the basement. Here, there are workshops and craftsmen who take these scratch paper instructions and use them to construct widgets. Each widget has a very specific task it performs around the library; they might leave the library to gather supplies, sit in windows to watch for danger, repair damaged equipment, carry messages between departments, or one of many other roles needed to keep the library functioning. Though the instructions for these widgets are randomly generated, the widgets that can reliably perform useful tasks are more likely to see continuous use, so over enough time, all widgets will trend toward being effective.

But just because a page isn’t bound up in a book doesn’t mean it’s unimportant. There is a whole class of regulatory widgets that are built to seek out very particular loose pages on the shelves. When a widget finds the special page it’s programmed to find, it will attach itself to the shelf and stay there for a little while. If there happens to be a book on the shelf near a special page, this regulatory widget can affect how often a librarian finds and transcribes its instructions. Some regulatory widgets shoo librarians away, so the book gets copied less often and there will be fewer of its type of widget. Some regulatory widgets call the librarians’ attention to the book, so it gets copied more often and there’s more of its type of widget. These regulatory widgets are shaped by the same process of trial and error that shapes the other widgets, as is the location of the special pages they attach to.

Every now and then, a new library will be built which will require a new set of all the same books. When this happens, hundreds of copy widgets are built which go through every book and page in the library and make a copy for the new library. But remember: there are over 3 million pages in this library and they all have to be copied by hand. Mistakes will be made. There’s also a fleet of editing widgets whose job is to compare the copied pages to the originals so as to find mistakes and correct them. But with billions of letters that need to be checked, some mistakes will slip through the cracks. No new library is a perfect copy of the old one. Fortunately, the majority of these mistakes don’t change anything important; the books are still readable, or the misspellings land among the millions of pages that don’t get used. But sometimes a mistake will result in broken widgets, or a book stops being a book, or a special page gets changed so the regulatory widgets don’t recognize it anymore, or a new special page appears somewhere it shouldn’t be. If this mistake is damaging enough, it can weaken the new library and force it to close down. Or worse. But every once in a while, a mistake will be made that improves the library. This is how these widgets are created from randomly generated text, as small improvements build up over time. A widget will be made more effective at a specific role. A new book will be made with a potentially useful new widget. A special page will change in a way that results in beneficial regulation. This happens rarely, but when it does, the new library will run better, it will stay open longer, and more new libraries will be made that also carry its useful mistake.
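To put rough numbers on the copying step, here is a minimal Python sketch that assumes made-up rates: each letter is miscopied with probability 1 in 1,000, and the editing widgets catch 99% of mistakes. These rates are hypothetical placeholders, not measured values for real cells. On a 100,000-letter “library” you would expect around 100 mistakes before checking (100,000 × 0.001) and around one after (100 × 0.01).

```python
import random

ALPHABET = "ACGT"

def copy_with_errors(original: str, error_rate: float, rng: random.Random) -> str:
    """Copy a sequence letter by letter; each letter is miscopied with probability error_rate."""
    copied = []
    for letter in original:
        if rng.random() < error_rate:
            # Swap in a different letter chosen at random.
            copied.append(rng.choice([b for b in ALPHABET if b != letter]))
        else:
            copied.append(letter)
    return "".join(copied)

def proofread(original: str, copy: str, catch_rate: float, rng: random.Random) -> str:
    """Compare the copy to the original; each mistake is fixed with probability catch_rate."""
    fixed = []
    for good, maybe_bad in zip(original, copy):
        if maybe_bad != good and rng.random() < catch_rate:
            fixed.append(good)       # mistake caught and corrected
        else:
            fixed.append(maybe_bad)  # correct letter, or a mistake that slipped through
    return "".join(fixed)

def count_differences(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)  # fixed seed so the run is repeatable
original = "".join(rng.choice(ALPHABET) for _ in range(100_000))
copy = copy_with_errors(original, error_rate=1e-3, rng=rng)       # hypothetical rate
checked = proofread(original, copy, catch_rate=0.99, rng=rng)     # hypothetical rate
print(count_differences(original, copy), count_differences(original, checked))
```

Proofreading can only remove mistakes, never add them, so the second number is always less than or equal to the first.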

The Actual Description

I’m sure you’ve worked out that this library represents a cell, its pages represent the genome, and the books represent genes. The human genome is carried on 46 chromosomes (23 pairs), each chromosome being one continuous ribbon of DNA. One set of 23 chromosomes holds roughly 3 billion individual “letters”, so the two sets together carry about 6 billion. Only about 1% of these letters are part of what we call genes, that is, DNA that acts as instructions for making proteins. The main thing that sets a gene apart from the rest of the genome is that it is bordered by two specific sequences of letters: a promoter sequence on one side and a termination sequence on the other. Proteins called RNA polymerases (RNAP) travel down the length of each chromosome like cars on a road. When one comes across a promoter region, it usually slows down and begins transcribing the DNA until it reaches the termination sequence. This transcript of the gene takes the form of a short strand of RNA. RNA is a nucleic acid like DNA, but it breaks down more quickly, making it useful for this kind of short-lived messaging.
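The scanning step described above can be sketched in a few lines of Python. This is a toy model under loud assumptions: the promoter and terminator are represented by hypothetical six-letter marker strings (real promoters and terminators are longer and far more variable), the chromosome is read along a single strand, and “transcription” simply copies the letters between the markers, swapping T for U, since RNA uses uracil (U) where DNA uses thymine (T).

```python
PROMOTER = "TATAAT"    # hypothetical marker for illustration, not a real promoter definition
TERMINATOR = "TTTTTT"  # hypothetical marker for illustration

def transcribe(chromosome: str) -> list[str]:
    """Walk along the chromosome and return an RNA transcript for each promoter...terminator stretch."""
    transcripts = []
    position = 0
    while True:
        start = chromosome.find(PROMOTER, position)
        if start == -1:
            break  # no more promoters downstream
        gene_start = start + len(PROMOTER)
        end = chromosome.find(TERMINATOR, gene_start)
        if end == -1:
            break  # a promoter with no terminator after it: nothing is transcribed
        gene = chromosome[gene_start:end]
        transcripts.append(gene.replace("T", "U"))  # RNA uses U in place of T
        position = end + len(TERMINATOR)            # keep driving down the "road"
    return transcripts

dna = "GCGTATAATATGGCCAAATTTTTTCGCG"
print(transcribe(dna))
```

On this example chromosome the function returns a single transcript, AUGGCCAAA.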

This RNA copy of the gene is what gets translated into a protein. I won’t go into the process in depth here, but what’s important to know is that every group of three RNA letters (called a codon) corresponds either to a specific amino acid, the building block of proteins, or to a “stop” signal that ends the chain. This allows each gene to translate directly into a specific chain of amino acids. Once a chain of amino acids is formed, the chemical properties of each amino acid determine what shape the chain will fold into. This folded shape allows the protein to fit against specific other proteins or molecules, like a puzzle piece. This lets multiple proteins lock together to build complex structures, and it also lets individual proteins bind to and act on each other or on other molecules.
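Reading an RNA transcript codon by codon is essentially a dictionary lookup. The sketch below includes only a handful of the 64 codons of the standard genetic code (enough for the example) and stops at the first stop codon; a complete version would need the full table. The names CODON_TABLE and translate are illustrative choices.

```python
# A small excerpt of the standard genetic code; the full table has 64 codons.
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the usual start codon
    "GCC": "Ala",
    "AAA": "Lys",
    "UUU": "Phe",
    "GGC": "Gly",
    "UGG": "Trp",
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}

def translate(rna: str) -> list[str]:
    """Read the RNA three letters at a time, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]  # KeyError if the codon is missing from this excerpt
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

print(translate("AUGGCCAAAUUUUGAGGC"))  # → ['Met', 'Ala', 'Lys', 'Phe']
```

The trailing GGC is never read, because the UGA stop codon ends the chain first.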

One class of proteins that are produced are called transcription factors. These are proteins which are shaped to attach themselves to a specific segment of DNA, often right next to a gene’s promoter sequence. Once attached, they can either be repressors, knocking RNA polymerases away so the gene gets transcribed less, or they can be activators, stopping RNA polymerases right on top of the promoter region so the gene gets transcribed more. Many of these transcription factors are cranked out at a constant rate but are activated by interacting with other proteins or under certain circumstances.
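One way to picture activators and repressors is as multipliers on how often a gene gets transcribed. This is a hypothetical toy model: the multiplicative rule, the base rate, and the effect sizes are assumptions chosen for illustration rather than measured values, and the names activator_A and repressor_B are made up. The active flag stands in for the point that many transcription factors are produced constantly but only do their job once activated.

```python
from dataclasses import dataclass

@dataclass
class TranscriptionFactor:
    name: str
    effect: float  # >1 for an activator, <1 for a repressor (hypothetical numbers)
    active: bool   # made constantly, but only has an effect once activated

def transcription_rate(base_rate: float, factors: list[TranscriptionFactor]) -> float:
    """Scale a gene's base transcription rate by every active factor bound near it."""
    rate = base_rate
    for tf in factors:
        if tf.active:
            rate *= tf.effect
    return rate

factors = [
    TranscriptionFactor("activator_A", effect=4.0, active=True),
    TranscriptionFactor("repressor_B", effect=0.1, active=False),
]
print(transcription_rate(10.0, factors))  # → 40.0
```

Switching repressor_B to active in this model would cut the rate to about 4.0.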

Proteins of all sorts can be modified to change their shapes. This can mean having bits of them cut off, having other molecules attached to them, being bound to other proteins, or simply being reshaped as a single unit. Because a protein’s shape determines which jobs it can do, this flexibility lets existing proteins be adapted to new roles, which makes it easier for evolution to work with them. It also makes it possible for proteins to transmit signals to one another. As an illustration, let’s start with surface proteins, the primary sense organs of cells. There are thousands of different types of surface protein, each one sticking through the cell membrane with a part on both the inside and the outside. The outside part binds to a very specific molecule, anything from food and other useful resources to toxins and other potential threats. When one of these molecules binds to the surface protein (also called a receptor), the inside portion changes its shape. This allows a set of specific signaling proteins inside the cell to bind to the surface protein and become modified. Each modified protein can then modify other proteins. These chains of protein interactions (called signaling pathways) usually end with a transcription factor, now activated and able to alter DNA transcription. This is how external stimuli can change a cell’s behavior. Sensor proteins can also sit wholly inside the cell, allowing it to respond to internal stimuli, such as those involved in maintaining homeostasis.
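A signaling pathway behaves like a relay: each protein, once modified, modifies the next, and the signal only reaches the transcription factor if every link works. The sketch below models that chain as a simple on/off hand-off; the protein names are hypothetical placeholders, and real pathways involve far more than a yes/no switch at each step.

```python
def run_pathway(ligand_bound: bool, pathway: list[str],
                broken: frozenset[str] = frozenset()) -> dict[str, bool]:
    """Pass a signal down a chain of proteins, starting at a surface receptor.

    Returns which proteins ended up modified. Each protein is modified only if
    the one before it was, and only if it is not itself broken.
    """
    modified = {}
    signal = ligand_bound  # the receptor only changes shape when its molecule binds outside
    for protein in pathway:
        signal = signal and protein not in broken
        modified[protein] = signal
    return modified

# Hypothetical pathway: receptor -> two relay proteins -> a transcription factor.
PATHWAY = ["receptor", "relay_1", "relay_2", "transcription_factor"]

print(run_pathway(True, PATHWAY))
print(run_pathway(True, PATHWAY, broken=frozenset({"relay_1"})))
```

With relay_1 marked as broken, the receptor is still modified but nothing downstream is, so the transcription factor is never activated.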

The first gene regulatory pathway to be worked out in detail was the lac operon in bacteria, described by François Jacob and Jacques Monod in 1961. Bacteria such as E. coli normally fuel themselves with the sugar glucose, but they can also use lactose. To do this, though, they have to produce enzymes that break lactose down into its component parts (a glucose molecule and a galactose molecule stuck together). To ensure these enzymes are only produced when there is lactose to break down, a transcription factor called LacI (the lac repressor) sits attached to the DNA strand right in front of the genes for these lactose-digesting enzymes, preventing them from being transcribed. But there is a site on the LacI protein that lactose can attach to (strictly, allolactose, a close relative of lactose that the cell makes from it), which causes LacI to “fall off” the DNA strand. If the cell finds and absorbs any lactose, some of it will slot into LacI, knocking it off the strand and allowing the genes to be transcribed so the cell can produce the enzymes needed to digest the lactose. The cell continuously produces LacI, but while there’s lactose in the cell, these LacI proteins are disabled before they can shut down production of the enzymes. Only once all the lactose has been digested and the enzymes are no longer needed can a LacI molecule stay functional long enough to attach to the strand and turn off enzyme production. Every cell has many other regulatory pathways with different structures than this one, but the lac operon is perhaps the most basic example of a cellular mechanism that lets a cell respond to its environment.
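The on/off logic of the lac operon can be captured in a short time-step simulation. This is a deliberately simplified sketch: lactose is counted in arbitrary units, the enzymes digest a fixed, made-up amount per step, LacI is assumed to bind the DNA whenever no lactose is left, and the separate influence of glucose on the operon is ignored.

```python
def simulate_lac_operon(lactose: int, steps: int,
                        digest_per_step: int = 2) -> list[tuple[int, bool]]:
    """Toy time-step model of the lac operon (ignores glucose's effect on it).

    LacI is always being made, but it only stays on the DNA when no lactose is
    around to knock it off. While LacI is off, the enzymes are made and digest
    lactose. Returns (lactose remaining, enzymes being made) for each step.
    """
    history = []
    for _ in range(steps):
        repressor_bound = lactose == 0  # any lactose knocks LacI off the DNA
        enzymes_made = not repressor_bound
        if enzymes_made:
            lactose = max(0, lactose - digest_per_step)  # hypothetical digestion rate
        history.append((lactose, enzymes_made))
    return history

for step, (lactose_left, enzymes_on) in enumerate(simulate_lac_operon(lactose=5, steps=5)):
    print(step, lactose_left, enzymes_on)
```

Starting with 5 units of lactose, the enzymes stay switched on for three steps until the lactose is used up, after which LacI rebinds and production stops.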

https://apbiologyctd.wordpress.com/genetics/gene-expression/

While the mechanisms of gene expression are fairly simple, the sheer number of signaling pathways and the ways they interact with each other are what make genetics and biology so complex. The image below shows every protein involved in the insulin signaling pathway (how the hormone insulin tells cells to metabolize sugar). This is just one pathway; there must be a pathway for every pair of cellular mechanisms that need to communicate with each other, and there are often multiple pathways for each of these lines of communication in case one breaks. It’s also not always a simple path from receptor to DNA: some pathways loop back onto each other to create feedback loops, and some proteins are part of multiple pathways, so these pathways can influence each other.

https://www.cusabio.com/pathway/Insulin-signaling-pathway.html

To fully appreciate this complexity, it is important to remember that none of these mechanisms were designed; each protein is a simple tool that was modified and repurposed over and over again, with cells carrying bad variations being more likely to die. Studying these mechanisms has become a major area of research for biologists: identifying proteins, figuring out which pathway(s) they’re part of, and trying to deduce their exact contribution to the functioning of the cell. It’s an incredibly complex field, but as biology has become an increasingly data-driven science, these mysteries are rapidly being unraveled.
