I’m a computer scientist (and non-biologist) trying to figure out how to model basic gene expression as the operation of a “production system,” which is just an unordered set of IF-THEN rules. I’m not sure how to model cis-regulatory modules (CRM’s) that may control more than one gene, or genes that may be controlled by more than one CRM.
Here’s the basic idea, based on the oversimplification that each gene has exactly one CRM controlling it.
Such a gene is exactly a "production" (an IF-THEN rule) in a "production system," which is just a set of such rules that can interact. A genome is such a set of rules, processed asynchronously and in parallel by default by a "production system computer."
(Just to be clear, a rule is not an IF-THEN statement like in a von Neumann computer. There’s no serial flow of control, no place for GOTOs to go to, or any of that. I’m not a naive computer weenie who thinks the genome operates like a von Neumann machine, or like a Turing machine. Gene expression operates like a very different kind of computer, which is older and more basic, but every bit as much “a computer.”)
A one-CRM gene that produces a transcription factor is a rule like this:
IF (A and not(B) and C) THEN (E and D and F)
The left-hand side (the preconditions) of the rule is a CRM that is enabled if molecules matching binding sites A and C are docked to it, unless a molecule matching B is also docked, because B is a repressor site.
If those conditions are met, we say that the production rule can “fire,” i.e., the “action” on the right hand side can be taken to “produce” an E, a D, and an F, and put them in the “working memory” of the computer. Physically, that means that the coding part of the gene is used for transcription and translation to produce a molecule of a transcription factor with spots that can match E, D, and F in other CRMs (after folding).
The “working memory” of the computer is just the nucleoplasm or cytoplasm that transcription factors diffuse through to reach other genes.
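To make the one-CRM-per-gene picture concrete, here is a minimal Python sketch of the rule above. The names Rule, enabled, and step are mine, made up for illustration, and the model is deliberately crude: working memory is just a set of motif names, with no concentrations, no timing, and no stochasticity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One gene with one CRM (hypothetical model, not a biological claim)."""
    activators: frozenset  # binding sites that must be occupied
    repressors: frozenset  # binding sites that must be empty
    products: frozenset    # motifs put into working memory when the rule fires

    def enabled(self, memory):
        # All activator sites matched, and no repressor site matched.
        return self.activators <= memory and not (self.repressors & memory)


def step(rules, memory):
    """Fire every enabled rule once; all rules read the same snapshot of memory."""
    produced = set()
    for rule in rules:
        if rule.enabled(memory):
            produced |= rule.products
    return memory | produced


# IF (A and not(B) and C) THEN (E and D and F)
rule = Rule(frozenset("AC"), frozenset("B"), frozenset("EDF"))

print(sorted(step([rule], {"A", "C"})))       # rule fires, adds D, E, F
print(sorted(step([rule], {"A", "B", "C"})))  # B is docked, so nothing is added
```

The rules are an unordered collection, and each step reads one snapshot of memory, so there is no serial flow of control between rules.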
But what about the potential many-to-many connections between CRM’s and genes?
If my wild-ass guess is right, it goes something like this:
A CRM implements some logical expression with AND, OR, and NOT operators, which can be free-standing and used by any genes that happen to use it.
(Again just for clarity, when I say “logical expression” I do not mean that expressions have boolean true-false values at the usual levels of analysis, or at the timescales we normally consider for gene function. Usually what matters is rates of gene expression and concentrations of transcription factors. It is only at the shortest timescale and at the level of the "machine language" that we see boolean operation, and it’s stochastic—an enabled rule may fire, but may not, depending on whether the right transcription factor molecules are in the right places at that moment, whether the transcription machinery is around and not busy, etc. So at the lowest level, it’s boolean, but stochastic, and above that level, it acts a lot like fuzzy logic, where the values of propositions (A, B,... etc.) are crude analog scalar quantities, rather than binary truth or falsity. The proper interpretation of such quantities varies depending on what the program is actually modelling or controlling---they can be used to model degrees of truth, as in fuzzy logic, or actual quantities, or degrees of evidentiary support for hypotheses---and they're often used as binary values at a high level, using rules with feedbacks to ensure concentrations are "high" or "low." The "machine language" doesn't force any particular interpretation of the analog scalars associated with propositions.)
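Here is a small Python sketch of what I mean by "boolean but stochastic at the bottom, analog above." It assumes, purely for illustration, that each site is occupied independently with some probability standing in for transcription factor concentration. The names fires and firing_rate and the occupancy numbers are mine, not measured values.

```python
import random


def fires(occupancy, rng):
    """One boolean trial of IF (A and not(B) and C), sampling each site's occupancy."""
    a = rng.random() < occupancy["A"]
    b = rng.random() < occupancy["B"]
    c = rng.random() < occupancy["C"]
    return a and not b and c


def firing_rate(occupancy, trials=10_000, seed=0):
    """Fraction of trials in which the rule fires (the 'analog' view of the rule)."""
    rng = random.Random(seed)
    return sum(fires(occupancy, rng) for _ in range(trials)) / trials


# Under the independence assumption the rate should come out near
# p(A) * (1 - p(B)) * p(C), which is 0.125 for these made-up numbers.
print(firing_rate({"A": 0.5, "B": 0.5, "C": 0.5}))
```

Each trial is strictly true or false, but the rate over many trials varies smoothly with the occupancies, which is the fuzzy-logic-like behavior described above.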
So my guess is that when we take modular CRM’s into account, that changes our rule language in a neato kind of way, roughly like this:
We can define logical expressions, give them something like a name, and then refer to them by name on the left-hand sides of whatever rules we want. We could write a modular version of the above rule as
K1 IF (A and not (B) and C)
IF (K1) THEN (E and D and F)
The first rule isn’t a production rule—you can’t fire it to produce something that goes in working memory. It’s just giving the name K1 to a (compound) condition.
Now if we want to produce some other transcription factor under the same combination of circumstances, but only if some other condition is true, too, we can have a third gene/rule like this:
IF (K1 and not(J)) THEN (Q and R)
Is that about right?
If so, one specific question I have is what kind of connection is the condition K1?
It doesn’t correspond to a transcription factor or a binding site, right…there’s an entirely different mechanism at work, right?
And in our rule language, K1 can’t appear on the right-hand side of a rule, can it? That is, the expression of a gene can’t produce a protein that signifies K1, because K1 isn’t a kind of binding site, right?
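In code terms, my guess about K1 looks like the Python sketch below. K1 is a named, shared condition that two rules reference, but it is never a member of working memory and no rule can produce it. The names k1, RULES, and step are hypothetical, and this only models my guess; it does not say what the real mechanism is.

```python
def k1(memory):
    """K1 IF (A and not(B) and C): a named condition, not a molecule."""
    return "A" in memory and "B" not in memory and "C" in memory


# Each rule is (condition, products). Both conditions reuse k1.
RULES = [
    (lambda m: k1(m), {"E", "D", "F"}),                  # IF (K1) THEN (E and D and F)
    (lambda m: k1(m) and "J" not in m, {"Q", "R"}),      # IF (K1 and not(J)) THEN (Q and R)
]


def step(memory):
    """Fire every enabled rule once against one snapshot of working memory."""
    produced = set()
    for condition, products in RULES:
        if condition(memory):
            produced |= products
    return memory | produced


print(sorted(step({"A", "C"})))       # both rules fire
print(sorted(step({"A", "C", "J"})))  # only the first rule fires
```

Note that "K1" never shows up in the result of step: it exists only on the left-hand sides, which is exactly the property I am asking about.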
Another question I have is about complex "AND" and "OR" conditions in a CRM.
My understanding is that a CRM can be sensitive to different transcription factors and enabled by either one, even if the other is not present, so you can have CRMs like the ones in these rules:
IF (A and not(B)) THEN (F and G)
K3 IF (not(D) or C)
Assuming that's correct, what I don't know is whether AND expressions can be nested inside of OR expressions, or vice versa, as in these examples:
IF (A and (B or not(C))) THEN (F and G)
K3 IF ((not(A) and B) or C)
That is, can a CRM correspond to nested logical conditions, or just the one-level ANDing (or ORing) together of simple propositions and/or negated propositions?
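To pin down what I mean by "nested," here is a Python sketch that writes the two nested conditions above as expression trees and evaluates them against working memory. The tuple encoding and the name holds are my own, and the sketch only shows what the notation would mean; whether real CRMs can implement such nesting is the open question.

```python
def holds(expr, memory):
    """Evaluate a condition tree: a motif name, or (op, arg, ...) with op in and/or/not."""
    if isinstance(expr, str):
        return expr in memory  # simple proposition: is this motif present?
    op, *args = expr
    if op == "not":
        return not holds(args[0], memory)
    if op == "and":
        return all(holds(a, memory) for a in args)
    if op == "or":
        return any(holds(a, memory) for a in args)
    raise ValueError(f"unknown operator: {op}")


# IF (A and (B or not(C))) THEN (F and G)   -- an OR nested inside an AND
NESTED = ("and", "A", ("or", "B", ("not", "C")))

# K3 IF ((not(A) and B) or C)               -- an AND nested inside an OR
K3 = ("or", ("and", ("not", "A"), "B"), "C")

print(holds(NESTED, {"A", "C"}))  # False: C is present and B is not
print(holds(K3, {"B"}))           # True: A is absent and B is present
```

A one-level CRM would be a tree of depth one (a single "and" or "or" over plain or negated names), so the question amounts to whether deeper trees occur.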
Thanks for any help,
Paul W. (PhD, EE & CS)
Thanks, anybody, for any help with this.