| There are two main players in Gene Expression Programming: the
             chromosomes and the  expression trees (ETs) or programs, being the latter the expression of the genetic information encoded in the former. As in nature, the process of information decoding is called
            translation. And this translation implies obviously a kind of code and a set of rules. The
             genetic code of Gene Expression Programming is very simple: a one-to-one relationship between the symbols of the chromosome and the nodes they represent in the trees. The
             rules are also very simple: they determine the spatial organization of nodes in the expression trees and the type of interaction between sub-ETs. Therefore, there are two languages in GEP: the language of the genes and the language of expression trees and, thanks to the simple rules that determine the structure of ETs and their interactions, it is possible to infer immediately the expression tree given the sequence of a gene, and vice versa. This means that one can choose to have a very complex program represented by its compact genome without losing in meaning. This unequivocal bilingual notation is called
             Karva language. Its details are explained in the remainder of this section. 
 The  structural organization of GEP genes is better understood in terms of
             open reading frames (ORFs). In biology, an ORF or coding sequence of a gene begins with the start codon, continues with the amino acid codons, and ends at a termination codon. However, a gene is more than the respective ORF, with sequences upstream of the start codon and sequences downstream of the stop codon. Although in GEP the start site is always the first position of a gene, the termination point does not always coincide with the last position of a gene. Consequently, it is common for GEP genes to have
            noncoding regions downstream of the termination point. These noncoding
            regions obviously do not interfere with expression but, nonetheless, they play a crucial role in evolution, for they alone allow the creation of valid programs no matter how profoundly their chromosomes are modified. Well get back to them later.
 
 Consider, for example, the algebraic expression:
 
 It can also be represented as a diagram or an  expression tree:
              
 
            where Q represents the square root function.
 This kind of diagram representation is what is called the  phenotype
            in Gene Expression Programming. And the
             genotype can be easily inferred from the phenotype as follows:
 01234567                 Q*-+abcd             
            (2)
 which is the straightforward reading of the expression tree from left to right and from top to bottom (exactly as we read a page of text). The expression (2) is an ORF, starting at Q (position 0) and terminating at d (position 7). These ORFs were named
             K-expressions from Karva language.
 Consider another ORF, the following K-expression:
 01234567890                 Q*b**+baQba             
            (3)
 Its expression as an ET is also very simple and straightforward. In order to express the ORF correctly, we must follow the rules governing the spatial distribution of functions and terminals. First, the start of a gene corresponds to the
             root of the ET which is placed in the topmost line. Second, in the next line, below each function, are placed as many branch nodes as there are arguments to that function. Third, from left to right, the nodes are filled consecutively with the next elements of the K-expression. Fourth, the process is repeated until a line containing only terminals is formed. In this case, the following expression tree is formed:
              
 
            Looking at the structure of GEP ORFs only, it is difficult or even impossible to see the advantages of such a representation, except perhaps for its simplicity and elegance. However, when
            ORFs are analyzed in the context of a gene, the advantages of this representation become obvious. As previously stated, GEP chromosomes have fixed length, and they are composed of one or more genes of equal length. Consequently, the length of a gene is also fixed. Thus, in
            Gene Expression Programming, what varies is not the length of genes which is constant, but the length of the
            ORFs. Indeed, the length of an
            ORF may be equal to or less than the length of the gene. In the first case, the termination point coincides with the end of the gene, and in the latter, the termination point is somewhere upstream of the end of the gene.
 What is the function of these noncoding regions of GEP genes? We will see that they are the essence of
            Gene Expression Programming and evolvability, for they allow the modification of the genome using several
             genetic operators without restrictions, always producing syntactically correct programs. Thus, in
            Gene Expression Programming, the fundamental property of genotype/phenotype systems 
             syntactic closure  is intrinsic, allowing the totally unconstrained restructuring of the genotype and, consequently, an efficient evolution.
 
 Lets analyze then the  structural organization of GEP genes in order to understand how they invariably code for syntactically correct programs and why they allow an unconstrained application of virtually any genetic operator.
 
 The genes of Gene Expression Programming are composed of a  head and a
            tail. The head contains symbols that represent both functions and terminals, whereas the tail contains only terminals. For each problem, the length of the head
             h is chosen, whereas the length of the tail  t is a function of
             h and the number of arguments  n of the function with more arguments (also called
             maximum arity) and is evaluated by the equation:
 t =  h (n-1) + 1
             Consider a gene for which the  set of functions F = {Q, *, /, -, +} and the
             set of terminals T = {a, b}. In this case  n = 2; if we chose an
             h = 15, then  t = 15 (2 - 1) + 1 = 16; thus, the length of the gene
             g is 15 + 16 = 31. One such gene is shown below (the head is shown in
            blue):
             0123456789012345678901234567890                 *b+a-aQab+//+b+babbabbbababbaaa             
            (4)
 It codes for the following ET:
              
 
            In this case, the K-expression ends at position 7, whereas the gene ends at position
            30.
 Suppose now a mutation occurred at position 6, changing the Q into *. Then the following gene is obtained:
 0123456789012345678901234567890                 *b+a-a*ab+//+b+babbabbbababbaaa             
            (5)
 And its expression gives:
              
 
            In this case, the termination point shifts one position to the right (position
            8), changing slightly the daughter tree.
 Consider another mutation in chromosome (4) above, the substitution of a at position 5 by +,
            resulting in the following chromosome:
 0123456789012345678901234567890                 *b+a-+Qab+//+b+babbabbbababbaaa             
            (6)
 And its expression gives:
              
 
            In this case, the termination point shifts twelve positions to the right (position
            19), enlarging and changing significantly the daughter tree.
 Obviously the opposite also might happen, and the daughter tree might shrink. For example, consider again gene (4) above, and suppose a mutation occurred at position 2, changing the + into Q:
 0123456789012345678901234567890                 *bQa-aQab+//+b+babbabbbababbaaa             
            (6)
 And now its expression results in the following ET:
              
 
            In this case, the ORF ends at position 3, shortening the original ET in
            four nodes.
 So, despite their fixed length, each gene has the potential to code for expression trees of different sizes and shapes, where the simplest is composed of only one node (when the first element of a gene is a terminal) and the largest is composed of as many nodes as the length of the gene (when all the elements of the head are functions with maximum arity).
 
 It is evident from the examples above, that any modification made in the genome, no matter how profound, always results in a structurally correct program. Obviously, the structural organization of genes must be preserved, always maintaining the boundaries between head and tail.
 
 If you want to explore further the plasticity of GEP chromosomes, please see the next chapter
             Genetic Operators where the mechanisms and effects of all the genetic operators used in
            GeneXproTools 4.0 are thoroughly analyzed.
 
 
 |