It is unclear how the weighting of functions and the non-weighting of possibly hundreds or thousands of terminals is dealt with. There is a short note on this in the Help File that suggests that there is some rationalization of the difference between the number of functions with weights and the number of terminals (variables). For example, when selecting terminals for the tails of a gene, are all terminals weighted equally? If not, how are they weighted, since there is no domain knowledge that would assist in doing this other than giving equal weight to all? Also, when choosing symbols for the head (functions or terminals), if we had only 4 functions and 100 terminals, then clearly the terminals would be selected much too often. Do the weightings get adjusted so that the functions and terminals get approximately the same overall weight, or is there some other rule of thumb used? It is important to understand this, especially when trying to understand the impact of changing function weights.
First of all, terminals are not weighted in GeneXproTools; that is, they are all weighted equally, no matter whether they are being chosen for the heads or the tails. So the only ways of giving certain variables more weight would be either to duplicate them in the datasets or to create UDFs that return a particular terminal.
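To illustrate the duplication workaround, here is a minimal sketch of how copying a column changes a variable's effective share when every dataset column is drawn with equal probability. The function name `effective_weights` and the column labels are hypothetical and are not part of GeneXproTools.

```python
from collections import Counter

def effective_weights(column_sources):
    """Return the effective selection share of each underlying variable.

    column_sources names the underlying variable behind each dataset
    column, so a duplicated variable appears more than once. Every
    column is assumed to be drawn with equal probability.
    """
    counts = Counter(column_sources)
    total = len(column_sources)
    return {name: count / total for name, count in counts.items()}

# Hypothetical dataset: variable "a" is duplicated once, "b" and "c" are not.
weights = effective_weights(["a", "b", "c", "a"])
```

With four equally weighted columns, the duplicated variable ends up with twice the share of each of the other two.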
In the heads, however, terminals are chosen together with functions and, given that in version 4.0 there are no limits concerning the number of variables, it became essential to automatically balance the function set. This is documented in the Online Knowledge Base and was optimized to produce an efficient evolution. The rule of thumb is very simple: when the number of functions in the function set is smaller than the number of terminals, for each position in the heads, the probability of it being a function is 1/2. When functions outnumber terminals though, all elements (functions and terminals) are equally weighted. These rules are operational both during the creation of the initial population and during mutation. Also important is that when you are using random numerical constants, the rule of thumb for choosing the terminal “?” (the special terminal that represents the random numerical constants) both for the heads and tails is 1 out of 3 terminals.
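The rules above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb as described here, not GeneXproTools code: all names are hypothetical, function weights are ignored (each function counts once and functions are drawn uniformly), the "?" terminal is assumed not to count toward the number of terminals, and "1 out of 3 terminals" is read as a probability of 1/3.

```python
import random

RANDOM_CONSTANT = "?"  # special terminal standing for random numerical constants

def head_function_probability(n_functions, n_terminals):
    """Probability that a head position receives a function."""
    if n_functions < n_terminals:
        # Fewer functions than terminals: the function set is balanced to 1/2.
        return 0.5
    # Otherwise all elements are equally weighted.
    return n_functions / (n_functions + n_terminals)

def pick_terminal(variables, use_random_constants=False, rng=random):
    """Pick a terminal; with random constants on, '?' is chosen 1 time in 3."""
    if use_random_constants and rng.random() < 1 / 3:
        return RANDOM_CONSTANT
    return rng.choice(variables)  # variables themselves are unweighted

def pick_head_symbol(functions, variables, use_random_constants=False, rng=random):
    """Pick a symbol for one head position (function or terminal)."""
    p_function = head_function_probability(len(functions), len(variables))
    if rng.random() < p_function:
        return rng.choice(functions)  # assumption: function weights ignored
    return pick_terminal(variables, use_random_constants, rng)

def pick_tail_symbol(variables, use_random_constants=False, rng=random):
    """Tails hold terminals only."""
    return pick_terminal(variables, use_random_constants, rng)
```

Under these assumptions, the example from the question (4 functions and 100 terminals, no random constants) gives each function a probability of 0.5/4 = 0.125 per head position and each terminal 0.5/100 = 0.005, so the terminals do not swamp the functions.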