Choosing the Fitness Function

Relative Squared Error
 
GeneXproTools 4.0 implements the Relative Squared Error fitness function (rRSE, where the small "r" indicates that it is based on the relative rather than the absolute error) both with and without parsimony pressure. The version with parsimony pressure applies a slight pressure on the size of the evolving solutions, which favors the discovery of more compact models.

The rRSE fitness function of GeneXproTools is based on the standard relative squared error. The standard measure is usually computed from the absolute error, but the relative error can be used instead to create a slightly different fitness measure.

The relative squared error is relative to what it would have been if a simple predictor had been used. More specifically, this simple predictor is just the average of the actual values. Thus, the relative squared error takes the total squared error and normalizes it by dividing by the total squared error of the simple predictor.
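The normalization described above can be sketched in a few lines of Python. This is only an illustration of the standard, absolute-error form of the measure; the function name `relative_squared_error` and the sample data are assumptions made for the example and are not part of GeneXproTools.

```python
def relative_squared_error(predicted, targets):
    """Standard RSE: the model's total squared error divided by the
    total squared error of a predictor that always outputs the mean
    of the target values."""
    mean_target = sum(targets) / len(targets)
    model_error = sum((p - t) ** 2 for p, t in zip(predicted, targets))
    baseline_error = sum((t - mean_target) ** 2 for t in targets)
    # Undefined (division by zero) when all targets are identical.
    return model_error / baseline_error


targets = [1.0, 2.0, 3.0, 4.0]
print(relative_squared_error(targets, targets))    # perfect fit
print(relative_squared_error([2.5] * 4, targets))  # always predicting the mean
```

A model that does no better than the mean of the targets scores 1, so values below 1 indicate an improvement over the simple predictor.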

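Mathematically, the rRSE Ei of an individual program i is evaluated by the equation:

$$E_i = \frac{\sum_{j=1}^{n} \left( \frac{P_{(ij)} - T_j}{T_j} \right)^2}{\sum_{j=1}^{n} \left( \frac{T_j - \bar{T}}{T_j} \right)^2}$$

where P(ij) is the value predicted by the individual program i for fitness case j (out of n fitness cases); Tj is the target value for fitness case j; and $\bar{T}$ is given by the formula:

$$\bar{T} = \frac{1}{n} \sum_{j=1}^{n} T_j$$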

For a perfect fit, the numerator is equal to 0 and Ei = 0. So, the rRSE index ranges from 0 to infinity, with 0 corresponding to the ideal.
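As a rough sketch, the rRSE can be computed as follows. The code assumes that both the program's error and the simple predictor's error are divided by the target value before squaring, which is one reading of "based on the relative error"; it also assumes that no target value is zero. The name `rrse` is hypothetical.

```python
def rrse(predicted, targets):
    """Relative-error variant of the relative squared error (assumed form).

    Each error is divided by its target value before squaring, both for
    the program and for the simple mean predictor.
    Targets must be non-zero, otherwise the relative error is undefined.
    """
    mean_target = sum(targets) / len(targets)
    program_error = sum(((p - t) / t) ** 2 for p, t in zip(predicted, targets))
    baseline_error = sum(((mean_target - t) / t) ** 2 for t in targets)
    return program_error / baseline_error


targets = [1.0, 2.0, 3.0, 4.0]
print(rrse(targets, targets))      # perfect fit: numerator is 0
print(rrse([2.5] * 4, targets))    # same predictions as the simple predictor
```

Because the relative error weights each case by the inverse of its target, cases with small target values dominate the sum, which is the main practical difference from the absolute-error version.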

As it stands, Ei cannot be used directly as fitness because, for fitness-proportionate selection, the fitness value must increase as the solution improves.

Thus, for evaluating the fitness fi of an individual program i, the following equation is used:

$$f_i = 1000 \cdot \frac{1}{1 + E_i}$$

which ranges from 0 to 1000, with 1000 corresponding to the ideal.
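The conversion from error to fitness can be illustrated with a short sketch. It assumes the mapping is 1000 / (1 + Ei), which is consistent with the stated range (an error of 0 gives 1000, and the fitness approaches 0 as the error grows); the function name `fitness_from_error` is made up for the example.

```python
def fitness_from_error(error, max_fitness=1000.0):
    """Map an error in [0, inf) to a fitness in (0, max_fitness].

    Assumed mapping: max_fitness / (1 + error), so that smaller errors
    give larger fitness values.
    """
    if error < 0:
        raise ValueError("error must be non-negative")
    return max_fitness / (1.0 + error)


print(fitness_from_error(0.0))  # perfect fit
print(fitness_from_error(1.0))  # no better than the simple predictor
```

Under this mapping, a program that performs exactly as well as the simple predictor (Ei = 1) receives half of the maximum fitness.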

Its counterpart with parsimony pressure uses this fitness measure fi as raw fitness rfi and complements it with a parsimony term.

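Thus, in this case, raw maximum fitness rfmax = 1000. And the overall fitness fppi (that is, fitness with parsimony pressure) is evaluated by the formula:

$$f_{pp_i} = rf_i \cdot \left( 1 + \frac{1}{5000} \cdot \frac{S_{max} - S_i}{S_{max} - S_{min}} \right)$$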

where Si is the size of the program, Smax and Smin represent, respectively, maximum and minimum program sizes and are evaluated by the formulas:

Smax = G (h + t)

Smin = G

where G is the number of genes, and h and t are the head and tail sizes (note that, for simplicity, the linking function is not taken into account). Thus, when rfi = rfmax and Si = Smin, fppi = fppmax. This is highly improbable, however, because Si = Smin means that every sub-ET consists of a single node, which can only happen for very simple functions. The value of fppmax is evaluated by the formula:

$$f_{pp_{max}} = 1.0002 \cdot rf_{max}$$
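The effect of the parsimony term can be sketched as below. The size bounds follow the formulas Smax = G (h + t) and Smin = G given in the text. The form of the parsimony term, and in particular the weight of 1/5000, is an assumption taken from the gene expression programming literature rather than from this page, and the function names are hypothetical.

```python
def program_size_bounds(genes, head, tail):
    """Return (s_min, s_max) for a chromosome with the given number of
    genes and head/tail sizes, ignoring the linking function."""
    return genes, genes * (head + tail)


def fitness_with_parsimony(raw_fitness, size, s_min, s_max, weight=1 / 5000):
    """Scale raw fitness by a small bonus that grows as the program shrinks.

    The bonus is 0 at the maximum size and `weight` at the minimum size.
    The default weight of 1/5000 is an assumption, not stated in the text.
    """
    if not s_min <= size <= s_max:
        raise ValueError("size must lie between s_min and s_max")
    return raw_fitness * (1.0 + weight * (s_max - size) / (s_max - s_min))


s_min, s_max = program_size_bounds(genes=3, head=8, tail=9)
print(s_min, s_max)
print(fitness_with_parsimony(1000.0, s_max, s_min, s_max))  # largest program
print(fitness_with_parsimony(1000.0, s_min, s_min, s_max))  # smallest program
```

With such a small weight, the parsimony term only matters when two programs have nearly identical raw fitness, so it acts as a tie-breaker in favor of the more compact model rather than trading accuracy for size.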


