GeneXproTools 4.0 implements the Correlation Coefficient fitness function both with and without parsimony pressure. The version with parsimony pressure puts a little pressure on the size of the evolving solutions, allowing the discovery of more compact models.
The Correlation Coefficient fitness function of GeneXproTools 4.0
is, as expected, based on the standard correlation
coefficient, which is a dimensionless index that ranges from -1 to 1 and reflects the extent of a linear relationship between
the predicted values and the target values.
The correlation coefficient Ci of an individual program
i is evaluated by the equation:
$$C_i = \frac{\mathrm{Cov}(T, P)}{s_t \cdot s_p}$$
where Cov(T,P) is the covariance of the target and model outputs; and
st and sp are the corresponding standard deviations, which are given by:
$$\mathrm{Cov}(T, P) = \frac{1}{n}\sum_{j=1}^{n}\left(T_j - \bar{T}\right)\left(P_{(ij)} - \bar{P}_i\right)$$
$$s_t = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(T_j - \bar{T}\right)^2}$$
$$s_p = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(P_{(ij)} - \bar{P}_i\right)^2}$$
where P(ij) is the value predicted by the individual program i for sample case j (out of n fitness cases or sample cases); Tj is the target value for fitness case j; and $\bar{T}$ and $\bar{P}_i$ are the mean target value and the mean predicted value, given by the formulas:
$$\bar{T} = \frac{1}{n}\sum_{j=1}^{n} T_j$$
$$\bar{P}_i = \frac{1}{n}\sum_{j=1}^{n} P_{(ij)}$$
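As an illustration, the correlation coefficient of a model can be computed directly from the definitions above. The following is a minimal sketch, not the GeneXproTools implementation: the function name `correlation_coefficient` is hypothetical, the covariance and standard deviations are assumed to be normalized by n (the choice cancels out in the ratio), and returning 0 for constant outputs is an assumption made here to avoid a division by zero.

```python
import math


def correlation_coefficient(targets, predictions):
    """Return Cov(T, P) / (s_t * s_p) for paired target and predicted values."""
    n = len(targets)
    if n == 0 or n != len(predictions):
        raise ValueError("targets and predictions must be non-empty and equal in length")

    # Means of the target values and of the model outputs.
    t_mean = sum(targets) / n
    p_mean = sum(predictions) / n

    # Covariance and standard deviations, all normalized by n (assumption).
    cov = sum((t - t_mean) * (p - p_mean) for t, p in zip(targets, predictions)) / n
    s_t = math.sqrt(sum((t - t_mean) ** 2 for t in targets) / n)
    s_p = math.sqrt(sum((p - p_mean) ** 2 for p in predictions) / n)

    # Assumption: a constant target or a constant model output is treated
    # as uncorrelated, since the coefficient is undefined in that case.
    if s_t == 0.0 or s_p == 0.0:
        return 0.0
    return cov / (s_t * s_p)


targets = [1.0, 2.0, 3.0, 4.0]
print(correlation_coefficient(targets, [2.0, 4.0, 6.0, 8.0]))  # outputs rise linearly with the targets
print(correlation_coefficient(targets, [8.0, 6.0, 4.0, 2.0]))  # outputs fall linearly as the targets rise
```

The first call uses outputs that are an exact increasing linear function of the targets and the second an exact decreasing one, so the two results should be 1 and -1 up to floating-point rounding.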
The correlation coefficient is confined to the range [-1, 1]. When Ci = 1, there is a perfect positive linear correlation between T and P, that is, P increases in exact linear proportion to T. When Ci = -1, there is a perfect negative linear correlation between T and P, that is, they vary in opposite ways (when T increases, P decreases in exact linear proportion). When Ci = 0, there is no linear correlation between T and P. Intermediate values describe partial correlations, and the closer Ci is to 1 or -1, the better the model.
The fitness fi of an individual program
i is expressed by the equation: fi = 1000*Ci*Ci
and therefore ranges from 0 to 1000, with 1000 corresponding to the ideal.
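The mapping from correlation coefficient to fitness can be sketched in a few lines. This is only an illustration of the equation fi = 1000*Ci*Ci; the function name `correlation_fitness` is hypothetical, and clamping the input to [-1, 1] is an assumption added here to absorb floating-point rounding.

```python
def correlation_fitness(c_i):
    """Return f_i = 1000 * C_i * C_i for a correlation coefficient C_i."""
    # Assumption: clamp to [-1, 1] so that rounding error in C_i
    # cannot push the fitness above 1000.
    c_i = max(-1.0, min(1.0, c_i))
    return 1000.0 * c_i * c_i


for c in (1.0, -1.0, 0.5, 0.0):
    print(c, correlation_fitness(c))
# 1.0 1000.0
# -1.0 1000.0
# 0.5 250.0
# 0.0 0.0
```

Because the coefficient is squared, its sign is discarded: a model whose outputs are perfectly negatively correlated with the targets receives the same fitness of 1000 as one that is perfectly positively correlated.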
Its counterpart with parsimony pressure uses this fitness measure fi as raw fitness rfi and complements it with a parsimony term. Thus, in this case, the raw maximum fitness is rfmax = 1000.
And the overall fitness fppi (that is, fitness with parsimony pressure) is evaluated by the formula:
$$\mathit{fpp}_i = \mathit{rf}_i \cdot \left(1 + \frac{1}{5000} \cdot \frac{S_{max} - S_i}{S_{max} - S_{min}}\right)$$
where Si is the size of the program, and Smax and Smin represent, respectively, the maximum and minimum program sizes, which are evaluated by the formulas:
Smax = G (h + t)
Smin = G
where G is the number of genes, and h and t are the head and tail sizes (note that, for simplicity, the linking function was not taken into account). Thus, when rfi = rfmax and Si = Smin (highly improbable, as it requires every sub-ET to consist of just one node, which can only happen for very simple functions),
fppi = fppmax, with fppmax evaluated by the formula: