Custom Fitness Function for Logistic Regression
GeneXproTools allows you to design your own custom fitness functions and then use them to create models.
GeneXproTools gives you access to a wide set of parameters you may use to design
your fitness functions. Most of them, such as the rounding threshold, the cost matrix,
the model bounds, and the parsimony pressure and variable pressure rates, are adjustable
through the Fitness Function Tab in the Settings Panel:
- aParameters[0] = number of records
- aParameters[1] = averaged target output
- aParameters[2] = variance of the target output
- aParameters[3] = 0/1 rounding threshold
- aParameters[4] = number of records in the predominant class
- aParameters[5] = minimum program size
- aParameters[6] = maximum program size
- aParameters[7] = number of positive cases
- aParameters[8] = encoding of negative class
- aParameters[9] = encoding of positive class
- aParameters[10] = identifies the dataset: "1" for Training and "0" for Validation
- aParameters[11] = Cost of True Positives
- aParameters[12] = Cost of True Negatives
- aParameters[13] = Cost of False Positives
- aParameters[14] = Cost of False Negatives
- aParameters[15] = Lower Bound
- aParameters[16] = Upper Bound
- aParameters[17] = Parsimony Pressure Rate
- aParameters[18] = Variable Pressure Rate
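As an illustration of how these parameters might be combined, the sketch below scores a model by the total cost of its classifications, using the rounding threshold, the positive class encoding and the four cost matrix entries. It is only a sketch under stated assumptions: the wrapper function costFitness is hypothetical (inside GeneXproTools only the function body would be used, since the arrays are supplied by the tool), the costs are assumed to be non-negative, and the 1000 / (1 + cost) normalization is an illustrative choice, not a built-in GeneXproTools formula.

```javascript
// Hypothetical wrapper so the sketch can be run outside GeneXproTools.
function costFitness(aOutputTarget, aOutputModel, aParameters)
{
    var nRecords = aParameters[0];
    var roundingThreshold = aParameters[3];
    var positiveClassEncoding = aParameters[9];
    var costTP = aParameters[11];
    var costTN = aParameters[12];
    var costFP = aParameters[13];
    var costFN = aParameters[14];

    var totalCost = 0.0;
    for (var nR = 0; nR < nRecords; nR++)
    {
        var predictedPositive = (aOutputModel[nR] >= roundingThreshold);
        var actualPositive = (aOutputTarget[nR] == positiveClassEncoding);
        if (predictedPositive && actualPositive)
        {
            totalCost += costTP;
        }
        else if (!predictedPositive && !actualPositive)
        {
            totalCost += costTN;
        }
        else if (predictedPositive)
        {
            totalCost += costFP;
        }
        else
        {
            totalCost += costFN;
        }
    }

    // Assumption: all costs are non-negative, so the result lies in (0, 1000]
    // and grows as the total cost shrinks.
    return 1000.0 / (1.0 + totalCost);
}
```

With this normalization a model that incurs no cost at all reaches the maximum fitness of 1000, and fitness never becomes negative as long as the costs themselves are non-negative.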
In addition, GeneXproTools gives you access to information about the structure and
composition of evolving models, which is essential for designing custom fitness functions
that favor simpler or more complex solutions:
- aModelInfo[0] = program size
- aModelInfo[1] = used variables
- aModelInfo[2] = number of literals
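One way to favor simpler solutions with this information is to reward a model for being smaller than the maximum program size. The sketch below is a minimal example of that idea, assuming a raw fitness has already been computed. The helper name applyParsimonyPressure and the scaling formula are illustrative assumptions, not necessarily the formula GeneXproTools applies internally.

```javascript
// Hypothetical helper: scales a raw fitness by a small size-dependent bonus.
function applyParsimonyPressure(rawFitness, aParameters, aModelInfo)
{
    var minProgramSize = aParameters[5];
    var maxProgramSize = aParameters[6];
    var parsimonyRate = aParameters[17];
    var programSize = aModelInfo[0];

    // Guard against a degenerate size range (avoids division by zero)
    if (maxProgramSize <= minProgramSize)
    {
        return rawFitness;
    }

    // 1 for the smallest possible program, 0 for the largest
    var compactness = (maxProgramSize - programSize) / (maxProgramSize - minProgramSize);
    compactness = Math.max(0.0, Math.min(1.0, compactness));

    // Assumption: the rate is non-negative, so fitness stays non-negative
    return rawFitness * (1.0 + parsimonyRate * compactness);
}
```

Note that with a bonus of this kind the maximum attainable fitness becomes the raw maximum multiplied by (1 + rate), so the maximum fitness value would need to be adjusted accordingly.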
The code for the custom fitness function must be written in JavaScript and can be tested
before evolving a model with it. Note that GeneXproTools uses fitness proportionate selection
to select the models; therefore, fitness must increase with performance and only
non-negative values are acceptable. Below is the sample code of a simple custom fitness function
based on classification accuracy. It is an example of a valid custom fitness function for
logistic regression problems. Note that in this case maximum fitness equals 1000 for both the training and
validation datasets, and you must also enter this value in GeneXproTools through the
Custom Fitness Function window so that all the charts in the Run Panel display correctly.
/////////////////////////////////////////////////////////////////////////////
// All the values of the Target output
// are accessible through the array:
// aOutputTarget[0] = 0
// aOutputTarget[1] = 1
// aOutputTarget[2] = 0
// etc.
// All the values of the Model output
// are accessible through the array:
// aOutputModel[0] = 0
// aOutputModel[1] = 1
// aOutputModel[2] = 1
// etc.
// Essential and useful parameters you may use
// to design your fitness function:
// aParameters[0] = number of records
// aParameters[1] = averaged target output
// aParameters[2] = variance of the target output
// aParameters[3] = 0/1 rounding threshold
// aParameters[4] = number of records in the predominant class
// aParameters[5] = minimum program size
// aParameters[6] = maximum program size
// aParameters[7] = number of positive cases
// aParameters[8] = encoding of negative class
// aParameters[9] = encoding of positive class
// aParameters[10] = identifies the dataset: "1" for Training and "0" for Validation
// aParameters[11] = Cost of True Positives
// aParameters[12] = Cost of True Negatives
// aParameters[13] = Cost of False Positives
// aParameters[14] = Cost of False Negatives
// aParameters[15] = Lower Bound
// aParameters[16] = Upper Bound
// aParameters[17] = Parsimony Pressure Rate
// aParameters[18] = Variable Pressure Rate
// Useful information about the evolving models
// you may use to design your fitness function:
// aModelInfo[0] = program size
// aModelInfo[1] = used variables
// aModelInfo[2] = number of literals
// gepFilePath: local variable with the full path to the gep file
// Your custom fitness function must return a value, for example:
// return fitness;
// Below is an example of a simple fitness function, the Accuracy fitness function,
// for which maximum fitness is equal to 1000:
// ACCURACY FITNESS FUNCTION
var nRecords = aParameters[0];
var roundingThreshold = aParameters[3];
var fitness = 0.0;
var hits = 0;
var negativeClassEncoding = aParameters[8];
var positiveClassEncoding = aParameters[9];
// For the penalty
var VERY_SMALL_FITNESS = 1.0E-11;
var trueNegatives = 0;
var truePositives = 0;
// Fitness evaluation
for (var nR=0; nR<nRecords; nR++)
{
    // Convert the value returned by the model into crisp classifications
    if (aOutputModel[nR] >= roundingThreshold)
    {
        aOutputModel[nR] = positiveClassEncoding;
    }
    else
    {
        aOutputModel[nR] = negativeClassEncoding;
    }
    // Evaluation of the fitness components
    if (aOutputModel[nR] == aOutputTarget[nR])
    {
        hits++;
        if (aOutputTarget[nR] == positiveClassEncoding)
        {
            truePositives++;
        }
        else
        {
            trueNegatives++;
        }
    }
} //for nR
// Fitness evaluation & normalization
fitness = (hits / nRecords) * 1000.0;
// Penalty: a model with no true positives or no true negatives
// gets a near-zero fitness
if ((truePositives == 0) || (trueNegatives == 0))
{
    fitness = fitness * VERY_SMALL_FITNESS;
}
return fitness;
/////////////////////////////////////////////////////////////////////////////
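To try a fitness function body outside GeneXproTools, it can be wrapped in an ordinary JavaScript function and called with a few hand-made records. The sketch below is one possible way of doing this and rests on assumptions: the helper names makeFitnessFunction and checkFitness are hypothetical, and the text above does not describe how GeneXproTools itself invokes the code, so the harness simply assumes the arrays and gepFilePath are visible to the body as plain variables.

```javascript
// Hypothetical harness: turns a fitness function body into a callable function.
function makeFitnessFunction(bodySource)
{
    return new Function("aOutputTarget", "aOutputModel", "aParameters",
                        "aModelInfo", "gepFilePath", bodySource);
}

// Runs the function on copies of the data and checks the basic requirements:
// a finite, non-negative value that does not exceed the declared maximum.
function checkFitness(fitnessFn, aOutputTarget, aOutputModel, aParameters,
                      aModelInfo, maxFitness)
{
    // Copies are passed because a body may overwrite aOutputModel
    var value = fitnessFn(aOutputTarget.slice(), aOutputModel.slice(),
                          aParameters.slice(), aModelInfo.slice(), "");
    if (typeof value !== "number" || !isFinite(value))
    {
        throw new Error("fitness must be a finite number");
    }
    if (value < 0)
    {
        throw new Error("fitness must be non-negative");
    }
    if (value > maxFitness)
    {
        throw new Error("fitness exceeds the declared maximum");
    }
    return value;
}

// A compact accuracy body used only to exercise the harness
var sampleBody = [
    "var hits = 0;",
    "for (var nR = 0; nR < aParameters[0]; nR++) {",
    "    var predicted = (aOutputModel[nR] >= aParameters[3]) ? aParameters[9] : aParameters[8];",
    "    if (predicted == aOutputTarget[nR]) { hits++; }",
    "}",
    "return (hits / aParameters[0]) * 1000.0;"
].join("\n");

// Mock parameters: 4 records, threshold 0.5, classes encoded as 0 and 1
var mockParameters = [];
for (var i = 0; i < 19; i++) { mockParameters[i] = 0; }
mockParameters[0] = 4;
mockParameters[3] = 0.5;
mockParameters[8] = 0;
mockParameters[9] = 1;

var sampleFitness = checkFitness(makeFitnessFunction(sampleBody),
                                 [0, 1, 0, 1], [0.2, 0.7, 0.6, 0.9],
                                 mockParameters, [10, 2, 3], 1000);
// Three of the four mock records are classified correctly, so sampleFitness is 750
```

The same harness can be pointed at the full Accuracy fitness function above by pasting its body into a string, which makes it easy to confirm that the returned value stays between 0 and the declared maximum fitness.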