External Custom Fitness
In this article we show how to create an external Custom Fitness Function for GeneXproTools
using Microsoft Visual Basic 6.0. Until now, creating custom fitness functions required
familiarity with Javascript. With the techniques in this article you can write custom fitness
functions in any language that supports the creation of COM components.
This covers almost every language from scripting languages like Perl and Python to C++, Delphi, C#
and Visual Basic.
You can download all the project files, including
complete projects for Visual Basic 6, VB.NET and C#, from
the resource links at the bottom of this page.
How Does It Work?
The key to this technique is to delegate the fitness function calculation to an in-process component
that is created and called from the Javascript fitness function in GeneXproTools. The Javascript is very simple:
var proc = new ActiveXObject("[PROGID.CLASSID]");
return proc.Calculate(gxptTarget, gxptOutput, gxptParams, gxptModelInfo);
The code above creates a new object and passes the GeneXproTools arrays to the method Calculate.
These arrays are Javascript VBArray types that are new in version 4.0 of GeneXproTools and correspond to
the various arrays of the previous version, which are still supported. These arrays are variants
containing arrays of variants and their contents are:
- gxptTarget: Contains the values of the dependent variable
- gxptOutput: Contains the values calculated by the current model
- gxptParams: Contains settings of the run
  - gxptParams(0) = number of samples
  - gxptParams(1) = averaged target output
  - gxptParams(2) = variance of the target output
  - gxptParams(3) = 0/1 rounding threshold
  - gxptParams(4) = number of samples in the predominant class
  - gxptParams(5) = minimum program size
  - gxptParams(6) = maximum program size
- gxptModelInfo: Contains information about the model
  - gxptModelInfo(0) = program size
  - gxptModelInfo(1) = used variables
  - gxptModelInfo(2) = number of literals
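As a rough illustration of the layout listed above, the following sketch gives names to the positional entries. It assumes the values have already been copied into plain Javascript arrays (inside GeneXproTools the arguments arrive as VBArray objects, so a conversion step would be needed first). The function names describeParams and describeModelInfo, the field names and the sample numbers are hypothetical and are not part of GeneXproTools.

```javascript
// Hypothetical helpers: name the positional entries described in the list above.
// Plain arrays stand in for the GeneXproTools variant arrays in this sketch.
function describeParams(params) {
  return {
    samples: params[0],                 // number of samples
    targetAverage: params[1],           // averaged target output
    targetVariance: params[2],          // variance of the target output
    roundingThreshold: params[3],       // 0/1 rounding threshold
    predominantClassSamples: params[4], // samples in the predominant class
    minProgramSize: params[5],
    maxProgramSize: params[6]
  };
}

function describeModelInfo(modelInfo) {
  return {
    programSize: modelInfo[0],
    usedVariables: modelInfo[1],
    literals: modelInfo[2]
  };
}

// Made-up values, for illustration only.
var settings = describeParams([4, 2.5, 1.25, 0.5, 3, 10, 50]);
var info = describeModelInfo([21, 3, 8]);
console.log(settings.samples, settings.roundingThreshold, info.programSize);
```

Naming the entries once keeps index numbers out of the rest of the fitness code, which may make later changes less error-prone.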
The "[PROGID.CLASSID]" string passed to ActiveXObject must be replaced with the ProjectName.ClassName of your VB project. In our sample project the complete code would be:
var proc = new ActiveXObject("VBCustomFitness.MSE");
return proc.Calculate(gxptTarget, gxptOutput, gxptParams, gxptModelInfo);
And this is all that needs to be added to the custom fitness function in GeneXproTools.
The image below shows the Javascript part of the custom fitness function in GeneXproTools.
Creating the VB Fitness Function
The easiest way to test this feature is to download the tutorial’s files, open the project VBCustomFitness in Visual Basic 6.0 and compile the library. Then start GeneXproTools, open the file VBCustomFitnessTest.gep (also in the tutorial's files) and start a run. If the run fails to start with the error “The custom fitness code does not compile”, review the Javascript code and make sure that it matches the example and that the ActiveX string is correct.
If you prefer to start your own project you have to create a new VB project of type ActiveX DLL, add a class and implement the fitness function. You will also have to adjust the class id to match your component. If you change the name of the method Calculate or its signature you also have to adapt the Javascript code.
Finally, it is also possible to debug your fitness function from within Visual Studio or VB6's IDE. For VB6, open the
project, add a breakpoint in the function body and press F5. Then open GeneXproTools and start the run.
For fitness functions created with VB.NET or C#, open the project and change the debugging properties to start an
external program. Point this property to the GeneXproTools executable (usually at
c:\Program Files\GeneXproTools 4\GeneXproTools.exe) and then press F5 to start a debugging session.
Note that aborting a debug session will probably unload GeneXproTools, and you will lose all your unsaved changes.
In some cases it may also corrupt the run file, so we strongly suggest that you create backups of your runs
before using them to debug a fitness function.
The tutorial’s projects implement the Mean Squared Error fitness function, with the following Calculate method
(using the VB6 version):
Public Function Calculate(ByRef gxptTarget As Variant, _
                          ByRef gxptOutput As Variant, _
                          ByRef gxptParams As Variant, _
                          ByRef gxptModelInfo As Variant) _
                          As Variant
    ' gxptParams(0) holds the number of samples
    Dim nSamples As Long: nSamples = gxptParams(0)
    Dim modelMinusTargetSquared As Double
    Dim MSE As Double
    Dim temp1 As Double
    Dim i As Long

    ' Accumulate the squared differences between model output and target
    For i = 0 To nSamples - 1
        temp1 = gxptOutput(i) - gxptTarget(i)
        temp1 = temp1 * temp1
        modelMinusTargetSquared = modelMinusTargetSquared + temp1
    Next

    MSE = modelMinusTargetSquared / nSamples
    ' Treat very small errors as a perfect fit
    If MSE <= 0.000000001 Then
        MSE = 0#
    End If

    ' Map the error to a fitness value; 1000 means a perfect fit
    Calculate = (1 / (1 + MSE)) * 1000
End Function
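For comparison, the same calculation can be sketched in plain Javascript. This is only an illustration of the arithmetic in the VB6 method above: it takes ordinary arrays rather than the GeneXproTools VBArray arguments, and the function name mseFitness is hypothetical.

```javascript
// Sketch of the MSE-based fitness from the VB6 example, using plain arrays.
// mseFitness is a hypothetical name; nSamples corresponds to gxptParams(0).
function mseFitness(target, output, nSamples) {
  var sumSquaredErrors = 0;
  for (var i = 0; i < nSamples; i++) {
    var diff = output[i] - target[i];
    sumSquaredErrors += diff * diff;
  }
  var mse = sumSquaredErrors / nSamples;
  // Treat very small errors as a perfect fit, as the VB6 code does.
  if (mse <= 0.000000001) {
    mse = 0;
  }
  // Same mapping as the VB6 code: 1000 for a perfect fit, lower as the error grows.
  return (1 / (1 + mse)) * 1000;
}

console.log(mseFitness([1, 2, 3], [1, 2, 3], 3)); // perfect fit, prints 1000
console.log(mseFitness([0, 0], [1, -1], 2));      // MSE is 1, prints 500
```

Because the mapping is 1000 / (1 + MSE), the result never exceeds 1000 and approaches 0 as the error grows.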
Performance
For the MSE function above, this approach is about 10% faster than the same implementation in Javascript.
This is an approximate value, measured for a very simple fitness function. The real advantage of the technique shows in more complex fitness functions that involve more time-consuming calculations.
With this approach you can also store information between instantiations of your code, so expensive operations such as opening files need to be performed only once. This is where you will see the major performance improvements.
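The idea of performing an expensive step only once can be sketched as follows. The sketch is written in Javascript purely for illustration; in an external component the cached value would be kept in the component's own long-lived state, which is an assumption about how such a component is structured. All names here (loadWeights, getWeights, weightedSquaredError) and the weight values are hypothetical.

```javascript
// Hypothetical expensive step, standing in for something like reading a file.
var loadCount = 0;
function loadWeights() {
  loadCount++;
  return [0.5, 1.0, 2.0]; // made-up values
}

// The loaded data is cached on first use and reused by every later call.
var cachedWeights = null;
function getWeights() {
  if (cachedWeights === null) {
    cachedWeights = loadWeights();
  }
  return cachedWeights;
}

// Example consumer: a weighted sum of squared errors that needs the cached data.
function weightedSquaredError(target, output) {
  var weights = getWeights();
  var sum = 0;
  for (var i = 0; i < target.length; i++) {
    var diff = output[i] - target[i];
    sum += weights[i] * diff * diff;
  }
  return sum;
}

console.log(weightedSquaredError([1, 2, 3], [2, 2, 5])); // prints 8.5
console.log(weightedSquaredError([1, 2, 3], [1, 2, 3])); // prints 0
console.log(loadCount); // prints 1, the loader ran only once
```

The trade-off is the usual one for caches: the stored data stays in memory until the host process ends, and it is not refreshed if the underlying source changes.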
The GXPT4CFHelper.DataHelper Library
This library is a free add-on to GeneXproTools that allows you to access
run datasets from within your custom fitness functions. The library is part of this article's files, which can be downloaded
from the resource links at the bottom of this page. To install the DataHelper library, unzip the files, open a command prompt, navigate to the folder that contains the file GXPT4CFHelper.dll and run the following command:
regsvr32 GXPT4CFHelper.dll
If you need to uninstall the library run the command:
regsvr32 GXPT4CFHelper.dll /u
From this point on the DataHelper class will be available to your internal and external Custom
Fitness functions.
DataHelper methods
- DataHelper.Initialise(ByRef RunPath As String, ByRef gxptTarget As Variant, ByRef gxptOutput As Variant, ByRef gxptParams As Variant, ByRef gxptModelInfo As Variant)
  This method initialises the library. It must be called every time the fitness function is calculated, and this call must appear before any other call into the library.
  Parameters:
  RunPath: string – Must contain the complete path to the GeneXproTools file.
  gxptTarget, gxptOutput, gxptParams, gxptModelInfo: variant – These are the internal GeneXproTools variant arrays that are passed to the fitness function and must be passed on to the library.
- GetDoubleValue(ByVal Column As Long, ByVal Row As Long, ByVal DataSet As DataSetEnum)
  This method returns a data point of either the training set or the testing set (if it exists).
  Parameters:
  Column: long – The variable index starting at zero.
  Row: long – The sample index starting at zero.
  DataSet: enum – Flag that selects the set to fetch the data from:
  - dsTrainingSet(1) – Training Set
  - dsTestingSet(2) – Testing Set
DataHelper properties
- CurrentSet: enum - dsTrainingSet(1), dsTestingSet(2)
  This property returns the set that is now being processed. It relies on both sets having a different number of samples and it does not return a correct value when both the Training and Testing set have the same number of samples.
- Columns: long
  Returns the number of variables plus the dependent variable.
- TrainingSamples: long
  Returns the number of samples in use in the Training set.
- TestingSamples: long
  Returns the number of samples in use in the Testing set.
- ErrorNumber: long
  Returns the last error after a DataHelper call. You should check if DataHelper.ErrorNumber is not zero to ensure that the previous call did not fail.
- ErrorDescription: string
  Short description of the last error.
How It Works and Limitations
The DataHelper works by opening the GeneXproTools file and reading its datasets into memory. For the library to work correctly you must point it to the same run file that you are working on in GeneXproTools, and you should save before you start a run. The DataHelper caches the loaded data for the duration of the session, so if you replace any of the datasets you must restart GeneXproTools. Because the data is loaded only once, the library is not reset when you open a different run: you have to restart GeneXproTools every time you want to use a different file.
The library does not raise any errors and will fail silently when your code requests invalid data (such as a nonexistent data point). When an error is detected, the ErrorNumber property is set to a non-zero value and ErrorDescription is set to the description of the error.
The sample projects include examples on how to use this library in VB, VB.NET and C#.
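Because the library fails silently, every call should be followed by a check of ErrorNumber. The pattern can be sketched as below. The stub object only imitates the documented GetDoubleValue, ErrorNumber and ErrorDescription members so that the sketch runs on its own; the error code 1, the error text, the value returned on failure and the names makeStubHelper and readValue are all assumptions made for illustration, not the library's actual behaviour.

```javascript
// Stub that imitates the documented DataHelper members for this sketch only.
// A real helper would be the registered COM object, not this object literal.
function makeStubHelper(rows) {
  return {
    ErrorNumber: 0,
    ErrorDescription: "",
    GetDoubleValue: function (column, row, dataSet) {
      var valid = row >= 0 && row < rows.length &&
                  column >= 0 && column < rows[row].length;
      if (!valid) {
        this.ErrorNumber = 1; // made-up error code
        this.ErrorDescription = "Invalid data point"; // made-up text
        return 0; // assumed placeholder value on failure
      }
      this.ErrorNumber = 0;
      this.ErrorDescription = "";
      return rows[row][column];
    }
  };
}

var dsTrainingSet = 1; // enum value listed for the DataSet parameter

// Returns the requested value, or null when the helper reports an error.
function readValue(helper, column, row, dataSet) {
  var value = helper.GetDoubleValue(column, row, dataSet);
  if (helper.ErrorNumber !== 0) {
    return null;
  }
  return value;
}

var helper = makeStubHelper([[1.5, 2.5], [3.5, 4.5]]);
console.log(readValue(helper, 1, 0, dsTrainingSet)); // prints 2.5
console.log(readValue(helper, 5, 0, dsTrainingSet)); // prints null
```

Returning null rather than the raw value makes a failed read impossible to mistake for a legitimate data point of 0.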
Precautions
This is an advanced and powerful feature that also carries some responsibility. Your code is responsible for handling any exceptions and for freeing any memory it allocates. It must also not modify the
GeneXproTools parameters or attempt to deallocate them.
Finally, to allow GeneXproTools to function correctly, your fitness function must always return a double value between 0 and the maximum fitness.
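One way to guard the return value is to clamp it just before returning. The following is a minimal sketch, assuming a maximum fitness of 1000 as in the MSE example; the function name clampFitness is hypothetical, and mapping invalid results to 0 is a design choice of this sketch rather than a documented requirement.

```javascript
// Hypothetical guard: force a computed fitness into the range 0..maxFitness.
function clampFitness(value, maxFitness) {
  // Non-numeric or NaN results are mapped to the worst fitness.
  if (typeof value !== "number" || isNaN(value)) {
    return 0;
  }
  if (value < 0) {
    return 0;
  }
  if (value > maxFitness) {
    return maxFitness;
  }
  return value;
}

var MAX_FITNESS = 1000; // assumption: matches the scale of the MSE example

console.log(clampFitness(512.25, MAX_FITNESS));   // prints 512.25
console.log(clampFitness(-3, MAX_FITNESS));       // prints 0
console.log(clampFitness(Infinity, MAX_FITNESS)); // prints 1000
console.log(clampFitness(0 / 0, MAX_FITNESS));    // prints 0
```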
Resources
VB6, VB.NET, C# projects and the DataHelper library
Mean Squared Error
User Defined Fitness Functions
Files for the first edition of this article. VB6 project
Last modified: July 16, 2007
Disclaimer & License: The code made available in this article is provided as-is,
it is copyright of Gepsoft Limited and falls under the license of GeneXproTools.
The DataHelper library is also provided as-is under the same license agreement as GeneXproTools. Even though
this library is not part of the product we will provide support for its use at our discretion.