Genetic Operators: Fixed-Root Mutation

Posted Jan 6, 08:39 PM
Comments None

For the new project "New Project: Multi-class Classification & Trading Strategies" we implemented 2 new genetic operators: Fixed-Root Mutation and Conservative Fixed-Root Mutation. These genetic operators were implemented in all modeling categories and core algorithms of GeneXproTools (GEP and GEP-RNC). In this post I'll talk about Fixed-Root Mutation and its uses in the context of this new project.

As the name implies, the Fixed-Root Mutation operator implements a different version of the Mutation operator, where the root of each sub-tree is conserved. So, if we have a multigenic system encoding n trees, with the Fixed-Root Mutation operator we can change all the elements in the chromosome except the ones in the first position of each gene (this position encodes the root of the sub-tree).

The Fixed-Root Mutation operator is interesting because it gives us more control over the root position, which might come in handy in problems like the ones we set out to solve with this new project "New Project: Multi-class Classification & Trading Strategies". For example, if we are using a single-tree system for discovering buy-sell-wait rules, by fixing the function of interest (the buy-sell-wait function in this case) at the root of the tree, we might get a faster and more efficient evolution.

The Fixed-Root Mutation operator has also other applications besides fixing the root function in a unigenic system. For example, in multigenic systems we can use it to impose certain structural constraints on our solutions, especially if we use it in combination with other genetic operators. For example, we can warm start the design process with a weak linear solution and then let the learning algorithm discover a better solution to our problem.

In the next post I'll talk about the related Conservative Fixed-Root Mutation operator and its uses.

Author Dr. Candida Ferreira

New Classification Algorithms

Posted Jan 6, 07:39 PM
Comments None

Both the new classifier functions and the new linking functions introduced with this mini-release give rise to new ways of doing classification in GeneXproTools.

For example, the new classifier functions of 2-6 discrete outputs greatly simplify the creation of classification models in the Regression Framework when we are interested in crisp outcomes for the model outputs.

Furthermore, these new classifier functions can be used both in unigenic and multigenic systems not only in binary classification but also in multi-class classification. It's worth noting, however, that for multi-class classification better results are usually obtained by decomposing the problem in k sub-tasks (k is the number of classes). But as usual in these situations it all depends on the problem and on the data. For example, all the new classifier functions of 3-6 outputs perform great on the Iris dataset (a 3-class classification problem), but on the Balance Scale dataset (also with 3 classes and the same number of input variables) they perform poorly, mainly due to the unbalanced distribution of classes in the original data. This is obviously a problem for which we can find a solution, like for instance, creating a balanced dataset, but the recommended course of action for multi-class classification is to divide the problem into k binary sub-tasks and solve each one of them in the Logistic Regression Framework or Classification Framework where we have access to the probabilities for each class.

Notwithstanding, for multi-class classification problems where these new 3-6 output classifier functions work well, you can either use these new discrete functions in combination with other math functions in systems with just one tree or in systems with multiple trees, linking the sub-trees with the most appropriate linking function. For example, the Min & Max linking functions work particularly well in those cases.

The other alternative is to use the new discrete output linking functions. In this case, you can either add some of the new discrete output functions to your function set or use just a discrete output function for the linking. Particularly interesting is the setup with just 2 trees linked by a function like the buy-sell-wait function described in the post "Function Design: The BUY-SELL-WAIT Function". But this obviously extends to all kinds of 3-class classification problems as long as the data is amenable to this kind of approach.

And to finish, just a quick note on the uses of these new discrete output linkers in the Logistic Regression Framework and the Classification Framework.

GeneXproTools allows you to use the new linking functions both in Logistic Regression and Classification, but you must be aware that the models you're creating are much simpler (which might be great if you're interested in just that) than the ones created with non-discrete linkers (addition, min, max, Avg2 and others). Therefore these simpler models won't give you much differentiation in terms of probabilities. But again, this might prove useful in certain problems or as a form to gain insight into your data.

So you're free to explore all these new classification algorithms by exploring different combinations of fitness functions (the fitness functions of Regression, Logistic Regression and Classification were all fine-tuned for different purposes), visualization tools (the model fitting charts of Regression and Logistic Regression/Classification are all different), and analytics (the measures of fit used in Regression and Logistic Regression/Classification are also different).

Have fun!

Author Dr. Candida Ferreira

New Linking Functions

Posted Jan 5, 11:23 PM
Comments None

The new linking functions added to GeneXproTools with this mini-release include a total of 6 discrete output classifier functions of 2 and 3 outputs. All these functions showed good performance on the Iris dataset both as normal building blocks when added to the default function set and also as linking functions in systems with 2 and more genes.

In the 3-output category we implemented 3 new linking functions, all of them with outputs of the form {-1, 0, 1}. One of these linkers is the buy-sell-wait function (represented in GeneXproTools as CL3A) described in the post "Function Design: The BUY-SELL-WAIT Function". The other two are the new 3-output classifier functions CL3B and CL3C introduced in the post "Function Design: More 3-Output Classifier Functions".

In the binary category, we also added 3 new linking functions: CL2A, CL2D, and the argmin function of 2 arguments (represented as AMin2 in GeneXproTools). I didn't talk much about these functions, but they all perform very well on the Iris dataset, sub-problem Virginica/Not Virginica. But the main reason for their inclusion and not others (for example the functions CL2B and CL2C) was the fact that they have a neutral gene.

The C++ code for all these new linking functions is shown below:

double gepCL2A(double x, double y)
{
    if (x > 0.0 && y > 0.0)
        return 1.0;
    else
        return -1.0;
}

double gepCL2D(double x, double y)
{
    if (x > 0.0 && y > 0.0)
        return 1.0;
    else
        return 0.0;
}

double gepCL3A(double x, double y)
{
    if (x > 0.0 && y < 0.0)
        return 1.0;
    else
        if (x < 0.0 && y > 0.0)
            return -1.0;
        else
            return 0.0;
}

double gepCL3B(double x, double y)
{
    if (x >= 1.0 && y >= 1.0)
        return 1.0;
    else
        if (x <= -1.0 && y <= -1.0)
            return -1.0;
        else
            return 0.0;
}

double gepCL3C(double x, double y)
{
    if (x > 0.0 && y > 0.0)
        return 1.0;
    else
        if (x < 0.0 && y < 0.0)
            return -1.0;
        else
            return 0.0;
}

double gepAMin2(double x, double y)
{
    if (x < y)
        return 0.0;
    else
        return 1.0;
}

These new linking functions were implemented in the modeling frameworks of Regression, Classification, and Logistic Regression and therefore can have different uses depending on the framework you are working on. In all cases, though, these new linking functions can be used to solve different kinds of classification problems, exploring the different modeling tools available in a particular framework, including different fitness functions, visualization tools and analytics. In the next post I'll talk more about the different kinds of classification models these new linking functions allow us to explore.

Author Dr. Candida Ferreira

New Math Functions in Visual Basic

Posted Jan 4, 09:50 PM
Comments None

For the new 39 math functions of GeneXproTools, the Visual Basic code is almost an exact clone of the Excel VBA code. The only difference is related to the problem of minus zero in Excel median calculations described in the post "Debugging: Minus Zero in Excel Median Calculations". We realized then that by changing the value returned by the new discrete output functions to "0" instead of "0.0", the number of "-0" occurrences in Excel decreased. So, although this fix did not solve the problem of negative zero in Excel, we decided to adopt this format for all discrete output values of all the new functions in the Excel VBA. And that's the only difference you'll find between the Visual Basic Grammar and the Excel VBA Grammar, as far the new 39 math functions are concerned.

So here's the VB code for the new 39 functions that will be added to the built-in math functions of GeneXproTools in the light of the new project "New Project: Multi-class Classification & Trading Strategies":

Function gepRamp1(ByVal x As Double) As Double
    If x > 0.0 Then
        gepRamp1 = x
    Else
        gepRamp1 = 0.0
    End If
End Function

Function gepRamp2(ByVal x As Double) As Double
    If x > 0.0 Then
        gepRamp2 = 0.0
    Else
        gepRamp2 = x
    End If
End Function

Function gepRamp3(ByVal x As Double) As Double
    If x > 0.0 Then
        gepRamp3 = 0.0
    Else
        gepRamp3 = -x
    End If
End Function

Function gepRamp4(ByVal x As Double) As Double
    If x > 0.0 Then
        gepRamp4 = -x
    Else
        gepRamp4 = 0.0
    End If
End Function

Function gepStep1(ByVal x As Double) As Double
    If x > 0.0 Then
        gepStep1 = 1.0
    Else
        gepStep1 = -1.0
    End If
End Function

Function gepStep2(ByVal x As Double) As Double
    If x > 0.0 Then
        gepStep2 = 1.0
    Else
        gepStep2 = 0.0
    End If
End Function

Function gepStep3(ByVal x As Double) As Double
    If x >= 1.0 Then
        gepStep3 = 1.0
    ElseIf x <= -1.0 Then
        gepStep3 = -1.0
    Else
        gepStep3 = x
    End If
End Function

Function gepStep4(ByVal x As Double) As Double
    If x >= 1.0 Then
        gepStep4 = 1.0
    ElseIf x <= 0.0 Then
        gepStep4 = 0.0
    Else
        gepStep4 = x
    End If
End Function

Function gepCL2A(ByVal x As Double, ByVal y As Double) As Double
    If x > 0.0 And y > 0.0 Then
        gepCL2A = 1.0
    Else
        gepCL2A = -1.0
    End If
End Function

Function gepCL2B(ByVal x As Double, ByVal y As Double) As Double
    If x >= 0.0 And y < 0.0 Then
        gepCL2B = -1.0
    Else
        gepCL2B = 1.0
    End If
End Function

Function gepCL2C(ByVal x As Double, ByVal y As Double) As Double
    If x > 1.0 And y < -1.0 Then
        gepCL2C = -1.0
    Else
        gepCL2C = 1.0
    End If
End Function

Function gepCL2D(ByVal x As Double, ByVal y As Double) As Double
    If x > 0.0 And y > 0.0 Then
        gepCL2D = 1.0
    Else
        gepCL2D = 0.0
    End If
End Function

Function gepCL2E(ByVal x As Double, ByVal y As Double) As Double
    If x >= 0.0 And y <= 0.0 Then
        gepCL2E = 0.0
    Else
        gepCL2E = 1.0
    End If
End Function

Function gepCL2F(ByVal x As Double, ByVal y As Double) As Double
    If x > 1.0 And y < -1.0 Then
        gepCL2F = 0.0
    Else
        gepCL2F = 1.0
    End If
End Function

Function gepCL3A(ByVal x As Double, ByVal y As Double) As Double
    If x > 0.0 And y < 0.0 Then
        gepCL3A = 1.0
    ElseIf x < 0.0 And y > 0.0 Then
        gepCL3A = -1.0
    Else
        gepCL3A = 0.0
    End If
End Function

Function gepCL3B(ByVal x As Double, ByVal y As Double) As Double
    If x >= 1.0 And y >= 1.0 Then
        gepCL3B = 1.0
    ElseIf x <= -1.0 And y <= -1.0 Then
        gepCL3B = -1.0
    Else
        gepCL3B = 0.0
    End If
End Function

Function gepCL3C(ByVal x As Double, ByVal y As Double) As Double
    If x > 0.0 And y > 0.0 Then
        gepCL3C = 1.0
    ElseIf x < 0.0 And y < 0.0 Then
        gepCL3C = -1.0
    Else
        gepCL3C = 0.0
    End If
End Function

Function gepMap3A(ByVal x As Double, ByVal y As Double) As Double
    Const SLACK As Double = 10.0
    If y < (x - SLACK) Then
        gepMap3A = -1.0
    ElseIf y > (x + SLACK) Then
        gepMap3A = 1.0
    Else
        gepMap3A = 0.0
    End If
End Function

Function gepMap3B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
    ' evaluate minValue(x,y) and maxValue(x,y)
    Dim minValue As Double
    Dim maxValue As Double
    minValue = x
    maxValue = y
    If minValue > y Then
        minValue = y
        maxValue = x
    End If

    If z < minValue Then
        gepMap3B = -1.0
    ElseIf z > maxValue Then
        gepMap3B = 1.0
    Else
        gepMap3B = 0.0
    End If
End Function

Function gepMap3C(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
    ' evaluate minValue(a,b,c) and maxValue(a,b,c)
    '
    ' evaluate minValue(a,b,c)
    Dim minValue As Double
    minValue = a
    If minValue > b Then minValue = b
    If minValue > c Then minValue = c
    ' evaluate maxValue(a,b,c)
    Dim maxValue As Double
    maxValue = a
    If maxValue < b Then maxValue = b
    If maxValue < c Then maxValue = c

    If d < minValue Then
        gepMap3C = -1.0
    ElseIf d > maxValue Then
        gepMap3C = 1.0
    Else
        gepMap3C = 0.0
    End If
End Function

Function gepMap4A(ByVal x As Double, ByVal y As Double) As Double
    Const SLACK As Double = 10.0
    If y < (x - SLACK) Then
        gepMap4A = 0.0
    ElseIf y >= (x - SLACK) And y < x Then
        gepMap4A = 1.0
    ElseIf y >= x And y < (x + SLACK) Then
        gepMap4A = 2.0
    ElseIf y >= (x + SLACK) Then
        gepMap4A = 3.0
    End If
End Function

Function gepMap4B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
    ' evaluate minValue(x,y), maxValue(x,y) and midrange
    Dim minValue As Double
    Dim maxValue As Double
    minValue = x
    maxValue = y
    If minValue > y Then
        minValue = y
        maxValue = x
    End If
    Dim midrange As Double
    midrange = (maxValue + minValue)/2.0

    If z < minValue Then
        gepMap4B = 0.0
    ElseIf z >= minValue And z < midrange Then
        gepMap4B = 1.0
    ElseIf z >= midrange And z < maxValue Then
        gepMap4B = 2.0
    ElseIf z >= maxValue Then
        gepMap4B = 3.0
    End If
End Function

Function gepMap4C(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
    ' evaluate minValue(a,b,c), maxValue(a,b,c) and midleValue(a,b,c)
    '
    ' evaluate minValue(a,b,c)
    Dim minValue As Double
    Dim argMin As Long
    minValue = a
    argMin = 0
    If minValue > b Then
        minValue = b
        argMin = 1
    End If
    If minValue > c Then
        minValue = c
        argMin = 2
    End If
    ' evaluate maxValue(a,b,c)
    Dim maxValue As Double
    Dim argMax As Long
    maxValue = a
    argMax = 0
    If maxValue < b Then
        maxValue = b
        argMax = 1
    End If
    If maxValue < c Then
        maxValue = c
        argMax = 2
    End If
    ' evaluate midleValue(a,b,c)
    Dim midleValue As Double
    midleValue = c
    If 0 <> argMin And 0 <> argMax Then
        midleValue = a
    End If
    If 1 <> argMin And 1 <> argMax Then
        midleValue = b
    End If

    If d < minValue Then
        gepMap4C = 0.0
    ElseIf d >= minValue And d < midleValue Then
        gepMap4C = 1.0
    ElseIf d >= midleValue And d < maxValue Then
        gepMap4C = 2.0
    ElseIf d >= maxValue Then
        gepMap4C = 3.0
    End If
End Function

Function gepMap5A(ByVal x As Double, ByVal y As Double) As Double
    Const SLACK As Double = 15.0
    If y < (x - SLACK) Then
        gepMap5A = 0.0
    ElseIf y >= (x - SLACK) And y < (x - SLACK/3.0) Then
        gepMap5A = 1.0
    ElseIf y >= (x - SLACK/3.0) And y < (x + SLACK/3.0) Then
        gepMap5A = 2.0
    ElseIf y >= (x + SLACK/3.0) And y < (x + SLACK) Then
        gepMap5A = 3.0
    ElseIf y >= (x + SLACK) Then
        gepMap5A = 4.0
    End If
End Function

Function gepMap5B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
    ' evaluate minValue(x,y), maxValue(x,y), midpoint1, midpoint2
    Dim minValue As Double
    Dim maxValue As Double
    minValue = x
    maxValue = y
    If minValue > y Then
        minValue = y
        maxValue = x
    End If
    Dim intervalLength As Double
    Dim midpoint1 As Double
    Dim midpoint2 As Double
    intervalLength = (maxValue - minValue)/3.0
    midpoint1 = minValue + intervalLength
    midpoint2 = minValue + 2.0*intervalLength

    If z < minValue Then
        gepMap5B = 0.0
    ElseIf z >= minValue And z < midpoint1 Then
        gepMap5B = 1.0
    ElseIf z >= midpoint1 And z < midpoint2 Then
        gepMap5B = 2.0
    ElseIf z >= midpoint2 And z < maxValue Then
        gepMap5B = 3.0
    ElseIf z >= maxValue Then
        gepMap5B = 4.0
    End If
End Function

Function gepMap5C(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
    ' evaluate minValue(a,b,c), maxValue(a,b,c), midleValue(a,b,c), midrange1, midrange2
    '
    ' evaluate minValue(a,b,c)
    Dim minValue As Double
    Dim argMin As Long
    minValue = a
    argMin = 0
    If minValue > b Then
        minValue = b
        argMin = 1
    End If
    If minValue > c Then
        minValue = c
        argMin = 2
    End If
    ' evaluate maxValue(a,b,c)
    Dim maxValue As Double
    Dim argMax As Long
    maxValue = a
    argMax = 0
    If maxValue < b Then
        maxValue = b
        argMax = 1
    End If
    If maxValue < c Then
        maxValue = c
        argMax = 2
    End If
    ' evaluate midleValue(a,b,c)
    Dim midleValue As Double
    midleValue = c
    If 0 <> argMin And 0 <> argMax Then midleValue = a
    If 1 <> argMin And 1 <> argMax Then midleValue = b
    Dim midrange1 As Double
    Dim midrange2 As Double
    midrange1 = (minValue + midleValue)/2.0
    midrange2 = (midleValue + maxValue)/2.0

    If d < minValue Then
        gepMap5C = 0.0
    ElseIf d >= minValue And d < midrange1 Then
        gepMap5C = 1.0
    ElseIf d >= midrange1 And d < midrange2 Then
        gepMap5C = 2.0
    ElseIf d >= midrange2 And d < maxValue Then
        gepMap5C = 3.0
    ElseIf d >= maxValue Then
        gepMap5C = 4.0
    End If
End Function

Function gepMap6A(ByVal x As Double, ByVal y As Double) As Double
    Const SLACK As Double = 10.0
    If y < (x - SLACK) Then
        gepMap6A = 0.0
    ElseIf y >= (x - SLACK) And y < (x - SLACK/2.0) Then
        gepMap6A = 1.0
    ElseIf y >= (x - SLACK/2.0) And y < x Then
        gepMap6A = 2.0
    ElseIf y >= x And y < (x + SLACK/2.0) Then
        gepMap6A = 3.0
    ElseIf y >= (x + SLACK/2.0) And y < (x + SLACK) Then
        gepMap6A = 4.0
    ElseIf y >= (x + SLACK) Then
        gepMap6A = 5.0
    End If
End Function

Function gepMap6B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
    ' evaluate minValue(x,y), maxValue(x,y), midrange, midpoint1, midpoint2
    Dim minValue As Double
    Dim maxValue As Double
    minValue = x
    maxValue = y
    If minValue > y Then
        minValue = y
        maxValue = x
    End If
    Dim midrange As Double
    Dim midpoint1 As Double
    Dim midpoint2 As Double
    midrange = (minValue + maxValue)/2.0
    midpoint1 = (minValue + midrange)/2.0
    midpoint2 = (midrange + maxValue)/2.0

    If z < minValue Then
        gepMap6B = 0.0
    ElseIf z >= minValue And z < midpoint1 Then
        gepMap6B = 1.0
    ElseIf z >= midpoint1 And z < midrange Then
        gepMap6B = 2.0
    ElseIf z >= midrange And z < midpoint2 Then
        gepMap6B = 3.0
    ElseIf z >= midpoint2 And z < maxValue Then
        gepMap6B = 4.0
    ElseIf z >= maxValue Then
        gepMap6B = 5.0
    End If
End Function

Function gepMap6C(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
    ' evaluate minValue(a,b,c), maxValue(a,b,c), midleValue(a,b,c), midrange1, midrange2
    '
    ' evaluate minValue(a,b,c)
    Dim minValue As Double
    Dim argMin As Long
    minValue = a
    argMin = 0
    If minValue > b Then
        minValue = b
        argMin = 1
    End If
    If minValue > c Then
        minValue = c
        argMin = 2
    End If
    ' evaluate maxValue(a,b,c)
    Dim maxValue As Double
    Dim argMax As Long
    maxValue = a
    argMax = 0
    If maxValue < b Then
        maxValue = b
        argMax = 1
    End If
    If maxValue < c Then
        maxValue = c
        argMax = 2
    End If
    ' evaluate midleValue(a,b,c)
    Dim midleValue As Double
    midleValue = c
    If 0 <> argMin And 0 <> argMax Then midleValue = a
    If 1 <> argMin And 1 <> argMax Then midleValue = b
    ' evaluate midrange1 and midrange2
    Dim midrange1 As Double
    Dim midrange2 As Double
    midrange1 = (minValue + midleValue)/2.0
    midrange2 = (midleValue + maxValue)/2.0

    If d < minValue Then
        gepMap6C = 0.0
    ElseIf d >= minValue And d < midrange1 Then
        gepMap6C = 1.0
    ElseIf d >= midrange1 And d < midleValue Then
        gepMap6C = 2.0
    ElseIf d >= midleValue And d < midrange2 Then
        gepMap6C = 3.0
    ElseIf d >= midrange2 And d < maxValue Then
        gepMap6C = 4.0
    ElseIf d >= maxValue Then
        gepMap6C = 5.0
    End If
End Function

Function gepECL3A(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
    If y > x And z < x Then
        gepECL3A = 1.0
    ElseIf y < x And z > x Then
        gepECL3A = -1.0
    Else
        gepECL3A = 0.0
    End If
End Function

Function gepECL3B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
    If y > x And z > x Then
        gepECL3B = 1.0
    ElseIf y < x And z < x Then
        gepECL3B = -1.0
    Else
        gepECL3B = 0.0
    End If
End Function

Function gepECL3C(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
    If y >= x And z >= x Then
        gepECL3C = 1.0
    ElseIf y <= -x And z <= -x Then
        gepECL3C = -1.0
    Else
        gepECL3C = 0.0
    End If
End Function

Function gepECL3D(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
    ' evaluate minValue(a,b) and maxValue(a,b)
    Dim minValue As Double
    Dim maxValue As Double
    minValue = a
    maxValue = b
    If minValue > b Then
        minValue = b
        maxValue = a
    End If

    If c >= maxValue And d >= maxValue Then
        gepECL3D = 1.0
    ElseIf c <= minValue And d <= minValue Then
        gepECL3D = -1.0
    Else
        gepECL3D = 0.0
    End If
End Function

Function gepAMin2(ByVal x As Double, ByVal y As Double) As Double
    If x < y Then
        gepAMin2 = 0.0
    Else
        gepAMin2 = 1.0
    End If
End Function

Function gepAMin3(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
    Dim temp As Double
    Dim argMin As Double
    temp = x
    argMin = 0.0
    If temp >= y Then
        temp = y
        argMin = 1.0
    End If
    If temp >= z Then argMin = 2.0
    gepAMin3 = argMin
End Function

Function gepAMin4(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
    Dim temp As Double
    Dim argMin As Double
    temp = a
    argMin = 0.0
    If temp >= b Then
        temp = b
        argMin = 1.0
    End If
    If temp >= c Then
        temp = c
        argMin = 2.0
    End If
    If temp >= d Then argMin = 3.0
    gepAMin4 = argMin
End Function

Function gepAMax2(ByVal x As Double, ByVal y As Double) As Double
    If x >= y Then
        gepAMax2 = 0.0
    Else
        gepAMax2 = 1.0
    End If
End Function

Function gepAMax3(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
    Dim temp As Double
    Dim argMax As Double
    temp = x
    argMax = 0.0
    If temp < y Then
        temp = y
        argMax = 1.0
    End If
    If temp < z Then argMax = 2.0
    gepAMax3 = argMax
End Function

Function gepAMax4(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
    Dim temp As Double
    Dim argMax As Double
    temp = a
    argMax = 0.0
    If temp < b Then
        temp = b
        argMax = 1.0
    End If
    If temp < c Then
        temp = c
        argMax = 2.0
    End If
    If temp < d Then argMax = 3.0
    gepAMax4 = argMax
End Function

Author Dr. Candida Ferreira

Debugging: Minus Zero in Excel Median Calculations

Posted Jan 3, 10:22 PM
Comments None

Debugging is as much a part of software development as is software design, programming or testing. But there are bugs that leave us scratching our heads and make us look under the hood of all our code components for the elusive cause. And sometimes we get lucky and find not only the bug but also new ways of making the code more efficient. We had one such instance happening during this project "New Project: Multi-class Classification & Trading Strategies".

The crash was a terrible one and was happening during Ensemble Deployment to Excel using the new math functions: Excel would crash, GeneXproTools would crash and we had to take them both down with the Task Manager. It only happened with model ensembles: when the models were deployed individually, the system wouldn't crash. Also interesting was that it only happened with unigenic systems using the new math functions with discrete outputs like {-1, 0, 1} or {0, 1, 2}; outputs like {3, 4, 5} worked just fine. And curiouser and curiouser, the crash disappeared if we added a neutral gene to all the models of the ensemble before deploying to Excel!

What the heck was going on? It couldn't be the Excel VBA code, right? Otherwise the individual models would also have crashed! And the same for the ensembles with a neutral gene! But they don't crash! So the culprit must be somewhere else. But where? (That's where out of desperation we went looking for it where it couldn't possibly be and found new ways of improving our code. And in this case we found a new way of speeding up the deployment of ensembles and models to Excel by optimizing the way Excel calculations were done.)

Well, after much head scratching we finally found the culprit: Minus zero in Excel calculations! A bug not even acknowledged by Microsoft (they say it's fixed and solved as it's just a minor inconvenience for anyone formatting cell colors according to the sign; see the discussion here).

But there's another problem with negative zero in Excel and that's the crash it causes in median calculations! Excel crashes completely and you have to close it down with the Task Manager. And it happens every time. I'm just surprised such a bug exists in Excel, but maybe medians are not as widely used as one would have thought.

Well, we could not wait for Microsoft to solve the problem of minus zero in their median algorithm (and elsewhere) and had to make a decision: either remove the calculations for the Median Model in the Excel worksheet (GeneXproTools evaluates the Median Model and the Average Model for the model ensembles deployed to Excel) or find a workaround to fix the minus zero problem in our Excel code.

We decided on the workaround as it does not corrupt the code much and solves the problem: So we now add zero to the model output in all Excel calculations. It looks a little silly if you don't know why it's there, but it prevents the crash of the median calculations in Excel with minus zero.

Author Dr. Candida Ferreira

← Older Newer →

GEPSODY - Gepsoft Blog

Genetic Operators: Fixed-Root Mutation

New Classification Algorithms

New Linking Functions

New Math Functions in Visual Basic

Debugging: Minus Zero in Excel Median Calculations

Search