For the new project "New Project: Multi-class Classification & Trading Strategies" we implemented 2 new genetic operators: Fixed-Root Mutation and Conservative Fixed-Root Mutation. These genetic operators were implemented in all modeling categories and core algorithms of GeneXproTools (GEP and GEP-RNC). In this post I'll talk about Fixed-Root Mutation and its uses in the context of this new project.
As the name implies, the Fixed-Root Mutation operator implements a different version of the Mutation operator, where the root of each sub-tree is conserved. So, if we have a multigenic system encoding n trees, with the Fixed-Root Mutation operator we can change all the elements in the chromosome except the ones in the first position of each gene (this position encodes the root of the sub-tree).
The Fixed-Root Mutation operator is interesting because it gives us more control over the root position, which might come in handy in problems like the ones we set out to solve with this new project "New Project: Multi-class Classification & Trading Strategies". For example, if we are using a single-tree system for discovering buy-sell-wait rules, by fixing the function of interest (the buy-sell-wait function in this case) at the root of the tree, we might get a faster and more efficient evolution.
The Fixed-Root Mutation operator has also other applications besides fixing the root function in a unigenic system. For example, in multigenic systems we can use it to impose certain structural constraints on our solutions, especially if we use it in combination with other genetic operators. For example, we can warm start the design process with a weak linear solution and then let the learning algorithm discover a better solution to our problem.
In the next post I'll talk about the related Conservative Fixed-Root Mutation operator and its uses.
For example, the new classifier functions of 2-6 discrete outputs greatly simplify the creation of classification models in the Regression Framework when we are interested in crisp outcomes for the model outputs.
Furthermore, these new classifier functions can be used both in unigenic and multigenic systems not only in binary classification but also in multi-class classification. It's worth noting, however, that for multi-class classification better results are usually obtained by decomposing the problem in k sub-tasks (k is the number of classes). But as usual in these situations it all depends on the problem and on the data. For example, all the new classifier functions of 3-6 outputs perform great on the Iris dataset (a 3-class classification problem), but on the Balance Scale dataset (also with 3 classes and the same number of input variables) they perform poorly, mainly due to the unbalanced distribution of classes in the original data. This is obviously a problem for which we can find a solution, like for instance, creating a balanced dataset, but the recommended course of action for multi-class classification is to divide the problem into k binary sub-tasks and solve each one of them in the Logistic Regression Framework or Classification Framework where we have access to the probabilities for each class.
Notwithstanding, for multi-class classification problems where these new 3-6 output classifier functions work well, you can either use these new discrete functions in combination with other math functions in systems with just one tree or in systems with multiple trees, linking the sub-trees with the most appropriate linking function. For example, the Min & Max linking functions work particularly well in those cases.
The other alternative is to use the new discrete output linking functions. In this case, you can either add some of the new discrete output functions to your function set or use just a discrete output function for the linking. Particularly interesting is the setup with just 2 trees linked by a function like the buy-sell-wait function described in the post "Function Design: The BUY-SELL-WAIT Function". But this obviously extends to all kinds of 3-class classification problems as long as the data is amenable to this kind of approach.
And to finish, just a quick note on the uses of these new discrete output linkers in the Logistic Regression Framework and the Classification Framework.
GeneXproTools allows you to use the new linking functions both in Logistic Regression and Classification, but you must be aware that the models you're creating are much simpler (which might be great if you're interested in just that) than the ones created with non-discrete linkers (addition, min, max, Avg2 and others). Therefore these simpler models won't give you much differentiation in terms of probabilities. But again, this might prove useful in certain problems or as a form to gain insight into your data.
So you're free to explore all these new classification algorithms by exploring different combinations of fitness functions (the fitness functions of Regression, Logistic Regression and Classification were all fine-tuned for different purposes), visualization tools (the model fitting charts of Regression and Logistic Regression/Classification are all different), and analytics (the measures of fit used in Regression and Logistic Regression/Classification are also different).
The new linking functions added to GeneXproTools with this mini-release include a total of 6 discrete output classifier functions of 2 and 3 outputs. All these functions showed good performance on the Iris dataset both as normal building blocks when added to the default function set and also as linking functions in systems with 2 and more genes.
In the 3-output category we implemented 3 new linking functions, all of them with outputs of the form {-1, 0, 1}. One of these linkers is the buy-sell-wait function (represented in GeneXproTools as CL3A) described in the post "Function Design: The BUY-SELL-WAIT Function". The other two are the new 3-output classifier functions CL3B and CL3C introduced in the post "Function Design: More 3-Output Classifier Functions".
In the binary category, we also added 3 new linking functions: CL2A, CL2D, and the argmin function of 2 arguments (represented as AMin2 in GeneXproTools). I didn't talk much about these functions, but they all perform very well on the Iris dataset, sub-problem Virginica/Not Virginica. But the main reason for their inclusion and not others (for example the functions CL2B and CL2C) was the fact that they have a neutral gene.
The C++ code for all these new linking functions is shown below:
double gepCL2A(double x, double y)
{
if (x > 0.0 && y > 0.0)
return 1.0;
else
return -1.0;
}
double gepCL2D(double x, double y)
{
if (x > 0.0 && y > 0.0)
return 1.0;
else
return 0.0;
}
double gepCL3A(double x, double y)
{
if (x > 0.0 && y < 0.0)
return 1.0;
else
if (x < 0.0 && y > 0.0)
return -1.0;
else
return 0.0;
}
double gepCL3B(double x, double y)
{
if (x >= 1.0 && y >= 1.0)
return 1.0;
else
if (x <= -1.0 && y <= -1.0)
return -1.0;
else
return 0.0;
}
double gepCL3C(double x, double y)
{
if (x > 0.0 && y > 0.0)
return 1.0;
else
if (x < 0.0 && y < 0.0)
return -1.0;
else
return 0.0;
}
These new linking functions were implemented in the modeling frameworks of Regression, Classification, and Logistic Regressionand therefore can have different uses depending on the framework you are working on. In all cases, though, these new linking functions can be used to solve different kinds of classification problems, exploring the different modeling tools available in a particular framework, including different fitness functions, visualization tools and analytics. In the next post I'll talk more about the different kinds of classification models these new linking functions allow us to explore.
For the new 39 math functions of GeneXproTools, the Visual Basic code is almost an exact clone of the Excel VBA code. The only difference is related to the problem of minus zero in Excel median calculations described in the post "Debugging: Minus Zero in Excel Median Calculations". We realized then that by changing the value returned by the new discrete output functions to "0" instead of "0.0", the number of "-0" occurrences in Excel decreased. So, although this fix did not solve the problem of negative zero in Excel, we decided to adopt this format for all discrete output values of all the new functions in the Excel VBA. And that's the only difference you'll find between the Visual Basic Grammar and the Excel VBA Grammar, as far the new 39 math functions are concerned.
Function gepRamp1(ByVal x As Double) As Double
If x > 0.0 Then
gepRamp1 = x
Else
gepRamp1 = 0.0
End If
End Function
Function gepRamp2(ByVal x As Double) As Double
If x > 0.0 Then
gepRamp2 = 0.0
Else
gepRamp2 = x
End If
End Function
Function gepRamp3(ByVal x As Double) As Double
If x > 0.0 Then
gepRamp3 = 0.0
Else
gepRamp3 = -x
End If
End Function
Function gepRamp4(ByVal x As Double) As Double
If x > 0.0 Then
gepRamp4 = -x
Else
gepRamp4 = 0.0
End If
End Function
Function gepStep1(ByVal x As Double) As Double
If x > 0.0 Then
gepStep1 = 1.0
Else
gepStep1 = -1.0
End If
End Function
Function gepStep2(ByVal x As Double) As Double
If x > 0.0 Then
gepStep2 = 1.0
Else
gepStep2 = 0.0
End If
End Function
Function gepStep3(ByVal x As Double) As Double
If x >= 1.0 Then
gepStep3 = 1.0
ElseIf x <= -1.0 Then
gepStep3 = -1.0
Else
gepStep3 = x
End If
End Function
Function gepStep4(ByVal x As Double) As Double
If x >= 1.0 Then
gepStep4 = 1.0
ElseIf x <= 0.0 Then
gepStep4 = 0.0
Else
gepStep4 = x
End If
End Function
Function gepCL2A(ByVal x As Double, ByVal y As Double) As Double
If x > 0.0 And y > 0.0 Then
gepCL2A = 1.0
Else
gepCL2A = -1.0
End If
End Function
Function gepCL2B(ByVal x As Double, ByVal y As Double) As Double
If x >= 0.0 And y < 0.0 Then
gepCL2B = -1.0
Else
gepCL2B = 1.0
End If
End Function
Function gepCL2C(ByVal x As Double, ByVal y As Double) As Double
If x > 1.0 And y < -1.0 Then
gepCL2C = -1.0
Else
gepCL2C = 1.0
End If
End Function
Function gepCL2D(ByVal x As Double, ByVal y As Double) As Double
If x > 0.0 And y > 0.0 Then
gepCL2D = 1.0
Else
gepCL2D = 0.0
End If
End Function
Function gepCL2E(ByVal x As Double, ByVal y As Double) As Double
If x >= 0.0 And y <= 0.0 Then
gepCL2E = 0.0
Else
gepCL2E = 1.0
End If
End Function
Function gepCL2F(ByVal x As Double, ByVal y As Double) As Double
If x > 1.0 And y < -1.0 Then
gepCL2F = 0.0
Else
gepCL2F = 1.0
End If
End Function
Function gepCL3A(ByVal x As Double, ByVal y As Double) As Double
If x > 0.0 And y < 0.0 Then
gepCL3A = 1.0
ElseIf x < 0.0 And y > 0.0 Then
gepCL3A = -1.0
Else
gepCL3A = 0.0
End If
End Function
Function gepCL3B(ByVal x As Double, ByVal y As Double) As Double
If x >= 1.0 And y >= 1.0 Then
gepCL3B = 1.0
ElseIf x <= -1.0 And y <= -1.0 Then
gepCL3B = -1.0
Else
gepCL3B = 0.0
End If
End Function
Function gepCL3C(ByVal x As Double, ByVal y As Double) As Double
If x > 0.0 And y > 0.0 Then
gepCL3C = 1.0
ElseIf x < 0.0 And y < 0.0 Then
gepCL3C = -1.0
Else
gepCL3C = 0.0
End If
End Function
Function gepMap3A(ByVal x As Double, ByVal y As Double) As Double
Const SLACK As Double = 10.0
If y < (x - SLACK) Then
gepMap3A = -1.0
ElseIf y > (x + SLACK) Then
gepMap3A = 1.0
Else
gepMap3A = 0.0
End If
End Function
Function gepMap3B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
' evaluate minValue(x,y) and maxValue(x,y)
Dim minValue As Double
Dim maxValue As Double
minValue = x
maxValue = y
If minValue > y Then
minValue = y
maxValue = x
End If
If z < minValue Then
gepMap3B = -1.0
ElseIf z > maxValue Then
gepMap3B = 1.0
Else
gepMap3B = 0.0
End If
End Function
Function gepMap3C(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
' evaluate minValue(a,b,c) and maxValue(a,b,c)
'
' evaluate minValue(a,b,c)
Dim minValue As Double
minValue = a
If minValue > b Then minValue = b
If minValue > c Then minValue = c
' evaluate maxValue(a,b,c)
Dim maxValue As Double
maxValue = a
If maxValue < b Then maxValue = b
If maxValue < c Then maxValue = c
If d < minValue Then
gepMap3C = -1.0
ElseIf d > maxValue Then
gepMap3C = 1.0
Else
gepMap3C = 0.0
End If
End Function
Function gepMap4A(ByVal x As Double, ByVal y As Double) As Double
Const SLACK As Double = 10.0
If y < (x - SLACK) Then
gepMap4A = 0.0
ElseIf y >= (x - SLACK) And y < x Then
gepMap4A = 1.0
ElseIf y >= x And y < (x + SLACK) Then
gepMap4A = 2.0
ElseIf y >= (x + SLACK) Then
gepMap4A = 3.0
End If
End Function
Function gepMap4B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
' evaluate minValue(x,y), maxValue(x,y) and midrange
Dim minValue As Double
Dim maxValue As Double
minValue = x
maxValue = y
If minValue > y Then
minValue = y
maxValue = x
End If
Dim midrange As Double
midrange = (maxValue + minValue)/2.0
If z < minValue Then
gepMap4B = 0.0
ElseIf z >= minValue And z < midrange Then
gepMap4B = 1.0
ElseIf z >= midrange And z < maxValue Then
gepMap4B = 2.0
ElseIf z >= maxValue Then
gepMap4B = 3.0
End If
End Function
Function gepMap4C(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
' evaluate minValue(a,b,c), maxValue(a,b,c) and midleValue(a,b,c)
'
' evaluate minValue(a,b,c)
Dim minValue As Double
Dim argMin As Long
minValue = a
argMin = 0
If minValue > b Then
minValue = b
argMin = 1
End If
If minValue > c Then
minValue = c
argMin = 2
End If
' evaluate maxValue(a,b,c)
Dim maxValue As Double
Dim argMax As Long
maxValue = a
argMax = 0
If maxValue < b Then
maxValue = b
argMax = 1
End If
If maxValue < c Then
maxValue = c
argMax = 2
End If
' evaluate midleValue(a,b,c)
Dim midleValue As Double
midleValue = c
If 0 <> argMin And 0 <> argMax Then
midleValue = a
End If
If 1 <> argMin And 1 <> argMax Then
midleValue = b
End If
If d < minValue Then
gepMap4C = 0.0
ElseIf d >= minValue And d < midleValue Then
gepMap4C = 1.0
ElseIf d >= midleValue And d < maxValue Then
gepMap4C = 2.0
ElseIf d >= maxValue Then
gepMap4C = 3.0
End If
End Function
Function gepMap5A(ByVal x As Double, ByVal y As Double) As Double
Const SLACK As Double = 15.0
If y < (x - SLACK) Then
gepMap5A = 0.0
ElseIf y >= (x - SLACK) And y < (x - SLACK/3.0) Then
gepMap5A = 1.0
ElseIf y >= (x - SLACK/3.0) And y < (x + SLACK/3.0) Then
gepMap5A = 2.0
ElseIf y >= (x + SLACK/3.0) And y < (x + SLACK) Then
gepMap5A = 3.0
ElseIf y >= (x + SLACK) Then
gepMap5A = 4.0
End If
End Function
Function gepMap5B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
' evaluate minValue(x,y), maxValue(x,y), midpoint1, midpoint2
Dim minValue As Double
Dim maxValue As Double
minValue = x
maxValue = y
If minValue > y Then
minValue = y
maxValue = x
End If
Dim intervalLength As Double
Dim midpoint1 As Double
Dim midpoint2 As Double
intervalLength = (maxValue - minValue)/3.0
midpoint1 = minValue + intervalLength
midpoint2 = minValue + 2.0*intervalLength
If z < minValue Then
gepMap5B = 0.0
ElseIf z >= minValue And z < midpoint1 Then
gepMap5B = 1.0
ElseIf z >= midpoint1 And z < midpoint2 Then
gepMap5B = 2.0
ElseIf z >= midpoint2 And z < maxValue Then
gepMap5B = 3.0
ElseIf z >= maxValue Then
gepMap5B = 4.0
End If
End Function
Function gepMap5C(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
' evaluate minValue(a,b,c), maxValue(a,b,c), midleValue(a,b,c), midrange1, midrange2
'
' evaluate minValue(a,b,c)
Dim minValue As Double
Dim argMin As Long
minValue = a
argMin = 0
If minValue > b Then
minValue = b
argMin = 1
End If
If minValue > c Then
minValue = c
argMin = 2
End If
' evaluate maxValue(a,b,c)
Dim maxValue As Double
Dim argMax As Long
maxValue = a
argMax = 0
If maxValue < b Then
maxValue = b
argMax = 1
End If
If maxValue < c Then
maxValue = c
argMax = 2
End If
' evaluate midleValue(a,b,c)
Dim midleValue As Double
midleValue = c
If 0 <> argMin And 0 <> argMax Then midleValue = a
If 1 <> argMin And 1 <> argMax Then midleValue = b
Dim midrange1 As Double
Dim midrange2 As Double
midrange1 = (minValue + midleValue)/2.0
midrange2 = (midleValue + maxValue)/2.0
If d < minValue Then
gepMap5C = 0.0
ElseIf d >= minValue And d < midrange1 Then
gepMap5C = 1.0
ElseIf d >= midrange1 And d < midrange2 Then
gepMap5C = 2.0
ElseIf d >= midrange2 And d < maxValue Then
gepMap5C = 3.0
ElseIf d >= maxValue Then
gepMap5C = 4.0
End If
End Function
Function gepMap6A(ByVal x As Double, ByVal y As Double) As Double
Const SLACK As Double = 10.0
If y < (x - SLACK) Then
gepMap6A = 0.0
ElseIf y >= (x - SLACK) And y < (x - SLACK/2.0) Then
gepMap6A = 1.0
ElseIf y >= (x - SLACK/2.0) And y < x Then
gepMap6A = 2.0
ElseIf y >= x And y < (x + SLACK/2.0) Then
gepMap6A = 3.0
ElseIf y >= (x + SLACK/2.0) And y < (x + SLACK) Then
gepMap6A = 4.0
ElseIf y >= (x + SLACK) Then
gepMap6A = 5.0
End If
End Function
Function gepMap6B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
' evaluate minValue(x,y), maxValue(x,y), midrange, midpoint1, midpoint2
Dim minValue As Double
Dim maxValue As Double
minValue = x
maxValue = y
If minValue > y Then
minValue = y
maxValue = x
End If
Dim midrange As Double
Dim midpoint1 As Double
Dim midpoint2 As Double
midrange = (minValue + maxValue)/2.0
midpoint1 = (minValue + midrange)/2.0
midpoint2 = (midrange + maxValue)/2.0
If z < minValue Then
gepMap6B = 0.0
ElseIf z >= minValue And z < midpoint1 Then
gepMap6B = 1.0
ElseIf z >= midpoint1 And z < midrange Then
gepMap6B = 2.0
ElseIf z >= midrange And z < midpoint2 Then
gepMap6B = 3.0
ElseIf z >= midpoint2 And z < maxValue Then
gepMap6B = 4.0
ElseIf z >= maxValue Then
gepMap6B = 5.0
End If
End Function
Function gepMap6C(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
' evaluate minValue(a,b,c), maxValue(a,b,c), midleValue(a,b,c), midrange1, midrange2
'
' evaluate minValue(a,b,c)
Dim minValue As Double
Dim argMin As Long
minValue = a
argMin = 0
If minValue > b Then
minValue = b
argMin = 1
End If
If minValue > c Then
minValue = c
argMin = 2
End If
' evaluate maxValue(a,b,c)
Dim maxValue As Double
Dim argMax As Long
maxValue = a
argMax = 0
If maxValue < b Then
maxValue = b
argMax = 1
End If
If maxValue < c Then
maxValue = c
argMax = 2
End If
' evaluate midleValue(a,b,c)
Dim midleValue As Double
midleValue = c
If 0 <> argMin And 0 <> argMax Then midleValue = a
If 1 <> argMin And 1 <> argMax Then midleValue = b
' evaluate midrange1 and midrange2
Dim midrange1 As Double
Dim midrange2 As Double
midrange1 = (minValue + midleValue)/2.0
midrange2 = (midleValue + maxValue)/2.0
If d < minValue Then
gepMap6C = 0.0
ElseIf d >= minValue And d < midrange1 Then
gepMap6C = 1.0
ElseIf d >= midrange1 And d < midleValue Then
gepMap6C = 2.0
ElseIf d >= midleValue And d < midrange2 Then
gepMap6C = 3.0
ElseIf d >= midrange2 And d < maxValue Then
gepMap6C = 4.0
ElseIf d >= maxValue Then
gepMap6C = 5.0
End If
End Function
Function gepECL3A(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
If y > x And z < x Then
gepECL3A = 1.0
ElseIf y < x And z > x Then
gepECL3A = -1.0
Else
gepECL3A = 0.0
End If
End Function
Function gepECL3B(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
If y > x And z > x Then
gepECL3B = 1.0
ElseIf y < x And z < x Then
gepECL3B = -1.0
Else
gepECL3B = 0.0
End If
End Function
Function gepECL3C(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
If y >= x And z >= x Then
gepECL3C = 1.0
ElseIf y <= -x And z <= -x Then
gepECL3C = -1.0
Else
gepECL3C = 0.0
End If
End Function
Function gepECL3D(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
' evaluate minValue(a,b) and maxValue(a,b)
Dim minValue As Double
Dim maxValue As Double
minValue = a
maxValue = b
If minValue > b Then
minValue = b
maxValue = a
End If
If c >= maxValue And d >= maxValue Then
gepECL3D = 1.0
ElseIf c <= minValue And d <= minValue Then
gepECL3D = -1.0
Else
gepECL3D = 0.0
End If
End Function
Function gepAMin2(ByVal x As Double, ByVal y As Double) As Double
If x < y Then
gepAMin2 = 0.0
Else
gepAMin2 = 1.0
End If
End Function
Function gepAMin3(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
Dim temp As Double
Dim argMin As Double
temp = x
argMin = 0.0
If temp >= y Then
temp = y
argMin = 1.0
End If
If temp >= z Then argMin = 2.0
gepAMin3 = argMin
End Function
Function gepAMin4(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
Dim temp As Double
Dim argMin As Double
temp = a
argMin = 0.0
If temp >= b Then
temp = b
argMin = 1.0
End If
If temp >= c Then
temp = c
argMin = 2.0
End If
If temp >= d Then argMin = 3.0
gepAMin4 = argMin
End Function
Function gepAMax2(ByVal x As Double, ByVal y As Double) As Double
If x >= y Then
gepAMax2 = 0.0
Else
gepAMax2 = 1.0
End If
End Function
Function gepAMax3(ByVal x As Double, ByVal y As Double, ByVal z As Double) As Double
Dim temp As Double
Dim argMax As Double
temp = x
argMax = 0.0
If temp < y Then
temp = y
argMax = 1.0
End If
If temp < z Then argMax = 2.0
gepAMax3 = argMax
End Function
Function gepAMax4(ByVal a As Double, ByVal b As Double, ByVal c As Double, ByVal d As Double) As Double
Dim temp As Double
Dim argMax As Double
temp = a
argMax = 0.0
If temp < b Then
temp = b
argMax = 1.0
End If
If temp < c Then
temp = c
argMax = 2.0
End If
If temp < d Then argMax = 3.0
gepAMax4 = argMax
End Function
Debugging is as much a part of software development as is software design, programming or testing. But there are bugs that leave us scratching our heads and make us look under the hood of all our code components for the elusive cause. And sometimes we get lucky and find not only the bug but also new ways of making the code more efficient. We had one such instance happening during this project "New Project: Multi-class Classification & Trading Strategies".
The crash was a terrible one and was happening during Ensemble Deployment to Excel using the new math functions: Excel would crash, GeneXproTools would crash and we had to take them both down with the Task Manager. It only happened with model ensembles: when the models were deployed individually, the system wouldn't crash. Also interesting was that it only happened with unigenic systems using the new math functions with discrete outputs like {-1, 0, 1} or {0, 1, 2}; outputs like {3, 4, 5} worked just fine. And curiouser and curiouser, the crash disappeared if we added a neutral gene to all the models of the ensemble before deploying to Excel!
What the heck was going on? It couldn't be the Excel VBA code, right? Otherwise the individual models would also have crashed! And the same for the ensembles with a neutral gene! But they don't crash! So the culprit must be somewhere else. But where? (That's where out of desperation we went looking for it where it couldn't possibly be and found new ways of improving our code. And in this case we found a new way of speeding up the deployment of ensembles and models to Excel by optimizing the way Excel calculations were done.)
Well, after much head scratching we finally found the culprit: Minus zero in Excel calculations! A bug not even acknowledged by Microsoft (they say it's fixed and solved as it's just a minor inconvenience for anyone formatting cell colors according to the sign; see the discussion here).
But there's another problem with negative zero in Excel and that's the crash it causes in median calculations! Excel crashes completely and you have to close it down with the Task Manager. And it happens every time. I'm just surprised such a bug exists in Excel, but maybe medians are not as widely used as one would have thought.
Well, we could not wait for Microsoft to solve the problem of minus zero in their median algorithm (and elsewhere) and had to make a decision: either remove the calculations for the Median Model in the Excel worksheet (GeneXproTools evaluates the Median Model and the Average Model for the model ensembles deployed to Excel) or find a workaround to fix the minus zero problem in our Excel code.
We decided on the workaround as it does not corrupt the code much and solves the problem: So we now add zero to the model output in all Excel calculations. It looks a little silly if you don't know why it's there, but it prevents the crash of the median calculations in Excel with minus zero.