GeneXproTools Online Guide

 Last update: February 19, 2014

Class Encoding

For classification and logistic regression, the learning algorithms of GeneXproTools require a two-valued numeric representation {a, b} for the two classes of the response variable, where a and b are two different real numbers. (Categorical variables and response variables with multiple classes are handled as described in the guides Category Mapping and Class Merging & Discretization, respectively.) You can change the class encoding in the Class Encoding Window.
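To make the {a, b} representation concrete, here is a minimal Python sketch of such a mapping. It is not GeneXproTools code: the function name `encode_classes` is hypothetical, and it assumes that the negative class (label 0) maps to a and the positive class (label 1) maps to b.

```python
def encode_classes(labels, a=0.0, b=1.0):
    """Map binary labels (0 = negative, 1 = positive) to the encoding {a, b}.

    Assumption for illustration: a stands for the negative class and
    b for the positive class.
    """
    if a == b:
        raise ValueError("a and b must be two different real numbers")
    encoded = []
    for y in labels:
        if y not in (0, 1):
            raise ValueError(f"expected a 0/1 label, got {y!r}")
        encoded.append(b if y == 1 else a)
    return encoded


labels = [0, 1, 1, 0]
print(encode_classes(labels))             # canonical {0, 1} encoding
print(encode_classes(labels, -1.0, 1.0))  # symmetrical {-1, 1} encoding
```

Only the target values change; the class membership of each record stays the same.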

The rationale for using encodings other than the canonical {0, 1} representation is that the learning algorithms of GeneXproTools for classification and logistic regression, which include different fitness functions and different types of rounding thresholds, can exploit different class encodings to search the solution space much more efficiently. For fitness functions based exclusively on the confusion matrix or the number of hits, the class encoding you choose is irrelevant, because no error measure based on the distance to the target output is used to evaluate fitness. However, for fitness functions that rely on some kind of distance between the raw model outputs and the actual values, such as the mean squared error, different encodings can significantly change the fitness landscape, because different ranges for the class representation produce very different results. (In theory all ranges are mathematically equivalent, but this does not hold in computational evolutionary systems.)

In fact, in these systems the standard {0, 1} representation is far from unbiased, and better models can be created with a more flexible range, such as the symmetrical {-1000, 1000}, {-100, 100} or {-10, 10} and the asymmetrical {0, 1000}, {0, 100} or {0, 10}. Even the {-1, 1} representation, which is not very far from the usual {0, 1} encoding, results in a richer fitness landscape.
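The difference between hit-based and distance-based fitness can be illustrated with a small Python sketch. It is a simplified illustration, not the GeneXproTools fitness functions: the helper names, the two sets of raw model outputs and the fixed rounding threshold of 0.5 are all assumptions made for the example.

```python
def encode(labels, a, b):
    # Assumption: label 0 maps to a, label 1 maps to b.
    return [b if y == 1 else a for y in labels]


def hits(raw_outputs, labels, threshold=0.5):
    # Round each raw output to a class with a fixed threshold (assumed 0.5)
    # and count correct predictions. The encoding never enters this measure.
    predicted = [1 if r >= threshold else 0 for r in raw_outputs]
    return sum(p == y for p, y in zip(predicted, labels))


def mse(raw_outputs, targets):
    # Mean squared distance between raw outputs and the encoded targets.
    return sum((r - t) ** 2 for r, t in zip(raw_outputs, targets)) / len(targets)


labels = [0, 0, 1, 1]
model_a = [0.0, 0.25, 1.0, 0.75]     # hypothetical raw outputs near {0, 1}
model_b = [-8.0, -10.0, 10.0, 12.0]  # hypothetical raw outputs near {-10, 10}

print("hits:", hits(model_a, labels), hits(model_b, labels))

for a, b in [(0, 1), (-10, 10)]:
    targets = encode(labels, a, b)
    print(f"MSE with {{{a}, {b}}}:", mse(model_a, targets), mse(model_b, targets))
```

Both hypothetical models score the same number of hits, but the mean squared error ranks them in opposite order depending on the encoding, which is the sense in which the encoding reshapes a distance-based fitness landscape.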

The default class encoding of GeneXproTools is the standard {0, 1} representation, which you can change in the Class Encoding Window. We especially advise you to do so if you’re working with fitness functions based on some kind of distance measure. The default fitness functions of GeneXproTools for classification (ROC Measure) and logistic regression (Positive Correl) are not distance-dependent, which is why GeneXproTools defaults to the more common {0, 1} representation.

Note also that the choice of class encoding has no bearing on how GeneXproTools shows the response variable in charts and tables: in all cases it uses the {0, 1} representation.

GeneXproTools also allows you to invert the class encoding, which might be useful if you’d rather think of a certain outcome as Positive rather than Negative. You can also do this in the Class Encoding Window.
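As a rough illustration of what inverting the encoding means for model evaluation, the Python sketch below assumes that inversion simply swaps which class is labelled Positive. The function names and the sample data are hypothetical and are not taken from GeneXproTools.

```python
def invert_labels(labels):
    # Swap the two classes: 0 becomes 1 and 1 becomes 0.
    return [1 - y for y in labels]


def confusion_counts(predicted, actual):
    # Count the four cells of a binary confusion matrix.
    pairs = list(zip(predicted, actual))
    return {
        "TP": sum(p == 1 and y == 1 for p, y in pairs),
        "TN": sum(p == 0 and y == 0 for p, y in pairs),
        "FP": sum(p == 1 and y == 0 for p, y in pairs),
        "FN": sum(p == 0 and y == 1 for p, y in pairs),
    }


actual = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 1]

before = confusion_counts(predicted, actual)
after = confusion_counts(invert_labels(predicted), invert_labels(actual))
print("before:", before)
print("after: ", after)
```

Under this assumption, true positives and true negatives trade places, as do false positives and false negatives, while the total number of hits is unchanged.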


Released February 19, 2014

Last update: 5.0.3883
