Last update: February 19, 2014
Class Encoding
For classification and logistic regression,
the learning algorithms of GeneXproTools require a two-valued numeric representation
{a, b} for the two classes of the response variable, where a and b are
two different real numbers (see how categorical variables and response variables with
multiple classes are handled, respectively, in the guides
Category Mapping and
Class Merging & Discretization).
You can change the class encoding in the Class Encoding Window.
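As a rough illustration of what a two-valued encoding {a, b} means, the sketch below maps 0/1 class labels onto a chosen pair of real numbers. The function name `encode_classes` and the convention that class 0 maps to a and class 1 maps to b are assumptions made for this example; they are not part of GeneXproTools.

```python
def encode_classes(labels, a=0.0, b=1.0):
    """Map 0/1 class labels to the two-valued numeric encoding {a, b}.

    Hypothetical helper: class 0 is assumed to map to a and class 1 to b.
    """
    if a == b:
        raise ValueError("a and b must be two different real numbers")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must contain only 0 and 1")
    return [b if label == 1 else a for label in labels]


labels = [0, 1, 1, 0]
canonical = encode_classes(labels)               # the canonical {0, 1} encoding
symmetrical = encode_classes(labels, -100, 100)  # a symmetrical {-100, 100} encoding
print(canonical)
print(symmetrical)
```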
The rationale for using encodings other than the canonical {0, 1} representation
is that the learning algorithms of GeneXproTools for classification and logistic regression,
which include different fitness functions and different types of
rounding thresholds,
can exploit different class encodings to search the solution space much more efficiently.
For example, for fitness based exclusively on the confusion matrix or the
number of hits,
the class encoding you choose is irrelevant as no error measure based on the distance to
the target output is being used to evaluate fitness. However, for fitness functions that
rely on some kind of distance between raw model outputs and actual values, such as the
mean squared error, using different encodings can significantly change the fitness landscape,
as different ranges for the class representation produce very different results
(in theory all ranges are mathematically equivalent, but this
doesn’t hold in computational evolutionary systems). In fact,
in these systems the standard {0, 1} representation
is far from unbiased, and better models can be created if a more flexible range is used,
such as the symmetrical {-1000, 1000}, {-100, 100} or {-10, 10}
and the asymmetrical {0, 1000}, {0, 100} or {0, 10}; even the {-1, 1} representation, which is not
very far from the usual {0, 1} encoding, results in a richer fitness landscape.
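The contrast between hit-based and distance-based fitness can be sketched with a few lines of Python. This is only an illustration, not the fitness functions GeneXproTools actually implements: the raw outputs, the rounding threshold of 1.0 and the helper names are all made up for the example. The hit count compares thresholded outputs with the class labels and never sees the encoded targets, whereas the mean squared error is computed against them and so changes with the encoding.

```python
def encode(labels, a, b):
    # Hypothetical convention: class 0 -> a, class 1 -> b.
    return [b if y == 1 else a for y in labels]


def mse(outputs, targets):
    # Mean squared error between raw model outputs and encoded targets.
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def hits(outputs, labels, threshold):
    # Predicted class is 1 when the raw output reaches the rounding threshold.
    return sum((1 if o >= threshold else 0) == y for o, y in zip(outputs, labels))


labels = [0, 1, 1, 0]
raw_outputs = [-3.0, 4.0, 0.5, 2.0]  # made-up raw outputs of one candidate model
threshold = 1.0                      # made-up rounding threshold

n_hits = hits(raw_outputs, labels, threshold)           # does not use any encoding
mse_01 = mse(raw_outputs, encode(labels, 0, 1))         # distance to {0, 1} targets
mse_sym = mse(raw_outputs, encode(labels, -10, 10))     # distance to {-10, 10} targets
print(n_hits, mse_01, mse_sym)
```

The same model scores identically on hits under either encoding, while its squared-error score differs, which is why the encoding only matters for distance-based fitness.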
Note, however, that the default class encoding of GeneXproTools is the
standard {0, 1}
representation; you can change it in the Class Encoding Window, and we especially
advise you to do so if you’re working with fitness functions based on some kind of
distance measure. The default fitness functions of GeneXproTools for classification
(ROC Measure) and logistic regression (Positive Correl) are not distance-dependent,
which is why GeneXproTools uses the more common {0, 1} representation
by default.
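To see why a correlation-based measure does not depend on the encoding, the sketch below uses the plain Pearson correlation as a stand-in; the exact definition of Positive Correl is not given here, so treating it as Pearson correlation is an assumption. Pearson correlation is unchanged by any rescaling of the targets of the form a + (b - a)·y with b > a, which is exactly what switching from {0, 1} to another {a, b} encoding does. The data and helper names are invented for the example.

```python
import math


def pearson(xs, ys):
    # Plain Pearson correlation coefficient (stand-in for a correlation-based fitness).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def encode(labels, a, b):
    # Hypothetical convention: class 0 -> a, class 1 -> b.
    return [b if y == 1 else a for y in labels]


labels = [0, 1, 1, 0, 1]
raw_outputs = [-3.0, 4.0, 0.5, 2.0, 6.0]  # made-up raw model outputs

r_01 = pearson(raw_outputs, encode(labels, 0, 1))
r_sym = pearson(raw_outputs, encode(labels, -100, 100))
print(math.isclose(r_01, r_sym))  # the two correlations agree up to rounding error
```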
Note also that the choice of class encoding has no bearing on the way
GeneXproTools shows the response variable in its charts and tables, as in all cases
GeneXproTools uses the {0, 1} representation.
GeneXproTools also allows you to invert the class encoding, which might be useful if
you’d rather think of a certain outcome as Positive rather than Negative. You can also invert
the class encoding in the Class Encoding Window.
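A minimal sketch of what inverting the encoding amounts to, assuming inversion simply swaps which of the two classes is treated as Positive (the source does not spell out the mechanics, and the helper names and data below are invented). When both the actual and the predicted classes are relabeled this way, true positives trade places with true negatives and false positives with false negatives.

```python
def invert(labels):
    # Swap the two classes: 0 becomes 1 and 1 becomes 0.
    return [1 - y for y in labels]


def confusion(actual, predicted):
    # Count the four confusion-matrix cells, with class 1 as Positive.
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(a == 1 and p == 1 for a, p in pairs),
        "TN": sum(a == 0 and p == 0 for a, p in pairs),
        "FP": sum(a == 0 and p == 1 for a, p in pairs),
        "FN": sum(a == 1 and p == 0 for a, p in pairs),
    }


actual = [1, 0, 0, 1, 0, 0]     # made-up actual classes
predicted = [1, 0, 1, 0, 0, 1]  # made-up predicted classes

before = confusion(actual, predicted)
after = confusion(invert(actual), invert(predicted))
print(before)
print(after)
```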