What’s New in GeneXproTools 5.0?

What’s New in GeneXproTools 2023.1: Enhanced Parallel Processing Algorithm

We're pleased to introduce an upgraded parallel processing algorithm, bringing substantial performance enhancements. With this update:

Performance Boost: Achieve up to 4-5x faster processing. The speed increase depends on your CPU's core count* and specific run settings.
Default Use: Both new and existing runs will utilize the new algorithm by default.
Switching Back: Prefer the previous algorithm? Navigate to Settings > Random Constants and deselect the Use Parallel Algorithm checkbox.
Optimizing Performance: To maximize the new algorithm's benefits, set the 'Chromosomes' value in Settings to a multiple of your CPU's core count. For instance, with an 8-core CPU (8 threads depicted in the example image), recommended values include 64, 128, 256, etc. Note: This optimization is automatic for new runs, but existing runs remain unchanged.
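As a rough illustration of the sizing advice above, the following Python sketch rounds a desired 'Chromosomes' value up to the nearest multiple of the available thread count. The helper name `round_up_to_multiple` is hypothetical, and `os.cpu_count()` reports the logical processors seen by Python, which may differ from the 'Number of threads' value shown in GeneXproTools.

```python
import os

def round_up_to_multiple(value, base):
    """Return the smallest multiple of `base` that is >= `value`."""
    if base <= 0:
        raise ValueError("base must be positive")
    return ((value + base - 1) // base) * base

# Logical processor count as seen by Python (may be None on some platforms).
threads = os.cpu_count() or 1

# Hypothetical desired population sizes, adjusted to multiples of the thread count.
for wanted in (50, 100, 250):
    print(wanted, "->", round_up_to_multiple(wanted, threads))
```

With 8 threads, for instance, a desired value of 250 would be adjusted to 256.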

This release also includes several other performance-oriented changes and small bug fixes.

NOTE: Existing subscribers will need a new license file for this release. If you did not receive it, please contact us at support@gepsoft.com.

* Many CPUs support twice as many threads as they have cores. For best performance, set the 'Number of threads' setting to the higher number.

What’s New in GeneXproTools 5.0?

New Multifunctional Data Panel: Variable Charts and Analysis; Record Charts and Analysis
Support for Categorical Variables and Missing Values
Support for Multinomial Classification & Logistic Regression
Support for Data Normalization
Dataset Partitioning and Sub-sampling
Support for GEP Files as Data Source
Summary Statistics
Outlier Detection & Removal
Regression Analysis
Variable Importance
Residual Analysis
New Logistic Regression Category: Fitness Functions; New Charts & Statistics; Code Generation
Model Browsing in the Run Panel
New Charts for Model Visualization & Selection: Classification & Logistic Regression; Regression & Time Series Prediction
New Tools for Model Selection
Introduction of Evolvable Rounding Thresholds in Classification
Improved Results Panel in all Categories
Favorite Statistics for all Categories
Improved History Panel
100+ New Fitness Functions
Adjustable Parsimony Pressure for all Fitness Functions
New and Adjustable Variable Pressure for all Fitness Functions
More Parameters for the Custom Fitness Functions
New Genetic Operators & Modeling Strategies
More Variables & Unlimited Ensemble Size
New Tools for Creating Ensembles & Random Forests
New Programming Languages: R, Octave & Excel VBA
Improvements in Generated Model Code
3 Different Forms of Model Output for Classification & Logistic Regression
Ensemble Deployment to Excel with Average & Median Probability Models
Ensemble Deployment without Embedded Code
New Linking Functions
Improved Expression Tree Display
New Defaults for the Function Sets
Import Function Set
Import Derived Variables
Analysis of Simple Models
New Charts for Monitoring Evolution
New Stop Conditions
New Online Help System
And Much More…

The support for categorical data and missing values is one of the most important new additions to version 5. GeneXproTools now supports categorical variables and missing values, from data loading to code generation, improving considerably the efficiency and quality of the modeling process.

Next in importance is the new support for dataset management, with different dataset partitioning methods and sub-sampling schemes, including bagging and mini-batch mode which are essential for the generation of good ensembles and processing of large datasets.

Also extremely important is the new multifunctional Data Panel where you can analyze not only your input variables but also your derived variables and all your models. Now with the new Data Panel you can cross analyze all these different types of variables across different datasets.

The new Data Panel also supports record analysis and visualization, with a wide range of tools and analyses such as error analysis and record prototypes.

Also important in this new release is the full integration of Logistic Regression, with dedicated fitness functions and the generation of the logistic regression model code in all supported programming languages.

Also major in GeneXproTools 5.0 is the introduction of Favorite Statistics for all modeling categories, allowing you to select your models using the statistic of your choice, including the Area Under the ROC curve, Correlation Coefficient, RMSE, and many more.

Also major in version 5 is the support for ensemble deployment in data-only mode, which is important for the deployment of large ensembles or GEP forests.

There are other major improvements in version 5, such as new genetic operators and modeling strategies, new linking functions and evolvable rounding thresholds, 100+ new fitness functions and multinomial Classification & Logistic Regression, variable importance for all generated models and much more. The list of minor improvements is huge and below you'll also find a brief description of the most important of them.

Below is a more detailed description of the new features. You can also watch a video introducing some of the main features in this new release.

New Multifunctional Data Panel

The new Data Panel goes beyond a simple platform for analyzing and visualizing input data. Besides allowing the analysis and visualization of input variables, it now also allows the visualization and analysis of all your models and derived variables, giving you access to a rich web of analyses between all these types of variables and across different datasets.

The myriad of analyses supported by the new Data Panel includes scatter plots and regression analysis between all possible pairs of variables (input variables, models and derived variables); histograms and summary statistics for all variables and across different datasets; different line charts for outlier detection and model analysis; evaluation of the variable importance and display of the variable importance chart for all the models; and so on.

The new Data Panel also supports extensive record analyses, allowing you to study different types of records using different charts and browsing tools. For example, you can now compare each record with different record prototypes in order to gain insight into both your data and your models. By selecting just the outliers or the misclassified records to browse, you can now also do error analysis in the Data Panel.

Variable Charts and Analysis

Below is a gallery of the new charts of the Data Panel for variable visualization & analysis.

The Sequential Distribution Chart, with its standard deviation lines and average line, is ideal for detecting outliers. The Bivariate Line Chart, with its sorting options and Show Normalized option, is a very flexible and useful tool for comparing any two variables.

The Scatter Plot allows you to very quickly visualize the correlation between all possible pairs of variables, including input variables, models and derived variables.

The Histogram allows you to very quickly analyze the distribution of all your variables (input variables, models and derived variables), and compare them across different datasets.

The Statistics Charts, which include 9 different charts for 9 different summary statistics (min, max, average, median, standard deviation, R-square, correlation coefficient, slope and intercept), allow you to quickly visualize and compare summary statistics for all your variables (input variables, models and derived variables) and for all your datasets.

The Variable Importance Chart shows the importance of all the variables in your models, including the importance of derived variables.

The new highlighting functionality of the new Data Panel offers another dimension to data visualization and analysis. By highlighting certain types of records, such as misclassifications in Classification and Logistic Regression or model outliers in Regression and Time Series Prediction, you can now quickly understand and visualize new patterns in all your variables and models.

Record Charts and Analysis

Below is a gallery of different record analyses you can easily perform in the new Data Panel.

By combining all the Record Charts with different browsing options, GeneXproTools now allows you to perform different record analyses and computations, including error analysis, record prototyping and summary statistics across different variables (input variables, models and derived variables) and datasets.

Support for Categorical Variables and Missing Values

GeneXproTools now supports categorical variables and 50+ different types of missing values, replacing them automatically during data loading, but allowing you to choose different mappings both for the categories and missing values.

Moreover GeneXproTools extends the support for categorical and missing data throughout the modeling process, generating model code that also supports categorical variables and missing values, offering you a much more robust and convenient platform for model design and deployment. Below is an example in C++ of a classification model with both categorical variables and missing values.

							
//------------------------------------------------------------------------
// Classification model generated by GeneXproTools 5.0 on 6/8/2013
// GEP File: D:\GeneXproTools\Version5.0\CreditApproval_CM_01.gep
// Training Records:  460
// Validation Records:   230
// Fitness Function:  Bounded ROC, ROC Threshold
// Training Fitness:  912.951824483014
// Training Accuracy: 87.83% (404)
// Validation Fitness:   950.172009488309
// Validation Accuracy:  90.43% (208)
//------------------------------------------------------------------------

#include <math.h>
#include <stdlib.h>  // atof
#include <string.h>  // strcmp

int gepModel(char* d_string[]);
double gep3Rt(double x);
double gepMin2(double x, double y);
void TransformCategoricalInputs(char* input[], double output[]);

int gepModel(char* d_string[])
{
    const double ROUNDING_THRESHOLD = 2.02223099900018;

    const double G1C2 = 0.481887264625996;
    const double G1C6 = 3.80779442732017;

    double d[15];
    TransformCategoricalInputs(d_string, d);

    double dblTemp = 0.0;

    dblTemp = atan((((d[12]*(d[7]-d[13]))-((exp(G1C6)+d[3])/2.0))-G1C2));
    dblTemp += gep3Rt(pow(gepMin2(d[8],d[9]),3));
    dblTemp += atan((d[9]*gep3Rt(((pow(d[7],2)-(d[9]-d[14]))
               -pow((d[4]+d[4]),4)))));
    dblTemp += pow(d[8],2);

    return (dblTemp >= ROUNDING_THRESHOLD ? 1 : 0);
}

double gep3Rt(double x)
{
    return x < 0.0 ? -pow(-x,(1.0/3.0)) : pow(x,(1.0/3.0));
}

double gepMin2(double x, double y)
{
    double varTemp = x;
    if (varTemp > y)
        varTemp = y;
    return varTemp;
}

void TransformCategoricalInputs(char* input[], double output[])
{
    if(strcmp("l", input[3]) == 0)
        output[3] = 1.0;
    else if(strcmp("u", input[3]) == 0)
        output[3] = 2.0;
    else if(strcmp("y", input[3]) == 0)
        output[3] = 3.0;
    else if(strcmp("?", input[3]) == 0)
        output[3] = 2.0;
    else output[3] = 0.0;
    
    if(strcmp("g", input[4]) == 0)
        output[4] = 2.0;
    else if(strcmp("gg", input[4]) == 0)
        output[4] = 1.0;
    else if(strcmp("p", input[4]) == 0)
        output[4] = 3.0;
    else if(strcmp("?", input[4]) == 0)
        output[4] = 2.0;
    else output[4] = 0.0;
    
    output[7] = atof(input[7]);
    
    if(strcmp("f", input[8]) == 0)
        output[8] = 1.0;
    else if(strcmp("t", input[8]) == 0)
        output[8] = 2.0;
    else output[8] = 0.0;
    
    if(strcmp("f", input[9]) == 0)
        output[9] = 1.0;
    else if(strcmp("t", input[9]) == 0)
        output[9] = 2.0;
    else output[9] = 0.0;
    
    if(strcmp("g", input[12]) == 0)
        output[12] = 1.0;
    else if(strcmp("p", input[12]) == 0)
        output[12] = 2.0;
    else if(strcmp("s", input[12]) == 0)
        output[12] = 3.0;
    else output[12] = 0.0;
    
    if(strcmp("?", input[13]) == 0)
        output[13] = 0.0;
    else output[13] = atof(input[13]);
    
    output[14] = atof(input[14]);
}

Support for Multinomial Classification & Logistic Regression

GeneXproTools now provides a platform for handling multiple classes in Classification and Logistic Regression, allowing you to easily set up different sub-classification tasks for each class in the response variable, without having to prepare and load a different dataset for each Classification or Logistic Regression problem. Through the Class Merging & Discretization window you can single out the class of interest and then create models for each sub-task, keeping them under a single gep file or creating n different files for the n classes.
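The idea of singling out one class at a time can be sketched in Python as a one-versus-rest recoding of the response variable. This is only an illustration of the concept, not the Class Merging & Discretization window itself; the function names and the sample labels are made up.

```python
def single_out_class(labels, class_of_interest):
    """Map a multi-class target to 0/1: 1 for the chosen class, 0 for all others."""
    return [1 if label == class_of_interest else 0 for label in labels]

def one_vs_rest_targets(labels):
    """Build one binary target per distinct class, keyed by class label."""
    return {cls: single_out_class(labels, cls) for cls in sorted(set(labels))}

labels = ["setosa", "versicolor", "virginica", "setosa"]  # made-up example data
targets = one_vs_rest_targets(labels)
for cls, binary in targets.items():
    print(cls, binary)
```

Each of the n binary targets then defines one sub-task, which matches the "n models for n classes" arrangement described above.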

Support for Data Normalization

GeneXproTools now allows you to normalize your numerical variables, not only for purposes of visualization and analysis but also for model design. The normalization techniques supported by GeneXproTools include Standardization, 0/1 Normalization and Min/Max Normalization.
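As a minimal sketch of what such transformations do, the Python functions below implement standardization (subtract the average, divide by the standard deviation) and a generic linear rescaling onto a target range. The exact definitions GeneXproTools uses for 0/1 Normalization and Min/Max Normalization are not given here, so `rescale` and its `lo`/`hi` parameters are assumptions, as is the use of the population standard deviation.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score: subtract the average and divide by the standard deviation."""
    avg = mean(values)
    sd = pstdev(values)  # population form; an assumption, the tool may use the sample form
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - avg) / sd for v in values]

def rescale(values, lo=0.0, hi=1.0):
    """Linearly map values from [min, max] onto [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

data = [2.0, 4.0, 6.0, 8.0]  # made-up column
z = standardize(data)
unit = rescale(data)
print(z)
print(unit)
```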

GeneXproTools again creates model code for scoring and deployment that also supports data normalization. Below is an example in Python of a logistic regression model created using data standardized in the GeneXproTools environment.

							
#------------------------------------------------------------------------
# Logistic regression model generated by GeneXproTools 5.0 on 6/9/2013
# GEP File: D:\GeneXproTools\Version5.0\Diabetes-DN_01a.gep
# Training Records:  512
# Validation Records:   256
# Fitness Function:  Maximum Likelihood, Logistic Threshold
# Training Fitness:  616.228156623755
# Training Accuracy: 79.69% (408)
# Validation Fitness:   546.661612370381
# Validation Accuracy:  80.08% (205)
#------------------------------------------------------------------------

from math import *

def gepModel(d_string):

    ROUNDING_THRESHOLD = -1.77709077258152

    G2C8 = 2.57484664448988
    G3C0 = -6.54957731864376
    G3C9 = 2.25135044404431
    G3C7 = 5.69697519760735
    G4C2 = 8.97033600878933
    G4C7 = 5.79271828363903
    G4C4 = 1.07699819940794

    d = [0.0] * len(d_string)
    TransformCategoricalInputs(d_string, d)

    Standardize(d)

    varTemp = 0.0

    varTemp = (d[1]-(1.0-pow(d[1],3.0)))
    varTemp = varTemp + min((((pow(d[5],3.0)+d[1])/2.0)*G2C8),\
              (gep3Rt(d[4])-atan(d[2])))
    varTemp = varTemp + max(max((gep3Rt(G3C7)+d[6]),(G3C0*G3C9)),\
              ((G3C0+pow(d[5],3.0))/2.0))
    varTemp = varTemp + (((((pow(d[7],2.0)+d[1])/2.0)+(G4C2+G4C7))/2.0)\
              *min(d[7],(1.0-G4C4)))

    if (varTemp >= ROUNDING_THRESHOLD):
        return 1
    else:
        return 0

def gep3Rt(x):
    if (x < 0.0):
        return -pow(-x,(1.0/3.0))
    else:
        return pow(x,(1.0/3.0))

def TransformCategoricalInputs(inputList, outputList):
    
    outputList[1] = { 
        "?" : lambda : 123.065088757397,
    }.get(inputList[1], lambda : float(inputList[1]))()
    
    outputList[2] = { 
        "?" : lambda : 72.5246913580247,
    }.get(inputList[2], lambda : float(inputList[2]))()
    
    outputList[4] = { 
        "?" : lambda : 156.600790513834,
    }.get(inputList[4], lambda : float(inputList[4]))()
    
    outputList[5] = { 
        "?" : lambda : 32.7155069582505,
    }.get(inputList[5], lambda : float(inputList[5]))()
    
    outputList[6] = float(inputList[6])
    outputList[7] = float(inputList[7])
    

def Standardize(input):
    AVERAGE_1 = 123.065088757396
    STDEV_1 = 30.5208809235629
    input[1] = (input[1] - AVERAGE_1) / STDEV_1

    AVERAGE_2 = 72.5246913580247
    STDEV_2 = 11.6112169688815
    input[2] = (input[2] - AVERAGE_2) / STDEV_2

    AVERAGE_4 = 156.600790513834
    STDEV_4 = 83.1959782804369
    input[4] = (input[4] - AVERAGE_4) / STDEV_4

    AVERAGE_5 = 32.7155069582505
    STDEV_5 = 6.54802505977307
    input[5] = (input[5] - AVERAGE_5) / STDEV_5

    AVERAGE_6 = 0.470236328125
    STDEV_6 = 0.312962638849511
    input[6] = (input[6] - AVERAGE_6) / STDEV_6

    AVERAGE_7 = 33.478515625
    STDEV_7 = 12.0024266685208
    input[7] = (input[7] - AVERAGE_7) / STDEV_7

Dataset Partitioning and Sub-sampling

Now in version 5.0 you can manage your datasets inside GeneXproTools, as it provides essential tools for you to easily split your data into training and validation/test datasets. The sampling schemes supported by GeneXproTools include Odds/Evens splitting, choosing different Partitions and Random Shuffle (random sampling without replacement). For all modeling categories, GeneXproTools also provides useful defaults optimized for efficiency and good model generalization.
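Two of these schemes can be sketched in a few lines of Python. This is an illustration only: which half goes to training in the Odds/Evens split is an assumption, and the function names and the `seed` parameter are hypothetical.

```python
import random

def odds_evens_split(records):
    """Odd-numbered records (1st, 3rd, ...) go to training, even-numbered to validation."""
    training = records[0::2]
    validation = records[1::2]
    return training, validation

def random_shuffle_split(records, n_training, seed=None):
    """Shuffle a copy of the records (sampling without replacement), then cut."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_training], shuffled[n_training:]

records = list(range(1, 11))  # stand-in for 10 data rows
train_oe, valid_oe = odds_evens_split(records)
train_rs, valid_rs = random_shuffle_split(records, n_training=7, seed=42)
print(train_oe, valid_oe)
print(train_rs, valid_rs)
```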

Moreover GeneXproTools also supports sub-sampling of both the training and validation datasets, implementing several sub-sampling schemes, including bagging and mini-batch, which are, respectively, essential tools for creating random forests (more appropriately called GEP forests) and dealing with large datasets. The sub-sampling of the validation dataset also has multiple applications, including reserving part of the validation set for testing and calculating the cross-validation accuracy of your models.

The sub-sampling of both the training and validation datasets is done in the Settings Panel and includes: Top n, Bottom n, Top Half, Bottom Half, Odd Cases, Even Cases, Random, Shuffled, Balanced Random, and Balanced Shuffled. The Balanced schemes are available only for Classification, Logistic Regression and Logic Synthesis.
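To make two of these ideas concrete, here is a hedged Python sketch of a bagging-style sample (drawn with replacement, as bagging is usually defined) and a class-balanced random sample. How GeneXproTools implements its own schemes is not described here, so the function names and sampling details are assumptions.

```python
import random

def bagging_sample(records, n, rng):
    """Bootstrap sample: draw n records with replacement."""
    return [rng.choice(records) for _ in range(n)]

def balanced_random_sample(records, labels, n_per_class, rng):
    """Draw the same number of records from each class, without replacement."""
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)
    sample = []
    for label in sorted(by_class):
        members = by_class[label]
        sample.extend(rng.sample(members, min(n_per_class, len(members))))
    return sample

rng = random.Random(0)
records = list(range(10))                 # record indexes of a made-up dataset
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]   # 7 negatives, 3 positives
bag = bagging_sample(records, len(records), rng)
balanced = balanced_random_sample(records, labels, 3, rng)
print(bag)
print(balanced)
```

Drawing a fresh bag for each model is what gives the members of an ensemble their diversity, while a balanced sample keeps a rare class from being swamped by the majority class.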

Support for GEP Files as Data Source

We’ve now added support for loading data from gep files, allowing you to choose the training dataset, the validation/test dataset, or the original data, which in this context is defined as the union of the training and validation/test datasets. In every case, it’s the original raw data, with categorical variables and missing values if they exist, that is loaded into GeneXproTools.

Moreover, Time Series Prediction gep files can also be used as source data for Regression, Time Series Prediction, Classification and Logistic Regression runs, allowing you to easily create all kinds of gep runs using the transformed time series as input variables.

Summary Statistics

Now GeneXproTools provides summary statistics for all input variables, models and derived variables. The summary statistics are shown in the Statistics Report in the Data Panel. Moreover, GeneXproTools also shows the summary statistics (min, max, average, median, standard deviation, correlation coefficient, R-square, slope and intercept) graphically in the Statistics Charts.

GeneXproTools also evaluates summary statistics for the records, providing easy ways of comparing different types of records with record prototypes. For example, when you browse just the misclassified records, these statistics and visualization tools become essential for error analysis.

Outlier Detection & Removal

In the Data Panel, GeneXproTools now offers different tools for outlier detection, including the new Scatter Plot and the Sequential Distribution Chart, which now shows both the average line and standard deviation lines for easy detection of outliers.

Moreover, for the Sequential Distribution Chart, GeneXproTools also allows you to copy the indexes of all the outliers for the current variable by choosing Copy Outlier IDs (3 Sigma) in the context menu.

The outlier indexes can then be pasted directly into the Delete Records Window for the easy removal of all outliers.
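The 3-sigma rule behind this kind of outlier selection can be sketched in Python as follows. This is a generic illustration, not the tool's implementation: the 1-based indexing and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def three_sigma_outlier_ids(values, n_sigma=3.0):
    """Return 1-based indexes of values farther than n_sigma st. devs. from the average."""
    avg = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values, start=1) if abs(v - avg) > n_sigma * sd]

values = [10.0] * 20 + [100.0]  # made-up column with one extreme value
ids = three_sigma_outlier_ids(values)
print(ids)
```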

Regression Analysis

GeneXproTools allows you to quickly analyze and visualize the correlation between all possible combinations of variables in the Data Panel, including input variables, models and derived variables. Through the Scatter Plot and Statistics Report of the Data Panel you have access to the regression line and regression equation, the correlation coefficient and R-square for all pairs of variables.
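For reference, the quantities named above (regression line, correlation coefficient and R-square) can be computed for one pair of variables with the standard least-squares formulas. The Python sketch below uses made-up data and a hypothetical function name.

```python
from math import sqrt

def linear_fit(x, y):
    """Least-squares line y = slope*x + intercept, plus Pearson r and R-square."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r, r * r

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
slope, intercept, r, r2 = linear_fit(x, y)
print(slope, intercept, r, r2)
```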

Error Analysis

GeneXproTools now allows you to do error analysis in the Data Panel. By selecting different records to browse, including misclassified records and outliers, GeneXproTools allows you to analyze these records using different charts to compare them with other records and record prototypes.

Variable Importance

GeneXproTools now computes and shows the importance of all the variables for all your models. The Variable Importance Chart is available through the Statistics Charts in the Data Panel. By selecting Model Variables in the Data combobox, you can quickly access the variables of each model and their relative importance. The variable importance is computed for all the models GeneXproTools generates, including regression models, classification and logistic regression models, and time series prediction models.
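This section does not say how the importance values are computed. One widely used generic technique is permutation importance: shuffle one input variable at a time and measure how much the model's fit degrades. The Python sketch below illustrates that technique only; the R-square measure, the normalization to a sum of 1 and all names are assumptions, not a description of GeneXproTools internals.

```python
import random

def r_square(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    if saa == 0 or sbb == 0:
        return 0.0
    return (sab * sab) / (saa * sbb)

def permutation_importance(model, rows, target, seed=0):
    """Drop in R-square when each column is shuffled, normalized to sum to 1."""
    rng = random.Random(seed)
    baseline = r_square([model(row) for row in rows], target)
    drops = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled_rows = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        score = r_square([model(row) for row in shuffled_rows], target)
        drops.append(max(0.0, baseline - score))
    total = sum(drops)
    return [d / total if total > 0 else 0.0 for d in drops]

rows = [[float(i), float((i * 7) % 5)] for i in range(20)]  # made-up data
target = [2.0 * row[0] for row in rows]
model = lambda row: 2.0 * row[0]  # toy model that ignores the second variable
importance = permutation_importance(model, rows, target)
print(importance)
```

Because the toy model never reads the second variable, shuffling it changes nothing and all of the importance is assigned to the first variable.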

Residual Analysis

GeneXproTools now performs residual analysis for all regression models, including time series prediction models. The Residuals Plot is accessible both in the Run Panel and Results Panel.

New Logistic Regression Category

The Logistic Regression Framework is now implemented as an independent category, with its own set of fitness functions, visualization tools, and code generation.

Fitness Functions

There are a total of 60 new fitness functions for Logistic Regression, including ROC Measure, Rank Measure, Maximum Likelihood, Hinge Loss, Positive Correl, R-square, Dual Margin and F Measure, just to name a few. Most of these fitness functions are multi-objective and include different adjustable parameters such as the cost matrix, the number of bins, reference simple models, lower and upper bounds for the model output, parsimony pressure and variable pressure.

New Charts & Statistics

Now in the Run Panel, for up to 1000 records of the training data, you have access to 6 new charts for model design and visualization. These new charts include the Classification Tapestry, three Binomial Fit Charts with options for showing both the rounding threshold and misclassifications, the ROC Curve, and the sophisticated Classification Scatter Plot, another Gepsoft innovation. In the gallery below, the same logistic regression model is analyzed using the 6 model fitting charts of the Run Panel.

In the Results Panel GeneXproTools now shows the same model fitting charts it shows in the Run Panel, but now for all the records in the training/validation datasets. Moreover, for the Binomial Fit Charts, GeneXproTools allows you to plot the Probability[1] in addition to the Raw Model Output.

In the table of the Results Panel, GeneXproTools now shows the Probability[1] and the Most Likely Class in addition to the Raw Model Output.

Also important is the new set of statistics evaluated in the Results Panel, which now include the Classification Accuracy & Error, the Sensitivity & Specificity, the Positive Predictive Value & Negative Predictive Value, Recall & Precision, Correlation Coefficient & R-square, Jaccard Similarity, Matthews Correlation Coefficient, F1 Measure, and the Area Under the ROC Curve.
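Most of the statistics listed above derive from the four confusion-matrix counts. The Python sketch below uses the standard textbook definitions; whether GeneXproTools uses exactly these formulas (for example, for Jaccard Similarity) is an assumption, and the counts are made up.

```python
from math import sqrt

def classification_stats(tp, tn, fp, fn):
    """Common measures of fit from the four confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0

    total = tp + tn + fp + fn
    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)          # also called precision
    npv = ratio(tn, tn + fn)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, total),
        "error": ratio(fp + fn, total),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": ratio(2 * ppv * sensitivity, ppv + sensitivity),
        "jaccard": ratio(tp, tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

stats = classification_stats(tp=40, tn=45, fp=5, fn=10)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```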

This same set of statistics is available in the History Panel for model and ensemble selection through the Favorite Statistic Window.

Code Generation

Now GeneXproTools generates the complete code for the logistic regression model. Moreover, GeneXproTools now lets you choose the kind of output you are most interested in: the model code for predicting the Probability[1], the Most Likely Class or the Raw Model Output.
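The relationship between the three forms of output can be sketched in Python. The sketch assumes the probability is obtained by passing a linear transform of the raw model output through the logistic function; the `slope` and `intercept` values and the 0.5 cutoff are hypothetical, not values taken from GeneXproTools.

```python
from math import exp

def probability_of_one(raw_output, slope, intercept):
    """Logistic function applied to a linear transform of the raw model output."""
    z = slope * raw_output + intercept
    return 1.0 / (1.0 + exp(-z))

def most_likely_class(probability, cutoff=0.5):
    """Pick class 1 when its probability reaches the cutoff, else class 0."""
    return 1 if probability >= cutoff else 0

raw = 1.2                                               # Raw Model Output (made up)
p1 = probability_of_one(raw, slope=1.5, intercept=-0.3) # Probability[1]
label = most_likely_class(p1)                           # Most Likely Class
print(raw, p1, label)
```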

This new modality for code generation is available during model scoring in the Scoring Panel and also during Model & Ensemble Deployment to Excel.

Also important is that in the Data Panel, for purposes of model visualization and analysis, you can also choose any of these 3 forms of model output: the Probability[1], the Most Likely Class or the Raw Model Output.

Model Browsing in the Run Panel

We’ve now added model navigation in the Run Panel, so now you can go back and forth analyzing and visualizing your models, not just the model output through the use of different charts, but also information about the model structure and composition, such as model size and used variables.

New Charts for Model Visualization & Selection

We’ve now added 100+ new charts for model visualization & analysis and improved most of the old ones for all modeling categories. We've added new charts to the Run Panel, Results Panel and Data Panel of all modeling categories, namely Classification & Logistic Regression, Regression & Time Series Prediction and Logic Synthesis. The most important ones are highlighted below.

Classification & Logistic Regression

In the Run Panel and Results Panel for both Classification & Logistic Regression, you now have access to six different model fitting charts instead of just the Classification Tapestry, showing you different aspects of the evolving models, from the raw model output to the predicted class. You can now analyze and clearly see the distribution of model outputs relative to the rounding thresholds using three different Binomial Fit Charts and the new Classification Scatter Plot. The ROC Curve offers another widely used and useful dimension to the modeling process and the Classification Tapestry, which is new for Logistic Regression, offers a crisp visualization of the model output across the distribution of all possible outcomes (true positives, true negatives, false positives and false negatives).

In the Data Panel you now have access to additional tools for model selection and analysis, including the Variable Importance Chart and a total of nine different Statistics Charts (min, max, average, median, standard deviation, R-square, correlation coefficient, slope and intercept) in addition to all the other new charts of the new Data Panel for variable and record analysis, including Histograms, Scatter Plots and different Line Charts. By choosing History Models or Model Variables or All in the Data combobox in the Data Panel, you can now perform a myriad of analyses across different datasets and combinations of models (and model outputs) and variables.

Regression & Time Series Prediction

In the Run Panel and Results Panel we’ve added 3 new charts for model visualization and analysis: the Scatter Plot, the Residuals Plot, and the Stacked Distributions Chart.

The Scatter Plot, with the regression line and regression equation, offers a clear view of the model fit. The Residuals Plot shows the standard residual analysis for detecting unusual patterns in the distribution of the residuals. The Stacked Distributions Chart shows clearly the spread and overlap of the actual and predicted values.

These 3 new charts were also added to the Results Panel, allowing you to perform the same kind of model analysis for the training and validation/test datasets and also for different sub-samplings of the training and validation/test datasets.

In the Data Panel you now have access to additional tools for model selection and analysis, including the Variable Importance Chart and a total of nine different Statistics Charts (min, max, average, median, standard deviation, R-square, correlation coefficient, slope and intercept) in addition to all the other new charts of the new Data Panel for variable and record analysis, including Histograms, Scatter Plots and different Line Charts. By choosing History Models or Model Variables or All in the Data combobox in the Data Panel, you can now perform a myriad of analyses across different datasets and combinations of models and variables.

New Tools for Model Selection

GeneXproTools now features new tools to help you select your models from basically any panel, including:

Access to different Favorite Statistics for all categories;
Random sub-sampling that enables the application of bootstrap techniques to evaluate with more confidence the accuracy of your models or any other measure of fit;
Model navigation from all panels where models are being analyzed or visualized;
A new decentralized Delete Active Model functionality that allows you to delete any model you find lacking in some sense or another;
The new Rename All Models in the History Panel that allows you to reindex your models after sorting them using different statistics, so that you can analyze them in a particular order, allowing you to gain insights into their structure and performance;
New powerful tools for model selection in Time Series Prediction, namely going from Testing Mode to Prediction Mode and back and changing the number of Testing Predictions, allowing you to select your models by their performance in the testing data and then use the selected models to make predictions.
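The bootstrap idea mentioned in the list above can be illustrated with a short Python sketch: resample the records with replacement many times and look at the spread of the resulting accuracies. The function name, the number of resamples and the percentile interval are assumptions made for the example, and the data are made up.

```python
import random

def bootstrap_accuracy(actual, predicted, n_resamples=1000, seed=0):
    """Resample records with replacement and collect the accuracy of each resample."""
    rng = random.Random(seed)
    n = len(actual)
    hits = [1 if a == p else 0 for a, p in zip(actual, predicted)]
    accuracies = []
    for _ in range(n_resamples):
        sample = [hits[rng.randrange(n)] for _ in range(n)]
        accuracies.append(sum(sample) / n)
    accuracies.sort()
    lower = accuracies[int(0.025 * n_resamples)]      # approx. 2.5th percentile
    upper = accuracies[int(0.975 * n_resamples) - 1]  # approx. 97.5th percentile
    return sum(accuracies) / n_resamples, lower, upper

actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
mean_acc, lower, upper = bootstrap_accuracy(actual, predicted)
print(mean_acc, lower, upper)
```

A wide interval signals that the accuracy measured on a single sample should be treated with caution when comparing models.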

Introduction of Evolvable Rounding Thresholds in Classification

Now GeneXproTools implements the revolutionary idea pioneered by Candida Ferreira of different types of evolvable rounding thresholds for classification models. Now by combining different types of rounding thresholds with different fitness functions, you have access to a much richer solution space, allowing you to create even better classifiers. The new evolvable rounding thresholds GeneXproTools supports include:

Improved Results Panel in all Categories

We’ve improved the Results Panel in all categories, with more and better charts, a more extensive set of statistics, and a richer and more interactive table with different sorting and copying options. For example, the Results Panel for Classification and Logistic Regression now features 17 Measures of Fit, including the usual Classification Accuracy and Error, Correlation Coefficient and R-square, but also the Area Under the ROC Curve, Matthews Correlation Coefficient, Recall, Precision, F1 Measure, and others.

Favorite Statistics for all Categories

GeneXproTools allows you to use Favorite Statistics not only in the History Panel for model selection but also for ensemble management in the Deploy Ensemble to Excel Window.

The favorite statistics for Classification & Logistic Regression include:

For Regression & Time Series Prediction the favorite statistics include:

And for Logic Synthesis the favorite statistics include:

Improved History Panel

The biggest improvement in the History Panel is the implementation of Favorite Statistics in all categories. But other smaller improvements were also added, including:

Random sub-sampling that enables the application of bootstrap techniques to evaluate with more confidence the accuracy of your models or any measure of fit;
Extra functionality accessible both through the context menu and toolbar icons, such as Delete Active Model and Delete History;
A more comprehensive summary which now also computes the average of both the favorite statistic and fitness of all the models in the History;
A new feature for re-indexing all models (Rename All Models) that allows the analysis of models in a particular order;
The new Add Simple Models functionality that allows the analysis of all input and derived variables as simple models;
And new functionality for Updating the Rounding Thresholds in Classification and Logistic Regression.

100+ New Fitness Functions

We’ve added more than one hundred new fitness functions and improved old ones by combining them with a wider range of adjustable parameters and more sophisticated penalties for avoiding local optima and strong & mediocre attractors, such as models that classify everything as zero or one in classification and logistic regression.

Now the vast majority of GeneXproTools fitness functions combines multiple objectives, such as the use of different reference simple models in all categories; cost matrix in Classification, Logistic regression and Logic Synthesis; different types of rounding thresholds in Classification, including evolvable thresholds; lower and upper bounds for the model output; parsimony pressure and variable pressure; and many others.

Adjustable Parsimony Pressure for all Fitness Functions

Parsimony Pressure is now an adjustable parameter that you can fine tune between [0, 1] in order to apply pressure on the structural complexity of the evolving solutions. It’s available for all fitness functions, including custom fitness functions.
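
As a rough illustration of how an adjustable pressure in [0, 1] can act on fitness, the Python sketch below scales a raw fitness value down in proportion to the relative size of the program. The formula, the function name and the size limits are assumptions chosen for this example; they are not the formula GeneXproTools uses internally.

```python
def apply_parsimony_pressure(raw_fitness, size, min_size, max_size, pressure):
    """Penalize larger programs; illustrative formula, not the product's own."""
    if not 0.0 <= pressure <= 1.0:
        raise ValueError("pressure must lie in [0, 1]")
    if max_size == min_size:
        return raw_fitness  # every program has the same size, nothing to penalize
    # 0.0 for the smallest possible program, 1.0 for the largest.
    relative_size = (size - min_size) / (max_size - min_size)
    return raw_fitness * (1.0 - pressure * relative_size)

# Two hypothetical models with the same raw fitness but different sizes.
compact = apply_parsimony_pressure(900.0, size=15, min_size=5, max_size=105, pressure=0.1)
bulky = apply_parsimony_pressure(900.0, size=45, min_size=5, max_size=105, pressure=0.1)
print(compact, bulky)
```

With a pressure of 0 the raw fitness is returned unchanged, and with any positive pressure the more compact of two equally accurate models ranks higher.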

New and Adjustable Variable Pressure for all Fitness Functions

Now GeneXproTools also supports Variable Pressure for all fitness functions, including custom fitness functions. It’s an adjustable parameter between [0, 1], allowing you to control the blending of variables into your models.

We've also added a new Complexify Button in the Run Panel so that you can easily apply variable pressure to a particular solution.

More Parameters for the Custom Fitness Functions

For all modeling categories, we’ve increased the number of pre-computed parameters you can use to design your own Custom Fitness Functions, including very useful and widely used parameters readily accessible through the interface, such as the cost matrix, the evolvable rounding thresholds, model boundaries, and many others.

New Genetic Operators & Modeling Strategies

We’ve added 20 new genetic operators, covering a wide range of functionalities and behaviors that you can explore to design your own modeling strategies. They include genetic operators for finding the most effective range for the random constants (Constant Fine-Tuning & Constant Range Finding); genetic operators that change only certain structures or elements in the evolving models (Leaf Mutation, Biased Leaf Mutation, Conservative Mutation, Conservative Function Mutation, Conservative Permutation, Biased Mutation, Tail Inversion and Tail Mutation); and genetic operators that inject fresh blood in the population or increase the frequency of particular models (Random Chromosomes, Random Cloning and Best Cloning). With them you have more control over evolution and can push it in specific directions.

GeneXproTools ships with 4 built-in modeling strategies that cover some of the most common modeling needs: a strategy for fine-tuning the numerical constants of your models (Constant Fine-Tuning); a strategy for model fine-tuning, which makes only small changes so that the overall structure of the model remains basically unchanged (Model Fine-Tuning); a strategy for Sub-set Selection, which is ideal for creating good random forests, especially for datasets with many variables, as it works only with the elements that were randomly drawn for the initial population; and a strategy designed for Optimal Evolution, which tries to give a good blend of diversity versus stability in order to optimize and accelerate evolution.

Moreover, GeneXproTools also allows you to create your own modeling strategies by changing any of the built-in strategies to create a Custom Strategy.

More Variables & Unlimited Ensemble Size

We’ve extended the number of independent variables in all editions, now with a max of 20 variables for the Standard Edition, 100 for the Advanced and 1000 for the Professional.

We’ve also removed all restrictions on ensemble size in all GeneXproTools editions, so you can now deploy ensembles of any size to Excel.

New Tools for Creating Ensembles & Random Forests

We've introduced new tools for creating ensemble models or GEP forests, including new fitness functions, new genetic operators, subset selection strategies, favorite statistics, unlimited ensemble size for external ensemble deployment to Excel, ensemble deployment in data-only mode, average probability multi-model & median probability multi-model for Classification and Logistic Regression, different bagging schemes and different stop conditions.

New Programming Languages: R, Octave & Excel VBA

We've now added 3 new grammars to the already extensive list of supported programming languages for automatic model code generation. The R Language, Octave and Excel VBA are now part of the 19 built-in grammars of GeneXproTools (Ada, C, C++, C#, Excel VBA, Fortran, Java, JavaScript, Matlab, Octave, Pascal, Perl, PHP, Python, R, Visual Basic, VB.Net, Verilog, and VHDL) that allow you to export automatically all your models to any of these programming languages.

Improvements in Generated Model Code

We’ve changed the implementation of functions involving the ternary operator in order to avoid very long lines of code, implementing these functions as individual methods. We’ve also improved the grammars, declaring only the numerical constants and variables that are being used in the model, improving the clarity of the model code.

3 Different Forms of Model Output for Classification & Logistic Regression

We’ve now implemented throughout all GeneXproTools panels support for 3 different forms of model output for classification and logistic regression models, namely Probability[1], Most Likely Class and Raw Model Output.

Internally GeneXproTools gives you access to these 3 different forms of model output in the Data Panel, for both classification and logistic regression models.

In the Results Panel and for Logistic Regression, you also have access to the 3 different forms of model output. For Classification, however, GeneXproTools shows only the raw model output and the predicted class in the Results Panel. The probabilities for classification models are available in the Data Panel and through model scoring, either in the Scoring Panel or through model/ensemble deployment to Excel.

For scoring your models internally in GeneXproTools in the Scoring Panel, you can also choose any of these 3 forms of model output for classification and logistic regression models. This is also true for Model & Ensemble Deployment to Excel.

Finally, all the code generated by GeneXproTools in any of the supported programming languages for mathematical models (Ada, C, C++, C#, Excel VBA, Fortran, Java, JavaScript, Matlab, Octave, Pascal, Perl, PHP, Python, R, Visual Basic, and VB.Net) also implements the 3 different forms of model output: Probability[1], Most Likely Class and Raw Model Output, which you select directly in the Model Panel.
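
The relationship between the 3 forms of model output can be sketched in a few lines of Python. This is not code generated by GeneXproTools: it assumes the usual logistic mapping from raw output to probability, and the slope, intercept and threshold parameters are hypothetical values chosen for illustration.

```python
import math

def probability_of_one(raw_output, slope=1.0, intercept=0.0):
    """Map a raw model output to Probability[1] with a logistic function."""
    z = slope * raw_output + intercept
    # Numerically stable form, avoids overflow for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def most_likely_class(probability, threshold=0.5):
    """Round a probability to class 1 or class 0 at the given threshold."""
    return 1 if probability >= threshold else 0

raw = 1.2                      # Raw Model Output (hypothetical value)
p = probability_of_one(raw)    # Probability[1]
c = most_likely_class(p)       # Most Likely Class
print(raw, round(p, 4), c)
```

The same raw output can therefore be reported as a score, as a probability or as a class label, depending on what the downstream use needs.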

Ensemble Deployment to Excel with Average & Median Probability Models

Now for Logistic Regression and Classification, you can choose Probability[1] as the output for your models, both for ensemble and model deployment to Excel. For ensemble deployment we’ve added the new average probability model and median probability model, with the thresholds for the probabilities easily adjustable within Excel.
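
A minimal Python sketch of the two ensemble models follows: each model contributes its Probability[1], the ensemble takes the average or the median, and an adjustable threshold turns the result into a class. The function names and the example probabilities are assumptions made for illustration; the deployed Excel workbook is not reproduced here.

```python
from statistics import mean, median

def ensemble_probability(model_probabilities, method="average"):
    """Combine the probabilities of all models in the ensemble."""
    if method == "average":
        return mean(model_probabilities)
    if method == "median":
        return median(model_probabilities)
    raise ValueError("method must be 'average' or 'median'")

def ensemble_class(model_probabilities, method="average", threshold=0.5):
    """Apply an adjustable threshold to the combined probability."""
    combined = ensemble_probability(model_probabilities, method)
    return 1 if combined >= threshold else 0

# Probability[1] from five hypothetical models for a single record.
probs = [0.9, 0.6, 0.4, 0.45, 0.3]

for method in ("average", "median"):
    print(method,
          ensemble_probability(probs, method),
          ensemble_class(probs, method))
```

In this example one very confident model pulls the average above 0.5 while the median stays below it, so the two ensemble models disagree on the class. The median is less sensitive to a single extreme model.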

Ensemble Deployment without Embedded Code

Now GeneXproTools also supports ensemble and model deployment to Excel in data-only mode, that is, without embedding the model code. This is especially useful if you are dealing with large ensembles and large datasets, particularly in the ensemble evaluation phase when you need to quickly evaluate different candidate solutions.

New Linking Functions

We’ve now added 3 more linking functions: Avg2, Min and Max, all of which can be easily added to your own grammars. The Avg2 linking function works particularly well with time series data, producing better models by preventing autocorrelation, and therefore is now the new default in Time Series Prediction.

Improved Expression Tree Display

Now you can access the labels of all the variables in the expression trees with the help of tooltips. GeneXproTools also displays the definitions of all the functions and the values of the numerical constants in the tree representation of the model code.

New Defaults for the Function Sets

We created new defaults for the function sets of all categories in an attempt to create an even better starting point for as wide a range of problem domains as possible.

We’ve also added new functionality to the Functions Panel so that you can now access all these different function set defaults from different modeling categories.

Import Function Set

We’ve introduced a new feature for importing function sets from other gep files. This also includes importing the code of all custom functions designed for a particular run.

Import Derived Variables

GeneXproTools allows you to import the code of all the new derived variables you’ve created. So now you no longer have to copy and paste the code of your favorite derived variables or variable transformations from one run to another. You just have to write them once and easily access them through the Import Derived Variables menu.

Analysis of Simple Models

GeneXproTools now allows you to Add Simple Models to the History so that you can analyze and also compare them to your models. The simple models you can add automatically to the History include all the original variables in your data and all the derived variables you've created.

New Charts for Monitoring Evolution

In the Run Panel GeneXproTools now shows the histograms of the fitness values and program sizes for each generation of evolving models. This is particularly useful if you are designing your own fitness functions or creating your own modeling strategies by adjusting the rates of the genetic operators.

New Stop Conditions

We’ve introduced 6 new Stop Conditions, some of which are very useful for creating good ensembles or random forests, such as Random Generation Number, Random Fitness Threshold, Random R-square Threshold and Random Accuracy Threshold. With these stop conditions you choose the lower and upper limits for the number of generations, fitness, R-square or accuracy, so that the design process stops at different stages and you can create much more diverse ensembles of models.
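
The idea behind the random stop conditions can be sketched in Python as follows: for each run that contributes a model to the ensemble, a stop point is drawn between the lower and upper limits. The uniform draw, the function names and the limits used below are assumptions made for this example; how GeneXproTools actually draws the values is not specified here.

```python
import random

def draw_stop_generation(lower, upper, rng):
    """Pick the generation at which one run stops (both limits inclusive)."""
    return rng.randint(lower, upper)

def draw_stop_threshold(lower, upper, rng):
    """Pick a fitness, R-square or accuracy level at which one run stops."""
    return rng.uniform(lower, upper)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# One stop point per ensemble member (five hypothetical runs).
stop_generations = [draw_stop_generation(100, 1000, rng) for _ in range(5)]
accuracy_thresholds = [draw_stop_threshold(0.80, 0.95, rng) for _ in range(5)]

print(stop_generations)
print([round(t, 3) for t in accuracy_thresholds])
```

Because each run ends at a different point, the resulting models differ in maturity and structure, which is what makes the ensemble more diverse.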

New Online Help System

We’ve introduced a new Online Help System, with materials that we will keep updating and improving in response to users’ needs.

And Much More…

The list of smaller improvements and features added to GeneXproTools 5.0 is very large, but below you’ll find the most salient of them.

Improved Classification Tapestry
Improved All Curve Fitting Charts
Improved All Binomial Fitting Charts
Improved Logistic Regression Window Charts
Conversion of Logistic Regression Runs to Classification
Improved Change Seed Window
New Complexify Button in the Run Panel
New Modalities for the Report Panel
Improved Notes
New Code Editor for the Custom Fitness
Improved User Defined Functions Window
Improved Derived Variables Window
Extra Functionality in the Functions Panel
New Home Panel
New System for Identifying Modeling Categories
New Sample Runs
New Defaults for the Genetic Operators
Improved Evolutionary Dynamics Chart
Improved Average/Best Size Chart
Improved Sub-Program Sizes Chart
Improved All Sizes Chart & All Fitnesses Chart
Improved Navigation of All Charts
New Sorting Tools in the Results Panel
More Copy Options for the Results Tables
New Custom Fitness Functions Examples
New DDFs Examples
Improved Defaults

Improved Classification Tapestry

Now, if no models exist in the History, the new Classification Tapestry shows simply the distribution of Positives and Negatives in the response variable.

The new Classification Tapestry also uses another color scheme when the accuracy is below the 50% mark, which can happen not only for weak intermediate solutions but also for good inverted models when fitness functions that select for both positive and negative correlations are used. The R-square and Symmetric ROC fitness functions of Logistic Regression are examples of such fitness functions.

Improved All Curve Fitting Charts

All the Curve Fitting Charts of Regression and Time Series Prediction were improved with a better design and by adding extra functionality, such as zooming, showing labels, legend, titles, tooltips and gridlines.

Improved All Binomial Fitting Charts

The old Binomial Fitting Charts of Logistic Regression were improved with a better design and by adding extra functionality, of which the most important is plotting the rounding threshold and highlighting the misclassifications. The new ones (all three Binomial Fitting Charts of Classification and the brand new Binomial Fitting Chart Sorted by Target & Model) were also created with similar design and features.

Improved Logistic Regression Window Charts

The charts for Quantile Regression, Cutoff Points, Log Odds and Logistic Fit of the Logistic Regression Window were also improved and redesigned.

Conversion of Logistic Regression Runs to Classification

Now, thanks to the introduction of evolvable rounding thresholds, the conversion of Logistic Regression runs to Classification is done flawlessly for all the models in the Run History. By setting the rounding threshold to Logistic Thresholds, it is now possible to bring the unique logistic threshold of each logistic regression model to the Classification Platform.

Improved Change Seed Window

Now in the Change Seed Window you can copy & paste the entire set of numerical constants at once or copy & paste just the numerical constants of a particular gene.

New Complexify Button in the Run Panel

Variable Pressure was introduced in version 5.0 and we also added a Complexify Button to the Run Panel, giving you an easy way of controlling the blending of variables into your models.

New Modalities for the Report Panel

Now GeneXproTools allows you to choose between a Full Report and a Short Report. The Full Report shows a list of all the models in the Run History, whereas the Short Report lists only the active model. Choosing the shorter version is advisable for very large histories with hundreds or thousands of models.

Improved Notes

We’ve improved the Notes Editor of GeneXproTools with a larger and better window and also provided easier access to the Notes Editor through the Reports Menu.

New Code Editor for the Custom Fitness

We’ve improved the Code Editor for the Custom Fitness Function through a larger and better window and the use of different colors for JavaScript keywords, comments and constants.

Improved User Defined Functions Window

We’ve improved the Dynamic UDFs Window with a new combobox for the number of arguments and a new Code Editor that now allows the use of different colors for JavaScript keywords, comments and constants.

Improved Derived Variables Window

We’ve improved the UDFs Window, which is the window for generating new derived variables, with a new Code Editor that now highlights in different colors all JavaScript keywords, comments and constants. We’ve also added new functionality for copying the output of each test.

Extra Functionality in the Functions Panel

Now the Select All and Clear buttons work more intelligently, selecting/clearing only the functions that are visible.

We’ve also added a new feature that allows you to copy all the sub-sets of functions shown in the Functions Table, which is useful for designing and studying your own function sets.

New Home Panel

We’ve redesigned the Home Panel of GeneXproTools, giving you access to a larger list of the most recent runs. We’ve also added a list of useful links for getting help and support and easy access to the new discussion forum.

New System for Identifying Modeling Categories

Now in GeneXproTools 5.0 we color-code each modeling category to help in the quick identification of the modeling category of each run and to make everything more, well, colorful.

New Sample Runs

We’ve created new Sample Runs for this version so that you can easily explore all the new features of GeneXproTools 5.0, such as the support for categorical variables and missing values, the support for multiple classes in Classification and Logistic Regression, data normalization, sub-sampling, and so on.

For example, the Classification sample run Iris Plants has 3 classes (Iris Setosa, Iris Versicolor and Iris Virginica), allowing you to explore with ease how to handle multiple classes in Classification. The Satellite Images sample run plays a similar role for Logistic Regression, with 6 classes in this case. The Credit Approval sample run deals with standardized data. Some datasets of the new sample runs have missing values (Credit Approval, Breast Cancer, Diabetes and Fuel Consumption), while others use a mix of categorical and numerical variables (Credit Approval, Breast Cancer, Iris Plants, Loan Risk, Satellite Images and Emotiv EEG).

For all benchmark sample runs that deal with well-known datasets, we used the original datasets as provided by the donors so that you could compare and reproduce any other studies conducted using these datasets. To revert to the original dataset, just press the Original Button in the Dataset Partitioning Window.

New Defaults for the Genetic Operators

With so many new genetic operators we had to readjust their rates to make the best of evolution. Moreover, all genetic operators are now implemented probabilistically, so you can use very small rates and still have them take effect over extended periods of time. And finally, each of the new modeling strategies comes with specific defaults that you can access every time you choose a particular strategy.

Improved Evolutionary Dynamics Chart

The chart for monitoring the evolutionary dynamics in the Run Panel was improved with a better design and by adding support for zooming, labels, legend, titles and better tooltips.

Improved Average/Best Size Chart

The chart for visualizing the changes in Best Size and Average Size during evolution was improved with a better design and by adding support for zooming, labels, legend, titles and better tooltips.

Improved Sub-Program Sizes Chart

We’ve improved the Sub-Program Sizes Chart in the Run Panel with a better design and by adding extra functionality for labels and copying the chart data.

Improved All Sizes Chart & All Fitnesses Chart

We’ve also improved the All Sizes Chart and All Fitnesses Chart in the Run Panel with a better design and by adding support for labels and copying the chart data.

Improved Navigation of All Charts

We’ve now improved the navigation of all charts in the Run Panel and Results Panel through the use of arrows that work in concert with a particular combobox. So now it’s much easier to go back and forth between different charts both during evolution in the Run Panel and during model analysis in the Results Panel.

New Sorting Tools in the Results Panel

We’ve now added Record IDs to all Results Tables and extended the sorting options in all modeling categories.

More Copy Options for the Results Tables

We’ve extended the number of copying options for the Results Table in all modeling categories in order to include all the new additions to the Results Tables, such as Copy Probability[1] in Logistic Regression.

New Custom Fitness Functions Examples

Now the Custom Fitness Functions examples for Classification, Logistic Regression and Logic Synthesis are implementations of a simple fitness function based on the Classification Accuracy. For Regression and Time Series Prediction we now implement a fitness function based on the RMSE.
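
For readers who want to see the shape of such fitness functions, here is a minimal sketch of an accuracy-based and an RMSE-based fitness. Two caveats apply. Custom fitness functions in GeneXproTools are written in JavaScript, so Python is used here for illustration only. The 0 to 1000 fitness scale, the 1/(1 + RMSE) transformation and the function names are assumptions, not the code of the shipped examples.

```python
import math

MAX_FITNESS = 1000.0  # assumed fitness scale for this sketch

def accuracy_fitness(targets, predicted_classes):
    """Fitness proportional to the fraction of correctly classified records."""
    correct = sum(1 for t, p in zip(targets, predicted_classes) if t == p)
    return MAX_FITNESS * correct / len(targets)

def rmse_fitness(targets, outputs):
    """Fitness that grows as the root mean squared error shrinks."""
    mse = sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)
    return MAX_FITNESS / (1.0 + math.sqrt(mse))

# Hypothetical data: 3 of 4 records classified correctly.
print(accuracy_fitness([1, 0, 1, 1], [1, 0, 0, 1]))
# Hypothetical regression outputs compared with their targets.
print(rmse_fitness([2.0, 4.0, 6.0], [2.5, 3.5, 6.0]))
```

Both functions return higher values for better models, so either can be maximized directly by the evolutionary search.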

New DDFs Examples

We created new examples for the math and Boolean custom functions that are more useful and more interesting to explore.

Improved Defaults

We've improved a few defaults for things to run even more smoothly:

We've changed the Function Set defaults in all modeling categories in order to create even more accurate models and to work well with the new genetic operators;
In Time Series Prediction we've changed significantly the Function Set default in order to make the best of the new Avg2 linking function default;
Now all fitness functions with adjustable parameters come with preset defaults to help you make the most of them without the need for understanding their inner workings in all detail. These defaults reset every time you change the fitness function so that you always have a good starting point.

Release Date: May 28, 2013

Legacy: 4.3 | 4.0
