Getting Started with Time Series Prediction


This tutorial covers the fundamental steps in the creation of nonlinear forecasting models in the Time Series Prediction Framework of GeneXproTools. The successful creation of predictive models requires the use of very sharp modeling tools and in this tutorial we’ll cover the most important ones.

Loading Time Series Data: Time Series Transformation Engine; Changing the Embedding Dimension & Delay Time
Data Visualization Tools: Sequential Distribution Chart; Bivariate Line Chart; Histogram; Scatter Plot; Statistics Charts; Autocorrelation Function
Choosing the Fitness Function
Choosing the Function Set: Built-in Mathematical Functions; User Defined Functions
Creating Derived Features/Variables
Exploring the Learning Algorithms
Monitoring the Modeling Process
Model Selection: Favorite Statistics; Changing the Prediction Mode
Simplifying a Model
Model Evaluation & Testing: Comparing Actual & Predicted Values; Evaluating Performance; Variable Importance
Modeling from Seed Models
Adding a Neutral Gene
Complexity Increase Engine
Generating the Model Code: Visualizing Models as Expression Trees; Automatic Code Generation Using Built-in Grammars; Automatic Code Generation with User Defined Grammars
Making Predictions
Deployment to Excel: Model Deployment; Ensemble Deployment

Loading Time Series Data

GeneXproTools handles time series one at a time. So if you have multivariate time series you must create a different project or run for each one of them. Before modeling, each univariate time series is automatically transformed by GeneXproTools according to parameters specified by the user.

Time Series Transformation Engine

GeneXproTools includes a time series transformation engine that transforms the time series according to the Embedding Dimension and Delay Time. In addition, GeneXproTools also transforms the time series depending on the Prediction Mode – Prediction or Testing. The first mode, Prediction Mode, is used when you want to make predictions straightaway with your models; the second, Testing Mode, is used for backtesting the predictive accuracy of the evolved models on known past behavior. You must choose the Prediction Mode in the New Run Wizard when you load your time series, but you can change it later to a different mode in the Settings Panel.

Before evolving a model with GeneXproTools you must first load and transform the input time series for the learning algorithm. GeneXproTools allows you to work with databases/Excel, text files and GeneXproTools files. In all cases, though, the time series must be in a single column:

TimeSeriesName
101
82
66
35
31
7
20
92
154
125
85
68
38
23
10
24
83
132
131
118

Then GeneXproTools automatically transforms the time series according to your specifications (the Embedding Dimension, Delay Time, and Prediction Mode you chose for your models). For instance, with an Embedding Dimension of 5 and a Delay Time of 1, the small time series of 20 observations above is automatically transformed into the following training data, where t_0 is the response variable and t_1 through t_5 are the lagged predictor variables:

t_5 t_4 t_3 t_2 t_1 t_0
101 82 66 35 31 7
82 66 35 31 7 20
66 35 31 7 20 92
35 31 7 20 92 154
31 7 20 92 154 125
7 20 92 154 125 85
20 92 154 125 85 68
92 154 125 85 68 38
154 125 85 68 38 23
125 85 68 38 23 10
85 68 38 23 10 24
68 38 23 10 24 83
38 23 10 24 83 132
23 10 24 83 132 131
10 24 83 132 131 118

You can then observe the transformed time series on the Training Data window of the New Run Wizard before proceeding with the new run.
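
The transformation shown in the table above can be reproduced in a few lines of code. What follows is a minimal Python sketch, not GeneXproTools' own implementation: the function name `embed` is hypothetical, and the handling of a Delay Time greater than 1 (lagged values taken `delay` steps apart) is an assumption that the table does not confirm. Only the Delay Time of 1 case has been checked against the table.

```python
def embed(series, dimension, delay=1):
    """Turn a univariate series into rows [t_d, ..., t_1, t_0].

    Assumption: for delay > 1 the lagged values are taken `delay`
    steps apart; only delay == 1 is confirmed by the tutorial's table.
    """
    span = dimension * delay  # observations needed before the first target
    rows = []
    for end in range(span, len(series)):
        # Oldest lag first, most recent lag last, then the response t_0.
        lags = [series[end - k * delay] for k in range(dimension, 0, -1)]
        rows.append(lags + [series[end]])
    return rows


series = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125,
          85, 68, 38, 23, 10, 24, 83, 132, 131, 118]
training = embed(series, dimension=5, delay=1)
print(len(training))  # number of training records
print(training[0])    # first row: t_5 ... t_1, t_0
```

With 20 observations, an Embedding Dimension of 5 and a Delay Time of 1, the sketch produces 20 - 5 = 15 records, the same number of rows as the table.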

Changing the Embedding Dimension & Delay Time

The time series transformation engine of GeneXproTools runs every time you create a new run and every time you change the Embedding Dimension, the Delay Time, the Prediction Mode or the Number of Testing Predictions in the General Settings Tab.

In practice, the Embedding Dimension corresponds to the number of predictor variables after your time series has been transformed. The Delay Time t determines how the time series is processed, that is, continuously if t = 1 or at t intervals.

These two parameters, together with the size of the time series and the Prediction Mode, will determine the final number of training records after the transformation of the time series.

The time series is then ready to be used as training data by the learning algorithm and you just have to click the Start button in the Run Panel to create your models.

Data Visualization Tools

GeneXproTools allows you to visualize and analyze both the original and the transformed time series in the Data Panel using different charts and analyses.

Sequential Distribution Chart

The Sequential Distribution Chart, with the option to show the standard deviation lines and the average line, offers a simple and very effective way of detecting outliers and analyzing the distribution of values for all your variables.

Bivariate Line Chart

The Bivariate Line Chart is a very powerful and flexible tool that allows very useful comparisons of any pair of different variables. With the Bivariate Line Chart you can select any two variables and then plot them in order or sorted in different ways. This chart also allows you to scale your variables so that you can compare them in a more meaningful way.

Histogram

With the Histogram you can visualize very quickly the distribution of values of all your variables. GeneXproTools allows you to browse easily from one variable to the other and also change the number of bins in your histograms.

Scatter Plot

Like the Bivariate Line Charts, Scatter Plots also allow the comparison of any pair of different variables which are easily selected using up-downs both for the X-axis and Y-axis.

Scatter Plots are powerful analytic tools for showing the correlation between two variables, especially when the regression line and the regression equation with its slope and intercept are also shown.

Statistics Charts

The Statistics Charts offer a simple and clear way of analyzing the summary statistics of all your variables: minimum, maximum, average, median, and standard deviation. Moreover, for all variables, GeneXproTools also plots the slope, intercept, correlation coefficient and R-square, all evaluated against the response variable.

These statistics are also shown on the Statistics Report, but the Statistics Charts aggregate them all together by statistic so that you can visualize the summary statistics of all your variables quickly with just a glance.

Autocorrelation Function

The Autocorrelation Function measures the correlation of a signal with itself shifted by some time delay. GeneXproTools plots the Autocorrelation Function in the Statistics Charts in the Data Panel. With its help you can find the right Embedding Dimension and Delay Time for your time series data so that you can create the best possible forecasting models.
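
To make the idea of "correlation of a signal with itself shifted by some time delay" concrete, here is a small Python sketch of the usual sample autocorrelation estimator. The exact estimator GeneXproTools plots is not stated in this tutorial, so the formula below (lagged covariance divided by the lag-0 sum of squares) is an assumption, and `autocorrelation` is a hypothetical name.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (max_lag < len(series))."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    if variance == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    acf = []
    for lag in range(max_lag + 1):
        # Products of each deviation with the deviation `lag` steps earlier.
        cov = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
        acf.append(cov / variance)
    return acf


series = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125,
          85, 68, 38, 23, 10, 24, 83, 132, 131, 118]
for lag, value in enumerate(autocorrelation(series, 12)):
    print(lag, round(value, 3))
```

Lags at which the autocorrelation is clearly different from zero are reasonable candidates to try when choosing the Embedding Dimension and Delay Time; this is a heuristic, not a rule taken from the product.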

Choosing the Fitness Function

For Time Series Prediction problems, in the Fitness Function Tab of the Settings Panel you have access to a total of 49 built-in fitness functions, most of which combine multiple objectives, such as the use of different reference simple models, lower and upper bounds for the model output, parsimony pressure, variable pressure, and many more.

Additionally, you can also design your own custom fitness functions and explore the solution space with them. By clicking the Edit Custom Fitness button, the Custom Fitness Editor is opened and there you can write the code of your fitness function in JavaScript.

The fitness function you choose will most probably depend on the cost function or error measure you are most familiar with. There is nothing wrong with this, as all of them can drive an efficient evolution, but you might want to try different fitness functions because they travel the fitness landscape differently: some pursue their goal very directly, while others take less travelled paths, which can considerably enhance the search process. Having different fitness functions in your modeling toolbox is also essential if you want to combine your forecasting models in more powerful ensembles.
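
As an illustration of what an error-based fitness function does, the sketch below turns a root mean squared error into a fitness value to be maximized. It is written in Python only to show the arithmetic; custom fitness functions in GeneXproTools are written in JavaScript in the Custom Fitness Editor, and this tutorial does not specify their signature. The mapping `max_fitness / (1 + error)` and the names `rmse` and `fitness_from_error` are assumptions made for the example.

```python
import math


def rmse(targets, outputs):
    """Root mean squared error between actual and predicted values."""
    squared = [(o - t) ** 2 for t, o in zip(targets, outputs)]
    return math.sqrt(sum(squared) / len(squared))


def fitness_from_error(error, max_fitness=1000.0):
    # Assumed mapping: zero error gives max_fitness, and the fitness
    # approaches zero as the error grows.
    return max_fitness / (1.0 + error)


targets = [7, 20, 92, 154]
outputs = [10, 18, 95, 150]
print(fitness_from_error(rmse(targets, outputs)))
```

Swapping `rmse` for another error measure changes how candidate models are ranked, which is one way different fitness functions can lead the search along different paths.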

Choosing the Function Set

GeneXproTools allows you to choose your function set from a total of 279 built-in mathematical functions and an unlimited number of custom functions, designed using the JavaScript language in the GeneXproTools environment.

Built-in Mathematical Functions

GeneXproTools offers a total of 279 built-in mathematical functions, including 186 different if then else rules, that can be used to design both linear and nonlinear time series prediction models. This wide range of mathematical functions allows the evolution of highly sophisticated and accurate models, easily built with the most appropriate functions. You can find the description of all the 279 built-in mathematical functions available in GeneXproTools, including their representation in the Online Knowledge Base.

The Function Selection Tools of GeneXproTools can help you in the selection of different function sets very quickly through the combination of the Show options with the Random, Default, Clear, and Select All buttons plus the Add/Reduce Weight buttons in the Functions Panel.

User Defined Functions

Despite the great diversity of GeneXproTools built-in mathematical functions, some users sometimes want to model with different ones. GeneXproTools gives the user the possibility of creating custom functions (called Dynamic UDFs and represented as DDFs in the generated code) in order to evolve models with them. Note, however, that the use of custom functions is computationally demanding and slows the evolutionary process considerably, so they should be used in moderation.

By selecting the Functions Tab in the Functions Panel, you have full access to all the available functions, including all the functions you've designed and all the built-in math functions. It's also here in the Functions Panel that you add your custom functions (Dynamic UDFs or DDFs) to your modeling toolbox through the Dynamic UDFs frame.

To add a custom function to your function set, just check the checkbox on the Select/Weight column and select the appropriate weight for the function (the weight determines the probability of each function being drawn during mutation and other random events in the creation/modification of programs). By default, the weight of each newly added function is 1, but you can increase the probability of a function being included in your models by increasing its weight in the Select/Weight column. GeneXproTools automatically balances your function set with the number of independent variables in your data, therefore you just have to select the set of functions for your problem and then choose their relative proportions by choosing their weights.
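
The effect of the weights can be pictured as a weighted random draw. The Python sketch below assumes the simplest reading of the paragraph above, namely that a function's chance of being drawn is its weight divided by the sum of all weights. It ignores the automatic balancing against the number of independent variables, whose details are not given here, and the function set shown is purely hypothetical.

```python
import random


def selection_probabilities(weights):
    """Map each function name to weight / total weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def draw_function(weights, rng):
    """Draw one function name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical function set: multiplication carries twice the default weight.
weights = {"+": 1, "-": 1, "*": 2, "sqrt": 1}
probabilities = selection_probabilities(weights)
print(probabilities)
print(draw_function(weights, random.Random(0)))
```

Under this reading, raising the weight of `*` from 1 to 2 raises its share of draws from 1/4 to 2/5.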

To create a new custom function, just click the Add button on the Dynamic UDFs frame and the DDF Editor appears. You can also edit old functions through the Edit button or remove them altogether from your modeling toolbox by clicking the Remove button.

By choosing the number of arguments (minimum is 1 and maximum is 4) in the Arguments combobox, the function header appears in the code window. Then you just have to write the body of the function in the code editor. The code must be in JavaScript and can be conveniently tested for compiling errors by clicking the Test button.

In the Definition box, you can write a brief description of the function for your future reference. The text you write there will appear in the Definition column in the Functions Panel.

Dynamic UDFs are extremely powerful and interesting tools as they are treated exactly like the built-in functions of GeneXproTools and therefore can be used to model all kinds of relationships not only between the original variables but also between derived features created on the fly by the learning algorithm. For instance, you can design a DDF so that it will model the log of the sum of four expressions, that is, DDF = log((expression 1) + (expression 2) + (expression 3) + (expression 4)), where the value of each expression will depend on the context of the DDF in the program.

Creating Derived Features/Variables

Derived variables or new features such as moving averages can be easily created in GeneXproTools. They are created in the Functions Panel in the Static UDFs Tab.

Historically, derived variables were called UDFs or User Defined Functions and in GeneXproTools they are represented as UDF0, UDF1, UDF2, and so on. Note however that UDFs are in fact new features derived from the original variables in the training and test data. Like DDFs, they are implemented in JavaScript using the UDF Editor of GeneXproTools.

These user defined features are then used by the learning algorithm exactly as the original features, that is, they are incorporated into the evolving models adaptively, with the most important being chosen and selected according to how much they contribute to the performance of each model.
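
A moving average is a typical derived feature. The sketch below shows the arithmetic in Python for one training record; in GeneXproTools itself the feature would be written in JavaScript in the UDF Editor, whose function signature is not described in this tutorial. The name `moving_average` and the ordering of the lagged values (oldest first) are assumptions for the example.

```python
def moving_average(lags, window):
    """Average of the `window` most recent lagged values.

    `lags` is ordered oldest to newest, e.g. [t_5, t_4, t_3, t_2, t_1].
    """
    if not 1 <= window <= len(lags):
        raise ValueError("window must be between 1 and the number of lags")
    recent = lags[-window:]
    return sum(recent) / window


# Predictors of the first training record from the example series.
row = [101, 82, 66, 35, 31]
udf0 = moving_average(row, 3)  # average of t_3, t_2 and t_1
print(udf0)
```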

Exploring the Learning Algorithms

GeneXproTools uses two different learning algorithms for Time Series Prediction problems. The first – the basic gene expression algorithm or simply Gene Expression Programming (GEP) – does not support the direct manipulation of random numerical constants, whereas the second – GEP with Random Numerical Constants or GEP-RNC for short – implements a structure for handling them directly. These two algorithms search the solution landscape differently and therefore it might be a good idea to try them both on your problems. For example, GEP-RNC models are usually more compact than models generated without random numerical constants.

The kinds of models these algorithms produce are quite different and, even if both of them perform equally well on the problem at hand, you might still prefer one over the other. There are cases, however, where numerical constants are crucial for efficient modeling and, therefore, the GEP-RNC algorithm is the default in GeneXproTools. You activate this algorithm in the Settings Panel -> Numerical Constants by checking the Use Random Numerical Constants checkbox. In the Numerical Constants Tab you can also adjust the range and type of the constants as well as the number of constants per gene.

The GEP-RNC algorithm is slightly more complex than the basic gene expression algorithm as it uses an additional gene domain (Dc) for encoding the random numerical constants. Consequently, this algorithm includes an additional set of genetic operators (RNC Mutation, Constant Fine-Tuning, Constant Range Finding, Constant Insertion, Dc Mutation, Dc Inversion, Dc IS Transposition, and Dc Permutation) especially developed for handling random numerical constants (if you are not familiar with these operators, please use the default Optimal Evolution Strategy by selecting Optimal Evolution in the Strategy combobox as it works very well in all cases; or you can learn more about the genetic operators in the Legacy Knowledge Base).

Monitoring the Modeling Process

While the model is being created by the learning algorithm, you can evaluate and visualize the actual design process through the real-time monitoring of different model fitting charts and statistics in the Run Panel, such as different curve fitting charts, the scatter plot, the residuals plot, the correlation coefficient, the R-square and the fitness. Both the correlation coefficient and R-square measure the correlation between the model output and the target (the actual values of the dependent variable).

In the Run Panel of the Time Series Prediction Framework of GeneXproTools you have access to a total of 15 different charts not only for visualizing model design but also for monitoring evolution itself.

Curve Fitting Chart
The Curve Fitting Chart plots the target and the model output of the first 1000 data points of the training set and shows how well the evolving models are fitting the target. The Curve Fitting Chart can be invoked any time during model design by selecting Curve Fitting in the rightmost combobox at the bottom.
Target Sorted Fitting Chart
This chart plots the target and the model output of the first 1000 training data points sorted by target and shows how well the evolving models are fitting the target. The Target Sorted Fitting Chart can be invoked any time during model design by selecting Target Sorted Fitting in the rightmost combobox at the bottom.
Model Sorted Fitting Chart
This chart plots the target and the model output of the first 1000 training data points sorted by model output and shows how well the evolving models are fitting the target. The Model Sorted Fitting Chart can be invoked any time during model design by selecting Model Sorted Fitting in the rightmost combobox at the bottom.
Stacked Distributions Chart
The Stacked Distributions Chart shows how the target and model outputs cover the entire range of target and model outputs by plotting both distributions in parallel stacked scatter plots with dummy random points in the Y-axis and shows clearly the spread and overlap of the actual and predicted values. The Stacked Distributions Chart can be invoked any time during model design by selecting Stacked Distributions in the rightmost combobox at the bottom.
Scatter Plot
The Scatter Plot, with the regression line and regression equation, shows the correlation between the target and model output. GeneXproTools also shows the model R-square in the Scatter Plot which measures the percentage of variance in the target explained by the model. The Scatter Plot can be invoked any time during model design by selecting Scatter Plot in the rightmost combobox at the bottom.
Residuals Plot
The Residuals Plot shows the correlation between the residuals and model output and consists of the standard residual analysis for detecting unusual patterns in the distribution of the residuals. The Residuals Plot can be invoked any time during model design by selecting Residuals Plot in the rightmost combobox at the bottom.
Variables Usage Map
The Variables Usage Map shows not only the variables that are being used by the best-of-generation models but also their count. Both during evolution and after a run, you can place the cursor over each square to see the variable ID and label and the number of times the variable appears in the current model. In addition, through the context menu you can change the appearance of the Variables Usage Map by choosing Heat Map, Random Colors, or Monochromatic.
Evolutionary Dynamics Chart
The Evolutionary Dynamics Chart shows the average fitness of the population plus the fitness and R-square of the best-of-generation model for periods of 1000 generations at a time, refreshing each time 1000 generations go by.
Average/Best Size Chart
The Average/Best Size Chart compares the average program size of all the models in the population with the program size of the best-of-generation model for periods of 1000 generations at a time, refreshing each time 1000 generations go by. This chart is especially useful during simplification and can be activated any time during evolution by selecting Avg/Best Size in the rightmost combobox at the bottom.
Sub-Program Sizes Chart
The Sub-Program Sizes Chart shows the sizes of all the sub-programs of the best-of-generation model. Not only after a run but also during evolution, by placing the cursor over each bar or by choosing Show Labels in the context menu, you can access the size of all sub-programs in your model. The Sub-Program Sizes Chart can be activated any time during model design by selecting Sub-Program Sizes in the leftmost combobox at the bottom.
Program Size Chart
The Program Size Chart shows the size of the best-of-generation model. Not only after a run but also during evolution, by placing the cursor over the horizontal bar you can access the size of the best model.
Size Distribution Chart
The Size Distribution Chart shows the histogram of the program sizes for each generation of evolving models. This is particularly useful if you are designing your own fitness functions or creating your own modeling strategies by adjusting the rates of the genetic operators. Not only after a run but also during evolution, by placing the cursor over each bar or by choosing Show Labels in the context menu, you can access the frequency of all bins in the histogram. The Size Distribution Chart can be activated any time during evolution by selecting Size Distribution in the leftmost combobox at the bottom.
Fitness Distribution Chart
The Fitness Distribution Chart shows the histogram of the fitness values for each generation of evolving models. This is particularly useful if you are designing your own fitness functions or creating your own modeling strategies by adjusting the rates of the genetic operators. Not only after a run but also during evolution, by placing the cursor over each bar or by choosing Show Labels in the context menu, you can access the frequency of all bins in the histogram. The Fitness Distribution Chart can be activated any time during evolution by selecting Fitness Distribution in the leftmost combobox at the bottom.
All Sizes Chart
The All Sizes Chart shows the sizes of all the models in the population. Not only after a run but also during evolution, by placing the cursor over each bar or by choosing Show Labels in the context menu, you can access the size of a particular model. The best-of-generation model always occupies the first position so you can also easily see how it fares relative to the others. The All Sizes Chart can be activated any time during evolution by selecting All Sizes in the leftmost combobox at the bottom.
All Fitnesses Chart
The All Fitnesses Chart shows the fitness of all the models in the population. Not only after a run but also during evolution, by placing the cursor over each bar or by choosing Show Labels in the context menu, you can access the fitness of a particular model. The best-of-generation model always occupies the first position so you can also easily see how it fares relative to the others. The All Fitnesses Chart can be activated any time during evolution by selecting All Fitnesses in the leftmost combobox at the bottom.

The evolutionary process can be stopped whenever you are satisfied with the results by clicking the Stop button or you can use one of the stop conditions of GeneXproTools for stopping the design process exactly when you see fit.

When the evolutionary process stops, the best-of-run model is ready either for analysis or for making predictions. And if you are still not happy with the results, you can continue the fine-tuning of your model by clicking any of the optimization buttons GeneXproTools provides: Continue, Simplify and Complexify. You can repeat this process for as long as you see fit or until you are completely satisfied with your model.

Model Selection

GeneXproTools saves all the best-of-generation models designed during a run and you can select any of them for analysis in any of the panels with model navigation (Run Panel, Predictions Panel, Data Panel and Model Panel). In addition, in the History Panel GeneXproTools lists all your models, allowing you to select any of the models in the History by checking the model you are interested in.

Favorite Statistics
GeneXproTools allows you to use Favorite Statistics not only in the History Panel for model selection but also for ensemble management in the Deploy Ensemble to Excel Window.

The favorite statistics for Time Series Prediction include:

For example, by using your favorite statistic in the History Panel you can then sort all your models by your favorite statistic either in the training or testing data. Then by reindexing your models through the Rename All Models functionality, you can analyze them in a particular order in any of the panels with model navigation so that you can gain insight into their structure and performance.

Changing the Prediction Mode
GeneXproTools allows you to go from Testing Mode to Prediction Mode and the other way around and to change the number of Testing Predictions without deleting the models in the run History. This is a powerful modeling tool as it allows you to select your models by their performance in the testing data and then use the selected models for forecasting. The images below show a model created in Testing Mode to forecast product sales. The first image shows the testing results obtained for past known sales and the second shows the model forecast.

Simplifying a Model

GeneXproTools allows you to simplify an existing model (either created with GeneXproTools or with another modeling technology) either by clicking the Simplify button on the Run Panel or by turning on the Parsimony Pressure in the Fitness Function Tab and then clicking Continue or Simplify in the Run Panel. GeneXproTools allows you to adjust the parsimony pressure you exert on the size of the evolving models, but bigger models can always appear during evolution if the gain in fitness trumps the smaller size.

For models created outside GeneXproTools or for GeneXproTools models modified by the user in the Change Seed Window, the starting model is fed to the learning algorithm through the Change Seed Window where both the fitness and structural soundness of the model are tested.

Then, in the Run Panel, by clicking the Simplify button, an evolutionary process starts in which all the subsequent models will be descendants of the model you want to simplify. Keep in mind, however, that the simplification algorithms GeneXproTools uses are evolutionary in nature and models continue to be selected primarily by fitness. This means that their complexity might even increase temporarily if the gain in fitness outweighs the loss in simplicity.

For models created in the GeneXproTools environment, you just have to select the model you want to simplify (either the best-of-run or an intermediate model) and then click the Simplify button and let the algorithm create better descendants not only in terms of fitness but also in terms of size.

Model Evaluation & Testing

While the time series prediction model is being created by the learning algorithm, you can evaluate and visualize the actual design process through the real-time monitoring of different model fitting charts and statistics in the Run Panel. Then in the Predictions Panel you can further evaluate your model using different charts and analyses.

Comparing Actual & Predicted Values

GeneXproTools offers two different ways of analyzing and comparing the output of your model with the actual or target values both for the training and testing data.

In the first, the Target or actual values are listed in a Table side by side with the predicted values or Model output. In the second, the target and predicted values are plotted in a Curve Fitting Chart for easy visualization. And both table and chart are shown simultaneously in the Predictions Panel.

If you are testing the predictive accuracy of your model using past known observations (backtesting), the predictions and the corresponding actual values are listed at the end of the table, highlighted in blue, and are also plotted in the Curve Fitting Chart whenever All is selected in the Chart options.

In addition, these testing predictions can also be observed separately in a different Curve Fitting Chart by selecting Testing in the Chart options. Note, however, that when Training or All is selected in the Chart options, the measures of fit shown in the Statistics Report refer to the training data; when Testing is selected, the measures of fit refer obviously to the testing data.
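
Backtesting of this kind holds back the most recent observations and asks the model to predict them. The next subsection calls this recursive testing, which suggests that each prediction is fed back as a lagged input for the following one. The Python sketch below illustrates that idea under this assumption, with a Delay Time of 1; `recursive_forecast`, `backtest` and `toy_model` are hypothetical names, and the toy model merely stands in for an evolved model.

```python
def recursive_forecast(model, history, dimension, steps):
    """Predict `steps` values, feeding each prediction back as a lag."""
    values = list(history)
    predictions = []
    for _ in range(steps):
        lags = values[-dimension:]  # [t_d, ..., t_1], oldest first
        prediction = model(lags)
        predictions.append(prediction)
        values.append(prediction)   # the prediction becomes the newest lag
    return predictions


def backtest(model, series, dimension, n_test):
    """Hold back the last n_test observations and pair them with forecasts."""
    if not 0 < n_test < len(series) - dimension + 1:
        raise ValueError("n_test leaves too few observations for the lags")
    known, actual = series[:-n_test], series[-n_test:]
    predicted = recursive_forecast(model, known, dimension, n_test)
    return list(zip(actual, predicted))


def toy_model(lags):
    # Stand-in for an evolved model: straight-line extrapolation.
    return 2 * lags[-1] - lags[-2]


print(backtest(toy_model, [1, 2, 3, 4, 5, 6], dimension=2, n_test=2))
```

Because errors in early predictions propagate into later ones, recursive testing is usually a stricter check than predicting each held-back value from actual past observations.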

Evaluating Performance

GeneXproTools allows a quick and easy assessment of a wide range of statistics for measuring the goodness of fit. Most of these measures of fit are immediately computed and shown in the Statistics Report every time you go to the Predictions Panel (for instance, mean squared error, root mean squared error, mean absolute error, relative squared error, root relative squared error, relative absolute error, R-square, correlation coefficient and fitness).

If you are backtesting the predictive accuracy of your model using past known observations, you can evaluate the same set of statistics (mean squared error, root mean squared error, mean absolute error, relative squared error, root relative squared error, relative absolute error, R-square, correlation coefficient and fitness) for the recursive testing by selecting Testing in the Chart options.

These and other measures of fit (Up/Down Accuracy, Up/Down Error, Up/Down Hits, and Up/Down Errors) are also shown in the Report Panel for the active model, but you must evaluate them first in the Predictions Panel.
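
For reference, the measures of fit named above can be computed as in the following Python sketch. It uses the standard textbook definitions, which are assumed (not confirmed by this tutorial) to match those in the Statistics Report; in particular, R-square is taken here to be the square of the Pearson correlation coefficient. The function name `measures_of_fit` is hypothetical, and the sketch assumes that neither the targets nor the outputs are constant.

```python
import math


def measures_of_fit(targets, outputs):
    """Standard error measures between actual values and model outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    pairs = list(zip(targets, outputs))
    sq_err = sum((o - t) ** 2 for t, o in pairs)
    abs_err = sum(abs(o - t) for t, o in pairs)
    sq_dev = sum((t - mean_t) ** 2 for t in targets)   # error of the mean model
    abs_dev = sum(abs(t - mean_t) for t in targets)
    cov = sum((t - mean_t) * (o - mean_o) for t, o in pairs)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    r = cov / math.sqrt(sq_dev * var_o)
    return {
        "MSE": sq_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        "RSE": sq_err / sq_dev,
        "RRSE": math.sqrt(sq_err / sq_dev),
        "RAE": abs_err / abs_dev,
        "Correlation": r,
        "R-square": r * r,  # assumption: squared Pearson correlation
    }


report = measures_of_fit([1, 2, 3, 4], [1, 2, 3, 5])
for name, value in report.items():
    print(name, round(value, 4))
```

The relative measures (RSE, RRSE, RAE) compare the model with a simple predictor that always outputs the mean of the target, so values below 1 indicate that the model does better than that baseline.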

Variable Importance

GeneXproTools uses a sophisticated stochastic method to compute the variable importance of all the variables in a model. For all forecasting models the importance of each model variable is computed by randomizing its input values and then computing the decrease in the R-square between the model output and the target. The results for all variables are then normalized so that they add up to 1.

GeneXproTools evaluates the variable importance of all the variables (original and derived) in a model and shows the results in the Statistics Report in the Data Panel. The variable importance is also shown graphically in the Variable Importance Chart in the Data Panel.

The Variable Importance Chart is available through the Statistics Charts in the Data Panel. By selecting Model Variables in the Variables combobox, you can quickly access the variables of each model and quickly visualize their relative importance in a chart.
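
The randomization method described in this section can be sketched as follows. This is a simplified Python illustration, not the method as implemented in GeneXproTools: it shuffles each variable once, takes R-square to be the squared Pearson correlation, and clips negative decreases to zero, all of which are assumptions. The names `r_square`, `variable_importance` and `toy_model` are hypothetical.

```python
import random


def r_square(targets, outputs):
    """Squared Pearson correlation (0.0 if either series is constant)."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    if var_t == 0 or var_o == 0:
        return 0.0
    return cov * cov / (var_t * var_o)


def variable_importance(model, rows, targets, rng):
    """Normalized decrease in R-square after shuffling each variable."""
    baseline = r_square(targets, [model(row) for row in rows])
    decreases = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # randomize the values of variable j only
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        score = r_square(targets, [model(row) for row in shuffled])
        decreases.append(max(0.0, baseline - score))  # assumption: clip at 0
    total = sum(decreases)
    if total == 0:
        return [0.0] * len(decreases)
    return [d / total for d in decreases]  # importances add up to 1


def toy_model(row):
    # Stand-in for an evolved model; it ignores the second variable.
    return 3 * row[0] + row[2]


rows = [[float(i), float((i * 7) % 5), float(i % 3)] for i in range(12)]
targets = [toy_model(row) for row in rows]
importance = variable_importance(toy_model, rows, targets, random.Random(0))
print([round(v, 3) for v in importance])
```

In this toy setup the unused second variable receives an importance of zero, since shuffling it cannot change the model output.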

Modeling from Seed Models

GeneXproTools allows the use of an existing model (either generated by GeneXproTools or by another modeling tool) as the starting point of an evolutionary process in order to create better models.

For models created outside GeneXproTools or for GeneXproTools models modified by the user in the Change Seed Window, the starting model or seed is fed to the algorithm through the Change Seed Window where both the fitness and structural soundness of the model are tested. The Change Seed Window accepts Karva Code only, so you must translate your model into Karva notation first in order to explore it in GeneXproTools.

Then, in the Run Panel, by clicking any of the optimization buttons GeneXproTools provides (Continue, Simplify and Complexify), an evolutionary process starts in which all the subsequent models will be descendants of the seed you introduced. Note, however, that if your seed has a very small fitness, you risk losing it early in the run as better models could be randomly created by GeneXproTools, leaving your seed behind. If your seed has zero fitness, though, you will receive a warning, allowing you to modify your seed until it becomes a viable seed capable of breeding new models.

For models created in the GeneXproTools environment, the seed (the active model) is fed automatically to the algorithm every time you click Continue, Simplify or Complexify in the Run Panel.

Adding a Neutral Gene

The addition of a neutral gene to a program (in mathematical terms, it’s like adding zero or multiplying by one) might seem at first sight the wrong thing to do as we are usually interested in creating accurate and parsimonious models. But one should look at this as modeling in progress as this allows us to tackle a complex problem incrementally.

Indeed, being able to introduce extra terms into your evolving programs is a powerful modeling tool and GeneXproTools allows you to do that by selecting Add Neutral Gene in the Model menu or through the Change Seed Window.

When you click the Add Neutral Gene button in the Change Seed Window, you will see a neutral gene being added to your model (for example, zero encoded as -.d0.d0, that is, d0 minus d0). By doing this, you are giving the learning algorithm more room to play and, hopefully, a better, more complex program will evolve.
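To see why a neutral gene leaves the output of a model unchanged, consider the minimal Python sketch below. It assumes a multigenic model whose genes are linked by addition. The gene functions and the names `model_output` and `neutral_gene` are illustrative only and are not GeneXproTools code.

```python
def gene_1(d):
    # Illustrative gene: a weighted sum of two lagged inputs.
    return d[0] * 0.5 + d[1]


def gene_2(d):
    # Illustrative gene: the square of one lagged input.
    return d[0] * d[0]


def neutral_gene(d):
    # The Karva code -.d0.d0 reads as d0 - d0, which is always zero.
    return d[0] - d[0]


def model_output(genes, d):
    # Assumed linking function: the gene outputs are added together.
    return sum(gene(d) for gene in genes)


inputs = [1.5, 2.0]
before = model_output([gene_1, gene_2], inputs)
after = model_output([gene_1, gene_2, neutral_gene], inputs)
print(before, after)  # both values are identical
```

The extra gene contributes nothing at first, but it gives evolution a new region to modify in later generations.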

Neutral genes can also be introduced automatically as part of a modeling strategy when you turn on the Complexity Increase Engine of GeneXproTools, which is the topic of the next section.

Complexity Increase Engine

GeneXproTools also allows you to introduce neutral genes automatically during a run by activating the Complexity Increase Engine in the Settings Panel -> General Settings Tab.

Whenever you are using the Complexity Increase Engine of GeneXproTools, you must fill in the Generations Without Change box to set the period of time you think acceptable for evolution to occur without improvement in best fitness, after which a mass extinction takes place and a neutral gene (an extra neutral term) is automatically added to your models. The Number of Tries corresponds to the number of consecutive evolutionary epochs (each defined by the parameter Generations Without Change) you will allow before a neutral gene is introduced in all evolving models. In the Max Complexity box you enter the maximum number of terms (genes) you’ll allow in your models; no further terms are introduced beyond this threshold during the run.
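The interplay of the three parameters can be sketched in Python as follows. This is only one possible reading of the description above, not the actual GeneXproTools implementation. The class `ComplexityIncreaseState` and its `update` method are hypothetical, and the sketch assumes that any improvement in best fitness resets both counters.

```python
from dataclasses import dataclass


@dataclass
class ComplexityIncreaseState:
    generations_without_change: int  # stagnant generations that make up one epoch
    number_of_tries: int             # consecutive stagnant epochs allowed
    max_complexity: int              # maximum number of genes (terms)
    num_genes: int                   # current number of genes in the models
    stagnant_generations: int = 0
    tries: int = 0

    def update(self, improved: bool) -> bool:
        """Record one generation; return True when a neutral gene is added."""
        if improved:
            # Assumption: an improvement in best fitness resets both counters.
            self.stagnant_generations = 0
            self.tries = 0
            return False
        self.stagnant_generations += 1
        if self.stagnant_generations < self.generations_without_change:
            return False
        # One full epoch has passed without improvement.
        self.stagnant_generations = 0
        self.tries += 1
        if self.tries < self.number_of_tries:
            return False
        self.tries = 0
        if self.num_genes >= self.max_complexity:
            return False  # complexity threshold reached, add nothing
        self.num_genes += 1
        return True


state = ComplexityIncreaseState(
    generations_without_change=3, number_of_tries=2,
    max_complexity=3, num_genes=2,
)
for generation in range(1, 13):
    if state.update(improved=False):
        print(f"generation {generation}: neutral gene added, "
              f"genes = {state.num_genes}")
```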

The Complexity Increase Engine of GeneXproTools is a very powerful modeling tool, especially for time series forecasting models where good models often require the careful blending of extra terms, but you must be careful not to create excessively complex models as a greater complexity does not necessarily imply greater predictive accuracy.

Generating the Model Code

In the Model Panel you can see and analyze the model code not only in the programming language of your choice but also as a diagram representation or expression tree. GeneXproTools includes 17 built-in programming languages or grammars for Time Series Prediction. These grammars allow you to generate code automatically in some of the most popular programming languages around, namely Ada, C, C++, C#, Excel VBA, Fortran, Java, JavaScript, Matlab, Octave, Pascal, Perl, PHP, Python, R, Visual Basic, and VB.Net. But more importantly, GeneXproTools also allows you to add your own programming languages through user-defined grammars, which can be easily created using one of the built-in grammars as a template.

Visualizing Models as Expression Trees

GeneXproTools includes a parse tree generator that automatically converts the native Karva code of your models into diagram representations or expression trees, allowing a quicker and more complete understanding of their mathematical intricacies. By placing the cursor over each node of the expression tree, you have access to the label of each variable and its index, the value of each numerical constant and the definition of each function.
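The breadth-first reading of Karva code can be illustrated with a small Python sketch. It assumes dot-separated symbols, a function set of four binary arithmetic operators, and terminals that are variables such as d0 and d1 (no numerical constants). The names `karva_to_tree` and `evaluate` are illustrative and are not part of GeneXproTools.

```python
import operator

# Assumed function set with arities; real runs may use many more functions.
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def karva_to_tree(k_expression):
    """Decode a dot-separated K-expression into a (symbol, children) tree."""
    symbols = k_expression.split(".")
    root = (symbols[0], [])
    queue = [root]   # nodes still waiting for their children, in reading order
    next_symbol = 1
    while queue:
        symbol, children = queue.pop(0)
        arity = FUNCTIONS[symbol][0] if symbol in FUNCTIONS else 0
        for _ in range(arity):
            child = (symbols[next_symbol], [])
            next_symbol += 1
            children.append(child)
            queue.append(child)
    # Symbols left unread belong to the non-coding region and are ignored.
    return root


def evaluate(node, variables):
    """Evaluate a decoded tree for a dictionary of variable values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        args = [evaluate(child, variables) for child in children]
        return FUNCTIONS[symbol][1](*args)
    return variables[symbol]  # terminal such as "d0"


tree = karva_to_tree("+.*.d1.d0.d0")   # decodes to (d0 * d0) + d1
print(tree)
print(evaluate(tree, {"d0": 3.0, "d1": 2.0}))
```

The root is the first symbol, and each function takes the next unread symbols as its arguments, level by level.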

Automatic Code Generation Using Built-in Grammars

GeneXproTools supports a total of 17 built-in programming languages so that the models evolved by GeneXproTools in its native Karva code can be automatically translated into some of the most commonly used programming languages (Ada, C, C++, C#, Excel VBA, Fortran, Java, JavaScript, Matlab, Pascal, Perl, PHP, Python, R, Octave, Visual Basic and VB.Net). This code can then be used in other applications to deploy the forecasting model.

Automatic Code Generation with User Defined Grammars

GeneXproTools allows the design of User Defined Grammars so that the models evolved by GeneXproTools in its native Karva code can be automatically translated into the programming language that you need. Indeed, if you need to generate model code in a programming language other than the 17 built-in programming languages of GeneXproTools available for Time Series Prediction (Ada, C, C++, C#, Excel VBA, Fortran, Java, JavaScript, Matlab, Pascal, Perl, PHP, Python, R, Octave, Visual Basic and VB.Net), you can easily create your own grammars to generate code automatically in as many languages as you need.

As an illustration, the C++ grammar of GeneXproTools is shown here. Other grammars may be easily created using this or other GeneXproTools built-in grammars as reference.

Making Predictions

The algorithms that GeneXproTools implements for Time Series Prediction allow not only the design of forecasting models to explain past events but also the immediate utilization of these models to make predictions either straightaway in the GeneXproTools environment or elsewhere using the generated code, which as we saw in the previous section is available in a total of 17 different programming languages.

GeneXproTools models for Time Series Prediction allow you to make two kinds of predictions: one for backtesting past known observations and another to forecast future events. In both cases, though, predictions are made recursively, by evaluating the forecast at t+1, then using it to forecast t+2, and so on.
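Recursive prediction can be sketched in a few lines of Python. The sketch assumes a delay time of one and a model that takes the most recent `window_size` values as input. The function `recursive_forecast`, the parameter `window_size`, and `toy_model` (a stand-in for an evolved model) are all hypothetical names.

```python
def recursive_forecast(model, history, window_size, steps):
    """Forecast `steps` values, feeding each prediction back in as an input."""
    window = list(history[-window_size:])
    predictions = []
    for _ in range(steps):
        next_value = model(window)
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the new forecast.
        window = window[1:] + [next_value]
    return predictions


def toy_model(d):
    # Stand-in for an evolved model: simple linear extrapolation.
    return 2 * d[-1] - d[-2]


series = [1.0, 2.0, 3.0, 4.0]
print(recursive_forecast(toy_model, series, window_size=2, steps=3))
```

Because each forecast becomes an input for the next one, errors can accumulate as the horizon grows.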

GeneXproTools requires you to choose one of these methods while loading your time series, as this imposes some constraints on the transformation of the time series for training. However, you can change the Prediction Mode and also the number of testing records in the GeneXproTools modeling environment, in the General Settings Tab.

The first prediction mode – Testing Mode – can be used for backtesting as it allows you to test the forecasting accuracy of your models on a set of known past observations.
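The idea behind backtesting can be sketched in Python as follows. The last records of the series are held out, forecast recursively from the remaining data, and then compared with the known values. The function `backtest`, its parameters, and the use of root mean squared error as the accuracy measure are assumptions made for illustration, and `toy_model` is a stand-in for an evolved model.

```python
import math


def backtest(model, series, window_size, n_test):
    """Hold out the last n_test records and forecast them recursively."""
    train, actual = series[:-n_test], series[-n_test:]
    window = list(train[-window_size:])
    predicted = []
    for _ in range(n_test):
        value = model(window)
        predicted.append(value)
        window = window[1:] + [value]  # predictions feed later predictions
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(squared_errors) / n_test)
    return predicted, rmse


def toy_model(d):
    # Stand-in for an evolved model: simple linear extrapolation.
    return 2 * d[-1] - d[-2]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 8.0]
predicted, rmse = backtest(toy_model, series, window_size=2, n_test=2)
print(predicted, rmse)
```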

The second mode – Prediction Mode – is used to forecast future events, and GeneXproTools allows you to venture into the future as far as you see fit. For that, in the Predictions Panel, you set the number of predictions in the Predictions box and then click the Predict button. As in Testing Mode, GeneXproTools allows you to plot the predictions together with the results obtained for the training data or to display them separately in a different chart. The first method is particularly useful for visualizing the overall trend and you access it by selecting All in the Chart combobox.

Deployment to Excel

GeneXproTools allows you to automatically deploy to Excel any model you create without intervention from IT. Thanks to the special-purpose Excel VBA grammar of GeneXproTools the complete code of your models becomes readily accessible without a need for a software developer. Note that for this feature to work you have to enable it in Excel. See the step-by-step instructions at this link.

Model Deployment

When you deploy a model to Excel, the Excel workbook includes not only the results obtained for the training data but also the recursive predictions made either for backtesting past observations or for forecasting future events, depending on whether you are in Testing Mode or Prediction Mode.

Ensemble Deployment

In addition to the deployment of individual models to Excel, you can also deploy model ensembles to Excel, combining them in different ways. For time series prediction models, the model ensemble in the Excel workbook includes the average multi-model (Average Model) and the median multi-model (Median Model).
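The two multi-models can be illustrated with a short Python sketch that combines the forecasts of several models step by step. The function name `ensemble_predictions` and the sample numbers are made up for illustration and do not come from GeneXproTools.

```python
from statistics import mean, median


def ensemble_predictions(model_predictions):
    """Combine per-model forecasts into average and median series."""
    # Regroup the values so that each tuple holds one time step.
    steps = list(zip(*model_predictions))
    average_model = [mean(values) for values in steps]
    median_model = [median(values) for values in steps]
    return average_model, median_model


# One row per model, one column per forecast step (made-up values).
predictions = [
    [10.0, 11.0, 12.0],
    [12.0, 11.5, 13.0],
    [11.0, 14.0, 12.5],
]
average_model, median_model = ensemble_predictions(predictions)
print(average_model)
print(median_model)
```

The median is less sensitive than the average to a single model that strays far from the others, as the second step of the sample data shows.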

Last modified: November 9, 2013

Cite this as:

Ferreira, C. "Getting Started with Time Series Prediction." From GeneXproTools Tutorials – A Gepsoft Web Resource.
https://www.gepsoft.com/tutorials/GettingStartedWithTimeSeriesPrediction.htm
