Getting Started with Time Series Prediction
This tutorial covers the fundamental steps in the creation of
nonlinear forecasting models in the Time Series Prediction
Framework of GeneXproTools. The successful creation of predictive models
requires the use of very sharp modeling tools and in this tutorial
we’ll cover the most important ones.
Loading Time Series Data
GeneXproTools handles time series one at a time.
So if you have multivariate time series you must create
a different project or run for each one of them. Before modeling,
each univariate time series is automatically transformed by GeneXproTools
according to parameters specified by the user.
Time Series Transformation Engine
GeneXproTools includes a time series transformation engine that transforms
the time series according to the Embedding Dimension and Delay Time.
In addition, GeneXproTools also transforms the time series depending on the
Prediction Mode – Prediction or Testing. The first mode,
Prediction Mode,
is used when you want to make predictions straightaway with your models;
the second, Testing Mode, is used for backtesting the predictive accuracy
of the evolved models on known past behavior. You must choose the
Prediction Mode in the New Run Wizard when you load your time series,
but you can change it later to a different mode in the
Settings Panel.
Before evolving a model with GeneXproTools you must first load
and transform the input time series for the learning algorithm.
GeneXproTools allows you to work
with databases/Excel, text
files and GeneXproTools files. In
all cases, though, the time series must be in a single
column:
TimeSeriesName
101
82
66
35
31
7
20
92
154
125
85
68
38
23
10
24
83
132
131
118
GeneXproTools then automatically transforms the time series
according to your specifications (the Embedding Dimension,
Delay Time, and Prediction Mode you chose for your models). For
instance, with an Embedding Dimension of 5 and a Delay Time of 1,
the small time series of 20 observations above is automatically
transformed into the following training data, where t_0 is the response variable
and t_1 through t_5 are the lagged predictor variables:
t_5 t_4 t_3 t_2 t_1 t_0
101 82 66 35 31 7
82 66 35 31 7 20
66 35 31 7 20 92
35 31 7 20 92 154
31 7 20 92 154 125
7 20 92 154 125 85
20 92 154 125 85 68
92 154 125 85 68 38
154 125 85 68 38 23
125 85 68 38 23 10
85 68 38 23 10 24
68 38 23 10 24 83
38 23 10 24 83 132
23 10 24 83 132 131
10 24 83 132 131 118
You can then observe the transformed
time series on the Training Data window
of the New Run Wizard before proceeding with the new
run.
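The sliding-window transformation described above can be sketched in a few lines of JavaScript. This is only an illustration, not GeneXproTools code: the function name `embedSeries` and its parameters are hypothetical. For a Delay Time greater than 1 the sketch assumes the delay is the spacing between successive lagged values, which this tutorial does not spell out; only the Delay Time of 1 case is checked against the table above.

```javascript
// Sketch of a sliding-window (delay embedding) transformation.
// `embedSeries` and its parameter names are hypothetical, not a GeneXproTools API.
function embedSeries(series, embeddingDimension, delayTime = 1) {
  // Distance between the oldest lag (t_d) and the response (t_0).
  // Assumption: delayTime is the spacing between successive lags.
  const span = embeddingDimension * delayTime;
  const records = [];
  for (let start = 0; start + span < series.length; start++) {
    const row = [];
    // Oldest lag first (t_d ... t_1), response t_0 last.
    for (let k = 0; k <= embeddingDimension; k++) {
      row.push(series[start + k * delayTime]);
    }
    records.push(row);
  }
  return records;
}

const series = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125,
                85, 68, 38, 23, 10, 24, 83, 132, 131, 118];
const training = embedSeries(series, 5, 1);
console.log(training.length);          // number of training records: 15
console.log(training[0].join(" "));    // first record: 101 82 66 35 31 7
console.log(training[14].join(" "));   // last record: 10 24 83 132 131 118
```

With 20 observations, an Embedding Dimension of 5 and a Delay Time of 1, the sketch yields the same 15 records as the table.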
Changing the Embedding Dimension & Delay Time
The time series transformation engine of GeneXproTools runs every time you create a new run and every time you change the
Embedding Dimension, the Delay Time, the
Prediction Mode or the Number of Testing Predictions in the General Settings Tab.
In practice, the Embedding Dimension corresponds to the number of
predictor variables after your time series has been transformed. The
Delay Time
t determines how
the time series is processed, that is, continuously if
t = 1 or at t intervals.
These two parameters, together with the size of the time series and
the Prediction Mode, will determine the final
number of training
records after the transformation of the time series.
The time series is then ready to be
used as training data by the
learning algorithm and you just have to click the Start button
in the Run Panel to create your models.
Data Visualization Tools
GeneXproTools allows you to
visualize and analyze both the original and the
transformed time series in the
Data Panel using different
charts and analyses.
Sequential Distribution Chart
The Sequential Distribution Chart, with the option to show the
standard deviation lines and
the average line, offers a simple and very effective way of
detecting outliers and analyzing the
distribution of values for all your variables.
Bivariate Line Chart
The Bivariate Line Chart is a very powerful and flexible tool that allows very useful
comparisons of any pair of different variables. With the Bivariate Line Chart you can select
any two variables and then plot them in order or sorted in different ways.
This chart also
allows you to scale your variables so that you can compare them in a
more meaningful way.
Histogram
With the Histogram you can visualize very quickly the distribution
of values of all your variables.
GeneXproTools allows you to browse easily from one variable to the other and also change
the number of bins in your histograms.
Scatter Plot
Like the Bivariate Line Charts, Scatter Plots also allow the comparison of any
pair of
different variables which are easily selected using up-downs
both for the X-axis and Y-axis.
Scatter Plots are powerful analytic tools for showing the
correlation between
two variables, especially when the regression line and the
regression equation
with its slope and intercept are also shown.
Statistics Charts
The Statistics Charts offer a simple and clear way of analyzing the
summary statistics
of all your variables: minimum, maximum,
average, median, and standard deviation. Moreover,
for all variables, GeneXproTools also plots the
slope, intercept, correlation coefficient and
R-square,
all evaluated against the response variable.
These statistics are also shown on the Statistics Report, but the Statistics Charts
aggregate them all together by statistic so that you can visualize the summary statistics
of all your variables quickly with just a glance.
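For readers who want to reproduce the slope, intercept, correlation coefficient and R-square outside the tool, the following sketch uses the standard least-squares formulas. The function name `linearStats` is hypothetical, and treating `x` as the predictor and `y` as the response variable is an assumption; the numbers are made up for illustration.

```javascript
// Least-squares slope and intercept of y against x, plus the Pearson
// correlation coefficient and its square (R-square).
// `linearStats` is a hypothetical helper, not part of GeneXproTools.
function linearStats(x, y) {
  const n = x.length;
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / n;
  const meanX = mean(x);
  const meanY = mean(y);
  let sxx = 0, syy = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    sxx += (x[i] - meanX) ** 2;
    syy += (y[i] - meanY) ** 2;
    sxy += (x[i] - meanX) * (y[i] - meanY);
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  const r = sxy / Math.sqrt(sxx * syy);
  return { slope, intercept, r, rSquare: r * r };
}

// Made-up data lying exactly on the line y = 2x + 1.
const stats = linearStats([1, 2, 3, 4], [3, 5, 7, 9]);
console.log(stats.slope, stats.intercept, stats.r, stats.rSquare);
```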
Autocorrelation Function
The Autocorrelation Function measures the correlation of a signal with itself
shifted by some time delay. GeneXproTools plots the Autocorrelation Function
in the Statistics Charts in the Data Panel. With its help you can find the
right Embedding Dimension and Delay Time for your time series
data so that you can create the best possible forecasting models.
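As a rough illustration of what the Autocorrelation Function computes, the sketch below uses the usual sample autocorrelation estimator: the lagged covariance divided by the lag-0 variance. The function name `autocorrelation` is hypothetical and the normalization GeneXproTools uses may differ in detail.

```javascript
// Sample autocorrelation of a series at a given lag.
// `autocorrelation` is a hypothetical helper for illustration only.
function autocorrelation(series, lag) {
  const n = series.length;
  const mean = series.reduce((sum, v) => sum + v, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (let i = 0; i < n; i++) {
    denominator += (series[i] - mean) ** 2;
    if (i + lag < n) {
      numerator += (series[i] - mean) * (series[i + lag] - mean);
    }
  }
  return numerator / denominator;
}

// An alternating signal is negatively correlated with itself at lag 1
// and positively correlated at lag 2.
const alternating = [1, -1, 1, -1, 1, -1];
for (let lag = 0; lag <= 2; lag++) {
  console.log(lag, autocorrelation(alternating, lag));
}
```

Plotting these values against the lag gives a view of how strongly each past value relates to the current one.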
Choosing the Fitness Function
For Time Series Prediction problems, in the Fitness Function Tab of the Settings Panel you have access to
a total of 49
built-in fitness
functions, most of which combine multiple objectives, such as the use
of different reference simple models, lower and upper bounds for
the model output, parsimony pressure, variable pressure, and many more.
Additionally, you can also design your own
custom fitness functions and explore the solution space with
them.
By clicking the Edit Custom Fitness button, the Custom
Fitness
Editor is opened and there you can write the code of
your fitness function in JavaScript.
The fitness function you choose will most probably depend on the
cost function
or error measure you are most familiar with. There is nothing wrong with this,
as all of them can drive an efficient
evolution, but you might want to try different
fitness functions because they explore the
fitness landscape differently: some
pursue their objective very directly,
while others take less travelled paths,
which can considerably enhance the search process. Having
different fitness functions in your modeling toolbox is also
essential if you want to combine your forecasting models in more powerful ensembles.
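To make the link between an error measure and a fitness value concrete, here is a minimal standalone sketch of an error-based fitness. It is not the signature the Custom Fitness Editor expects, which this tutorial does not show: the function names and the scaling to a maximum of 1000 are assumptions chosen for illustration.

```javascript
// Root mean squared error between target and model output.
function rmse(target, model) {
  let sumSquares = 0;
  for (let i = 0; i < target.length; i++) {
    sumSquares += (target[i] - model[i]) ** 2;
  }
  return Math.sqrt(sumSquares / target.length);
}

// Hypothetical fitness: maps an error in [0, infinity) to a value in (0, 1000],
// so that a perfect model scores 1000 and larger errors score lower.
// The 1000 scale is an assumption, not a documented GeneXproTools constant.
function fitnessFromRmse(target, model) {
  return 1000 / (1 + rmse(target, model));
}

console.log(fitnessFromRmse([1, 2, 3], [1, 2, 3])); // perfect fit
console.log(fitnessFromRmse([1, 2, 3], [2, 3, 4])); // RMSE of 1
```

Swapping `rmse` for another error measure changes how the search ranks candidate models, which is why different fitness functions explore the landscape differently.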
Choosing the Function Set
GeneXproTools allows you to choose your function set from a total of
279
built-in mathematical functions and an unlimited number of
custom functions, designed using the JavaScript language in the GeneXproTools environment.
Built-in Mathematical Functions
GeneXproTools offers a total of 279
built-in mathematical functions, including 186 different if then else rules, that
can be used to design both linear and nonlinear time
series prediction models. This wide
range of mathematical functions allows the evolution of
highly sophisticated
and accurate models, easily built with the most appropriate functions.
You can find the description of all the 279 built-in mathematical functions
available in GeneXproTools, including their representation in the
Online Knowledge Base.
The Function Selection Tools of GeneXproTools
can help you
in the selection of different function sets very quickly
through the combination of the Show options with the
Random, Default, Clear, and Select All buttons plus the
Add/Reduce
Weight buttons in the Functions
Panel.
User Defined Functions
Despite the great diversity of GeneXproTools
built-in mathematical functions,
users sometimes want to model with different ones.
GeneXproTools lets you create
custom functions (called
Dynamic UDFs and represented as DDFs
in the generated code) in order to evolve models with them.
Note, however, that custom functions are
computationally demanding and slow the evolutionary process considerably, so they should be used in moderation.
By selecting the Functions Tab in the Functions Panel, you have full access to
all the available functions, including all the
functions you've designed and all the built-in math functions. It's also here in the Functions Panel
that you add
your custom functions (Dynamic UDFs or DDFs) to your modeling
toolbox through the Dynamic UDFs frame.
To add a custom function to your function set, just check the checkbox on the Select/Weight column and
select the appropriate weight for the function (the weight determines the probability of each function being drawn
during mutation and other random events in the creation/modification of programs).
By default, the weight of each newly added function is 1, but you can increase the probability of a function being included in your models by increasing its weight in the Select/Weight column. GeneXproTools automatically balances your
function set with the number of independent variables in your data,
therefore you just have to select the set of functions for your problem and then choose their relative
proportions by choosing their weights.
To create a new custom function, just click the Add button on the Dynamic UDFs frame and the
DDF Editor appears. You can also edit old functions
through the Edit button or remove them
altogether from your modeling toolbox by clicking
the Remove button.
By choosing the number of arguments (minimum is 1 and maximum is 4) in the
Arguments combobox, the function header appears
in the code window. Then you just have to write the body of the function in the code editor. The code must be in JavaScript and can be
conveniently tested for compiling errors by clicking the Test button.
In the Definition box, you can write a brief description of the function for your future reference. The text you write
there will appear in the Definition column in the Functions Panel.
Dynamic UDFs are extremely powerful and interesting tools as they are treated exactly
like the built-in functions of GeneXproTools and therefore can be used to model
all kinds of relationships not only between the original variables but also between
derived features created on the fly by the learning algorithm. For instance, you can design
a DDF so that it will model the log of the sum of four expressions, that is,
DDF = log((expression 1) + (expression 2) + (expression 3) + (expression 4)),
where the value of each expression will depend on the context of the DDF in the
program.
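The body of such a function could look like the sketch below. The function name `logSum4` and the argument names are hypothetical; in the DDF Editor the header is generated for you when you choose the number of arguments, and its exact form is not shown in this tutorial.

```javascript
// Hypothetical four-argument custom function: natural log of the sum of
// its arguments. Each argument stands for the value of one expression
// supplied by the evolving program.
function logSum4(x0, x1, x2, x3) {
  // Math.log returns NaN for negative sums and -Infinity for a sum of zero.
  return Math.log(x0 + x1 + x2 + x3);
}

console.log(logSum4(1, 1, 1, 1)); // same as Math.log(4)
```

Because the logarithm is undefined for non-positive sums, it is worth deciding how a custom function should behave on such inputs before evolving models with it.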
Creating Derived Features/Variables
Derived variables or new features such as
moving averages can be easily created in GeneXproTools.
They are created in the Functions Panel in the
Static UDFs Tab.
Historically, derived variables were called
UDFs or User Defined Functions
and in GeneXproTools they are represented as UDF0, UDF1, UDF2, and so on. Note however that
UDFs are in fact new features derived from the original variables in the training and test data.
Like DDFs, they are implemented in JavaScript using the
UDF Editor of GeneXproTools.
These user defined features are then used by the learning algorithm exactly as
the original features, that is, they are incorporated into the evolving models
adaptively, with the most important being chosen and selected according to
how much they contribute to the performance of each
model.
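As an example of a derived feature, a moving average over the most recent lagged values could be sketched as follows. The function name `movingAverage`, its parameters, and the ordering of the lags (oldest first, as in the transformed training data) are assumptions; the header expected by the UDF Editor is not shown in this tutorial.

```javascript
// Hypothetical derived feature: mean of the `window` most recent lags.
// `lags` is ordered oldest first (t_5 ... t_1), so the most recent
// values sit at the end of the array.
function movingAverage(lags, window) {
  const recent = lags.slice(-window);
  return recent.reduce((sum, v) => sum + v, 0) / recent.length;
}

// Lagged predictors of the first training record: t_5 ... t_1.
const firstRecordLags = [101, 82, 66, 35, 31];
console.log(movingAverage(firstRecordLags, 3)); // (66 + 35 + 31) / 3 = 44
```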
Exploring the Learning Algorithms
GeneXproTools uses two different learning algorithms for
Time Series Prediction problems. The first – the basic gene expression algorithm
or simply
Gene Expression Programming (GEP) – does not support the direct manipulation of random numerical constants,
whereas the second – GEP with Random Numerical Constants or
GEP-RNC
for short – implements a structure for handling them directly. These
two algorithms search the solution landscape differently and
therefore it might be a good idea to try them both on your problems.
For example, GEP-RNC models are usually more
compact than models generated without random
numerical constants.
The kinds of models these algorithms produce are quite different
and, even if both of them perform equally well on the problem at hand,
you might still prefer one over the other. There are cases, however,
where numerical constants are crucial for efficient modeling and,
therefore, the GEP-RNC algorithm is the default in
GeneXproTools. You activate this algorithm in the Settings Panel ->
Numerical Constants by checking the Use Random Numerical Constants checkbox.
In the Numerical Constants Tab you can also adjust the range and
type of constants and also the number of constants per gene.
The GEP-RNC algorithm is slightly more complex than
the basic gene expression algorithm as it uses an additional gene domain (Dc) for encoding the random
numerical constants. Consequently, this algorithm
includes an additional set of genetic operators (RNC
Mutation, Constant Fine-Tuning, Constant Range
Finding, Constant Insertion, Dc Mutation, Dc Inversion, Dc IS
Transposition, and Dc Permutation) especially developed for handling random
numerical constants (if you are not familiar with these operators,
please use the default Optimal Evolution Strategy by
selecting Optimal Evolution in the Strategy combobox
as it works very well in all cases; or you can learn more about the
genetic operators in
the
Legacy Knowledge Base).
Monitoring the Modeling Process
While the model is being created by the learning algorithm, you can evaluate and
visualize the actual design process through the real-time
monitoring of different
model fitting charts and statistics
in the Run Panel, such as different curve fitting
charts, the scatter plot, the residuals plot, the correlation
coefficient, the R-square and the fitness. Both the
correlation coefficient and R-square measure the
correlation between the model output and the target
(the actual values of the dependent variable).
In the Run Panel of the Time Series Prediction Framework of GeneXproTools you have access
to a total of 15 different charts not only for
visualizing model design but also for
monitoring evolution itself.
- Curve Fitting Chart
The Curve Fitting Chart plots the target and
the model output of the first
1000 data points of the training set
and shows how well the evolving models are fitting the target.
The Curve Fitting Chart can be
invoked any time during
model design by selecting Curve Fitting in the rightmost
combobox at the bottom.
- Target Sorted Fitting Chart
This chart plots the target and
the model output of the first
1000 training data points sorted
by target and shows how well the evolving models are fitting the target.
The Target Sorted Fitting Chart can be
invoked any time during
model design by selecting Target
Sorted Fitting in the rightmost combobox at the
bottom.
- Model Sorted Fitting Chart
This chart plots the target and
the model output of the first
1000 training data points sorted
by model output and shows how well the evolving models are fitting the target.
The Model Sorted Fitting Chart can be
invoked any time during
model design by selecting Model
Sorted Fitting in the rightmost combobox at the
bottom.
- Stacked Distributions Chart
The Stacked Distributions Chart
shows how the target and model
outputs cover the entire range
of target and model outputs by
plotting both distributions in
parallel stacked scatter plots
with dummy random points in the
Y-axis and shows clearly the
spread and overlap of the actual
and predicted values.
The Stacked Distributions Chart can be
invoked any time during
model design by selecting Stacked
Distributions in the rightmost combobox at the
bottom.
- Scatter Plot
The Scatter Plot, with the regression line and regression equation,
shows the correlation between the target and model output.
GeneXproTools also shows the model R-square in the Scatter Plot
which measures the percentage of variance in the target explained
by the model.
The Scatter Plot can be
invoked any time during
model design by selecting Scatter Plot in the rightmost
combobox at the bottom.
- Residuals Plot
The Residuals Plot shows the correlation between the residuals
and model output and consists of the standard residual analysis
for detecting unusual patterns in the distribution of the residuals.
The Residuals Plot can be
invoked any time during
model design by selecting Residuals Plot in the rightmost
combobox at the bottom.
- Variables Usage Map
The Variables Usage Map shows not only the variables that are being used by the
best-of-generation models but also their count. Not only after
a run but also during evolution, by placing the cursor over each
square you can access the variable ID
and label and the number of times it
appears in the current model. In
addition, through the context menu you can also change the appearance of
the Variables Usage Map by choosing Heat Map,
Random Colors, or Monochromatic.
- Evolutionary Dynamics Chart
The Evolutionary Dynamics Chart shows
the average fitness of the population plus the fitness and R-square of the best-of-generation model
for periods of 1000 generations
at a time, refreshing each time
1000 generations go by.
- Average/Best Size Chart
The Average/Best Size Chart compares the average
program size of all the models
in the population with the
program size of the
best-of-generation model for periods of 1000
generations at a time, refreshing each time 1000
generations go by. This
chart is especially useful
during simplification and can be
activated any time during
evolution by selecting Avg/Best
Size in the rightmost combobox at the bottom.
- Sub-Program Sizes Chart
The Sub-Program Sizes Chart shows
the sizes of all the sub-programs of the best-of-generation model.
Not only after a run but also during evolution, by placing the
cursor over each bar or by
choosing Show Labels in the
context menu, you can access the size of all sub-programs
in your model.
The Sub-Program Sizes
Chart can be activated any
time during model design by selecting Sub-Program Sizes in the
leftmost combobox at the bottom.
- Program Size Chart
The Program Size Chart shows
the size of the best-of-generation model. Not only after a run but
also during evolution, by placing the cursor over the horizontal
bar you can access the size of the best model.
- Size Distribution Chart
The Size Distribution Chart shows the histogram of the program sizes
for each generation of evolving models. This is particularly useful
if you are designing your own fitness functions or creating your own
modeling strategies by adjusting the rates of the genetic operators.
Not only after a run but also during evolution, by placing the
cursor over each bar or by
choosing Show Labels in the
context menu, you can access the frequency of all bins in the histogram.
The Size Distribution Chart can be activated any time during evolution
by selecting Size Distribution in the leftmost
combobox at the bottom.
- Fitness Distribution Chart
The Fitness Distribution Chart shows the histogram of the fitness values
for each generation of evolving models. This is particularly useful
if you are designing your own fitness functions or creating your own
modeling strategies by adjusting the rates of the genetic operators.
Not only after a run but also during evolution, by placing the
cursor over each bar or by
choosing Show Labels in the
context menu, you can access the frequency of all bins in the histogram.
The Fitness Distribution Chart can be activated any time during evolution
by selecting Fitness Distribution in the leftmost
combobox at the bottom.
- All Sizes Chart
The All Sizes Chart shows
the sizes of all the models in the population. Not only after a run
but also during evolution, by placing the cursor over each bar
or by choosing Show Labels in
the context menu, you can access the size of a particular model.
The
best-of-generation model always occupies the first position so
you can also easily see how it fares relative to the others.
The All Sizes Chart can be activated any time during evolution
by selecting All Sizes in the leftmost combobox
at the bottom.
- All Fitnesses Chart
The All Fitnesses Chart shows
the fitness of all the models in the population. Not only after a
run but also during evolution, by placing the cursor over each
bar or by choosing Show Labels
in the context menu, you can access the fitness of a particular model.
The
best-of-generation model always occupies the first position so
you can also easily see how it fares relative to the others.
The All Fitnesses Chart can be activated any time during
evolution by selecting All Fitnesses in the leftmost
combobox at the bottom.
The evolutionary process can be stopped whenever you are satisfied with the results by
clicking the
Stop button or you can use one of the
stop conditions of
GeneXproTools for stopping the design process exactly when you see fit.
When the evolutionary process stops, the best-of-run model is ready either for analysis or
for making predictions. And if you are still not happy with the results, you can
continue the fine-tuning of your model by
clicking any of the optimization buttons
GeneXproTools provides: Continue, Simplify
and Complexify. You can repeat this process
for as long as you see fit or until you are
completely satisfied with your model.
Model Selection
GeneXproTools saves all the best-of-generation models
designed during a run and you can
select any of them for analysis
in any of the panels with model navigation
(Run Panel, Predictions Panel,
Data Panel and Model Panel). In addition, in the
History Panel
GeneXproTools lists all your models, allowing you to
select any of the models in the History by checking the
model you are interested in.
Favorite Statistics
GeneXproTools allows you to use Favorite Statistics not only in
the History Panel for model selection but also for ensemble management
in the Deploy Ensemble to Excel Window.
The favorite statistics for Time Series Prediction include:
For example, by using your favorite statistic in the History Panel you can
then sort all your models by your favorite statistic either in the training or
testing data. Then by reindexing your models through the Rename All Models functionality,
you can analyze them in a particular order in any of the panels with model navigation
so that you can gain insight into their structure and performance.
Changing the Prediction Mode
GeneXproTools allows you to go from Testing Mode to Prediction Mode and the other way around and
to change the number of Testing Predictions without deleting the models in the run History.
This is a powerful modeling tool as it allows you to select your models by their performance
in the testing data and then use the selected models for forecasting. The images below show
a model created in Testing Mode to forecast product sales. The first image shows the
testing results obtained for past known sales and the second shows the model forecast.
Simplifying a Model
GeneXproTools allows you to simplify an existing model (either
created with GeneXproTools or with another modeling
technology) either by clicking the
Simplify button on the Run
Panel or by turning on the Parsimony
Pressure in the Fitness Function
Tab and then clicking Continue or
Simplify in the
Run Panel. GeneXproTools allows you
to adjust the parsimony pressure you
exert on the size of the evolving
models, but bigger models can always
appear during evolution if the gain in fitness trumps
the smaller size.
For models created outside GeneXproTools or for GeneXproTools
models modified by the user in the Change Seed Window, the starting model is fed to
the learning algorithm through the Change Seed Window where both the fitness and structural soundness
of the model are tested.
Then, in the Run Panel, by clicking the
Simplify button, an evolutionary process starts in which all the subsequent models will be descendants of the
model you want to simplify. Keep in mind, however, that the
simplification algorithms GeneXproTools
uses are evolutionary
in nature and models continue
to be selected primarily by fitness.
This means that their
complexity might even increase temporarily if the gain in fitness outweighs
the loss in simplicity.
For models created in the
GeneXproTools environment, you just
have to select the model you want to
simplify (either the best-of-run or
an intermediate model)
and then click the Simplify button and let the algorithm create
better descendants not only in terms of fitness but also in terms of
size.
Model Evaluation & Testing
While the time series prediction model is being created by the learning algorithm,
you can evaluate and visualize the actual design process through the real-time monitoring
of different model fitting charts and statistics in the
Run Panel. Then in the
Predictions Panel you can further evaluate your model using different charts and
analyses.
Comparing Actual & Predicted Values
GeneXproTools offers two different ways of analyzing and comparing the output of your model with the actual or target values both for
the training and testing data.
In the first, the Target or actual values are listed in a
Table side by side with the predicted
values or Model output.
In the second, the target and predicted values are plotted in a
Curve Fitting Chart for easy visualization.
And both table and chart are shown simultaneously in the
Predictions
Panel.
If you are testing the predictive
accuracy of your model using past known observations
(backtesting), the testing predictions and the corresponding actual values are listed at the end of the table,
highlighted in
blue, and are also plotted in the
Curve Fitting Chart whenever All is selected
in the Chart options.
In addition, these testing predictions can also be observed separately in a different Curve Fitting
Chart by
selecting Testing in the Chart options.
Note, however, that when Training or All is selected in the Chart options,
the
measures of fit shown in the Statistics Report refer to the training data;
when Testing is selected, the
measures of fit refer obviously to the testing data.
Evaluating Performance
GeneXproTools allows a quick and easy assessment of a wide
range of statistics for measuring
the goodness of fit. Most of these
measures of fit are immediately computed and shown
in the Statistics
Report every time you
go to the
Predictions Panel
(for instance,
mean squared error,
root mean squared error,
mean absolute error,
relative squared error,
root relative squared
error,
relative absolute
error,
R-square,
correlation
coefficient and
fitness).
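The error measures listed above have widely used textbook definitions, sketched below. The relative measures compare the model with a reference predictor that always outputs the mean of the target. The function name `errorMeasures` is hypothetical, and GeneXproTools may differ in details such as the reference model.

```javascript
// Common error measures between target values and model output.
// `errorMeasures` is a hypothetical helper using textbook definitions.
function errorMeasures(target, model) {
  const n = target.length;
  const mean = target.reduce((sum, v) => sum + v, 0) / n;
  let squared = 0, absolute = 0, squaredRef = 0, absoluteRef = 0;
  for (let i = 0; i < n; i++) {
    const error = target[i] - model[i];       // model error
    const deviation = target[i] - mean;       // error of the mean predictor
    squared += error * error;
    absolute += Math.abs(error);
    squaredRef += deviation * deviation;
    absoluteRef += Math.abs(deviation);
  }
  return {
    mse: squared / n,                         // mean squared error
    rmse: Math.sqrt(squared / n),             // root mean squared error
    mae: absolute / n,                        // mean absolute error
    rse: squared / squaredRef,                // relative squared error
    rrse: Math.sqrt(squared / squaredRef),    // root relative squared error
    rae: absolute / absoluteRef,              // relative absolute error
  };
}

// Made-up values: only the last prediction is off, by 2.
const measures = errorMeasures([1, 2, 3, 4], [1, 2, 3, 6]);
console.log(measures);
```

Relative measures below 1 indicate a model that does better than always predicting the mean of the target.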
If you are backtesting the predictive
accuracy of your model using past known observations, you can
evaluate the same set of statistics
(mean squared error,
root mean squared error,
mean absolute error,
relative squared error,
root relative squared
error,
relative absolute
error,
R-square,
correlation
coefficient and
fitness) for the recursive testing by selecting
Testing in the
Chart options.
These and other measures of fit (Up/Down
Accuracy,
Up/Down Error,
Up/Down Hits, and
Up/Down Errors)
are also shown in the Report Panel for the
active model, but you must
evaluate them first in the Predictions Panel.
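The recursive testing mentioned above can be pictured with the following sketch, under the assumption that "recursive" means each prediction is fed back as a lagged input for the next one; the tutorial does not describe the mechanics. The names `recursiveForecast` and `toyModel` are hypothetical, the toy model is not an evolved model, and a Delay Time of 1 is assumed.

```javascript
// Recursive multi-step forecasting: each prediction is appended to the
// known values and becomes a lagged input for the next step.
// Assumes a Delay Time of 1; all names here are hypothetical.
function recursiveForecast(history, model, embeddingDimension, steps) {
  const values = history.slice();
  const predictions = [];
  for (let step = 0; step < steps; step++) {
    // The most recent lags, oldest first (t_d ... t_1).
    const lags = values.slice(-embeddingDimension);
    const next = model(lags);
    predictions.push(next);
    values.push(next);
  }
  return predictions;
}

// Toy stand-in for an evolved model: the mean of the two most recent values.
const toyModel = (lags) => (lags[lags.length - 1] + lags[lags.length - 2]) / 2;
const forecast = recursiveForecast([2, 4], toyModel, 2, 3);
console.log(forecast.join(" ")); // 3 3.5 3.25
```

Because later predictions depend on earlier ones, errors can accumulate over the forecasting horizon, which is one reason to backtest on known observations first.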
Variable Importance
GeneXproTools uses a sophisticated stochastic method to compute the
variable importance
of all the variables in a model. For all forecasting models the importance of each
model variable
is computed by randomizing its input values and then computing the decrease in
the R-square between the model output and the target. The results for all variables
are then normalized so that they add up to 1.
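The procedure described above can be sketched as follows. This is an illustration, not GeneXproTools' implementation: randomizing is done here by shuffling each variable's column, negative decreases in R-square are counted as zero, and all function names and the toy model are hypothetical.

```javascript
// Small seeded generator so the sketch is reproducible (hypothetical helper).
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Squared Pearson correlation between two arrays (0 if either is constant).
function rSquare(a, b) {
  const n = a.length;
  const meanA = a.reduce((s, v) => s + v, 0) / n;
  const meanB = b.reduce((s, v) => s + v, 0) / n;
  let saa = 0, sbb = 0, sab = 0;
  for (let i = 0; i < n; i++) {
    saa += (a[i] - meanA) ** 2;
    sbb += (b[i] - meanB) ** 2;
    sab += (a[i] - meanA) * (b[i] - meanB);
  }
  const denominator = saa * sbb;
  return denominator === 0 ? 0 : (sab * sab) / denominator;
}

// Fisher-Yates shuffle of a copy of `values`.
function shuffled(values, rng) {
  const copy = values.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

function variableImportance(records, target, model, rng) {
  const baseline = rSquare(records.map((r) => model(r)), target);
  const decreases = [];
  for (let v = 0; v < records[0].length; v++) {
    // Randomize variable v only, leaving the other inputs untouched.
    const column = shuffled(records.map((r) => r[v]), rng);
    const randomized = records.map((r, i) => {
      const copy = r.slice();
      copy[v] = column[i];
      return copy;
    });
    const decrease = baseline - rSquare(randomized.map((r) => model(r)), target);
    decreases.push(Math.max(0, decrease)); // assumption: negative decreases count as zero
  }
  const total = decreases.reduce((s, d) => s + d, 0);
  return decreases.map((d) => (total === 0 ? 0 : d / total)); // normalize to sum to 1
}

// Toy model that only uses the first of two variables.
const records = [[1, 5], [2, 3], [4, 8], [7, 1], [11, 9], [16, 2], [22, 7], [29, 4]];
const toyModel = (r) => 2 * r[0];
const target = records.map((r) => toyModel(r));
const importance = variableImportance(records, target, toyModel, makeRng(42));
console.log(importance);
```

In this toy case the second variable is never used by the model, so randomizing it leaves the R-square unchanged and its importance is zero.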
GeneXproTools evaluates the variable importance of all
the variables (original and derived)
in a model and shows the results in the
Statistics Report in the Data Panel. The variable importance is also shown graphically in the
Variable Importance Chart in the Data Panel.
The Variable Importance Chart is available through the
Statistics Charts in the Data Panel. By selecting
Model Variables in the Variables combobox,
you can quickly access the variables of each model
and quickly visualize their relative importance in a
chart.
Modeling from Seed Models
GeneXproTools allows the use of an existing model (either generated by
GeneXproTools or by another modeling
tool) as the starting point of an evolutionary process
in order to create
better models.
For models created outside GeneXproTools or for GeneXproTools
models modified by the user in the
Change Seed Window, the starting model or
seed is fed to the algorithm through the
Change Seed Window where both the fitness and structural soundness of the model are tested.
The Change Seed Window accepts
Karva Code only, so you must translate your model into Karva notation
first in order to explore it in GeneXproTools.
Then, in the Run Panel,
by clicking any of the optimization buttons GeneXproTools provides (Continue,
Simplify and Complexify),
an evolutionary process starts in which all the subsequent models will be descendants of the seed you introduced.
Note, however, that if your seed has a very small fitness, you risk losing it early in the run
as better models could be randomly created by GeneXproTools, leaving your seed behind.
If your seed has zero fitness, though, you will receive a warning,
allowing you to modify
your seed until it becomes a viable
seed capable of breeding new models.
For models created in the
GeneXproTools environment, the seed
(the active model) is fed
automatically to the algorithm every
time you click Continue, Simplify
or Complexify in the Run
Panel.
Adding a Neutral Gene
The addition of a neutral gene to a program
(in mathematical terms, it’s like adding zero or
multiplying by one) might seem at first sight the wrong thing to do
as we are usually interested in creating accurate and parsimonious models. But one
should look at this as modeling in progress as this allows us to tackle a complex problem incrementally.
Indeed, being able to introduce extra terms into your
evolving programs is a powerful modeling tool and
GeneXproTools allows you to do that by selecting
Add Neutral
Gene in the Model menu or through the
Change Seed Window.
When you
click the Add Neutral Gene button in the
Change Seed window, you will see a neutral gene being added to your model
(in the example above, zero, encoded as -.d0.d0, was
added to the program). By doing this, you are giving the learning algorithm more room to play and, hopefully, a better, more complex program will evolve.
Neutral genes can also be introduced automatically
as part of a modeling strategy when you turn on the Complexity Increase
Engine of GeneXproTools, which is the topic of the next section.
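The effect of a neutral gene can be illustrated with a small sketch in which a model is a list of sub-programs (genes) linked by addition. The linking by addition, the gene functions and the names used here are assumptions for illustration; the neutral gene mirrors the -.d0.d0 example, which evaluates to d0 - d0.

```javascript
// A multigenic model sketched as gene functions linked by addition.
// The genes and the addition linking are hypothetical examples.
function evaluateModel(genes, d) {
  return genes.reduce((sum, gene) => sum + gene(d), 0);
}

const genes = [(d) => d[0] * d[1], (d) => Math.sqrt(d[2])];
const neutralGene = (d) => d[0] - d[0]; // the -.d0.d0 gene: zero for any finite d0
const extendedGenes = genes.concat([neutralGene]);

const record = [3, 4, 9];
console.log(evaluateModel(genes, record));         // 3 * 4 + sqrt(9) = 15
console.log(evaluateModel(extendedGenes, record)); // still 15
```

The extended model produces the same output as before, but the extra gene gives evolution new material to modify in later generations.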
Complexity Increase Engine
GeneXproTools also allows you to introduce neutral genes automatically
during a run by activating the
Complexity Increase
Engine in the Settings Panel -> General Settings Tab.
Whenever you are using the Complexity Increase Engine of GeneXproTools, you must set three parameters.
In the Generations Without Change box you set the period of time you think acceptable for evolution to occur without improvement in best fitness, after which
a mass extinction takes place and a neutral gene (an extra
neutral term) is automatically added to your
models. The Number of Tries corresponds to the
number of consecutive evolutionary epochs (each defined by the parameter
Generations Without Change) you will allow before a neutral gene is
introduced in all evolving models. In the
Max Complexity box you write the maximum number of terms (genes) you’ll allow in your models; no other terms will be introduced beyond this threshold
during the run.
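One possible reading of how these three settings interact is sketched below in Python. It is not the GeneXproTools implementation. The function name, the event labels and the exact rule (a mass extinction at the end of each stalled epoch, a neutral gene after Number of Tries consecutive stalled epochs, nothing added once Max Complexity is reached) are assumptions made for illustration.

```python
def complexity_schedule(best_fitness, generations_without_change,
                        number_of_tries, max_complexity, start_genes=1):
    """Return (events, final gene count) for a list of best-fitness
    values, one per generation. Hypothetical rule, see lead-in."""
    events = []
    genes = start_genes
    best = float("-inf")
    stalled = 0  # generations since the last improvement
    tries = 0    # consecutive epochs without improvement
    for gen, fitness in enumerate(best_fitness):
        if fitness > best:
            # Improvement: reset both counters.
            best, stalled, tries = fitness, 0, 0
            continue
        stalled += 1
        if stalled < generations_without_change:
            continue
        # An epoch without improvement has just ended.
        stalled = 0
        tries += 1
        if tries >= number_of_tries and genes < max_complexity:
            genes += 1
            tries = 0
            events.append((gen, "add neutral gene"))
        else:
            events.append((gen, "mass extinction"))
    return events, genes


fitness_trace = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
events, genes = complexity_schedule(fitness_trace, 2, 2, 2)
print(events, genes)
```

With this trace the model grows from one gene to two and then stops growing, because the hypothetical Max Complexity of 2 has been reached.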
The Complexity Increase Engine of GeneXproTools
is a very powerful modeling tool,
especially for time series
forecasting models where good models
often require the careful blending
of extra terms, but you must be careful not to create excessively complex models as a greater complexity
does not necessarily imply greater
predictive accuracy.
|
Generating the Model Code |
In the Model Panel you can see
and analyze the model code not only in the programming language of
your choice but also as a diagram representation or expression tree.
GeneXproTools includes 17 built-in programming
languages or grammars for Time Series
Prediction.
These grammars allow you to generate code automatically in
some of the most popular programming
languages around, namely Ada, C, C++, C#, Excel VBA, Fortran, Java, JavaScript, Matlab,
Octave, Pascal, Perl, PHP, Python, R, Visual Basic,
and VB.Net.
But more importantly, GeneXproTools also allows you
to add your own programming languages through
user-defined grammars, which can be easily
created using one of the built-in grammars as
a template.
Visualizing Models as Expression Trees
GeneXproTools includes a parse tree generator that automatically converts
the native
Karva code of your models into diagram representations or
expression trees, allowing a
quicker and more complete
understanding of their mathematical
intricacies. By placing the cursor over each node of the expression tree,
you have access to the label of each variable and its index,
the value of each numerical constant and the definition of each function.
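To give a feel for what such a conversion involves, here is a minimal Python sketch that turns a dot-separated Karva string, in the style of the -.d0.d0 example used earlier, into a tree by filling it level by level. It is a simplified illustration and not the parse tree generator of GeneXproTools. The function names are hypothetical, only four binary operators are supported, and constants and other functions are left out.

```python
# Assumption: only these binary functions appear in the expression.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2}


def karva_to_tree(kexpr):
    """Build a [symbol, children] tree from a dot-separated K-expression."""
    symbols = kexpr.split(".")
    root = [symbols[0], []]
    queue = [root]
    pos = 1
    while queue:
        node = queue.pop(0)
        # Functions take their arguments from the next unread symbols;
        # terminals (arity 0) take none.
        for _ in range(ARITY.get(node[0], 0)):
            child = [symbols[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root


def to_infix(node):
    """Render the tree as a fully parenthesized infix string."""
    symbol, children = node
    if not children:
        return symbol
    left, right = children
    return "(" + to_infix(left) + " " + symbol + " " + to_infix(right) + ")"


print(to_infix(karva_to_tree("-.d0.d0")))
print(to_infix(karva_to_tree("+.*.d2.d0.d1")))
```

Symbols left unread once every function has its arguments are simply ignored in this sketch.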
Automatic Code Generation Using Built-in Grammars
GeneXproTools supports a total of 17
built-in programming languages so that the models evolved by
GeneXproTools in its native
Karva code
can be automatically translated into
some of the most commonly used programming languages
(Ada, C, C++, C#, Excel VBA,
Fortran, Java, JavaScript, Matlab,
Pascal, Perl, PHP, Python, R,
Octave, Visual Basic and VB.Net). This code can then be used in other applications to
deploy the forecasting model.
Automatic Code Generation with User Defined Grammars
GeneXproTools allows the design of
User Defined Grammars so that the models evolved by
GeneXproTools in its native
Karva code
can be automatically translated into the programming language
that you need. Indeed, if you need to generate model code in
a programming language other than the 17 built-in
programming languages of
GeneXproTools available for Time
Series Prediction, you can easily create your own grammars to generate code
automatically in as many languages as you need.
As an illustration, the C++ grammar of GeneXproTools is shown
here.
Other grammars may be easily created using this or other GeneXproTools
built-in grammars as reference.
|
Making Predictions |
The algorithms that GeneXproTools implements for Time Series Prediction allow not only the
design of forecasting models
to explain past events but also the immediate
utilization of these models to make predictions
either straightaway in the
GeneXproTools environment or
elsewhere using the generated code,
which as we saw in the
previous section is available in a total of 17
different programming languages.
GeneXproTools models for Time Series
Prediction allow you to make two kinds of predictions: one for
backtesting past known observations and
another to forecast future events. In both cases, though, predictions are made recursively,
by evaluating the forecast at
t+1, then using it to forecast t+2, and so on.
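The recursive scheme can be sketched in Python as follows. This is a minimal illustration, not GeneXproTools code. The function name, the n_inputs parameter and the toy averaging model are hypothetical, and the sketch assumes the model reads the most recent consecutive values (a Delay Time of 1).

```python
def recursive_forecast(series, model, n_inputs, steps):
    """Forecast `steps` values, feeding each forecast back as an input."""
    history = list(series)
    forecasts = []
    for _ in range(steps):
        window = history[-n_inputs:]   # most recent values, oldest first
        next_value = model(window)     # forecast for the next time step
        forecasts.append(next_value)
        history.append(next_value)     # the forecast becomes an input
    return forecasts


def mean_model(window):
    # Toy stand-in for an evolved model: the mean of its inputs.
    return sum(window) / len(window)


print(recursive_forecast([1.0, 2.0, 3.0], mean_model, n_inputs=2, steps=2))
```

Because each forecast is reused as an input, errors made at t+1 carry over into t+2 and beyond, which is one reason long horizons deserve caution.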
GeneXproTools requires you to choose one of these
methods while loading your time series, as this
imposes some constraints on the transformation of
the time series for training. However, you can change the
Prediction Mode and also the
number of testing records in the GeneXproTools
modeling environment, in the
General Settings Tab.
The first prediction mode – Testing Mode – can be used for
backtesting as it allows you to test the forecasting
accuracy of your models on a set of
known past observations.
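A backtest of this kind can be sketched in Python using the 20 observations of the sample series shown at the start of this tutorial. The sketch is illustrative only: holding out the last 3 records, the persistence model (repeat the last value) and the mean absolute error measure are all assumptions chosen for the example, not GeneXproTools settings.

```python
series = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125,
          85, 68, 38, 23, 10, 24, 83, 132, 131, 118]

n_test = 3                      # hypothetical number of testing records
train, test = series[:-n_test], series[-n_test:]


def persistence_model(window):
    # Toy stand-in for an evolved model: repeat the last known value.
    return window[-1]


def recursive_forecast(history, model, n_inputs, steps):
    history = list(history)
    forecasts = []
    for _ in range(steps):
        next_value = model(history[-n_inputs:])
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts


# Forecast the held-out records from the training part only.
forecasts = recursive_forecast(train, persistence_model, 1, n_test)

# Compare the forecasts with the known past observations.
mae = sum(abs(f - y) for f, y in zip(forecasts, test)) / n_test
print(forecasts, mae)
```

The held-out values are never shown to the model, so the error measures how well it would have done had the forecast been made before those observations were known.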
The second type – Prediction Mode – is used
to forecast future events, and
GeneXproTools allows you to venture into the future as far as you see fit.
For that, in the Predictions
Panel, you set the number of
predictions in the Predictions
box and then click the Predict button.
As in Testing Mode,
GeneXproTools allows you to plot the predictions together
with the results obtained for the training
data or to display them separately in a
different chart. The first method is particularly useful
for visualizing the overall trend and you access it by
selecting All in the Chart combobox.
|
Deployment to Excel |
GeneXproTools allows you to automatically deploy to Excel any model you create without intervention from IT. Thanks to the special-purpose
Excel VBA grammar of GeneXproTools, the complete code of your models becomes readily
accessible without the need for a software developer. Note that for this feature
to work, you have to enable it in Excel; see the step-by-step instructions
for details.
Model Deployment
When you deploy a model to Excel, the Excel workbook includes not only the results
obtained for the training data but also the recursive predictions made either for
backtesting past observations or for forecasting future events, depending on whether
you are in Testing Mode or Prediction Mode.
Ensemble Deployment
In addition to the deployment of individual models to Excel, you can also
deploy model ensembles to Excel, combining them in different ways.
For time series prediction models, the model ensemble
in the Excel workbook includes the
average multi-model (Average Model) and the median multi-model (Median Model).
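The two ways of combining models can be sketched in Python. This is an illustration only, not the code in the Excel workbook; the function name and the prediction values are made up.

```python
from statistics import mean, median


def ensemble(predictions_by_model):
    """Combine per-model forecasts step by step.

    predictions_by_model: one list of forecasts per model, all of the
    same length. Returns (average_model, median_model).
    """
    per_step = list(zip(*predictions_by_model))
    average_model = [mean(step) for step in per_step]
    median_model = [median(step) for step in per_step]
    return average_model, median_model


# Hypothetical forecasts of three models for two future time steps.
predictions = [
    [10.0, 20.0],
    [12.0, 26.0],
    [20.0, 23.0],
]
average_model, median_model = ensemble(predictions)
print(average_model, median_model)
```

In the first time step the median (12.0) is less affected by the one model that predicts 20.0 than the average (14.0) is, which illustrates why the two combinations can differ.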
Last modified:
November 9, 2013
Cite this as:
Ferreira, C. "Getting Started with
Time Series Prediction." From GeneXproTools
Tutorials – A Gepsoft Web Resource.
https://www.gepsoft.com/tutorials/GettingStartedWithTimeSeriesPrediction.htm
|