Before evolving a model with GeneXproTools 4.0, you must first load
the input time series and transform it for the learning algorithm.
GeneXproTools 4.0 lets you work with either databases/Excel files or
text files. In both cases, though, the time series must be in a single
column:
101
82
66
35
31
7
20
92
154
125
85
68
38
23
10
24
83
132
131
118
GeneXproTools then automatically transforms the time series
according to your specifications (the embedding dimension, delay
time, and prediction mode you chose for your analysis). For
instance, with an embedding dimension of 5 and a delay time of 1,
the small time series of 20 observations above is automatically
transformed into the following training data (15 samples, each with
5 consecutive observations followed by the observation that comes next):
101 82 66 35 31 7
82 66 35 31 7 20
66 35 31 7 20 92
35 31 7 20 92 154
31 7 20 92 154 125
7 20 92 154 125 85
20 92 154 125 85 68
92 154 125 85 68 38
154 125 85 68 38 23
125 85 68 38 23 10
85 68 38 23 10 24
68 38 23 10 24 83
38 23 10 24 83 132
23 10 24 83 132 131
10 24 83 132 131 118
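The transformation above can be sketched in a few lines of Python. This is only an illustration, not the GeneXproTools implementation: the function name `embed_series` and its parameters are hypothetical, and the handling of delay times greater than 1 (taking values spaced `delay_time` apart) is an assumption, since the example above only shows a delay time of 1.

```python
def embed_series(series, embedding_dimension, delay_time=1):
    """Turn a single-column series into rows of inputs plus one target.

    Each row holds embedding_dimension inputs followed by the value to
    predict. Spacing the values delay_time apart is an assumption; the
    worked example in the text only covers delay_time = 1.
    """
    span = embedding_dimension * delay_time  # offset from first input to target
    rows = []
    for start in range(len(series) - span):
        rows.append([series[start + k * delay_time]
                     for k in range(embedding_dimension + 1)])
    return rows


series = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125,
          85, 68, 38, 23, 10, 24, 83, 132, 131, 118]
rows = embed_series(series, embedding_dimension=5, delay_time=1)

# Print the rows in the same layout as the training data shown above.
for row in rows:
    print(" ".join(str(value) for value in row))
```

With the 20 observations listed earlier, this yields the same 15 rows of 6 values as the training data shown above.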
To Load the Input Time Series for Modeling
- Click the File menu and then choose New.
The New Run wizard appears. Give a name to your new run file (the default
filename extension of GeneXproTools 4.0 run files is .gep) and then choose
Time Series Prediction in the Problem Category box. The frame with the
parameters used for transforming the time series becomes active, and
you must set the appropriate Embedding Dimension and Delay Time.
Then, in the Prediction Mode box, choose Prediction if you wish to make
predictions about the future, or Testing if you wish to test the
predictive power of the evolved models on known behavior.
Here you must also choose the kind of source file in the Data Source
Type box: either Excel & Databases or Text Files.
- Then go to the Training Data window by clicking the Next button.
Choose the path to the time series by browsing in the Open dialog box.
The time series is automatically transformed, and you can view both the
original and the transformed time series by selecting the appropriate
option in the Data box.
- Click Finish to save your new run file.
The Save As dialog box appears. After you choose the directory where you
want your new run file to be saved, the GeneXproTools modeling
environment appears.
To create a model, you then just have to click the Evolve button:
GeneXproTools automatically chooses, from a gallery of templates,
default settings that enable you to create a model immediately.
In time series modeling, whether performed by learning algorithms or by conventional statistical methods, the way you choose to transform the time series is extremely important and should be settled before embarking on a complex, usually time-consuming modeling process.
First of all, there is the size of the time series to consider, together with the values you choose for the
Embedding Dimension and the Delay Time, because all these factors together determine the number of
Training Samples actually used for modeling.
The time series transformation engine of GeneXproTools 4.0 runs every time you create a new run and every time you change the
Embedding Dimension, the Delay Time, the Prediction Mode, or the
number of Testing Predictions in the General Settings Tab during a run.
In practice, the embedding dimension corresponds to the number of independent variables or terminals after your time series has been transformed and, therefore, will have a strong impact on the complexity of the problem.
The delay time t determines how data are processed, that is, continuously if t = 1 or at t intervals.
These two parameters, together with the size of the time series and
the prediction mode, will determine the final number of training samples after the transformation of the time series.
The data are then ready for evolving prediction models and can be visualized in the Data Panel before you evolve a new model.
As with Function Finding or Classification problems, it is important to
choose a reasonable number of samples for training: an excessively large number of samples will slow the modeling process
unnecessarily and will make it harder for the algorithm to find
useful patterns. A good rule of thumb is to use about 8-10 samples for each independent variable in your training data. For instance, a time series with 100 observations, an embedding dimension of 10, and a delay time of 1 results in a training set with 90 samples, that is, 9 samples per independent variable, which is a well-balanced training set.
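As a rough check of this rule of thumb, the arithmetic can be written out as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, no observations are held out for testing, and the formula for delay times other than 1 (observations minus embedding dimension times delay time) is an assumption extrapolated from the two worked examples in the text, both of which use a delay time of 1.

```python
def training_sample_count(n_observations, embedding_dimension, delay_time=1):
    """Rows left after the transformation, assuming none are held out for testing."""
    return max(0, n_observations - embedding_dimension * delay_time)


def samples_per_variable(n_observations, embedding_dimension, delay_time=1):
    """Training samples available for each independent variable."""
    count = training_sample_count(n_observations, embedding_dimension, delay_time)
    return count / embedding_dimension


# The example from the text: 100 observations, embedding dimension 10, delay time 1.
count = training_sample_count(100, embedding_dimension=10, delay_time=1)
ratio = samples_per_variable(100, embedding_dimension=10, delay_time=1)
print(count, ratio)
```

For the example in the text this gives 90 training samples and 9 samples per independent variable, inside the suggested range of 8-10.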