| Introduction 
 
 
With GeneXproServer 5.0 we are introducing an API (Application Programming Interface) that lets you take control of the process of creating, improving and testing new models, as well as scoring data against models and making predictions. You can use the API, for example, when you need to create complex workflows that are not supported by GeneXproServer’s job definition processing.
						 
The focus of this first version of GeneXproServer’s API is simplicity. It defines a small set of operations that can be grasped and put to work in a few minutes if you know how to program in any of the .NET languages such as C#, VB.NET, IronPython or C++/CLI. All code samples in this document are in C#.
						 Requirements
 
 
 
The GeneXproServer 5.0 API was built against the .NET Framework 4.0. When starting a project, add a reference to the library gxps5api.dll, which is installed in the folder C:\Program Files (x86)\GeneXproServer 50\ on 64-bit versions of Windows or in C:\Program Files\GeneXproServer 50\ otherwise.

GeneXproServer ships with a sample project containing examples of all supported operations. Assuming you installed GeneXproServer to the default location, the project can be found in the folder C:\Program Files (x86)\GeneXproServer 50\samples\GeneXproServerApiSample\ or C:\Program Files\GeneXproServer 50\samples\GeneXproServerApiSample\.
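As a small illustration of the default paths above, the following sketch picks the installation folder according to the bitness of the operating system. The class InstallLocation and its methods are hypothetical helpers written for this example, not part of the GeneXproServer API, and the sketch assumes the default installation location on drive C:.

```csharp
// Hypothetical helper, not part of the GeneXproServer API.
// Assumes GeneXproServer was installed to its default location.
public static class InstallLocation
{
    // Returns the default installation folder for the given OS bitness.
    // A caller would typically pass Environment.Is64BitOperatingSystem.
    public static string GetDefaultFolder(bool is64BitWindows)
    {
        return is64BitWindows
            ? @"C:\Program Files (x86)\GeneXproServer 50\"
            : @"C:\Program Files\GeneXproServer 50\";
    }

    // Full path of the API library inside the default folder.
    public static string GetApiLibraryPath(bool is64BitWindows)
    {
        // The folder string already ends with a backslash.
        return GetDefaultFolder(is64BitWindows) + "gxps5api.dll";
    }
}
```

A build script or a startup check could call InstallLocation.GetApiLibraryPath(Environment.Is64BitOperatingSystem) and verify that the file exists before loading the project.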
 Structure
 
 
 
The current version of the API contains four interfaces and four public classes. The interfaces are IDataset, IRun, IScorer and IPredictor, and they are implemented internally. Instances that implement these interfaces are created through the Factory pattern, which is implemented by the static class RunFactory. The classes Model and Statistics are DTOs (Data Transfer Objects), and the class RunEventArgs derives from EventArgs and is used to report on the processing engines.
						 
						All method calls in the interfaces are blocking calls and the member instances are not thread safe.
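
Because the instances are not thread safe, code that shares one instance between threads has to serialize access itself. The sketch below shows one possible way to do that with a lock. SerializedAccess is a hypothetical helper written for this illustration; it is not part of the GeneXproServer API.

```csharp
using System;

// Hypothetical helper, not part of the GeneXproServer API.
// Funnels every call on a non-thread-safe object through a single lock.
public sealed class SerializedAccess<T> where T : class
{
    private readonly T _instance;
    private readonly object _gate = new object();

    public SerializedAccess(T instance)
    {
        if (instance == null) throw new ArgumentNullException("instance");
        _instance = instance;
    }

    // Runs an action while holding the lock; other callers wait until it returns.
    public void Use(Action<T> action)
    {
        if (action == null) throw new ArgumentNullException("action");
        lock (_gate) { action(_instance); }
    }

    // Same as above, but returns a value computed from the instance.
    public TResult Use<TResult>(Func<T, TResult> func)
    {
        if (func == null) throw new ArgumentNullException("func");
        lock (_gate) { return func(_instance); }
    }
}
```

For example, wrapping the object returned by RunFactory.OpenRun in a SerializedAccess and calling gate.Use(r => r.Start(50)) from a worker thread makes any other thread that calls gate.Use wait until the blocking call returns. Objects read through the gate, such as the list of models, should not be kept and used outside of it.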
						 Start, Continue, Simplify and Complexify
 
 
 
						Opening and starting a run is a very simple operation:
						 
var run = RunFactory.OpenRun(@"c:\MyRun.gep");
run.Start(50);
Console.WriteLine(run.ActiveModel.TrainingStatistics.Fitness); 
						The snippet above opens the run MyRun.gep, processes it for 50 generations and then prints the training fitness to the console.
						 
						RunFactory.OpenRun returns an implementation of the interface IRun that allows you to start new runs and continue existing ones, change the current model and test existing models. It also contains a list of all the models in the run and summary information about both the training and validation datasets. Finally, the IRun interface includes an event that you can subscribe to 
						in order to receive notifications of the run processing. 
						 
						Continuing a run from the active model is also a simple operation:
						 
var run = RunFactory.OpenRun(@"c:\MyRun.gep");
run.Continue(50);
Console.WriteLine(run.ActiveModel.TrainingStatistics.Fitness); 
						This snippet opens the run MyRun.gep and continues 
						improving the active model for 50 generations. It finishes by printing the new training fitness to the console.
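
Since every call blocks until it finishes, a workflow can call Continue in a loop and decide after each batch whether to go on. The following is a minimal sketch of such a loop that stops when the fitness no longer improves noticeably. RunLoop and ImproveUntilStalled are hypothetical names, not part of the API; the loop takes a delegate so that it stays independent of the API types.

```csharp
using System;

// Hypothetical helper, not part of the GeneXproServer API.
public static class RunLoop
{
    // Calls 'step' at most maxRounds times and stops early when the fitness it
    // returns improves on the best value so far by less than minGain.
    // Returns the best fitness seen.
    //
    // Possible use with the snippet above (Fitness is a nullable double):
    //   var run = RunFactory.OpenRun(@"c:\MyRun.gep");
    //   double best = RunLoop.ImproveUntilStalled(
    //       () => { run.Continue(50); return run.ActiveModel.TrainingStatistics.Fitness ?? 0.0; },
    //       10, 0.01);
    public static double ImproveUntilStalled(Func<double> step, int maxRounds, double minGain)
    {
        if (step == null) throw new ArgumentNullException("step");
        if (maxRounds < 1) throw new ArgumentOutOfRangeException("maxRounds");

        double best = step();                 // first batch of generations
        for (int round = 1; round < maxRounds; round++)
        {
            double current = step();          // next batch
            double gain = current - best;
            best = Math.Max(best, current);
            if (gain < minGain) break;        // stalled: stop early
        }
        return best;
    }
}
```

The batch size (50 generations), the maximum of 10 rounds and the 0.01 threshold in the usage comment are arbitrary values chosen for illustration.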
						 
						Simplify and Complexify operations are similar:
						 
						 
var run = RunFactory.OpenRun(@"c:\MyRun.gep");
run.Simplify(50);

and

var run = RunFactory.OpenRun(@"c:\MyRun.gep");
run.Complexify(50);

 Changing the Active Model
 
 
 The IRun interface also exposes a way to change the active model: 
var run = RunFactory.OpenRun(@"c:\MyRun.gep");
Console.WriteLine(run.ActiveModel.Index);
run.SelectModel(run.Models[0]);
Console.WriteLine(run.ActiveModel.Index);

 The code above opens a run, prints the index of the active model, makes the first model in the run the active model and then prints the index again (which is now 1, since the Index of a model starts at 1).

 Testing a Model
 
 
 IRun exposes functionality that lets you test a model against the training or validation datasets. This is also a simple operation where you only need to identify the model and the dataset type you want to test: 
var run = RunFactory.OpenRun(@"c:\MyRun.gep");
run.Test(run.Models[3], DataSetEnum.ValidationSet);
Console.WriteLine(run.Models[3].ValidationStatistics.Fitness);

 The code above opens the run, tests the model with the Index 4 (note that the Index of a model starts at 1) on the validation set and finally prints the newly evaluated fitness.

 Scoring Data
 
 
 Scoring data is a different operation, and it is best done in two phases because initializing the model for scoring is an expensive operation. Once initialization is complete, on the other hand, scoring each case is fast. The whole initialization process is done by the RunFactory class when you call OpenRunForScoring, as seen below:
var run = RunFactory.OpenRunForScoring(@"c:\MyRun.gep", OutputTypeEnum.RawModel);
var data = new double[] { 5, 1, 1, 1, 2, 1, 3, 1, 1 };
var result = run.Calculate(data);

 The first line opens the run, initializes the active model for scoring and returns an implementation of the IScorer interface. The second line creates some fake data for scoring. Note that the array must have the same number of items as there are variables in the dataset, even if some of them are not used by that specific model. Finally, scoring a record is just a matter of passing the array to the Calculate method. You can call the Calculate method repeatedly with new records without having to create new instances of IScorer.

 The IScorer interface has two overloaded Calculate methods. One accepts an array of doubles, as in the example above, whereas the second accepts an array of strings. The former should be used when all the variables are numeric and the latter when there are categorical variables or missing values in the dataset. The format of the data must match the format of the training dataset.

 Note that the OpenRunForScoring method of the RunFactory class also takes an output type enumeration, which matches the "Output Type" in the Model Panel in GeneXproTools. This parameter is only important for Logistic Regression and Classification runs. It must be set to RawModel for Regression, Time Series Prediction and Logic Synthesis.

 The following example has categorical values and the most likely class as the output type:
var run = RunFactory.OpenRunForScoring(@"c:\MyRun.gep", OutputTypeEnum.MostLikelyClass);
var data = new[] { "b", "30.83", "0", "u", "g", "w", "v", "1.25", "t", "t", "1", "f", "g", "202", "0" };
var result = run.Calculate(data);

 Making Predictions
 
 
 To make predictions in Time Series Prediction runs you request a different interface (IPredictor) which has a single method called Predict that takes the number of predictions to make and returns an array with the predictions: 
var run = RunFactory.OpenRunForPredictions(@"c:\MyTimeSeriesRun.gep");
double[] predictions = run.Predict(5);

 In the example above the returned array contains five predictions.

 Model Parameters & Model Statistics
 
 
 The IRun interface contains a list of models of type List<Model>. The Model class contains basic information about the model such as:

Id (Int32): The internal id of the model; it is unique throughout the run.
Index (Int32): The order number of the model. It corresponds to the model number shown in the History Panel of GeneXproTools and is also unique.
Generation (Int32): The generation in which the model was created.
IsActive (Boolean): True if the model is the run’s active model.
RoundingThreshold (Double): The value of the rounding threshold for Logistic Regression and Classification runs.
Slope (Double): The value of the slope for Logistic Regression runs.
Intercept (Double): The value of the intercept for Logistic Regression runs.
TrainingStatistics (of type Statistics): Summary statistics and performance measures for the training set.
ValidationStatistics (of type Statistics): Summary statistics and performance measures for the validation set.

 The Statistics class has the following members:
 
Fitness (double?): The fitness of the model on the dataset; it is null if it has not been calculated yet (validation set only).
FitnessName (string): The name of the fitness function used to calculate the fitness.
Accuracy (double?): The accuracy of the model on the dataset; it is null if it has not been calculated yet (validation set only).
Rsquare (double?): The R-square of the model on the dataset; it is null if it has not been calculated yet (validation set only).
CorrelationCoefficient (double?): The correlation coefficient of the model on the dataset; it is null if it has not been calculated yet (validation set only).
TruePositives (int?): The number of true positives (TP) of the model on the dataset; it is null if it has not been calculated yet (validation set only).
TrueNegatives (int?): The number of true negatives (TN) of the model on the dataset; it is null if it has not been calculated yet (validation set only).
FalsePositives (int?): The number of false positives (FP) of the model on the dataset; it is null if it has not been calculated yet (validation set only).
FalseNegatives (int?): The number of false negatives (FN) of the model on the dataset; it is null if it has not been calculated yet (validation set only).
CalculationErrors (int): The number of calculation errors of the model on the dataset.
Favorite (double?): The favorite statistic value of the model on the dataset; it is null if it has not been calculated yet.
FavoriteName (string): The name of the favorite statistic.
Average (double): The average of the model output on the dataset.
StandardDeviation (double): The standard deviation of the model output on the dataset.
Min (double): The minimum value of the model output on the dataset.
Max (double): The maximum value of the model output on the dataset.

 Replacing Dataset & Dataset Information
 
 
 Each run has at least one dataset (Training) and at most two (Training and Validation). The IRun interface contains a Dictionary of these datasets indexed by DataSetEnum. Each dataset is an implementation of the IDataset interface, which exposes the number of records and variables and the type of the dataset. The interface also allows the replacement of the dataset’s contents with new data from a text file. The new data must have exactly the same format but can have any number of records, except zero or one.
var run = RunFactory.OpenRun(@"c:\MyRun.gep");
run.Datasets[DataSetEnum.TrainingSet].ReplaceDatasetWith(@"c:\newdata.txt", SeparatorEnum.Tab, true);

 The code above starts by opening the run and then proceeds to replace the training set with data from the text file newdata.txt. The columns in the file are separated by tabs and they have headers since the last argument is true.
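
A replacement file like the one described above can be produced with plain .NET code. The sketch below builds tab-separated text with a header line and rejects inputs with fewer than two records, in line with the restriction stated earlier. TabSeparatedData is a hypothetical helper, not part of the API, and it assumes an all-numeric dataset whose values use a period as the decimal separator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the GeneXproServer API.
public static class TabSeparatedData
{
    // Builds tab-separated text: one header line followed by one line per record.
    public static string Build(string[] headers, IList<double[]> records)
    {
        if (headers == null) throw new ArgumentNullException("headers");
        if (records == null) throw new ArgumentNullException("records");
        // Datasets with zero or one record are not accepted.
        if (records.Count < 2)
            throw new ArgumentException("At least two records are required.", "records");

        var text = new StringBuilder();
        text.AppendLine(string.Join("\t", headers));
        foreach (double[] record in records)
        {
            if (record.Length != headers.Length)
                throw new ArgumentException("Every record needs one value per column.", "records");

            var fields = new string[record.Length];
            for (int i = 0; i < record.Length; i++)
            {
                // Invariant culture keeps the period as decimal separator (assumption).
                fields[i] = record[i].ToString("R", CultureInfo.InvariantCulture);
            }
            text.AppendLine(string.Join("\t", fields));
        }
        return text.ToString();
    }
}
```

The resulting string could be saved with File.WriteAllText(@"c:\newdata.txt", text) before calling ReplaceDatasetWith with SeparatorEnum.Tab and true for the headers argument. The columns still have to match the format of the dataset being replaced.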
 
 
 
 
 
 
 |