In problems with multiple variables, one often finds models with high fitness, but these models often contain only a small subset of the independent variables rather than the entire set. Is this reasonable? Can it be avoided?
Yes, this happens frequently. Although it can be avoided, in real-world problems it is usually not good practice to do so: most datasets contain noisy information (including noisy variables), and one typically uses the algorithm precisely to find out what is relevant and what is not. However, for some problems you can create better models by applying Variable Pressure (a new feature introduced in version 5), and it's not uncommon to obtain better generalization on the test/validation set. It all depends on the problem, so you have to check how it works with your data. The good thing about Variable Pressure in GeneXproTools is that it's adjustable, so you can apply a little pressure and see whether your models improve. And if variable pressure is not really contributing to the creation of better models, the learning algorithm will very diligently include more variables in the models, but upon analysis these will turn out to be part of neutral blocks, for example, x1 + x2 - x2, where x2 contributes nothing to the output.
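As a quick illustration of how such neutral blocks can be spotted, here is a minimal sketch, assuming Python with the sympy library (not part of GeneXproTools); the model expression is a made-up example, not an actual exported model. It symbolically simplifies a candidate model and reports which variables actually influence the output:

```python
# Minimal sketch: detecting variables trapped in neutral blocks.
# Assumes Python with sympy; the expression below is hypothetical.
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')

# A model that nominally uses x1, x2 and x3, but x2 appears only
# inside the neutral block (x2 - x2) mentioned above.
model = x1 * x3 + (x2 - x2)   # sympy cancels x2 - x2 as the expression is built

simplified = sp.simplify(model)   # simplify() also catches less obvious neutral blocks
print("Simplified model:", simplified)   # -> x1*x3

# Variables that survive simplification are the ones that truly contribute.
for v in (x1, x2, x3):
    status = "contributes" if v in simplified.free_symbols else "neutral (cancels out)"
    print(v, ":", status)
```

Variables that disappear after simplification are exactly the ones that were included only under pressure, without changing the model's behavior.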