The most obvious method of selecting a subset of variables is to examine all
combinations of variables. For each combination, the corresponding subset of
variables is selected, a neural network using only these variables is
calibrated, and the prediction error for an independent test data set is
calculated. Finally, the combination with the smallest prediction error is
chosen. Apart from some problems caused by the random weight initialization of
the networks and by the limited size of the data set, this so-called
brute-force variable selection is the most accurate approach. However, it is
feasible only for a very limited number of variables, as the number of variable
subsets grows exponentially with the number of variables.
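A minimal Python sketch of the exhaustive search described above. It assumes a hypothetical callback `prediction_error(subset)` that calibrates a network on the given variables and returns its error on the independent test set; the toy error function below merely stands in for that expensive step, and the sketch does not address the random weight initialization mentioned above.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def brute_force_selection(
    variables: Sequence[str],
    prediction_error: Callable[[Tuple[str, ...]], float],
) -> Tuple[Tuple[str, ...], float]:
    """Evaluate every non-empty subset of `variables`; return the best one.

    `prediction_error` is a hypothetical callback: train a model on the
    subset and return its error on an independent test data set.
    """
    best_subset: Tuple[str, ...] = ()
    best_error = float("inf")
    # Enumerate subsets of every size, from single variables to all of them.
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            error = prediction_error(subset)
            if error < best_error:
                best_subset, best_error = subset, error
    return best_subset, best_error


# Toy stand-in for "train a network and measure the test-set error":
# the error is the number of variables that differ from a hidden ideal set.
IDEAL = {"b", "c"}


def toy_error(subset: Tuple[str, ...]) -> float:
    return float(len(set(subset) ^ IDEAL))


best, err = brute_force_selection(["a", "b", "c", "d"], toy_error)
print(best, err)  # ('b', 'c') 0.0
```

The callback is called once per subset, so the cost of the search is the number of subsets times the cost of one calibration.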
For a fixed number $n_v$ of variables to be selected from $n_{tot}$ variables
in total, the number $n$ of different variable subsets is given by the binomial
coefficient [12],[126],[127]:
$$ n = \binom{n_{tot}}{n_v} = \frac{n_{tot}!}{n_v!\,(n_{tot}-n_v)!} \tag{14} $$
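Eq. (14) can be evaluated directly with Python's `math.comb` (Python 3.8 or later); the factorial form is included only as a cross-check. The choice of 10 out of 40 variables is an arbitrary illustration, not a case from this work.

```python
from math import comb, factorial


def n_subsets_fixed(n_tot: int, n_v: int) -> int:
    """Number of distinct subsets of exactly n_v out of n_tot variables (Eq. 14)."""
    return comb(n_tot, n_v)


def n_subsets_fixed_factorial(n_tot: int, n_v: int) -> int:
    """Same quantity via the factorial form of Eq. (14)."""
    return factorial(n_tot) // (factorial(n_v) * factorial(n_tot - n_v))


# Selecting exactly 10 of 40 variables already gives 847 660 528 subsets.
print(n_subsets_fixed(40, 10))  # 847660528
```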
In the common case of searching for an optimal solution, the number of
variables to be selected is not fixed, which results in even more possible
combinations $n$ of variable subsets:
$$ n = \sum_{n_v=1}^{n_{tot}} \binom{n_{tot}}{n_v} = 2^{n_{tot}} - 1 \tag{15} $$
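Summing Eq. (14) over all subset sizes gives the closed form $2^{n_{tot}} - 1$: each variable is either included in a subset or not, and only the empty subset is excluded. A short Python check of this identity:

```python
from math import comb


def n_subsets_any(n_tot: int) -> int:
    """Number of non-empty variable subsets of any size (Eq. 15)."""
    return sum(comb(n_tot, n_v) for n_v in range(1, n_tot + 1))


# The explicit sum agrees with the closed form 2**n_tot - 1.
for n in (1, 5, 10, 20):
    print(n, n_subsets_any(n), 2**n - 1)

print(n_subsets_any(40))  # 1099511627775
```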
For example, the 40 variables of the refrigerant data introduced in section
4.5.1.1 result in 1 099 511 627 775 different combinations to be examined. If a
fast, up-to-date computer needs 1 minute to train one neural network (the time
needed for the prediction is negligible), examining all possible combinations
requires roughly 2.09 million years of computing time, which renders
brute-force variable selection useless for this work.
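The runtime estimate can be reproduced in a few lines. This sketch assumes a constant training time per network and a Julian year of 365.25 days; a different year convention shifts the result only slightly, but the order of magnitude stays the same.

```python
MINUTES_PER_YEAR = 60 * 24 * 365.25  # assumption: Julian year


def brute_force_years(n_tot: int, minutes_per_model: float = 1.0) -> float:
    """Computing time in years to train one network per subset (Eq. 15)."""
    n_models = 2**n_tot - 1
    return n_models * minutes_per_model / MINUTES_PER_YEAR


for n_tot in (10, 20, 30, 40):
    print(f"{n_tot} variables: {brute_force_years(n_tot):,.2f} years")
```

With 20 variables the same estimate is about two years; every additional variable doubles it, which is why the approach breaks down long before 40 variables.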