The two
stepwise algorithms, forward addition (forward selection) and backward elimination
(backward selection), are also sometimes called gradient methods, as the next
addition or elimination step is performed on the basis of the steepest gradient
of the error surface. Forward selection begins by selecting the variable
that results in the lowest error of prediction. In the next step, the variable
out of the remaining variables that minimizes the error in combination
with the first variable is added. The stepwise addition of further variables is repeated
until an optimal subset is found, with a maximum of n_{tot} steps. Backward elimination
works in the opposite direction, starting with all variables and eliminating
single variables one at a time. In addition, combinations of both methods are known as stepwise
multiple regression [12]. Yet, the stepwise algorithms
fail to take into account information about the combined effect
of several variables. Thus, these algorithms rarely find an optimal solution
when it requires several independent variables to be selected jointly [25],[128]. The stepwise algorithms
walk through the valleys of the error surface during the minimum search and cannot
find minima surrounded by high mountains. In figure
4, the error surface for the selection of 2 variables out of 40 is shown
for the refrigerant data introduced in section 4.5.1.1.
Even in this figure, which represents only a highly constrained 2-dimensional
lateral surface of the 40-dimensional error surface, it is visible that the
error surface is too rough for the stepwise algorithms to find an optimal solution,
rendering them unusable for high-dimensional data sets with many correlated variables.
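A minimal sketch of the forward-selection loop may make the greedy nature of the procedure concrete. It uses an ordinary least-squares fit and the root mean square error on the training data as the selection criterion; the data, stopping rule, and parameter names are illustrative assumptions, not details of the refrigerant data set or the cited implementations:

```python
import numpy as np

def forward_selection(X, y, n_max):
    """Greedy forward addition: at each step, add the variable that,
    combined with those already selected, gives the lowest root mean
    square error of an ordinary least-squares fit."""
    remaining = list(range(X.shape[1]))
    selected = []
    step_errors = []
    for _ in range(n_max):
        errors = []
        for j in remaining:
            cols = selected + [j]
            A = X[:, cols]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            errors.append(np.sqrt(np.mean((A @ coef - y) ** 2)))
        best = int(np.argmin(errors))
        step_errors.append(errors[best])
        selected.append(remaining.pop(best))
        # stop once an additional variable no longer lowers the error
        if len(step_errors) > 1 and step_errors[-1] >= step_errors[-2]:
            selected.pop()
            break
    return selected
```

Backward elimination proceeds symmetrically, starting from the full variable set and removing at each step the variable whose removal increases the error least. Because both loops commit to one variable per step, neither can reach a subset whose benefit only appears when several variables enter together, which is exactly the weakness discussed above.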

figure 4:
Root mean square error of prediction versus the indices of the 2 selected variables.