Consider a general relationship between a dependent variable and $n$ independent variables that is linear in the parameters:

$$y_i = b_1 x_{i1} + b_2 x_{i2} + \cdots + b_n x_{in}$$

where

$y_i$ = the $i$'th observation of the dependent variable
$x_{ij}$ = the $i$'th observation of the $j$'th independent variable
$b_j$ = the coefficient associated with the $j$'th independent variable
Say $m$ sets of observations (measurements) of the dependent and independent variables have been made, i.e.

$$\begin{aligned}
y_1 &= b_1 x_{11} + b_2 x_{12} + \cdots + b_n x_{1n}\\
y_2 &= b_1 x_{21} + b_2 x_{22} + \cdots + b_n x_{2n}\\
&\;\;\vdots\\
y_m &= b_1 x_{m1} + b_2 x_{m2} + \cdots + b_n x_{mn}
\end{aligned}$$

This set of expressions can be rewritten in more compact form using matrix-vector notation. Letting

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix},\qquad
X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix},\qquad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

then

$$Y = Xb$$
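The stacked matrix-vector form $Y = Xb$ can be illustrated with a small numerical sketch; the matrix entries and parameter values below are invented purely for illustration:

```python
import numpy as np

# Three observations (m = 3) of two independent variables (n = 2),
# with invented parameter values b = [2, -1].
X = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])
b = np.array([2.0, -1.0])

# Y = X b stacks the m scalar equations y_i = b1*x_i1 + b2*x_i2
# into a single matrix-vector product.
Y = X @ b
print(Y)  # → [-2. -1.  0.]
```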
If the number of observations is equal to the number of unknown parameters ($m = n$), then $X$ is a square matrix. If $X$ is a square matrix, and if the inverse $X^{-1}$ exists, the inverse can be computed directly and the unknown parameters are estimated according to:

$$b = X^{-1}Y$$
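This direct solution $b = X^{-1}Y$ can be sketched numerically with NumPy; the observations below are made up for illustration:

```python
import numpy as np

# Square case (m = n): two observations, two unknown parameters.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Y = np.array([5.0, 6.0])

# b = X^{-1} Y; np.linalg.solve applies the inverse without
# forming it explicitly, which is numerically preferable.
b = np.linalg.solve(X, Y)

print(b)                      # estimated parameters
print(np.allclose(X @ b, Y))  # the fit reproduces Y exactly
```

With as many equations as unknowns and an invertible $X$, the fit passes through every observation, so the residuals are all zero.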
However, it is usually the case that there are more observations than unknown parameters ($m > n$). In this case $X$ is no longer a square matrix, and therefore $X^{-1}$ does not exist. Since there are more equations than unknowns, in general no single choice of $b$ will satisfy all of them exactly. Thus, we have to determine the 'best' estimate, and one way is to find a $\hat{b}$ such that the sum of the squared differences between the observed dependent variable and its estimates is a minimum, namely,

$$\min_{\hat{b}}\; S = \sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

which reads: find a set of the unknowns, $\hat{b}$, such that the sum of squared differences between the estimates obtained using $\hat{b}$, that is:

$$\hat{Y} = X\hat{b}$$

and the corresponding observed values, $Y$, is a minimum. This is therefore an optimisation problem where the objective is to find a $\hat{b}$ that will set the sum of squared errors between observed and estimated values to a minimum. Solution of this problem yields the least squares estimates of $b$ as:

$$\hat{b} = \left(X^TX\right)^{-1}X^TY$$
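A minimal numerical sketch of this estimator, using invented observations and cross-checking the normal-equations formula against NumPy's built-in least squares routine:

```python
import numpy as np

# Overdetermined case (m > n): five observations, two parameters
# (an intercept column of ones and one regressor). Data invented.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares estimate via the normal equations:
# b_hat = (X^T X)^{-1} X^T Y
b_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check against np.linalg.lstsq, which solves the same
# minimisation by a more numerically stable factorisation.
b_ref, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(b_hat)                       # estimated intercept and slope
print(np.allclose(b_hat, b_ref))   # → True
```

Forming $(X^TX)^{-1}$ explicitly mirrors the formula in the text; in practice a QR- or SVD-based solver such as `np.linalg.lstsq` is preferred because it avoids squaring the condition number of $X$.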
This can be verified easily as the following will show. Writing the sum of squared errors in matrix form,

$$S = \left(Y - X\hat{b}\right)^T\left(Y - X\hat{b}\right) = Y^TY - 2\hat{b}^TX^TY + \hat{b}^TX^TX\hat{b}$$

and setting the derivative with respect to $\hat{b}$ to zero,

$$\frac{\partial S}{\partial \hat{b}} = -2X^TY + 2X^TX\hat{b} = 0$$

gives the normal equations $X^TX\hat{b} = X^TY$, and hence $\hat{b} = \left(X^TX\right)^{-1}X^TY$, provided $X^TX$ is invertible.
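The minimum condition can also be checked numerically: at the least squares solution the gradient vanishes, which is equivalent to the residual $Y - X\hat{b}$ being orthogonal to the columns of $X$, and any perturbation of $\hat{b}$ increases the sum of squared errors. A sketch with randomly generated (invented) data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # 20 observations, 3 parameters
Y = rng.normal(size=20)

# Least squares estimate via the normal equations.
b_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
residual = Y - X @ b_hat

# Gradient condition: X^T (Y - X b_hat) = 0 at the minimum.
print(np.allclose(X.T @ residual, 0.0))

# Perturbing b_hat can only increase the sum of squared errors,
# since S is convex in b (strictly, for full-rank X).
S = lambda b: np.sum((Y - X @ b) ** 2)
delta = rng.normal(size=3) * 0.1
print(S(b_hat + delta) > S(b_hat))  # → True
```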