Linear Discriminant Functions
Linear discriminant functions are the basis for many pattern recognition techniques. A discriminant function maps input features onto a classification space.
Picture a dividing boundary that separates two clusters (groups) of points in the feature space.
The mathematical definition of such a decision boundary is a discriminant function. A linear discriminant (LD) is a linear combination of the components of x = (x1, x2, …, xd), given by a weight vector w = (w1, w2, …, wd) and a bias parameter w0, also known as the threshold weight.
The LD can then be written as:
g(x) = w^T x + w0, or equivalently as: g(x) = (w, x) + w0, where (w, x) denotes the inner product of w and x.
In the simplest classification scenario there are two classes, and the decision rule of an LD is:
– If g(x) > 0, then x is classified as w1
– If g(x) < 0, then x is classified as w2
g(x) = 0 defines the decision surface that separates the two classes. When g(x) is linear, this decision surface is a hyperplane.
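As a concrete illustration, here is a minimal Python sketch that evaluates g(x) = w^T x + w0 and applies this decision rule; the weights w, bias w0, and sample x are invented for illustration.

    import numpy as np

    def g(x, w, w0):
        # Linear discriminant: g(x) = w^T x + w0
        return np.dot(w, x) + w0

    def classify(x, w, w0):
        # Decision rule: w1 if g(x) > 0, w2 otherwise
        return "w1" if g(x, w, w0) > 0 else "w2"

    # Invented 2-D example values
    w = np.array([1.0, -2.0])   # weight vector
    w0 = 0.5                    # bias / threshold weight
    x = np.array([2.0, 0.5])    # feature vector to classify

    print(g(x, w, w0))          # 1.5, so x lies on the w1 side
    print(classify(x, w, w0))   # w1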
How to Compute g(x)?
There are many different ways to compute the w and w0 that define g(x). Some of them are listed below (a sketch of one, the perceptron rule, follows the list):
Bayes Theorem
Fisher’s linear discriminants
Perceptrons
Logistic Regression
Support Vector Machines
Minimum squared error (MSE) solution
Least-mean squares (LMS) rule
Ho-Kashyap procedure
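To make one of these concrete, below is a minimal sketch of the perceptron rule, assuming linearly separable data with labels encoded as +1/-1; the toy data are invented for illustration. The bias w0 is absorbed into the weight vector by appending a constant 1 to each sample.

    import numpy as np

    def perceptron(X, y, lr=1.0, epochs=100):
        # Augment each sample with a constant 1 so w0 becomes the last weight
        Xa = np.hstack([X, np.ones((X.shape[0], 1))])
        w = np.zeros(Xa.shape[1])
        for _ in range(epochs):
            errors = 0
            for xi, yi in zip(Xa, y):
                # Update only on misclassified samples: y * g(x) <= 0
                if yi * np.dot(w, xi) <= 0:
                    w += lr * yi * xi
                    errors += 1
            if errors == 0:       # converged: every sample classified correctly
                break
        return w[:-1], w[-1]      # (w, w0)

    # Toy linearly separable data (invented for illustration)
    X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1, 1, -1, -1])
    w, w0 = perceptron(X, y)
    print(np.sign(X @ w + w0))   # matches y

For linearly separable data the perceptron rule is guaranteed to converge; otherwise the epoch limit stops it.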
Linear Discriminant Function using Bayes' Theorem
For an n-class problem, define one discriminant per class:
C = (g1(x), g2(x), …, gn(x)), where gi(x) = p(wi|x)
Assign x to the class wi with i = argmax_j gj(x), i.e., the class with the largest posterior.
Discriminants DO NOT have to relate to probabilities.
For a two-class problem:
Classes: w1, w2.
g1(x) = p(w1|x); g2(x) = p(w2|x); g(x) = g1(x) - g2(x)
Then, encoding the classes as +1 and -1, the decision is sign(g(x)): classify x in w1 if g(x) > 0, and in w2 otherwise.
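Here is a minimal sketch of this two-class rule, assuming the class-conditional densities p(x|wi) and priors P(wi) are known; the 1-D Gaussian densities below are invented for illustration. The posteriors gi(x) = p(wi|x) come from Bayes' theorem, and x is classified by the sign of g(x) = g1(x) - g2(x).

    import numpy as np
    from scipy.stats import norm

    # Invented 1-D class-conditional densities and priors (for illustration)
    p_x_w1 = norm(loc=2.0, scale=1.0).pdf   # p(x|w1)
    p_x_w2 = norm(loc=-1.0, scale=1.0).pdf  # p(x|w2)
    P_w1, P_w2 = 0.5, 0.5                   # priors P(w1), P(w2)

    def g(x):
        # Posteriors via Bayes' theorem: p(wi|x) = p(x|wi) P(wi) / p(x)
        evidence = p_x_w1(x) * P_w1 + p_x_w2(x) * P_w2
        g1 = p_x_w1(x) * P_w1 / evidence    # p(w1|x)
        g2 = p_x_w2(x) * P_w2 / evidence    # p(w2|x)
        return g1 - g2

    for x in [3.0, 1.0, -2.0]:
        label = "w1" if g(x) > 0 else "w2"  # sign(g(x)) decision
        print(f"x = {x:+.1f}  ->  {label}")

Because the two Gaussians here share the same variance and prior, the decision boundary is the single point x = 0.5, midway between the means, so this Bayes discriminant is in fact linear in x.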