Machine Learning Classification

Himanshu Lohiya
3 min read · Jul 7, 2018

Classification is the process of predicting the class of given data points. Classes are sometimes called targets, labels, or categories.

Classification is a supervised learning approach: the model learns from data points whose classes are already known.

Types of Classification

  1. Binary classifiers (e.g., classifying mail as spam or non-spam)
  2. Multi-class classifiers (each data point belongs to exactly one of several classes)
  3. Multi-label classifiers (each data point may carry several labels at once)
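The three types above differ mainly in how the targets are encoded. A minimal sketch using scikit-learn (the library choice and the toy labels are mine, purely for illustration):

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Binary: each sample gets one of two classes (e.g. spam vs non-spam).
y_binary = np.array([0, 1, 1, 0])

# Multi-class: each sample gets exactly one of several classes.
y_multiclass = np.array([0, 2, 1, 2])

# Multi-label: each sample may carry several labels at once,
# commonly encoded as a binary indicator matrix.
y_multilabel = MultiLabelBinarizer().fit_transform(
    [{"news"}, {"news", "sports"}, set(), {"sports"}])
print(y_multilabel)
```

Each row of the multi-label matrix has one column per possible label, with a 1 wherever that label applies to the sample.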

Some examples are speech recognition, handwriting recognition, biometric identification, and document classification.

Types of classification algorithms in Machine Learning:

  1. Linear Classifiers: Naive Bayes Classifier, Logistic Regression
  2. Support Vector Machines
  3. Decision Trees
  4. Boosted Trees
  5. Random Forest
  6. Neural Networks
  7. Nearest Neighbour

Naive Bayes Classifier (Generative Learning Model) :

The Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, it can perform surprisingly well, sometimes rivalling far more sophisticated classification methods.

A Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Even if these features actually depend on each other or on other features, the model treats each one as contributing independently to the probability.
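As a concrete sketch, here is a Gaussian Naive Bayes classifier from scikit-learn fitted on the built-in Iris dataset (the dataset, the split, and the library are illustrative choices, not part of the article):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load a small labelled dataset and hold out 30% for evaluation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit the Naive Bayes model; training is a single pass over the data.
clf = GaussianNB()
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```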

Logistic Regression (Predictive Learning Model) :

Its goal is to find the best-fitting model to describe the relationship between a dichotomous characteristic of interest (the dependent, or outcome, variable) and a set of independent (predictor) variables.

It is a statistical method for analysing a data set in which one or more independent variables determine an outcome.
The outcome is measured with a dichotomous variable, i.e. one with only two possible values.
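A minimal sketch of a dichotomous (two-class) outcome, using scikit-learn's `LogisticRegression` on the built-in breast cancer dataset (dataset and parameters are my illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A binary-outcome dataset: each sample is malignant (0) or benign (1).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_iter is raised because the unscaled features need more solver iterations.
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

# Logistic regression outputs a probability for each of the two classes.
proba = clf.predict_proba(X_test[:1])
print(proba)
```

The two probabilities in each row sum to 1; the predicted class is simply the one with the higher probability.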

Decision Trees:

A decision tree builds classification or regression models in the form of a tree structure.
It uses an if-then rule set that is mutually exclusive and exhaustive for classification, and the tree is constructed in a top-down, recursive, divide-and-conquer manner.
Decision trees can handle both categorical and numerical data.
However, a decision tree can easily overfit, generating too many branches that reflect anomalies caused by noise or outliers.
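One common way to curb the overfitting mentioned above is to cap the tree's depth. A sketch with scikit-learn's `DecisionTreeClassifier` (the dataset and the `max_depth=3` value are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree keeps splitting until the leaves are pure,
# which is where the overfitting comes from.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Limiting max_depth forces a simpler, more general if-then rule set.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("unconstrained depth:", deep.get_depth())
print("capped depth:", shallow.get_depth())
```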

Random Forest:

It's an ensemble learning method for classification, regression, and other tasks.
At training time it constructs a multitude of decision trees and outputs the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
This corrects for decision trees' habit of overfitting to their training set.
RF is not as performant (memory- and CPU-wise) as some other classifiers, but I have always found it very effective on small datasets (80k–120k rows).
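The "multitude of trees, then vote" idea can be sketched with scikit-learn's `RandomForestClassifier`; the synthetic dataset and `n_estimators=100` are my illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A synthetic binary classification problem, purely for demonstration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 decision trees, each trained on a bootstrap sample with
# randomized feature choices; the forest predicts by majority vote.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Because each tree sees a different bootstrap sample and random feature subsets, their individual overfitting errors tend to average out in the vote.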

Neural Network:

A neural network consists of neurons arranged in layers, which convert an input vector into some output. Each unit takes an input, applies a function to it, and then passes the output on to the next layer.
Generally the networks are feed-forward: a unit feeds its output to all the units in the next layer, and there is no feedback to the previous layer.
Weightings are applied to the signals passing from one unit to another, and it is these weightings that are tuned in the training phase to adapt the neural network to the particular problem at hand.
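A feed-forward network of this kind can be sketched with scikit-learn's `MLPClassifier`; the digits dataset, the single 32-unit hidden layer, and the iteration budget are illustrative assumptions on my part:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 handwritten digit images flattened into 64-dimensional input vectors.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 32 units; training tunes the connection weights
# between layers via backpropagation.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```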

Nearest Neighbor:

It takes a bunch of labelled points and uses them to label other points.

To label a new point, it looks at the labelled points closest to that new point and has those neighbors vote: whichever label most of the neighbors have becomes the label of the new point. The "k" in k-nearest neighbors is the number of neighbors it checks.
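The voting step above can be sketched with scikit-learn's `KNeighborsClassifier`; the tiny 2-D dataset is made up purely for illustration:

```python
from sklearn.neighbors import KNeighborsClassifier

# Six labelled points forming two clusters (toy data, not from the article).
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# k = 3: each new point is labelled by a vote of its 3 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Points near each cluster inherit that cluster's label.
predictions = knn.predict([[0.5, 0.5], [5.5, 5.5]])
print(predictions)
```

Note that "training" here is just storing the points; all the work happens at prediction time, when the distances to the stored points are computed.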
