Linear & Logistic Regression in Machine Learning:
Some machine learning algorithms work on data in which each input value is paired with a corresponding output value. These algorithms are called supervised learning algorithms, and their predictions are constrained to the kinds of output values provided in the data. Two of the most commonly used supervised learning algorithms are Linear and Logistic Regression.
What Is Regression?
Regression is a statistical method that allows you to predict a dependent output variable based on the values of independent input variables.
Regression, a type of supervised learning, finds the relationship between input and output values and uses it, given new input data, to predict the output value. It does this by finding a mathematical relationship between input and output values. It can have multiple inputs but produces a single output.
You can understand regression better using the diagram below. Using the given input variables, or grocery ingredients, you can get a new output, or dish. Here, regression acts as the recipe that describes how these variables go together and the relationship between them.
Figure 1: Regression
What Is Classification?
Classification allows you to divide a given input into some pre-defined categories. The output is a discrete value, i.e., distinct, like 0/1, True/False, or a pre-defined output label class.
What Is Linear Regression?
Linear Regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It is used to predict values within a continuous range (e.g., sales, price) rather than to classify them into categories (e.g., cat, dog). There are two main types:
Simple regression
Simple linear regression uses the traditional slope-intercept form, where m and b are the variables our algorithm will try to "learn" to produce the most accurate predictions. x represents our input data and y represents our prediction.
y = mx + b
Multivariable regression
A more complex, multivariable linear equation might look like this, where w represents the coefficients, or weights, our model will try to learn.
f(x, y, z) = w1·x + w2·y + w3·z
The variables 𝑥,𝑦,𝑧x,y,z represent the attributes, or distinct pieces of information, we have about each observation. For sales predictions, these attributes might include a company’s advertising spend on radio, TV, and newspapers.
Sales = w1·Radio + w2·TV + w3·News
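The multivariable form above can be sketched as a plain Python function. The ad-spend figures and weights below are invented purely for illustration, not fitted values:

```python
def sales_multi(radio, tv, news, w1, w2, w3):
    # Weighted sum of the three advertising features:
    # Sales = w1*Radio + w2*TV + w3*News
    return w1 * radio + w2 * tv + w3 * news

# Hypothetical weights a training procedure might have produced
estimate = sales_multi(37.8, 230.1, 69.2, w1=0.2, w2=0.05, w3=0.01)
print(estimate)
```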
Simple regression
Let’s say we are given a dataset with the following columns (features): how much a company spends on Radio advertising each year and its annual Sales in terms of units sold. We are trying to develop an equation that will let us predict units sold based on how much a company spends on radio advertising. The rows (observations) represent companies.
Making predictions
Our prediction function outputs an estimate of sales given a company’s radio advertising spend and our current values for Weight and Bias.
Sales = Weight · Radio + Bias
Weight
the coefficient for the Radio independent variable. In machine learning we call coefficients weights.
Radio
the independent variable. In machine learning we call these variables features.
Bias
the intercept, where our line crosses the y-axis. In machine learning we call intercepts bias. Bias offsets all predictions that we make.
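The prediction function described above is a one-liner in Python. The weight and bias values passed in below are hypothetical, standing in for whatever values training would produce:

```python
def predict_sales(radio, weight, bias):
    # Weight scales the Radio feature; Bias offsets every prediction.
    return weight * radio + bias

# Hypothetical learned parameters, for illustration only
print(predict_sales(37.8, weight=0.1, bias=5.0))
```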
Our algorithm will try to learn the correct values for Weight and Bias. By the end of our training, our equation will approximate the line of best fit.
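One common way the algorithm can learn Weight and Bias is gradient descent on mean squared error. Here is a minimal sketch on a tiny invented dataset; the data, learning rate, and iteration count are illustrative assumptions, not values from this article:

```python
# Toy dataset: radio spend -> units sold, generated from y = 2x + 1
radio = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

weight, bias, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    n = len(radio)
    # Gradients of mean squared error with respect to weight and bias
    dw = sum(2 * (weight * x + bias - y) * x for x, y in zip(radio, sales)) / n
    db = sum(2 * (weight * x + bias - y) for x, y in zip(radio, sales)) / n
    weight -= lr * dw
    bias -= lr * db

print(round(weight, 2), round(bias, 2))  # approaches the true 2.0 and 1.0
```

Because the toy data lie exactly on a line, the learned parameters converge to the line of best fit described above.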
What Is Logistic Regression?
Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression, which outputs continuous number values, logistic regression transforms its output using the logistic sigmoid function to return a probability value, which can then be mapped to two or more discrete classes.
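The sigmoid function mentioned above squashes any real number into the interval (0, 1), which is what lets us read the output as a probability:

```python
import math

def sigmoid(z):
    # Maps any real z to (0, 1): large negative z -> near 0,
    # zero -> exactly 0.5, large positive z -> near 1.
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))   # 0.5
print(sigmoid(4))   # close to 1
```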
Comparison to linear regression
Given data on time spent studying and exam scores, linear regression and logistic regression can predict different things:
Linear Regression could help us predict the student’s test score on a scale of 0 - 100. Linear regression predictions are continuous (numbers in a range).
Logistic Regression could help us predict whether the student passed or failed. Logistic regression predictions are discrete (only specific values or categories are allowed). We can also view the probability scores underlying the model’s classifications.
Types of logistic regression
Binary (Pass/Fail)
Multi (Cats, Dogs, Sheep)
Ordinal (Low, Medium, High)
Binary logistic regression
Say we’re given data on student exam results and our goal is to predict whether a student will pass or fail based on number of hours slept and hours spent studying. We have two features (hours slept, hours studied) and two classes: passed (1) and failed (0).
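Putting the pieces together, a trained binary logistic regression model for this example would combine the two features, apply the sigmoid, and threshold the resulting probability. The weights and bias below are invented for illustration; a real model would learn them from the exam data:

```python
import math

# Hypothetical learned parameters (not fitted to real data)
w_slept, w_studied, bias = 0.3, 0.9, -4.0

def prob_pass(hours_slept, hours_studied):
    # Linear combination of the two features, squashed by the sigmoid
    z = w_slept * hours_slept + w_studied * hours_studied + bias
    return 1 / (1 + math.exp(-z))

def classify(hours_slept, hours_studied, threshold=0.5):
    # Map the probability onto the two discrete classes:
    # 1 = passed, 0 = failed
    return 1 if prob_pass(hours_slept, hours_studied) >= threshold else 0

print(classify(8, 6))  # well-rested and well-prepared student
print(classify(4, 1))  # little sleep, little study
```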
Graphically we could represent our data with a scatter plot.