Mastering Logistic Regression. From concept to implementation in Python | by Dr. Roi Yehoshua | May, 2023



The following plot shows the log loss when y = 1:

The log loss equals 0 only in the case of a perfect prediction (p = 1 and y = 1, or p = 0 and y = 0), and approaches infinity as the prediction gets worse (i.e., when y = 1 and p → 0 or y = 0 and p → 1).
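Writing L(y, p) for the loss of a single sample with true label y and predicted probability p, the log loss is

L(y, p) = −[y·log(p) + (1 − y)·log(1 − p)]

which reduces to −log(p) when y = 1 and to −log(1 − p) when y = 0.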

The cost function calculates the average loss over the whole data set:
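With pᵢ = σ(wᵗxᵢ) denoting the model’s predicted probability for the i-th training sample, this cost is

J(w) = −(1/n) Σᵢ [yᵢ·log(pᵢ) + (1 − yᵢ)·log(1 − pᵢ)]

where the sum runs over all n training samples.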

The cost function can be written in vectorized form as follows:
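J(w) = −(1/n) [yᵗ·log(p) + (1 − y)ᵗ·log(1 − p)]    (with the log applied element-wise)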

where y = (y₁, …, yₙ) is a vector that contains all the labels of the training samples, and p = (p₁, …, pₙ) is a vector that contains all the predicted probabilities of the model for the training samples.

This cost function is convex, i.e., it has a single global minimum. However, there is no closed-form solution for finding the optimal w* (due to the non-linearities introduced by the log function). Therefore, we need to use iterative optimization methods such as gradient descent in order to find the minimum.

Gradient descent is an iterative approach for finding a minimum of a function, where we take small steps in the opposite direction of the gradient in order to get closer to the minimum:

Gradient descent

In order to use gradient descent to find the minimum of the cost, we need to compute the partial derivatives of J(w) with respect to each one of the weights.

The partial derivative of J(w) with respect to any of the weights wⱼ is:
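∂J(w)/∂wⱼ = (1/n) Σᵢ (pᵢ − yᵢ)·xᵢⱼ

where xᵢⱼ is the j-th component of the i-th training sample (with xᵢ₀ = 1 for the bias).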

Proof:
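A sketch of the standard derivation: for a single sample (x, y) with p = σ(wᵗx), the identity σ′(z) = σ(z)(1 − σ(z)) gives ∂p/∂wⱼ = p(1 − p)·xⱼ. Applying the chain rule to the loss of that sample:

∂/∂wⱼ [−y·log(p) − (1 − y)·log(1 − p)]
  = −(y/p)·p(1 − p)·xⱼ + ((1 − y)/(1 − p))·p(1 − p)·xⱼ
  = −y(1 − p)·xⱼ + (1 − y)·p·xⱼ
  = (p − y)·xⱼ

Averaging this expression over the n training samples gives the partial derivative above.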

Thus, the gradient vector can be written in vectorized form as follows:
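∇J(w) = (1/n) Xᵗ(p − y)

where X is the matrix whose rows are the training samples (including the leading column of ones for the bias).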

And the gradient descent update rule is:
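w := w − α·∇J(w)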

where α is a learning rate that controls the step size (0 < α < 1).

Note that whenever you use gradient descent, you must make sure that your data set is normalized (otherwise gradient descent may take steps of different sizes in different directions, which will make it unstable).

We will now implement the logistic regression model in Python from scratch, including its cost function and gradient computation, optimizing the model using gradient descent, evaluating the model, and plotting the final decision boundary.

For the demonstration we will use the Iris data set (BSD license). The original data set contains 150 samples of Iris flowers that belong to one of three species (Setosa, Versicolor and Virginica). We will turn it into a binary classification problem by using only the first two types of flowers (Setosa and Versicolor). In addition, we will use only the first two features of each flower (sepal length and sepal width).

Loading the Data Set

Let’s first import the required libraries and fix the random seed in order to get reproducible results:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(0)

Next, we load the data set:

from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, :2]  # Take only the first two features
y = iris.target

# Take only the setosa and versicolor flowers
X = X[(y == 0) | (y == 1)]
y = y[(y == 0) | (y == 1)]

Let’s plot the data:

def plot_data(X, y):
    sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=iris.target_names[y], style=iris.target_names[y],
                    palette=['r', 'b'], markers=('s', 'o'), edgecolor='k')
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.legend()

plot_data(X, y)
The Iris data set

As can be seen, the data set is linearly separable, therefore logistic regression should be able to find the boundary between the two classes.

Next, we need to add a column of ones to the features matrix X in order to represent the bias (w₀):

# Add a column for the bias
n = X.shape[0]
X_with_bias = np.hstack((np.ones((n, 1)), X))

We now split the data set into training and test sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_with_bias, y, random_state=0)

Model Implementation

We are now ready to implement the logistic regression model. We start by defining a helper function to compute the sigmoid function:

def sigmoid(z):
    """ Compute the sigmoid of z (z can be a scalar or a vector). """
    z = np.array(z)
    return 1 / (1 + np.exp(-z))

Next, we implement the cost function that returns the cost of a logistic regression model with parameters w on a given data set (X, y), and also its gradient with respect to w.

def cost_function(X, y, w):
    """ J, grad = cost_function(X, y, w) computes the cost of a logistic regression model
        with parameters w and the gradient of the cost w.r.t. the parameters. """
    # Compute the cost (uses the global n defined earlier)
    p = sigmoid(X @ w)
    J = -(1/n) * (y @ np.log(p) + (1-y) @ np.log(1-p))

    # Compute the gradient
    grad = (1/n) * X.T @ (p - y)
    return J, grad

Note that we are using the vectorized forms of the cost and the gradient functions that were shown previously.

To sanity check this function, let’s compute the cost and gradient of the model on some random weight vector:

w = np.random.rand(X_train.shape[1])
cost, grad = cost_function(X_train, y_train, w)

print('w:', w)
print('Cost at w:', cost)
print('Gradient at w:', grad)

The output we get is:

w: [0.5488135  0.71518937 0.60276338]
Cost at w: 2.314505839067951
Gradient at w: [0.36855061 1.86634895 1.27264487]

Gradient Descent Implementation

We now implement gradient descent in order to find the optimal w* that minimizes the cost function of the model on a given training set. The algorithm will run at most max_iter passes over the training set (defaults to 5000), unless the cost has not decreased by at least tol (defaults to 0.0001) since the previous iteration, in which case the training stops immediately.

def optimize_model(X, y, alpha=0.01, max_iter=5000, tol=0.0001):
    """ Optimize the model using gradient descent.
        X, y: The training set
        alpha: The learning rate
        max_iter: The maximum number of passes over the training set (epochs)
        tol: The stopping criterion. Training will stop when (new_cost > cost - tol)
    """
    w = np.random.rand(X.shape[1])
    cost, grad = cost_function(X, y, w)

    for i in range(max_iter + 1):
        w = w - alpha * grad
        new_cost, grad = cost_function(X, y, w)
        if new_cost > cost - tol:
            print(f'Converged after {i} iterations')
            return w, new_cost
        cost = new_cost

    print('Maximum number of iterations reached')
    return w, cost

Normally at this point you would have to normalize your data set, since gradient descent does not work well with features that have different scales. In our particular data set, normalization is not necessary since the ranges of the two features are similar.
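For reference, if the features did have very different scales, one common option (a minimal sketch, not part of the code used in this post) is to standardize them with scikit-learn’s StandardScaler, fitting it on the training features only and then applying it to both sets:

from sklearn.preprocessing import StandardScaler

# Hypothetical standardization step (not used here): scale only the feature
# columns, leaving the bias column of ones untouched
scaler = StandardScaler()
X_train[:, 1:] = scaler.fit_transform(X_train[:, 1:])
X_test[:, 1:] = scaler.transform(X_test[:, 1:])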

Let’s now call this function to optimize our model:

opt_w, cost = optimize_model(X_train, y_train)

print('opt_w:', opt_w)
print('Cost at opt_w:', cost)

The algorithm converges after 1,413 iterations and the optimal w* we get is:

Converged after 1413 iterations
opt_w: [ 0.28014029 0.80541854 -1.48367938]
Cost at opt_w: 0.28389717767222555

There are other optimizers you can use that are often faster than gradient descent, such as conjugate gradient (CG) and truncated Newton (TNC). See scipy.optimize.minimize for more details on how to use these optimizers.
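For example, a minimal sketch (not part of the original code) of optimizing the same cost with truncated Newton via scipy; it reuses cost_function, which already returns both the cost and its gradient:

from scipy.optimize import minimize

# jac=True tells scipy that the objective returns a (cost, gradient) pair
res = minimize(lambda w: cost_function(X_train, y_train, w),
               x0=np.random.rand(X_train.shape[1]),
               method='TNC', jac=True)

print('opt_w:', res.x)
print('Cost at opt_w:', res.fun)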

Using the Model for Predictions

Now that we have found the optimal parameters of the model, we can use it for predictions.

First, let’s write a function that gets a matrix of new samples X and returns their probabilities of belonging to the positive class:

def predict_prob(X, w):
    """ Return the probability that samples in X belong to the positive class
        X: the feature matrix (every row in X represents one sample)
        w: the learned logistic regression parameters
    """
    p = sigmoid(X @ w)
    return p

The function computes the predictions of the model by simply taking the sigmoid of Xw (which computes σ(wᵗx) for every row x in the matrix).

For example, let’s find out the probability that a sample located at (6, 2) belongs to the versicolor class:

predict_prob([[1, 6, 2]], opt_w)
array([0.89522808])

This sample has an 89.52% chance of being a versicolor flower. This makes sense, since this sample is located well within the area of the versicolor flowers, far from the border between the classes.

On the other hand, the probability that a sample located at (5.5, 3) belongs to the versicolor class is:

predict_prob([[1, 5.5, 3]], opt_w)
array([0.56436688])

This time the probability is much lower (only 56.44%), since this sample is close to the border between the classes.
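Finally, here is a minimal sketch (using a hypothetical predict helper and a 0.5 threshold, not part of the code shown above) of how these probabilities could be turned into class labels and evaluated on the test set:

def predict(X, w, threshold=0.5):
    """ Predict the class (0 or 1) of each sample in X by thresholding its probability. """
    return (predict_prob(X, w) >= threshold).astype(int)

y_pred = predict(X_test, opt_w)
accuracy = np.mean(y_pred == y_test)
print('Test set accuracy:', accuracy)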
