Data Professions Salary Analysis and Prediction | by Kevin Kibe | Jun, 2023

By Admin, June 12, 2023, in Machine Learning


Kevin Kibe

Image courtesy: The Human Capital Hub

Problem Statement

This project aims to help recruiters determine appropriate salary ranges to offer candidates, and to help candidates who want clarity about the salary ranges they can expect.

The problem addressed by this project is estimating salary ranges for professionals in the data industry, specifically for the roles of Data Analyst, Data Engineer, Data Scientist, and Machine Learning Engineer. The notebook is linked at the bottom of the article.

Dataset

The dataset is a public salary dataset collected anonymously from professionals worldwide in the ML and Data Science space; a description of the data is available at the link. It contains salaries for many different roles, but I filtered it down to four: Data Analyst, Data Engineer, Data Scientist, and Machine Learning Engineer.
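The filtering code itself lives in the notebook. Below is a minimal pandas sketch of that step, assuming the column is named `job_title` and that titles are lower-cased to match the one-hot column names used later (such as `job_title_data scientist`). The `filter_roles` helper and the toy rows are hypothetical, not the author's code.

```python
import pandas as pd

# The four roles kept for the analysis, lower-cased (assumption based on the
# job_title_* column names used later in the article)
ROLES = ["data analyst", "data engineer", "data scientist", "machine learning engineer"]


def filter_roles(df: pd.DataFrame, roles=ROLES) -> pd.DataFrame:
    """Keep only rows whose job_title matches one of `roles` (case-insensitive)."""
    titles = df["job_title"].str.strip().str.lower()
    mask = titles.isin(roles)
    out = df[mask].copy()
    out["job_title"] = titles[mask]  # store the normalized, lower-cased title
    return out


df = pd.DataFrame({
    "job_title": ["Data Scientist", "Research Scientist", "Data Engineer", "ML Engineer"],
    "salary_in_usd": [120_000, 150_000, 110_000, 130_000],
})
print(filter_roles(df))
```

Exact matching drops title variants such as "ML Engineer" instead of merging them; whether to map such variants onto the four roles is a separate design decision.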

EDA

A short description of the dataset:

Image courtesy: Author

This is the distribution of experience levels in the data (SE: Senior, MI: Mid-Level, EN: Entry-Level, EX: Executive-Level).

Image courtesy: Author
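The counts behind a chart like this can be reproduced by mapping the codes to readable labels and calling `value_counts`. This is a sketch on a few made-up rows rather than the author's plotting code; `experience_distribution` is a hypothetical helper name.

```python
import pandas as pd

# Codes used in the experience_level column and their meanings
EXPERIENCE_LABELS = {"EN": "Entry-Level", "MI": "Mid-Level", "SE": "Senior", "EX": "Executive-Level"}


def experience_distribution(df: pd.DataFrame) -> pd.Series:
    """Count rows per experience level, using readable labels."""
    return df["experience_level"].map(EXPERIENCE_LABELS).value_counts()


sample = pd.DataFrame({"experience_level": ["SE", "SE", "MI", "EN", "SE", "EX", "MI"]})
print(experience_distribution(sample))
```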

A boxplot of the salary distribution.

Data Preprocessing

I started by converting the employee residence and company location from country to continent using the pycountry-convert library.

!pip install -q pycountry-convert

import pycountry_convert as pc

def get_continent(col):
    """Convert an ISO alpha-2 country code or a country name to its continent name."""
    try:
        if len(col) == 2:
            country_code = col
        else:
            country_code = pc.country_name_to_country_alpha2(col.strip('\'"'))
        continent_code = pc.country_alpha2_to_continent_code(country_code)
        return pc.convert_continent_code_to_continent_name(continent_code)
    except Exception:
        # Unknown codes or names and missing values map to None
        return None

df['company_location'] = df['company_location'].apply(get_continent)
df['employee_residence'] = df['employee_residence'].apply(get_continent)

Dropping unnecessary columns (salary and salary_currency carry the same information as salary_in_usd).

columns_to_drop = ['salary', 'salary_currency','remote_ratio']
new_df=df.drop(columns=columns_to_drop)

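Removing outliers from the 'salary_in_usd' column with the IQR rule (values more than 1.5 interquartile ranges below the first quartile or above the third quartile are dropped) to get a more representative distribution of the data.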

def remove_outliers(df, column_name, threshold=1.5):
    # Quartiles and interquartile range (IQR) of the column
    Q1 = df[column_name].quantile(0.25)
    Q3 = df[column_name].quantile(0.75)
    IQR = Q3 - Q1

    # Values more than threshold * IQR outside the quartiles count as outliers
    lower_bound = Q1 - threshold * IQR
    upper_bound = Q3 + threshold * IQR

    filtered_df = df[(df[column_name] >= lower_bound) & (df[column_name] <= upper_bound)]
    return filtered_df

new_df = remove_outliers(new_df, 'salary_in_usd')
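The rule is easier to see on concrete numbers. This sketch applies the same fences to six made-up salaries. With linear interpolation (the pandas default for `quantile`), Q1 = 65,000 and Q3 = 115,000, so IQR = 50,000 and the fences are -10,000 and 190,000. Only the 600,000 value falls outside. These are the same 1.5 * IQR limits that the default boxplot whiskers in matplotlib and seaborn use, so the points removed here are the ones a boxplot draws as outliers. `iqr_bounds` is a hypothetical helper.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, threshold: float = 1.5):
    """Return the (lower, upper) fences used by the IQR outlier rule."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return q1 - threshold * iqr, q3 + threshold * iqr


salaries = pd.Series([40_000, 60_000, 80_000, 100_000, 120_000, 600_000])
lower, upper = iqr_bounds(salaries)
kept = salaries[(salaries >= lower) & (salaries <= upper)]
print(lower, upper, kept.tolist())
```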

Next, I determine the salary ranges for the predictions by splitting the span between the minimum and maximum salary in the dataset into equal-width sub-ranges.

import numpy as np

max_salary = new_df['salary_in_usd'].max()
min_salary = new_df['salary_in_usd'].min()

# Split [min_salary, max_salary] into equal-width sub-ranges
num_subranges = 15

subranges = np.linspace(min_salary, max_salary, num=num_subranges + 1, endpoint=True)
range_labels = []
for i in range(len(subranges) - 1):
    subrange_min = int(subranges[i])
    subrange_max = int(subranges[i + 1])
    range_label = f"{subrange_min:,} - {subrange_max:,}"
    range_labels.append(range_label)

range_labels

Image courtesy: Author
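The same bucketing can be done with `pd.cut`, which assigns every salary to one of the equal-width ranges in a single call. This is a sketch that assumes the bin edges come from `np.linspace` as above; `salary_bins` is a hypothetical helper. The example uses 8 ranges because that matches the hard-coded ranges in `make_prediction` further down (15,000 to 286,000 in steps of 33,875).

```python
import numpy as np
import pandas as pd


def salary_bins(salaries: pd.Series, num_subranges: int = 15) -> pd.Series:
    """Assign each salary to one of `num_subranges` equal-width ranges."""
    edges = np.linspace(salaries.min(), salaries.max(), num=num_subranges + 1)
    labels = [f"{int(lo):,} - {int(hi):,}" for lo, hi in zip(edges[:-1], edges[1:])]
    # include_lowest=True keeps the minimum salary inside the first range
    return pd.cut(salaries, bins=edges, labels=labels, include_lowest=True)


salaries = pd.Series([15_000, 50_000, 100_000, 286_000])
print(salary_bins(salaries, num_subranges=8).tolist())
```

`pd.cut` intervals are closed on the right (`(a, b]`), while the loop in `make_prediction` uses `range_min <= prediction < range_max`. As a result, a value that sits exactly on an edge can land in neighbouring buckets under the two approaches.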

Performing one-hot encoding on the categorical columns in our dataset.

import pandas as pd

categorical_cols = ['experience_level', 'employment_type', 'job_title', 'company_size', 'employee_residence', 'company_location']
encoded_df = pd.get_dummies(new_df[categorical_cols], prefix=categorical_cols, prefix_sep='_')
df_encoded = pd.concat([new_df.drop(categorical_cols, axis=1), encoded_df], axis=1)

# One-hot encode the work year as well (year_2020 ... year_2023)
df_encoded_ = pd.get_dummies(df_encoded['work_year'], prefix='year')
df_encoded_f = pd.concat([df_encoded, df_encoded_], axis=1)
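The model code below uses `X_train`, `X_test`, `y_train` and `y_test`, but the article does not show how they are built. Here is a plausible sketch with scikit-learn's `train_test_split`, under stated assumptions: the target is `salary_in_usd`, the raw `work_year` column is dropped because the `year_*` dummies already encode it, and the 80/20 split and `random_state=42` are illustrative choices rather than the author's settings. `split_features` is a hypothetical helper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features(df_encoded_f: pd.DataFrame, test_size: float = 0.2, random_state: int = 42):
    """Separate the one-hot features from the salary target and hold out a test set."""
    # Assumption: work_year is redundant once the year_* dummies exist
    X = df_encoded_f.drop(columns=["salary_in_usd", "work_year"])
    y = df_encoded_f["salary_in_usd"]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# Toy stand-in for df_encoded_f with a handful of columns
toy = pd.DataFrame({
    "work_year": [2022, 2023, 2023, 2022, 2023],
    "salary_in_usd": [90_000, 120_000, 150_000, 80_000, 110_000],
    "experience_level_SE": [0, 1, 1, 0, 1],
    "year_2022": [1, 0, 0, 1, 0],
    "year_2023": [0, 1, 1, 0, 1],
})
X_train, X_test, y_train, y_test = split_features(toy)
print(X_train.shape, X_test.shape)
```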

Model Fitting

I tried several models, which you can see in the GitHub repo linked below, but the best-performing one was ridge regression.

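from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Candidate regularization strengths and solvers for the grid search
param_grid = {'alpha': [0.1, 1.0, 10.0],
              'solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']}

rig = Ridge()
# 10-fold cross-validation, keeping the combination with the best R^2
grid_search = GridSearchCV(rig, param_grid, scoring='r2', cv=10)

grid_search.fit(X_train, y_train)
best_ridge = grid_search.best_estimator_

# Evaluate the selected model on the held-out test set
y_pred = best_ridge.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("R-squared (R2) Score:", r2)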
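Ridge regression minimizes the squared error plus a penalty alpha * ||w||^2, and `alpha` is the parameter the grid search tunes. The penalty matters for this data in particular. Each group of one-hot columns sums to 1, which makes the columns collinear with the intercept, so ordinary least squares has no unique solution, whereas ridge does. The sketch below solves the centered normal equations (X^T X + alpha I) w = X^T y directly and checks the result against scikit-learn's `Ridge` on synthetic binary data. It illustrates the technique and is not part of the author's pipeline; `ridge_closed_form` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_closed_form(X: np.ndarray, y: np.ndarray, alpha: float):
    """Minimize ||y - Xw - b||^2 + alpha * ||w||^2, leaving the intercept b unpenalized."""
    # Centering X and y takes the intercept out of the penalized problem
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(50, 5)).astype(float)  # toy 0/1 inputs, like one-hot columns
y = X @ np.array([10.0, -5.0, 3.0, 0.0, 8.0]) + rng.normal(scale=0.5, size=50)

w, b = ridge_closed_form(X, y, alpha=1.0)
sk = Ridge(alpha=1.0).fit(X, y)
print(np.allclose(w, sk.coef_), np.isclose(b, sk.intercept_))
```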

Feature Importances

Plotting the features that influenced the model.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="whitegrid")

# Absolute ridge coefficients of the one-hot columns
feature_importances = np.abs(best_ridge.coef_)
print(feature_importances.shape)
feature_names = X_train.columns

# Map each one-hot column (e.g. "job_title_data scientist") back to the original
# feature it came from ("job_title") so coefficients can be summed per feature
original_feature_names = [name.rsplit('_', 1)[0] for name in feature_names]

importance_df = pd.DataFrame({'Feature': original_feature_names, 'Importance': feature_importances})
importance_df = importance_df.groupby('Feature', as_index=False).sum()
importance_df = importance_df.sort_values('Importance', ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(data=importance_df, x='Importance', y='Feature', palette='viridis')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Feature Importances - Ridge Regression')
plt.show()

Image courtesy: Author

Making Predictions

Making a prediction using the ridge regression model.

import joblib
import numpy as np

model = best_ridge  # or a saved model loaded with joblib.load(...)

def make_prediction(feature_values):
    # One-hot column names in the same order as the columns of X_train
    # (pd.get_dummies sorts the categories of each column alphabetically)
    feature_names = ['experience_level_EN', 'experience_level_EX', 'experience_level_MI',
                     'experience_level_SE', 'employment_type_CT', 'employment_type_FL',
                     'employment_type_FT', 'employment_type_PT', 'job_title_data analyst',
                     'job_title_data engineer', 'job_title_data scientist',
                     'job_title_machine learning engineer', 'company_size_L',
                     'company_size_M', 'company_size_S', 'employee_residence_Africa',
                     'employee_residence_Asia', 'employee_residence_Europe',
                     'employee_residence_North America', 'employee_residence_Oceania',
                     'employee_residence_South America', 'company_location_Africa',
                     'company_location_Asia', 'company_location_Europe',
                     'company_location_North America', 'company_location_Oceania',
                     'company_location_South America', 'year_2020', 'year_2021',
                     'year_2022', 'year_2023']

    # Set 1 for every "<feature>_<value>" column present in the input, 0 elsewhere,
    # so the values always follow the order of feature_names
    active = {f"{name}_{value}" for name, value in feature_values.items()}
    input_data = np.array([[1.0 if col in active else 0.0 for col in feature_names]])
    prediction = model.predict(input_data)[0]

    # Eight equal-width salary ranges of 33,875 USD between 15,000 and 286,000 USD
    ranges = [(15000, 48875),
              (48875, 82750),
              (82750, 116625),
              (116625, 150500),
              (150500, 184375),
              (184375, 218250),
              (218250, 252125),
              (252125, 286000)]
    prediction_range = None
    for range_min, range_max in ranges:
        if range_min <= prediction < range_max:
            prediction_range = f"{range_min:,} - {range_max:,}"
            break

    return prediction_range

An example of a prediction.

input_features = {
    'experience_level': 'EX',
    'employment_type': 'FL',
    'job_title': 'data scientist',
    'year': 2023,
    'company_size': 'L',
    'employee_residence': 'Europe',
    'company_location': 'Europe',
}

prediction = make_prediction(input_features)

print("Prediction:", prediction)

The result is 'Prediction: 15,000 - 48,875'.
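`make_prediction` depends on a hand-written list of 31 column names that has to stay in the same order as `X_train.columns`. An alternative sketch (not the author's code) lets `pd.get_dummies` encode the single input and then uses `reindex` to align it with the training columns, filling absent categories with 0. `encode_input` and the shortened `training_columns` list below are hypothetical.

```python
import pandas as pd


def encode_input(feature_values: dict, training_columns) -> pd.DataFrame:
    """One-hot encode one input row and align it to the training columns."""
    # Convert values to strings so numeric fields such as the year are one-hot encoded too
    row = pd.DataFrame([feature_values]).astype(str)
    encoded = pd.get_dummies(row, prefix_sep="_", dtype=float)
    # Training columns missing from this row become 0; unknown columns are dropped
    return encoded.reindex(columns=training_columns, fill_value=0.0)


training_columns = ["experience_level_EX", "experience_level_SE", "job_title_data scientist",
                    "job_title_data analyst", "year_2022", "year_2023"]
x = encode_input({"experience_level": "EX", "job_title": "data scientist", "year": 2023},
                 training_columns)
print(x.values.tolist())
```

With the trained model, the call would look like `best_ridge.predict(encode_input(input_features, X_train.columns))`. Passing a DataFrame with the training column names also lets scikit-learn check that the names match the ones seen during fitting.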

Conclusion

The model would perform better with more data and more features. Contributions are welcome.

The live Streamlit app is here.

The link to the GitHub repo.
