Image by Author

Ever wondered how the newer, more advanced GPT-4 compares to its predecessor, GPT-3.5?

What is it about GPT-4 that makes it stand out, and where does it reach its limits?

As we delve into the limitations of language models and their potential applications, it is crucial to understand these nuances.

On this journey through the landscape of artificial intelligence, we’ll highlight the distinctive features of ChatGPT-4, its innovative plug-ins, and how they’re changing the way we solve technical questions and problems in the field of data science.

From addressing the complexity of statistics and probability to wrestling with complex modeling problems and coding queries in Python, SQL, or R, we’ll explore the ways GPT-4 can assist in data science.

So, buckle up as we embark on this exploratory journey, peeling back the layers of GPT-4 and its capacity to reshape our approach to data science!

ChatGPT, powered by GPT-3.5, was made public by OpenAI in November 2022.

It became popular worldwide almost immediately. Demand was so high that it exceeded server capacity at times, which led OpenAI to introduce a paid version, ChatGPT Plus.

Moving on from GPT-3.5, GPT-4 was released on 14 March 2023, and it is a significant upgrade.

It has advanced reasoning capabilities, and even though it can derive meaning from non-text inputs such as images, the current version of ChatGPT Plus is still text-output only.

GPT-4 also solves problems more accurately than GPT-3.5.

Currently, ChatGPT Plus costs $20 per month. Yet it has its restrictions, such as a cap of 25 messages every 3 hours.

Additionally, Microsoft’s search engine Bing has been running a customized version of GPT-4.

For more information, here is the official blog post about GPT-4 on the OpenAI website.

Source: https://chat.openai.com/

OpenAI unveiled a new feature for ChatGPT – plugins – on March 23.

These tools are designed to extend the language model’s capabilities, allowing it to access up-to-date information and perform computations.

The plugins are being rolled out gradually to assess their practical use, impact, and potential safety and alignment issues.

Various companies, including KAYAK, Klarna, Milo, Shopify, Slack, Speak, Wolfram, Zapier, and more, have developed the initial set of plugins.

These plugins offer a broad spectrum of functionality, such as travel planning, grocery ordering, access to computation, and real-time data retrieval.

OpenAI also provides two plugins of its own: a web browser and a code interpreter.

While the web browser lets the GPT model reach the web, the code interpreter lets you manipulate and analyze your data just by prompting.

Some of these plug-ins, like Noteable and Wolfram, are especially useful in data science.

These plugins are meant to extend the utility of ChatGPT.

We can divide data science interview questions into non-coding and coding questions.

By testing ChatGPT-4’s ability in these areas, we’ll better understand how it can be used to answer technical questions in data science.

Let’s start with the non-coding questions.

## Non-coding Questions

Here, we’ll divide the non-coding questions into statistics, probability, and modeling.

Image by Author

### Statistics

Statistics is essential to data science, encompassing data gathering, analysis, interpretation, presentation, and organization.

GPT-4 is equipped to answer questions on statistical concepts, offering thorough explanations of complex statistical ideas.

Using GPT-4 this way can improve your understanding of statistical theory and help data scientists extract meaningful insights from data.

Here is the statistics question that we’ll use to test the chat’s abilities. We’re going to use the Wolfram plug-in, which assists users with calculations.

Also, one important note when using ChatGPT-4 and its plugins: they may be good, but they’re not perfect, so always double-check the output because it can be wrong.

**Prompt**: Why is MSE frequently used as a loss function for Linear Regression?

(Use Wolfram Plug-in, provide formulas)

**ChatGPT**: The formula for MSE is:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where:

- $n$ is the number of observations,
- $y_i$ is the actual value of the $i$-th observation,
- $\hat{y}_i$ is the predicted value of the $i$-th observation.

Here are the reasons why MSE is commonly used:

- **Differentiability**: MSE is a smooth and differentiable function. This is important because optimization algorithms, like gradient descent, rely on the function’s derivative to find the minimum.
- **Sensitivity to outliers**: The squaring term in MSE gives higher weight to larger errors. This means the model will be more sensitive to outliers and will try to minimize large errors.
- **Interpretability**: MSE is easy to interpret since it represents the average squared difference between the predicted and actual values. A lower MSE indicates a better fit of the model to the data.
- **Convexity**: For linear regression, MSE leads to a convex loss landscape. Convex problems have a single global minimum, making them easier to solve using optimization algorithms.
- **Consistency with OLS**: In the case of linear regression, minimizing the MSE is equivalent to minimizing the squared residuals in Ordinary Least Squares (OLS), which is the standard method for estimating the parameters of a linear regression model.

Note from the author: Even though MSE has these advantages, it is not always the best choice for every regression problem. Depending on the specific characteristics of the data and the problem at hand, other loss functions might be more appropriate.

**Control**

ChatGPT’s answer is detailed and correct.
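Two of the points in the answer, the sensitivity to outliers and the link to OLS, can be checked numerically. The following is a minimal sketch in plain Python; the toy data and the helper names `mse`, `mae`, and `ols_fit` are illustrative assumptions and not part of ChatGPT’s output.

```python
# Toy data (made up for illustration): y is roughly 2*x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error, shown only for comparison."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ols_fit(x, y):
    """Closed-form OLS slope and intercept for simple linear regression."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    s_xx = sum((a - x_mean) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, y_mean - slope * x_mean


slope, intercept = ols_fit(xs, ys)
best = mse(ys, [slope * x + intercept for x in xs])

# Moving the slope away from the OLS value raises the MSE
nudged = mse(ys, [(slope + 0.1) * x + intercept for x in xs])
print(f"MSE at OLS fit: {best:.4f}, after nudging the slope: {nudged:.4f}")

# Same total absolute error, but one large error inflates the MSE
actual = [0.0, 0.0, 0.0, 0.0]
pred_small = [1.0, 1.0, 1.0, 1.0]    # four errors of size 1
pred_outlier = [0.0, 0.0, 0.0, 4.0]  # one error of size 4
print(mse(actual, pred_small), mae(actual, pred_small))
print(mse(actual, pred_outlier), mae(actual, pred_outlier))
```

In the last two lines the MAE is identical for both prediction sets, while the MSE is four times larger when the error is concentrated in a single point.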

### Probability

Probability is integral to data science because it provides a framework for making predictions from data and reasoning about uncertainty. GPT-4 can illustrate these concepts in detail, explaining the mathematical computations and logic behind the probability of a specific event.

This helps data scientists build models and make sound decisions grounded in probabilistic reasoning.

Here is the probability question ChatGPT will try to solve.

**Prompt**: You are playing a game with a friend where you roll two fair six-sided dice. If the sum of the two dice is 7 or 11, you win; otherwise, your friend wins. What is the probability that you win on your first roll?

**ChatGPT**: To determine the probability of winning on your first roll, we need to calculate the number of favorable outcomes (winning outcomes) and divide it by the total number of possible outcomes.

In this case, there are 6 possible outcomes for each die since they are fair six-sided dice. Therefore, the total number of outcomes when rolling two dice is: 6 × 6 = 36

To win on the first roll, the sum of the two dice must be either 7 or 11. We can calculate the number of favorable outcomes as follows:

- Sum of 7: There are 6 ways to achieve this outcome: (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1).
- Sum of 11: There are 2 ways to achieve this outcome: (5, 6) and (6, 5).

Therefore, there are a total of 6 + 2 = 8 favorable outcomes.

Now, we can calculate the probability of winning on the first roll by dividing the number of favorable outcomes by the total number of possible outcomes:

Probability of winning on the first roll = Number of favorable outcomes / Total number of possible outcomes = 8 / 36 = 2 / 9 ≈ 0.222

So, the probability that you win on your first roll is approximately 0.222 or 22.2%.

**Control**

The output is correct and matches the solution from the platform.
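One way to double-check an answer like this is to enumerate the sample space directly. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely (die1, die2) outcomes for two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose sum is 7 or 11
wins = [roll for roll in outcomes if sum(roll) in (7, 11)]

p_win = Fraction(len(wins), len(outcomes))
print(len(wins), len(outcomes), p_win, round(float(p_win), 3))
# 8 36 2/9 0.222
```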

### Modeling

Modeling is the process of building a mathematical representation of a system in order to analyze and forecast its behavior.

GPT-4 can help data scientists understand the concepts, assumptions, implementation, and interpretation of their models. This leads to better predictions and a better understanding of the data.

Here is the modeling question we’re going to test.

**Prompt**: How does the K-means algorithm work?

**ChatGPT**: The K-means algorithm is an iterative clustering algorithm that aims to partition a given dataset into K clusters, where K is a user-defined parameter.

It works as follows:

- **Initialization**: Randomly select K data points from the dataset as the initial centroids (representative points) of the clusters.
- **Assignment**: Assign each data point to the nearest centroid based on a distance metric, usually the Euclidean distance. This step forms K clusters.
- **Update**: Recalculate the centroids of the clusters by taking the mean of all data points assigned to each cluster. This step aims to find better centroids that represent the center of each cluster.
- **Iteration**: Repeat the assignment and update steps until convergence. Convergence occurs when either the centroids stop changing significantly or the maximum number of iterations is reached.
- **Output**: The algorithm returns the final K clusters, where each data point belongs to the cluster with the nearest centroid.

The K-means algorithm seeks to minimize the within-cluster sum of squared distances, also known as the inertia or the total intra-cluster variance. It assumes that the clusters are spherical and have equal variance. However, the algorithm may converge to suboptimal solutions depending on the initialization of centroids.

To improve the performance of K-means, various techniques can be applied, such as using multiple initializations, using smarter centroid initialization methods (e.g., K-means++), or using the Elbow method to determine the optimal value of K.

It’s important to note that K-means is sensitive to the initial choice of centroids and may struggle with datasets containing non-linear or irregularly shaped clusters. In such cases, alternative clustering algorithms like DBSCAN or Gaussian Mixture Models (GMM) may be more suitable.

**Control**

ChatGPT’s answer is detailed and correct.
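The steps described in the answer can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names, the `seed` parameter, and the toy points are all made up for the example, and a library such as scikit-learn would normally be used instead.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means: random initialization, then assignment/update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Initialization: k random data points
    labels = None
    for _ in range(max_iter):
        # Assignment: index of the nearest centroid for every point
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # Convergence: assignments stopped changing
            break
        labels = new_labels
        # Update: each centroid becomes the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Two well-separated groups of toy points (hypothetical data)
points = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0), (10.0, 10.0), (10.5, 10.5), (11.0, 11.0)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids))  # [(1.5, 1.5), (10.5, 10.5)]
print(labels)
```

Which cluster gets label 0 depends on the random initialization, which is why the centroids are sorted before printing. On less tidy data, different seeds can also lead to different final clusters, as the answer notes.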

## Coding Questions

Coding is a must for data scientists. It includes using Python for data wrangling and machine learning, SQL for database handling, and R for statistical computation. GPT-4 can assist in resolving coding issues.

Image by Author

Let’s test ChatGPT-4 on coding questions, starting with Python.

### Python

Python is a popular language in the data science community thanks to its user-friendly nature and robust libraries such as pandas, numpy, and scikit-learn.

GPT-4 can assist in debugging Python code, recommending best coding practices, and even generating code snippets for particular tasks.

Now let’s try it with the following question.

**Question 1**

Here is the question.

**Prompt:**

My dataset name is airbnb_search_details and here is more information:

Unnamed: 0:int64

id:int64

price:float64

property_type: object

room_type: object

amenities:object

accommodates:int64

bathrooms:int64

bed_type:object

cancellation_policy:object

cleaning_fee:bool

city:object

host_identity_verified:object

host_response_rate:object

host_since:datetime64[ns]

neighbourhood:object

number_of_reviews:int64

review_scores_rating:float64

zipcode:int64

bedrooms:int64

beds:int64

The question is: Find all searches for accommodations where the number of bedrooms is equal to the number of bathrooms. Make sure that the expected output type is a dataframe.

**Note from the author:** To feed the dataset into the chat, just copy it from the platform as shown below.

**ChatGPT:** To find all searches for accommodations where the number of bedrooms is equal to the number of bathrooms, you can use the following code in Python:

```
import pandas as pd
# Assuming 'airbnb_search_details' is the DataFrame containing your dataset
# Replace 'airbnb_search_details' with the actual name of your DataFrame if different
# Filter the DataFrame based on the condition: number of bedrooms == number of bathrooms
result = airbnb_search_details[airbnb_search_details['bedrooms'] == airbnb_search_details['bathrooms']]
# Display the resulting DataFrame
print(result)
```

**Control**

Most of ChatGPT’s answer is right, except for the last line. There is no need to print the result dataframe with the print function, because the expected output is the dataframe itself. The solution works when the code is updated as follows.

```
import pandas as pd
# Assuming 'airbnb_search_details' is the DataFrame containing your dataset
# Replace 'airbnb_search_details' with the actual name of your DataFrame if different
# Filter the DataFrame based on the condition: number of bedrooms == number of bathrooms
result = airbnb_search_details[airbnb_search_details['bedrooms'] == airbnb_search_details['bathrooms']]
# Return the resulting DataFrame as the final expression
result
```

**Question 2**

Here is the question from LeetCode that I’ll feed into ChatGPT-4.

I’m going to simply copy and paste all the information into my prompt.

**Prompt**:

Given an integer array nums and an integer val, remove all occurrences of val in nums in-place. The order of the elements may be changed. Then return the number of elements in nums which are not equal to val.

Consider the number of elements in nums which are not equal to val be k, to get accepted, you need to do the following things:

Change the array nums such that the first k elements of nums contain the elements which are not equal to val. The remaining elements of nums are not important as well as the size of nums.

Return k.

Custom Judge:

The judge will test your solution with the following code:

int[] nums = […]; // Input array

int val = …; // Value to remove

int[] expectedNums = […]; // The expected answer with correct length.

// It is sorted with no values equaling val.

int k = removeElement(nums, val); // Calls your implementation

assert k == expectedNums.length;

sort(nums, 0, k); // Sort the first k elements of nums

for (int i = 0; i < actualLength; i++) {

assert nums[i] == expectedNums[i];

}

If all assertions pass, then your solution will be accepted.

**Example 1:**

Input: nums = [3,2,2,3], val = 3

Output: 2, nums = [2,2,_,_]

Explanation: Your function should return k = 2, with the first two elements of nums being 2.

It does not matter what you leave beyond the returned k (hence they are underscores).

**Example 2:**

Input: nums = [0,1,2,2,3,0,4,2], val = 2

Output: 5, nums = [0,1,4,0,3,_,_,_]

Explanation: Your function should return k = 5, with the first five elements of nums containing 0, 0, 1, 3, and 4.

Note that the five elements can be returned in any order.

It does not matter what you leave beyond the returned k (hence they are underscores).

**Constraints:**

0 <= nums.length <= 100

0 <= nums[i] <= 50

0 <= val <= 100

The solution format:

class Solution:

def removeElement(self, nums: List[int], val: int) -> int:

**ChatGPT:** Here’s the solution in Python:

```
from typing import List


class Solution:
    def removeElement(self, nums: List[int], val: int) -> int:
        k = 0  # Counter for elements not equal to val
        # Iterate through the array
        for i in range(len(nums)):
            if nums[i] != val:
                nums[k] = nums[i]  # Move the element to the kth position
                k += 1
        return k
```

The solution uses a two-pointer approach to modify the nums array in-place and count the number of elements that are not equal to val. The variable k keeps track of the position where the next non-val element should be placed.

The algorithm iterates through the array, and whenever it encounters an element that is not equal to val, it moves that element to the kth position and increments k. In the end, k is the number of elements not equal to val.

You can test the solution using the provided custom judge code to verify that it produces the expected output for the given examples and passes the assertions.

**Control**

The code works perfectly without needing any updates.
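To check this locally, the custom judge from the prompt can be mirrored in Python. The sketch below repeats the solution so that it runs on its own; the `judge` helper is a hypothetical stand-in for LeetCode’s judge and only reproduces the checks described in the prompt.

```python
from typing import List


class Solution:
    def removeElement(self, nums: List[int], val: int) -> int:
        k = 0  # next write position, and count of kept elements
        for i in range(len(nums)):
            if nums[i] != val:
                nums[k] = nums[i]
                k += 1
        return k


def judge(nums: List[int], val: int, expected: List[int]) -> bool:
    """Mirror of the custom judge: compare k and the sorted first k elements."""
    k = Solution().removeElement(nums, val)
    return k == len(expected) and sorted(nums[:k]) == expected


print(judge([3, 2, 2, 3], 3, [2, 2]))                       # Example 1
print(judge([0, 1, 2, 2, 3, 0, 4, 2], 2, [0, 0, 1, 3, 4]))  # Example 2
print(judge([], 0, []))                                     # empty array, allowed by the constraints
```

All three calls print `True`.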

### SQL

SQL is essential for working with databases. GPT-4 can help formulate SQL queries, improve database efficiency, and explain intricate SQL concepts.

**Question 1**

Here is the SQL question we’ll try to solve.

**Prompt**: Here is the question: Following a recent advertising campaign, the marketing department wishes to classify its efforts based on the total number of units sold for each product.

You have been tasked with calculating the total number of units sold for each product and categorizing ad performance based on the following criteria for items sold:

Outstanding: 30+

Satisfactory: 20 – 29

Unsatisfactory: 10 – 19

Poor: 1 – 9

Your output should contain the product ID, total units sold in descending order, and its categorized ad performance.

My dataset name is marketing_campaign and here are the data types.

marketing_campaign

user_id: int

created_at: datetime

product_id: int

quantity: int

price: int

Here is the expected output:

| product_id | total_sold | ad_performance |
| --- | --- | --- |
| 105 | 41 | Outstanding |
| 102 | 29 | Satisfactory |
| 114 | 23 | Satisfactory |
| 118 | 22 | Satisfactory |
| 120 | 21 | Satisfactory |

Use mysql to solve this.

**ChatGPT**: To solve this task using MySQL, you can write a query to calculate the total number of units sold for each product and categorize the ad performance based on the given criteria.

Here’s the query:

```
SELECT
    product_id,
    SUM(quantity) AS total_sold,
    CASE
        WHEN SUM(quantity) >= 30 THEN 'Outstanding'
        WHEN SUM(quantity) BETWEEN 20 AND 29 THEN 'Satisfactory'
        WHEN SUM(quantity) BETWEEN 10 AND 19 THEN 'Unsatisfactory'
        WHEN SUM(quantity) BETWEEN 1 AND 9 THEN 'Poor'
    END AS ad_performance
FROM
    marketing_campaign
GROUP BY
    product_id
ORDER BY
    total_sold DESC;
```

In this query, we select the product_id, calculate the sum of quantities as total_sold, and use a CASE statement to categorize the ad performance based on the total_sold value. The query groups the results by product_id, orders them in descending order of total_sold, and provides the desired output.

Make sure to replace 'marketing_campaign' with the actual name of your table in the database.

Running this query gives you the expected output with the product_id, total_sold, and ad_performance categorized according to the criteria.

**Control**

The code works perfectly without needing any updates.
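If no MySQL server is at hand, the same query can be tried against an in-memory SQLite database from Python. This is a sketch under stated assumptions: SQLite stands in for MySQL (the query only uses standard SQL), and the rows below are hypothetical, invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE marketing_campaign ("
    "user_id INTEGER, created_at TEXT, product_id INTEGER, "
    "quantity INTEGER, price INTEGER)"
)

# Hypothetical rows, one product per performance category
rows = [
    (1, "2023-01-01", 101, 20, 10),
    (2, "2023-01-02", 101, 15, 10),  # product 101: 35 units in total
    (3, "2023-01-02", 102, 25, 20),  # product 102: 25 units
    (4, "2023-01-03", 103, 12, 30),  # product 103: 12 units
    (5, "2023-01-04", 104, 4, 40),   # product 104: 4 units
]
conn.executemany("INSERT INTO marketing_campaign VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT
    product_id,
    SUM(quantity) AS total_sold,
    CASE
        WHEN SUM(quantity) >= 30 THEN 'Outstanding'
        WHEN SUM(quantity) BETWEEN 20 AND 29 THEN 'Satisfactory'
        WHEN SUM(quantity) BETWEEN 10 AND 19 THEN 'Unsatisfactory'
        WHEN SUM(quantity) BETWEEN 1 AND 9 THEN 'Poor'
    END AS ad_performance
FROM marketing_campaign
GROUP BY product_id
ORDER BY total_sold DESC;
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

With these rows, the output lists products 101 to 104 in that order, labeled Outstanding, Satisfactory, Unsatisfactory, and Poor.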

**Question 2**

Here is the second SQL question, taken from LeetCode.

**Prompt**: Table: Sales

| Column Name | Type |
| --- | --- |
| sale_id | int |
| product_id | int |
| year | int |
| quantity | int |
| price | int |

(sale_id, year) is the primary key of this table. product_id is a foreign key to the Product table.

Each row of this table shows a sale of the product product_id in a certain year. Note that the price is per unit.

Table: Product

| Column Name | Type |
| --- | --- |
| product_id | int |
| product_name | varchar |

product_id is the primary key of this table.

Each row of this table indicates the product name of each product.

Write an SQL query that selects the product id, year, quantity, and price for the first year of every product sold.

Return the resulting table in any order.

The query result format is in the following example.

Example 1:

Input:

Sales table:

| sale_id | product_id | year | quantity | price |
| --- | --- | --- | --- | --- |
| 1 | 100 | 2008 | 10 | 5000 |
| 2 | 100 | 2009 | 12 | 5000 |
| 7 | 200 | 2011 | 15 | 9000 |

Product table:

| product_id | product_name |
| --- | --- |
| 100 | Nokia |
| 200 | Apple |
| 300 | Samsung |

Output:

| product_id | first_year | quantity | price |
| --- | --- | --- | --- |
| 100 | 2008 | 10 | 5000 |
| 200 | 2011 | 15 | 9000 |

**ChatGPT:** To select the product id, year, quantity, and price for the first year of every product sold, you can use the following SQL query:

```
SELECT s.product_id, s.year AS first_year, s.quantity, s.price
FROM Sales s
JOIN (
    SELECT product_id, MIN(year) AS first_year
    FROM Sales
    GROUP BY product_id
) t ON s.product_id = t.product_id AND s.year = t.first_year
ORDER BY s.product_id;
```

Explanation:

- The subquery (SELECT product_id, MIN(year) AS first_year FROM Sales GROUP BY product_id) selects the minimum year for each product using the MIN() function and groups the results by product_id.
- The main query joins the Sales table with the subquery on the matching product_id and year columns to retrieve the sales records for the first year of each product.
- The selected columns, including product_id, year as first_year, quantity, and price, are returned from the Sales table for the matching records.
- The result is ordered by product_id in ascending order.

The query will return the product id, first year, quantity, and price for the first year of every product sold, as shown in the example output.

**Control**

The code works perfectly without needing any updates.
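The query can be verified against the example from the prompt with an in-memory SQLite database. This is only a sketch: SQLite is an assumption made for convenience, and the Product table is left out because the query never reads it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales ("
    "sale_id INTEGER, product_id INTEGER, year INTEGER, "
    "quantity INTEGER, price INTEGER)"
)

# Rows copied from the example input in the prompt
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 2008, 10, 5000),
        (2, 100, 2009, 12, 5000),
        (7, 200, 2011, 15, 9000),
    ],
)

query = """
SELECT s.product_id, s.year AS first_year, s.quantity, s.price
FROM Sales s
JOIN (
    SELECT product_id, MIN(year) AS first_year
    FROM Sales
    GROUP BY product_id
) t ON s.product_id = t.product_id AND s.year = t.first_year
ORDER BY s.product_id;
"""
first_year_sales = conn.execute(query).fetchall()
conn.close()

print(first_year_sales)  # [(100, 2008, 10, 5000), (200, 2011, 15, 9000)]
```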

### R

R is a language purpose-built for statistical analysis and data visualization. GPT-4 can help with understanding the subtleties of the R language, debugging R code, and using various R packages for data analysis.

**Question**

Here is the coding question.

**Prompt:**

Here is the question: For each guest reviewer, find the nationality of the reviewer’s favorite host based on the guest’s highest review score given to a host. Output the user ID of the guest along with their favorite host’s nationality. In case there is more than one favorite host from the same country, list that country only once (remove duplicates).

Both the from_user and to_user columns are user IDs.

Here is the expected output:

| from_user | nationality |
| --- | --- |
| 0 | Luxembourg |
| 1 | Brazil |
| 2 | Australia |
| 3 | Australia |
| 4 | Brazil |

Here are the data frames:

airbnb_reviews

from_user:int64

to_user:int64

from_type: object

to_type: object

review_score:int64

airbnb_hosts

host_id: int64

nationality: object

gender: object

age:int64

Solve this with R.

Hints:

Select specific column/s from dataframe using [column_name] then select rows with values equal to ‘==’ guest

Use .groupby(column_name) to group the dataframe about the specified column and max() to get the maximum score given per group

Perform inner join on original df and grouped df to get the corresponding attributes/details of the grouped df by using pd.merge(dataframe1, dataframe2, on = common_table_keys)

Perform inner join on resulting df and airbnb_hosts

Use [ [ column_name/s] ] to select a specified column of the dataframe then use drop_duplicates(column_name) to get distinct values in the specified column.

**ChatGPT**: To solve this task using R, you can follow these steps:

- Select the necessary columns from the ‘airbnb_reviews’ dataframe and filter the rows where ‘from_type’ is ‘guest’.
- Group the filtered dataframe by ‘from_user’ and get the maximum review score for each guest using the max() function.
- Perform an inner join between the original dataframe and the grouped dataframe based on the ‘from_user’ and ‘review_score’ columns to get the corresponding rows with the maximum review score for each guest.
- Perform another inner join between the resulting dataframe and the ‘airbnb_hosts’ dataframe based on the ‘to_user’ and ‘host_id’ columns to get the nationality of the favorite host.
- Select the ‘from_user’ and ‘nationality’ columns from the final joined dataframe.
- Use the distinct() function to remove duplicate rows based on the ‘nationality’ column.

Here’s the code to achieve this:

```
library(dplyr)  # provides distinct()
# Select necessary columns from airbnb_reviews and filter rows where from_type is 'guest'
filtered_reviews <- airbnb_reviews[airbnb_reviews$from_type == 'guest', c('from_user', 'to_user', 'review_score')]
# Group by from_user and get the maximum review score for each guest
max_scores <- aggregate(review_score ~ from_user, filtered_reviews, max)
# Perform inner join to get the rows with the maximum review score for each guest
joined_df <- merge(filtered_reviews, max_scores, by = c('from_user', 'review_score'))
# Perform inner join with airbnb_hosts to get the nationality of the favorite host
final_df <- merge(joined_df, airbnb_hosts, by.x = 'to_user', by.y = 'host_id')
# Select the required columns from the final joined dataframe
result <- final_df[, c('from_user', 'nationality')]
# Remove duplicate (from_user, nationality) rows
distinct_result <- distinct(result, .keep_all = TRUE)
# Print the final result
distinct_result
```

Running this code gives you the expected output with the ‘from_user’ (guest ID) and the corresponding ‘nationality’ of their favorite host.

**Control**

The code works perfectly without needing any updates.

## Bonus: Doing a Data Project With ChatGPT

The data project we’ll test ChatGPT on has been used as a take-home assignment in the recruitment process for data science positions at Uber.

Overall, there are two different scenarios:

**Scenario 1**: It’s going to be a huge Saturday, and there will need to be many more cars on the road than last week. In order to get drivers to go online, we’re assessing the following two bonus options in terms of cost:

- Option 1: $50 for each driver that is online at least 8 hours, accepts 90% of requests, completes 10 trips, and has a rating of 4.7 or better during the time frame;
- Option 2: $4/trip for all drivers who complete 12 trips and have a 4.7 or better rating.

**Scenario 2**: A taxi driver currently generates $200 per day in fares (before expenses), works six days a week, takes three weeks off, and has the following expenses:

- Gas – $200 per week
- Insurance – $400 per month
- Vehicle rent (by the week) – $500

The driver doesn’t pay gas and rent expenses on off weeks.

**ChatGPT File Uploader**

Now, to do a data project, we need the ChatGPT File Uploader Chrome extension. It allows you to upload files to ChatGPT.

You don’t need it if you already have the Code Interpreter plug-in – simply upload a file using this plug-in.

For more information about it, please watch this video.

After we set up the File Uploader extension, let’s try to solve this data project.

Open ChatGPT and click the button of this extension.

You will see the submit button as shown below.

Source: https://chat.openai.com/

Now, let’s feed ChatGPT the dataset first by selecting our dataset file and clicking the Submit File button.

Source: https://chat.openai.com/

After the dataset is loaded, ChatGPT will display it in the chat.

Source: https://chat.openai.com/

Now, here is the prompt you should use to solve this data project.

**Prompt:** Given the operational scenarios and the provided dataset, answer the following questions by using Python

**Assignment**

Scenario 1: It’s going to be a huge Saturday, and there will need to be many more cars on the road than last week. In order to get drivers to go online, we’re assessing the following two bonus options in terms of cost:

Option 1: $50 for each driver that is online at least 8 hours, accepts 90% of requests, completes 10 trips, and has a rating of 4.7 or better during the time frame;

Option 2: $4/trip for all drivers who complete 12 trips and have a 4.7 or better rating.

Using the dataset provided and given Scenario 1, provide answers to the questions below:

How much would the total bonus payout be with Option 1?

How much would the total bonus payout be with Option 2?

How many drivers would qualify for a bonus under Option 1 but not under Option 2?

What percentage of drivers online completed less than 10 trips, had an acceptance rate of less than 90%, and had a rating of 4.7 or higher?

Scenario 2: A taxi driver currently generates $200 per day in fares (before expenses), works six days a week, takes three weeks off, and has the following expenses:

Gas – $200 per week

Insurance – $400 per month

Vehicle rent (by the week) – $500

The driver doesn’t pay gas and rent expenses on off weeks.

Now, let’s assume that the same driver would buy a Town Car and partner with Uber. If he does, his gas expenses would go up by 5%, his insurance expense would decrease by 20%, and he would no longer be renting a vehicle. However, he would need to buy a car. The driver would still take three weeks off per year.

Given Scenario 2, provide answers to the questions below:

How much money (after expenses) does the taxi driver make per year without partnering with Uber?

You are convincing the same driver above to buy a Town Car and partner with Uber. Assuming the new car is 40,000 USD, how much would the driver’s gross fares need to increase per week to fully pay for the car in year 1 and maintain the same yearly profit margin as before?

**ChatGPT:**

To perform these computations, we first need to load and preprocess the dataset. We’ll use Python’s pandas library to load the CSV file and parse the necessary columns.

Unfortunately, as a text-based model, I’m unable to directly execute code. But I can give you the Python code you can use to answer your questions:

Here’s the code you would need to answer your questions for Scenario 1:

```
import pandas as pd
# Load dataset
data = pd.read_csv('dataset_2.csv')
# Convert 'Accept Rate' from string percentage to float
data['Accept Rate'] = data['Accept Rate'].str.rstrip('%').astype('float') / 100.0
# Calculate bonus under both options
data['Option 1 Bonus'] = ((data['Supply Hours'] >= 8) & (data['Accept Rate'] >= 0.9) &
                          (data['Trips Completed'] >= 10) & (data['Rating'] >= 4.7)) * 50
data['Option 2 Bonus'] = ((data['Trips Completed'] >= 12) & (data['Rating'] >= 4.7)) * data['Trips Completed'] * 4
# Compute total payout for each option
total_payout_option1 = data['Option 1 Bonus'].sum()
total_payout_option2 = data['Option 2 Bonus'].sum()
# Compute number of drivers who qualify under Option 1 but not Option 2
num_option1_only = ((data['Option 1 Bonus'] > 0) & (data['Option 2 Bonus'] == 0)).sum()
# Compute percentage of drivers who completed less than 10 trips, had an acceptance rate of less than 90% and had a rating of 4.7 or higher
percent_undesirable_drivers = ((data['Trips Completed'] < 10) & (data['Accept Rate'] < 0.9) & (data['Rating'] >= 4.7)).mean() * 100
print('Total bonus payout for Option 1: $', total_payout_option1)
print('Total bonus payout for Option 2: $', total_payout_option2)
print('Number of drivers who qualify under Option 1 but not Option 2: ', num_option1_only)
print('Percentage of drivers who completed less than 10 trips, had an acceptance rate of less than 90%, and had a rating of 4.7 or higher: ', percent_undesirable_drivers)
```

Here’s the code to answer your questions for Scenario 2:

```
# Calculate the taxi driver's yearly income and expenses
daily_income = 200
weekly_income = daily_income * 6 # Working 6 days a week
yearly_income = weekly_income * 52 # Working all year
yearly_expenses = (200 * 49) + (400 * 12) + (500 * 49) # No expenses on off weeks
yearly_net = yearly_income - yearly_expenses
# Calculate the new expenses and net income if the driver partners with Uber
new_yearly_expenses = (200 * 1.05 * 49) + (400 * 0.8 * 12) # 5% increase in gas, 20% decrease in insurance
new_yearly_net = yearly_income - new_yearly_expenses - 40000 # Subtract car cost
# Calculate the increase in gross fares necessary to maintain the same profit margin
additional_income_required = yearly_net - new_yearly_net
additional_income_required_weekly = additional_income_required / 52
print('Taxi driver yearly net income: $', yearly_net)
print('Required increase in weekly gross fares: $', additional_income_required_weekly)
```

You can replace the file path in the read_csv function with the actual path to your dataset_2.csv file, and then run these code snippets in your local Python environment.

**Control**

The first scenario’s output (the first 4 questions) is completely correct. However, the second scenario’s output is not.

For example, it calculates the driver’s income as if the driver works 52 weeks (all year), but the driver takes 3 weeks off.

That’s why you need to be careful about the calculations ChatGPT makes and check the steps it follows to be sure about the result.

After a few tries, here is the code ChatGPT gives that outputs the correct result.
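One quick way to check a step like this is to recompute it with the assumption made explicit. The sketch below is not part of ChatGPT’s answer: the helper `yearly_net` and its parameters are hypothetical names, and the figures are the ones used in the code above. It assumes, as that code does, that gas and vehicle rent are paid only in the 49 working weeks.

```python
def yearly_net(income_weeks, expense_weeks=49):
    """Yearly net income for the taxi driver (hypothetical helper).

    income_weeks:  weeks in which fares are earned
    expense_weeks: weeks in which gas and vehicle rent are paid
    """
    income = 200 * 6 * income_weeks              # $200 per day, 6 days a week
    weekly_costs = (200 + 500) * expense_weeks   # gas + vehicle rent
    insurance = 400 * 12                         # paid every month
    return income - weekly_costs - insurance

flawed = yearly_net(income_weeks=52)  # fares counted for the whole year
fixed = yearly_net(income_weeks=49)   # fares only for the working weeks

print("Net with 52 income weeks: $", flawed)
print("Net with 49 income weeks: $", fixed)
print("Overstatement: $", flawed - fixed)
```

The gap is exactly three weeks of fares (3 × 6 × $200), which is the income counted for the weeks the driver takes off.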

```
# Weeks off per year
weeks_off = 3
# Total weeks per year
total_weeks_per_year = 52
# Weeks of work per year
working_weeks = total_weeks_per_year - weeks_off
# Total months per year
total_months_per_year = 12
# Expenses
gas_per_week = 200
insurance_per_month = 400
vehicle_rent_per_week = 500
# Revenue
fare_per_day = 200
work_days_per_week = 6
total_revenue = fare_per_day * work_days_per_week * working_weeks
# Total Expenses
total_expenses = (gas_per_week + vehicle_rent_per_week) * working_weeks + insurance_per_month * total_months_per_year
# Net Income
net_income = total_revenue - total_expenses
print("Net Income per year: $", net_income)
# For Question 6:
# Expenses changes
new_gas_per_week = gas_per_week * 1.05 # 5% increase
new_insurance_per_month = insurance_per_month * 0.8 # 20% decrease
car_cost = 40000
# New Total Expenses
new_total_expenses = car_cost + (new_gas_per_week * working_weeks) + (new_insurance_per_month * total_months_per_year)
# To maintain the same net income, the new revenue should be:
new_total_revenue = new_total_expenses + net_income
# This translates to a required increase in weekly fares:
additional_fare_per_week = (new_total_revenue - total_revenue) / working_weeks
print("Required increase in weekly fares: $", additional_fare_per_week)
```
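To follow the steps rather than trust the final number, the intermediate values can be traced and compared with a hand calculation. This is only a sketch under the same inputs as the code above; the variable names are illustrative, and the figures in the assertions were worked out by hand rather than taken from ChatGPT’s output.

```python
working_weeks = 52 - 3  # 49 working weeks

total_revenue = 200 * 6 * working_weeks                  # fares for the year
total_expenses = (200 + 500) * working_weeks + 400 * 12  # gas + rent + insurance
net_income = total_revenue - total_expenses

# After the switch: gas +5% (210), insurance -20% (320), vehicle rent dropped
# as in the code above, and the car paid off in year 1.
# Integer amounts keep the checks free of floating-point noise.
new_expenses = 40000 + 210 * working_weeks + 320 * 12
required_revenue = new_expenses + net_income
extra_per_week = (required_revenue - total_revenue) / working_weeks

# Each step checked against the hand calculation
assert total_revenue == 58800
assert total_expenses == 39100
assert net_income == 19700
assert new_expenses == 54130
assert required_revenue == 73830

print("Required increase in weekly fares: $", round(extra_per_week, 2))
```

If any assertion fails, the step where the result drifts from the hand calculation is visible immediately, which is harder to see from a single printed total.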

This exploration has shed light on the innovative applications and inherent limitations of GPT-4, particularly in the realm of data science.

Whether tackling complex statistical and probability problems, modeling issues, or fielding coding queries, GPT-4 has shown promising capabilities.

However, it’s essential to remember that while GPT-4 is a valuable tool, human oversight and understanding remain critical to ensure quality results, especially in mathematical calculations.

This exploration is only the beginning of the GPT-4 journey, as further advancements and refinements are always on the horizon. Sam Altman, the OpenAI CEO, said that in the future model (GPT5), OpenAI plans to solve the problem with inaccurate calculations and the cap on the number of prompts.

Continue to stay curious, keep exploring, and remember, the only limit to AI’s potential is the one we set for it.

**Nate Rosidi** is a data scientist and in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Connect with him on Twitter: StrataScratch or LinkedIn.