Similarity Analysis
Next, I wanted to look at the similarities between each batch of generated reviews and the original reviews. To do this, we can use cosine similarity to calculate how similar the sentence vectors from each source are. First, we can write a cosine similarity function that transforms our sentences into vectors using TfidfVectorizer() and then calculates the cosine similarity between the two resulting sentence vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def calculate_cosine_similarity(sentence1, sentence2):
    """
    A function that accepts two sentences as input and outputs their
    cosine similarity.

    Inputs:
        sentence1 (str): A string of words
        sentence2 (str): A string of words
    Returns:
        cosine_sim: Cosine similarity score for the two input sentences
    """
    # Initialize the TfidfVectorizer
    vectorizer = TfidfVectorizer()
    # Create the TF-IDF matrix from the two sentences
    tfidf_matrix = vectorizer.fit_transform([sentence1, sentence2])
    # Calculate the cosine similarity between the two sentence vectors
    cosine_sim = cosine_similarity(tfidf_matrix[0], tfidf_matrix[1])
    return cosine_sim[0][0]
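As a quick sanity check (a toy example of my own, not drawn from the dataset), calling the function on two sentences that contain the same words should return a similarity of essentially 1.0:

# Hypothetical toy sentences to sanity-check the function
s1 = "The food was great and the service was friendly"
s2 = "The service was friendly and the food was great"
print(calculate_cosine_similarity(s1, s2))  # ~1.0: same words yield identical TF-IDF vectors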
One problem I ran into was that the datasets were now so big that the calculations were taking too long (and sometimes I didn't have enough RAM on Google Colab to continue). To combat this issue, I randomly sampled 200 reviews from each of the datasets when calculating the similarity.
from random import sample

# Randomly sample 200 reviews from each source
o_review = sample(reviews_dict['original review'], 200)
p_review = sample(reviews_dict['fake positive review'], 200)
n_review = sample(reviews_dict['fake negative review'], 200)

r_dict = {'original review': o_review,
          'fake positive review': p_review,
          'fake negative review': n_review}
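One caveat worth flagging (my note, not part of the original workflow): sample() draws a different subset on every run, so the similarity numbers below will vary slightly between runs. Fixing the seed beforehand makes the draw repeatable:

import random

# Any fixed seed works; this makes the 200-review sample reproducible
random.seed(42)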
Now that we have the randomly chosen samples, we can look at the cosine similarities between different combinations of the datasets.
import numpy as np
import pandas as pd

# Cosine similarity calculation
source = ['original review', 'fake negative review', 'fake positive review']
source_to_compare = ['original review', 'fake negative review', 'fake positive review']
avg_cos_sim_per_word = {}
for s in source:
    for s2 in source_to_compare:
        if s != s2:
            count = []
            for sent in r_dict[s]:
                for sent2 in r_dict[s2]:
                    similarity = calculate_cosine_similarity(sent, sent2)
                    count.append(similarity)
            # Average similarity for this ordered pair of sources
            avg_cos_sim_per_word['{0} to {1}'.format(s, s2)] = np.mean(count)

results = pd.DataFrame(avg_cos_sim_per_word, index=[0]).T
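Since the dictionary holds one average per ordered pair of sources, the transposed DataFrame has six rows. One quick way to inspect it (a small addition of mine; the column name is arbitrary):

# Name the single column and sort the source pairs by similarity
results.columns = ['avg_cosine_similarity']
print(results.sort_values('avg_cosine_similarity', ascending=False))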
For the original dataset, the negative reviews were more similar to one another. My hypothesis is that this is due to my using more prompts to create negative reviews than positive reviews. Unsurprisingly, the ChatGPT-generated reviews showed the highest similarity among themselves.
Great, we have the cosine similarities, but is there another step we can take to assess the similarity of the reviews? There is! Let's visualize the sentences as vectors. To do this, we must embed the sentences (turn them into vectors of numbers), and then we can visualize them in 2D space. I used spaCy to embed the sentences and Matplotlib to visualize them.
import spacy

# Load the pre-trained GloVe model
nlp = spacy.load('en_core_web_lg')

source_embeddings = {}
for source, source_sentences in reviews_dict.items():
    source_embeddings[source] = []
    for sentence in source_sentences:
        # Tokenize the sentence using spaCy
        doc = nlp(sentence)
        # Retrieve the word embeddings
        word_embeddings = np.array([token.vector for token in doc])
        # Save the word embeddings for the source
        source_embeddings[source].append(word_embeddings)
import matplotlib.pyplot as plt

def legend_without_duplicate_labels(figure):
    handles, labels = plt.gca().get_legend_handles_labels()
    by_label = dict(zip(labels, handles))
    figure.legend(by_label.values(), by_label.keys(), loc='lower right')

# Plot embeddings with colors based on the source
fig, ax = plt.subplots()
colors = ['g', 'b', 'r']  # One color for each source
i = 0
for source, embeddings in source_embeddings.items():
    for embedding in embeddings:
        ax.scatter(embedding[:, 0], embedding[:, 1], c=colors[i], label=source)
    i += 1
legend_without_duplicate_labels(plt)
plt.show()
The good news is that we can clearly see the embeddings and distributions of the sentence vectors closely align. Visual inspection shows there is more variability in the distribution of the original reviews, supporting the claim that they are more diverse. Since ChatGPT generated both the positive and negative reviews, we might expect their distributions to be the same. Notice, however, that the fake negative reviews actually have a wider distribution and more variance than the fake positive reviews. Why might this be? It is probably due in part to the fact that I had to trick ChatGPT into creating the fake negative reviews (ChatGPT is designed to make positive statements), and I had to provide more prompts to get enough negative reviews versus positive ones. This benefits the dataset because, with the added diversity, we can train higher-performing machine learning models.
Next, we can compare the three distributions of reviews separately and see if there are any distinguishing patterns, as sketched below.
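A minimal sketch of one way to do this, reusing the source_embeddings computed above (the subplot layout is my assumption, not necessarily the original code):

# One subplot per source so the spread of each distribution can be compared
fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharex=True, sharey=True)
for ax, (source, embeddings) in zip(axes, source_embeddings.items()):
    for embedding in embeddings:
        ax.scatter(embedding[:, 0], embedding[:, 1], s=5)
    ax.set_title(source)
plt.show()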
What do we see? Visually, the bulk of the reviews for each dataset are centered around the origin and span from -10 to 10. This is a positive sign and supports the use of fake reviews for training prediction models. The variances are roughly the same; however, the original reviews have a wider spread in their distribution, both laterally and longitudinally, a proxy for greater diversity in the lexicon within those reviews. The reviews from ChatGPT definitely have similar distributions, but the positive reviews have more outliers. As stated, these distinctions could be a result of the way I was prompting the system to generate reviews.