Poem or sarcastic code? (1)

I was searching Google for the English translation of this Hindi poem, “with time”. Little did I know it was actually sarcastic code... well, I had a suspicion... MFs!


from google.colab import drive
import pandas as pd
import numpy as np
from numpy import array
from numpy import asarray
from numpy import zeros
import nltk
from nltk.corpus import stopwords
import re
import string
from itertools import groupby
from collections import Counter
import matplotlib.pyplot as plt
from scipy.sparse import hstack
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, precision_score
from sklearn.metrics import recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from fuzzywuzzy import process

In [ ]:

pip install fuzzywuzzy

(FUZZY WUZZY WAS A “BEAR”, wasn’t he? Stupid MFs!!! All year long!!! I have been plagued by the hackers (and my roommate) calling me black bear and making reference to it constantly. I knew it was something, but not being technologically advanced, I didn’t know what! Do you know how hard it is to look someone in the face who you know is lying to you, talking shit, and making fun of you, when you can’t prove it (other than by gut intuition) and you have to respond like the dumb twat they think you are, and continue being nice and in the dark? But no, really, I want to kill them.) And they keep doing it, constantly degrading your very self-esteem, and not only them but everybody you know. Until that’s happened to you, you’ve never walked a mile in my shoes.

Collecting fuzzywuzzy
  Downloading <https://files.pythonhosted.org/packages/43/ff/74f23998ad2f93b945c0309f825be92e04e0348e062026998b5eefef4c33/fuzzywuzzy-0.18.0-py2.py3-none-any.whl>
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.18.0

In [ ]:

drive.mount('/content/drive')

`Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&scope=email https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly

Enter your authorization code: ·········· Mounted at /content/drive`

In [ ]:

# sarcasm data
with open('/content/drive/My Drive/Data Files/code-mixed analysis data/Sarcasm_tweets.txt') as f:
    lines = [line.rstrip() for line in f]

# drop blank lines
new_lines = []
for line in lines:
    if line == '':
        continue
    new_lines.append(line)

tweet_ids = []
tweets = []

# even-indexed lines go to tweet_ids, odd-indexed lines go to tweets
for i in range(len(new_lines)):
    if i%2 == 0:
        tweet_ids.append(new_lines[i])
    else:
        tweets.append(new_lines[i])

# annotations
with open('/content/drive/My Drive/Data Files/code-mixed analysis data/Sarcasm_tweet_truth.txt') as f1:
    lines1 = [line.rstrip() for line in f1]

labels = []

# keep only the odd-indexed lines as labels
for i in range(len(lines1)):
    if i%2 != 0:
        labels.append(lines1[i])

# tweets with language
f2 = open('/content/drive/My Drive/Data Files/code-mixed analysis data/Sarcasm_tweets_with_language.txt', 'r')
tokens_list = []
tokens = []
languages_list = []
languages = []
for line in f2:
    line = line.strip()
    line = line.split(' ')
    line = [token.strip() for token in line if token != '' and token != ' ' and token != '\n']
    if len(line) == 0:
        # a blank line closes the current tweet
        tokens_list.append(tokens)
        languages_list.append(languages)
        tokens = []
        languages = []
    elif len(line) == 1:
        # skip lines that do not carry both a token and a language tag
        continue
    else:
        tokens.append(line[0])
        languages.append(line[1])

# store the last tweet, which has no blank line after it
tokens_list.append(tokens)
languages_list.append(languages)
f2.close()
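The token/language loop above can be tried without the Drive files. What follows is only a minimal sketch: the function name `parse_token_language_blocks` and the `sample` text are made up for illustration, and the "one `token language` pair per line, blank line between tweets" layout is an assumption read off the loop itself, not a confirmed description of `Sarcasm_tweets_with_language.txt`.

```python
import io


def parse_token_language_blocks(handle):
    """Group 'token language' lines into one list per tweet.

    Assumed layout (inferred from the notebook loop, not verified against
    the real file): one space-separated 'token language' pair per line,
    with a blank line between tweets.
    """
    tokens_list, languages_list = [], []
    tokens, languages = [], []
    for raw in handle:
        parts = [p.strip() for p in raw.strip().split(' ') if p.strip()]
        if len(parts) == 0:
            # A blank line closes the current tweet.
            tokens_list.append(tokens)
            languages_list.append(languages)
            tokens, languages = [], []
        elif len(parts) == 1:
            # A line without a language tag is skipped, as in the notebook.
            continue
        else:
            tokens.append(parts[0])
            languages.append(parts[1])
    # The final tweet has no blank line after it, so store it here.
    tokens_list.append(tokens)
    languages_list.append(languages)
    return tokens_list, languages_list


# Hypothetical sample text built from tokens shown in the notebook output.
sample = (
    "Jab hi\nThak hi\nHai hi\nJaan hi\n. rest\n#Irony rest\n"
    "\n"
    "Triple en\nTalaq hi\n"
)
tokens_list, languages_list = parse_token_language_blocks(io.StringIO(sample))
print(tokens_list)
print(languages_list)
```

One thing this logic implies: if the input ends with a blank line, the final unconditional append adds an extra empty list, which would make the list longer than the number of tweets.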

In [ ]:

df = pd.DataFrame(data=tweet_ids, columns=['Tweet ID'])
df['Tweet'] = tweets
df['Label'] = labels
df['Tokens'] = tokens_list
df['Languages'] = languages_list
df

Out[ ]:

	Tweet ID	Tweet	Label	Tokens	Languages
0	866871160725794816	Triple Talaq par Burbak Kuchh nahi bolega	NO	[Triple, Talaq, par, Burbak, Kuchh, nahi, bolega]	[en, hi, hi, hi, hi, hi, hi]
1	880356789358743553	Batao ye uss site pr se akki sir ke verdict ni...	YES	[Batao, ye, uss, site, pr, se, akki, sir, ke, ...	[hi, hi, hi, en, hi, hi, hi, en, hi, en, hi, h...
2	877751493889105920	Hindu baheno par julam bardas nahi hoga @Tripl...	NO	[Hindu, baheno, par, julam, bardas, nahi, hoga...	[hi, hi, hi, hi, hi, hi, hi, rest, hi, hi, hi,...
3	901806457871466496	Naa bhai.. aisa nhi hai.. mere handle karne se...	NO	[Naa, bhai, .., aisa, nhi, hai, .., mere, hand...	[hi, hi, rest, hi, hi, hi, rest, hi, en, hi, h...
4	866264330748219392	#RememberingRajiv aaj agar musalman auraten tr...	NO	[#RememberingRajiv, aaj, agar, musalman, aurat...	[rest, hi, hi, hi, hi, en, hi, hi, hi, hi, hi,...
...	...	...	...	...	...
5245	256002351670898688	Khiladi anari, aur shaamat equipment ki aye! B...	NO	[Khiladi, anari, ,, aur, shaamat, equipment, k...	[hi, hi, rest, hi, hi, en, hi, hi, rest, hi, e...
5246	256306978811441152	#irony RT @techno_charan: pallu k neche chhupa...	NO	[#irony, RT, @techno_charan:, pallu, k, neche,...	[rest, hi, rest, hi, hi, hi, hi, hi, hi, hi, h...
5247	256416888568045569	Jab Thak Hai Jaan. #Irony	NO	[Jab, Thak, Hai, Jaan, ., #Irony]	[hi, hi, hi, hi, rest, rest]
5248	257194830449487872	@beeba_puttar Acha! Aur koi nae mila tha #sarc...	NO	[@beeba_puttar, Acha, !, Aur, koi, nae, mila, ...	[rest, hi, rest, hi, hi, en, hi, hi, rest, hi,...
5249	257448839827578880	@Nirmalogy sacchi mucchi mein? Yah ye bhi #Sar...	NO	[@Nirmalogy, sacchi, mucchi, mein, ?, Yah, ye,...	[rest, hi, hi, hi, rest, hi, hi, hi, rest, hi,...

5250 rows × 5 columns

In [ ]:

np.random.seed(10)
df_y = df[df.Label =="YES"]
df_n = df[df.Label == "NO"]
drop_indices = np.random.choice(df_n.index, 4000, replace=False)
df_subset_n = df_n.drop(drop_indices)
frames = [df_y , df_subset_n]
df = pd.concat(frames, ignore_index = True)
df
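The cell above picks 4000 "NO" rows at random and drops them, so the "NO" class shrinks while every "YES" row is kept. Here is a small sketch of the same idea on a toy table. The helper name `undersample` and the toy data are made up for illustration. It also differs from the notebook in two ways: it uses `np.random.default_rng` instead of `np.random.seed`, and it keeps the original row order rather than putting the "YES" rows first.

```python
import numpy as np
import pandas as pd


def undersample(df, label_col, majority_label, n_drop, seed=10):
    """Randomly drop n_drop rows whose label equals majority_label."""
    rng = np.random.default_rng(seed)
    # Index labels of the rows that belong to the majority class.
    majority_idx = df.index[df[label_col] == majority_label].to_numpy()
    # Sample without replacement so no row is picked twice.
    drop_idx = rng.choice(majority_idx, size=n_drop, replace=False)
    return df.drop(drop_idx).reset_index(drop=True)


# Toy data: 3 "YES" rows and 7 "NO" rows (hypothetical, not the tweet set).
toy = pd.DataFrame({
    "Tweet": [f"t{i}" for i in range(10)],
    "Label": ["YES"] * 3 + ["NO"] * 7,
})
balanced = undersample(toy, "Label", "NO", n_drop=4)
counts = balanced["Label"].value_counts().to_dict()
print(counts)
```

Which rows get dropped depends on the seed, but the count per label does not: dropping 4 of the 7 "NO" rows always leaves 3 "NO" and 3 "YES".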