I was searching Google for the English translation of this Hindi poem, “with time.” Little did I know it was actually sarcastic code... well, I had a suspicion. MFs!



from google.colab import drive
import pandas as pd
import numpy as np
from numpy import array, asarray, zeros
import nltk
from nltk.corpus import stopwords
import re
import string
from itertools import groupby
from collections import Counter
import matplotlib.pyplot as plt
from scipy.sparse import hstack
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from fuzzywuzzy import process  # fails until fuzzywuzzy is installed (pip install fuzzywuzzy)

In [ ]:

pip install fuzzywuzzy

(FUZZY WUZZY WAS A “BEAR,” wasn’t he? Stupid MFs!!! All year long!!! I have been plagued by the hackers (and my roommate) calling me “black bear” and referencing it constantly. I knew it was something, but not being technologically advanced, I didn’t know what! Do you know how hard it is to look someone in the face who you know is lying to you, talking shit, and making fun of you, when you can’t prove it (other than by gut intuition)? You have to respond like the dumb twat they think you are and keep being nice and acting like you’re in the dark, but really... I want to kill them.) And they keep doing it constantly, degrading your very self-esteem, and not only them but everybody you know. Until that’s happened to you, you’ve never walked a mile in my shoes.

Collecting fuzzywuzzy
  Downloading <https://files.pythonhosted.org/packages/43/ff/74f23998ad2f93b945c0309f825be92e04e0348e062026998b5eefef4c33/fuzzywuzzy-0.18.0-py2.py3-none-any.whl>
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.18.0

In [ ]:

drive.mount('/content/drive')

`Mounted at /content/drive`

In [ ]:

# sarcasm data
with open('/content/drive/My Drive/Data Files/code-mixed analysis data/Sarcasm_tweets.txt') as f:
    lines = [line.rstrip() for line in f]

# drop blank lines
new_lines = [line for line in lines if line != '']

tweet_ids = []
tweets = []

# after dropping blanks, lines alternate: tweet ID, then tweet text
for i in range(len(new_lines)):
    if i % 2 == 0:
        tweet_ids.append(new_lines[i])
    else:
        tweets.append(new_lines[i])

# annotations
with open('/content/drive/My Drive/Data Files/code-mixed analysis data/Sarcasm_tweet_truth.txt') as f1:
    lines1 = [line.rstrip() for line in f1]

labels = []

# the YES/NO labels sit on the odd-numbered (0-based) lines
for i in range(len(lines1)):
    if i % 2 != 0:
        labels.append(lines1[i])

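# tweets with language tags: each tagged line holds a token followed by its
# language tag, and a blank line ends the current tweet
with open('/content/drive/My Drive/Data Files/code-mixed analysis data/Sarcasm_tweets_with_language.txt', 'r') as f2:
    tokens_list = []
    tokens = []
    languages_list = []
    languages = []
    for line in f2:
        line = line.strip()
        line = line.split(' ')
        line = [token.strip() for token in line if token != '' and token != ' ' and token != '\n']
        if len(line) == 0:
            # blank line: store the finished tweet and start a new one
            tokens_list.append(tokens)
            languages_list.append(languages)
            tokens = []
            languages = []
        elif len(line) == 1:
            # no language tag on this line; skip it
            continue
        else:
            tokens.append(line[0])
            languages.append(line[1])

# the loop only stores a tweet when it reaches a blank line, so store the last one here
tokens_list.append(tokens)
languages_list.append(languages)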
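The language-tagged reader above treats each tagged line as a token plus its language tag and a blank line as the end of a tweet. Here is a minimal sketch of the same idea as a reusable function, assuming that same layout. The function name `read_tagged_tweets` and the sample text are hypothetical, not taken from the dataset files. Unlike the notebook loop, it splits on any whitespace and only records a tweet when it actually has tokens, so repeated or trailing blank lines do not produce empty entries.

```python
import io
from typing import Iterable, List, Tuple


def read_tagged_tweets(lines: Iterable[str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Group 'token language' lines into per-tweet token and language lists.

    Hypothetical helper: assumes one whitespace-separated 'token language'
    pair per line and a blank line between tweets.
    """
    tokens_list: List[List[str]] = []
    languages_list: List[List[str]] = []
    tokens: List[str] = []
    languages: List[str] = []

    for raw in lines:
        parts = raw.split()
        if not parts:
            # blank line: close the current tweet, but only if it has tokens
            if tokens:
                tokens_list.append(tokens)
                languages_list.append(languages)
            tokens, languages = [], []
        elif len(parts) == 1:
            # a token without a language tag is skipped, as in the notebook loop
            continue
        else:
            tokens.append(parts[0])
            languages.append(parts[1])

    # flush the final tweet when the input does not end with a blank line
    if tokens:
        tokens_list.append(tokens)
        languages_list.append(languages)
    return tokens_list, languages_list


# Hypothetical sample in the assumed layout, borrowing tokens from "Jab Thak Hai Jaan. #Irony"
sample = io.StringIO("Jab hi\nThak hi\nHai hi\nJaan hi\n. rest\n#Irony rest\n\n\nAcha hi\n! rest\n\n")
toks, langs = read_tagged_tweets(sample)
print(toks)   # [['Jab', 'Thak', 'Hai', 'Jaan', '.', '#Irony'], ['Acha', '!']]
print(langs)  # [['hi', 'hi', 'hi', 'hi', 'rest', 'rest'], ['hi', 'rest']]
```

Because this version drops empty tweets, compare `len(toks)` with the number of tweet IDs before using it. If the real file ever relies on an empty entry to keep rows aligned, the counts will differ.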

In [ ]:

df = pd.DataFrame(data=tweet_ids, columns=['Tweet ID'])
df['Tweet'] = tweets
df['Label'] = labels
df['Tokens'] = tokens_list
df['Languages'] = languages_list
df

Out[ ]:

|  | Tweet ID | Tweet | Label | Tokens | Languages |
|---|---|---|---|---|---|
| 0 | 866871160725794816 | Triple Talaq par Burbak Kuchh nahi bolega | NO | [Triple, Talaq, par, Burbak, Kuchh, nahi, bolega] | [en, hi, hi, hi, hi, hi, hi] |
| 1 | 880356789358743553 | Batao ye uss site pr se akki sir ke verdict ni... | YES | [Batao, ye, uss, site, pr, se, akki, sir, ke, ... | [hi, hi, hi, en, hi, hi, hi, en, hi, en, hi, h... |
| 2 | 877751493889105920 | Hindu baheno par julam bardas nahi hoga @Tripl... | NO | [Hindu, baheno, par, julam, bardas, nahi, hoga... | [hi, hi, hi, hi, hi, hi, hi, rest, hi, hi, hi,... |
| 3 | 901806457871466496 | Naa bhai.. aisa nhi hai.. mere handle karne se... | NO | [Naa, bhai, .., aisa, nhi, hai, .., mere, hand... | [hi, hi, rest, hi, hi, hi, rest, hi, en, hi, h... |
| 4 | 866264330748219392 | #RememberingRajiv aaj agar musalman auraten tr... | NO | [#RememberingRajiv, aaj, agar, musalman, aurat... | [rest, hi, hi, hi, hi, en, hi, hi, hi, hi, hi,... |
| ... | ... | ... | ... | ... | ... |
| 5245 | 256002351670898688 | Khiladi anari, aur shaamat equipment ki aye! B... | NO | [Khiladi, anari, ,, aur, shaamat, equipment, k... | [hi, hi, rest, hi, hi, en, hi, hi, rest, hi, e... |
| 5246 | 256306978811441152 | #irony RT @techno_charan: pallu k neche chhupa... | NO | [#irony, RT, @techno_charan:, pallu, k, neche,... | [rest, hi, rest, hi, hi, hi, hi, hi, hi, hi, h... |
| 5247 | 256416888568045569 | Jab Thak Hai Jaan. #Irony | NO | [Jab, Thak, Hai, Jaan, ., #Irony] | [hi, hi, hi, hi, rest, rest] |
| 5248 | 257194830449487872 | @beeba_puttar Acha! Aur koi nae mila tha #sarc... | NO | [@beeba_puttar, Acha, !, Aur, koi, nae, mila, ... | [rest, hi, rest, hi, hi, en, hi, hi, rest, hi,... |
| 5249 | 257448839827578880 | @Nirmalogy sacchi mucchi mein? Yah ye bhi #Sar... | NO | [@Nirmalogy, sacchi, mucchi, mein, ?, Yah, ye,... | [rest, hi, hi, hi, rest, hi, hi, hi, rest, hi,... |

5250 rows × 5 columns
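The `Languages` column makes it possible to profile how mixed each tweet is. Here is a minimal sketch using `collections.Counter`, which the notebook already imports. It assumes the tag set visible in the output (`hi`, `en`, `rest`). The helper name `language_profile` is illustrative and not part of the original notebook. The choice to leave `rest` tokens out of the denominator is also an assumption. In the sample rows those tokens cover hashtags, mentions and punctuation.

```python
from collections import Counter
from typing import Dict, List


def language_profile(languages: List[str]) -> Dict[str, float]:
    """Share of Hindi ('hi') and English ('en') among language-tagged tokens.

    Illustrative helper: tokens tagged 'rest' are excluded from the denominator.
    """
    counts = Counter(languages)  # missing tags count as 0
    tagged = counts["hi"] + counts["en"]
    if tagged == 0:
        return {"hi": 0.0, "en": 0.0}
    return {"hi": counts["hi"] / tagged, "en": counts["en"] / tagged}


# Language tags of row 0 in the output above ("Triple Talaq par Burbak Kuchh nahi bolega")
print(language_profile(["en", "hi", "hi", "hi", "hi", "hi", "hi"]))
```

On the notebook's DataFrame, the helper could be applied per row with `df['Languages'].apply(language_profile)`.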

In [ ]:

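# undersample the NO class by randomly dropping 4000 of its tweets
np.random.seed(10)  # fixed seed so the same rows are dropped on every run
df_y = df[df.Label == "YES"]
df_n = df[df.Label == "NO"]
drop_indices = np.random.choice(df_n.index, 4000, replace=False)
df_subset_n = df_n.drop(drop_indices)
frames = [df_y, df_subset_n]
# all YES rows come first, followed by the remaining NO rows
df = pd.concat(frames, ignore_index=True)
df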
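The cell above undersamples the `NO` class, then stacks all `YES` rows before the remaining `NO` rows. Here is a minimal sketch of the same undersampling on a toy DataFrame, where the tweets, labels and sizes are made up. It ends with a shuffle so the two classes are interleaved. The shuffle is an added step for illustration, not something the original cell does.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the tweet DataFrame: 3 YES rows and 10 NO rows (made-up data)
toy = pd.DataFrame({
    "Tweet": [f"tweet {i}" for i in range(13)],
    "Label": ["YES"] * 3 + ["NO"] * 10,
})

np.random.seed(10)
df_y = toy[toy.Label == "YES"]
df_n = toy[toy.Label == "NO"]

# Randomly drop 7 NO rows without replacement (the notebook drops 4000)
drop_indices = np.random.choice(df_n.index, 7, replace=False)
df_subset_n = df_n.drop(drop_indices)

balanced = pd.concat([df_y, df_subset_n], ignore_index=True)

# Added step: shuffle so YES and NO rows are not two contiguous blocks
balanced = balanced.sample(frac=1, random_state=10).reset_index(drop=True)

print(balanced.Label.value_counts())
```

If the goal is "keep exactly N rows" rather than "drop N rows", `df_n.sample(n=N, random_state=10)` expresses that directly.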