Ngrams with counts in the desired output format shown below

#python #n-gram

Question:

The following code gave me the output below:

         words   freq
0        hello   5
1        yes     10


I would like the same output format for ngrams(4). The results only show a frequency of 1. Can someone help me adjust the n-gram code to produce output like the above? The requirement is n-grams with frequencies, written to Excel (xlsx).

Examples of the current results:

  (('benito', 'kanchan'), 1),
 (('kanchan', 'tata'), 1),
 (('tata', 'arora'), 1),

The code so far:

import pandas as pd

df = pd.read_excel(r"Filename")

#Converting to lowercase
df['Body'] = df['Body'].apply(lambda x: " ".join(x.lower() for x in x.split()))
df['Body'].head()

#Count of Words
df['word_count'] = df['Body'].apply(lambda x: len(str(x).split(" ")))
df[['Body','word_count']].head()

#Removing Punctuation
df['Body'] = df['Body'].str.replace(r'[^\w\s]', '', regex=True)
df['Body'].head()

#Removing Stop Words
from nltk.corpus import stopwords
stop = stopwords.words('english')

df['Body'] = df['Body'].apply(lambda x: " ".join(x for x in x.split() if x not in stop))
df['Body'].head()

#df['Body'] = df['Body'].astype('|S')

# Word Count
tf1 = (df['Body']).apply(lambda x: pd.value_counts(x.split(" "))).sum(axis = 0).reset_index()
print (tf1)
tf1.columns = ['words','tf']
tf1

# Ngrams
from collections import Counter
from textblob import TextBlob
a = TextBlob(tf1['words'][0]).ngrams(4)
a = [','.join(map(str, l)) for l in a]
print(a)
counter = Counter(a)
counter.most_common(150)
counter.columns = ['ngram','tf']
counter
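
Note that the snippet above builds its n-grams from tf1['words'][0], which is a single word from the word-count table and not the full text of the Body column. As a minimal sketch of one way to get 4-gram frequencies, assuming the cleaned df['Body'] values are plain strings, the n-grams can be counted across all rows with collections.Counter. The helper name count_ngrams and the sample sentences are hypothetical stand-ins, not part of the original code, and this uses plain string splitting in place of TextBlob:

```python
from collections import Counter

def count_ngrams(texts, n=4):
    """Count word n-grams across all texts; each n-gram is a tuple of n words."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        # Zipping n shifted views of the token list yields consecutive n-grams;
        # texts with fewer than n words contribute nothing.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

# Hypothetical sample standing in for the cleaned df['Body'] column
bodies = [
    "benito kanchan tata arora",
    "benito kanchan tata arora sharma",
]
counts = count_ngrams(bodies, n=4)

# (ngram, frequency) rows, most frequent first, capped at 150 as in the question
rows = [(" ".join(gram), freq) for gram, freq in counts.most_common(150)]
print(rows)
```

With the real data, df['Body'] would be passed in place of bodies. The rows could then be loaded with pd.DataFrame(rows, columns=['ngram', 'tf']) and written out with DataFrame.to_excel('ngrams.xlsx', index=False), where the file name is only an example.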
