Take the first 1000 entries of the training portion of the Scikit-learn 20 Newsgroups dataset (in the original order) as the training set, and set aside the next 100 entries as the test set. Predict whether a post belongs to a political discussion group using a bag-of-words model (category includes 'talk.politics'). Provide the accuracy on the test set, the input shape of the network, and the predictions of the network for the last two entries in the test set as "accuracy on the test set, network input shape, network predictions for the last two entries in the test set as 'politics' or 'non-politics'".
Here's what I have so far:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
categories = ['talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc']
newsgroups = fetch_20newsgroups(subset='train')
train_data = newsgroups.data[:1000]
test_data = newsgroups.data[1000:1100]
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_data)
X_test = vectorizer.transform(test_data)
y_train = np.array(['talk.politics' in newsgroups.target_names[y] for y in newsgroups.target[:1000]], dtype=int)
y_test = np.array(['talk.politics' in newsgroups.target_names[y] for y in newsgroups.target[1000:1100]], dtype=int)
model = Sequential()
input_shape = X_train.shape[1]
model.add(Dense(128, input_shape=(input_shape,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=128, validation_data=(X_test, y_test))
# Evaluate the model
accuracy = model.evaluate(X_test, y_test, verbose=0)[1]
predictions = model.predict(X_test[-2:])
However, this code does not work. What am I doing wrong?
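The most likely culprit is the output layer. The targets are a single 0/1 value per post and the loss is binary_crossentropy, but the last layer is Dense(3, activation='softmax'), which produces three values per post. Depending on the Keras version, that mismatch either raises a shape error or silently trains against the wrong target. A second possible cause, which is an assumption since no error message is given, is that CountVectorizer returns a SciPy sparse matrix that some Keras versions do not accept directly. A minimal sketch of both fixes follows; the helper names build_model, to_labels and train_and_report are made up for illustration and are not part of the original code.

```python
def build_model(n_features):
    """Binary classifier: one sigmoid unit matches 0/1 targets and binary_crossentropy."""
    # Imported lazily so the label helper below can be used without Keras installed.
    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    model = Sequential()
    model.add(Dense(128, input_shape=(n_features,), activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))  # was Dense(3, activation='softmax')
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


def to_labels(probabilities, threshold=0.5):
    """Map sigmoid outputs of shape (n, 1) to 'politics' / 'non-politics'."""
    return ['politics' if row[0] >= threshold else 'non-politics'
            for row in probabilities]


def train_and_report(X_train, y_train, X_test, y_test):
    """X_train / X_test are the sparse CountVectorizer matrices from the question."""
    # Assumption: convert to dense arrays in case this Keras version rejects sparse input.
    X_train, X_test = X_train.toarray(), X_test.toarray()
    model = build_model(X_train.shape[1])
    model.fit(X_train, y_train, epochs=5, batch_size=128,
              validation_data=(X_test, y_test))
    accuracy = model.evaluate(X_test, y_test, verbose=0)[1]
    last_two = to_labels(model.predict(X_test[-2:]))
    # Accuracy on the test set, network input shape, labels for the last two test posts.
    return accuracy, (X_train.shape[1],), last_two
```

With the variables from the question, train_and_report(X_train, y_train, X_test, y_test) returns the three items the task asks for. The input shape is (vocabulary size,), which depends on the words found in the first 1000 posts, and the accuracy will vary between runs because the weights are initialised randomly.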