This module is Machine Learning 700. Please answer all questions correctly.
Question One
Predicting House Prices using Real Estate Data (25 Marks)
A real estate company, UrbanNest Realty, wants to improve its property pricing strategy.
Traditional valuation methods often lead to mispricing, affecting revenue and customer
satisfaction. The company aims to use machine learning to predict house prices more
accurately.
Dataset: California Housing Prices from sklearn.datasets
(a) Load the dataset and perform basic exploratory data analysis (EDA) by displaying summary
statistics, handling missing values, and visualizing key features. (6 Marks)
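For part (a), here is a minimal sketch using pandas and scikit-learn. It assumes `fetch_california_housing(as_frame=True)` (scikit-learn 0.23 or later) and the column names that loader produces (`MedInc` as a feature, `MedHouseVal` as the target). The function names are illustrative. Median imputation is one reasonable defensive choice, not a requirement.

```python
import pandas as pd


def load_california() -> pd.DataFrame:
    """Return the California Housing data as one DataFrame (downloaded and cached on first call)."""
    from sklearn.datasets import fetch_california_housing

    return fetch_california_housing(as_frame=True).frame  # 8 features + "MedHouseVal" target


def basic_eda(df: pd.DataFrame) -> pd.DataFrame:
    """Print summary statistics and missing-value counts, then return a gap-free copy."""
    print(df.describe().T)  # count, mean, std, min, quartiles and max for each column
    print("Missing values per column:\n", df.isna().sum())
    # The sklearn copy normally has no gaps; median imputation is a defensive default.
    return df.fillna(df.median(numeric_only=True))


def plot_key_features(df: pd.DataFrame, target: str = "MedHouseVal") -> None:
    """Histograms of all columns, income vs. house value, and correlations with the target."""
    import matplotlib.pyplot as plt  # lazy import: basic_eda works even without matplotlib

    df.hist(bins=50, figsize=(12, 8))
    plt.tight_layout()
    df.plot.scatter(x="MedInc", y=target, alpha=0.1, title="Median income vs. house value")
    print(df.corr()[target].sort_values(ascending=False))
    plt.show()


if __name__ == "__main__":
    housing = basic_eda(load_california())
    plot_key_features(housing)
```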
(b) Train a Linear Regression and a Decision Tree Regressor to predict house prices. Evaluate
the models using Mean Absolute Error (MAE) and R² Score. (12 Marks)
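For part (b), this sketch trains both required models on the same 80/20 split and reports test-set MAE and R². The split ratio, `random_state`, and the decision tree's `max_depth=8` are illustrative assumptions. An unrestricted tree usually memorises the training data and overfits.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def evaluate_regressors(X: pd.DataFrame, y: pd.Series, seed: int = 42) -> pd.DataFrame:
    """Fit both models on an 80/20 split and return test-set MAE and R² per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "Linear Regression": LinearRegression(),
        # Depth cap is an illustrative choice to limit overfitting.
        "Decision Tree": DecisionTreeRegressor(max_depth=8, random_state=seed),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rows.append({
            "model": name,
            "MAE": mean_absolute_error(y_test, pred),
            "R2": r2_score(y_test, pred),
        })
    return pd.DataFrame(rows).set_index("model")


if __name__ == "__main__":
    from sklearn.datasets import fetch_california_housing

    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    # MedHouseVal is in units of $100,000, so MAE is too.
    print(evaluate_regressors(X, y))
```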
(c) Interpret the models' performance and discuss how feature selection or additional
preprocessing could improve accuracy. (7 Marks)
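For part (c), one way to test whether preprocessing helps is to compare cross-validated R² across variants of the linear model. The sketch assumes two illustrative options: univariate feature selection with `SelectKBest(f_regression)`, and fitting on a log-transformed target through `TransformedTargetRegressor`. The value of `k` and the five shuffled folds are also assumptions. Because the selector sits inside the pipeline, features are chosen on each training fold only, which avoids leaking test information into the selection.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_preprocessing(X: pd.DataFrame, y: pd.Series, k: int = 6) -> dict[str, float]:
    """Mean 5-fold cross-validated R² for a plain linear model and two preprocessed variants."""
    k = min(k, X.shape[1])  # SelectKBest cannot keep more features than exist
    candidates = {
        "baseline": LinearRegression(),
        # Keep the k features with the strongest univariate linear relationship to y.
        "scaled + SelectKBest": make_pipeline(
            StandardScaler(), SelectKBest(f_regression, k=k), LinearRegression()
        ),
        # Fit on log1p(y) to reduce right skew; predictions are mapped back with expm1.
        "log-target": TransformedTargetRegressor(
            regressor=make_pipeline(StandardScaler(), LinearRegression()),
            func=np.log1p,
            inverse_func=np.expm1,
        ),
    }
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="r2").mean())
        for name, model in candidates.items()
    }


if __name__ == "__main__":
    from sklearn.datasets import fetch_california_housing

    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    for name, score in compare_preprocessing(X, y).items():
        print(f"{name:>22}: R2 = {score:.3f}")
```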
Question Two
Customer Segmentation using Mall Customers Dataset (25 Marks)
A large retail chain, ShopEase, is struggling with ineffective marketing campaigns. The
company wants to use machine learning to segment its customer base and deliver targeted
promotions that improve engagement and sales.
Dataset: Mall Customers Dataset (available from Kaggle or the UCI repository)
(a) Load the dataset and perform basic exploratory data analysis (EDA), including handling
missing values and visualizing spending patterns. (6 Marks)
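Here is a minimal sketch for part (a). It assumes the widely circulated Kaggle file `Mall_Customers.csv` with columns `CustomerID`, `Genre` or `Gender`, `Age`, `Annual Income (k$)` and `Spending Score (1-100)`, so check these against your copy. The missing-value policy is illustrative: rows lacking either clustering feature are dropped, and other numeric gaps are median-filled.

```python
import pandas as pd

# Assumed column names from the common Kaggle "Mall_Customers.csv".
INCOME = "Annual Income (k$)"
SCORE = "Spending Score (1-100)"


def load_mall(path: str = "Mall_Customers.csv") -> pd.DataFrame:
    """Read the CSV and normalise the gender column name (some copies spell it "Genre")."""
    return pd.read_csv(path).rename(columns={"Genre": "Gender"})


def mall_eda(df: pd.DataFrame) -> pd.DataFrame:
    """Print summary statistics and missing counts; return a cleaned copy."""
    print(df.describe())
    print("Missing values:\n", df.isna().sum())
    # Rows without both clustering features are unusable; median-fill other numeric gaps.
    df = df.dropna(subset=[INCOME, SCORE])
    df = df.fillna(df.median(numeric_only=True))
    if "Gender" in df.columns:
        print(df.groupby("Gender")[SCORE].mean())  # average spending by gender
    return df


def plot_spending(df: pd.DataFrame) -> None:
    """Age and spending-score histograms plus an income vs. spending scatter."""
    import matplotlib.pyplot as plt  # lazy import: mall_eda works without matplotlib

    _, axes = plt.subplots(1, 3, figsize=(15, 4))
    df["Age"].plot.hist(bins=20, ax=axes[0], title="Age")
    df[SCORE].plot.hist(bins=20, ax=axes[1], title="Spending Score")
    df.plot.scatter(x=INCOME, y=SCORE, ax=axes[2], title="Income vs. spending")
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    customers = mall_eda(load_mall())
    plot_spending(customers)
```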
(b) Apply k-Means Clustering to segment customers based on Annual Income and Spending
Score. Determine the optimal number of clusters using the Elbow Method. (12 Marks)
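For part (b), the Elbow Method fits k-Means over a range of k and plots inertia (the within-cluster sum of squared distances) against k. The sketch below does this and adds a hypothetical helper, `elbow_k`. The helper picks the point farthest below the straight line joining the first and last points of the normalised curve, a geometric heuristic similar in spirit to the Kneedle method. Treat its answer as a suggestion to confirm on the plot. The column names, `k_max=10` and `n_init=10` are assumptions. Income and spending score are on comparable scales, so the features are left unscaled here; standardising them is a reasonable alternative.

```python
import numpy as np
from sklearn.cluster import KMeans

INCOME = "Annual Income (k$)"  # assumed Kaggle column names
SCORE = "Spending Score (1-100)"


def elbow_inertias(X: np.ndarray, k_max: int = 10, seed: int = 42) -> list[float]:
    """Inertia (within-cluster sum of squared distances) for k = 1..k_max."""
    return [
        float(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_)
        for k in range(1, k_max + 1)
    ]


def elbow_k(inertias: list[float]) -> int:
    """Heuristic elbow: the k whose point lies farthest below the chord of the curve."""
    y = np.asarray(inertias, dtype=float)
    k = np.arange(1, len(y) + 1, dtype=float)
    # Rescale both axes to [0, 1] so inertia's large scale does not dominate.
    x_n = (k - k[0]) / (k[-1] - k[0])
    y_n = (y - y[-1]) / (y[0] - y[-1])
    # The chord from (0, 1) to (1, 0) is x + y = 1; the gap below it is
    # proportional to the perpendicular distance from the chord.
    return int(k[np.argmax(1.0 - x_n - y_n)])


if __name__ == "__main__":
    import matplotlib.pyplot as plt
    import pandas as pd

    df = pd.read_csv("Mall_Customers.csv")  # assumed local file name
    inertias = elbow_inertias(df[[INCOME, SCORE]].to_numpy())
    plt.plot(range(1, len(inertias) + 1), inertias, marker="o")
    plt.xlabel("Number of clusters k")
    plt.ylabel("Inertia")
    plt.title("Elbow Method")
    plt.show()
    print("Suggested k:", elbow_k(inertias))
```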
(c) Visualize and interpret the resulting clusters. Discuss how a retail business could use this
information for marketing strategies. (7 Marks)
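For part (c), this sketch fits k-Means with the chosen k, attaches a cluster label to each customer, and summarises each segment by mean income, mean spending score and size. The scatter plot colours customers by cluster and marks the cluster means. The column names are the same assumed Kaggle names, and `segment_customers` and `plot_segments` are illustrative names. The profile table is the part a marketing team would act on. For example, a high-income, low-spending segment calls for different promotions than a low-income, high-spending one.

```python
import pandas as pd
from sklearn.cluster import KMeans

INCOME = "Annual Income (k$)"  # assumed Kaggle column names
SCORE = "Spending Score (1-100)"


def segment_customers(df: pd.DataFrame, k: int, seed: int = 42) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Label each customer with a k-Means cluster and build a per-cluster profile."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labelled = df.assign(Cluster=km.fit_predict(df[[INCOME, SCORE]]))
    profile = (
        labelled.groupby("Cluster")[[INCOME, SCORE]]
        .mean()  # at convergence these approximately equal the fitted centroids
        .assign(Size=labelled["Cluster"].value_counts())
        .sort_values(INCOME)
    )
    return labelled, profile


def plot_segments(labelled: pd.DataFrame, profile: pd.DataFrame) -> None:
    """Scatter customers coloured by cluster, with cluster means marked."""
    import matplotlib.pyplot as plt  # lazy import: segment_customers works without matplotlib

    plt.scatter(labelled[INCOME], labelled[SCORE], c=labelled["Cluster"], cmap="tab10", s=20)
    plt.scatter(profile[INCOME], profile[SCORE], c="black", marker="X", s=200, label="Cluster means")
    plt.xlabel(INCOME)
    plt.ylabel(SCORE)
    plt.legend()
    plt.show()


if __name__ == "__main__":
    customers = pd.read_csv("Mall_Customers.csv")  # assumed local file name
    # k=5 is a placeholder: substitute the k your elbow plot suggests.
    labelled, profile = segment_customers(customers, k=5)
    print(profile.round(1))
    plot_segments(labelled, profile)
```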
Question Three
Credit Card Fraud Detection using Machine Learning (25 Marks)
Scenario: A major financial institution, SafeBank, is facing increased credit card fraud cases.
Their rule-based system is failing to detect modern fraud techniques. The bank wants to
implement a machine learning model to detect fraudulent transactions while minimizing false
positives.
Dataset: Credit Card Fraud Detection Dataset (available on Kaggle)
(a) Load the dataset and perform exploratory data analysis (EDA) to understand fraud and
non-fraud transactions. Use data balancing techniques if necessary. (6 Marks)
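Here is a minimal sketch for part (a). It assumes the Kaggle `creditcard.csv` layout: `Time`, anonymised PCA features `V1` to `V28`, `Amount`, and `Class` with 1 meaning fraud. The file is highly imbalanced, with fraud well under 1% of rows. As one balancing option, the sketch uses random undersampling of the majority class with `sklearn.utils.resample`. SMOTE (from the separate imbalanced-learn package) and class weights are common alternatives. Whatever the method, balance only the training split so that evaluation reflects the real fraud rate.

```python
import pandas as pd
from sklearn.utils import resample

TARGET = "Class"  # 1 = fraud, 0 = legitimate (assumed Kaggle file layout)


def fraud_eda(df: pd.DataFrame) -> None:
    """Class balance, missing values, and transaction amounts per class."""
    counts = df[TARGET].value_counts()
    print(counts)
    print(f"Fraud rate: {counts.get(1, 0) / len(df):.4%}")
    print("Missing values:", int(df.isna().sum().sum()))
    print(df.groupby(TARGET)["Amount"].describe())  # do fraud amounts differ?


def undersample(train: pd.DataFrame, ratio: float = 1.0, seed: int = 42) -> pd.DataFrame:
    """Randomly drop legitimate rows so roughly `ratio` legitimate rows remain per fraud row.

    Apply to the TRAINING split only; the test split must keep the real class balance.
    """
    fraud = train[train[TARGET] == 1]
    legit = train[train[TARGET] == 0]
    n_keep = min(len(legit), int(len(fraud) * ratio))
    legit_down = resample(legit, replace=False, n_samples=n_keep, random_state=seed)
    return pd.concat([fraud, legit_down]).sample(frac=1, random_state=seed)  # shuffle rows


if __name__ == "__main__":
    from sklearn.model_selection import train_test_split

    data = pd.read_csv("creditcard.csv")  # assumed local path to the Kaggle file
    fraud_eda(data)
    train, test = train_test_split(data, test_size=0.3, stratify=data[TARGET], random_state=42)
    balanced_train = undersample(train)  # test keeps its original imbalance
    print(balanced_train[TARGET].value_counts())
```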
(b) Train a Logistic Regression and a Random Forest Classifier to detect fraudulent
transactions. Compare their performance using Precision, Recall, and F1-Score. (12 Marks)
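For part (b), this sketch compares the two classifiers on a stratified hold-out set. It uses class weighting (`class_weight="balanced"`) as the balancing strategy because it discards no data; training on an undersampled set is an equally valid choice. The 70/30 split, `n_estimators=200` and the scaling step for Logistic Regression are illustrative assumptions. Accuracy is deliberately left out, because a model that flags nothing would still score close to 100% on data this imbalanced.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_fraud_models(X: pd.DataFrame, y: pd.Series, seed: int = 42) -> pd.DataFrame:
    """Fit both classifiers and report Precision, Recall and F1 for the fraud class (1)."""
    # Stratify so the rare fraud class appears in both splits at its real rate.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        # Scaling helps the solver: Time and Amount are on far larger scales than V1 to V28.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(
            n_estimators=200, class_weight="balanced", random_state=seed
        ),
    }
    rows = []
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        rows.append({
            "model": name,
            "precision": precision_score(y_te, pred, zero_division=0),
            "recall": recall_score(y_te, pred, zero_division=0),
            "f1": f1_score(y_te, pred, zero_division=0),
        })
    return pd.DataFrame(rows).set_index("model")


if __name__ == "__main__":
    data = pd.read_csv("creditcard.csv")  # assumed local path to the Kaggle file
    print(compare_fraud_models(data.drop(columns="Class"), data["Class"]).round(3))
```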
(c) Discuss the ethical considerations of deploying such a fraud detection system, including
issues of false positives and customer experience. (7 Marks)
Question Four
Sentiment Analysis on Product Reviews (25 Marks)
Scenario: An e-commerce platform, ReviewMaster, is overwhelmed by the volume of
customer reviews. The company wants to automate the sentiment analysis process to identify
common complaints and improve product recommendations.
Dataset: Amazon Product Reviews Dataset (available on Kaggle)
(a) Load the dataset and preprocess the text data (cleaning, tokenization, stopword removal,
and vectorization). (6 Marks)
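Here is a minimal preprocessing sketch for part (a) that uses only the standard library and scikit-learn; NLTK or spaCy are common alternatives for tokenization and stopwords. The pipeline works in four steps:

- Cleaning lowercases the text and strips HTML tags and URLs.
- Tokenization uses a simple regular expression.
- Stopwords come from scikit-learn's built-in English list, minus negation words.
- Vectorization uses TF-IDF over unigrams and bigrams.

The token pattern, the kept negations and `min_df` are assumptions. Keeping "not" matters for sentiment, because removing it turns "not good" into "good".

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")  # words, optionally with one apostrophe ("don't")
# Keep negations: removing "not"/"no" would turn "not good" into "good".
STOP_WORDS = ENGLISH_STOP_WORDS - {"not", "no", "nor"}


def clean_text(text: str) -> str:
    """Lowercase and strip HTML tags and URLs."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)  # HTML tags such as <br/>
    return re.sub(r"https?://\S+", " ", text)  # URLs


def tokenize(text: str) -> list[str]:
    """Clean, split into word tokens, and drop stopwords and single letters."""
    return [t for t in TOKEN_RE.findall(clean_text(text)) if t not in STOP_WORDS and len(t) > 1]


def build_vectorizer(min_df: int = 2) -> TfidfVectorizer:
    """TF-IDF over unigrams and bigrams built from the custom tokens."""
    return TfidfVectorizer(
        tokenizer=tokenize,
        lowercase=False,     # clean_text already lowercases
        token_pattern=None,  # unused when a custom tokenizer is supplied
        ngram_range=(1, 2),
        min_df=min_df,       # ignore terms seen in fewer than min_df reviews
    )


if __name__ == "__main__":
    vec = build_vectorizer(min_df=1)
    X = vec.fit_transform(["Great phone, works great!", "Not good at all.", "Battery is not good <br/>"])
    print(X.shape, sorted(vec.vocabulary_))
```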
(b) Train a Naïve Bayes and a Logistic Regression model to classify reviews as positive or
negative. Evaluate using Accuracy, Precision, and Recall. (12 Marks)
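For part (b), this sketch trains Multinomial Naïve Bayes and Logistic Regression on TF-IDF features and reports Accuracy, Precision and Recall on a held-out set. Many Amazon review datasets on Kaggle provide a star rating rather than a sentiment label. The hypothetical helper `ratings_to_labels` therefore assumes that 4 to 5 stars is positive and 1 to 2 is negative, and drops 3-star reviews. The file and column names in the `__main__` block are also hypothetical. The vectorizer sits inside each pipeline so it is fit on training text only. The split size and `max_iter` are illustrative.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Built-in English stopwords minus negations, which carry sentiment.
STOP_WORDS = sorted(ENGLISH_STOP_WORDS - {"not", "no", "nor"})


def ratings_to_labels(ratings):
    """Hypothetical rule: 4-5 stars -> 1 (positive), 1-2 -> 0 (negative), 3 -> None (drop)."""
    return [1 if r >= 4 else 0 if r <= 2 else None for r in ratings]


def compare_sentiment_models(texts, labels, seed: int = 42) -> dict[str, dict[str, float]]:
    """Train both classifiers on TF-IDF features; report Accuracy, Precision and Recall."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    models = {
        "Naive Bayes": MultinomialNB(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, clf in models.items():
        # The vectorizer lives inside the pipeline, so it is fit on training text only.
        model = make_pipeline(TfidfVectorizer(stop_words=STOP_WORDS, ngram_range=(1, 2)), clf)
        pred = model.fit(X_tr, y_tr).predict(X_te)
        results[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred, zero_division=0),
            "recall": recall_score(y_te, pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    import pandas as pd

    reviews = pd.read_csv("amazon_reviews.csv")  # hypothetical file and column names
    reviews["label"] = ratings_to_labels(reviews["rating"])
    reviews = reviews.dropna(subset=["label"])
    print(compare_sentiment_models(reviews["text"].fillna(""), reviews["label"].astype(int)))
```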
(c) Discuss how sentiment analysis could help e-commerce businesses improve customer
satisfaction and product recommendations. (7 Marks)

