This module is Machine Learning 700. Please answer all questions correctly.

Question One
Predicting House Prices using Real Estate Data (25 Marks)
A real estate company, UrbanNest Realty, wants to improve its property pricing strategy. 
Traditional valuation methods often lead to mispricing, affecting revenue and customer 
satisfaction. The company aims to use machine learning to predict house prices more 
accurately. 
Dataset: California Housing Prices from sklearn.datasets 
(a) Load the dataset and perform basic exploratory data analysis (EDA) by displaying summary 
statistics, handling missing values, and visualizing key features. (6 Marks) 
(b) Train a Linear Regression and a Decision Tree Regressor to predict house prices. Evaluate 
the models using Mean Absolute Error (MAE) and R² Score. (12 Marks) 
(c) Interpret the model performances and discuss how feature selection or additional 
preprocessing could improve accuracy. (7 Marks) 
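
Parts (a) and (b) can be sketched roughly as follows. This is a minimal sketch, not a model answer: the helper names `summarize` and `compare_regressors`, the 80/20 split, and the tree depth of 8 are illustrative assumptions that do not come from the question.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Part (a): summary statistics plus a missing-value count per column."""
    stats = frame.describe().T
    stats["missing"] = frame.isna().sum()
    return stats


def compare_regressors(X, y, seed=0):
    """Part (b): fit both regressors and report MAE and R² on a held-out split."""
    # Assumption: a single 80/20 train/test split; cross-validation would also work.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        # Assumption: depth capped at 8 to limit overfitting; tune as needed.
        "decision_tree": DecisionTreeRegressor(max_depth=8, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        scores[name] = {
            "mae": mean_absolute_error(y_test, pred),
            "r2": r2_score(y_test, pred),
        }
    return scores
```

For the actual dataset, `sklearn.datasets.fetch_california_housing(as_frame=True)` returns an object whose `.frame` can be passed to `summarize` and whose `.data` and `.target` can be passed to `compare_regressors`. Plots for part (a), such as histograms of key features, are left out of the sketch.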
Question Two
Customer Segmentation using Mall Customers Dataset (25 Marks)
A large retail chain, ShopEase, is struggling with ineffective marketing campaigns. The 
company wants to use machine learning to segment its customer base and deliver targeted 
promotions that improve engagement and sales. 
Dataset: Mall Customers Dataset (available from Kaggle or UCI repository) 
(a) Load the dataset and perform basic exploratory data analysis (EDA), including handling 
missing values and visualizing spending patterns. (6 Marks) 
(b) Apply k-Means Clustering to segment customers based on Annual Income and Spending 
Score. Determine the optimal number of clusters using the Elbow Method. (12 Marks) 
(c) Visualize and interpret the resulting clusters. Discuss how a retail business could use this 
information for marketing strategies. (7 Marks) 
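
One possible shape for part (b) is sketched below. The helper names `elbow_inertias` and `segment` are hypothetical, and standardizing the two features before clustering is an assumption rather than something the question requires.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def elbow_inertias(X, k_values=range(1, 11), seed=0):
    """Return {k: inertia} so the 'elbow' can be read off a plot of k vs. inertia."""
    # Assumption: features are standardized so income and score weigh equally.
    X_scaled = StandardScaler().fit_transform(X)
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_scaled)
        inertias[k] = model.inertia_  # within-cluster sum of squared distances
    return inertias


def segment(X, k, seed=0):
    """Fit k-means with the chosen k and return one cluster label per customer."""
    X_scaled = StandardScaler().fit_transform(X)
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return model.fit_predict(X_scaled)
```

With the Kaggle file loaded into a DataFrame, `X` would be the two columns for annual income and spending score (commonly named `Annual Income (k$)` and `Spending Score (1-100)`, though the exact names should be checked against the downloaded file). The elbow is the value of k after which the inertia curve flattens; that k is then passed to `segment`.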
Question Three
Credit Card Fraud Detection using Machine Learning (25 Marks)
Scenario: A major financial institution, SafeBank, is facing increased credit card fraud cases. 
Their rule-based system is failing to detect modern fraudulent techniques. The bank wants to 
implement a machine learning model to detect fraudulent transactions while minimizing false 
positives. 
Dataset: Credit Card Fraud Detection Dataset (available on Kaggle) 
(a) Load the dataset and perform exploratory data analysis (EDA) to understand fraud and 
non-fraud transactions. Use data balancing techniques if necessary. (6 Marks) 
(b) Train a Logistic Regression and a Random Forest Classifier to detect fraudulent 
transactions. Compare their performance using Precision, Recall, and F1-Score. (12 Marks) 
(c) Discuss the ethical considerations of deploying such a fraud detection system, including 
issues of false positives and customer experience. (7 Marks) 
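
Parts (a) and (b) might look something like the sketch below. The helper names `class_balance` and `compare_fraud_classifiers` are hypothetical. The sketch handles class imbalance with `class_weight="balanced"` (reweighting) instead of resampling the data; that is one option among several and is an assumption here, as are the stratified 80/20 split and the label convention 1 = fraud.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def class_balance(y):
    """Part (a): fraction of transactions labeled as fraud (assumed label 1)."""
    return float(np.mean(np.asarray(y) == 1))


def compare_fraud_classifiers(X, y, seed=0):
    """Part (b): precision, recall and F1 for the fraud class on a held-out split."""
    # Stratify so the rare fraud class appears in both train and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(),
            LogisticRegression(class_weight="balanced", max_iter=1000),
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=seed
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, pred, average="binary", zero_division=0
        )
        scores[name] = {"precision": precision, "recall": recall, "f1": f1}
    return scores
```

Accuracy is deliberately not reported: with so few fraud cases, a model that flags nothing would still score very high on accuracy, which is why the question asks for Precision, Recall, and F1-Score. Low precision corresponds directly to the false positives discussed in part (c).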
Question Four
Sentiment Analysis on Product Reviews (25 Marks)
Scenario: An e-commerce platform, ReviewMaster, is overwhelmed by the volume of 
customer reviews. The company wants to automate the sentiment analysis process to identify 
common complaints and improve product recommendations. 
Dataset: Amazon Product Reviews Dataset (available on Kaggle) 
(a) Load the dataset and preprocess the text data (cleaning, tokenization, stopword removal, 
and vectorization). (6 Marks) 
(b) Train a Naïve Bayes and a Logistic Regression model to classify reviews as positive or 
negative. Evaluate using Accuracy, Precision, and Recall. (12 Marks) 
(c) Discuss how sentiment analysis could help e-commerce businesses improve customer 
satisfaction and product recommendations. (7 Marks)
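
A rough sketch of parts (a) and (b) follows. The helper names `clean_text` and `evaluate_sentiment_models` are hypothetical, TF-IDF is only one possible vectorization, and the labels are assumed to be already binary (1 = positive, 0 = negative); how they are derived from the raw reviews depends on which Amazon dataset is used.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def clean_text(text: str) -> str:
    """Part (a) cleaning: lowercase, drop HTML tags and non-letters, squeeze spaces."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # strip HTML tags such as <br>
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return " ".join(text.split())


def evaluate_sentiment_models(texts, labels, seed=0):
    """Part (b): train both classifiers on TF-IDF features and score them."""
    cleaned = [clean_text(t) for t in texts]
    X_train, X_test, y_train, y_test = train_test_split(
        cleaned, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    models = {
        "naive_bayes": MultinomialNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    scores = {}
    for name, clf in models.items():
        # The vectorizer tokenizes and removes English stopwords before weighting.
        pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
        pipeline.fit(X_train, y_train)
        pred = pipeline.predict(X_test)
        scores[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
        }
    return scores
```

Putting the vectorizer inside the pipeline means it is fitted on the training split only, so vocabulary from the test reviews does not leak into training.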
