Assignment_4_DataAnalytics_Fall2023

School

Rensselaer Polytechnic Institute

Course

4960

Subject

Computer Science

Date

Dec 6, 2023

Assignment 4: Data Analytics (Fall 2023) (15% written)

Due: October 31st, 2023 (by 11:59pm ET)
Submission method: written document via LMS
Please use the following file naming for electronic submission: DataAnalytics_A4_YOURFIRSTNAME_YOURLASTNAME.xxx
Late submission policy: first time with valid reason, no penalty; otherwise 20% of score deducted each late day.

Note: Your report for this assignment should be the result of your own individual work. Take care to avoid plagiarism (“copying”), and include references to all web resources, texts, and class presentations. You may discuss the problems with other students, but do not take written notes during these discussions, and do not share your written solutions.

General Assignment: Pattern, trend, relations: model development and evaluation of housing (Brooklyn, Manhattan, Queens)

NYC Citywide Annualized Calendar Sales Update datasets available: https://data.cityofnewyork.us/City-Government/NYC-Citywide-Annualized-Calendar-Sales-Update/w2pb-icbu

The weighting score for each question is included below. Please use the question numbering below for your written responses for this assignment. Please include code (fragments and/or scripts) and the plots you generate for the questions below.

1. For any one of the Brooklyn, Manhattan, Queens sales datasets, perform the following:

a). Describe the type of patterns or trends you might look for and how you plan to model them. Describe any exploratory data analysis you performed. Include plots and other descriptions. Min. 5 sentences (1%)

b). Identify the outlier values in the data for Sales Price or for a variable you choose, and explain why you consider those data points to be outliers. Use Cook's Distance and the IQR (Interquartile Range) to identify the outlier points. (1%)

c). Conduct Multivariate Regression on the chosen dataset to predict the Sales Price using Gross Square Feet and Land Square Feet. When you conduct the multivariate regression, make sure to draw at least 3 samples from the data and compare the different results you obtained. Explain the results. Min. 5 sentences (1%)

d). Pick one or more models (these need not be restricted to the models you’ve learned so far [Decision Trees, KNN, K-Means, Random Forest]) to explore the chosen data. Interpret the model fits and indicate significance. Describe any cleaning you had to do and why. Min. 5 sentences (2%)
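
As an illustration of the two outlier checks named in question 1(b), the sketch below applies the 1.5 x IQR fence rule to a single variable and computes Cook's Distance for a two-predictor least-squares fit. It is a minimal plain-Python sketch, not a prescribed solution: the function names (`iqr_outliers`, `invert`, `cooks_distance`) are hypothetical, the toy numbers are invented for illustration and are not taken from the NYC sales data, and the course may expect a statistics package instead.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cooks_distance(predictors, response):
    """Fit y = b0 + b1*x1 + ... by least squares; return (beta, distances)."""
    X = [[1.0] + [float(v) for v in row] for row in predictors]
    n, p = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[k][i] * response[k] for k in range(n)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [response[k] - sum(b * x for b, x in zip(beta, X[k]))
             for k in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    distances = []
    for k in range(n):
        # Leverage h_kk = x_k^T (X^T X)^-1 x_k
        h = sum(X[k][i] * inv[i][j] * X[k][j]
                for i in range(p) for j in range(p))
        distances.append(resid[k] ** 2 / (p * s2) * h / (1 - h) ** 2)
    return beta, distances

# Invented toy values (not real sales records), prices in thousands of dollars.
toy_prices = [300, 320, 350, 360, 400, 410, 450, 5000]
lower, upper, outliers = iqr_outliers(toy_prices)

# Invented grid of (gross sq ft, land sq ft) in thousands, with one inflated price.
toy_rows = [(g, l) for g in (1, 2, 3) for l in (1, 2, 3, 4)]
toy_sale = [100 + 200 * g + 50 * l for g, l in toy_rows]
toy_sale[5] += 500  # deliberately perturb one observation
beta, distances = cooks_distance(toy_rows, toy_sale)
most_influential = distances.index(max(distances))
```

The IQR rule looks at one variable at a time, while Cook's Distance measures how much each observation shifts the fitted regression, so the two checks can flag different points; that difference is worth discussing in the written answer.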
2. For your chosen dataset:

a). Apply the model(s) to predict quantities of interest (that you choose). Describe (contingency table) or plot the predictions. Min. 2-3 sentences (4000-level 5%, 6000-level 3%)

b). Examine the fit(s). Perform a significance test that is suitable for the variables you are investigating and describe the results. Min. 2-3 sentences (4000-level 4%, 6000-level 3%)

c). Discuss any observations you had about the datasets/variables, other data in the dataset and/or your confidence in the result. Min. 1-2 sentences (1%)

3. 6000-level question (3%). Draw conclusions from this study about the model type and suitability/deficiencies. Describe what worked and why/why not. Min. 4-5 sentences
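
Question 2(b) leaves the choice of significance test open. One option, sketched below under stated assumptions, is a permutation test on the Pearson correlation between predicted and actual values; it is only an illustration, and a t-test on regression coefficients, an F-test, or a chi-squared test on a contingency table may suit the chosen variables better. The function names (`pearson_r`, `permutation_p_value`) and the toy numbers are hypothetical and are not taken from the NYC sales data.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null of no association."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Invented toy values (thousands of dollars), standing in for model output.
toy_actual = [310, 420, 365, 510, 640, 455, 580, 700, 390, 530]
toy_predicted = [330, 400, 380, 495, 610, 470, 600, 680, 410, 520]

r_observed = pearson_r(toy_actual, toy_predicted)
p_value = permutation_p_value(toy_actual, toy_predicted)
```

A small p-value here only says the predictions track the actual values better than a random pairing would; it does not by itself show the model is accurate, so residual plots and error measures should accompany it in the report.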