Assignment 01

School

Humber College

Course

4000

Subject

Statistics

Date

Feb 20, 2024

Pages

3

Uploaded by UltraCloverAlligator20

Assignment 01 (05%)

I. About the Data

The Boston Housing dataset contains data collected by the US Census Service concerning housing in the area of Boston, Massachusetts. It was obtained from the StatLib archive (http://lib.stat.cmu.edu/datasets/boston). The dataset has 167 cases. The data was originally published in: Harrison Jr., David, and Daniel L. Rubinfeld. "Hedonic housing prices and the demand for clean air." Journal of Environmental Economics and Management 5.1 (1978): 81-102.

The BostonHousing.xlsx dataset has 11 attributes. The dataset comes with different imperfections (missing values and outliers). As described earlier, most algorithms will not process records with these imperfections.

II. Requirements

A. Review such techniques, data, and examples, with references.

B. Use the provided data file in the following tasks:
   1. For every predictor except PTRATIO, perform the necessary "Handling Missing Data" operations on the missing values and highlight them.
   2. Find possible outliers in the PTRATIO predictor. The possible causes of outliers are:
      (a) A non-numeric value was typed.
      (b) The decimal place was shifted during data entry.
      (c) The value is a genuine outlier.
      Highlight the cells with outlier cases and state the possible cause by indicating a, b, or c.

C. Use the provided data file in the following tasks:
   1. Substitute the missing data with NaN (not a number).
   2. Write and provide Python code to implement omission and imputation.

D. Compute the mean, median, min, max, and standard deviation for each of the quantitative variables.

E. Plot a histogram for each of the quantitative variables. Based on the histograms and summary statistics, answer the following questions:
   i. Which variables have the largest variability?
   ii. Which variables appear skewed?
   iii. Are there any values that seem extreme?

F. Plot a side-by-side box plot comparing any two variables. Explain what this plot shows.

G. Compute the correlation table for the quantitative variables. In addition, generate a matrix plot (heatmap) for these variables.
   i. Which pair of variables is most strongly correlated?
   ii. How can we reduce the number of variables based on these correlations?
   iii. How would the correlations change if we normalized the data first?

III. Deliverables

A report (max. 10 pages). Feel free to choose the report format.
All the Python code used to develop the models (provide all the developed code as .pdf and .ipynb files).

IV. Instructions

This assignment is to be completed in groups.
The due date is the next lecture time; submit what you have before next week's lecture.
Late submissions will NOT be marked (no excuses).
The solutions must be submitted via Blackboard through the assignment's link.
Follow the accepted file formats: Word or editable PDF file (no images), and the .ipynb file.
Any feedback or issue regarding the assignment grades should be raised in a clear email within a week after grading (please use Blackboard email).
A zero-tolerance policy regarding plagiarism and cheating is in effect.
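Requirement C names two ways of handling missing data: omission and imputation. The following is a minimal sketch of both, assuming pandas and NumPy are available. The column names RM and AGE and every value in the toy table are invented for illustration and are not taken from BostonHousing.xlsx; median imputation is only one of several possible imputation choices.

```python
import pandas as pd

# Toy stand-in for the assignment data. The column names and values are
# assumptions made up for this sketch, NOT the contents of BostonHousing.xlsx.
raw = pd.DataFrame({
    "RM": [6.5, "?", 7.1, 5.9, ""],
    "AGE": [65.2, 78.9, None, 45.8, 54.2],
})

# Substitute missing data with NaN: anything that cannot be parsed as a
# number (placeholders such as "?" or an empty cell) is coerced to NaN.
df = raw.apply(pd.to_numeric, errors="coerce")

# Omission: drop every row that contains at least one NaN.
omitted = df.dropna()

# Imputation: fill the NaN entries of each column with that column's median.
imputed = df.fillna(df.median())

print(df.isna().sum())   # number of missing values per column
print(omitted)
print(imputed)
```

Omission keeps only complete records and so shrinks the dataset, while imputation keeps every record at the cost of introducing estimated values. With real data, the file would be loaded first, for example with pd.read_excel.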
V. Important Notes about the Report Submission

The code must be submitted in notebook (.ipynb) and editable Adobe (.pdf) formats. The report must be submitted in Word or Adobe format. In other words, submit two separate documents:
   1. Report (.docx or .pdf)
   2. Code (.ipynb and .pdf)

Report body:
Make your report a complete story. Do not structure your report as Part A, B, etc., or as answers to questions; that structure belongs in your code.
Select a name for your assignment that is related to the topics.
Talk about the data; assume the reader doesn't know anything about the assignment or the data.
Add a literature review about the data, algorithms, techniques, etc.
Make tables and explain them, so the reader can see what you are talking about. Do not use images in your tables (use editable tables).
Figures and tables should be numbered (e.g., Fig. 1, Table 1) and clearly explained in your text.
Do not use code in your report unless it is very important. You can use pseudocode.
Discuss your results and compare them with others (if any).
If possible, focus on business objectives.
Add conclusions.

References:
Write references properly. Use Google Scholar citations.
Cite them in your text whenever you use them.
Do not add the URL for a reference unless it is a website.
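For the summary statistics and correlation table named in requirements D and G, a short sketch may help when preparing the editable tables for the report. It assumes pandas and NumPy and uses synthetic data; the column names A, B, and C and all values are hypothetical and unrelated to the assignment file. The z-score used here is one common form of normalization, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; NOT the assignment dataset.
rng = np.random.default_rng(0)
x = rng.normal(loc=6.0, scale=0.7, size=200)
df = pd.DataFrame({
    "A": x,
    "B": 3.0 * x + rng.normal(scale=1.0, size=200),  # built to correlate with A
    "C": rng.exponential(scale=2.0, size=200),       # right-skewed column
})

# Mean, median, min, max, and standard deviation for each column.
summary = df.agg(["mean", "median", "min", "max", "std"])

# Pearson correlation table.
corr = df.corr()

# Z-score normalization: subtract each column's mean, divide by its std.
normalized = (df - df.mean()) / df.std()
corr_normalized = normalized.corr()

# Pearson correlation is unaffected by shifting a column or rescaling it by
# a positive constant, so both tables agree up to floating-point error.
same = np.allclose(corr, corr_normalized)

print(summary.round(3))
print(corr.round(3))
print("Correlation unchanged by normalization:", same)
```

The resulting tables can be pasted into the report as editable tables, and a heatmap of corr could be drawn with a plotting library, for example seaborn.heatmap(corr, annot=True).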