Hands-On Exercise 2


School

California State University, Los Angeles

Course

4200

Subject

Computer Science

Date

Feb 20, 2024

Kathlyne Alilain
Dr. Li
CIS 4200-01
20 Feb. 2024

Hands-On Exercise 2

1. Data Understanding:
- What does each observation represent? Each observation (row) represents one car; the columns are the variables or attributes that describe it.
- How many variables are there? 12
- Which data attributes are categorical, and which are numeric?
Categorical: manufacturer, model, year, trans, drv, fl, class
Numeric: displ, cyl, cty, hwy

2. Data Preprocessing:
- Check for duplicate and missing data. Are there any duplicate rows? Are there any missing values? There are 9 duplicate rows and 1 missing value.
- Propose solutions for handling missing data. Deletion and imputation: use the remove duplicates function to drop the duplicate rows, and either delete the row with the missing value or fill the gap using values from similar records.

3. Data Enrichment: Create a new variable called "mpg" that represents the average of city ("cty") and highway ("hwy") miles per gallon.
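Steps 2 and 3 can be sketched in plain Python. This is a minimal illustration under assumptions, not the tool used in the exercise: each car is assumed to be a dictionary of column values with `None` marking a missing cell, and the sample rows, the function names, and the choice to impute from rows sharing the same `model` are all hypothetical.

```python
from statistics import mean

# Hypothetical toy rows standing in for the course's data file; None marks a missing value.
SAMPLE_ROWS = [
    {"model": "a4", "cty": 18, "hwy": 29},
    {"model": "a4", "cty": 18, "hwy": 29},    # exact duplicate of the row above
    {"model": "a4", "cty": None, "hwy": 31},  # missing city mpg
    {"model": "a4", "cty": 20, "hwy": 31},
]


def row_key(row):
    """Hashable key for a row so that identical rows compare equal."""
    return tuple(sorted(row.items()))


def count_duplicates(rows):
    """Count rows that exactly repeat an earlier row."""
    seen, dupes = set(), 0
    for row in rows:
        key = row_key(row)
        if key in seen:
            dupes += 1
        seen.add(key)
    return dupes


def drop_duplicates(rows):
    """Keep only the first occurrence of each distinct row."""
    seen, kept = set(), []
    for row in rows:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def count_missing(rows):
    """Count individual cells that are missing (None)."""
    return sum(value is None for row in rows for value in row.values())


def drop_missing(rows):
    """Deletion: keep only rows with no missing cells."""
    return [row for row in rows if None not in row.values()]


def impute_from_similar(rows, column, group_key):
    """Imputation: fill a missing `column` with the mean of rows sharing `group_key`."""
    filled = []
    for row in rows:
        if row[column] is None:
            peers = [r[column] for r in rows
                     if r[group_key] == row[group_key] and r[column] is not None]
            if peers:  # leave the gap if no similar record has a value
                row = {**row, column: mean(peers)}
        filled.append(row)
    return filled


def add_mpg(rows):
    """Step 3: mpg is the average of cty and hwy (both must be present)."""
    return [{**row, "mpg": (row["cty"] + row["hwy"]) / 2} for row in rows]


if __name__ == "__main__":
    print("duplicate rows:", count_duplicates(SAMPLE_ROWS))
    print("missing cells:", count_missing(SAMPLE_ROWS))
    cleaned = impute_from_similar(drop_duplicates(SAMPLE_ROWS), "cty", "model")
    for row in add_mpg(cleaned):
        print(row)
```

Deletion (`drop_missing`) is the simpler option when only one value is missing; imputation keeps the row but relies on the "similar" records being a fair stand-in for the missing one.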

4. Understanding Numerical Variables: Calculate and present descriptive statistics (mean, median, range, variance, and standard deviation) for the numeric variable "mpg". Compare and explain the mean and standard deviation.
The mean is 20.24 and the standard deviation is 5.063234. The mean is the average mpg across all cars. The standard deviation measures how much individual cars' mpg values vary around that mean: a typical car's mpg lies roughly 5.06 above or below 20.24. It describes the spread of the cars, not how far the mean itself might be off.

5. Understanding categorical variables: What are the unique values of drive train type (drv)? What is the mode for the "drv" variable?
The unique values of drv are 4 (four-wheel drive), f (front-wheel drive), and r (rear-wheel drive). The mode is f.
Create bar plots to illustrate the distribution of "drv". Compare the distribution of "drv" in 1999 and 2008. Summarize the difference.

The distributions of drv in 1999 and 2008 are similar. Front-wheel drive cars were slightly more common in 1999, while four-wheel drive is slightly more prevalent in 2008. Rear-wheel drive also increased slightly from 1999 to 2008, but the change is small.

6. Box Plots for numeric variables: Use a box plot to show the summary distribution of the numeric variable "mpg". Report the key statistics (Q1, median, Q3, max, min) displayed in the box plot.
Q1 = 15.75, median (Q2) = 20.5, Q3 = 23.5, max = 35, min = 10.5. The mean, 20.24, is close to the median but is a separate statistic.
What is the "mpg" range of the middle 50% of cars in the dataset? 15.75 to 23.5, i.e. from Q1 to Q3 (an interquartile range of 7.75).
Box plots by year: Use a box plot to show the distribution of the "mpg" variable in 1999 and 2008. Summarize the difference.
The range is slightly larger in 2008 (10.5 to 32.5) than in 1999 (13 to 30.5). The interquartile range is also larger in 2008, spanning 16 to 24. In 1999 there were 3 outliers, whereas 2008 had none. The means and medians (Q2) of the two years are very similar.
Box Plots by Classes: Use a box plot to show the distribution of the "mpg" variable in different classes. Summarize the difference.

Most classes are tightly grouped and not skewed; the subcompact class is the exception. Their means and medians (Q2) fall within similar ranges of each other. The whiskers of the subcompact class are much longer than those of the other classes. The compact, subcompact, SUV, and minivan classes all have some outliers.

7. Histogram for numeric variables: Use a histogram to show the detailed distribution of the numeric variable "mpg". Explore different bin widths and discuss what is a proper bin width.
The bins must be consecutive, non-overlapping intervals of equal size. A proper bin width is wide enough to smooth out random noise but narrow enough to keep the shape of the distribution visible; comparing several widths helps reveal patterns or trends that a single choice might hide.
Use a bin width of 4. How many models fall into the most common range? About 4 to 5 models.
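The binning behind a histogram can be sketched in Python, assuming the mpg values are available as a plain list (the sample values and function names are hypothetical). It builds consecutive, non-overlapping bins of equal width and, as one possible starting point for choosing a width, computes the Freedman-Diaconis rule of thumb (2 × IQR / n^(1/3)). That rule is a swapped-in heuristic, not something the exercise prescribes.

```python
import math
from statistics import quantiles

# Hypothetical toy mpg values; substitute the real mpg column.
SAMPLE_MPG = [10.5, 12, 14, 15, 21, 23.5]


def histogram_counts(values, bin_width, start=None):
    """Counts per bin, where bins are consecutive, non-overlapping [lo, lo + width) intervals."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if start is None:
        # First edge: the largest multiple of bin_width not above the minimum.
        start = math.floor(min(values) / bin_width) * bin_width
    if start > min(values):
        raise ValueError("start must not exceed the smallest value")
    # max(0, ...) guards against floating-point round-off just below the first edge.
    indices = [max(0, int((v - start) // bin_width)) for v in values]
    counts = [0] * (max(indices) + 1)
    for i in indices:
        counts[i] += 1
    return [(start + i * bin_width, start + (i + 1) * bin_width, c)
            for i, c in enumerate(counts)]


def most_common_bin(values, bin_width):
    """The (lo, hi, count) bin holding the most values (first one wins a tie)."""
    return max(histogram_counts(values, bin_width), key=lambda b: b[2])


def freedman_diaconis_width(values):
    """Freedman-Diaconis rule of thumb: 2 * IQR / n ** (1/3)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(values) ** (1 / 3)


if __name__ == "__main__":
    for width in (2, 4, 6):  # compare a few widths, as the exercise suggests
        print(width, histogram_counts(SAMPLE_MPG, width))
    print("most common bin at width 4:", most_common_bin(SAMPLE_MPG, 4))
    print("Freedman-Diaconis width:", round(freedman_diaconis_width(SAMPLE_MPG), 2))
```

With the real data, `most_common_bin(mpg_values, 4)` would give the most common range at a bin width of 4, where `mpg_values` is a placeholder for the dataset's mpg column.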

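The statistics in steps 4 through 6 can be cross-checked with Python's standard library. This is a sketch under assumptions: the sample values and function names are hypothetical, variance and standard deviation use the n - 1 (sample) denominator, and quartiles use `statistics.quantiles(..., method="inclusive")`, the same convention as R's default quantile type. Charting tools may default to a different quartile method, so Q1 and Q3 can differ slightly. Whiskers and outliers follow the common 1.5 × IQR rule.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev, variance

# Hypothetical toy data standing in for the real "mpg" and "drv" columns.
SAMPLE_MPG = [14, 15, 16, 17, 18, 35]
SAMPLE_DRV = ["f", "f", "4", "r", "f", "4"]


def describe(values):
    """Step 4: mean, median, range, sample variance and sample standard deviation."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "variance": variance(values),  # n - 1 denominator
        "sd": stdev(values),           # square root of the sample variance
    }


def frequency_table(labels):
    """Step 5: count of each category, i.e. the bar heights of a bar plot."""
    return dict(Counter(labels))


def mode_of(labels):
    """Step 5: the most frequent category (first one seen wins a tie)."""
    return Counter(labels).most_common(1)[0][0]


def box_plot_stats(values):
    """Step 6: quartiles, 1.5 * IQR whiskers, and outliers beyond the fences."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
        "iqr": iqr,
        "whisker_low": min(inside),   # most extreme points still inside the fences
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


if __name__ == "__main__":
    print(describe(SAMPLE_MPG))
    print(frequency_table(SAMPLE_DRV), "mode:", mode_of(SAMPLE_DRV))
    print(box_plot_stats(SAMPLE_MPG))
```

The middle 50% asked about in step 6 is the span from `q1` to `q3`, and its width is the `iqr` entry.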