Statistical Analysis and Visualization of Octane Ratings

HW01
Lokranjan Lakshmikanthan
9/7/2021

Q 6.2.4

Chapter6 = read.csv("ch06.csv", header = TRUE)
q1 = na.omit(Chapter6$EX.6.2.4)
stem(q1, scale = 2)

##
##   The decimal point is at the |
##
##    83 | 4
##    84 | 33
##    85 | 3
##    86 | 777
##    87 | 456789
##    88 | 23334556679
##    89 | 0233678899
##    90 | 0111344456789
##    91 | 0001112256688
##    92 | 22236777
##    93 | 023347
##    94 | 2247
##    95 |
##    96 | 15
##    97 |
##    98 | 8
##    99 |
##   100 | 3

quantile(q1)[2:4]

##  25%  50%  75%
## 88.6 90.4 92.2

First quartile: 88.6
Median: 90.4
Third quartile: 92.2
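As a small sketch of how the quartiles relate to the other summary statistics, the snippet below uses a few values read off the stem-and-leaf display (an illustrative subset, not the full data set, so the numbers will not match 88.6, 90.4 and 92.2). It assumes R's default quantile definition (type = 7), which is also what `IQR()` uses.

```r
# Illustrative subset of values taken from the stem-and-leaf display.
# This is NOT the full ch06.csv column, so results differ from the homework output.
x <- c(83.4, 84.3, 85.3, 86.7, 87.4, 88.2, 89.0, 90.0)

# Quartiles with R's default interpolation rule (type = 7)
q <- quantile(x, probs = c(0.25, 0.50, 0.75))

# Tukey's five-number summary (min, lower hinge, median, upper hinge, max)
five <- fivenum(x)

# Interquartile range: third quartile minus first quartile
iqr <- IQR(x)

print(q)
print(five)
print(iqr)
```

The 50% quantile always agrees with `median(x)`; the hinges from `fivenum()` can differ slightly from the type-7 quartiles for small samples.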

Q 6.3.2

Chapter6 = read.csv("ch06.csv", header = TRUE)
Octane_Rating = na.omit(Chapter6$EX.6.2.4)
hist(Octane_Rating)

Q 6.3.4

library(ggplot2)  # needed for ggplot() and geom_histogram()
Chapter6 = read.csv("ch06.csv", header = TRUE)
Octane_Rating = na.omit(Chapter6$EX.6.2.4)
ggplot(as.data.frame(Octane_Rating), aes(Octane_Rating)) +
  geom_histogram(bins = 8, color = "blue") +
  ggtitle("Octane rating 8 bin")
ggplot(as.data.frame(Octane_Rating), aes(Octane_Rating)) +
  geom_histogram(bins = 16, color = "blue") +
  ggtitle("Octane rating 16 bin")

Both outputs are histograms of the octane ratings, and both are centered near 90. The 8-bin histogram makes the data appear skewed to the right, while the 16-bin histogram makes the data appear less skewed and more centered. Because the apparent shape changes with the bin count, either histogram taken alone can misrepresent the true distribution of the data.
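The effect of bin count can be checked numerically as well as visually. The sketch below is only an illustration on simulated data (80 draws from a normal distribution with an assumed mean of 90 and standard deviation of 3, not the homework data). It uses base R `hist()` with explicit break points, which bins slightly differently from `geom_histogram(bins = ...)`.

```r
# Simulated stand-in data; the mean, sd and sample size are assumptions
set.seed(1)
x <- rnorm(80, mean = 90, sd = 3)

# Break points that are guaranteed to span the whole sample
lo <- floor(min(x))
hi <- ceiling(max(x))

# Same data, 8 bins versus 16 bins (plot = FALSE returns only the counts)
h8  <- hist(x, breaks = seq(lo, hi, length.out = 9),  plot = FALSE)
h16 <- hist(x, breaks = seq(lo, hi, length.out = 17), plot = FALSE)

print(h8$counts)
print(h16$counts)
```

Comparing the two count vectors shows how coarse bins merge neighboring features that finer bins keep apart, which is why the perceived skewness can change.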

Q 6.3.9

library(qcc)  # provides pareto.chart()
AutomobileDefects = c(4, 4, 6, 21, 8, 5, 30, 3)
names(AutomobileDefects) = c("dents", "pits", "parts assembled out of sequence",
                             "parts undertrimmed", "missing holes/slots",
                             "parts not lubricated", "parts out of contour",
                             "parts not deburred")
pareto.chart(AutomobileDefects)

##
## Pareto chart analysis for AutomobileDefects
##                                    Frequency  Cum.Freq. Percentage Cum.Percent.
##   parts out of contour             30.000000  30.000000  37.037037    37.037037
##   parts undertrimmed               21.000000  51.000000  25.925926    62.962963
##   missing holes/slots               8.000000  59.000000   9.876543    72.839506
##   parts assembled out of sequence   6.000000  65.000000   7.407407    80.246914
##   parts not lubricated              5.000000  70.000000   6.172840    86.419753
##   dents                             4.000000  74.000000   4.938272    91.358025
##   pits                              4.000000  78.000000   4.938272    96.296296
##   parts not deburred                3.000000  81.000000   3.703704   100.000000

The Pareto chart shows that the most common source of defects is “parts out of contour”, followed by “parts undertrimmed”. Together these two sources account for about 63% of all defects. Every other source occurs much less frequently.

Q 6.4.4

Chapter6 = read.csv("ch06.csv", header = TRUE)
PercentageConversion = na.omit(Chapter6$EX.6.4.4)
cat("Mean: ", mean(PercentageConversion), "\n")

## Mean: 4

cat("Variance: ", var(PercentageConversion), "\n")

## Variance: 0.8663158

cat("Standard Deviation: ", sd(PercentageConversion), "\n")

## Standard Deviation: 0.9307609

boxplot(PercentageConversion, main = "percentage mole conversion of naphthalene to maleic anhydride")
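Returning to the Pareto analysis in Q 6.3.9, the table columns can be reproduced in base R without the qcc package. This is a minimal sketch of the arithmetic behind the table, not the internals of `pareto.chart()`; the variable names are chosen here for illustration.

```r
# Defect counts exactly as entered in Q 6.3.9
defects <- c("dents" = 4, "pits" = 4, "parts assembled out of sequence" = 6,
             "parts undertrimmed" = 21, "missing holes/slots" = 8,
             "parts not lubricated" = 5, "parts out of contour" = 30,
             "parts not deburred" = 3)

# Order categories from most to least frequent
sorted <- sort(defects, decreasing = TRUE)

# Running totals and percentages of the overall defect count
cum_freq <- cumsum(sorted)
pct      <- 100 * sorted / sum(sorted)
cum_pct  <- cumsum(pct)

pareto_table <- data.frame(Frequency = sorted, Cum.Freq. = cum_freq,
                           Percentage = pct, Cum.Percent. = cum_pct,
                           check.names = FALSE)
print(pareto_table)
```

The second row of the cumulative percentage column (51 of 81 defects) is the 62.96% figure that supports focusing on the top two defect sources.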

Q 6.4.9

Chapter6 = read.csv("ch06.csv", header = TRUE, stringsAsFactors = FALSE)
HD = as.numeric(Chapter6$EX.6.4.9[2:22])
C1 = as.numeric(Chapter6$EX.6.4.9.1[2:22])
C2 = as.numeric(Chapter6$EX.6.4.9.2[2:22])
C3 = as.numeric(Chapter6$EX.6.4.9.3[2:22])
boxplot(HD, C1, C2, C3, names = c("High Dose", "Control 1", "Control 2", "Control 3"))

The IQR and range increase greatly from one patient group to the next, moving from left to right. The median of the Control 3 group is the highest, while the median of the High Dose group is the lowest. The span of the third quartile increases for each group from left to right, indicating that the groups grow more right-skewed. The High Dose and Control 3 groups have no outliers, while the other two groups do.
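The outliers mentioned above are the points `boxplot()` draws individually. By default R flags any value more than 1.5 times the box length beyond the hinges. The sketch below illustrates that rule on a small made-up vector (hypothetical values, not the patient data) using `boxplot.stats()`.

```r
# Hypothetical sample with one unusually large value
x <- c(2, 3, 3, 4, 4, 5, 5, 6, 15)

# boxplot.stats() returns the five-number summary and the flagged points
bs <- boxplot.stats(x, coef = 1.5)

lower_hinge <- bs$stats[2]
upper_hinge <- bs$stats[4]
box_length  <- upper_hinge - lower_hinge

# Fences: anything outside these limits is drawn as an outlier
lower_fence <- lower_hinge - 1.5 * box_length
upper_fence <- upper_hinge + 1.5 * box_length

print(c(lower_fence, upper_fence))
print(bs$out)
```

Here the hinges are 3 and 5, so the fences sit at 0 and 8 and only the value 15 is flagged.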

Q 6.5.4

Chapter6 = read.csv("ch06.csv", header = TRUE, stringsAsFactors = FALSE)
# as.numeric() guards against columns read as character, as in Q 6.4.9
Time = Chapter6$EX.6.5.4[2:126]
Anomaly = as.numeric(Chapter6$EX.6.5.4.1[2:126])
CO2 = as.numeric(Chapter6$EX.6.5.4.2[2:126])
Anomaly = ts(Anomaly, start = 1880, end = 2004, frequency = 1)
Carbon_Dioxide = ts(CO2, start = 1880, end = 2004, frequency = 1)
plot(Anomaly)
plot(Carbon_Dioxide)

There is a general upward trend in the global mean surface air temperature anomaly. The fluctuations appear to grow slightly larger over time. Because the series has one observation per year, these short-term fluctuations reflect year-to-year variability rather than seasonal temperature changes.

There is a general upward trend in carbon dioxide concentration from 1880 to 2004. The rate of increase has itself grown significantly over time, suggesting roughly exponential growth.

plot(Carbon_Dioxide)
par(new = TRUE)
plot(Anomaly, ylab = "", axes = FALSE)
axis(4)
mtext("Anomaly", side = 4)

The overlaid plot shows a moderate correlation between the two series. Over the last century, carbon dioxide concentration has followed the same general upward trend as the temperature anomaly.

Q 6.6.1

y = as.numeric(Chapter6$EX.6.6.1.1[2:27])
x1 = as.numeric(Chapter6$EX.6.6.1.2[2:27])
x2 = as.numeric(Chapter6$EX.6.6.1.3[2:27])
x3 = as.numeric(Chapter6$EX.6.6.1.4[2:27])
pairs(~ y + x1 + x2 + x3)

Most pairs of variables show very weak or no correlation with each other. However, two pairs show recognizable correlations: y and x3, and x2 and x3, are each moderately strongly correlated.

Q 6.7.2

ggplot(as.data.frame(Octane_Rating), aes(sample = Octane_Rating)) +
  stat_qq() +
  stat_qq_line()

Based on this normal probability plot, the data appear to be approximately normally distributed. There are few outliers, and the majority of the data points fall on or near the line.

Q 6.7.3

Cycles_To_Failure = as.numeric(na.omit(Chapter6$EX.6.2.5))
ggplot(as.data.frame(Cycles_To_Failure), aes(sample = Cycles_To_Failure)) +
  stat_qq() +
  stat_qq_line()

I believe we can conclude that cycles to failure is not normally distributed, because approximately 50% of the data points do not fall on or close to the reference line that indicates whether the data are normal.
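A visual reading of a normal probability plot can be complemented with a formal test. The sketch below applies the Shapiro-Wilk test (`shapiro.test()` from base R's stats package) to simulated data, since the ch06.csv columns are not reproduced here. The sample size of 70 and the choice of an exponential distribution as a right-skewed stand-in are assumptions made only for illustration.

```r
# Simulated stand-ins: one roughly normal sample, one strongly right-skewed sample
set.seed(42)
normal_like <- rnorm(70, mean = 90, sd = 3)
skewed      <- rexp(70, rate = 1 / 1000)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
sw_normal <- shapiro.test(normal_like)
sw_skewed <- shapiro.test(skewed)

# A small p-value is evidence against normality
print(sw_normal$p.value)
print(sw_skewed$p.value)
```

With the real data, the same call would be `shapiro.test(Cycles_To_Failure)`. A small p-value would agree with the departure from the reference line seen in the plot.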
