R Programming Assignment Output Report
Zaki Ahmed Mohammed
Trine University
IS---5213-3O2--OL-FA-2023 - Data Science and Big Data
Dr. Louis DeWeaver
Step 1: Describe the Data
The first step is to load the required dataset and R library. The analysis uses the well-known "iris" dataset, which records four measurements (sepal length, sepal width, petal length, and petal width) for 150 iris flowers from three species. The str() function is used to examine the structure of the dataset; it lists each variable together with its data type. The summary() function then computes summary statistics for every numeric variable: minimum, first quartile (25th percentile), median (second quartile), mean, third quartile (75th percentile), and maximum. For the categorical variable "Species", summary() instead reports the number of observations of each species. Finally, the head() function displays the first six observations of the dataset, giving a preliminary overview of the data.

Output: the structure of the dataset, summary statistics, and the first six records.

'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500

A data.frame: 6 × 5
   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
   <dbl>         <dbl>        <dbl>         <dbl>        <fct>
1  5.1           3.5          1.4           0.2          setosa
2  4.9           3.0          1.4           0.2          setosa
3  4.7           3.2          1.3           0.2          setosa
4  4.6           3.1          1.5           0.2          setosa
5  5.0           3.6          1.4           0.2          setosa
6  5.4           3.9          1.7           0.4          setosa
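The report does not reproduce the commands themselves, so the following is only a minimal sketch of calls that would produce output of this kind. It assumes the copy of iris that ships with base R (package "datasets"); the original may have loaded additional libraries that are not shown here.

```r
# Minimal sketch: load and describe the built-in iris dataset.
# Assumption: the base R "datasets" copy of iris is the one used in the report.
data(iris)

# Structure: number of observations, variable names, types, first values.
str(iris)

# Summary statistics: min, quartiles, median, mean, max for numeric columns,
# and per-level counts for the Species factor.
summary(iris)

# First six observations (head() defaults to n = 6).
head(iris)
```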
Step 2: Box-Whisker Plots
In this step, a box plot illustrates the distribution of the "Sepal.Length" variable for each iris species. The species are shown along the horizontal axis and the sepal length along the vertical axis. Each box spans the interquartile range of sepal lengths observed for a species, and a notch around the median supports a basic comparison of medians between species. Coloring the boxes makes similarities and differences between species easier to see. According to the guidelines, "Your Name" serves as the main title of the plot.

Output:
The resulting visualization is a box plot that effectively illustrates the range of sepal lengths across various iris species.
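A plot of this kind could be drawn roughly as follows. This is a sketch rather than the report's actual code: the box colors and axis label strings are assumptions, since the report only states that the boxes are notched and colored and that the title is "Your Name".

```r
# Sketch of a notched, colored box plot of sepal length by species.
# Assumptions: the three colors and the axis label text are illustrative only.
bp <- boxplot(Sepal.Length ~ Species,
              data  = iris,
              notch = TRUE,                       # notches around the medians
              col   = c("red", "green", "blue"),  # assumed colors
              main  = "Your Name",                # title required by the guidelines
              xlab  = "Species",
              ylab  = "Sepal Length")

# boxplot() invisibly returns the plotted statistics; row 3 holds the medians.
bp$stats[3, ]
```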
Step 3: Histograms
In this step, a histogram is generated for the variable "Sepal.Length". The histogram shows the distribution of sepal lengths over the whole sample. The number of breaks was set manually so that the data fall into appropriately sized intervals.
A red density line is overlaid on the histogram to make the shape of the distribution easier to read. It gives a smoother view of the underlying probability density of the data than the bars alone.

Output:
The resulting output is a histogram of the sepal lengths with a density line overlaid on it.
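One way to produce a histogram with a density overlay is sketched below. The number of breaks (10) and the fill color are assumptions, because the report does not state the values it used. The histogram is drawn on the density scale (freq = FALSE) so that the bars and the red line share the same vertical axis.

```r
# Sketch of a histogram of sepal length with a red kernel density line.
# Assumptions: breaks = 10 and the gray fill are illustrative choices.
h <- hist(iris$Sepal.Length,
          breaks = 10,        # a suggestion to R; actual bins are "pretty" values
          freq   = FALSE,     # density scale, needed for the overlay to line up
          col    = "lightgray",
          main   = "Histogram of Sepal Length",
          xlab   = "Sepal Length")

# Kernel density estimate drawn over the bars.
d <- density(iris$Sepal.Length)
lines(d, col = "red", lwd = 2)
```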
Step 4: Scatter Plots
This step plots the variables "Sepal.Length" and "Sepal.Width" against each other as a scatter plot. Each data point is drawn with plotting symbol 16 (a filled circle) to make it easier to read.
The data points for the "setosa" group are red, those for the "versicolor" group are green, and those for the "virginica" group are blue. The distinct colors make it easier to tell the species apart.

Output: The resulting scatter plot shows the relationship between sepal length and sepal width, with color coding to distinguish the three iris species.
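The color mapping described above can be sketched as follows. The names species_cols and point_cols, the axis labels, the title, and the legend are illustrative assumptions; only the variables, the symbol (pch = 16), and the three colors come from the report.

```r
# Sketch of a scatter plot of sepal length vs. sepal width, colored by species.
# Assumptions: species_cols and point_cols are hypothetical helper names;
# labels, title, and legend placement are illustrative.
species_cols <- c(setosa = "red", versicolor = "green", virginica = "blue")

# Look up one color per observation from its species label.
point_cols <- species_cols[as.character(iris$Species)]

plot(iris$Sepal.Length, iris$Sepal.Width,
     pch  = 16,            # filled circle
     col  = point_cols,
     xlab = "Sepal Length",
     ylab = "Sepal Width",
     main = "Sepal Length vs. Sepal Width")

legend("topright", legend = names(species_cols),
       col = species_cols, pch = 16)
```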
Step 5: Simple Math
This step computes basic descriptive statistics for the "Sepal.Length" variable: the mean, median, minimum, maximum, and standard deviation. It then computes the median sepal length for each species and sorts the species by that median.
Output: The result displays the derived statistics for the "Sepal.Length" variable, followed by the median value for each species in decreasing order.

'Mean: 5.84'
'Median: 5.80'
'Min: 4.30'
'Max: 7.90'
'Standard Deviation: 0.83'

A data.frame: 3 × 2
   Species     Sepal.Length
   <fct>       <dbl>
3  virginica   6.5
2  versicolor  5.9
1  setosa      5.0
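The figures above can be reproduced with base R along the following lines. This is a sketch under the assumption that the statistics were formatted to two decimals and that the per-species medians were obtained by grouping and then sorting; the variable names x, stats, med, and med_sorted are hypothetical.

```r
# Sketch: descriptive statistics for Sepal.Length and per-species medians.
# Assumptions: two-decimal formatting via sprintf(); helper names are illustrative.
x <- iris$Sepal.Length

stats <- c(
  sprintf("Mean: %.2f", mean(x)),
  sprintf("Median: %.2f", median(x)),
  sprintf("Min: %.2f", min(x)),
  sprintf("Max: %.2f", max(x)),
  sprintf("Standard Deviation: %.2f", sd(x))
)
print(stats)

# Median sepal length per species, then sorted from largest to smallest.
med <- aggregate(Sepal.Length ~ Species, data = iris, FUN = median)
med_sorted <- med[order(med$Sepal.Length, decreasing = TRUE), ]
print(med_sorted)
```

Sorting with order() keeps the original row names, which is consistent with the row labels 3, 2, 1 in the output shown above.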