School: North Carolina State University · Course: 308 · Subject: Statistics · Date: Jan 9, 2024
R: Programming Assignment 2 (57 pts)
In this assignment you will create a .Rmd file and corresponding .html output. Write code to answer the questions
below, and upload both the .Rmd file and the .html file you create to the Moodle assignment link.
• The file submitted must meet the R File Submission Guidelines available in the Resources and Information section of the course.
• If your file doesn’t meet these guidelines, we may take up to 50% off from your score.
• It is time to put what you’ve learned into practice! You may not work with others on this assignment. You cannot post to the discussion forum nor post anywhere else to obtain help on this assignment.
• You may obtain help from your instructor (or another class’s instructor) if you are stuck. Class time is the best time to get help!
• Remember that there is a one business day turn around on email.
• No late work will be accepted. If you have a documented emergency that prevents you from completing a homework assignment, please contact your instructor and provide proof of the emergency.
The homework involves two parts:
• One ‘prescriptive’ part where you are asked to perform tasks similar to what we’ve done in class.
• One ‘open-ended’ part where you will discuss a dataset you’ve found and read it into R. This second part will be used on the final project for the course!
Prescriptive Part (32 pts)
Dataset
About the data (more info at http://archive.ics.uci.edu/ml/datasets/Dry+Bean+Dataset):
Seven different types of dry beans were used in this research, taking into account the features such
as form, shape, type, and structure by the market situation. A computer vision system was developed
to distinguish seven different registered varieties of dry beans with similar features in order to obtain
uniform seed classification. For the classification model, images of 13,611 grains of 7 different registered
dry beans were taken with a high-resolution camera. Bean images obtained by computer vision system
were subjected to segmentation and feature extraction stages, and a total of 16 features; 12 dimensions
and 4 shape forms, were obtained from the grains.
There are 17 different variables in the data:
1. Area (A): The area of a bean zone and the number of pixels within its boundaries.
2. Perimeter (P): Bean circumference is defined as the length of its border.
3. Major axis length (L): The distance between the ends of the longest line that can be drawn from a bean.
4. Minor axis length (l): The longest line that can be drawn from the bean while standing perpendicular to the main axis.
5. Aspect ratio (K): Defines the relationship between L and l.
6. Eccentricity (Ec): Eccentricity of the ellipse having the same moments as the region.
7. Convex area (C): Number of pixels in the smallest convex polygon that can contain the area of a bean seed.
8. Equivalent diameter (Ed): The diameter of a circle having the same area as a bean seed area.
9. Extent (Ex): The ratio of the pixels in the bounding box to the bean area.
10. Solidity (S): Also known as convexity. The ratio of the pixels in the convex shell to those found in beans.
11. Roundness (R): Calculated with the following formula: $R = 4\pi A / P^2$
12. Compactness (CO): Measures the roundness of an object: Ed/L
13. ShapeFactor1 (SF1)
14. ShapeFactor2 (SF2)
15. ShapeFactor3 (SF3)
16. ShapeFactor4 (SF4)
17. Class (Seker, Barbunya, Bombay, Cali, Dermason, Horoz and Sira)
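The derived shape measures in items 8, 11, and 12 can be checked by hand on a simple shape. The sketch below is only an illustration, not part of the dataset: it assumes a perfect circle of radius 10 (made-up values) and uses the fact that a circle with area A has diameter sqrt(4A/pi). For a circle, roundness and compactness should both come out as 1.

```r
# Illustrative check on a perfect circle (hypothetical values, not taken from the bean data)
r <- 10
area <- pi * r^2          # A
perimeter <- 2 * pi * r   # P
major_axis <- 2 * r       # L: the longest line through a circle is its diameter

# Equivalent diameter (Ed): diameter of a circle with the same area
equiv_diameter <- sqrt(4 * area / pi)

# Roundness (R) = 4 * pi * A / P^2
roundness <- 4 * pi * area / perimeter^2

# Compactness (CO) = Ed / L
compactness <- equiv_diameter / major_axis

print(c(Ed = equiv_diameter, R = roundness, CO = compactness))
```

Less circular shapes give roundness and compactness values below 1, which is why these measures help separate bean varieties.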
Tasks
For this section, write a brief discussion about what you are going to do (don’t just copy and paste the question prompts) using markdown text followed by code chunks that display the code and results for each question below.
Any answers/information that you are asked to provide should be done using markdown.
1. (2 pts) Use tidyverse type functions to read in the Dry_Bean_Dataset.xlsx data set. The data is available at https://www4.stat.ncsu.edu/~online/datasets/Dry_Bean_Dataset.xlsx.
• (2 pts) After the code chunk write markdown text that includes inline R code. You should output the sentence: The data has been read into the object [name of your object here]. This object is a [describe the object class here] that has [use_in-line_R_code_to_render_this_value] variables and [use_in-line_R_code_to_render_this_value] observations.
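As a rough sketch of the mechanics (not a model answer): readxl::read_excel() reads from a local file path rather than a URL, so one common pattern is to download the workbook to a temporary file first. The helper name read_xlsx_from_url and the toy data frame below are assumptions made for illustration; the helper is defined but not called, so the block runs without a network connection.

```r
# Hypothetical helper: download an .xlsx file to a temporary path, then read it.
# Assumes the readxl package is installed (it is installed alongside the tidyverse).
read_xlsx_from_url <- function(url) {
  tmp <- tempfile(fileext = ".xlsx")
  download.file(url, tmp, mode = "wb")  # "wb" keeps the binary file intact on Windows
  readxl::read_excel(tmp)
}

# Toy stand-in object so the inline-code idea can be tried without a download
toy <- data.frame(a = 1:4, b = letters[1:4], c = c(2.5, 3.5, 4.5, 5.5))

n_vars <- ncol(toy)          # number of variables (columns)
n_obs <- nrow(toy)           # number of observations (rows)
obj_class <- class(toy)[1]   # first element of the class vector

# In the .Rmd text (outside a code chunk), inline code is written with backticks, e.g.:
#   This object has `r ncol(toy)` variables and `r nrow(toy)` observations.
print(c(n_vars, n_obs))
```

When the document is knitted, each inline expression is replaced by its value, so the sentence stays correct if the data changes.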
2. Use tidyverse functions and chaining to do the following modifications to your data object (saving the result as a new object). The new data object should:
• Not include the Extent or AspectRatio variables. (2 pts)
• Rename the ShapeFactor1, ShapeFactor2, ShapeFactor3, and ShapeFactor4 variables to SF1, SF2, SF3, and SF4. (2 pts)
• Remove any observations in which the bean class is not one of DERMASON, SEKER, SIRA, BOMBAY, or CALI. (2 pts)
• Create a new variable that is the average of the newly renamed shape factor variables. (2 pts)
• Create a new categorical variable that bins the new average shape factor variable into three categories (you choose the values for the category names) as follows (3 pts):
– If the value is larger than 0.6, give the new variable a character string indicating it is in the largest category.
– If the value is larger than 0.4 but less than or equal to 0.6, give the new variable a character string indicating it is in the middle category.
– Otherwise, give the new variable a character string indicating it is in the lowest category.
• Reorder the rows of the data set in descending order of the MajorAxisLength variable. (2 pts)
Finally, display the new data object so it prints out (just call the new object name). Remember, you should have a bit of markdown text prior to this code chunk that describes what you are going to do (and it shouldn’t just be a copy and paste of the prompt!). Write out what you want to do and which function you are choosing (from the tidyverse) to accomplish it. (2 pts)
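The kinds of steps listed above map onto a handful of dplyr verbs. The sketch below chains them on a small made-up tibble; the column names (extra, Score1, Score2, and so on), the category labels, and the data values are all hypothetical and are not taken from the bean data.

```r
library(dplyr)

# Hypothetical toy data; names and values are for illustration only
toy <- tibble(
  id     = 1:6,
  group  = c("A", "B", "C", "A", "B", "C"),
  extra  = c(9, 9, 9, 9, 9, 9),
  Score1 = c(0.9, 0.5, 0.2, 0.8, 0.3, 0.6),
  Score2 = c(0.7, 0.5, 0.4, 0.8, 0.3, 0.6),
  length = c(10, 30, 20, 60, 50, 40)
)

toy_new <- toy %>%
  select(-extra) %>%                        # drop a column
  rename(S1 = Score1, S2 = Score2) %>%      # new_name = old_name
  filter(group %in% c("A", "B")) %>%        # keep only the listed groups
  mutate(
    avg_score = (S1 + S2) / 2,              # row-wise average of two columns
    score_bin = case_when(                  # conditions are checked in order
      avg_score > 0.6 ~ "high",
      avg_score > 0.4 ~ "medium",
      TRUE            ~ "low"
    )
  ) %>%
  arrange(desc(length))                     # sort rows, largest first

toy_new
```

Because case_when() uses the first condition that is true, the second branch only ever sees values that are at most 0.6, so it does not need an explicit upper bound.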
3. Using your new data object, create a two-way contingency table for the Class variable and the binned variable you created in the above step. (2 pts)
• When creating the table, only include observations where the SF1 variable is less than 0.008. (1 pt)
• In a sentence below the code chunk, describe what the number in the top left cell of the table means. (1 pt)
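One possible way to build a filtered two-way table is to filter, keep the two categorical columns, and pass the result to table(). The sketch below does this on a made-up tibble; the column names and the cutoff of 5 are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data, not the bean data
toy <- tibble(
  group = c("A", "A", "B", "B", "B", "A"),
  size  = c("low", "high", "low", "low", "high", "low"),
  x     = c(1, 2, 3, 4, 10, 12)
)

tab <- toy %>%
  filter(x < 5) %>%        # keep only rows that meet the condition
  select(group, size) %>%  # the two variables to cross-tabulate
  table()                  # rows = group, columns = size

tab
```

Each cell counts the filtered rows with that combination of levels. Here the top left cell is the number of rows with x below 5 that are in group A and size "high", because table() orders levels alphabetically.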
4. Find the Mean, Median, and Standard Deviation summary statistics for the Area and Perimeter variables for each Class of bean. (4 pts)
• In a sentence below the code chunk, describe what all of the statistics found mean for one of the bean classes. (2 pts)
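Grouped summaries of this kind are often written with group_by() followed by summarise(). The sketch below uses across() (available in dplyr 1.0 and later) to apply three functions to two columns of a made-up tibble; the column names and values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, not the bean data
toy <- tibble(
  group  = c("A", "A", "A", "B", "B", "B"),
  width  = c(1, 2, 3, 10, 20, 60),
  height = c(2, 4, 6, 5, 5, 5)
)

stats <- toy %>%
  group_by(group) %>%
  summarise(
    across(
      c(width, height),
      list(mean = mean, median = median, sd = sd)  # output columns are named col_fn
    )
  )

stats
```

The result has one row per group and columns such as width_mean and height_sd. In group B the mean width (30) is well above the median (20), which reflects the single large value of 60.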
5. Create a correlation matrix between the Area, Perimeter, MajorAxisLength, and MinorAxisLength variables. Note: This can be easily done using the cor() function. (3 pts)
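cor() accepts a data frame of numeric columns and returns the matrix of pairwise Pearson correlations, so the usual pattern is to select the numeric columns first. The sketch below uses a made-up tibble in which y is an exact multiple of x and z is x reversed; the names and values are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data, not the bean data
toy <- tibble(
  x     = c(1, 2, 3, 4, 5),
  y     = c(2, 4, 6, 8, 10),   # y = 2 * x, perfectly positively related
  z     = c(5, 4, 3, 2, 1),    # z decreases as x increases
  label = letters[1:5]         # non-numeric column, must be excluded
)

cor_mat <- toy %>%
  select(x, y, z) %>%  # keep numeric columns only
  cor()

cor_mat
```

The diagonal is always 1, and the matrix is symmetric, so only the entries above (or below) the diagonal need interpreting.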
“Open-ended part” (25 pts)
Create a second section where you write markdown and create code to answer the questions below.
In the first homework you were to find a dataset of interest and read that data into R. We will reread this data into
this program (or you can find a new data set if your first dataset didn’t work or is no longer of interest to you).
First, write markdown that answers the following questions (you can copy these from your first homework if you
aren’t changing your dataset).
1. In two or three sentences, describe your major and how you might interact with data. (1 pt)
2. Find a dataset that relates to your major and/or your goals for interacting with data. The dataset should have at least four variables to investigate. Give the following information:
• The URL where you found the dataset. (1 pt)
• Two or three sentences about why this dataset is of interest to you. (1 pt)
• NEW ITEM! Give a brief description of each variable in your dataset including whether the variable is quantitative or categorical. (3 pts)
• NEW ITEM! Identify two questions about the data that you can answer using numerical (contingency tables, summary stats) and graphical summaries of the data. (4 pts) For example, if you had data about animals and the amount they sleep, one question you might want to try to answer is:
i. Do certain types of animals tend to sleep longer than others? This could be investigated numerically using sample means/medians and graphically using histograms or density plots.
ii. You’d want to develop a second question as well that you can investigate with numerical and graphical summaries.
Now you’ll write R code to
3. Read the data into R. You can read the data from the URL directly (if that’s possible) or you can download it and read it in from there. You can use your code from homework 1 if nothing has changed with your dataset! (5 pts)
4. Summarize the data numerically in an effort to answer your two questions of interest. You should create useful statistics (contingency tables/summary stats) to do so! (6 pts)
• Write markdown to state what the summaries say about your questions of interest. (4 pts)