In-class Worksheet- Data Science and Social Interactions Week 1-1-1
School: University of Cincinnati, Main Campus
Course: 1082L
Subject: Statistics
Date: Apr 3, 2024
Names: Becca Rose, Avery Maple, Diya Patel, Sydney Raleigh Section: 001
In-class Worksheet: Data Science and Social Interactions Week 1
Biology 1082L
This worksheet will take you through a series of steps to teach you the basics of coding using R Studio. Specifically, you will examine a dataset that you would be familiar with from the Behavior module and analyze the data using both Excel and R Studio. The goals of this exercise are to help you learn some of the differences between the two statistical programs, understand the benefits of R, and learn the basics of R before examining some more complicated biological data during Week 2.
First, visit https://posit.cloud/ and create an account to use the free version of R Studio. While you can download R or R Studio to your personal hard drive, the cloud-based version of this through Posit Cloud will help ensure consistency across all PC and Mac devices, making it easier for us to assist you.
1. Conduct an ANOVA in Excel.
Download the Excel spreadsheet from Canvas under the Data Science and Social Interactions Week 1 Module. The spreadsheet contains raw data that is in a familiar format that you might have utilized for the Behavior Module. Using the Data Analysis Toolpak, conduct an ANOVA analysis on this data.
a. Copy/paste below the entire output from the ANOVA analysis that the Toolpak provides. (1 pt)
b. Write a sentence that summarizes the results from the ANOVA analysis, following the format in the General Scientific Writing Expectations document for ANOVA analyses. (Using the incorrect format will earn an automatic 0 for this question). (2 pts)
The ANOVA showed that we reject the null hypothesis because the p-value is < .05.
2. Make a bar graph with error bars in Excel.
Using either the formulas in Excel or the Descriptive Statistics function in Excel’s Data Analysis Toolpak, find the mean values and standard error values for each of the three groups and create a bar graph. Do not forget that when you add error bars in Excel, you need to “customize”, “specify the values”, and highlight the three standard error values you need to use. Copy/paste the bar graph below. Be sure to include a figure legend. (2 pts)
[Figure: bar graph. x-axis: Water pH (pH = 6, pH = 7, pH = 8); y-axis: Time Spent Moving (s), scale 0 to 250.]
3. Format your data set to be imported into R Studio.
There are two main steps you will need to follow in order to format your Excel file in such a way that can be properly imported into R Studio.
a. Re-organize the data into two columns and “clean up” the spreadsheet itself.
b. Once complete, save the file to make sure you have a .xls or .xlsx version of the file. Then, “Save As” in order to convert the file to a .csv file.
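The two-column layout from step 3 can be illustrated with a minimal R sketch. The worksheet asks you to do this step in Excel; the code below only shows what the target format looks like. The wide table and its values are invented for illustration, and the column names `pH` and `TimeSpentMoving` are taken from the code used later in this worksheet.

```r
# Hypothetical "wide" layout: one column per treatment group,
# as a raw spreadsheet might look. Values are invented.
wide <- data.frame(
  pH6 = c(100, 120, 110),
  pH7 = c(150, 170, 160),
  pH8 = c(200, 220, 210)
)

# stack() turns the columns into two: 'values' and 'ind' (the column name).
long <- stack(wide)
names(long) <- c("TimeSpentMoving", "pH")

# Put the grouping column first, then save as .csv (a temp file here).
long <- long[, c("pH", "TimeSpentMoving")]
csv_path <- file.path(tempdir(), "anova_data.csv")
write.csv(long, csv_path, row.names = FALSE)

print(long)
```

Each row now holds one observation: the group it belongs to and its measured value. This is the shape that `aov()` and `ggplot()` expect.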
At this point in time, your group should watch the Code Along Video 1 – Week 1. During these Code Along videos, you will want to have this worksheet nearby, watch the Code Along Video, and have all of your group members follow the same steps as Dr. Hobson demonstrates.
4. Import your .csv file into R Studio. Utilize the videos on Canvas and the Code Along Video 1 to assist you. This essentially involves three steps:
a. Upload your document into the R Studio platform.
b. Import the document into your Console using the read.csv command.
c. Attach the data set to ensure that R Studio knows which data set you are utilizing.
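Steps 4b and 4c might look roughly like the sketch below. It writes a small stand-in .csv first so that it runs on its own; the rows are invented, and in practice you would pass the name of the file you uploaded to `read.csv`. The object name `anova_data` follows the code used later in this worksheet.

```r
# Write a tiny stand-in .csv so the sketch is self-contained.
# The rows are invented; use your uploaded file's name instead.
csv_path <- file.path(tempdir(), "anova_data.csv")
writeLines(c("pH,TimeSpentMoving",
             "pH=6,100", "pH=6,120",
             "pH=7,150", "pH=7,170",
             "pH=8,200", "pH=8,220"), csv_path)

# Import the file; read.csv treats the first row as column names by default.
anova_data <- read.csv(csv_path)

# Quick checks that the import worked.
str(anova_data)
print(head(anova_data))

# attach() makes the columns visible by name; detach() when finished.
attach(anova_data)
print(mean(TimeSpentMoving))
detach(anova_data)
```

An alternative to `attach()` is to name the data set explicitly in each call (for example with a `data = anova_data` argument), which avoids confusion when more than one data set is loaded.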
5. The Code Along Video will lead you through the following:
a. Downloading and activating the “tidyverse” package.
b. Loading your data set.
c. Checking the data set.
d. Summarizing the data set to find the means and standard error values.
e. Plotting a bar plot using the means, adjusting the colors of the bars, adding error bars, and adding axis labels.
**Note: please ignore the instructions to add a title to your graph**
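Step 5d could be sketched as follows. This is not necessarily the exact code from the video; it is one common way to get group means and standard errors with dplyr (part of the tidyverse). The data values are invented, and the names `summary.stats`, `mean`, and `se` are chosen to match the plotting code in question 6.

```r
# install.packages("tidyverse")  # run once if the package is not installed
library(dplyr)  # part of the tidyverse

# Invented stand-in for the imported data set.
anova_data <- data.frame(
  pH = rep(c("pH=6", "pH=7", "pH=8"), each = 4),
  TimeSpentMoving = c(100, 120, 110, 130,
                      150, 170, 160, 180,
                      200, 220, 210, 230)
)

# One row per pH group with its mean and standard error (sd / sqrt(n)).
summary.stats <- anova_data %>%
  group_by(pH) %>%
  summarise(
    mean = mean(TimeSpentMoving),
    se   = sd(TimeSpentMoving) / sqrt(n())
  )

print(summary.stats)
```

The resulting table has one row per treatment group, which is why the bar plot uses `stat='identity'`: the bar heights are taken directly from the `mean` column.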
6. Copy/paste your code here that produced your graph: (1 pt)
ggplot(data=summary.stats, aes(x=pH, y=mean)) +
  geom_bar(stat='identity', fill="darkred") +
  labs(title="Effect of pH on movement",
       y="Mean time spent moving") +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se),
                width=0.25, linewidth=1)
7. Copy/paste your graph itself here: (2 pts)
At this point in time, your group should watch the Code Along Video 2 – Week 1.
8. Conduct an ANOVA in R Studio.
a. Copy/paste below the code you used to conduct the ANOVA and to provide your summary results. (1 pt)
results.anova <- aov(TimeSpentMoving ~ pH, data=anova_data)
summary(results.anova)  # prints the ANOVA table (F, df, p)
b. Write a sentence that summarizes the results from the ANOVA analysis, following the format in the General Scientific Writing Expectations document for ANOVA analyses. (Using the incorrect format will earn an automatic 0 for this question). (2 pts)
(F = 13.88; df1 = 2; df2 = 42; p = 2.35 × 10^-5)
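The values in the results sentence can be pulled straight from the ANOVA table, as in the sketch below. The data are invented, so the numbers will not match the worksheet's. For reference, df1 = 2 and df2 = 42 are what three groups and 45 observations would give (k - 1 and N - k). The sketch also assumes pH might be stored as numbers, in which case it needs converting to a factor so that R treats it as a treatment group rather than a continuous predictor.

```r
# Invented stand-in data; pH is stored as numbers here on purpose.
anova_data <- data.frame(
  pH = rep(c(6, 7, 8), each = 5),
  TimeSpentMoving = c(100, 120, 110, 130, 115,
                      150, 170, 160, 180, 155,
                      200, 220, 210, 230, 205)
)

# Treat pH as a categorical treatment, not a continuous predictor.
anova_data$pH <- factor(anova_data$pH)

results.anova <- aov(TimeSpentMoving ~ pH, data = anova_data)
anova_table <- summary(results.anova)[[1]]
print(anova_table)

# Pull out the pieces needed for the results sentence.
f_value <- anova_table[["F value"]][1]
df1     <- anova_table[["Df"]][1]   # between groups: k - 1
df2     <- anova_table[["Df"]][2]   # within groups:  N - k
p_value <- anova_table[["Pr(>F)"]][1]
cat(sprintf("F = %.2f; df1 = %d; df2 = %d; p = %.3g\n",
            f_value, df1, df2, p_value))
```

If pH were left as a number, the table would show 1 degree of freedom for pH instead of 2, which is a quick way to spot the problem.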
9. Make a box plot in R Studio following the instructions provided in the video.
a. Copy/paste the coding below: (1 pt)
ggplot(anova_data, aes(x=pH, y=TimeSpentMoving)) +
  geom_boxplot(fill="green") +
  labs(title="Effect of pH on movement")
b. Copy/paste the box plot itself below: (2 pts)
10. Add the raw data to your boxplot using the “jitter” command as shown in the video.
a. Copy/paste the coding below: (1 pt)
ggplot(anova_data, aes(x=pH, y=TimeSpentMoving)) +
  geom_boxplot(fill="green") +
  labs(title="Effect of pH on movement") +
  geom_jitter(shape=16, position=position_jitter(0.1))
b. Copy/paste the box plot itself below: (2 pts)
c. Are there any outliers revealed in the box plot with raw data? If so, describe them. (0.5 pt)
Yes, a few outliers became apparent in the box plot. They sat just above the upper whiskers, so although they are outliers, they lie close to the rest of our data and likely had no substantial impact on the overall results.
d. Are there any clusters of data within a treatment group that have been revealed in the box plot with raw data? If so, describe them. (0.5 pt)
No, we do not believe there are any clusters of data.
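To check the outliers from question 10c numerically rather than by eye, one option is the 1.5 × IQR rule that box plots conventionally use for their whiskers. The sketch below applies it to a single invented group of values; `group_times` is a hypothetical name, not something from the worksheet.

```r
# Invented values for one treatment group; 190 is deliberately extreme.
group_times <- c(100, 105, 110, 112, 115, 118, 120, 125, 190)

# boxplot.stats() flags points beyond 1.5 * IQR from the box.
flagged <- boxplot.stats(group_times)$out
print(flagged)

# The same rule written out with quartiles.
q <- quantile(group_times, c(0.25, 0.75))
upper_fence <- q[2] + 1.5 * (q[2] - q[1])
print(group_times[group_times > upper_fence])
```

`boxplot.stats()` works from hinges while `quantile()` uses quartiles, so on other data sets the two approaches can differ slightly at the boundary.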
11. Now, go into Excel and make up some data of your own that would require an ANOVA analysis and use a bar graph for data visualization. Utilize three treatment groups and make sure your dependent variable is quantitative. Following the same prompts as above, but now doing so without a video to show you how to do so, import your data set into R Studio, and make a box plot with raw data showing on the graph. This time, each individual group member needs to provide a unique graph. Each graph should have different axis labels, different mean values, different color schemes, etc. (2 pts)
1. Becca Rose
2. Avery
3. Sydney
5. Diya Patel
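The worksheet asks for the made-up data in question 11 to be typed into Excel. As an alternative illustration only, a data set with the right structure (three treatment groups, one quantitative dependent variable, two columns) could also be generated in R. The group names, means, standard deviation, and file name below are arbitrary choices, not values from the assignment.

```r
set.seed(1)  # makes the made-up numbers reproducible

# Three hypothetical treatment groups with different true means.
my_data <- data.frame(
  Treatment = rep(c("Low", "Medium", "High"), each = 10),
  Response  = c(rnorm(10, mean = 20, sd = 3),
                rnorm(10, mean = 25, sd = 3),
                rnorm(10, mean = 30, sd = 3))
)

# Keep the groups in a sensible order for plotting.
my_data$Treatment <- factor(my_data$Treatment,
                            levels = c("Low", "Medium", "High"))

# Save in the same two-column .csv format used above (a temp file here).
my_csv <- file.path(tempdir(), "my_made_up_data.csv")
write.csv(my_data, my_csv, row.names = FALSE)

# Group means, to confirm the groups differ as intended.
print(tapply(my_data$Response, my_data$Treatment, mean))
```

Changing the means, the seed, or the group labels gives each group member a different data set, which can then be plotted with the same box plot and jitter code as in question 10.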