Past exam questions (PDF)
School: University of New South Wales
Course: 2041
Subject: Geography
Date: May 3, 2024
Pages: 31
Uploaded by AdmiralSparrow4236
Question 1.
Answer all parts 1A to 1E.
Prompted by the devastating fire season this year in Australia, a climate scientist decided to analyse whether climate change played a significant role in this particular event. The scientist first plotted the annual average of monthly maximum temperature anomalies in southern Australia since 1910.
The scientist then performed a statistical analysis on her computer to analyse these data and saw the following output on her screen:
1A) What kind of analysis did she perform? Write the equation, explain the information you can gather from this output in your own words, and conclude.
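As a hedged illustration only: if the analysis in 1A is a simple linear regression of the temperature anomaly on year (an assumption, since the screen output is not reproduced here), the model is y = β0 + β1·year + ε, where β1 is the trend per year. The sketch below fits that model by ordinary least squares using the Python standard library and reports the slope, intercept, R² and the t statistic for the slope. The years and anomalies are made up and are not the scientist's data.

```python
import math

def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x + error."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # slope: change in y per unit of x
    b0 = mean_y - b1 * mean_x            # intercept
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    se_b1 = math.sqrt(ss_res / (n - 2) / sxx)   # standard error of the slope (df = n - 2)
    t_b1 = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)
    return {"intercept": b0, "slope": b1, "r_squared": r_squared,
            "t_slope": t_b1, "df": n - 2}

# Hypothetical temperature anomalies (degrees C), not the exam data.
years = [1910, 1940, 1970, 2000, 2019]
anomaly = [-0.6, -0.3, -0.1, 0.4, 1.2]
fit = simple_linear_regression(years, anomaly)
print(f"slope = {fit['slope']:.4f} C per year, R^2 = {fit['r_squared']:.2f}, "
      f"t = {fit['t_slope']:.2f} on {fit['df']} df")
```

The p-value for the slope would then come from a t distribution with n - 2 degrees of freedom.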
Dangerous fire conditions depend not only on maximum temperatures but also on fuel load (vegetation biomass) and on whether the fuel load is dry enough to burn. She therefore performed a similar analysis of annual mean precipitation (in mm per month) over time and got the following result:
1B) Write the equation, explain the information you can gather from this output in your own words, and conclude.
The scientist then divided her data into three groups: years with large fires (> 1M acres burnt), years with medium fires (between 0.5M and 1M acres burnt) and years with small fires (< 0.5M acres burnt). She then produced the following box plot:
She then ran a statistical analysis and saw the following on her screen:

Source    SS        df     MS        F      Prob>F
Groups     4.7245     2    2.36225   5.38   0.0059
Error     46.9945   107    0.4392
Total     51.719    109

1C) What kind of analysis did she perform? Explain the information you can gather from this output in your own words, and conclude.
Here is a similar plot for precipitation, and the result of a similar analysis for precipitation:

Source    SS         df     MS        F      Prob>F
Groups     317.33      2   158.666    4.72   0.0109
Error     3596.84    107    33.615
Total     3914.18    109
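The tables can be checked by hand. Each mean square is SS/df, and F is the ratio of the groups mean square to the error mean square. The sketch below uses the Python standard library with hypothetical helper names. It recomputes F from the two tables and also shows how the sums of squares arise from raw grouped data in a one-way ANOVA. The Prob>F column would come from an F distribution with (2, 107) degrees of freedom, for example via `scipy.stats.f.sf(F, 2, 107)` in SciPy. SciPy is not needed to run this sketch.

```python
def f_from_anova_table(ss_groups, df_groups, ss_error, df_error):
    """Recompute mean squares and the F ratio from a one-way ANOVA table."""
    ms_groups = ss_groups / df_groups   # between-group mean square
    ms_error = ss_error / df_error      # within-group (residual) mean square
    return ms_groups, ms_error, ms_groups / ms_error

def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between groups: how far each group mean sits from the grand mean.
    ss_groups = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: scatter of observations around their own group mean.
    ss_error = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_groups = len(groups) - 1
    df_error = len(all_values) - len(groups)
    _, _, f = f_from_anova_table(ss_groups, df_groups, ss_error, df_error)
    return ss_groups, ss_error, df_groups, df_error, f

# SS and df values taken from the temperature and precipitation tables above.
for label, row in [("temperature", (4.7245, 2, 46.9945, 107)),
                   ("precipitation", (317.33, 2, 3596.84, 107))]:
    ms_g, ms_e, f = f_from_anova_table(*row)
    print(f"{label}: MS_groups = {ms_g:.4f}, MS_error = {ms_e:.4f}, F = {f:.2f}")
```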
1D) Explain the information you can gather from this output in your own words, and conclude.
Below is a different way to visualise the data, with the black diamonds marking the years with large fires (exceeding 1M acres of burnt land).
1E) Most of the large fires sit in the upper left quadrant. What does this mean? Would there have been a smarter way to analyse the data? How?
Question 2.
Answer all parts 2A to 2D.
A team of conservation biologists was monitoring the reproduction of an endangered plant in a national park south of Sydney. They set up 15 plots in the heathland in Spring 2019 and measured the number of seedlings that had recently emerged. Over summer 2019/2020, the group actively removed weeds from all the plots. In Spring 2020, they revisited the plots and again measured the number of recently emerged seedlings. They collected the following data and plotted the differences between seedling recruitment in the two years.

Plot  Number of seedlings in 2019  Number of seedlings in 2020
1      5                           18
2      3                           24
3      7                           45
4      9                           23
5     10                           76
6     23                           30
7      2                           15
8      8                            8
9     12                           26
10    15                           34
11    31                           41
12    10                           32
13     8                           19
14     2                           15
15     9                           24

A paired t-test was run to formally test whether the two years were different:

Paired t-test
data: Seedlings$Seedlings by Seedlings$Year
t = -4.5718, df = 14, p-value = 0.0004352
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-27.032092 -9.767908
sample estimates:
mean of the differences
-18.4
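The reported test can be reproduced from the table. The sketch below uses Python's standard `statistics` module. The differences are taken as 2019 minus 2020 so that the signs match the R output. The 95% critical value of t on 14 degrees of freedom (about 2.1448) is hard-coded rather than computed, which is an assumption of this sketch.

```python
import math
import statistics

# Seedling counts per plot, copied from the table above (plots 1 to 15).
seedlings_2019 = [5, 3, 7, 9, 10, 23, 2, 8, 12, 15, 31, 10, 8, 2, 9]
seedlings_2020 = [18, 24, 45, 23, 76, 30, 15, 8, 26, 34, 41, 32, 19, 15, 24]

def paired_t(before, after, t_crit):
    """Paired t statistic on per-plot differences (before - after)."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se_d = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    t = mean_d / se_d
    ci = (mean_d - t_crit * se_d, mean_d + t_crit * se_d)
    return mean_d, t, n - 1, ci

# 2.144787: two-sided 95% critical value of t with 14 df (hard-coded, not computed).
mean_d, t, df, ci = paired_t(seedlings_2019, seedlings_2020, t_crit=2.144787)
print(f"mean difference = {mean_d:.1f}, t = {t:.4f}, df = {df}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Run as written, it reproduces the reported mean difference (-18.4), t (-4.5718), df (14) and confidence interval (-27.032, -9.768) to the printed precision. Because each plot is measured in both years, the test works on the 15 within-plot differences rather than on two independent samples.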
2A) Why did the scientists decide to use a paired t-test?
2B) Write a paragraph that could be given to the national park managers to explain the result.
2C) Can you see any problems with the analysis that they ran?
2D) How confident are you that the results can be explained by the weed removal work conducted between the two sampling events? Describe a sampling design that would allow a more effective test of whether removing weeds could help the regeneration of the endangered plant.
Question 3.
Answer all parts 3A to 3D.
Reef Life Survey is a citizen science program in which SCUBA divers record the abundance of fish and other organisms on reefs around the world. By April 2020, over 13,000 surveys had been conducted in 53 countries, recording over 18 million individuals from almost 5,000 species. From the 432,325 observations currently available from Australia, I have extracted a sample of these data. The sample has 1,000 of the surveys conducted, with data on the abundance of 1,759 species of fish. These come from the Temperate Australasia and Indo-West Pacific biogeographic realms (the light green and dark purple regions to the south and north of Australia). Within those realms, there are smaller ecoregions.
Here is a multidimensional scaling (MDS) plot that visualises those surveys, with each point colour-coded by one of the 20 ecoregions around Australia and a different symbol for each of the two biogeographic realms.
3A) What does each point on the plot, and the distance between points, represent?
3B) The stress value for that MDS analysis was 0.105. What does this mean?
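The exam does not say which dissimilarity measure the MDS was built on. Bray-Curtis dissimilarity is a common choice for species abundance data, so the sketch below assumes it. The sketch also shows the Kruskal stress-1 formula, which measures how poorly the distances between points in the plot match the (monotonically transformed) dissimilarities they are meant to represent. The fitted disparities are passed in directly rather than estimated by monotone regression, so this covers only the final step of a non-metric MDS. The abundances are hypothetical.

```python
import math

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical, 1 = no shared species)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den > 0 else 0.0

def kruskal_stress1(ordination_dists, disparities):
    """Stress-1 = sqrt(sum((d - d_hat)^2) / sum(d^2)) over all pairs of points."""
    num = sum((d - dh) ** 2 for d, dh in zip(ordination_dists, disparities))
    den = sum(d ** 2 for d in ordination_dists)
    return math.sqrt(num / den)

# Hypothetical abundances of three fish species in two surveys.
print(bray_curtis([1, 0, 3], [1, 2, 1]))  # (0 + 2 + 2) / (2 + 2 + 4) = 0.5
```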
3C) What do you conclude about the variation in fish communities across Australia from the MDS plot?
3D) “The plot has strong evidence that tropical fish are more abundant than those in temperate regions.” True or false? Discuss with reasons.
1/4/2022, 5:15 pm
Final - Term 1 2022 - BEES2041-BEES5041 - Data Analysis: Life & Earth Sc
https://unsw.inspera.com/static/player?viewMedia=print&printPar…%7D&locale=en_us&context=preview&assessmentRunId=108331043#/all
2022 - BEES2041 Cover sheet
BEES2041-BEES5041 – Data Analysis for Life and Earth Scientists
Final Exam – Term 1 2022
Instructions:
1. Time allowed – 2 hours, plus 15 minutes.
2. Total number of questions to be answered – 16
3. Total marks available – 100 marks, worth 35% of the total marks for the course.
4. Marks available for each question are shown in the exam.
5. Students are advised to read all of the examination questions before attempting to answer the questions.
6. This exam cannot be copied, forwarded, or shared in any way.
7. Students are reminded of the UNSW rules regarding academic integrity and plagiarism.
8. Your work will be saved periodically throughout the exam and will be automatically submitted when the test ends, provided you are connected to the internet.
9. You must upload all of your work within the exam time. There is no extra time to upload. No late submissions will be accepted.
If you have a question or concern during today’s exam, you should contact the exams team for support by phone on +61 2 8936 7007 or via the online form.
NSW is home to around 7200 native plant species. Of these, 574 are listed as
either Vulnerable, Endangered or Critically Endangered under the NSW
Biodiversity Conservation Act 2016. From here on we'll refer to species as being
listed or un-listed.
A scientist was interested in understanding whether there are particular
characteristics that make species more likely to be listed under the act. They
hypothesised that some growth forms (trees) would be more likely to be listed,
as they have longer life cycles and smaller populations in a given area of
habitat. To test this, they first compiled a data set of the entire NSW flora,
including species name, status (listed or not listed) and each species' growth form
(tree, herb, shrub, climber). They then calculated the following for each growth form:
n_species_listed: the number of listed species in each growth form
n_species_total: the total number of species in each growth form
percent_listed: the percentage of species in each growth form that are listed
(= n_species_listed / n_species_total * 100)
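The three summary variables defined above can be derived from a species-level table in a few lines. The sketch below uses made-up species records in plain Python and is not the researchers' code. Only the variable names come from the text.

```python
from collections import Counter

# Hypothetical species-level records: (species name, growth form, listed?).
flora = [
    ("Species A", "tree", True),
    ("Species B", "tree", False),
    ("Species C", "shrub", True),
    ("Species D", "shrub", False),
    ("Species E", "shrub", False),
    ("Species F", "herb", False),
]

def summarise_by_growth_form(records):
    """Return {growth_form: (n_species_listed, n_species_total, percent_listed)}."""
    n_total = Counter(form for _, form, _ in records)
    n_listed = Counter(form for _, form, listed in records if listed)  # missing forms count as 0
    return {form: (n_listed[form], n_total[form], n_listed[form] / n_total[form] * 100)
            for form in n_total}

for form, (listed, total, pct) in summarise_by_growth_form(flora).items():
    print(f"{form}: {listed}/{total} listed ({pct:.1f}%)")
```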
The resulting dataset looked like this:
When first analysing the data, they noticed a large number of shrubs recorded
as listed, and fewer trees (panel a). But there are also big differences in the
number of species present (panel b). They therefore calculated the fraction of
species in each growth form that were listed (panel c), so that they could
standardise the number of species listed by the total number of species. This
panel seemed to support the researchers' hypothesis that trees were more likely to be
listed, although shrubs also appeared to be listed at a high rate.
Before getting too excited, the researchers ran a statistical test comparing the
number of listed species in each group to the proportion of total species in each
growth form.
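One test that matches this description is a chi-square goodness-of-fit test. In that test, the expected number of listed species in each growth form equals the total number of listed species multiplied by that growth form's share of all species. The sketch below assumes that design and uses hypothetical counts. The name `p_expected` is borrowed from question 2 for illustration only. The p-value would come from a chi-square distribution with k - 1 degrees of freedom; SciPy's `scipy.stats.chisquare(observed, f_exp=expected)` computes it, but this sketch does not.

```python
def chi_square_gof(n_listed, n_total):
    """Chi-square goodness of fit: are listed species spread across growth forms
    in proportion to how many species each growth form contains?"""
    grand_total = sum(n_total)
    listed_total = sum(n_listed)
    p_expected = [t / grand_total for t in n_total]      # each form's share of all species
    expected = [listed_total * p for p in p_expected]     # expected listed counts under H0
    stat = sum((o - e) ** 2 / e for o, e in zip(n_listed, expected))
    return stat, len(n_listed) - 1, p_expected, expected

# Hypothetical counts for four growth forms (tree, herb, shrub, climber).
stat, df, p_expected, expected = chi_square_gof([10, 20, 30, 40], [100, 100, 100, 100])
print(f"chi-square = {stat:.2f} on {df} df; expected = {expected}")
```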
1. Hypotheses
Explain what test was run. What are the null and alternative hypotheses?
Maximum marks: 5
2. Expected
Explain how the variable `p_expected` was calculated and the purpose of including it in the test.
Maximum marks: 5
3. Conclusions
Write a short paragraph for National Parks describing what you concluded from the test.
Maximum marks: 5
4. Design
Based on the results of this study, the researcher sought to understand why some tree species were listed and others weren't. Suggest a hypothesis for why some tree species may be more likely to be listed, and outline a sampling design to test the hypothesis by comparing listed and un-listed species.
Maximum marks: 10
Baiting with poison has been used to control populations of dingoes (often
referred to as wild dogs) in many agricultural areas in Australia, because they
sometimes kill livestock. Larger dingoes have been observed to be less
susceptible to baiting, leading researchers to hypothesise that baiting would
cause the size of dingoes to increase where baiting had been present for long
periods.
The ideal design for testing this would involve sampling the weights of adults before
and after baiting was introduced. As pre-baiting weights cannot be collected for locations
where baiting is already present, the researchers wondered whether they could use the
size of dingo skulls, stored as museum samples, as an indicator of body mass.
Fortunately, a dataset had been collected where mass at the time of death for
many of the skulls that the researchers measured had been recorded. The
researchers therefore first tested the hypothesis that body mass increased
significantly with skull length. The dataset had the following variables:
They first visualised the data. While there was overall a good relationship, there
were some obvious outliers in the data.
5. Outliers
Outline what steps you would take to decide whether the outliers should remain in the analysis. What factors could justify removing outliers?
Maximum marks: 5
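Cook's distance is one widely used diagnostic for deciding whether a point is unduly influential in a regression. It combines the size of a point's residual with its leverage. The sketch below computes it for a simple linear regression of body mass on skull length using only the Python standard library. The measurements are hypothetical, and the 4/n cut-off is a common rule of thumb rather than a criterion stated in the exam. Flagging a point does not by itself justify removing it.

```python
def cooks_distance(x, y):
    """Cook's distance for each point in a simple linear regression y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2                                            # parameters: intercept and slope
    mse = sum(r ** 2 for r in resid) / (n - p)       # residual mean square
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [r ** 2 / (p * mse) * h / (1 - h) ** 2 for r, h in zip(resid, leverage)]

# Hypothetical skull lengths (mm) and body masses (kg); the last animal is suspiciously light.
skull = [180, 185, 190, 195, 200, 205, 210, 215]
mass = [13.0, 14.1, 15.0, 15.8, 17.1, 18.0, 18.9, 10.0]
d = cooks_distance(skull, mass)
flagged = [i for i, di in enumerate(d) if di > 4 / len(d)]   # rule-of-thumb cut-off
print(flagged)
```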
6. Test 1
With the outliers removed, the researchers ran a test relating body mass to skull length. The results of the test are shown below.
What kind of analysis did they perform? Write the equation, explain the information
you can gather from this output in your own words, and conclude.
Maximum marks: 5
7. Test 2
The researchers subsequently realised that it could matter whether the samples came from male or female animals, as the sexes may differ in morphology. They made a plot which suggested some differences.
They therefore decided to test whether the regression equation differed between males and females. They ran the following test, with results as shown:
What kind of analysis did they perform? Explain the information you can gather from
this output in your own words, and conclude.
Maximum marks: 7.5
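One way to test whether the skull-length relationship differs between males and females is to compare two models with an F test on the reduction in residual sum of squares. The first model gives each sex its own slope and intercept (an interaction between skull length and sex; in R, with hypothetical variable names, something like `lm(mass ~ skull * sex)`). The second, simpler model shares one slope between the sexes. The exam's output is not reproduced here, so this is a sketch of that general technique, not the researchers' analysis. It uses the Python standard library and small hypothetical data.

```python
def least_squares(X, y):
    """Fit y = X b by solving the normal equations (X'X) b = X'y; return (b, RSS)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    b = [a[i][p] / a[i][i] for i in range(p)]
    rss = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
              for row, yi in zip(X, y))
    return b, rss

def slope_difference_f_test(skull, mass, is_male):
    """F test: separate slopes per sex (skull * sex) versus a common slope (skull + sex)."""
    full = [[1.0, s, m, s * m] for s, m in zip(skull, is_male)]   # sex-specific slopes
    reduced = [[1.0, s, m] for s, m in zip(skull, is_male)]       # common slope, sex-specific intercepts
    _, rss_full = least_squares(full, mass)
    _, rss_reduced = least_squares(reduced, mass)
    df_full = len(mass) - 4
    f = (rss_reduced - rss_full) / (rss_full / df_full)            # full model has 1 extra parameter
    return f, (1, df_full)

# Hypothetical data (arbitrary units): 4 females (is_male = 0) and 4 males (is_male = 1).
skull = [1, 2, 3, 4, 1, 2, 3, 4]
is_male = [0, 0, 0, 0, 1, 1, 1, 1]
mass = [2.1, 2.9, 3.9, 5.1, 3.1, 4.9, 6.9, 9.1]
f, df = slope_difference_f_test(skull, mass, is_male)
print(f"F = {f:.1f} on {df} df")
```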
8. Test 3
Finally, the researchers ran an ANOVA to see whether skull size changed in three different geographic zones after the introduction of baiting. The plot shows the distribution of skull sizes recorded before ("pre") and after ("post") baiting was initiated.
Results of the test are as follows.
Was the researchers' original hypothesis supported? What can you conclude from this test?
Maximum marks: 7.5
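Suppose the ANOVA in Test 3 treated zone and period (pre versus post baiting) as two crossed factors. This is an assumption, since the output is not reproduced here. In that design, the question of whether skull size changed differently among zones is carried by the zone-by-period interaction term. The sketch below decomposes the sums of squares for a balanced design, with the same number of skulls in every zone and period cell. Museum samples may well not satisfy that condition, and unbalanced data need a regression-based ANOVA instead. The example uses two zones and made-up skull lengths to stay short.

```python
def two_way_anova_balanced(cells):
    """Sums of squares and df for a balanced two-way ANOVA with interaction.

    cells[i][j] holds the observations for level i of factor A (e.g. zone)
    and level j of factor B (e.g. pre/post baiting); all cells are the same size.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[sum(c) / n for c in row] for row in cells]
    grand = sum(map(sum, cell_mean)) / (a * b)
    mean_a = [sum(row) / b for row in cell_mean]
    mean_b = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    # Interaction: how far each cell mean departs from the additive (main effects only) prediction.
    ss_ab = n * sum((cell_mean[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((v - cell_mean[i][j]) ** 2
               for i in range(a) for j in range(b) for v in cells[i][j])
    return {"A": (ss_a, a - 1), "B": (ss_b, b - 1),
            "A:B": (ss_ab, (a - 1) * (b - 1)), "error": (ss_e, a * b * (n - 1))}

# Hypothetical skull lengths (mm): rows are zones, columns are pre and post baiting.
cells = [[[181, 183], [185, 187]],
         [[182, 184], [190, 192]]]
table = two_way_anova_balanced(cells)
ss_e, df_e = table["error"]
for term in ("A", "B", "A:B"):
    ss, df = table[term]
    print(f"{term}: SS = {ss:.1f}, df = {df}, F = {(ss / df) / (ss_e / df_e):.1f}")
```

In this made-up example the pre-to-post increase is 4 mm in zone 1 and 8 mm in zone 2, and it is that difference in change that the A:B term picks up.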
BEES2041-BEES5041 - Final Exam - T1 2023 - Data Analysis for Life and Earth Sciences-Data Analysis: Environmental Science & Management
BEES2041-BEES5041 – Data Analysis for Life and Earth Scientists
Final Exam – Term 1 2023
Instructions:
1. Time allowed – 2 hours, plus 15 minutes.
2. Total number of questions to be answered – 15
3. Total marks available – 100 marks, worth 35% of the total marks for the course.
4. Marks available for each question are shown in the exam.
5. Students are advised to read all of the examination questions before attempting to answer the questions.
6. This exam cannot be copied, forwarded, or shared in any way.
7. Students are reminded of the UNSW rules regarding academic integrity and plagiarism.
8. Your work will be saved periodically throughout the exam and will be automatically submitted when the test ends, provided you are connected to the internet.
9. You must upload all of your work within the exam time. There is no extra time to upload. No late submissions will be accepted.
Gap-filling with predictive models
Note: This question logically follows Questions 4-7 above (on leaf area vs temperature). We
suggest completing those questions before attempting this section.
As noted earlier, the size of a plant's leaves (called "leaf area") is a trait that affects where species
are found. Species with larger leaves tend to be found in warmer and wetter areas. Many people
therefore consider leaf area a useful indicator of a species' preferred climate. There are over 22,000
plant species in Australia. If we had traits like "leaf area" measured for all species, we could use
them as an easy indicator of species ecology.
However, we currently only have data on leaf area for around 20% of known species. There is
therefore a strong need to increase the number of species with records of leaf area.
To increase coverage of the dataset in Australia, Isaac's team decided to see whether they could
predict a species' leaf area from other traits, for which they have more data. This is called gap-filling.
The table below shows the number of species for which we have data on a range of variables:

Variable        Number of species with records in Australia
species name    24,472
family          24,435
growth form     15,010
leaf_width      14,149
leaf_length     14,619
leaf_area        4,841
As you can see, we have lots more data on characters like growth form, and quite a lot on other
leaf dimensions, such as "leaf width" and "leaf length". The researchers therefore wanted to test
how well they could predict leaf area from these other traits. If it worked, they could greatly
increase coverage of this important trait.
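As a quick check on the coverage figures, the sketch below divides each count in the table above by the number of named species (24,472). That denominator is an assumption: the text quotes "over 22,000" species, so the percentages are approximate. Leaf area comes out at just under 20%, consistent with the figure quoted above.

```python
# Species counts copied from the table above.
records = {
    "species name": 24472,
    "family": 24435,
    "growth form": 15010,
    "leaf_width": 14149,
    "leaf_length": 14619,
    "leaf_area": 4841,
}
total = records["species name"]   # assumed denominator: all named species
for variable, n in records.items():
    print(f"{variable}: {n / total:.1%} of named species")
```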
3(a)
The team decided to use a random forest model to make the predictions, using the function
`ranger` from the R package `ranger`.
Following standard techniques for predictive modelling, they first assembled a "labelled" dataset, in which the desired outcome variable (leaf area) is known. The dataset had the following columns. Note that the 3 numeric traits were log-transformed first.
They then split the dataset into two parts using the following code.
Explain the role of the training and testing datasets created above in the development of the predictive model. If the goal is to predict the variable leaf area, why is it included in the labelled dataset?
Maximum marks: 5
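The exam's R code for the split is not reproduced here. As a rough sketch of the general idea in Python, the function below shuffles the labelled rows with a fixed seed, keeps one fraction for training and the rest for testing. The 80/20 proportion, the seed and the example rows are all assumptions.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Randomly split rows into non-overlapping training and testing sets."""
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    shuffled = rows[:]                   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical labelled rows: (species, log leaf length, log leaf width, log leaf area).
labelled = [(f"sp{i}", 0.1 * i, 0.05 * i, 0.15 * i) for i in range(10)]
train, test = train_test_split(labelled)
print(len(train), len(test))  # 8 2
```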
3(b)
The researchers ran their model with all covariates included, then calculated the Root Mean
Square Error (RMSE) between observed and predicted values from the model, in both the
training and testing datasets. The following plot shows observed Y vs predicted Y, with the RMSE
and a 1:1 line, in both the training and testing datasets.
Describe the result. How well does this model predict leaf area?
Maximum marks: 5
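RMSE condenses the observed-versus-predicted comparison into one number: the square root of the mean squared difference between observed and predicted values. It is expressed in the same units as Y, here log leaf area. Below is a minimal standard-library sketch with made-up values.

```python
import math

def rmse(observed, predicted):
    """Root mean square error: typical size of a prediction error, in the units of Y."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Computing RMSE separately for the training and testing sets matters because a flexible model can fit its training data more closely than new data. A large gap between the two values is a common sign of overfitting.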
3(c)
There's a lot more data available on simple categorical variables like growth form. So we could
potentially predict leaf area for many more species if our model only used the categorical
variables to make predictions. But would such a model be skilful enough?
The researchers therefore compared the predictive skill of 3 models using different covariates:
1. All covariates (as on the previous page)
2. A model using only one of the other numeric variables along with categorical data (leaf
length, so excluding leaf width)
3. A model using only the categorical variables.
The following plot shows observed Y vs predicted Y, with the RMSE and a 1:1 line, in both the training and testing datasets, for model 3 (only categorical variables).
The following table shows the RMSE in the testing dataset for the 3 models:

Model                                         RMSE
1. All covariates                             0.40
2. leaf length plus categorical variables     0.59
3. Only categorical variables                 0.84
Based on these results, and those shown on the previous page, write a paragraph to
summarise your findings on our ability to use predictive models to estimate a species' leaf area.
Maximum marks: 10
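Because the traits were log-transformed, an RMSE on the log scale can be read roughly as a multiplicative error. Assume natural logarithms (R's `log()` default; the exam does not say which base was used). Then exp(RMSE) gives the approximate factor by which a typical prediction is off. The sketch applies this to the three testing-set RMSE values in the table above. It is a rough interpretive aid, not part of the original analysis.

```python
import math

# Testing-set RMSE values (log scale) copied from the table above.
test_rmse = {
    "1. All covariates": 0.40,
    "2. leaf length plus categorical variables": 0.59,
    "3. Only categorical variables": 0.84,
}
for model, r in test_rmse.items():
    factor = math.exp(r)   # assumes natural-log units
    print(f"{model}: typical error factor about x{factor:.2f}")
```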