Problem Set 1

School: University of Texas, Dallas
Course: 6359
Subject: Statistics
Date: Apr 3, 2024

Question 1

Part a

For luxury hotels:

Mean = Sum of all values / Number of values
Mean = (275 + 280 + 220 + 250 + 220 + 225) / 6 = 1470 / 6 = 245

Standard Deviation = σ = √(Σ(xi - μ)² / N)

Where:
σ represents the standard deviation.
Σ represents summation over all data points.
xi represents each data point.
μ represents the mean (average) of the dataset.
N represents the total number of data points.

Here μ = 245 and N = 6. Therefore,
Standard Deviation = √(((275-245)² + (280-245)² + (220-245)² + (250-245)² + (220-245)² + (225-245)²) / 6) = √(3800 / 6) ≈ 25.17

For budget hotels:

Mean = (70 + 70 + 69 + 65 + 62 + 75 + 70 + 70 + 60) / 9 = 611 / 9 ≈ 67.89

The same standard deviation formula applies, with μ = 67.89 and N = 9. Therefore,
Standard Deviation = √(((70-67.89)² + (70-67.89)² + (69-67.89)² + (65-67.89)² + (62-67.89)² + (75-67.89)² + (70-67.89)² + (70-67.89)² + (60-67.89)²) / 9) = √(174.89 / 9) ≈ 4.41

Part b

Luxury hotels display greater price variability than budget hotels due to two main factors:

1. Diverse Luxury Amenities: Luxury hotels offer a wide range of amenities, so prices vary with the specific amenities included. In contrast, budget hotels provide more standardized services.
2. Dynamic Influences: Pricing in luxury hotels is influenced by factors such as occupancy rates and special events, causing rates to fluctuate. Budget hotels typically maintain more stable pricing structures.

Part c

R code:

    Hotels <- data.frame(
      Price = c(275, 280, 220, 250, 220, 225, 70, 70, 69, 65, 62, 75, 70, 70, 60),
      Type = c(rep("Luxury", 6), rep("Budget", 9))
    )
    head(Hotels)

    luxury_hotels <- Hotels$Price[Hotels$Type == "Luxury"]
    budget_hotels <- Hotels$Price[Hotels$Type == "Budget"]

    # Calculate the mean and standard deviation for Luxury and Budget hotels.
    # Note: sd() divides by n - 1 (sample standard deviation), so it returns
    # about 27.57 and 4.68 rather than the population values from Part a.
    mean_luxury <- mean(luxury_hotels)
    std_dev_luxury <- sd(luxury_hotels)
    mean_budget <- mean(budget_hotels)
    std_dev_budget <- sd(budget_hotels)

Output: (not reproduced)

Part d

R code:

    library(ggplot2)  # provides ggplot(), aes(), geom_boxplot(), labs()

    price_boxplot <- ggplot(Hotels, aes(x = Type, y = Price)) +
      geom_boxplot() +
      labs(
        x = "Hotel Type",
        y = "Price",
        title = "Price Comparison Between Luxury and Budget Hotels"
      )
    print(price_boxplot)

Output: boxplot of Price by Hotel Type, titled "Price Comparison Between Luxury and Budget Hotels" (figure not reproduced)

Question 2

Adding a $1,000 bonus to each employee's salary does not alter the standard deviation. The standard deviation measures how far the data points lie from their mean. When a fixed amount is added to every data point, the mean shifts by the same amount, so every deviation from the mean, and therefore the spread of the data, is unchanged.

Question 3

Discrete Variables:

- Gender: a discrete categorical variable. It takes distinct categories (e.g., male and female) and is not measured on a continuous scale.
- Ethnicity: also a discrete categorical variable, with distinct categories (e.g., African American, Asian, Hispanic) and no continuous measurement.
- Heart Rate (bpm, beats per minute): a discrete numerical variable, because it is a count of heartbeats within a specific time interval. Heart rate values are typically whole numbers (integers).
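The standard deviation arithmetic in Question 1 Part a and the claim in Question 2 can be checked numerically. The following is a minimal sketch: `population_sd` is a helper defined here for illustration (it is not a base R function), and the salary figures are invented, not data from the problem set.

```r
# Hypothetical helper: population standard deviation (divides by N),
# matching the formula used in Question 1 Part a.
population_sd <- function(x) {
  sqrt(mean((x - mean(x))^2))
}

luxury <- c(275, 280, 220, 250, 220, 225)
budget <- c(70, 70, 69, 65, 62, 75, 70, 70, 60)

# Population SD (divide by N) versus R's sd(), which divides by N - 1
round(population_sd(luxury), 2)  # 25.17
round(sd(luxury), 2)             # 27.57
round(population_sd(budget), 2)  # 4.41
round(sd(budget), 2)             # 4.68

# Question 2: adding a constant shifts the mean but leaves the spread alone.
# The salaries below are invented for illustration.
salaries <- c(42000, 55000, 61000, 48000, 75000)
with_bonus <- salaries + 1000

mean(with_bonus) - mean(salaries)        # 1000
all.equal(sd(with_bonus), sd(salaries))  # TRUE
```

The gap between the two versions of the standard deviation shrinks as the number of observations grows, which is why it is most visible in small samples like these.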

Continuous Variables:

- Age (years): a continuous numerical variable. It can take decimal values between whole-year measurements and varies continuously within a range.
- Height (meters): a continuous numerical variable. It is measured on a continuous scale and, like age, can take decimal values.
- Weight (kilograms): a continuous numerical variable. It is measured on a continuous scale and can take decimal values.
- Blood Pressure (mmHg): a continuous numerical variable. It is measured on a continuous scale with decimal values and varies continuously within a specific range.

Question 4

Part a

The boxplots show that all three sites share a similar median percentage of chemical Z, approximately 7%. What sets them apart is the spread of the values. Sites I and III exhibit a wider spread in the percentage of chemical Z, whereas site II shows a much narrower spread, suggesting that most of its values cluster closely around the median.

Part b

Subpart I

The likely source of this information is Site III. Values beyond the range of the sampled data could exist, but the available data give approximate minimum and maximum total percentages for the three chemicals, as shown in the table below. Site III is the only site for which 20.5 falls between the summed minimum and the summed maximum values.

Subpart II

Chemical Y appears to be the most useful, because its weight-percentage distributions are distinct across the three sites. The distributions of Chemicals X and Z, on the other hand, overlap significantly.

Question 5

Part a

To inspect the nature of the given data, we have the following commands in R:

    gss2014 <- readRDS("C:/Users/prath/Downloads/gss2014 (1).rds")
    education <- gss2014$EDUC

    # Summary statistics, ignoring missing values
    mean_education <- mean(education, na.rm = TRUE)
    median_education <- median(education, na.rm = TRUE)
    sd_education <- sd(education, na.rm = TRUE)

The above commands give the following results: (output not reproduced)

In summary, American adults in the dataset had a reasonably high level of education on average (mean = 13.699). The standard deviation (3.07128) shows that education levels vary, so the dataset includes individuals across a range of education levels. The median (14) indicates that approximately half of the individuals had education levels at or below 14 and the other half at or above 14.

Part b

In the previous part, we used education <- gss2014$EDUC to extract the years of education. Continuing the code, we can write:

    education <- na.omit(education)
    hist(education, breaks = seq(0, max(education) + 10, by = 10),
         main = "Education Levels in 2014",
         xlab = "Education Level", ylab = "Frequency", col = "blue")

We use na.omit() to remove missing values from the education vector (the "EDUC" column) before plotting.
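The two ways of handling missing values used in Question 5 (na.rm = TRUE in Part a, na.omit() in Part b) can be compared on a small vector. This is a minimal sketch with made-up values, not the GSS data; the bin width of 5 is an arbitrary choice for the toy example.

```r
# Made-up years-of-education values with missing entries (not GSS data)
educ <- c(8, 12, 12, NA, 14, 16, 16, NA, 20)

# na.rm = TRUE skips missing values inside a single call ...
mean_skip <- mean(educ, na.rm = TRUE)

# ... while na.omit() drops them from the vector once, up front
educ_clean <- na.omit(educ)
mean_clean <- mean(educ_clean)

all.equal(mean_skip, mean_clean)  # TRUE
length(educ_clean)                # 7

# hist() with plot = FALSE returns the bin counts without drawing.
bins <- hist(educ_clean, breaks = seq(0, 20, by = 5), plot = FALSE)
bins$counts  # 0 1 3 3
```

Inspecting the counts this way is a quick check on whether a chosen bin width separates the data into enough bars to show the shape of the distribution.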

Question 6

Part a

Calculating the mean and median of INCOME:

    mean_income <- mean(gss2014$INCOME, na.rm = TRUE)
    median_income <- median(gss2014$INCOME, na.rm = TRUE)

This gives us the following result: (output not reproduced)

Therefore,
Mean income ≈ 10.96 × 10,000 = 109,600
Median income = 120,000

Checking whether the annual income distribution is symmetric:

    hist(gss2014$INCOME, main = "Distribution of Annual Incomes in 2014",
         xlab = "Annual Income", ylab = "Frequency", col = "blue")

This gives us the following result (graph): histogram of Annual Income against Frequency (figure not reproduced)

The histogram shows that the annual income distribution is not symmetric, which is consistent with the mean lying below the median.

Part b

R code:

    # Range, variance, and standard deviation of INCOME, ignoring missing values
    income_range <- max(gss2014$INCOME, na.rm = TRUE) - min(gss2014$INCOME, na.rm = TRUE)
    income_variance <- var(gss2014$INCOME, na.rm = TRUE)
    income_sd <- sqrt(income_variance)
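The relationship between the three spread measures in Part b can be seen on a small vector. This is a minimal sketch using invented values in place of the INCOME column; it checks that taking the square root of var() gives the same number as calling sd() directly.

```r
# Invented values standing in for an income variable (not GSS data)
income <- c(3, 7, 9, NA, 11, 12, 12, 12)

# range() returns c(min, max); diff() turns that into max - min
income_range <- diff(range(income, na.rm = TRUE))
income_var <- var(income, na.rm = TRUE)
income_sd <- sd(income, na.rm = TRUE)

income_range                            # 9
all.equal(income_sd, sqrt(income_var))  # TRUE
```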

Part c

Create a scatter plot:

    plot(gss2014$EDUC, gss2014$INCOME,
         xlab = "Years of Education (EDUC)",
         ylab = "Annual Income (INCOME)",
         main = "Scatter Plot: Income vs. Education")

Result: scatter plot of Annual Income (INCOME) against Years of Education (EDUC) (figure not reproduced)

Calculation of correlation coefficient:

    correlation_coefficient <- cor(gss2014$EDUC, gss2014$INCOME, use = "complete.obs")

use = "complete.obs": with this setting, cor() calculates the correlation coefficient using only pairs of observations where both variables have non-missing values. Any row in which either INCOME or EDUC is missing is excluded from the calculation.

Since the absolute value of the correlation coefficient is below 0.3, the variables are said to be weakly correlated.
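The effect of use = "complete.obs" can be illustrated on a toy data frame. This is a minimal sketch with made-up numbers, not the GSS data; the column names educ and income are chosen here for illustration.

```r
# Toy data frame with missing values in both columns (not GSS data)
toy <- data.frame(
  educ   = c(10, 12, NA, 14, 16, 18, 12),
  income = c( 5,  7,  9, NA, 11, 12,  6)
)

# use = "complete.obs" keeps only rows where both variables are present
r_complete <- cor(toy$educ, toy$income, use = "complete.obs")

# Equivalent manual approach: filter rows first, then correlate
keep <- complete.cases(toy$educ, toy$income)
r_manual <- cor(toy$educ[keep], toy$income[keep])

sum(keep)                        # 5 rows survive
all.equal(r_complete, r_manual)  # TRUE

# Without `use`, a single NA makes the result NA
is.na(cor(toy$educ, toy$income))  # TRUE
```

Reporting sum(keep) alongside the coefficient makes clear how many observations the correlation is actually based on.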
