Katreddy_Project1_Report


Project 1 – R Practice: Results and Key Findings
Harsha Katreddy
College of Professional Studies, Northeastern University, Boston
ALY 6000: Introduction to Analytics
S9 Fall 2023 (CRN: 70407)
Dr. Richard He
September 25, 2023

Introduction and Key Findings

This report summarizes the results and key inferences obtained by setting up a project in RStudio and executing code according to the 42 instructions given in ALY_6000_Project1.pdf. The assignment provides clear instructions for setting up the project in RStudio and supplies code to clear the RStudio environment.

Key Findings

Problem 1: The screenshot below shows the computed results of the mathematical and logical operations performed as instructed. This problem demonstrates how arithmetic and logical operators work in R.

Figure 1. Answers for Problem 1

Problem 14: Four lines of code were executed as instructed. The result of each line is explained below.
second_vector + 20 : Adds 20 to each element of second_vector and returns a new vector.
second_vector * 20 : Multiplies each element of second_vector by 20 and returns a new vector.
second_vector >= 20 : Returns TRUE for each element of second_vector that is greater than or equal to 20 and FALSE for each element that is less than 20. The output is a logical vector.
second_vector != 20 : Returns TRUE for each element of second_vector that is not equal to 20 and FALSE for each element that is equal to 20. The output is a logical vector.
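The four operations in Problem 14 can be reproduced with a short sketch. The values of second_vector are the ones listed under Problem 24 of this report; how the assignment originally built the vector is not shown, so the c() call and the result names (plus_twenty and so on) are assumptions made for illustration.

```r
# second_vector with the values listed under Problem 24 of this report.
# The c() construction is an assumption; the assignment may build it differently.
second_vector <- c(10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30)

# Arithmetic operators work element by element and return a new vector.
plus_twenty  <- second_vector + 20
times_twenty <- second_vector * 20

# Comparison operators also work element by element and return logical vectors.
at_least_20 <- second_vector >= 20
not_20      <- second_vector != 20

print(plus_twenty)
print(times_twenty)
print(at_least_20)
print(not_20)
```

None of these lines changes second_vector itself; each expression returns a new vector of the same length.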

Problem 23: The code extracts elements from first_vector [17 12 -33 5] by indexing with the logical vector [FALSE TRUE FALSE TRUE]. It returns the elements of first_vector at the positions where the index is TRUE and stores them in vector_from_boolean_brackets [12 5].

Problem 24: Each element of second_vector [10 12 14 16 18 20 22 24 26 28 30] is compared with 20. The result is TRUE if the element is greater than or equal to 20 and FALSE if it is less than 20. The output is the logical vector [FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE].

Problem 25: Returns a vector containing the sequence of numbers that starts at 10, increases in steps of 2, and ends at 30, and stores it in ages_vector [10 12 14 16 18 20 22 24 26 28 30].

Problem 26: ages_vector >= 20 generates a logical vector that is TRUE where the element of ages_vector is greater than or equal to 20 and FALSE where it is less than 20. This logical vector is then used to index ages_vector, which returns the elements of ages_vector that are greater than or equal to 20. Ans: [20 22 24 26 28 30]

Problem 30: set.seed(5) initializes the random number generator to a fixed starting point, which ensures reproducibility. runif(n=10, min=0, max=1000) generates a vector of 10 random numbers drawn from a uniform distribution between 0 and 1000. This is stored in random_vector [200.2145 685.2186 916.8758 284.3995 104.6501 701.0575 527.9600 807.9352 956.5001 110.4530].

Problem 37: set.seed(5) resets the random number generator to the same starting point as in Problem 30. rnorm(n=1000, mean=50, sd=15) generates a vector of 1000 random numbers drawn from a normal distribution with mean 50 and standard deviation 15. Unlike the runif() values in Problem 30, these values are not confined to a fixed range. This is stored in random_vector.
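The logical indexing in Problems 23 to 26 and the seeding in Problems 30 and 37 can be sketched as follows. The vector values come from this report; the names at_least_20, draw_a, draw_b, and normal_draws are hypothetical and used only for illustration.

```r
# Logical indexing: keep the elements whose index value is TRUE.
first_vector <- c(17, 12, -33, 5)
vector_from_boolean_brackets <- first_vector[c(FALSE, TRUE, FALSE, TRUE)]

# A comparison produces the logical index, which then filters the vector.
ages_vector <- seq(from = 10, to = 30, by = 2)
at_least_20 <- ages_vector[ages_vector >= 20]  # hypothetical name

# Resetting the seed makes the generator repeat the same draws.
set.seed(5)
draw_a <- runif(n = 10, min = 0, max = 1000)
set.seed(5)
draw_b <- runif(n = 10, min = 0, max = 1000)
print(identical(draw_a, draw_b))

# rnorm() draws from a normal distribution; mean and sd set the center
# and spread, but the values are not limited to a fixed interval.
set.seed(5)
normal_draws <- rnorm(n = 1000, mean = 50, sd = 15)
print(range(normal_draws))
print(c(mean = mean(normal_draws), sd = sd(normal_draws)))
```

The sample mean and standard deviation of normal_draws should land close to 50 and 15 without matching them exactly, because they are estimates from 1000 draws.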

Problem 38:

Figure 2. Histogram generated in Problem 38

hist(random_vector) generates a histogram of the random_vector created in Problem 37, which shows the distribution of its values graphically. The height of each bar is the count of values (Frequency) that fall within the interval shown on the x-axis. Because random_vector contains 1000 random numbers drawn from a normal distribution with mean 50, the histogram shows the expected bell shape centered near 50. The spread of the values reflects the standard deviation of 15.

Problem 42: first_dataframe is created by reading the provided file “ds_salaries.csv” with the read_csv function.
head(first_dataframe) : Returns the first 6 rows of "first_dataframe", because n=6 is the default number of rows returned.
head(first_dataframe, n=7) : Returns the first 7 rows of "first_dataframe", because n=7 sets the number of rows returned.
names(first_dataframe) : Returns the names of all columns in "first_dataframe".
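The histogram description and the head() defaults can be checked numerically with a minimal sketch. It assumes the same seed and rnorm() call as Problem 37; toy_dataframe is a hypothetical stand-in for first_dataframe, since ds_salaries.csv is not reproduced here, and peak_bin_mid and share_within_1sd are illustrative names.

```r
# Recreate the normal sample from Problem 37.
set.seed(5)
random_vector <- rnorm(n = 1000, mean = 50, sd = 15)

# plot = FALSE returns the bin breaks and counts without drawing the chart.
bins <- hist(random_vector, plot = FALSE)
peak_bin_mid <- bins$mids[which.max(bins$counts)]  # midpoint of the tallest bar
print(peak_bin_mid)

# Share of values within one standard deviation of the mean.
share_within_1sd <- mean(abs(random_vector - 50) <= 15)
print(share_within_1sd)

# Hypothetical stand-in for first_dataframe (not the real ds_salaries.csv).
toy_dataframe <- data.frame(
  job_title = paste("Job", 1:10),
  salary_in_usd = seq(50000, 140000, by = 10000)
)
print(nrow(head(toy_dataframe)))         # default n = 6
print(nrow(head(toy_dataframe, n = 7)))  # explicit n = 7
print(names(toy_dataframe))
```

For a normal distribution, roughly two thirds of the values are expected within one standard deviation of the mean, which is why the histogram is tallest around 50 and thins out on both sides.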

smaller_dataframe <- select(first_dataframe, job_title, salary_in_usd)
smaller_dataframe
select() creates a new dataframe containing only the two columns "job_title" and "salary_in_usd" from "first_dataframe". The result is assigned to "smaller_dataframe", which is then displayed.

better_smaller_dataframe <- arrange(smaller_dataframe, desc(salary_in_usd))
better_smaller_dataframe
arrange() creates a new dataframe by sorting all rows of "smaller_dataframe" from highest to lowest on the "salary_in_usd" column. The result is assigned to "better_smaller_dataframe", which is then displayed.

better_smaller_dataframe <- filter(smaller_dataframe, salary_in_usd > 80000)
better_smaller_dataframe
filter() creates a new dataframe that keeps only the rows of "smaller_dataframe" where "salary_in_usd" is greater than 80,000 USD. The result is assigned to "better_smaller_dataframe", which is then displayed.

better_smaller_dataframe <- mutate(smaller_dataframe, salary_in_euros = salary_in_usd*.94)
better_smaller_dataframe
mutate() adds a new column "salary_in_euros" whose values are computed by multiplying "salary_in_usd" by 0.94 (the exchange rate). The new dataframe is stored in "better_smaller_dataframe" and then displayed.
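The four dplyr verbs above can be tried on a small table. This is a sketch on hypothetical data: toy_dataframe, its job titles, salaries, and the experience_level column are invented for illustration and are not taken from ds_salaries.csv. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-in for first_dataframe.
toy_dataframe <- tibble(
  job_title        = c("Job A", "Job B", "Job C", "Job D"),
  experience_level = c("EN", "SE", "MI", "EX"),
  salary_in_usd    = c(60000, 120000, 90000, 75000)
)

# select(): keep only the named columns.
smaller <- select(toy_dataframe, job_title, salary_in_usd)

# arrange() with desc(): sort rows from highest to lowest salary.
sorted_rows <- arrange(smaller, desc(salary_in_usd))

# filter(): keep rows that satisfy the condition.
high_salary <- filter(smaller, salary_in_usd > 80000)

# mutate(): add a column computed from an existing one.
with_euros <- mutate(smaller, salary_in_euros = salary_in_usd * 0.94)

print(sorted_rows)
print(high_salary)
print(with_euros)
```

Each verb returns a new table and leaves its input unchanged. In the report, every call starts again from "smaller_dataframe" and reassigns "better_smaller_dataframe", so each result replaces the previous one instead of building on it.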

better_smaller_dataframe <- slice(smaller_dataframe,1,1,2,3,4,10,1)
better_smaller_dataframe
slice() creates a new dataframe from the rows of "smaller_dataframe" at the given indices. An index that is listed more than once returns its row more than once. The new dataframe is stored in "better_smaller_dataframe" and then displayed.

ggplot(better_smaller_dataframe) + : Initializes the plot with "better_smaller_dataframe" as its data.
geom_col(mapping = aes(x = job_title, y=salary_in_usd), fill="blue")+ : Adds a bar chart layer with "job_title" as the x-axis variable and "salary_in_usd" as the y-axis variable. The bars are drawn in blue.
xlab("Job Title") + : Sets the x-axis label to "Job Title".
ylab("Salary in US Dollars") + : Sets the y-axis label to "Salary in US Dollars".
labs (title = "Comparision of Jobs ") + : Sets the title of the plot to "Comparision of Jobs".
scale_y_continuous(labels = scales::dollar) + : Formats the y-axis tick labels as dollar amounts.
theme(axis.text.x = element_text(angle = 50, hjust = 1)) : Rotates the x-axis tick labels by 50 degrees and sets their horizontal justification to 1, which keeps the job titles from overlapping.
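The layered ggplot() call can be sketched on hypothetical data. toy_salaries and its values are invented for illustration, and the slice() indices here are deliberately unique so that each job title gets a single bar. The sketch assumes the dplyr, ggplot2, and scales packages are installed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data, not taken from ds_salaries.csv.
toy_salaries <- tibble(
  job_title     = c("Job A", "Job B", "Job C", "Job D"),
  salary_in_usd = c(60000, 120000, 90000, 75000)
)

# slice() returns rows in the order the indices are given.
picked_rows <- slice(toy_salaries, 2, 4, 1)

# Each "+" adds one layer or setting to the plot object.
salary_plot <- ggplot(picked_rows) +
  geom_col(mapping = aes(x = job_title, y = salary_in_usd), fill = "blue") +
  xlab("Job Title") +
  ylab("Salary in US Dollars") +
  labs(title = "Comparison of Jobs") +
  scale_y_continuous(labels = scales::dollar) +
  theme(axis.text.x = element_text(angle = 50, hjust = 1))

# The chart is drawn only when the object is printed:
# print(salary_plot)
```

Storing the plot in a variable separates building the chart from drawing it, which makes it easy to add or change a layer before printing.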

Figure 3. Bar Chart generated in Problem 43

Conclusion

This assignment helped me learn some basic concepts of the R programming language and become familiar with several R functions. However, the final ggplot() call on “better_smaller_dataframe” produced a misleading bar chart because of the rows selected. slice() was given the index 1 three times, so the dataframe contains three rows with the job title “Data Scientist”. geom_col() stacks rows that share the same x value, so the bar for “Data Scientist” reaches about 240,000 USD (3 × 79,833 = 239,499) instead of 79,833 USD.

Citations:
Robert I. Kabacoff. (2022). R in Action (3rd ed.). Manning Publications.
DataCamp. (2023). https://rdocumentation.org/
American Psychological Association. (2023). https://www.apa.org
Introduction to problem solving with R. (2023). Instructions set. ALY6000_Project_1.pdf
OpenAI. (September 25, 2023). https://chat.openai.com/ Prompt: Explain geom_col(mapping = aes(x = job_title, y=salary_in_usd), fill="blue") in context on R Language
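The stacking effect described in the conclusion can be reproduced with a minimal sketch. The 79,833 USD salary is the figure reported above; the second row ("Job B", 100,000) and all variable names are hypothetical. The grouped sum below mirrors the total bar height that geom_col() draws with its default stacking, and distinct() is shown as one possible fix, not as part of the original assignment.

```r
library(dplyr)

# First row uses the salary reported in the conclusion; second row is invented.
toy_salaries <- tibble(
  job_title     = c("Data Scientist", "Job B"),
  salary_in_usd = c(79833, 100000)
)

# Repeating index 1 three times returns that row three times.
repeated_rows <- slice(toy_salaries, 1, 1, 2, 1)

# Total per job title, which equals the stacked bar height in geom_col().
bar_heights <- summarise(
  group_by(repeated_rows, job_title),
  bar_height = sum(salary_in_usd)
)
print(bar_heights)

# One possible fix: drop the duplicated rows before plotting.
unique_rows <- distinct(repeated_rows)
print(unique_rows)
```

Passing each index to slice() only once would avoid the problem at the source and would leave the "Data Scientist" bar at 79,833 USD.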
