Assignment Description

In this assignment, we will try to estimate factors associated with the productivity of garment manufacturing workers. The data for the project come from the University of California, Irvine's Machine Learning Repository. The link to the dataset is:

https://archive.ics.uci.edu/ml/datasets/Productivity+Prediction+of+Garment+Employees#

In R, you may load the dataset using the following command:

    d <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00597/garments_worker_productivity.csv", header = TRUE, as.is = TRUE)

The dataset is available in a csv file named 'garments_worker_productivity.csv'. The command above loads it into a data frame named d. The variable of interest is 'actual_productivity', which is a number between 0 and 1 indicating the productivity of workers in garment manufacturing.

The variables in the original dataset are the following (taken from the data webpage):

Column #  Name                   Description
1         date                   Date in MM-DD-YYYY
2         day                    Day of the Week
3         quarter                A portion of the month. A month was divided into four quarters
4         department             Associated department with the instance
5         team_no                Associated team number with the instance
6         no_of_workers          Number of workers in each team
7         no_of_style_change     Number of changes in the style of a particular product
8         targeted_productivity  Targeted productivity set by the Authority for each team for each day.
9         smv                    Standard Minute Value, it is the allocated time for a task
10        wip                    Work in progress. Includes the number of unfinished items for products
11        over_time              Represents the amount of overtime by each team in minutes
12        incentive              Represents the amount of financial incentive (in BDT) that enables or motivates a particular course of action.
13        idle_time              The amount of time when the production was interrupted due to several reasons
14        idle_men               The number of workers who were idle due to production interruption
15        actual_productivity    The actual % of productivity that was delivered by the workers. It ranges from 0-1.

Perform the following analysis.

1. Data processing - 10 points.
   a. Remove the column 'wip' from the dataset.
   b. Create another variable named 'log_productivity', which is defined as log_productivity = log(actual_productivity * 100). Store any new variable as an additional column in the original data frame.
   c. Create another variable called 'log_no_of_workers', which is the natural logarithm of no_of_workers.
   d. Convert the following variables to factor variables: team, quarter, department, and day.
   e. Create another variable called 'percentage_achievement', which is defined as follows: percentage_achievement = (actual_productivity - targeted_productivity) / targeted_productivity * 100.
   f. Also, to clean the variable department, please run the following command (there are some coding errors in the variable department):
      > levels(d$department) <- c("finishing", "finishing", "sewing")

2. Exploratory Analysis - 40 points.
   a. Create histograms of actual_productivity and log_productivity. How does the distribution of log_productivity differ from that of actual_productivity? Do the same for the number of workers (no_of_workers and log_no_of_workers).
   b. Each month is divided into five quarters, where each quarter is approximately one week. How does the distribution of the logarithm of productivity change from quarter to quarter? Create a box plot of the logarithm of productivity by quarter. Comment on your observations. Does worker productivity increase towards the end of the month (quarter 5) compared to the other quarters? Perform a t-test comparing quarter 5 with each of the other quarters individually. (Hint: there will be 4 different t-tests.) What do you observe for each t-test? Comment on the findings. Use a 95% confidence level. (You need to state the hypotheses explicitly in your answer, along with the mean and standard deviation of each group in each t-test, the t-statistic, and the p-value. Then you need to explain what the p-value means.)
   c. Repeat part (b) for department instead of quarter, day instead of quarter, and no_of_style_change instead of quarter. In these cases, perform the t-test for all pairs of departments and all pairs of style changes. For day, compare Sunday with all other weekdays.
   d. Create a scatter plot with the natural logarithm of (no_of_workers + 1) on the x-axis and the natural logarithm of productivity on the y-axis. What do you observe? Comment on any patterns that you may observe. Report the correlation coefficient between the two variables.
   e. Create a scatter plot with the natural logarithm of (incentive + 1) on the x-axis and the natural logarithm of productivity on the y-axis. What do you observe? Comment on any patterns that you may observe. Report the correlation coefficient between the two variables.
   f. Repeat (d) and (e) for percentage_achievement instead of the logarithm of productivity.
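
As a rough sketch, the data processing steps 1a to 1f might be implemented along the following lines. The helper name prepare_garment_data and the toy data frame are hypothetical and are not part of the assignment. The toy rows are made up, and their department spellings are only assumed to resemble the coding errors in the raw file. Note that the levels() command from step 1f relies on department already being a factor with exactly three levels in sorted order, so the sketch runs it after step 1d and checks the level count first.

```r
# Hypothetical helper (not part of the assignment): applies steps 1a to 1f
# to a data frame that uses the dataset's column names.
prepare_garment_data <- function(d) {
  # 1a. Drop the 'wip' column.
  d$wip <- NULL

  # 1b. Natural log of productivity expressed as a percentage.
  d$log_productivity <- log(d$actual_productivity * 100)

  # 1c. Natural log of the number of workers.
  d$log_no_of_workers <- log(d$no_of_workers)

  # 1d. Convert the grouping variables to factors.
  for (v in c("team", "quarter", "department", "day")) {
    d[[v]] <- factor(d[[v]])
  }

  # 1e. Percentage by which actual productivity exceeds (or misses) the target.
  d$percentage_achievement <-
    (d$actual_productivity - d$targeted_productivity) /
    d$targeted_productivity * 100

  # 1f. Collapse the raw department levels into two.
  # Assumption: exactly three sorted levels, two variants of "finishing"
  # followed by one misspelling of "sewing".
  stopifnot(nlevels(d$department) == 3)
  levels(d$department) <- c("finishing", "finishing", "sewing")

  d
}

# Toy data, made up for illustration only (not rows from the real dataset).
toy <- data.frame(
  team = c(1, 2, 3),
  quarter = c("Quarter1", "Quarter5", "Quarter5"),
  department = c("sweing", "finishing ", "finishing"),
  day = c("Sunday", "Monday", "Sunday"),
  no_of_workers = c(59, 8, 30),
  wip = c(1108, NA, 968),
  targeted_productivity = c(0.80, 0.75, 0.80),
  actual_productivity = c(0.94, 0.60, 0.80),
  stringsAsFactors = FALSE
)
toy_clean <- prepare_garment_data(toy)
str(toy_clean)
```

With the real data, the same helper would be called as d <- prepare_garment_data(d) right after the read.csv command.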
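
The quarter 5 comparisons in part 2b could be organized roughly as shown below. This is only a sketch under several assumptions: the helper compare_groups is hypothetical, the data are simulated rather than taken from the garment dataset, the quarter labels "Quarter1" to "Quarter5" are assumed, and the test is R's default Welch two-sample t-test (the assignment does not say whether equal variances should be assumed). Each test has the null hypothesis that the two group means are equal and the two-sided alternative that they differ.

```r
# Hypothetical helper: Welch two-sample t-test of `value` between a reference
# group and one other group, collecting the quantities the assignment asks for.
compare_groups <- function(value, group, reference, other) {
  x <- value[group == reference]
  y <- value[group == other]
  # t.test() uses var.equal = FALSE by default, i.e. the Welch test.
  tt <- t.test(x, y, conf.level = 0.95)
  data.frame(
    reference   = reference,
    other       = other,
    mean_ref    = mean(x),
    sd_ref      = sd(x),
    mean_other  = mean(y),
    sd_other    = sd(y),
    t_statistic = unname(tt$statistic),
    p_value     = tt$p.value
  )
}

# Simulated data for illustration only; these are NOT the garment data.
set.seed(1)
sim <- data.frame(
  quarter = rep(paste0("Quarter", 1:5), each = 20),
  log_productivity = rnorm(100, mean = 4.2, sd = 0.3)
)

# Compare quarter 5 with each of the other four quarters in turn.
others <- setdiff(unique(sim$quarter), "Quarter5")
results <- do.call(rbind, lapply(others, function(q) {
  compare_groups(sim$log_productivity, sim$quarter, "Quarter5", q)
}))
print(results)
```

The same helper could be reused for part 2c by passing department, day, or no_of_style_change as the group argument and looping over the relevant pairs.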