Assignment Description
In this assignment, we will try to estimate factors associated with the productivity of garment manufacturing
workers. The data for this project come from the University of California, Irvine (UCI) Machine Learning
Repository. The link to the dataset is:
https://archive.ics.uci.edu/ml/datasets/Productivity+Prediction+of+Garment+Employees#
In R, you may load the dataset using the following command:
d <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00597/garments_worker_productivity.csv", header = TRUE, as.is = TRUE)
The dataset is available in a CSV file named 'garments_worker_productivity.csv' and will be loaded into the
data frame named d. The variable of interest is 'actual_productivity', a number between 0 and 1
indicating the productivity of workers in garment manufacturing. The variables in the original dataset are the
following (taken from the data webpage):
1. date: Date in MM-DD-YYYY
2. day: Day of the Week
3. quarter: A portion of the month. A month was divided into four quarters
4. department: Associated department with the instance
5. team_no: Associated team number with the instance
6. no_of_workers: Number of workers in each team
7. no_of_style_change: Number of changes in the style of a particular product
8. targeted_productivity: Targeted productivity set by the Authority for each team for each day.
9. smv: Standard Minute Value, it is the allocated time for a task
10. wip: Work in progress. Includes the number of unfinished items for products
11. over_time: Represents the amount of overtime by each team in minutes
12. incentive: Represents the amount of financial incentive (in BDT) that enables or motivates a particular course of action.
13. idle_time: The amount of time when the production was interrupted due to several reasons
14. idle_men: The number of workers who were idle due to production interruption
15. actual_productivity: The actual % of productivity that was delivered by the workers. It ranges from 0-1.
Perform the following analysis.
1. Data processing - 10 points.
a. Remove the column 'wip' from the dataset.
b. Create another variable named log_productivity, defined as log_productivity = log(actual_productivity * 100).
Store any new variable as an additional column in the original data frame.
c. Create another variable called 'log_no_of_workers', defined as the natural logarithm of
no_of_workers.
d. Convert the following variables to factor variables: team, quarter, department, and day.
e. Create another variable called 'percentage_achievement', defined as follows:
percentage_achievement = (actual_productivity - targeted_productivity) / targeted_productivity * 100.
f. To clean the variable department, which contains some coding errors, run the following command
(after department has been converted to a factor in step d):
levels(d$department) <- c("finishing", "finishing", "sewing")
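The Part 1 steps can be collected into a single function. The sketch below is a minimal R version, assuming d was loaded with the read.csv() call above so that column names follow the CSV header (for example actual_productivity and team); the function name process_garments() is hypothetical. The department relabelling assumes the original factor has exactly three levels that sort as two spellings of finishing followed by the sewing level, which is what the command in step f implies.

```r
# Minimal sketch of Part 1, steps a-f. process_garments() is a hypothetical
# helper; column names are assumed to match the CSV header.
process_garments <- function(d) {
  # a. Drop the work-in-progress column
  d$wip <- NULL

  # b. Natural log of productivity expressed as a percentage
  d$log_productivity <- log(d$actual_productivity * 100)

  # c. Natural log of team size
  d$log_no_of_workers <- log(d$no_of_workers)

  # d. Treat the categorical columns as factors
  for (v in c("team", "quarter", "department", "day")) {
    d[[v]] <- factor(d[[v]])
  }

  # e. Relative gap between actual and targeted productivity, in percent
  d$percentage_achievement <-
    (d$actual_productivity - d$targeted_productivity) /
    d$targeted_productivity * 100

  # f. Merge miscoded department levels (assumes exactly three levels:
  #    two finishing variants followed by one sewing variant)
  levels(d$department) <- c("finishing", "finishing", "sewing")

  d
}

# Usage, with d loaded as above:
# d <- process_garments(d)
```

Assigning a duplicated label with `levels<-` merges those levels, so department ends up with the two levels finishing and sewing.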
2. Exploratory Analysis - 40 points.
a. Create histograms of actual_productivity and log_productivity. How does the distribution
of log_productivity differ from that of actual_productivity? Do the same for the number of
workers.
b. Each month is divided into five quarters, where each quarter corresponds approximately to one week. How
does the distribution of the logarithm of productivity change across quarters? Create a box plot
of the logarithm of productivity by quarter and comment on your observations. Does worker
productivity increase towards the end of the month (quarter 5) compared with the other quarters?
Perform a t-test comparing quarter 5 with each of the other quarters individually (hint: there will
be 4 different t-tests). What do you observe in each t-test? Comment on the findings, using a
95% confidence level. (State the hypotheses explicitly in your answer, together with the mean and
standard deviation of each group in each t-test, the t-statistic, and the p-value. Then
explain what the p-value means.)
c. Repeat part (b) with department, day, and no_of_style_change in place of quarter. For
department and no_of_style_change, perform the t-test for all pairs of departments and all pairs of
style-change values. For day, compare Sunday with each of the other weekdays.
d. Create a scatter plot with the natural logarithm of (no_of_workers + 1) on the x-axis and the natural
logarithm of productivity on the y-axis. What do you observe? Comment on any pattern you see.
Report the correlation coefficient between the two variables.
e. Create a scatter plot with the natural logarithm of (incentive + 1) on the x-axis and the natural
logarithm of productivity on the y-axis. What do you observe? Comment on any patterns you see.
Report the correlation coefficient between the two variables.
f. Repeat (d) and (e) with percentage_achievement in place of the logarithm of productivity.
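The Part 2 computations can be sketched in the same way. The R code below is a minimal sketch, assuming d has already been processed as in Part 1 and that the factor labels are "Quarter5" and "Sunday" as they appear in the CSV (an assumption; check levels(d$quarter) and levels(d$day) first). All helper names (plot_distributions, compare_levels, one_vs_rest, pairwise_tests, log1p_cor) are hypothetical. The tests use R's t.test(), whose default is Welch's unequal-variance test with a two-sided alternative; a pooled-variance test would instead need var.equal = TRUE in the t.test() call, which these helpers do not expose.

```r
# Minimal sketch of Part 2. Assumes d was processed as in Part 1; all helper
# names here are hypothetical.

# a. Histograms of raw and log-transformed variables (2 x 2 panel)
plot_distributions <- function(d) {
  old <- par(mfrow = c(2, 2))
  on.exit(par(old))
  hist(d$actual_productivity, main = "actual_productivity", xlab = "")
  hist(d$log_productivity, main = "log_productivity", xlab = "")
  hist(d$no_of_workers, main = "no_of_workers", xlab = "")
  hist(d$log_no_of_workers, main = "log_no_of_workers", xlab = "")
}

# b-c. Welch two-sample t-test of y between two levels of a grouping column.
# H0: the two group means are equal; H1 depends on `alternative`.
compare_levels <- function(d, var, level_a, level_b,
                           y = "log_productivity", alternative = "two.sided") {
  a <- d[[y]][d[[var]] == level_a]
  b <- d[[y]][d[[var]] == level_b]
  tt <- t.test(a, b, alternative = alternative)
  data.frame(group_a = level_a, group_b = level_b,
             mean_a = mean(a), sd_a = sd(a),
             mean_b = mean(b), sd_b = sd(b),
             t = unname(tt$statistic), p_value = tt$p.value,
             stringsAsFactors = FALSE)
}

# Compare one reference level (e.g. "Quarter5", "Sunday") with every other level
one_vs_rest <- function(d, var, ref, y = "log_productivity",
                        alternative = "two.sided") {
  others <- setdiff(levels(factor(d[[var]])), ref)
  do.call(rbind, lapply(others, function(lv)
    compare_levels(d, var, ref, lv, y, alternative)))
}

# Compare every pair of levels (e.g. departments, style-change counts)
pairwise_tests <- function(d, var, y = "log_productivity") {
  pairs <- combn(levels(factor(d[[var]])), 2, simplify = FALSE)
  do.call(rbind, lapply(pairs, function(p)
    compare_levels(d, var, p[1], p[2], y)))
}

# d-f. Pearson correlation between log(x + 1) and y; log1p(x) = log(1 + x)
log1p_cor <- function(d, x, y = "log_productivity") {
  cor(log1p(d[[x]]), d[[y]])
}

# Usage, with d processed as in Part 1:
# plot_distributions(d)
# boxplot(log_productivity ~ quarter, data = d)
# one_vs_rest(d, "quarter", "Quarter5", alternative = "greater")
# pairwise_tests(d, "department")
# one_vs_rest(d, "day", "Sunday")
# plot(log1p(d$incentive), d$log_productivity)
# log1p_cor(d, "incentive")
# log1p_cor(d, "no_of_workers", y = "percentage_achievement")
```

With alternative = "greater", t.test() uses H1: mean(reference group) > mean(other group), which matches the one-sided question of whether quarter 5 is more productive; the default two-sided version only asks whether the means differ.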