SIE 464-564 Midterm Spring 2019
Due date:
Sunday April 7, 2019 11:59 PM
The exam must be submitted by uploading to the “Midterm” folder in the D2L course Dropbox.
Students are expected to uphold the UA Code of Academic Integrity (http://deanofstudents.arizona.edu/policies-and-codes/code-academic-integrity). This take-home exam is expected to be an individual effort. Any evidence of cheating will result in a failing grade on the exam for all parties involved. This has happened before, and I don’t believe the risk is worthwhile.
In addition to your textbooks, you are allowed to consult any reference in the library, on the internet, and on D2L. You are not allowed to consult with anyone else, regardless of whether they are enrolled in the class or at UA.
Everyone must answer the first 6 questions.
Those enrolled in SIE 564 must also answer question 7.
An extra credit question is provided at the end.
Question 1 (10 points)
On the episode of FREAKONOMICS “Here’s Why all Your Projects Are Always Late
and What to do About it”, Stephen J. Dubner talks about a tendency that many people
have, the planning fallacy, a term coined by Nobel prize winning psychologist Daniel
Kahneman and another psychologist Amos Tversky in the 1970s. In the podcast,
specialists talk about reasons for this tendency.
A)
Apply metacognition to analyze the manner in which you turn in assignments
during your college career and describe a way in which the planning fallacy may
apply to you, different from the ones mentioned in the podcast.
Applying metacognition, I observed and evaluated how I handle assignments and found that most of my assignments were submitted very close to the due date and time. Part of it was fear of not getting the correct answers, and another part was the planning fallacy: I procrastinated on each assignment until it was close to the due date and time before realizing that I had to put all my effort into it in order to get it done. The planning fallacy applies to me in that my mind plants seeds of doubt that keep me from trusting my ability to get the assignments right the first time, convincing me that I would just have to do them over and over anyway.
B)
Continuing with the planning fallacy, describe a way to overcome it using
DMAIC. Explain in detail what you want to improve, how you will measure your
progress, how you will analyze your data, and what you would do if you
improved?
I want to improve my ability to submit assignments earlier than a few minutes before the due date and time; that is, to improve the timing and precision with which I submit my assignments.
To get there, I would start working on assignments much earlier and remove distractions. I would set up a time frame to start each assignment a week before the due date, giving myself enough time to work on it without the pressure of an imminent deadline and enough time to review it and ask questions about it.
I would measure the improvement in my submissions by setting a time frame over which to evaluate the data I get when I start my assignments a couple of days early, giving myself time to research and fully understand the material instead of attempting to solve the assignments without full knowledge of what I was supposed to do.
The data would be analyzed by the number of assignments I submit in a month, taking into consideration how much better I understood the material before doing them.
I would reward myself if I improved. The reward would be a less stressful time around assignment submissions and more time to get other assignments done or to work on hobbies.
Question 2 (10 points)
Based on the Python code from the following link:
https://github.com/coin-dataset/code/blob/master/tc-rc3d/evaluate.py
A)
Please calculate the following:
Physical Lines of Code: 131
Logical Lines of Code: 83
Comment Lines: 35
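One possible cross-check of these counts is sketched below, using the radon package's raw metrics on the file fetched from GitHub. Note that radon's definitions of logical lines and comment lines may differ slightly from the counting convention used above, and the raw.githubusercontent.com URL is simply the raw-file form of the link in the question.

```python
# Sketch of a line-count cross-check (pip install radon requests).
import requests
from radon.raw import analyze

url = ("https://raw.githubusercontent.com/coin-dataset/code/"
       "master/tc-rc3d/evaluate.py")
source = requests.get(url).text

metrics = analyze(source)
print("Physical lines (LOC):", metrics.loc)
print("Logical lines (LLOC):", metrics.lloc)
print("Comment lines:", metrics.comments + metrics.multi)
```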
B)
Based on your answers from Part A, calculate the effort in person-months using
COCOMO II.
Assume all cost driver ratings to be nominal and all scale factors to
be rated high.
Using the logical lines of code from Part A (83 SLOC, or about 0.083 KSLOC), nominal effort multipliers, and all scale factors rated High, the effort came out to approximately 0.2 person-months.
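For reference, here is a minimal sketch of that calculation, assuming the published COCOMO II.2000 constants (A = 2.94, B = 0.91) and the standard values for the five scale factors at the High rating; these constants are my assumption of what the tool uses, not something given in the exam. With all effort multipliers nominal, their product is 1.0 and they drop out.

```python
# Minimal COCOMO II post-architecture sketch (assumed COCOMO II.2000 constants).
A, B = 2.94, 0.91

# Scale factors at the "High" rating: PREC, FLEX, RESL, TEAM, PMAT
# (assumed standard published values).
scale_factors_high = [2.48, 2.03, 2.83, 2.19, 3.12]

size_ksloc = 83 / 1000.0          # 83 logical SLOC from Part A
E = B + 0.01 * sum(scale_factors_high)
effort_pm = A * size_ksloc ** E   # all effort multipliers nominal -> product = 1.0

print(f"E = {E:.4f}, effort = {effort_pm:.2f} person-months")
# Prints roughly 0.22 person-months, consistent with the ~0.2 reported above.
```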
C)
Do you think the result from Part B is an overestimate or an underestimate?
Explain
I think the result is an underestimate because the effort is extremely small: 0.2 person-months is only a few days of work, and COCOMO II is calibrated on much larger projects, so a program this small falls well below its calibration range.
D)
Your former classmate is a manager of a commercial building construction
company.
He wants to estimate the effort of constructing commercial buildings
and is asking for your help.
Would it be appropriate if your former classmate
used COCOMO II?
Explain
No, because COCOMO II is suited to software engineering projects, not construction or other industrial engineering work; its size measure (lines of code) and its cost drivers have no meaning for a commercial building. Also, COCOMO II is old and has not been recalibrated recently, which gives a high chance of an inaccurate estimate.
Question 3 (10 points)
The weather company you work for is performing an analysis on predicting sandstorms
accurately in Phoenix. You have been given a large dataset and are asked to determine
which factors could be used to accurately predict them.
The dataset you are provided with has over 50 variables. With your vast knowledge of
Excel, you determine that 4 variables can be used as predictors to accurately predict
sandstorms. Your boss is not convinced and decided to ask you some questions.
A)
You mentioned that before you even begin doing this “regression analysis” of
yours, you discarded about half of the variables. How did you choose which
variables to discard, and more importantly, why does it matter?
Predictor variables can be excluded from the analysis on the following basis. First, identify outliers and influential points and consider excluding them, at least temporarily. Second, keep only the required predictor variables in the regression, for these reasons: 1) unnecessary predictors add noise to the estimation of the quantities we are actually interested in, and degrees of freedom are wasted; 2) collinearity is caused by having too many variables trying to do the same job; 3) if the model is to be used for prediction, we can save time and/or money by not measuring redundant predictors.
A high p-value on a coefficient means the variable is not statistically significant, which is something you can use to discard variables that are not needed.
Another thing to keep in mind is that cells containing text give errors; as seen in the Kaggle assignment, any columns with text values produce errors and cannot supply data for the regression analysis.
It also matters because keeping variables that are highly correlated with each other would not accurately represent the model. Such variables can also inflate R-squared even though the model may not be as accurate as it should be for the weather company.
B)
You asked a friend and he told you that in his company they use both “Relative
Humidity” and “Number of Sunny Hours” to predict sandstorms. In your report,
you mention you only use the “Relative Humidity”, because of it being highly
correlated with “Number of Sunny Hours”.
However, by themselves each are
good predictors, why didn’t you use both? Wouldn’t that make the model better?
Because “Relative Humidity” and “Number of Sunny Hours” are highly correlated, using both would add collinearity to the model, since the two variables are doing the same job. The second variable adds little new information, so including it would not make the model meaningfully better.
C)
By looking at the correlation matrix you created between the independent variables and the dependent variable, which independent variables did you discard or keep based on the type of correlation, and why?
a.
Strong positive correlation
b.
Weak positive correlation
c.
No correlation
d.
Weak negative correlation
e.
Strong negative correlation
The variables to keep or discard depend on their relationship to the output of interest. If an independent variable is correlated with the dependent variable, it is included in the analysis; if it has no correlation with the dependent variable, it is excluded. Hence, the variables with the following types of correlation will be included in the analysis:
1. Strong positive correlation: keep.
2. Strong negative correlation: keep.
I will discard the rest. The reason I am keeping these variables and discarding the rest is that a strong correlation, whether positive or negative, has a real effect on the model, so those variables must be kept. Variables with weak or no correlation would not have that effect on the model, so they should not be in it.
For weak correlation (positive or negative), I would also discard those variables because they are not impactful on the model, just like the uncorrelated ones.
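As a concrete illustration of this screening step, here is a minimal pandas sketch. The file name weather.csv and the column name sandstorm are placeholders for the hypothetical dataset, and the 0.5 cutoffs are used only for illustration. It keeps predictors strongly correlated with the target and drops one of any pair of predictors that are highly correlated with each other.

```python
import pandas as pd

# Hypothetical file and column names; the real dataset would differ.
df = pd.read_csv("weather.csv")
target = "sandstorm"

# Numeric columns only: text-valued cells cannot go into the regression directly.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

# Keep predictors with a strong (positive or negative) correlation to the target.
strong = corr[target].drop(target)
keep = strong[strong.abs() >= 0.5].index.tolist()

# Among the kept predictors, drop one of each highly correlated pair (collinearity).
dropped = set()
for i, a in enumerate(keep):
    for b in keep[i + 1:]:
        if a not in dropped and b not in dropped and abs(corr.loc[a, b]) >= 0.5:
            # Drop the one with the weaker correlation to the target.
            dropped.add(a if abs(strong[a]) < abs(strong[b]) else b)

predictors = [c for c in keep if c not in dropped]
print("Candidate predictors:", predictors)
```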
Question 4 (20 points)
You tasked Mark and David to come up with a separate regression model to predict house prices using the Kaggle dataset train.csv. With SalePrice as the dependent variable, they came up with a model based on the first half of the dataset (from Row 2 to Row 731). Each model consists of the following independent variables:
Mark’s model: 1stFlrSF and 2ndFlrSF
David’s model: LotArea, BsmtFinSF1, TotalBsmtSF, GarageArea, WoodDeckSF, and OpenPorchSF
A)
What is the R-squared value of each model?
Based on the second half of the
dataset (from Row 732 to Row 1461), what is the average validation error of each
model?
Which model would you choose?
Explain why you made your selection.
Mark’s model R-squared: 0.62743197
David’s model R-squared: 0.62113796 (0.61720776 after removing the highly correlated variable; see Part C)
Mark’s model average validation error: 35087.6738
David’s model average validation error: 36137.6968
Based on what I saw from the regression models, I concluded that Mark’s model is more accurate than David’s, so I selected it: its R-squared is higher and its average validation error is lower.
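For reference, here is a minimal sketch of how numbers like these could be produced with pandas and scikit-learn, fitting on rows 2-731 of train.csv and validating on rows 732-1461. Using mean absolute error as the "average validation error" is my assumption; the course's Excel workflow may define it differently.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("train.csv")
train, valid = df.iloc[:730], df.iloc[730:]   # sheet rows 2-731 and 732-1461

def evaluate(features):
    model = LinearRegression().fit(train[features], train["SalePrice"])
    r2 = model.score(train[features], train["SalePrice"])
    err = mean_absolute_error(valid["SalePrice"], model.predict(valid[features]))
    return r2, err

print("Mark:", evaluate(["1stFlrSF", "2ndFlrSF"]))
print("David:", evaluate(["LotArea", "BsmtFinSF1", "TotalBsmtSF",
                          "GarageArea", "WoodDeckSF", "OpenPorchSF"]))
```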
B)
You noticed David chose more independent variables than Mark, and you suspect
the issue of multicollinearity (i.e., highly correlated independent variables) might
exist in David’s model.
Assume multicollinearity exists if the correlation between
any two independent variables in the entire dataset
is at least 0.5.
Based on your
correlation matrix, which variable(s) would you eliminate?
Show your
correlation matrix and explain the process behind your decision to eliminate the
variable(s).
The variable I chose to eliminate based on my correlation matrix was BsmtFinSF1.
As highlighted in my matrix, BsmtFinSF1 is highly correlated with another independent variable but is not as highly correlated with SalePrice, so it is the only variable that needs to be eliminated: removing it resolves the multicollinearity while giving up the least predictive power.
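A minimal sketch of the multicollinearity check, continuing with df from the sketch above: it computes the correlation matrix of David's six predictors over the entire dataset and flags any pair correlated at 0.5 or more.

```python
david = ["LotArea", "BsmtFinSF1", "TotalBsmtSF",
         "GarageArea", "WoodDeckSF", "OpenPorchSF"]

corr = df[david].corr()          # correlation matrix over the entire dataset
for i, a in enumerate(david):
    for b in david[i + 1:]:
        if abs(corr.loc[a, b]) >= 0.5:
            print(f"{a} vs {b}: {corr.loc[a, b]:.2f} -> possible multicollinearity")
```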
C)
If you removed any variable in Part B, provide the R-squared value and the
average validation error of David’s updated model.
Will your model selection in
Part A change?
Why or why not?
The validation error I got for David’s updated model is 36346.5593, and the new R-squared is 0.61720776.
No, I would not change my choice from Part A, because even after removing the variable, Mark’s model still has a slightly higher R-squared and a lower validation error.
D)
Andrew created his own model and tells you that his model’s R-squared value is
0.75.
However, he does not tell you what independent variables he chose for his
model.
He also does not provide his validation error because he claims that his
model would do a good job predicting house prices accurately.
Assume Andrew
is a well-intentioned person, he provides you house price predictions based on his
model using the same dataset, and he performed the same 50/50 data split just like
in Part A.
Should you choose Andrew’s model?
Why or why not?
If Andrew had used the same dependent variable as Mark and David (SalePrice) and told us which independent variables he chose, I might lean toward his model, since his R-squared of 0.75 is higher than both of the previous models. However, since we do not know his independent variables and he does not provide a validation error, I would not choose it: the model may not actually be as accurate as the R-squared suggests, and without that information there is no evidence it gives a proper prediction of the sale prices.
Question 5 (10 points)
A Request for Proposal (RFP) was released by the Government to bid on a specific project. Bob, the business development manager, comes to Tracy, the Program Manager, wanting input on the systems engineering effort that it will take to complete this job. Bob wants to know if Tracy has a team that can complete the job and how many person-months it will take to complete it if we were to win.
As Tracy reviewed the Statement of Work (SOW), she began to formulate a plan. When reading the SOW, she saw that there were a total of 550 systems engineering requirements. In reviewing the requirements, approximately 50 of the 550 have been done before on a comparable project. There are four system interfaces (two nominal and two difficult), five algorithms (two easy and three difficult), and two nominal operational scenarios.
Cost Parameters and Levels:
Requirements Understanding: L
Architecture Understanding: VL
Level of Service Requirements: H
Migration Complexity: N
Technology Risk: L
Documentation: N
# and Diversity of Installations/Platforms: H
# of Recursive Levels in the Design: N
Stakeholder Team Cohesion: N
Personnel/Team Capability: L
Personnel Experience/Continuity: L
Process Capability: H
Multisite Coordination: N
Tool Support: N
Very Low (VL), Low (L), Nominal (N), High (H), Very High (VH)
A)
Tracy provided Bob the cost parameters and the level rating that will affect the
person-months for the system engineering part of the scope of work. Use the
COSYSMO tool provided to give Bob the estimate (in person-months) that it will
take for the System Engineering group to complete the project.
The estimate I obtained from the COSYSMO tool with the information given was 880.1 person-months.
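For context, here is a minimal structural sketch of the effort equation the COSYSMO tool implements: PM = A * Size^E * (product of effort multipliers), where Size is a weighted sum of the size drivers (requirements, interfaces, algorithms, operational scenarios) counted by difficulty. All numeric constants and weights below are placeholders, not the tool's calibrated values, and mapping the 50 previously done requirements to the "easy" category is my own assumption, so this illustrates the structure of the calculation rather than reproducing the 880.1 figure.

```python
# Structural sketch of COSYSMO: PM = A * Size^E * product(effort multipliers).
# All numeric values below are PLACEHOLDERS, not calibrated COSYSMO constants.
A, E = 1.0, 1.0

# Size drivers from the SOW, counted by easy/nominal/difficult.
size_driver_counts = {
    "requirements": {"easy": 50, "nominal": 500, "difficult": 0},  # 50 reused of 550
    "interfaces":   {"easy": 0, "nominal": 2, "difficult": 2},
    "algorithms":   {"easy": 2, "nominal": 0, "difficult": 3},
    "scenarios":    {"easy": 0, "nominal": 2, "difficult": 0},
}
weights = {d: {"easy": 1.0, "nominal": 1.0, "difficult": 1.0}      # placeholder weights
           for d in size_driver_counts}

size = sum(weights[d][lvl] * n
           for d, counts in size_driver_counts.items()
           for lvl, n in counts.items())

# One effort multiplier per cost parameter; placeholder values (all treated as 1.0).
effort_multipliers = [1.0] * 14
product_em = 1.0
for em in effort_multipliers:
    product_em *= em

person_months = A * size ** E * product_em
print(f"Size = {size:.1f}, estimated effort = {person_months:.1f} person-months")
```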
B)
Bob tells Tracy the person-months she provided is too high and we want to be
aggressive at winning this project. Which cost parameters level ratings do you
recommend Tracy change so that the number of person-months required for this
scope of work decreases? Obviously you cannot just change the level rating, so
recommend what Tracy can do within the company to improve these ratings?
What will the level rating be if these changes were implemented?
I would recommend that Tracy work on improving one of three parameters: (1) requirements understanding, (2) level of service requirements, and (3) architecture understanding.
To have the biggest impact on the person-months for this project, Tracy should focus mostly on getting the team a better understanding of the requirements, since that parameter carries a large effort penalty at its current rating and improving it would greatly reduce the person-months. If it reached Nominal, the person-months would drop dramatically, from 880.1 to 647.1. Improving the handling of the level of service requirements would drop the person-months even further.
For requirements understanding, Tracy would have to find ways to clear up the undefined areas of the requirements so that the rating can move from Low to Nominal.
Architecture understanding is Tracy's lowest rating. She therefore needs to gain at least a minimal understanding of the project architecture and build out more of the work breakdown structure (5-6 levels) so that the Very Low rating can improve to Low and the person-months decrease.
The reason I believe this is achievable is that part of the project is familiar: some people have worked on a comparable project before, which makes it slightly easier for them to understand the requirements, improve these ratings, and lower the person-months.
C)
Provide the person-months for the project with the new cost parameter level
Ratings.
All the cost parameters stay the same except for two: requirements understanding and level of service requirements. With both of these modified to Nominal, the person-months would be 399.4. If they are only changed slightly, from Low to Nominal for requirements understanding and from Very Low to Low for level of service requirements, the person-months would be 508.4.
Question 6 (10 points)
The class performed a calibration test in class to see if a student’s confidence intervals
could improve to 90% as more rounds of confidence interval questions were performed.
The goal of the calibration test is to have students estimate exactly 9 out of 10 questions
correctly. The confidence interval class average in the first round was only 30% and
improved to just below 90% by the last round.
One student analyzed how his confidence intervals compared to the correct answers in
the first round and realized that the values of the correct answers were lower than the
majority of his respective confidence interval ranges. For the subsequent rounds, he
predicted that his initial confidence interval ranges would continue to be too high and
decreased the lower bound of his confidence interval ranges a little bit, hoping to capture
the correct answers. This technique worked for many questions, but occasionally his
confidence interval ranges were still too high.
On the other hand, another student saw that the values of the correct answers were higher
than the majority of her respective confidence interval ranges in the first round. For the
subsequent rounds, she wildly guessed her answers with very wide confidence interval
ranges for all questions. Sometimes her confidence intervals still did not capture the
correct answers.
Discuss the methods these two students used to improve their confidence intervals and
what would Hubbard likely say about the student’s methods. Which would he say is
better and why?
The first student made his improvements based on the assumption, drawn from his first-round data, that his intervals would still be too high. Lowering his lower bounds got him more answers right, but some of his intervals still fell in the too-high category.
The second student was just guessing with very wide ranges. That does not actually clear up her uncertainty; it only happens to capture a few more correct answers.
Hubbard would likely approve of the first student's method, because the first student largely avoided the anchoring trap Hubbard mentions in the book, which allowed him to become better calibrated and bring his intervals closer to the correct answers. What he did was start from relatively narrow intervals and adjust them based on the results he was getting.
The second student did not do anything to improve other than giving absurdly wide values and hoping for improvement.
Hubbard, therefore, would likely say that the first student has the better method. The reason is that the first student relied on the past data he acquired from the first round and, based on it, reduced his uncertainty by decreasing his lower bounds, unlike the other student, who just guessed wildly with extremely wide ranges. This is also the kind of calibration practice Hubbard describes in the book for getting people better calibrated.
Extra credit (2 points)
Select something you need to forecast in the future. For example, how long it will take to write your thesis, how long it will take to finish your senior design project, or when you expect to receive a job offer. Apply one technique from How to Measure Anything that would help you make a better forecast.
I would apply the method of regression to help me get more information on how long it will take me to buy a new car. I would first look at historical data by asking people who bought similar cars. Then I would build a dataset of how long it took them, whether they bought the car new or used, and which model the car is. After that I would gather even more information: whether they were college students like me, whether they had debts, what their parents' financial status is and whether they would help, and whether the person owns a business. Then I would construct my regression model of how long it would take me, based on the previous data from other individuals, their financial status, and their circumstances. I would also compute the error and validate the model, treating the car's model year as an independent variable since it may vary. Finally, I would look into other methods in the book, such as confidence intervals, to see whether there is a large difference. That would give more accurate results and improve my forecast of how long it would take me to get the car of my dreams.
This question is for students enrolled in SIE 564
Question 7 (10 points)
Having made their fortune playing online poker, Barry Boehm and Ricardo Valerdi skip
town to go to the island of Kokomo to bask in the sun and drink margaritas (with salt),
taking with them all of their scholarly works.
In a bizarre hurricane, all information
regarding the COCOMO II model is lost, except for the historical data from all programs
which have used it (including ratings for each factor and final effort for each project).
A)
You know the original form of the model’s equation, but not the numerical values
for each rating.
You would like to “rebuild” the COCOMO II model.
How would
you determine appropriate numerical values for each factor rating level?
I would start by looking at what could work to rebuild the model. Based on how the model was first constructed, I would start with the 4th step in the modeling methodology, to define the relevant data and how much influence each factor had on previous projects. I would then establish a first rating scale for each factor and keep re-estimating the influence of each rating to make it more accurate, summarizing my data as I go. Next I would use the 6th step, which covers gathering project data. That step is not entirely aimed at rebuilding a model, but it would help in getting a bit more information on each factor so that the numerical value for each rating is more accurate. Then I would go to the 7th step of the modeling methodology and run regression models on the data to learn more about the impact of each factor and find out which ones are the most significant. I would also make sure the data is calibrated, to have the best possible dataset for rebuilding the model. The last step would be to use the data I obtained to refine the model, running more regressions and gathering more data if possible.
To explain a little more: since we already have the historical data, step 4 would mainly be a check that the outcome matches the historical data COCOMO used. We can back-track from the data of the projects we have and use step 5, the initial Delphi assessment, to define each rating scale and establish the parameters. I could also consult experts and look at the results they have for their projects to form an opinion on what each rating's numerical value should be, and look at the averages between the values I get and each outcome that could influence a value. I would also use the Bayesian calibration step, running multiple regression analyses on the data points to validate each of them and to ensure my output is consistent with the values from the other steps, in order to decide the influence of each value. Together, these steps would give me the numerical values for each factor rating level.
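As a concrete version of the regression step described above, here is a minimal sketch. It assumes a hypothetical historical dataset cocomo_history.csv with one row per project containing the size in KSLOC, the recorded effort in person-months, and each factor's rating as a category. Taking logarithms of the COCOMO II form PM = A * Size^E * prod(EM_i) turns it into a linear model, so ordinary least squares can recover ln(A), E, and the log of each rating's multiplier. For simplicity it fits a single exponent on log(Size) rather than rebuilding the individual scale factors.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical historical data: columns "ksloc", "effort_pm", and one column per
# factor (e.g. "RELY", "CPLX", ...) holding its rating ("VL", "L", "N", "H", "VH").
hist = pd.read_csv("cocomo_history.csv")

# log(PM) = log(A) + E*log(Size) + sum_i log(EM_i): one-hot encode the ratings
# (dropping the Nominal level, whose multiplier is fixed at 1.0) and fit OLS.
factor_cols = [c for c in hist.columns if c not in ("ksloc", "effort_pm")]
X = pd.get_dummies(hist[factor_cols])
X = X.drop(columns=[c for c in X.columns if c.endswith("_N")])  # Nominal -> EM = 1
X["log_size"] = np.log(hist["ksloc"])
y = np.log(hist["effort_pm"])

ols = LinearRegression().fit(X, y)
A = np.exp(ols.intercept_)
E = ols.coef_[list(X.columns).index("log_size")]
multipliers = {col: float(np.exp(coef))
               for col, coef in zip(X.columns, ols.coef_) if col != "log_size"}

print(f"A = {A:.2f}, E = {E:.2f}")
print(multipliers)   # estimated effort multiplier for each factor/rating level
```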
B)
Your colleague insists that another effort multiplier “CHUMP”, exists.
Your
history in cost estimation contradicts his claim.
How would you determine that its
effects are or are not worth modeling?
I would start with the dataset I have, the raw historical project data. Based on it, I would build a regression model, then look at the validation error along with the correlation matrix (the linear relationships between “CHUMP” and the other factors) to see what the R-squared value is; if CHUMP is highly correlated with existing factors, its effect would not be worth modeling separately. I would also look at the p-value for CHUMP from the regression model. Then I would fit the model again with the CHUMP effort multiplier removed and compare the validation error and the R-squared values, in order to judge whether it is worth modeling or not, because we can only use the data we have.
I would also look at uncertainty reduction across the historical data of these projects, to see how the estimates change by the end of each project. If CHUMP does not really change anything at all, I would know it has no effect on the model.
Lastly, I would look at the equations for calculating the effort multipliers and see what results I get from them, to better predict the influence of the supposed CHUMP multiplier. If its impact turns out to be small, CHUMP should be discarded.
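A minimal sketch of that with/without comparison, continuing with the hypothetical cocomo_history.csv dataset from Part A and assuming it also records a CHUMP rating per project: it compares the cross-validated prediction error of the log-effort model with and without the CHUMP dummy columns. If the error does not improve, the multiplier is probably not worth modeling.

```python
from sklearn.model_selection import cross_val_score

# Reuse X, y, and LinearRegression from the Part A sketch; CHUMP's one-hot
# columns share its name as a prefix.
chump_cols = [c for c in X.columns if c.startswith("CHUMP_")]

with_chump = -cross_val_score(LinearRegression(), X, y,
                              cv=5, scoring="neg_mean_absolute_error").mean()
without_chump = -cross_val_score(LinearRegression(), X.drop(columns=chump_cols), y,
                                 cv=5, scoring="neg_mean_absolute_error").mean()

print(f"CV error with CHUMP:    {with_chump:.3f} (log person-months)")
print(f"CV error without CHUMP: {without_chump:.3f}")
# If dropping CHUMP does not worsen the error, its effect is not worth modeling.
```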