Software defects in NASA spacecraft instrument code.
Portions of computer software code that may contain undetected defects are called blind spots. The issue of blind spots in software code evaluation was addressed at the 8th IEEE International Symposium on High Assurance Software Engineering (March 2004). The researchers developed guidelines for assessing methods of predicting software defects using data on 498 modules of software code written in the C language for a NASA spacecraft instrument. One simple prediction algorithm is to count the lines of code in the module; any module with more than 50 lines of code is predicted to have a defect. The accompanying file contains the predicted and actual defect status of all 498 modules. A standard approach to evaluating a software defect prediction algorithm is to form a two-way summary table similar to the one shown here. In the table, a, b, c, and d represent the number of modules in each cell. Software engineers use these table entries to compute several probability measures: accuracy, detection rate, false alarm rate, and precision.
- a. Accuracy is defined as the probability that the prediction algorithm is correct. Write a formula for accuracy as a function of the table values a, b, c, and d.
- b. The detection rate is defined as the probability that the algorithm predicts a defect, given that the module actually has a defect. Write a formula for detection rate as a function of the table values a, b, c, and d.
- c. The false alarm rate is defined as the probability that the algorithm predicts a defect, given that the module actually has no defect. Write a formula for false alarm rate as a function of the table values a, b, c, and d.
- d. Precision is defined as the probability that the module has a defect, given that the algorithm predicts a defect. Write a formula for precision as a function of the table values a, b, c, and d.
- e. Access the accompanying file and compute the values of accuracy, detection rate, false alarm rate, and precision. Interpret the results.
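The four definitions above can be sketched in code. The summary table is not reproduced in this text, so the cell layout below is an assumption: a = predicted no defect and no defect present, b = predicted no defect but a defect present, c = predicted defect but none present, d = predicted defect and a defect present. If the original table arranges the cells differently, the formulas change accordingly. The names `DefectTable` and `tally` are hypothetical, and the sample modules are made up for illustration; they are not the NASA data.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class DefectTable:
    """Two-way table of module counts (ASSUMED cell layout, not from the source)."""
    a: int  # predicted no defect, module has no defect
    b: int  # predicted no defect, module has a defect
    c: int  # predicted defect, module has no defect
    d: int  # predicted defect, module has a defect

    def accuracy(self) -> float:
        # P(prediction is correct): correctly classified modules over all modules
        return (self.a + self.d) / (self.a + self.b + self.c + self.d)

    def detection_rate(self) -> float:
        # P(predicts defect | module has a defect)
        return self.d / (self.b + self.d)

    def false_alarm_rate(self) -> float:
        # P(predicts defect | module has no defect)
        return self.c / (self.a + self.c)

    def precision(self) -> float:
        # P(module has a defect | predicts defect)
        return self.d / (self.c + self.d)


def tally(modules: Iterable[Tuple[int, bool]], threshold: int = 50) -> DefectTable:
    """Apply the lines-of-code rule and count modules in each cell.

    Each module is a (lines_of_code, has_defect) pair; a module with more
    than `threshold` lines is predicted to have a defect.
    """
    a = b = c = d = 0
    for lines_of_code, has_defect in modules:
        predicted_defect = lines_of_code > threshold
        if predicted_defect and has_defect:
            d += 1
        elif predicted_defect:
            c += 1
        elif has_defect:
            b += 1
        else:
            a += 1
    return DefectTable(a, b, c, d)


# Made-up modules for illustration only (NOT the NASA data set).
example_modules = [(12, False), (75, True), (90, False), (30, True), (120, True)]
table = tally(example_modules)

print(table)
print("accuracy:", table.accuracy())
print("detection rate:", table.detection_rate())
print("false alarm rate:", table.false_alarm_rate())
print("precision:", table.precision())
```

For part e, the same `tally` call could be fed the 498 (lines of code, defect status) pairs from the accompanying file, once its format is known. Note that a table with an empty row or column would make one of the denominators zero, which this sketch does not guard against.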
Source: Statistics for Business and Economics (13th Edition), Chapter 3.