Correlation
Correlation describes the relationship between two variables. It measures the degree to which the variables move together: when the values of one tend to rise or fall as the values of the other change, the two are correlated.
Linear Correlation
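Linear correlation measures the strength and direction of the straight-line relationship between two quantitative (numerical) variables. It is summarized by the correlation coefficient r, which ranges from -1 to 1: values near -1 or 1 indicate a strong linear association, and values near 0 indicate a weak one. Correlation analysis is the study of how variables are related; on its own, it does not show that one variable causes the other.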
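As an illustration of how the correlation coefficient is computed, here is a minimal Python sketch of the standard Pearson formula. The function name `pearson_r` and the toy values are assumptions made for this example; they are not part of the assignment.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Toy, made-up values that lie exactly on a rising straight line
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))
```

Points that fall exactly on a rising line give a value of r at (or within rounding error of) 1; real data, such as the jurisdiction table in this assignment, will usually give something strictly between -1 and 1.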
Regression Analysis
Regression analysis is a statistical method that estimates the relationship between a dependent variable and one or more independent variables. The dependent variable is also called the outcome (or response) variable, and the independent variables are called predictors. Regression is one way to find trends in data: the fitted equation gives an estimate of the dependent variable for a given value of the predictor.
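For a single predictor, the fitted line is usually found by least squares. The sketch below assumes simple linear regression with one predictor; `least_squares_line` is a hypothetical helper name, and the sample values are made up.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x    # the line passes through (mean_x, mean_y)
    return slope, intercept

# Toy, made-up values generated from y = 1 + 2x
slope, intercept = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

The slope is read as the estimated change in the outcome variable for each one-unit increase in the predictor, which is the interpretation the assignment asks for.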
GREAT PROJECT
Part 1: Correlation and Linear Regression
It is widely believed that the more education one receives the higher the income earned at the time of first employment and over the course of a career. However, due to varying reasons, many people never complete high school and, thus, never receive their high-school diploma. Although individuals without a high-school diploma are often able to find employment, they experience economic outcomes quite different from those who finish high school before entering the workforce to earn a living. Across the nation, there are millions of individuals with families who are now working but do not possess the credentials of a high-school diploma. Many of these individuals and their families are considered to be a part of the working poor that make up a considerable portion of this nation’s labor force.
1. Use technology to create and provide a scatterplot of the association between the “percent of low-income working families” and the “percent of 18-64 yr-olds with no high school diploma” data for each jurisdiction. Write at least two sentences explaining how/why it is appropriate to create such a scatterplot, and describe the characteristics of the association seen in the scatterplot. Be sure to use the actual names of the variables in their appropriate places in your response(s). (Print or copy-and-paste the scatterplot and be sure to clearly identify the predictor and response variables based on the possible believed association.)
2. Use technology to find the regression equation for the linear association between the “percent of low-income working families” and the “percent of 18-64 yr-olds with no high school diploma.” (Round final values to two decimal places.) Provide this equation and write a brief interpretation of the slope using the variable names. (Print or copy-and-paste the printout that identified the equation of the linear regression line, or any other form of evidence that technology was used.)
3. A student states that a decrease in the “percent of 18-64 yr-olds with no high school diploma” will lead to a decrease in the “percent of low-income working families.” Write at least two concise sentences addressing the key uses of linear correlation and comment on its limitations in a response to the student’s statement.
4. Calculate and provide the R-squared value for the regression equation. Provide a statement about its meaning, in general, and, its specific interpretation in the context of this assignment.
5. After examining these data for all the jurisdictions, someone notes that certain areas have an unusually high “percent of 18-64 yr-olds with no high school diploma.” Based on this finding, this individual concludes that the high percentages are due to the rising population of immigrants in those areas. Further, the individual argues that any estimates of the associated “percent of low-income working families” in those areas should be recalculated after removing this sub-population from the data set, as they are causing the area to “look bad”. In addition to thinking critically, use the key rules about linear regression and extrapolation to write a statistically appropriate and socially responsible response to the individual’s conclusion and argument.
Reference(s): The Working Poor Families Project. (2011). Indicators and Data. Retrieved from http://www.workingpoorfamilies.org/indicators/
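Questions 4 and 5 rest on two ideas that can be sketched in code: R-squared as the share of variation in the response explained by the line, and the rule that a regression line should only be used for prediction inside the range of observed predictor values. The helper names `r_squared` and `predict_within_range` and the toy numbers below are assumptions made for illustration only.

```python
def r_squared(xs, ys, slope, intercept):
    """Proportion of the variation in ys explained by the line intercept + slope * x."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def predict_within_range(x_new, xs, slope, intercept):
    """Predict y for x_new, refusing to extrapolate beyond the observed x values."""
    lo, hi = min(xs), max(xs)
    if not lo <= x_new <= hi:
        raise ValueError(f"{x_new} is outside the observed range [{lo}, {hi}]")
    return intercept + slope * x_new

# Toy, made-up values that lie exactly on y = 2x
xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
print(r_squared(xs, ys, slope=2.0, intercept=0.0))
print(predict_within_range(2.5, xs, slope=2.0, intercept=0.0))
```

The range check is one simple way to encode the extrapolation rule. It says nothing about why a jurisdiction has a high or low value, and it does not justify dropping observations from the data set.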
2011 Data
Jurisdiction Percent of low income working families (<200% poverty level) Percent of 18-64 year olds with no HS diploma
Alabama 37.3 15.3
Alaska 25.9 8.6
Arizona 38.9 14.8
Arkansas 41.8 14
California 34.3 17.6
Colorado 27.6 10.1
Connecticut 21.1 9.5
Delaware 27.8 11.9
District of Columbia 23.2 10.8
Florida 37.3 13.1
Georgia 36.6 14.9
Hawaii 25.8 7.2
Idaho 38.6 10.7
Illinois 30.4 11.5
Indiana 31.9 12.2
Iowa 28.8 8.1
Kansas 32 9.7
Kentucky 34.1 13.6
Louisiana 36.3 16.1
Maine 30.4 7.1
Maryland 19.5 9.7
Massachusetts 20.1 9.1
Michigan 31.6 10
Minnesota 24.2 7.3
Mississippi 43.6 17
Missouri 32.7 11.1
Montana 36 7
Nebraska 31.1 8.7
Nevada 37.4 16.6
New Hampshire 19.7 7.3
New Jersey 21.2 10.1
New Mexico 43 16.2
New York 30.2 13
North Carolina 36.2 13.6
North Dakota 27.2 5.9
Ohio 31.8 10.3
Oklahoma 37.4 13.2
Oregon 33.9 10.8
Pennsylvania 26 9.4
Rhode Island 26.9 12
South Carolina 38.3 14.2
South Dakota 31 8.7
Tennessee 36.6 12.7
Texas 38.3 17.8
Utah 32.3 9.9
Vermont 26.2 6.6
Virginia 23.3 10.2
Washington 26.4 10.2
West Virginia 36.1 12.9
Wisconsin 28.7 8.5
Wyoming 28.1 8
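One way to satisfy the "use technology" requirement is a short Python script over the table above. This is a sketch, not the only acceptable approach: it assumes the "percent of 18-64 yr-olds with no high school diploma" is the predictor and the "percent of low-income working families" is the response, following the association described in the assignment. The variable names are chosen for this example.

```python
import math

# Rows copied from the 2011 table: jurisdiction, percent of low-income working
# families, percent of 18-64 year olds with no HS diploma.
DATA = """\
Alabama 37.3 15.3
Alaska 25.9 8.6
Arizona 38.9 14.8
Arkansas 41.8 14
California 34.3 17.6
Colorado 27.6 10.1
Connecticut 21.1 9.5
Delaware 27.8 11.9
District of Columbia 23.2 10.8
Florida 37.3 13.1
Georgia 36.6 14.9
Hawaii 25.8 7.2
Idaho 38.6 10.7
Illinois 30.4 11.5
Indiana 31.9 12.2
Iowa 28.8 8.1
Kansas 32 9.7
Kentucky 34.1 13.6
Louisiana 36.3 16.1
Maine 30.4 7.1
Maryland 19.5 9.7
Massachusetts 20.1 9.1
Michigan 31.6 10
Minnesota 24.2 7.3
Mississippi 43.6 17
Missouri 32.7 11.1
Montana 36 7
Nebraska 31.1 8.7
Nevada 37.4 16.6
New Hampshire 19.7 7.3
New Jersey 21.2 10.1
New Mexico 43 16.2
New York 30.2 13
North Carolina 36.2 13.6
North Dakota 27.2 5.9
Ohio 31.8 10.3
Oklahoma 37.4 13.2
Oregon 33.9 10.8
Pennsylvania 26 9.4
Rhode Island 26.9 12
South Carolina 38.3 14.2
South Dakota 31 8.7
Tennessee 36.6 12.7
Texas 38.3 17.8
Utah 32.3 9.9
Vermont 26.2 6.6
Virginia 23.3 10.2
Washington 26.4 10.2
West Virginia 36.1 12.9
Wisconsin 28.7 8.5
Wyoming 28.1 8
"""

# Split from the right so multi-word names such as "New Mexico" stay intact
rows = [line.rsplit(None, 2) for line in DATA.strip().splitlines()]
names = [row[0] for row in rows]
low_income = [float(row[1]) for row in rows]   # response (assumed)
no_diploma = [float(row[2]) for row in rows]   # predictor (assumed)

n = len(rows)
mean_x = sum(no_diploma) / n
mean_y = sum(low_income) / n
sxx = sum((x - mean_x) ** 2 for x in no_diploma)
syy = sum((y - mean_y) ** 2 for y in low_income)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(no_diploma, low_income))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2  # in simple linear regression, R-squared equals r squared

print(f"n = {n}")
print(f"predicted low-income % = {intercept:.2f} + {slope:.2f} * (no-diploma %)")
print(f"r = {r:.2f}, R-squared = {r_squared:.2f}")
```

The printed values are rounded to two decimal places, as the assignment requests. For the scatterplot in question 1, the same `no_diploma` and `low_income` lists could be passed to any plotting tool, with the predictor on the horizontal axis and the response on the vertical axis.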