This report analyzes the crime.csv dataset, which records a range of socioeconomic variables for 51 state-level records. The dataset includes the murder rate, the poverty rate, the high school graduation rate, the college attendance rate, the percentage of single-parent households, and the share of the population living in metropolitan areas. The main objective was to use linear regression models to examine the relationships among these socioeconomic factors and, in particular, to determine which of them are associated with crime rates across the United States.
The analytical process involves data preparation, correlation tests, simple linear regression, and multiple linear regression. Together, these methods give a comprehensive picture of the relationships within the dataset and of the factors associated with crime rates in the United States. The primary aim of the analysis is to support decision-making by policymakers and law enforcement agencies by identifying the socioeconomic factors associated with crime rates in different regions.
Data preparation refines the crime.csv dataset by removing the non-numerical variable “state” to create a dataset named crime.data. Without this variable, the dataset is better suited to statistical models such as simple and multiple linear regression. The refined dataset contains murder rates, poverty rates, high school graduation rates, college attendance rates, the percentage of single-parent households, and the share of the population living in metropolitan areas, and it can be used to analyze the relationships between socioeconomic variables and crime rates across different states.
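The preparation step described above can be sketched in Python using only the standard library. This is a minimal illustration, not the original workflow: the column names other than "state" and the two sample rows are made-up placeholders, since the actual header and values of crime.csv are not shown here.

```python
import csv
import io

def drop_column(rows, column):
    """Return copies of the rows without the named column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]

def to_numeric(rows):
    """Convert every remaining field from text to float."""
    return [{k: float(v) for k, v in row.items()} for row in rows]

# Placeholder stand-in for crime.csv. The column names "murder", "poverty",
# and "single" and all values below are hypothetical, NOT the real data.
sample_csv = io.StringIO(
    "state,murder,poverty,single\n"
    "AA,1.0,2.0,3.0\n"
    "BB,4.0,5.0,6.0\n"
)

rows = list(csv.DictReader(sample_csv))
crime_data = to_numeric(drop_column(rows, "state"))
print(crime_data[0])
```

To run the same steps on the real file, the in-memory sample would be replaced by a file handle, for example `open("crime.csv", newline="")`.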
In simple linear regression, the focus is on predicting the murder rate from a single predictor variable, the percentage of single-parent households. The model assumes a linear relationship between these two variables, and the fitted equation is:
$$\text{murder rate} = -8.2477 + 0.5595 \times \text{single parent percentage}$$
That is, each additional percentage point of single-parent households is associated with a predicted increase of about 0.56 in the murder rate. The P value associated with the predictor variable is below 0.001 (P < 0.001), indicating that the single-parent percentage is a statistically significant predictor of the variation in murder rates across states.
P-Value              Value
P(T ≤ t) one-tail    9.822E-47
P(T ≤ t) two-tail    1.964E-46
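As a sketch of how the fitted equation turns a single-parent percentage into a predicted murder rate, the snippet below wraps it in a small Python function. The function and constant names are illustrative and not part of the original analysis. The intercept and slope are the values reported for the simple regression model.

```python
# Coefficients reported for the simple linear regression model.
INTERCEPT = -8.2477
SLOPE = 0.5595

def predict_murder_rate(single_parent_pct):
    """Predicted murder rate for a given single-parent percentage."""
    return INTERCEPT + SLOPE * single_parent_pct

for pct in (29, 25.4):
    rate = predict_murder_rate(pct)
    print(f"single-parent percentage {pct}: predicted murder rate {rate:.2f}")
```

Evaluating the function at 29 and 25.4 gives about 7.98 and 5.96, a gap of roughly 2.01, which is the slope multiplied by the 3.6-point difference in the predictor.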
To illustrate the practical application of this model, consider a state with a single-parent percentage of 29. The model predicts a murder rate of approximately 7.98. In contrast, a state with a single-parent percentage of 25.4 is predicted to have a murder rate of approximately 5.96. These predictions show how differences in the single-parent percentage translate into differences in the predicted murder rate, and they demonstrate the usefulness of simple linear regression for analyzing such relationships.
In multiple linear regression, a model predicts the murder rate from several predictor variables simultaneously. These variables include poverty rates, high school graduation rates, college attendance rates, unemployment rates, and the proportion of the population residing in metropolitan areas. The model's overall P value indicates that at least one predictor variable is statistically significant in predicting murder rates.
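To make the idea of fitting several predictors at once concrete, the following is a minimal pure-Python sketch of ordinary least squares solved through the normal equations. It is not the procedure used in the original analysis, which would normally rely on a statistics package, and it does not compute P values. The function name and the small dataset are hypothetical: the numbers are synthetic, generated exactly from y = 1 + 2*x1 + 3*x2, and are not taken from crime.csv.

```python
def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan
    elimination with partial pivoting. An intercept column of
    ones is prepended to the predictor rows in X.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Synthetic illustration only: y = 1 + 2*x1 + 3*x2, not real crime data.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9, 8, 19, 18, 29]

coefficients = fit_ols(X, y)
print([round(c, 6) for c in coefficients])
```

Because the synthetic response is an exact linear function of the two predictors, the fit recovers the generating coefficients up to floating-point error. With real data, each coefficient would instead estimate the change in the predicted murder rate per unit change in one predictor while the others are held fixed.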