Data Wrangling

School: Cumberland University
Course: 3268
Subject: Statistics
Date: Nov 24, 2024

Data Wrangling Assignment
Name
Institutional Affiliation
Date
Data Wrangling Task

Step 1: Loading the datasets
I started by loading the crime data and the COVID-19 data into pandas DataFrames, a format that is easy to manipulate and analyze.

Step 2: Exploring the structure and content
Next, I explored the structure and content of both datasets with the head() method, which returns the first few rows of a DataFrame. This gave me an initial understanding of the data and helped me identify the variables relevant to the analysis.

Step 3: Identifying the analytical question
To guide the wrangling and analysis, I formulated a specific analytical question: what is the relationship between the number of reported shoplifting offenses and the number of COVID-19 cases by ZIP code? This question gave the analysis a clear objective.

Step 4: Data wrangling
I then prepared the datasets for analysis by handling missing values, converting data types, and keeping only the relevant columns. These operations put the data into a suitable format and removed information the analysis did not need.

Step 5: Merging the datasets
To analyze the relationship between shoplifting offenses and COVID-19 cases, I merged the crime data and the COVID-19 data on their common ZIP code column. This produced a single dataset containing both variables for each ZIP code.

Step 6: Data visualization
I created a scatter plot of the number of reported shoplifting offenses against the number of COVID-19 cases. This allowed me to look for patterns or trends in the data and gain initial insights.

Step 7: Drawing conclusions
By examining the scatter plot and the distribution of the data points, I drew initial inferences about the relationship between the two variables. In the scatter plot, the x-axis shows the number of reported shoplifting offenses and the y-axis shows the number of COVID-19 cases by ZIP code. Most data points lie towards the left side of the plot, which means that most ZIP codes report relatively few shoplifting offenses. My visual impression was that ZIP codes with more reported shoplifting offenses tended to have fewer COVID-19 cases, and that ZIP codes with more COVID-19 cases tended to have fewer reported shoplifting offenses. Because a visual impression can be misleading, I then measured the relationship quantitatively.
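Steps 1 to 6 can be sketched in pandas as follows. This is a minimal sketch, not the author's code: the report gives neither file names nor the real schema, so the column names (zip_code, offense, covid_cases), the output column shoplifting_offenses, and the helper functions are hypothetical. It also assumes the crime data has one row per reported offense, the COVID-19 data has one row per ZIP code, and an inner join is the intended merge.

```python
import pandas as pd

# Hypothetical column names; the report does not list the real schema.
ZIP, OFFENSE, CASES = "zip_code", "offense", "covid_cases"


def wrangle_crime(crime: pd.DataFrame) -> pd.DataFrame:
    """Count reported shoplifting offenses per ZIP code (assumes one row per offense)."""
    df = crime[[ZIP, OFFENSE]].copy()  # keep only the relevant columns
    df[ZIP] = pd.to_numeric(df[ZIP], errors="coerce")  # unparseable ZIPs become NaN
    df = df.dropna(subset=[ZIP, OFFENSE]).astype({ZIP: "int64"})  # drop missing values
    is_shoplifting = df[OFFENSE].str.contains("shoplift", case=False, na=False)
    return (
        df[is_shoplifting]
        .groupby(ZIP)
        .size()
        .rename("shoplifting_offenses")
        .reset_index()
    )


def wrangle_covid(covid: pd.DataFrame) -> pd.DataFrame:
    """Clean the COVID-19 table (assumes one row of case counts per ZIP code)."""
    df = covid[[ZIP, CASES]].copy()
    df[ZIP] = pd.to_numeric(df[ZIP], errors="coerce")
    df[CASES] = pd.to_numeric(df[CASES], errors="coerce")
    return df.dropna().astype({ZIP: "int64"})


def merge_by_zip(crime: pd.DataFrame, covid: pd.DataFrame) -> pd.DataFrame:
    # Inner join (an assumption): keep only ZIP codes present in both tables.
    return wrangle_crime(crime).merge(wrangle_covid(covid), on=ZIP, how="inner")


def plot_relationship(merged: pd.DataFrame):
    # pandas draws with matplotlib, which must be installed for this call.
    ax = merged.plot.scatter(x="shoplifting_offenses", y=CASES)
    ax.set_xlabel("Reported shoplifting offenses")
    ax.set_ylabel("COVID-19 cases")
    return ax


if __name__ == "__main__":
    # Tiny made-up frames standing in for the real files.
    crime = pd.DataFrame({
        ZIP: ["10001", "10001", "10002", None, "10003"],
        OFFENSE: ["Shoplifting", "SHOPLIFTING", "Burglary", "Shoplifting", "Shoplifting"],
    })
    covid = pd.DataFrame({ZIP: [10001, 10002, 10003, 10004], CASES: [120, 80, None, 50]})
    print(crime.head())  # Step 2: inspect the first rows
    print(merge_by_zip(crime, covid))
```

With real data, each frame would instead come from a loader such as pd.read_csv (the report does not give the paths). If the COVID-19 table holds several rows per ZIP code (for example, weekly counts), it would need to be aggregated before the merge, or the join would duplicate rows.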
Step 8: Calculating the correlation coefficient and testing the hypothesis
To measure the strength and significance of the relationship quantitatively, I calculated the correlation coefficient and conducted a hypothesis test. This allowed me to determine whether the observed relationship is statistically significant.

Step 9: Interpreting the results
Finally, I interpreted the correlation coefficient and the p-value to answer the analytical question and to judge the practical significance of the findings. The correlation coefficient between the number of reported shoplifting offenses and the number of COVID-19 cases by ZIP code is approximately 0.2391, which indicates a weak positive correlation. Because the sign is positive, the coefficient does not support the negative pattern suggested by the scatter plot. The p-value is approximately 0.1426; it is the probability, assuming the null hypothesis of no correlation is true, of obtaining a correlation at least as extreme as the one observed. Because the p-value is greater than the commonly used significance level of 0.05, there is not enough evidence to reject the null hypothesis. Therefore, I cannot conclude that the relationship between the number of reported shoplifting offenses and COVID-19 cases is statistically significant.

In summary, there is a weak positive correlation between reported shoplifting offenses and the number of COVID-19 cases by ZIP code, but it is not statistically significant. By following these steps, I was able to analyze the data systematically, perform the relevant wrangling operations, visualize the relationship between shoplifting offenses and COVID-19 cases, and run a statistical test to draw conclusions. The chosen solutions, such as merging the datasets and calculating the correlation coefficient, were directly aligned with the analytical question and allowed me to investigate the relationship between the variables in a comprehensive and structured manner.
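The calculation and decision in Steps 8 and 9 can be sketched as below. The report does not name the correlation measure, so this assumes Pearson's r computed with scipy.stats.pearsonr, which also returns a two-sided p-value for the null hypothesis of no correlation. The helper names and the weak/moderate/strong cut-offs are hypothetical; they are one common rule of thumb, not a fixed standard.

```python
import pandas as pd
from scipy import stats


def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    # H0: no correlation. Reject it only if p is below the significance level.
    return p_value < alpha


def describe_strength(r: float) -> str:
    # Hypothetical rule-of-thumb cut-offs; textbooks draw these lines differently.
    if r == 0:
        return "no correlation"
    size = abs(r)
    label = "weak" if size < 0.3 else "moderate" if size < 0.7 else "strong"
    return f"{label} {'positive' if r > 0 else 'negative'}"


def correlation_test(merged: pd.DataFrame, x_col: str, y_col: str, alpha: float = 0.05) -> dict:
    """Pearson's r and its two-sided p-value for H0: rho = 0 (an assumed choice of test)."""
    pair = merged[[x_col, y_col]].dropna()  # pearsonr needs complete pairs
    r, p_value = stats.pearsonr(pair[x_col], pair[y_col])
    return {
        "r": float(r),
        "p_value": float(p_value),
        "strength": describe_strength(float(r)),
        "reject_null": reject_null(float(p_value), alpha),
    }
```

Applied to the values reported above, describe_strength(0.2391) gives "weak positive" and reject_null(0.1426) gives False, which matches the report's conclusion.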