Report_Unveiling Insights_Vancouver Crime

School: British Columbia Institute of Technology
Course: 3205
Subject: Sociology
Date: Jan 9, 2024
Type: docx
Pages: 4
Uploaded by KidStarCaterpillar31

Report
Unveiling Insights: Vancouver Crime (2003 – 2016) - Report

This report describes how we analyzed the Crime in Vancouver dataset and what insights we drew from it. It is structured into four key sections: Data Sources, Data Cleaning, Data Analysis, and Conclusion. For more detail, please refer to the accompanying PowerPoint presentation and explore the interactive Tableau dashboard.

1. Data Sources

Two data sources are used for the analysis. The first is the Crime in Vancouver dataset, the primary dataset, which records the type, location, date, and time of each crime from January 1, 2003, to July 13, 2017, for a total of 530,652 records. It was extracted from the Vancouver Open Data Catalogue and is accessible through Kaggle.com.

The second is the Local Area Boundary dataset, a supplementary dataset used to map neighborhood boundaries. It covers 22 distinct neighborhoods and was extracted from the Open Data Portal on the City of Vancouver website.

2. Data Cleaning

During data cleaning, our team made several changes to prepare the dataset for the subsequent analysis.

Change Types

The fields YEAR, MONTH, DAY, HOUR, and MINUTE were converted to string type because they are treated as categorical data in the analysis. This lets the visualization show them as discrete categories rather than as continuous numbers.

Remove Fields

The X and Y fields, which hold coordinate values in UTM Zone 10, were removed. The data owner had already converted these coordinates to Latitude and Longitude, a more convenient format for creating maps in the dashboard. The Latitude and Longitude coordinates were used solely for mapping purposes. In addition, the field Hundred_Block was removed from the dataset during the cleaning process.
This decision was made because the field had no meaningful relevance to the intended dashboard. The analysis and the subsequent visualization focus on neighborhood-level information.
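
The type change and field removal described above can be sketched in plain Python. The report does not say which tool performed these steps, so this is only an illustration under assumptions: the field names (including the upper-case `HUNDRED_BLOCK` and `NEIGHBOURHOOD`) are assumed from the report's description of the CSV, the helper `clean_record` is a hypothetical name, and the sample records are invented.

```python
# Field names are assumed from the report's description of the dataset.
CATEGORICAL_FIELDS = ("YEAR", "MONTH", "DAY", "HOUR", "MINUTE")
DROPPED_FIELDS = ("X", "Y", "HUNDRED_BLOCK")


def clean_record(record):
    """Return a copy with date/time parts as strings and unused fields removed."""
    cleaned = {key: value for key, value in record.items()
               if key not in DROPPED_FIELDS}
    for field in CATEGORICAL_FIELDS:
        # Leave missing values (None) alone instead of turning them into "None".
        if cleaned.get(field) is not None:
            cleaned[field] = str(cleaned[field])
    return cleaned


# Hypothetical sample records, invented for illustration only.
records = [
    {"TYPE": "Theft from Vehicle", "YEAR": 2010, "MONTH": 6, "DAY": 14,
     "HOUR": 18, "MINUTE": 30, "HUNDRED_BLOCK": "(hypothetical block)",
     "NEIGHBOURHOOD": "Central Business District",
     "X": 491000.0, "Y": 5459000.0, "Latitude": 49.28, "Longitude": -123.12},
    {"TYPE": "Homicide", "YEAR": 2012, "MONTH": 1, "DAY": 3,
     "HOUR": None, "MINUTE": None, "HUNDRED_BLOCK": "(hypothetical block)",
     "NEIGHBOURHOOD": None,
     "X": 0.0, "Y": 0.0, "Latitude": 0.0, "Longitude": 0.0},
]

cleaned_records = [clean_record(r) for r in records]
print(cleaned_records[0])
```

Keeping missing hours as `None` rather than the string "None" makes it easier to count or exclude incomplete records later.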

Filter

To avoid confusion, data for the year 2017 was deliberately excluded from the analysis. The 2017 data is incomplete, covering only seven months, and including it could mislead users into concluding that 2017 had the lowest crime rates. Excluding it keeps the dataset accurate and reliable for a meaningful and informative dashboard.

Group Value

To simplify the crime categories, "Vehicle Collision or Pedestrian Struck (with Fatality)" and "Vehicle Collision or Pedestrian Struck (with Injury)" were combined into a single type labeled "Vehicle Collision or Pedestrian Struck."

Left Join

Next, a Left Join was performed to combine the Crime in Vancouver data table with the Local Area Boundary data table, consolidating the two sources for analysis.

Change Name

After the join, an examination of the neighborhood field revealed three names that did not match: Central Business District, Musqueam, and Stanley Park. To resolve this misalignment, the following adjustments were made based on real geographical data: Central Business District was renamed Downtown to align with the Local Area Boundary, and Musqueam was renamed Dunbar-Southlands for the same reason. Even after these adjustments, the Stanley Park records remained unmatched because the Local Area Boundary dataset has no corresponding entry.

Stanley Park Boundary Issue

While joining the data, we found that Stanley Park is missing from the Local Area Boundary data, which prevents its boundary from being displayed on the map. To resolve this, we decided to generate new boundary data for Stanley Park using QGIS, a geographic information system application.
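
The filter, grouping, join, and renaming steps described above can be sketched as follows. This is a minimal illustration, not the team's actual workflow: the function `prepare`, the `BOUNDARY` key, the sample rows, and the dictionary standing in for the Local Area Boundary table are all hypothetical, and the category label uses the spelling "Struck" as an assumption. The renames are applied before the lookup here, which gives the same end result as joining, inspecting the mismatches, and then renaming.

```python
# Labels follow the report; sample rows and the boundary lookup are
# hypothetical stand-ins for the two real datasets.
COLLISION_TYPE = "Vehicle Collision or Pedestrian Struck"
NEIGHBOURHOOD_RENAMES = {
    "Central Business District": "Downtown",
    "Musqueam": "Dunbar-Southlands",
}


def prepare(crimes, boundaries):
    """Drop 2017, merge collision types, rename areas, then left-join boundaries."""
    joined = []
    for crime in crimes:
        if crime["YEAR"] == "2017":  # Filter: skip the incomplete year
            continue
        row = dict(crime)
        if row["TYPE"].startswith(COLLISION_TYPE):  # Group Value
            row["TYPE"] = COLLISION_TYPE
        name = row["NEIGHBOURHOOD"]
        row["NEIGHBOURHOOD"] = NEIGHBOURHOOD_RENAMES.get(name, name)  # Change Name
        # Left Join: every crime row is kept; BOUNDARY is None when unmatched.
        row["BOUNDARY"] = boundaries.get(row["NEIGHBOURHOOD"])
        joined.append(row)
    return joined


crimes = [
    {"TYPE": "Theft from Vehicle", "YEAR": "2010",
     "NEIGHBOURHOOD": "Central Business District"},
    {"TYPE": "Vehicle Collision or Pedestrian Struck (with Injury)",
     "YEAR": "2011", "NEIGHBOURHOOD": "Musqueam"},
    {"TYPE": "Mischief", "YEAR": "2012", "NEIGHBOURHOOD": "Stanley Park"},
    {"TYPE": "Theft from Vehicle", "YEAR": "2017",
     "NEIGHBOURHOOD": "Central Business District"},
]
boundaries = {"Downtown": "polygon-1", "Dunbar-Southlands": "polygon-2"}

joined = prepare(crimes, boundaries)
# Named areas that found no boundary after the join.
unmatched = sorted({row["NEIGHBOURHOOD"] for row in joined
                    if row["BOUNDARY"] is None and row["NEIGHBOURHOOD"]})
print(unmatched)  # ['Stanley Park']
```

Because the join keeps every crime row, the unmatched list surfaces Stanley Park instead of silently dropping its records.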
Once the new Stanley Park boundary had been added to the Local Area Boundary data, we renamed the file Vancouver.shp. The Stanley Park boundary now displays correctly in the visualization.

3. Data Analysis

The intended audience for this analysis is police and law enforcement agencies, for example the VPD and the RCMP. Our purpose is to help them allocate resources effectively so that adjustments can be made to prevent future thefts.

Hypothesis 1: Which types of crimes should the police focus on?

To narrow our focus within the data, we first ask which types of crime demand the police's primary attention. We identify the crime with the highest occurrence: Theft from Vehicle accounts for 32.52% of the reported incidents. Our analysis also shows no significant seasonal pattern in these occurrences.

Hypothesis 2: Which neighborhood area should the police focus on?

To streamline resource allocation, we concentrate on neighborhoods with a high incidence of Theft from Vehicle. Downtown has the highest number of reported cases, accounting for 27.82%. From 2003 to 2016, Downtown recorded the highest incidence of Theft from Vehicle among all neighborhoods in every year, with its yearly share ranging from a low of 21.49% to a high of 30.09% in 2003. This suggests that prioritizing efforts in Downtown could yield a significant impact.

Hypothesis 3: What time should the police focus on?

We examine when the police should concentrate on monitoring Theft from Vehicle in Downtown by dividing the hourly periods into quartiles by case count. The police should focus on every period that falls into the 4th quartile. The data shows that the periods from 17:00 to 22:59 consistently fall into the 4th quartile, with the top three peak times at 18:00 – 18:59, 19:00 – 19:59, and 22:00 – 22:59, in that order. We therefore recommend that the police begin allocating resources to Downtown at 17:00, pay particular attention to 18:00 – 18:59, 19:00 – 19:59, and 22:00 – 22:59, and reduce the resources allocated to Theft from Vehicle in Downtown after 22:59.
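
The 4th-quartile rule used for Hypothesis 3 can be illustrated with a short sketch. The hourly counts below are invented for illustration and are not the report's figures, the function `top_quartile_hours` is a hypothetical name, and the report does not state which quartile method was used, so Python's default (`statistics.quantiles` with the "exclusive" method) is an assumption.

```python
from statistics import quantiles

# Hypothetical case counts per hour of day (0-23), invented for illustration.
hourly_counts = {
    0: 410, 1: 330, 2: 260, 3: 210, 4: 170, 5: 150,
    6: 180, 7: 240, 8: 320, 9: 390, 10: 430, 11: 470,
    12: 520, 13: 540, 14: 560, 15: 600, 16: 650, 17: 760,
    18: 900, 19: 880, 20: 800, 21: 820, 22: 860, 23: 620,
}


def top_quartile_hours(counts_by_hour):
    """Return the hours whose count lies above the third quartile cut point."""
    _q1, _q2, q3 = quantiles(list(counts_by_hour.values()), n=4)
    return sorted(hour for hour, count in counts_by_hour.items() if count > q3)


focus_hours = top_quartile_hours(hourly_counts)
# The three busiest hours, from highest count down.
peak_hours = sorted(hourly_counts, key=hourly_counts.get, reverse=True)[:3]

print(focus_hours)  # [17, 18, 19, 20, 21, 22]
print(peak_hours)   # [18, 19, 22]
```

With 24 hourly periods and no ties, the top quartile contains six hours, which is why a contiguous six-hour window such as 17:00 to 22:59 can fill it exactly.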

Considerations

The dataset covers 2003-2016, so it is dated and may not be entirely applicable to the current situation. To improve accuracy, we recommend incorporating the latest available data.

Missing data, particularly neighborhood and time information, is a second issue. Its impact is relatively minor: only 3.9% of Theft from Vehicle records are missing neighborhood data. Time data is also absent for categories such as 'Homicide' and 'Offense Against Person', but these lie outside our current focus.
To address this, a pragmatic solution is to exclude the missing data from the visualization, so that it is built only from complete records.

4. Conclusion

For more effective crime prevention, the police should allocate resources to focus on “Theft from Vehicle”, especially in “Downtown”, with more emphasis between “17:00 and 22:59”. The data from 2003 to 2016, covering 24 neighborhoods and 10 types of crime, contains a total of 474,028 cases. Of these, Downtown recorded 19,237 cases of Theft from Vehicle between 17:00 and 22:59. Our recommendation underscores that concentrating resources on the right location and time can significantly affect crime rates and make law enforcement efforts more efficient.

Data Source

Crime in Vancouver: https://www.kaggle.com/datasets/wosaku/crime-in-vancouver
Local Area Boundary: https://opendata.vancouver.ca/explore/dataset/local-area-boundary/information/?disjunctive.name