Potential Sources of Bias for Predictive Modeling
Task: Ambulance Demand in New York City
Use the service data from New York City for the years 2008-2016 to build your models. The data source is the "NYC FDNY Emergency Medical Services Ambulance Calls data."
Representativeness {What phenomenon is this dataset attempting to capture? Is the dataset designed to be representative of the underlying phenomenon that it is attempting to cover?}
Geographic Coverage {Does this dataset cover all geographic areas of interest for your model? If not, what is missing? Why is it missing?}
Demographic Coverage {Does this dataset cover all demographics of interest for your model? Are there individuals of a specific age / race / ethnicity / ability that would not be included in this dataset? If so, what is missing? Why is it missing?}
Temporal Coverage {Does this dataset cover all time periods of interest for your model? Are specific times, days, weeks, months, or years missing from the data? If so, what is missing? Why is it missing?}
Comprehensiveness {Does the dataset capture all of the features of your subject of interest that you think would be relevant for building your model? If not, what is missing and why is it missing? What limitations will those missing features place on the model you want to build?}
System Drift {Based on your review of the data source, have you identified any specific factors that suggest changes in the systems collecting the data you are using? Such changes could result from a changed research design or the inclusion of different questions, or they can be seen in variables that have two options for the same answer (think "Y" and "True", "N" and "False"). If present, which variables are these, and what is your plan for addressing them? Do you think the potential system drift would dramatically change your underlying model assumptions? Why or why not?}
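For the System Drift question, a reasonable first step is to check whether flag-like fields switch spellings across years. The Python sketch below assumes the records have been loaded as dictionaries with a `year` key and a flag column; the field name `flag` and the token lists are hypothetical placeholders, not taken from the FDNY data dictionary. It profiles the raw spellings per year, then maps them to a single True/False encoding, leaving unrecognized values as missing for manual review.

```python
# Minimal sketch: harmonize boolean-like codes whose encoding may have changed
# over time (e.g. "Y"/"N" in early years vs "True"/"False" later).
# The field name "flag" and the token sets are hypothetical; profile the real columns first.
from collections import Counter

# Known spellings mapped to a canonical boolean; extend these after profiling.
TRUE_TOKENS = {"y", "yes", "true", "t", "1"}
FALSE_TOKENS = {"n", "no", "false", "f", "0"}


def normalize_flag(value):
    """Return True/False for recognized tokens, None for blanks or unknown values."""
    if value is None:
        return None
    token = str(value).strip().lower()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    return None  # unrecognized: keep as missing and review manually


def profile_encodings(records, field, year_field="year"):
    """Count the raw spellings of `field` per year to reveal when encodings changed."""
    counts = {}
    for row in records:
        year = row.get(year_field)
        counts.setdefault(year, Counter())[row.get(field)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical rows standing in for incident records from different years.
    rows = [
        {"year": 2009, "flag": "Y"},
        {"year": 2009, "flag": "N"},
        {"year": 2015, "flag": "True"},
        {"year": 2015, "flag": "False"},
    ]
    for year, spellings in sorted(profile_encodings(rows, "flag").items()):
        print(year, dict(spellings))
    print([normalize_flag(r["flag"]) for r in rows])
```

Profiling before normalizing shows when the encoding changed, which is itself a drift signal worth recording in the bias assessment rather than silently erasing.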
Predictive modeling of ambulance demand in New York City using the "NYC FDNY Emergency Medical Services Ambulance Calls data" from 2008 to 2016 requires a thorough assessment of potential sources of bias in the dataset, because bias in the data can lead to inaccurate predictions and flawed insights. In this analysis, we examine six aspects of the dataset: representativeness, geographic coverage, demographic coverage, temporal coverage, comprehensiveness, and system drift.
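For the temporal-coverage assessment, a simple check is to list the calendar days in the study window that have no recorded calls and to tally calls per year. The sketch below is a minimal, standard-library-only illustration; it assumes incident timestamps have already been parsed into `datetime.date` objects, and the sample dates are invented placeholders rather than real FDNY records.

```python
# Minimal sketch: find calendar days in the study window with no recorded calls,
# and count calls per year so gaps or abrupt level shifts stand out.
# Assumes each record's timestamp has already been parsed into a datetime.date.
from collections import Counter
from datetime import date, timedelta


def missing_days(call_dates, start, end):
    """Return the sorted dates in [start, end] that have zero recorded calls."""
    observed = set(call_dates)
    gaps = []
    day = start
    while day <= end:
        if day not in observed:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def calls_per_year(call_dates):
    """Count calls by calendar year, returned in year order."""
    return dict(sorted(Counter(d.year for d in call_dates).items()))


if __name__ == "__main__":
    # Hypothetical sample; replace with dates parsed from the real data.
    sample = [date(2008, 1, 1), date(2008, 1, 1), date(2008, 1, 3), date(2009, 6, 15)]
    print(missing_days(sample, date(2008, 1, 1), date(2008, 1, 4)))
    print(calls_per_year(sample))
```

Days returned by `missing_days` may reflect outages in data collection rather than an absence of demand, so they should be investigated before a model treats them as low-demand periods.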