Potential Sources of Bias for Predictive Modeling:
Task: Ambulance Demand in NY:
Use the service data from New York City for the years 2008-2016 to build your models. The data is collected by "NYC Traffic data"; the question specifically asks about the NYC traffic data.
Representativeness: {What is this dataset attempting to capture data about? Is the dataset designed to be representative of the underlying phenomenon that it is attempting to cover?}
Geographic Coverage {Does this dataset cover all geographic areas of interest for your model? If not, what is missing? Why is it missing?}
Demographic Coverage {Does this dataset cover all demographics of interest for your model? Are there individuals of a specific age / race / ethnicity / ability that would not be included in this dataset? If so, what is missing? Why is it missing?}
Temporal Coverage {Does this dataset cover all time periods of interest for your model? Are specific times, days, months, weeks, years missing from the data effort? If so, what is missing? Why is it missing?}
Comprehensiveness {Does the dataset capture all of the features of your subject of interest that you think would be relevant for building your model? If not, what is missing and why is it missing? What limitations will those missing features impose on the model you want to build?}
System Drift {Based on your review of the data source, have you identified any specific factors that suggest changes in the systems collecting the data you are using? This could be the result of a changed research design or the inclusion of different questions, or it can be seen in variables that have two options for the same answer (think “Y” and “True”, “N” and “False”). If present, which variables are these, and what is your plan for addressing them? Do you think the potential system drift would dramatically change your underlying model assumptions? Why or why not?}
Predicting ambulance demand in New York City is crucial for an efficient emergency response service. This analysis uses NYC traffic data spanning 2008-2016 to develop predictive models, but potential biases in that data must be addressed to ensure accurate and equitable results.
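Two of the checks named in the question, temporal coverage and system drift, can be screened for mechanically before any modeling. The following is a minimal sketch, not the dataset's actual schema: the label mapping, the function names `normalize_flag` and `missing_days`, and the sample values are all hypothetical and made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical mapping: labels that mean the same thing collapse to one boolean.
CANONICAL = {
    "y": True, "yes": True, "true": True,
    "n": False, "no": False, "false": False,
}

def normalize_flag(value):
    """Map 'Y'/'True'/'N'/'False'-style labels to a bool; None if unrecognized."""
    if isinstance(value, bool):
        return value
    if value is None:
        return None
    return CANONICAL.get(str(value).strip().lower())

def missing_days(observed, start, end):
    """Return every calendar day in [start, end] that has no record."""
    seen = set(observed)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in all_days if d not in seen]

# Made-up sample values, for illustration only (not taken from the real data).
flags = ["Y", "True", "N", "false", "unknown"]
normalized = [normalize_flag(f) for f in flags]

days = [date(2008, 1, 1), date(2008, 1, 2), date(2008, 1, 4)]
gaps = missing_days(days, date(2008, 1, 1), date(2008, 1, 5))

print(normalized)
print(gaps)
```

Values that normalize to `None` are worth inspecting by hand rather than dropping silently, since an unrecognized label may itself be a sign that the collection system changed.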