Question

Potential Sources of Bias for Predictive Modeling:

Task: Ambulance Demand in NY:

Use service data from New York City for the years 2008-2016 to build your models. The data source is the “NYC Traffic data”; the question specifically asks about the NYC traffic data.

 

 

Representativeness: {What is this dataset attempting to capture data about? Is the dataset designed to be representative of the underlying phenomenon that it is attempting to cover?}

Geographic Coverage: {Does this dataset cover all geographic areas of interest for your model? If not, what is missing? Why is it missing?}

Demographic Coverage: {Does this dataset cover all demographics of interest for your model? Are there individuals of a specific age / race / ethnicity / ability that would not be included in this dataset? If so, what is missing? Why is it missing?}

Temporal Coverage: {Does this dataset cover all time periods of interest for your model? Are specific times, days, months, weeks, years missing from the data effort? If so, what is missing? Why is it missing?}

Comprehensiveness: {Does the dataset capture all of the relevant features about your subject of interest that, you think, would be relevant for building your model? If not, what is missing and why is it missing? What limitations will those missing features have on the model you want to build?}

System Drift: {Based on your review of the data source, have you identified any specific factors that suggest changes in the systems collecting the data you are using? This could be the result of a changed research design, the inclusion of different questions, or variables that have two options for the same answer (think “Y” and “True”, “N” and “False”). If present, which variables are these, and what is your plan for addressing them? Do you think the potential system drift would dramatically change your underlying model assumptions? Why or why not?}

Expert Solution
Step 1: Introduce the problem:

Ambulance demand prediction in New York City is crucial for efficient emergency response services. Leveraging NYC traffic data spanning 2008-2016, this analysis aims to develop predictive models, but potential biases must be addressed to ensure accurate and equitable results.
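
Two of the checks raised in the question, temporal coverage and system drift, lend themselves to a quick programmatic audit before any modeling. The following is only a minimal sketch, not the method used in this solution: the function names `normalize_flag` and `missing_dates`, the label sets, and the toy records are all hypothetical, and the real NYC traffic data may use different columns and encodings.

```python
from datetime import date, timedelta

# Hypothetical label sets (an assumption); extend them with whatever
# variants actually appear in the real data.
TRUTHY = {"y", "yes", "true", "t", "1"}
FALSY = {"n", "no", "false", "f", "0"}


def normalize_flag(value):
    """Map drifted encodings such as "Y"/"True" to a single boolean.

    Returns None for blanks and unrecognized labels so they can be
    reviewed by hand rather than silently coerced.
    """
    if value is None:
        return None
    label = str(value).strip().lower()
    if label in TRUTHY:
        return True
    if label in FALSY:
        return False
    return None


def missing_dates(observed, start, end):
    """Return every calendar date in [start, end] absent from `observed`."""
    seen = set(observed)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in seen]


if __name__ == "__main__":
    # Toy records, made up for illustration: (record date, yes/no field)
    records = [
        (date(2008, 1, 1), "Y"),
        (date(2008, 1, 2), "True"),
        (date(2008, 1, 4), "N"),
        (date(2008, 1, 5), "unknown"),
    ]
    flags = [normalize_flag(flag) for _, flag in records]
    gaps = missing_dates([d for d, _ in records],
                         date(2008, 1, 1), date(2008, 1, 5))
    print(flags)
    print(gaps)
```

Run over the full 2008-2016 range, a gap list like this would show whether specific days or months are missing, and the count of `None` results from `normalize_flag` would indicate how many labels still need a manual decision. Whether the drift changes the model assumptions depends on how many records are affected and whether the change coincides with a particular year.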
