Question

Potential Sources of Bias for Predictive Modeling

Task: Ambulance Demand in NY

Use the service data from New York City for the years 2008-2016 to build your models. The data is collected by the "City Health Department's Emergency Medical Services Division."

Representativeness: {What is this dataset attempting to capture data about? Is the dataset designed to be representative of the underlying phenomenon that it is attempting to cover?}

Geographic Coverage: {Does this dataset cover all geographic areas of interest for your model? If not, what is missing? Why is it missing?}

Demographic Coverage: {Does this dataset cover all demographics of interest for your model? Are there individuals of a specific age / race / ethnicity / ability who would not be included in this dataset? If so, who is missing? Why are they missing?}

Temporal Coverage: {Does this dataset cover all time periods of interest for your model? Are specific times, days, weeks, months, or years missing from the data collection effort? If so, what is missing? Why is it missing?}

Comprehensiveness: {Does the dataset capture all of the features of your subject of interest that you think would be relevant for building your model? If not, what is missing and why is it missing? What limitations will those missing features impose on the model you want to build?}

System Drift: {Based on your review of the data source, have you identified any specific factors that suggest changes in the systems collecting the data you are using? This could be the result of a changed research design or the inclusion of different questions, or it can be seen in variables that have two options for the same answer (think “Y” and “True”, “N” and “False”). If present, which variables are these, and what is your plan for addressing them? Do you think the potential system drift would dramatically change your underlying model assumptions? Why or why not?}
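Two of the checks named in the prompt, missing time periods and variables that encode the same answer in two ways, can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the function names, the label sets, and the sample values are hypothetical and are not taken from the New York City data, whose actual columns would need to be inspected first.

```python
from datetime import date, timedelta

# Hypothetical label sets; extend them after inspecting the real column values.
TRUE_LABELS = {"y", "yes", "true", "t", "1"}
FALSE_LABELS = {"n", "no", "false", "f", "0"}


def normalize_flag(value):
    """Map mixed encodings such as "Y"/"True" and "N"/"False" to a bool.

    Returns None for missing or unrecognized values so they can be reviewed
    by hand instead of being silently coerced.
    """
    if value is None:
        return None
    label = str(value).strip().lower()
    if label in TRUE_LABELS:
        return True
    if label in FALSE_LABELS:
        return False
    return None


def missing_days(observed, start, end):
    """Return the calendar days in [start, end] that have no record at all."""
    seen = set(observed)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in seen]


# Illustrative values only, not real records.
raw_flags = ["Y", "True", "N", "false", " y ", "unknown"]
clean_flags = [normalize_flag(v) for v in raw_flags]

observed_days = [date(2008, 1, 1), date(2008, 1, 2), date(2008, 1, 4)]
gaps = missing_days(observed_days, date(2008, 1, 1), date(2008, 1, 5))
```

Returning None for unrecognized labels is a deliberate choice in this sketch: a value that appears only after a certain date is itself a hint of system drift and is worth counting by year before deciding how to recode it.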

 

Expert Solution
Step 1: Definitions

Predictive models are mathematical or computational models used in data science, machine learning, and statistics to make predictions or forecasts about future events or outcomes based on historical data and patterns. These models are designed to learn from existing data and then apply that knowledge to predict outcomes for new, unseen data.

Key elements of building and using a predictive model include:

  • Historical Data

  • Features and Variables

  • Training

  • Algorithm Selection

  • Model Building

  • Validation

  • Deployment

  • Continuous Improvement
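The elements listed above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the solution's method: the daily call counts are synthetic (generated with an assumed weekend uplift), the "model" is simply the mean call count per weekday, and all function names are hypothetical. The split is chronological so that validation mimics forecasting later days from earlier ones.

```python
import random
from datetime import date, timedelta
from statistics import mean


def make_synthetic_calls(start, n_days, seed=0):
    """Generate (day, call_count) pairs. Synthetic stand-in for historical data."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        base = 120 if day.weekday() >= 5 else 100  # assumed weekend uplift
        rows.append((day, base + rng.randint(-10, 10)))
    return rows


def fit_weekday_means(rows):
    """Training: learn the mean call count for each weekday (0 = Monday)."""
    by_weekday = {}
    for day, calls in rows:
        by_weekday.setdefault(day.weekday(), []).append(calls)
    return {weekday: mean(values) for weekday, values in by_weekday.items()}


def predict(model, day):
    """Prediction: look up the learned mean for the day's weekday."""
    return model[day.weekday()]


def mean_absolute_error(model, rows):
    """Validation metric: average absolute gap between prediction and count."""
    return mean(abs(predict(model, day) - calls) for day, calls in rows)


rows = make_synthetic_calls(date(2015, 1, 1), 365)
train, holdout = rows[:300], rows[300:]  # chronological split, no shuffling

model = fit_weekday_means(train)
mae = mean_absolute_error(model, holdout)

# Baseline for comparison: always predict the overall training mean.
baseline = mean(calls for _, calls in train)
baseline_mae = mean(abs(baseline - calls) for _, calls in holdout)
```

Comparing `mae` with `baseline_mae` shows whether the weekday feature adds anything over a constant prediction. With real data, the same holdout comparison would be repeated whenever the model is retrained, which is where the continuous improvement element comes in.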
