Potential Sources of Bias for Predictive Modeling
Task: Ambulance Demand in NY
Use the service data from New York City for the years 2008-2016 to build your models. The data is collected by the "City Health Department's Emergency Medical Services Division."
Representativeness: {What is this dataset attempting to capture data about? Is the dataset designed to be representative of the underlying phenomenon that it is attempting to cover?}
Geographic Coverage: {Does this dataset cover all geographic areas of interest for your model? If not, what is missing? Why is it missing?}
Demographic Coverage: {Does this dataset cover all demographics of interest for your model? Are there individuals of a specific age / race / ethnicity / ability who would not be included in this dataset? If so, what is missing? Why is it missing?}
Temporal Coverage: {Does this dataset cover all time periods of interest for your model? Are specific times, days, weeks, months, or years missing from the data collection effort? If so, what is missing? Why is it missing?}
Comprehensiveness: {Does the dataset capture all of the features of your subject of interest that you think would be relevant for building your model? If not, what is missing and why is it missing? What limitations will those missing features place on the model you want to build?}
System Drift: {Based on your review of the data source, have you identified any specific factors that suggest changes in the systems collecting the data you are using? This could be the result of a changed research design or the inclusion of different questions, or it can be seen in variables that have two options for the same answer (think “Y” and “True”, “N” and “False”). If present, which variables are these, and what is your plan for addressing these variables? Do you think the potential system drift would dramatically change your underlying model assumptions? Why or why not?}
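For the System Drift prompt, one possible way to spot and reconcile a variable that encodes the same answer two ways is sketched below in Python. This is only an illustration: the field names `year` and `flag`, the sets of accepted labels, and the toy rows are all assumptions and are not taken from the actual EMS dataset.

```python
from collections import Counter

# Assumed label sets: adjust to whatever encodings the real data turns out to use.
TRUE_LABELS = {"y", "yes", "true", "t", "1"}
FALSE_LABELS = {"n", "no", "false", "f", "0"}


def normalize_flag(value):
    """Map mixed yes/no encodings to True/False; return None if unrecognized."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in TRUE_LABELS:
        return True
    if text in FALSE_LABELS:
        return False
    return None


def labels_by_year(rows, year_key, flag_key):
    """Count the raw labels seen in each year to reveal when an encoding changed."""
    counts = {}
    for row in rows:
        counts.setdefault(row[year_key], Counter())[row[flag_key]] += 1
    return counts


# Toy rows invented for illustration only (hypothetical field names).
rows = [
    {"year": 2008, "flag": "Y"},
    {"year": 2008, "flag": "N"},
    {"year": 2016, "flag": "True"},
    {"year": 2016, "flag": "False"},
]

# Raw labels per year: a switch from Y/N to True/False hints at a system change.
print(labels_by_year(rows, "year", "flag"))

# Harmonized values that a model could use consistently across all years.
print([normalize_flag(r["flag"]) for r in rows])
```

Counting raw labels per year before harmonizing them keeps the evidence of drift visible, so the year in which the encoding changed can be reported in the answer rather than silently cleaned away.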
Predictive models are mathematical or computational models used in data science, machine learning, and statistics to make predictions or forecasts about future events or outcomes based on historical data and patterns. These models are designed to learn from existing data and then apply that knowledge to predict outcomes for new, unseen data.
Key characteristics of predictive models include:
Historical Data
Features and Variables
Training
Algorithm Selection
Model Building
Validation
Deployment
Continuous Improvement
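The training and validation steps listed above can be sketched for a demand-forecasting task as follows. This is a minimal illustration, not a recommended model: it assumes hypothetical records with `year`, `hour`, and `calls` fields, uses a simple per-hour average as the "algorithm", and validates on the most recent year so the model is never scored on data older than what it was trained on. The toy records are invented.

```python
from collections import defaultdict
from statistics import mean


def chronological_split(records, cutoff_year):
    """Train on years before cutoff_year, validate on cutoff_year and later."""
    train = [r for r in records if r["year"] < cutoff_year]
    valid = [r for r in records if r["year"] >= cutoff_year]
    return train, valid


def fit_hourly_baseline(train):
    """Training step: learn the mean number of calls for each hour of day."""
    by_hour = defaultdict(list)
    for r in train:
        by_hour[r["hour"]].append(r["calls"])
    overall = mean(r["calls"] for r in train)
    hourly = {hour: mean(values) for hour, values in by_hour.items()}
    return hourly, overall


def predict(model, hour):
    """Predict calls for an hour; fall back to the overall mean for unseen hours."""
    hourly, overall = model
    return hourly.get(hour, overall)


def mean_absolute_error(model, valid):
    """Validation step: average absolute gap between prediction and actual calls."""
    return mean(abs(predict(model, r["hour"]) - r["calls"]) for r in valid)


# Toy records invented for illustration only (hypothetical field names and values).
records = [
    {"year": 2014, "hour": 8, "calls": 10},
    {"year": 2014, "hour": 20, "calls": 20},
    {"year": 2015, "hour": 8, "calls": 14},
    {"year": 2015, "hour": 20, "calls": 24},
    {"year": 2016, "hour": 8, "calls": 13},
    {"year": 2016, "hour": 20, "calls": 21},
]

train, valid = chronological_split(records, cutoff_year=2016)
model = fit_hourly_baseline(train)
print(mean_absolute_error(model, valid))
```

A baseline like this is mainly useful as a reference point: any more elaborate model should beat it on the held-out year before it is considered for deployment.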