The world population data spans from 1960 to 2017. We'd like to build a predictive model that can give us the best guess at what the future or past population of a particular country was or might be. First, however, we need to formulate our data such that sklearn's Ridge regression class can train on our data. To do this, we will write a function that takes as input a country name and return a 2-d numpy array that contains the year and the measured population. Function Specifications: Should take a str as input and return a numpy array type as output. The array should only have two columns containing the year and the population, in other words, it should have a shape (?, 2) where ? is the length of the data. The values within the array should be of type int.
attached image contains external data
Question 1
The world population data spans from 1960 to 2017. We'd like to build a predictive model that can give us the best guess at what the future or past population of a particular country was or might be.
First, however, we need to formulate our data such that sklearn's Ridge regression class can train on our data. To do this, we will write a function that takes as input a country name and return a 2-d numpy array that contains the year and the measured population.
Function Specifications:
- Should take a str as input and return a numpy array type as output.
- The array should only have two columns containing the year and the population, in other words, it should have a shape (?, 2) where ? is the length of the data.
- The values within the array should be of type int.
Hint: You'll need to use both the the population and country map dataframes given above.
def get_year_pop(country_name):
b)
Now that we have have our data, we need to split this into a training set, and a testing set. But before we split our data into training and testing, we also need to split our data into the predictive features (denoted X) and the response (denoted y).
Write a function that will take as input a 2-d numpy array and return four variables in the form of (X_train, y_train), (X_test, y_test), where (X_train, y_train) are the features + response of the training set, and (X-test, y_test) are the features + response of the testing set.
Function Specifications:
- Should take a 2-d numpy array as input.
- Should split the array such that X is the year, and y is the corresponding population.
- Should return two tuples of the form (X_train, y_train), (X_test, y_test).
def feature_response_split(arr):
c)
Now that we have formatted our data, we can fit a model using sklearn's Ridge() class. We'll write a function that will take as input the features and response variables that we created in the last question, and returns a trained model.
Function Specifications:
- Should take two numpy arrays as input in the form (X_train, y_train).
- Should return an sklearn Ridge model.
- The returned model should be fitted to the data.
Hint: You may need to reshape the data within the function. You can use .reshape(-1, 1) to do this
Trending now
This is a popular solution!
Step by step
Solved in 3 steps with 1 images
thanks for answering the first question
this is the follow up question
b)
Now that we have have our data, we need to split this into a training set, and a testing set. But before we split our data into training and testing, we also need to split our data into the predictive features (denoted X) and the response (denoted y).
Write a function that will take as input a 2-d numpy array and return four variables in the form of (X_train, y_train), (X_test, y_test), where (X_train, y_train) are the features + response of the training set, and (X-test, y_test) are the features + response of the testing set.
Function Specifications:
- Should take a 2-d numpy array as input.
- Should split the array such that X is the year, and y is the corresponding population.
- Should return two tuples of the form (X_train, y_train), (X_test, y_test).