Question 1

The world population data spans from 1960 to 2017. We'd like to build a predictive model that can give us the best guess at what the future or past population of a particular country was or might be.

First, however, we need to format our data so that sklearn's Ridge regression class can train on it. To do this, we will write a function that takes a country name as input and returns a 2-d numpy array containing the year and the measured population.

Function Specifications:

Should take a str as input and return a numpy array as output.
The array should have exactly two columns, the year and the population; in other words, it should have shape (?, 2), where ? is the number of years in the data.
The values within the array should be of type int.

Hint: You'll need to use both the population and the country metadata dataframes (population_df and meta_df, loaded in the setup code).

def get_year_pop(country_name):
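A minimal sketch of one way to fill in get_year_pop, shown on hypothetical toy stand-ins for population_df and meta_df so it runs offline. The numbers are made up, and the layout is an assumption: population_df is taken to have one string column per year, and meta_df a 'Country Name' column, both indexed by 'Country Code'. Check the real columns with .head() before relying on it, and run it separately from the notebook, since it redefines both dataframes.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL toy stand-ins (made-up values, NOT real population data).
# Assumed layout: one string column per year in population_df, a
# 'Country Name' column in meta_df, both indexed by 'Country Code'.
population_df = pd.DataFrame(
    {"1960": [100, 200], "1961": [110, 210], "1962": [120, 220]},
    index=pd.Index(["AAA", "BBB"], name="Country Code"),
)
meta_df = pd.DataFrame(
    {"Country Name": ["Atlantis", "Borduria"]},
    index=pd.Index(["AAA", "BBB"], name="Country Code"),
)


def get_year_pop(country_name):
    """Return an int array of shape (n, 2) with rows [year, population]."""
    # Map the country name to its country code via the metadata table.
    code = meta_df[meta_df["Country Name"] == country_name].index[0]
    # The country's row is a Series indexed by year strings; drop any gaps.
    row = population_df.loc[code].dropna()
    years = np.array([int(year) for year in row.index])
    pops = row.to_numpy().astype(int)
    # Place the two 1-d arrays side by side as columns.
    return np.column_stack((years, pops))
```

For example, get_year_pop("Atlantis") gives one [year, population] row per year column. An unknown name raises an IndexError, because no metadata row matches.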

b)

Now that we have our data, we need to split it into a training set and a testing set. Before doing so, however, we also need to separate the data into the predictive features (denoted X) and the response (denoted y).

Write a function that takes a 2-d numpy array as input and returns four arrays grouped as (X_train, y_train), (X_test, y_test), where (X_train, y_train) are the features and response of the training set, and (X_test, y_test) are the features and response of the testing set.

Function Specifications:

Should take a 2-d numpy array as input.
Should split the array such that X is the year, and y is the corresponding population.
Should return two tuples of the form (X_train, y_train), (X_test, y_test).

def feature_response_split(arr):
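A minimal sketch of feature_response_split. The specification does not fix the split ratio or ordering. This version assumes an 80/20 split that keeps the earliest years for training. The extra train_frac parameter is a hypothetical addition, and a random split (for example with sklearn's train_test_split) would be an equally valid reading.

```python
import numpy as np


def feature_response_split(arr, train_frac=0.8):
    """Split an (n, 2) [year, population] array into train and test sets.

    Assumption: the earliest train_frac of the years go to training and the
    remaining years to testing (train_frac is a hypothetical parameter).
    """
    arr = np.asarray(arr)
    arr = arr[np.argsort(arr[:, 0])]  # sort rows by year
    X = arr[:, 0]  # feature: year
    y = arr[:, 1]  # response: population
    n_train = int(len(arr) * train_frac)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])
```

With 10 years of data, the first 8 years form the training set and the last 2 form the test set.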

Now that we have formatted our data, we can fit a model using sklearn's Ridge() class. We'll write a function that takes as input the features and response variables created in the last question and returns a trained model.

Function Specifications:

Should take two numpy arrays as input in the form (X_train, y_train).
Should return an sklearn Ridge model.
The returned model should be fitted to the data.

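Hint: You may need to reshape the data within the function, because sklearn expects the features as a 2-d array of shape (n_samples, n_features). You can use .reshape(-1, 1) to turn the 1-d year array into a single column.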
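A minimal sketch of the training step. The question gives no function name, so train_model is a hypothetical name. The sketch uses Ridge with its default alpha=1.0, which is an assumption, since no regularisation strength is specified.

```python
import numpy as np
from sklearn.linear_model import Ridge


def train_model(X_train, y_train):
    """Fit and return a Ridge model on 1-d year and population arrays.

    train_model is a hypothetical name; alpha is left at Ridge's default.
    """
    # sklearn expects X with shape (n_samples, n_features), so the single
    # feature (year) is reshaped into a column vector.
    X = np.asarray(X_train).reshape(-1, 1)
    y = np.asarray(y_train)
    model = Ridge()
    model.fit(X, y)
    return model
```

The returned model is already fitted, so model.predict(np.array([[2030]])) gives its estimate for a single year. Because of the ridge penalty, the fitted slope is slightly shrunk towards zero compared with ordinary least squares.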

import numpy as np
import pandas as pd
from numpy import array
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge  # needed for the Ridge model above
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
# Both tables are indexed by the 'Country Code' column
population_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/AnalyseProject/world_population.csv', index_col='Country Code')
meta_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/AnalyseProject/metadata.csv', index_col='Country Code')
population_df.head()
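The setup code imports mean_squared_error. The sketch below shows one way it could score a fitted Ridge model on a held-out split. It uses made-up toy data (population growing by exactly 50 per year) and holds out the last two years. Both choices are assumptions for illustration and are not part of the assignment.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Made-up toy data: population grows by exactly 50 per year.
years = np.arange(1960, 1970)
pops = 1000 + 50 * (years - 1960)

# Hold out the last two years as a test set (assumed 80/20 split).
X_train, y_train = years[:8].reshape(-1, 1), pops[:8]
X_test, y_test = years[8:].reshape(-1, 1), pops[8:]

model = Ridge().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Average squared difference between true and predicted populations.
mse = mean_squared_error(y_test, y_pred)
print(f"test MSE: {mse:.2f}")
```

Even on perfectly linear data the error is not zero. This is because the ridge penalty shrinks the fitted slope, and extrapolating beyond the training years magnifies that small bias.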

Follow-up Question

Thanks for answering the first question. This is the follow-up question: part b), writing feature_response_split as specified above.