import numpy as np
import pandas as pd

population_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/AnalyseProject/world_population.csv', index_col='Country Code')
meta_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/AnalyseProject/metadata.csv', index_col='Country Code')
Question 1
As we've seen previously, the world population data spans from 1960 to 2017. We'd like to build a predictive model that can give us the best guess at what the world population was in a given year. However, as a slight twist this time, we want to compute this estimate only for countries within a given income group.
First, however, we need to organise our data so that sklearn's RandomForestRegressor class can train on it. To do this, we will write a function that takes an income group as input and returns a 2-d numpy array that contains the year and the measured population.
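To make the target layout concrete, here is a minimal sketch of such an array and of how its two columns could later be separated for training. The numbers are made up for illustration and are not taken from the dataset. scikit-learn estimators expect the features as a 2-d array of shape (n_samples, n_features), which is why the year column is kept 2-d.

```python
import numpy as np

# Made-up (year, population) rows, for illustration only.
data = np.array([[1960, 400], [1961, 420], [1962, 445]], dtype=np.int64)

# Column 0 holds the year (the feature), column 1 the population (the target).
X = data[:, [0]]  # indexing with a list keeps the result 2-d: shape (3, 1)
y = data[:, 1]    # 1-d target: shape (3,)

print(data.shape, X.shape, y.shape)
```

With arrays shaped like this, a call such as RandomForestRegressor().fit(X, y) would receive its inputs in the form it expects.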
Function Specifications:
- Should take a str argument called income_group_name as input and return a numpy array as output.
- Set the default argument of income_group_name to equal 'Low income'.
- If the specified value of income_group_name does not exist, the function must raise a ValueError.
- The array should only have two columns containing the year and the population, in other words, it should have a shape (?, 2) where ? is the length of the data.
- The values within the array should be of type np.int64.
Further Reading:
Data types are associated with memory allocation. As such, your choice of data type limits the range of values your program can represent. For example, a 32-bit integer type such as np.int32 can only store values from -2147483648 to 2147483647, and values outside this range overflow, which can raise an error or silently wrap around to a wrong number. (The np.int name itself was only an alias for Python's built-in int; it was deprecated in NumPy 1.20 and removed in NumPy 1.24.) To avoid this, we can use data types with larger memory capacity, e.g. np.int64.
https://docs.scipy.org/doc/numpy/user/basics.types.html
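The integer ranges can be checked directly with np.iinfo. The sketch below uses a hypothetical population total of three billion (an illustrative figure, not a value from the dataset) to show that it does not fit in 32 bits but does fit in 64.

```python
import numpy as np

i32 = np.iinfo(np.int32)
i64 = np.iinfo(np.int64)
print(i32.min, i32.max)  # -2147483648 2147483647
print(i64.min, i64.max)

# Hypothetical population total, for illustration only.
total = 3_000_000_000

fits_in_32 = i32.min <= total <= i32.max  # False: too large for int32
fits_in_64 = i64.min <= total <= i64.max  # True

# Down-casting with astype does not raise; the value silently wraps around.
wrapped = np.array([total], dtype=np.int64).astype(np.int32)
print(fits_in_32, fits_in_64, wrapped[0] == total)
```

The silent wrap-around in the last step is the reason the specification asks for np.int64 explicitly.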
### START FUNCTION
def get_total_pop_by_income(income_group_name='Low income'):
    # your code here
    return
### END FUNCTION
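One possible way to fill in the stub is sketched below. It is not a reference solution. It assumes that the metadata has a column named 'Income Group' and that the population table has one column per year, labelled with the year; neither is confirmed by the question text. So that the block runs on its own, population_df and meta_df are replaced by tiny made-up tables; in the assignment, the frames loaded from the CSV files would be used instead.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real CSVs (made-up numbers, illustration only).
# ASSUMPTION: column names 'Income Group' and year-labelled columns.
codes = pd.Index(["AAA", "BBB", "CCC"], name="Country Code")
population_df = pd.DataFrame(
    {"1960": [100, 200, 300], "1961": [110, 210, 310]}, index=codes
)
meta_df = pd.DataFrame(
    {"Income Group": ["Low income", "High income", "Low income"]}, index=codes
)


def get_total_pop_by_income(income_group_name="Low income"):
    # Reject income groups that do not occur in the metadata.
    known_groups = set(meta_df["Income Group"].dropna())
    if income_group_name not in known_groups:
        raise ValueError(f"Unknown income group: {income_group_name!r}")

    # Country codes in the requested group that also have population rows.
    group_codes = meta_df.index[meta_df["Income Group"] == income_group_name]
    rows = population_df.loc[population_df.index.intersection(group_codes)]

    # Keep only the year-labelled columns and sum over countries
    # (missing values are skipped by sum).
    year_cols = [c for c in rows.columns if str(c).isdigit()]
    totals = rows[year_cols].sum(axis=0)

    years = np.array([int(c) for c in year_cols], dtype=np.int64)
    return np.column_stack([years, totals.to_numpy().astype(np.int64)])


result = get_total_pop_by_income()
print(result.shape, result.dtype)
```

On the toy tables, the default call sums the two 'Low income' rows per year and returns one (year, total) row per year column.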