Q.3 Data are available for 201 countries on five demographic and economic characteristics: the rate of population growth (popgrwth), GDP per capita in US dollars (gdpcap), births per 1000 persons per year (birthrate), the proportion of the workforce employed in agriculture (agriculture), and the rate of literacy (literacy). A Principal Component Analysis is to be carried out to look for interesting structure in these data.

Required:
a) Why is it a good idea to first scale the data?
b) Figure 1 below is a scatter plot of the first two principal components of the scaled data.

[Figure 1: scatter plot; axes: 1st Principal Component and 2nd Principal Component]

a) Do you think there is correlation between the first and second principal components? Justify your answer.
b) Describe the interesting features of Figure 1.
c) Describe in a sentence or two the main differences between supervised and unsupervised learning.
Inverse Normal Distribution
Finding the z-critical value that corresponds to a known probability under a normal distribution is called an inverse normal calculation: it applies the inverse of the normal cumulative distribution function (the quantile function). The normal distribution is a continuous probability distribution described by a family of two parameters, the mean and the standard deviation.
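As a minimal sketch of the inverse normal calculation described above, the snippet below uses `NormalDist.inv_cdf` from Python's standard `statistics` module. The helper name `z_critical` and the probability 0.975 are illustrative assumptions, not part of the original question.

```python
from statistics import NormalDist


def z_critical(probability: float) -> float:
    """Return z such that P(Z <= z) = probability for a standard normal Z.

    `z_critical` is a hypothetical helper name chosen for this sketch.
    """
    # inv_cdf is the quantile function (inverse of the cumulative distribution).
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(probability)


# Illustrative probability: the upper cut-off of a two-sided 95% interval.
z_975 = z_critical(0.975)
print(round(z_975, 2))  # 1.96
```

The same call works for a non-standard normal distribution by passing a different `mu` and `sigma` to `NormalDist`.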
Mean, Median, Mode
The mean, median, and mode are measures of central tendency: each gives a single central or typical value that summarizes a data set or a probability distribution. They are descriptive summaries, so they do not provide information about the individual values in the data set.
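To illustrate, the three measures can be computed with Python's standard `statistics` module. The values in `data` are made up purely for this sketch and have no connection to the country data in the question.

```python
from statistics import mean, median, mode

# Hypothetical sample, invented for illustration only.
data = [2, 3, 3, 5, 7, 10]

data_mean = mean(data)      # arithmetic average: sum of values / number of values
data_median = median(data)  # middle value; average of the two middle values here
data_mode = mode(data)      # most frequently occurring value

print(data_mean, data_median, data_mode)  # 5 4.0 3
```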
Z-Scores
A z-score describes the position of a raw score in terms of its distance from the mean, measured in units of the standard deviation. Z-scores are useful because they allow comparison between two scores that belong to different normal distributions.
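A short sketch of that comparison, assuming the usual definition z = (x - mean) / standard deviation. The function name `z_score` and the two exam scenarios are hypothetical and chosen only for illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical scores from two exams with different means and spreads.
z_exam_a = z_score(85, mu=70, sigma=10)  # 1.5 standard deviations above the mean
z_exam_b = z_score(60, mu=50, sigma=4)   # 2.5 standard deviations above the mean

# The raw score on exam A is higher, but relative to its own distribution
# the exam B score is the stronger result.
print(z_exam_a, z_exam_b)  # 1.5 2.5
```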