Lab Assignment_2_PartA&B_2024

School

Georgia Institute Of Technology

Course

6570

Subject

Civil Engineering

Date

Feb 20, 2024

Type

docx

Pages

8

Uploaded by CountGrouseMaster920

Spring 2024
CP 6570 Socioeconomic GIS
School of City & Regional Planning

Lab Assignment #2 - Part A
Assigned: Jan 26th, 2024
Due: Feb 2nd, 2024 (intermediate deadline)

Objectives:
- Familiarize yourself with basic JMP and Excel functions for data assembly and analysis
- Familiarize yourself with commonly used socioeconomic variables used for index construction
- Learn how to download, clean and assemble data for an analytics project (R, manual download)
- Understand the basic steps in index construction using Principal Components Analysis

I. Download socioeconomic variables and assemble the dataset:

Research has shown that multiple markers of economic disadvantage tend to cluster at the neighborhood level. Messer et al. (2006) define a neighborhood deprivation index across five socio-demographic domains: income/poverty, education, employment, housing, and occupation. For this lab, you will construct a deprivation index similar to the ADI published on the Neighborhood Atlas website (https://www.neighborhoodatlas.medicine.wisc.edu). The highlighted variables in the table below are part of the original ADI. I have provided additional variables that are also important components of measuring deprivation.

a. Download the following variables from the 2021 census tract data from the American Community Survey (use 5-year estimates). You can also use other variables that you think might be useful to your particular project or that have been used in other research.
Limit your data to the state of Georgia:

| Variable Name | Table ID | Variable ID |
| Percent population aged 25 and above without a High School diploma | S1501 | 100 - S1501_C02_014 |
| Median household income in US dollars | S1903 | S1903_C03_001 |
| Income disparity (Gini Index) | B19083 | B19083_001 |
| Median home value in US dollars | DP04 | DP04_0089 |
| Median gross rent in US dollars | DP04 | DP04_0134 |
| Median monthly mortgage in US dollars | DP04 | DP04_0101 |
| Percent of owner-occupied housing units | DP04 | DP04_0002P |
| Percent of civilian labor force population aged 16 years and older who are unemployed | S2301 | S2301_C04_001 |
| Percent of families below federal poverty level | S1702 | S1702_C02_001 |
| Percent of single-parent households with children less than 18 years of age | DP02 | DP02_0011P + DP02_0007P |
| Percent of households without a motor vehicle | DP04 | DP04_0058 |
| Percent of households with more than 1 person per room | DP04 | 100 - DP04_0077P |
| Percent non-white | DP05 | |
| Percent of vacant housing units | DP04 | |
| Percent of households where housing costs greater than 30% of household income | DP04 | |
| Percent of households receiving public assistance income | B19057 | |
| Percent of individuals without health insurance | S2701 | |
| Percent of households without internet access | S2801 | |
| Percent of households receiving SNAP or Food Stamps | | |

b. These variables need to be extracted from multiple tables. You will need to clean the tables before you can join them. Cleaning will include, but is not limited to, the following steps:
- Selecting key variables from the tables
- Deleting the margin of error (MOE) columns
- Imputing or eliminating variables that have missing values
- Renaming variables into a suitable analysis format
- Scaling and transforming variables to ensure consistent direction

Lab Assignment #2 - Part B
Assigned: Feb 9th, 2024
Due: Feb 16th, 2024 (intermediate deadline)

Objectives:
- Familiarize yourself with basic JMP and Excel functions for data assembly and analysis
- Familiarize yourself with commonly used socioeconomic variables used for index construction
- Learn how to download, clean and assemble data for an analytics project
- Understand the basic steps in index construction using Principal Components Analysis

II. Analyze the distributions in your dataset:
a. Using the JMP Analyze>Distribution platform, generate the distributions for all the variables that you intend to use in your index.
b. Describe your observations regarding the distributions (normal vs skewed, etc.). Include screenshots wherever relevant.

[Figure: distribution of the Gini Index]

This variable is approximately normally distributed. This suggests that economic inequality across the census tracts follows a typical statistical pattern, like many other phenomena observed in nature and society. Note that the Gini index is bounded between 0 and 1, so a truly normal distribution may not be the most accurate representation.
[Figure: distribution of percent population Black]

This variable is not normally distributed; it is positively skewed. It is the percentage of the population that identifies as Black, and the shape suggests that the Black population is unevenly distributed across the census tracts. The positive skew indicates that a few census tracts have very high percentages of Black population, while many tracts have lower percentages.

This variable is the transformed median household income. The data is negatively skewed, which suggests that more census tracts have higher median household incomes than lower ones. The long tail towards lower incomes shows that some areas still have relatively low median household incomes, although they are fewer in number than the higher-income areas.
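The "normal vs skewed" judgments above are made by eye from the JMP histograms. As a rough cross-check, a moment-based skewness coefficient can be computed for each variable. The sketch below is only an illustration in Python with NumPy, not part of the JMP workflow: the function names, the 0.5 cut-off (a common rule of thumb, assumed here), and the sample values are all made up for demonstration.

```python
import numpy as np

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population form)."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]              # drop tracts with missing values
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)      # second central moment
    m3 = np.mean(centered ** 3)      # third central moment
    return m3 / m2 ** 1.5

def describe_shape(values, threshold=0.5):
    """Label a distribution; the 0.5 threshold is an assumed rule of thumb."""
    g1 = sample_skewness(values)
    if g1 > threshold:
        return "positively skewed"
    if g1 < -threshold:
        return "negatively skewed"
    return "roughly symmetric"

# Hypothetical tract values, invented purely for illustration.
pct_black_demo = [2, 3, 5, 5, 8, 10, 12, 15, 60, 95]
income_transformed_demo = [-v for v in pct_black_demo]  # mirror image

print(describe_shape(pct_black_demo))
print(describe_shape(income_transformed_demo))
```

A long right tail gives a positive coefficient and a long left tail a negative one, which matches the verbal descriptions of the two skewed variables.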
III. Analyze the correlations in your dataset:
a. Using the JMP Analyze>Multivariate Methods>Multivariate platform, generate the correlation and covariance matrices for all the variables that you intend to use in your index.
b. Describe your observations regarding the matrices (standardized vs unstandardized, range of values for each matrix, etc.). Include screenshots wherever relevant.

The correlation matrix is standardized: every coefficient lies between -1 and 1, regardless of the units of the underlying variables. Some pairs of variables are highly correlated, and these variables are likely to dominate the index. Other variables have very small correlation coefficients and are the most likely candidates to be dropped.

The covariance matrix shows the degree to which two variables change together. A positive covariance indicates that as one variable increases, the other tends to increase as well. A negative covariance indicates that as one variable increases, the other tends to decrease. Covariance is unstandardized: it is expressed in the product of the original units of the two variables, so its range of values depends on those units and is not bounded.

c. Produce a "clustered correlations" plot. Describe your observations regarding the pattern of correlations. Include screenshots wherever relevant. How can these patterns potentially guide your analysis?
There is some clustering of correlated variables, shown in the outlined square of the plot. As in the correlation matrix, the values are standardized and run from -1 to 1. Some variables have a correlation of around 0.8, which indicates a strong positive linear relationship. Two variables have a correlation coefficient of -1, indicating a perfect negative linear relationship. These are the variables for single-parent households headed by a male and by a female, so a perfect correlation is plausible if the two are measured as complementary shares of the same group, in which case one fully determines the other.

IV. Run the Principal Components Analysis on your dataset:
Interpret the following report elements (include graphics/screenshots for all):
a. Eigenvalue chart (how many Principal Components should you use? Why?)
Four principal components have eigenvalues greater than 1. However, it does not make sense to keep all four. The first component has an eigenvalue of 5.02 and explains about 30% of the variance.

b. Loading plot (how do the variables load on the axes?) Explain
c. Scree plot (how many Principal Components should you use? Why?)

The curve starts to plateau at around 4 components, indicating that this is an adequate number: each additional component explains only a minimal amount of variance.

d. Loading matrix (what does this represent? What patterns do you observe for your data?)
A loading matrix identifies which variables have the largest effect on each component. Loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the variable strongly influences the component. Loadings close to 0 indicate that the variable has a weak influence on the component. These loadings guided the second run, which kept only variables that correlate highly with the first component.

e. Eigenvectors (what does this represent?)
f. If your initial analysis yields an unsatisfactory result (e.g. too many significant Principal Components), refine and run the analysis again. Explain why you did this.

The second run kept only the variables that have a high correlation with the first component.
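The report elements in this section (eigenvalues, explained variance, eigenvectors, loading matrix) are all derived from one eigendecomposition. The sketch below shows how they relate, assuming the PCA is run on the correlation matrix. It uses Python with NumPy as a stand-in for JMP; the function name `pca_on_correlations` and the synthetic data are hypothetical and do not reproduce the lab's dataset or its results.

```python
import numpy as np

def pca_on_correlations(X):
    """PCA of the correlation matrix of X (rows = tracts, columns = variables)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)                 # correlation matrix
    eigenvalues, eigenvectors = np.linalg.eigh(R)    # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]            # re-sort, largest first
    eigenvalues = np.clip(eigenvalues[order], 0.0, None)  # guard tiny negatives
    eigenvectors = eigenvectors[:, order]
    # Loading = eigenvector entry scaled by sqrt(eigenvalue); on a correlation
    # matrix this is the correlation between a variable and a component.
    loadings = eigenvectors * np.sqrt(eigenvalues)
    return eigenvalues, eigenvectors, loadings

# Synthetic data (an assumption for illustration): three variables share one
# latent factor, two are pure noise.
rng = np.random.default_rng(0)
n_tracts = 200
latent = rng.normal(size=n_tracts)
X_demo = np.column_stack([
    latent + 0.3 * rng.normal(size=n_tracts),
    latent + 0.3 * rng.normal(size=n_tracts),
    -latent + 0.3 * rng.normal(size=n_tracts),
    rng.normal(size=n_tracts),
    rng.normal(size=n_tracts),
])

eigenvalues, eigenvectors, loadings = pca_on_correlations(X_demo)
explained = eigenvalues / eigenvalues.sum()      # share of variance per component
n_kaiser = int(np.sum(eigenvalues > 1.0))        # components with eigenvalue > 1

print("eigenvalues:", np.round(eigenvalues, 2))
print("explained variance:", np.round(explained, 2))
print("components with eigenvalue > 1:", n_kaiser)
print("loadings on component 1:", np.round(loadings[:, 0], 2))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each component's share of variance is its eigenvalue divided by the number of variables. Variables with first-component loadings near 0 are the natural candidates to drop before a second run.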
V. Save the Principal Components to your data table:
a. Briefly describe how this value is calculated
b. Map the index (include map graphic)
c. What spatial patterns do you observe?
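For item a, one common way to describe a saved principal component is as a weighted sum of the standardized variables, with the weights taken from the component's eigenvector. The sketch below illustrates that calculation in Python with NumPy, under the assumption that the PCA was run on correlations. The function name `first_component_scores`, the sign convention, and the synthetic data are hypothetical; the exact formula JMP saves should be checked in the column formula it writes to the data table.

```python
import numpy as np

def first_component_scores(X):
    """Score each row on the first principal component of the correlation matrix."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable to mean 0 and (sample) standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    top = int(np.argmax(eigenvalues))            # index of the largest eigenvalue
    weights = eigenvectors[:, top]
    # The sign of an eigenvector is arbitrary; this assumed convention makes
    # the weights sum to a positive number.
    if weights.sum() < 0:
        weights = -weights
    scores = Z @ weights                         # one index value per tract
    return scores, weights, eigenvalues[top]

# Synthetic stand-in for a cleaned tract table (illustration only).
rng = np.random.default_rng(1)
n_tracts = 150
latent = rng.normal(size=n_tracts)
X_demo = np.column_stack([
    latent + 0.4 * rng.normal(size=n_tracts),
    latent + 0.4 * rng.normal(size=n_tracts),
    latent + 0.4 * rng.normal(size=n_tracts),
    rng.normal(size=n_tracts),
])

scores, weights, eigenvalue = first_component_scores(X_demo)
print("weights:", np.round(weights, 2))
print("score variance:", round(float(scores.var(ddof=1)), 3))
print("first eigenvalue:", round(float(eigenvalue), 3))
```

The variance of the scores equals the component's eigenvalue, which is a quick consistency check. For the index to read as "higher means more deprived", the input variables need a consistent direction first, as required in the Part A cleaning steps.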