Lab Assignment 1

School

University of Arkansas, Little Rock

Course

7342

Subject

Computer Science

Date

Dec 6, 2023

Pages

3

Uploaded by JusticeGalaxy2907

Python Lab Experiments
Lab Assignment 1
INFQ 7342 – Information Quality Tools

Complete the following exercises. Turn in your answers (including the questions, please) via Blackboard. Do not send assignments by email; assignments sent by email will not be graded. Turn in all scripts, answers, and charts to Blackboard.

Data understanding and visualization (A)

Download survey_multiple_choice.tsv. This data represents the answers to the multiple-choice survey given to another Data Science class. A brief description of this dataset is given in survey_multiple_choice_description.txt. Use Excel to answer the following questions. Document how you arrived at each answer, as well as the answer itself.

1. How many students have completed the survey?
2. How many students claim to have each skill level for the unix shell, databases, and programming, respectively?
3. Which discipline (unix, database, or programming) has the highest overall skill level amongst the students responding to the survey? Which discipline has the lowest?
4. Write a simple Python program to consume this data. (Hint: look at the string split method to get at the data in the individual rows.) The program must at least read the data, split it, and output it. Submit the program with this homework.
5. Using Excel, make a plot of the distribution of skill levels, starting from the lowest and going to the highest, for each of the three disciplines. You may wish to substitute the strings describing the skill level with a numeric value. Combine these three plots, overlaying them on a single graph. Make sure each line is a different color, and include a legend to tell the colors apart.

Income Prediction

Download marketing.data. This data set is described in marketing.info. Use Python to answer these questions.

1. How many lines are in the file?
2. Notice that many lines have some fields unavailable (NA). Remove any lines without complete data. How many lines remain?
3. The fifth column corresponds to education level. What is the most common education level?
4. What is the income distribution for households with education of 'Grad Study'?
5. Consider the following simple model of income level using only education level. Let 4 be the nominal income level, with the following adjustments in income level made according to education:

   education level   income modifier
   1                 -3
   2                 -1
   3                  0
   4                 +1
   5                 +3
   6                 +4

   a. What is the total difference between actual and predicted income level using the above model?
   b. What about the average difference per user?
7. Consider the following modification to the model presented in question 6 that additionally incorporates the following information about a person's occupation:

   occupation   income modifier
   1            +2.5
   2            +.6
   3             0
   4            +.2
   5            -.5
   6            -1.5
   7            +.3
   8            +.8
   9            -2.5

   In this setting, we are using a two-factor model to estimate an individual's income, according to both occupation and education level.

   a. What is the total difference between actual and predicted income level using the above model?
   b. What about the average difference per user?
   c. Is this better or worse than the model presented in question 6?
   d. Is this model more likely to overestimate or underestimate an individual's income level?
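
The two models above can be sketched in Python as follows. This is only an illustration of the arithmetic, not a prescribed solution. The modifier tables are copied from the assignment; the function names, the (actual income, education, occupation) record layout, and the three sample records are hypothetical and are not taken from marketing.data. The sketch also assumes that "difference" means the signed value actual minus predicted; if absolute differences are intended, wrap each difference in abs().

```python
# Modifier tables copied from the assignment text.
EDUCATION_MODIFIER = {1: -3, 2: -1, 3: 0, 4: 1, 5: 3, 6: 4}
OCCUPATION_MODIFIER = {
    1: 2.5, 2: 0.6, 3: 0, 4: 0.2, 5: -0.5,
    6: -1.5, 7: 0.3, 8: 0.8, 9: -2.5,
}
NOMINAL_INCOME = 4  # nominal income level stated in the assignment


def predict_income(education, occupation=None):
    """Predicted income level: nominal level plus the applicable modifiers."""
    predicted = NOMINAL_INCOME + EDUCATION_MODIFIER[education]
    if occupation is not None:
        predicted += OCCUPATION_MODIFIER[occupation]
    return predicted


def summarize_errors(records, use_occupation=False):
    """Return (total, average) of actual - predicted over all records.

    Assumption: each record is (actual_income, education, occupation).
    A positive total means the model tends to underestimate income.
    """
    diffs = [
        actual - predict_income(edu, occ if use_occupation else None)
        for actual, edu, occ in records
    ]
    total = sum(diffs)
    return total, total / len(diffs)


# Hypothetical records for illustration only (not from marketing.data).
sample = [(9, 5, 1), (3, 2, 6), (4, 3, 3)]

print(summarize_errors(sample))                       # education-only model
print(summarize_errors(sample, use_occupation=True))  # two-factor model
```

With real data, the records would be built from the complete (non-NA) lines of marketing.data, using the column positions described in marketing.info.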