dis03_solutions

Data 8 Fall 2023
Data Types, Extending Tables
Lab 03
September 2023

In lecture, you have been introduced to various data types in Python, such as integers, strings, and arrays. These data types are particularly important for manipulating and extracting useful information out of data, an important skill for data science. In this section, we'll be analyzing some of the behavior that Python displays when dealing with particular data types.

1. "Fun" with Arrays

Suppose we have executed the following lines of code. Answer each question with the appropriate output associated with each line of code, or write ERROR if you think the operation is not possible.

    odd_array = make_array(1, 3, 5, 7)
    even_array = np.arange(2, 10, 2)
    an_array = make_array('1', '2', '3', '4')

a. odd_array + even_array
   array([3, 7, 11, 15])

b. odd_array + an_array
   ERROR
   Arrays added together must be of the same size and of similar data types (e.g. ints and floats). Here the sizes match, but one array holds ints and the other holds strings.

c. even_array.item(3) * odd_array.item(1)
   24

d. odd_array * 3
   array([3, 9, 15, 21])

e. (odd_array + 1) == even_array
   array([True, True, True, True])

f. an_array.item(3) + 'abcd'
   '4abcd'

In this next section, we will practice working with tables. In particular, we'll be focusing on table methods and what data types they return. This will help in understanding how to effectively manipulate tables. Remember to make use of the Python Reference guide when working through these questions; a similar guide will be provided on exams.
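The answers to Question 1 can be checked with a short script. This is a minimal sketch that assumes only NumPy is available: np.array stands in for the datascience make_array helper, and the result variable names (sum_ab, product_c, and so on) are made up for illustration.

```python
import numpy as np

# np.array stands in for make_array here (an assumption, so that the
# sketch needs only NumPy).
odd_array = np.array([1, 3, 5, 7])
even_array = np.arange(2, 10, 2)            # 2, 4, 6, 8 (stop value is excluded)
an_array = np.array(['1', '2', '3', '4'])   # strings, not numbers

sum_ab = odd_array + even_array                       # part a: element-wise addition
product_c = even_array.item(3) * odd_array.item(1)    # part c: 8 * 3
tripled_d = odd_array * 3                             # part d: each element times 3
equal_e = (odd_array + 1) == even_array               # part e: element-wise comparison
concat_f = an_array.item(3) + 'abcd'                  # part f: string concatenation

# Part b: adding an integer array to a string array is not defined.
try:
    odd_array + an_array
    mixed_add_failed = False
except TypeError:
    mixed_add_failed = True
```

Note that item(...) returns a single plain value rather than an array, which is why parts c and f produce a number and a string.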
2. Drought

Your friend Sarah is interested in the level of drought experienced in California over time. She obtained data on the percentage of the population experiencing each level of drought (D0 being the lowest severity, D4 being the highest), as well as the DSCI score (higher = more prevalence and severity of drought). The table below is called drought and was obtained from the National Drought Mitigation Center at the University of Nebraska-Lincoln.

Unfortunately, the code Sarah wrote to analyze the data has some bugs. Below are some error messages that appeared, along with what Sarah was trying to calculate; describe the bugs and how you would fix them.

a. The proportion of weeks with less than 20% of people experiencing D0 drought
   We cannot divide a table by a single integer.
   Correct code:
       drought.where('D0', are.below(20)).num_rows / drought.num_rows

b. The difference each week of the percent of people in D1 drought and the percent in D4 drought
   We cannot subtract a table from an array.
   Correct code:
       drought.column('D1') - drought.column('D4')

c. The week with the lowest DSCI score out of weeks with more than 10% experiencing D4 drought
   We have mismatched parentheses on line 1.
   Correct code:
       worst_weeks = drought.where('D4', are.above(10))
       week_lowest_DSCI = worst_weeks.sort('DSCI').column('Week').item(0)
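The fixes in Question 2 come down to what type each expression returns: a row count is a number, and a column is an array. The sketch below illustrates the same logic with plain NumPy arrays standing in for the table's columns. All values are made up for illustration and are not the real drought data, and the variable names are hypothetical.

```python
import numpy as np

# Made-up weekly values standing in for the table's columns
# (drought.column('Week'), drought.column('D0'), and so on).
week = np.array([1, 2, 3, 4])
d0 = np.array([15.0, 42.5, 18.0, 60.0])
d1 = np.array([10.0, 30.0, 12.0, 45.0])
d4 = np.array([0.0, 11.0, 1.0, 12.0])
dsci = np.array([120, 310, 150, 280])

# Part a: count the qualifying weeks (a number), then divide by the
# total number of weeks (also a number).
weeks_below_20 = np.count_nonzero(d0 < 20)
proportion_below_20 = weeks_below_20 / len(d0)

# Part b: subtracting two equal-length arrays works element by element.
d1_minus_d4 = d1 - d4

# Part c: keep weeks with D4 above 10, then take the week whose DSCI is
# lowest among them (the same result as sorting by DSCI and taking item 0).
severe = d4 > 10
week_lowest_dsci = week[severe][np.argmin(dsci[severe])]
```

With these toy values, two of the four weeks have D0 below 20, and weeks 2 and 4 have D4 above 10, with week 4 having the lower DSCI.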
3. Concertgoers

The pedagogy team loves to go to concerts! The table concerts contains information about the concerts they attended. There are 6 columns:

    Name: string, name of the concertgoer
    Artist: string, name of the performing artist
    Price: float, price of the ticket in dollars
    Seating: string, location of the seat at the concert
    Day: int, day of the week the concert was held (Monday is 1, Tuesday is 2, etc.)
    Parking: boolean, whether or not there was reserved parking at the venue

Some rows are shown below:

a. For each of the columns in concerts, identify if the data contained in that column is numerical or categorical.

    Name - Categorical
    Artist - Categorical
    Price - Numerical
    Seating - Categorical
    Day - Categorical
    Parking - Categorical

b. Sam, Meiqi, and Lillian want to buy tickets for their next concert, but are concerned about the price. They have very specific conditions: the concert must be on a Friday, the concert venue must have parking, the seating must be in the pit, and the cost of the ticket must be $150 or less. Fill in the code blanks to find the average ticket cost of concerts that meet their conditions.

    good_concerts = concerts.where(______).where(______).where(______).where(______)
    average_price = ______(good_concerts.column(______))
    good_concerts = concerts.where('Day', are.equal_to(5)).where('Parking', are.equal_to(True)).where('Seating', are.equal_to('Pit')).where('Price', are.below_or_equal_to(150))
    average_price = np.mean(good_concerts.column('Price'))
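Each chained where call narrows the table further, so the four calls together keep only rows that satisfy all four conditions. The sketch below shows the same idea with NumPy boolean masks instead of Table.where. The arrays are made-up stand-ins for the columns of concerts, and the name meets_all is hypothetical.

```python
import numpy as np

# Made-up rows standing in for four columns of the concerts table.
price = np.array([120.0, 200.0, 90.0, 150.0, 80.0])
seating = np.array(['Pit', 'Pit', 'Balcony', 'Pit', 'Pit'])
day = np.array([5, 5, 5, 5, 3])                     # 5 means Friday
parking = np.array([True, True, True, True, False])

# Combining the conditions with & is a logical AND, which matches the
# effect of chaining one where(...) per condition.
meets_all = (day == 5) & parking & (seating == 'Pit') & (price <= 150)

# Average ticket price over the rows that pass every condition.
average_price = np.mean(price[meets_all])
```

Only the first and fourth toy rows pass every condition. The fourth row costs exactly $150 and is kept because the condition is "150 or less", which is why the solution uses are.below_or_equal_to rather than are.below.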
4. Fa17 Midterm Q2 Modified

A table named seat contains a row for each time a student submitted the attendance form in lecture on September 18th, 20th, or 22nd. The table contains four columns.

    Email: a string, the email address of the student
    Row: a string, the letter of the row in which they claim to be seated
    Seat: an int, the number of the seat in which they claim to be seated
    Date: an int, the date of the submission, either 18, 20, or 22

Fill in the blanks of the Python expressions to compute the described values. You must use all and only the lines provided. The last (or only) line of each answer should evaluate to the value described.

a. The largest seat number in the seat table.

   Method 1:
       max(______)
   Solution:
       max(seat.column('Seat'))

   Method 2:
       ______.sort(______, ______).______.______
   Solution:
       seat.sort('Seat', descending=True).column('Seat').item(0)

b. The total number of attendance submissions for September 20th in rows A, B, C, D, or E.
   Hint: You can use Table.where predicates to compare letters lexicographically (e.g. A is below B), or use a different Table.where predicate.

       u = seat.______(______, ______)
       u.______(______, ______).______

   Solution:
       u = seat.where('Row', are.below('F'))
   OR
       u = seat.where('Row', are.contained_in(make_array('A', 'B', 'C', 'D', 'E')))

       u.where('Date', 20).num_rows
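The hint in part b relies on the fact that Python compares strings lexicographically, so every row letter from A through E sorts below 'F'. The sketch below illustrates this in plain Python, without the table library. The submissions list is made up for illustration, and the variable names are hypothetical.

```python
# Made-up submissions as (row letter, date) pairs; not real attendance data.
submissions = [('A', 18), ('C', 20), ('F', 20), ('E', 20), ('B', 22), ('G', 20)]

# Strings compare lexicographically, so each of 'A' through 'E' is below 'F'.
front_rows_by_order = [s for s in submissions if s[0] < 'F']

# Testing membership in an explicit collection of letters selects the same rows.
front_rows_by_membership = [
    s for s in submissions if s[0] in ('A', 'B', 'C', 'D', 'E')
]

# Keep only the submissions from the 20th and count them.
count_on_20th = len([s for s in front_rows_by_order if s[1] == 20])
```

Both filters select the same four toy submissions, which mirrors why either are.below('F') or are.contained_in(...) is accepted in the solution.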