lab01

School

University of California, Berkeley

Course

100

Subject

Computer Science

Date

Feb 20, 2024

Uploaded by ColonelKookaburaMaster916

lab01

January 24, 2024

[1]: # Initialize Otter
     import otter
     grader = otter.Notebook("lab01.ipynb")

1 Lab 01

Welcome to the first lab of Data 100! This lab is meant to help you familiarize yourself with JupyterHub, review Python and NumPy, and introduce you to matplotlib, a Python visualization library.

To receive credit for a lab, answer all questions correctly and submit before the deadline.

You must submit this assignment to Gradescope by the on-time deadline, Tuesday, January 23rd, 11:59pm. Please read the syllabus for the grace period policy. No late submissions beyond the grace period will be accepted. While course staff is happy to help you if you encounter difficulties with submission, we may not be able to respond to late-night requests for assistance (TAs need to sleep, after all!). We strongly encourage you to plan to submit your work to Gradescope several hours before the stated deadline. This way, you will have ample time to contact staff for submission support.

1.1 Lab Walk-Through

In addition to the lab notebook, we have also released a prerecorded walk-through video of the lab. We encourage you to reference this video as you work through the lab. Run the cell below to display the video.

Note: This video was recorded in Spring 2022. There may be slight inconsistencies between the version you are viewing and the version used in the recording, but the content is identical.

[2]: from IPython.display import YouTubeVideo
     YouTubeVideo("PS7lPZUnNBo", list='PLQCcNQgUcDfrhStFqvgpvLNhOS43bnSQq', listType='playlist')
1.1.1 Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the labs, we ask that you write your solutions individually. If you do discuss the assignments with others, please include their names below. (It’s a good way to learn your classmates’ names too!)

Collaborators: list collaborators here

1.2 Part 1: Jupyter Tips

1.2.1 Viewing Documentation

To output the documentation for a function, use the help function.

[3]: help(print)

Help on built-in function print in module builtins:

print(*args, sep=' ', end='\n', file=None, flush=False)
    Prints the values to a stream, or to sys.stdout by default.

    sep
      string inserted between values, default a space.
    end
      string appended after the last value, default a newline.
    file
      a file-like object (stream); defaults to the current sys.stdout.
    flush
      whether to forcibly flush the stream.

You can also use Jupyter to view function documentation inside your notebook. The function must already be defined in the kernel for this to work.

Click anywhere on the print block below and use Shift + Tab to view the function’s documentation.

[4]: print('Welcome to Data 100.')

Welcome to Data 100.

1.2.2 Importing Libraries and Magic Commands

In Data 100, we will be using common Python libraries to help us process data. By convention, we import all libraries at the very top of the notebook. There is also a set of standard aliases that are used to shorten the library names. Below are some of the libraries that you may encounter throughout the course, along with their respective aliases.

[5]: import pandas as pd
     import numpy as np
     import matplotlib.pyplot as plt
     plt.style.use('fivethirtyeight')
     %matplotlib inline

%matplotlib inline is a Jupyter magic command that configures the notebook so that matplotlib displays any plots that you draw directly in the notebook rather than to a file, allowing you to view the plots upon executing your code. (Note: In practice, this is no longer necessary, but we’re showing it to you now anyway.)

Another useful magic command is %%time, which times the execution of that cell. You can use this by writing it as the first line of a cell. (Note that %% is used for cell magic commands that apply to the entire cell, whereas % is used for line magic commands that only apply to a single line.)

[6]: %%time
     lst = []
     for i in range(100):
         lst.append(i)

CPU times: user 0 ns, sys: 24 µs, total: 24 µs
Wall time: 30 µs
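Outside a notebook, cell magics such as %%time are not available. A rough equivalent can be sketched with the standard library's time.perf_counter. The helper names time_call and build_list below are hypothetical and are not part of the lab; the sketch only measures wall time, not the CPU times that %%time also reports.

```python
import time


def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed wall time in seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def build_list(n):
    """Same loop as the %%time cell: append 0..n-1 to a list one item at a time."""
    lst = []
    for i in range(n):
        lst.append(i)
    return lst


result, elapsed = time_call(build_list, 100)
print(f"Built {len(result)} items in {elapsed * 1e6:.1f} µs")
```

A single run at this scale is noisy, so the printed time will vary from run to run.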
1.2.3 Keyboard Shortcuts

Even if you are familiar with Jupyter, we strongly encourage you to become proficient with keyboard shortcuts (this will save you time in the future). To learn about keyboard shortcuts, go to Help –> Keyboard Shortcuts in the menu above.

Here are a few that we like:

1. Ctrl + Return (or Cmd + Return on Mac): Evaluate the current cell
2. Shift + Return: Evaluate the current cell and move to the next
3. Ctrl + /: Comment or uncomment the selected code at once
4. ESC: command mode (may need to press before using any of the commands below)
5. a: create a cell above
6. b: create a cell below
7. dd: delete a cell
8. z: undo the last cell operation
9. m: convert a cell to markdown
10. y: convert a cell to code

1.2.4 Running Cells

Aside from keyboard shortcuts (specifically Shift + Return), you can also run a single cell by clicking the Run button in the top left corner of your notebook. If you hover over the button, you will also find some other options that allow you to run multiple cells. In particular, the Run All Above Selected Cell option is useful when you have restarted your notebook and need to rerun all the cells up to the question you were working on in a lab/homework.

1.3 Part 2: Prerequisites

It’s time to answer some review questions. Each question has a response cell directly below it. Most response cells are followed by a test cell that runs automated tests to check your work. Please don’t delete questions, response cells, or test cells. You won’t get credit for your work if you do.

If you have extra content in a response cell, such as an example call to a function you’re implementing, that’s fine. Also, feel free to add cells between the question cells and test cells (or the next cell, for questions without test cases). Any extra cells you add will be considered part of your submission.
Finally, when you finish an assignment, make sure to “restart and run all cells” to ensure everything works properly.

Note that for labs, on-time submissions that pass all the test cases will receive full credit. However, for homeworks, test cells don’t always confirm that your response is correct. They are meant to give you some useful feedback, but it’s your responsibility to ensure your response answers the question correctly. There may be other tests that we run when scoring your notebooks. We strongly recommend that you check your solutions yourself rather than just relying on the test cells.

1.3.1 Python

Python is the main programming language we’ll use in the course. We expect that you’ve taken CS 61A, Data 8, or an equivalent class, so we will not be covering general Python syntax. If any of the following exercises are challenging (or if you would like to refresh your Python knowledge), please review one or more of the following materials.

Python Tutorial: Introduction to Python from the creators of Python.
Composing Programs Chapter 1: This is more of an introduction to programming with Python.
Advanced Crash Course: A fast crash course which assumes some programming background.
1.3.2 NumPy

NumPy is the numerical computing module introduced in Data 8, which is a prerequisite for this course. Here’s a quick recap of NumPy. For more review, read the following materials.

NumPy Quick Start Tutorial
DS100 NumPy Review
Stanford CS231n NumPy Tutorial
The Data 8 Textbook Chapter on NumPy

1.3.3 Question 1

The core of NumPy is the array. Like Python lists, arrays store data; however, they store data in a more efficient manner. In many cases, this allows for faster computation and data manipulation.

In Data 8, we used make_array from the datascience module, but that’s not the most typical way. Instead, use np.array to create an array. It takes a sequence, such as a list or range.

Below, create an array arr containing the values 1, 2, 3, 4, and 5 (in that order).

[8]: arr = np.array([1, 2, 3, 4, 5])
     arr

[8]: array([1, 2, 3, 4, 5])

[9]: grader.check("q1")

[9]: q1 results: All test cases passed!

In addition to values in the array, we can access attributes such as shape and data type. A full list of attributes can be found here.

[10]: arr[3]

[10]: 4

[11]: arr[2:4]

[11]: array([3, 4])

[12]: arr.shape

[12]: (5,)

[13]: arr.dtype

[13]: dtype('int64')

Arrays, unlike Python lists, cannot store items of different data types.
[14]: # A regular Python list can store items of different data types
      [1, '3']

[14]: [1, '3']

[15]: # Arrays will convert everything to the same data type
      np.array([1, '3'])

[15]: array(['1', '3'], dtype='<U21')

[16]: # Another example of array type conversion
      np.array([5, 8.3])

[16]: array([5. , 8.3])

Arrays are also useful in performing vectorized operations. Given two or more arrays of equal length, arithmetic will perform element-wise computations across the arrays.

For example, observe the following:

[17]: # Python list addition will concatenate the two lists
      [1, 2, 3] + [4, 5, 6]

[17]: [1, 2, 3, 4, 5, 6]

[18]: # NumPy array addition will add them element-wise
      np.array([1, 2, 3]) + np.array([4, 5, 6])

[18]: array([5, 7, 9])

1.3.4 Question 2

1.3.5 Question 2a

Write a function summation that evaluates the following summation for $n \geq 1$:

$$\sum_{i=1}^{n} \left(i^3 + 3i^2\right)$$

Note: You should not use for loops in your solution. Check the NumPy documentation. If you’re stuck, try a search engine! Searching the web for examples of how to use modules is very common in data science.

You may find np.arange helpful for this question!

[23]: def summation(n):
          """Compute the summation i^3 + 3 * i^2 for 1 <= i <= n."""
          arr = np.arange(1, n + 1)
          newArr = arr**3 + 3 * arr**2
          return sum(newArr)

[24]: grader.check("q2a")

[24]: q2a results: All test cases passed!

1.3.6 Question 2b

Write a function elementwise_array_sum that computes the square of each value in list_1, the cube of each value in list_2, then returns the element-wise sum of these results. Assume that list_1 and list_2 have the same number of elements; do not use for loops.

The input parameters will both be Python lists, so you may need to convert the lists into arrays before performing your operations. The output should be a NumPy array.

[25]: def elementwise_array_sum(list_1, list_2):
          """Compute x^2 + y^3 for each x, y in list_1, list_2.

          Assume list_1 and list_2 have the same length.

          Return a NumPy array.
          """
          assert len(list_1) == len(list_2), "both args must have the same number of elements"
          arr_1 = np.array(list_1)
          arr_2 = np.array(list_2)
          squaredArray1 = arr_1**2
          cubedArray2 = arr_2**3
          return squaredArray1 + cubedArray2

[26]: grader.check("q2b")

[26]: q2b results: All test cases passed!

You might have been told that Python is slow, but array arithmetic is carried out very fast, even for large arrays. Below is an implementation of the above code that does not use NumPy arrays.

[27]: def elementwise_list_sum(list_1, list_2):
          """Compute x^2 + y^3 for each x, y in list_1, list_2.

          Assume list_1 and list_2 have the same length.
          """
          return [x ** 2 + y ** 3 for x, y in zip(list_1, list_2)]
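The array-based and list-based implementations should return the same values. The sketch below restates both functions so that it runs on its own and compares them on small sample inputs; xs and ys are hypothetical values chosen for illustration, not data from the lab.

```python
import numpy as np


def elementwise_array_sum(list_1, list_2):
    """Vectorized version: x^2 + y^3 computed on whole arrays at once."""
    return np.array(list_1) ** 2 + np.array(list_2) ** 3


def elementwise_list_sum(list_1, list_2):
    """Pure-Python version: x^2 + y^3 computed pair by pair."""
    return [x ** 2 + y ** 3 for x, y in zip(list_1, list_2)]


xs = [1, 2, 3]  # hypothetical sample inputs
ys = [4, 5, 6]

from_arrays = elementwise_array_sum(xs, ys)
from_lists = elementwise_list_sum(xs, ys)

print(from_lists)  # [65, 129, 225]
# np.array_equal compares shapes and values element by element
same = np.array_equal(from_arrays, from_lists)
print(same)  # True
```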
For ten numbers, elementwise_list_sum and elementwise_array_sum both take a similar amount of time.

[28]: sample_list_1 = list(range(10))
      sample_array_1 = np.arange(10)

[29]: %%time
      elementwise_list_sum(sample_list_1, sample_list_1)

CPU times: user 7 µs, sys: 2 µs, total: 9 µs
Wall time: 15 µs

[29]: [0, 2, 12, 36, 80, 150, 252, 392, 576, 810]

[30]: %%time
      elementwise_array_sum(sample_array_1, sample_array_1)

CPU times: user 0 ns, sys: 120 µs, total: 120 µs
Wall time: 129 µs

[30]: array([ 0, 2, 12, 36, 80, 150, 252, 392, 576, 810])

The time difference seems negligible for a list/array of size 10; depending on your setup, you may even observe that elementwise_list_sum executes faster than elementwise_array_sum! However, we will commonly be working with much larger datasets:

[31]: sample_list_2 = list(range(100000))
      sample_array_2 = np.arange(100000)

[32]: %%time
      elementwise_list_sum(sample_list_2, sample_list_2);  # The semicolon hides the output

CPU times: user 18.6 ms, sys: 5.85 ms, total: 24.4 ms
Wall time: 24.1 ms

[33]: %%time
      elementwise_array_sum(sample_array_2, sample_array_2);  # The semicolon hides the output

CPU times: user 2.22 ms, sys: 2.09 ms, total: 4.31 ms
Wall time: 3.62 ms

With the larger dataset, we see that using NumPy results in code that executes several times faster: roughly 7 times in the run above (24.1 ms versus 3.62 ms wall time), though the exact speedup depends on your setup. Throughout this course (and in the real world), you will find that writing efficient code will be important; arrays and vectorized operations are the most common way of making Python programs run quickly.
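A single %%time run can be noisy. As a hedged sketch of a more repeatable comparison, the standard library's timeit module can run each implementation several times and keep the best time. The variable names and the choice of 5 repeats are assumptions for illustration; the explicit dtype=np.int64 is also an addition, used here so that cubing values near 100000 does not overflow on platforms where NumPy's default integer is 32 bits.

```python
import timeit

import numpy as np


def elementwise_list_sum(list_1, list_2):
    return [x ** 2 + y ** 3 for x, y in zip(list_1, list_2)]


def elementwise_array_sum(list_1, list_2):
    return np.array(list_1) ** 2 + np.array(list_2) ** 3


n = 100_000
sample_list = list(range(n))
sample_array = np.arange(n, dtype=np.int64)  # 64-bit to avoid overflow in y**3

repeats = 5  # hypothetical choice; more repeats give steadier numbers
list_time = min(timeit.repeat(
    lambda: elementwise_list_sum(sample_list, sample_list),
    number=1, repeat=repeats))
array_time = min(timeit.repeat(
    lambda: elementwise_array_sum(sample_array, sample_array),
    number=1, repeat=repeats))

print(f"list:  {list_time * 1e3:.2f} ms")
print(f"array: {array_time * 1e3:.2f} ms")
print(f"ratio: {list_time / array_time:.1f}x")
```

The ratio printed will differ between machines and NumPy versions, so treat any single figure as indicative only.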
1.3.7 Question 2c

Recall the formula for population variance below:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Complete the functions below to compute the population variance of population, an array of numbers. For this question, do not use built-in NumPy functions, such as np.var. Again, avoid using for loops! For a refresher on what variance is, feel free to read up on it in the Data 8 Textbook here!

[39]: def mean(population):
          """
          Returns the mean of population (mu)

          Keyword arguments:
          population -- a numpy array of numbers
          """
          # Calculate the mean of a population
          return sum(population) / len(population)

      def variance(population):
          """
          Returns the variance of population (sigma squared)

          Keyword arguments:
          population -- a numpy array of numbers
          """
          # Calculate the variance of a population
          return sum((population - mean(population))**2) / len(population)

[40]: grader.check("q2c")

[40]: q2c results: All test cases passed!

1.3.8 Question 2d

Given the array random_arr, assign valid_values to an array containing all values $x$ such that $2x^4 > 1$.

Note: You should not use for loops in your solution. Instead, look at NumPy’s documentation on Boolean Indexing.

Documentation can be very intimidating at first glance, but don’t worry: that’s completely okay. One of the goals of this class is to build familiarity with reading the documentation of data science tools. Ask for help if needed; we’re always there for you!
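As a reminder of how Boolean indexing works in general, here is a minimal sketch on a small made-up array, using conditions different from the one in this question. The array values and the thresholds are hypothetical.

```python
import numpy as np

values = np.array([0.2, 0.9, 0.5, 1.3, 0.1])  # hypothetical data

# Comparing an array to a scalar yields a Boolean array of the same shape
mask = values > 0.4

# Indexing with that Boolean array keeps only the positions marked True
selected = values[mask]

# The mask can also be built inline from any element-wise expression
selected_inline = values[values ** 2 > 0.5]

print(mask.tolist())             # [False, True, True, True, False]
print(selected.tolist())         # [0.9, 0.5, 1.3]
print(selected_inline.tolist())  # [0.9, 1.3]
```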
[45]: np.random.seed(42)
      random_arr = np.random.rand(60)
      evaluation = 2 * random_arr**4
      trueFalseArray = evaluation > 1
      valid_values = random_arr[trueFalseArray]
      valid_values

[45]: array([0.95071431, 0.86617615, 0.96990985, 0.94888554, 0.96563203,
             0.9093204 , 0.96958463, 0.93949894, 0.89482735, 0.92187424])

[46]: grader.check("q2d")

[46]: q2d results: All test cases passed!

1.4 Part 3: Plotting

Here we explore plotting using matplotlib and NumPy.

1.4.1 Question 3

Consider the function $f(x) = x^2$ for $-\infty < x < \infty$.

1.4.2 Question 3a

Find the equation of the tangent line to $f$ at $x = 0$.

Type your solution, such that it looks like the serif font used to display the math expressions in the sentences above.

HINT: You can click any text cell to see the raw Markdown syntax. If you choose to use LaTeX, our LaTeX tips guide is linked here, but by no means do you need to use it.

$y = 0$

1.4.3 Question 3b

Find the equation of the tangent line to $f$ at $x = 8$.

$y = 16x - 64$

1.4.4 Question 3c

Write code to plot the function $f$, the tangent line at $x = 8$, and the tangent line at $x = 0$.

Set the range of the x-axis to (-15, 15) and the range of the y-axis to (-100, 300) and the figure size to (4,4).
Your resulting plot should look like this (it’s okay if the colors in your plot don’t match ours, as long as they’re all different colors):

You should use the plt.plot function to plot lines. You may find the following functions useful (click on them to read about their documentation!):

plt.plot(..)
plt.figure(figsize=..)
plt.ylim(..)
plt.axhline(..)

[47]: def f(x):
          return x ** 2

      def df(x):
          return 2 * x

      def plot(f, df):
          plt.figure(figsize=(4, 4))
          x = np.arange(-15, 15, .2)
          plt.plot(x, f(x))                    # the function f
          plt.plot(x, df(8) * (x - 8) + f(8))  # tangent line at x = 8
          plt.axhline(0, color='green')        # tangent line at x = 0 (y = 0), in its own color
          plt.xlim(-15, 15)                    # x-axis range required by the question
          plt.ylim(-100, 300)                  # y-axis range required by the question

      plot(f, df)
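The tangent-line answers in Questions 3a and 3b follow from the point-slope form $y = f(a) + f'(a)(x - a)$, which rearranges to slope $f'(a)$ and intercept $f(a) - a f'(a)$. A small sketch can confirm the arithmetic; the helper name tangent_line is hypothetical and not part of the lab.

```python
def f(x):
    return x ** 2


def df(x):
    return 2 * x


def tangent_line(a):
    """Return (slope, intercept) of the tangent to f at x = a.

    From y = f(a) + f'(a) * (x - a):
    slope = f'(a), intercept = f(a) - f'(a) * a.
    """
    slope = df(a)
    intercept = f(a) - slope * a
    return slope, intercept


print(tangent_line(0))  # (0, 0), i.e. y = 0
print(tangent_line(8))  # (16, -64), i.e. y = 16x - 64
```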
1.4.5 Question 4 (Ungraded)

Data science is a rapidly expanding field and no degree program can hope to teach you everything that will be helpful to you as a data scientist. So it’s important that you become familiar with looking up documentation and learning how to read it.

Below is a section of code that plots a three-dimensional “wireframe” plot. You’ll see what that means when you draw it. Replace each # Your answer here with a description of what the line above does, what the arguments being passed in are, and how the arguments are used in the function. For example,

    np.arange(2, 5, 0.2)
    # This returns an array of numbers from 2 up to (but not including) 5 with an interval size of 0.2

Hint: The Shift + Tab tip from earlier in the notebook may help here. Remember that objects must be defined in order for the documentation shortcut to work; for example, all of the documentation will show for method calls from np since we’ve already executed import numpy as np. However, since z is not yet defined in the kernel, z.reshape(x.shape) will not show documentation until you run the line z = np.cos(squared).

[ ]: from mpl_toolkits.mplot3d import axes3d

     u = np.linspace(1.5 * np.pi, -1.5 * np.pi, 100)
     # Your answer here
     [x, y] = np.meshgrid(u, u)
     # Your answer here
     squared = np.sqrt(x.flatten() ** 2 + y.flatten() ** 2)
     z = np.cos(squared)
     # Your answer here
     z = z.reshape(x.shape)
     # Your answer here

     fig = plt.figure(figsize=(6, 6))
     ax = fig.add_subplot(111, projection='3d')
     # Your answer here
     ax.plot_wireframe(x, y, z, rstride=5, cstride=5, lw=2)
     # Your answer here
     ax.view_init(elev=60, azim=25)
     # Your answer here
     plt.savefig("figure1.png")
     # Your answer here

1.4.6 Question 5 (Ungraded)

Do you think a hotdog is a sandwich? Tell us what you think in the following Markdown cell. :)

Answer: Yes, it’s like a po boy
1.5 Congratulations! You have finished Lab 1!

1.6 Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. Please save before exporting!

[48]: # Save your notebook first, then run this cell to export your submission.
      grader.export(pdf=False, run_tests=True)

Running your submission against local test cases…

Your submission received the following results when run against available test cases:

    q1 results: All test cases passed!
    q2a results: All test cases passed!
    q2b results: All test cases passed!
    q2c results: All test cases passed!
    q2d results: All test cases passed!