homework1

pdf

School

San Jose State University *

*We aren’t endorsed by this school

Course

256

Subject

Computer Science

Date

Dec 6, 2023

Type

pdf

Pages

4

Uploaded by MagistrateProtonKookabura32

Report
1/4 Homework1 General Instructions This (and every) assignment has a written part and a programming part. a. This icon means a written answer is expected in cs256homework1.pdf . b. This icon means you should write code in submission.py . All written answers must be in order and clearly and correctly labeled to receive credit. You should modify the code in submission.py between # BEGIN_YOUR_CODE and # END_YOUR_CODE but you can add other helper functions outside this block if you want. Do not make changes to files other than submission.py . Your code will be evaluated on two types of test cases, basic and hidden , which you can see in grader.py . Basic tests, which are fully provided to you, do not stress your code with large inputs or tricky corner cases. Hidden tests are more complex and do stress your code. The inputs of hidden tests are provided in grader.py , but the correct outputs are not. To run the tests, you will need to have graderUtil.py in the same directory as your code and grader.py . Then, you can run all the tests by typing python grader.py This will tell you only whether you passed the basic tests. On the hidden tests, the script will alert you if your code takes too long or crashes, but does not say whether you got the correct output. You can also run a single test (e.g., 3a-0-basic ) by typing python grader.py 3a-0-basic We strongly encourage you to read and understand the test cases, create your own test cases, and not just blindly run grader.py . Do not pay attention to the max points displayed by grader.py Welcome to your first CS256 assignment! The goal of this assignment is to sharpen your math and programming skills needed for this class. Some of these problems may occur again as subproblems of later homework, so make sure you know how to do them. Problem 1: Optimization and probability (20 points) In this class, we will cast a lot of AI problems as optimization problems, that is, finding the best solution in a rigorous mathematical sense. At the same time, we must be adroit at coping with uncertainty in the world, and for that, we appeal to tools from probability. a. Let x 1 , , x n be real numbers representing positions on a number line. Let w 1 , , w n be positive real numbers representing the importance of each of these positions. Consider the quadratic function: f (θ) = 1 ! n w i " x i ) 2 . What value of θ minimizes f (θ) ? You can think about this problem as trying to find the 2 i=1 point θ that's not too far away from the x i 's. Over time, hopefully you'll appreciate how nice quadratic functions are to minimize. What problematic issues could arise if some of the w i 's are negative? b. Suppose you repeatedly roll a fair six-sided die until you roll a 1 (and then you stop). Every time you roll a 2 , you lose a points, and every time you roll a 6, you win b points. You do not win or lose any points if you roll a 3, 4, or a 5. What is the expected number of points (as a function of a and b ) you will have when you stop? Hint: it is recommended to think of defining a recurrence. c. Suppose the probability of a coin turning up heads is 0 < p < 1 , and that we flip it 7 times and get {H, H, T, H, T, T, H} . We know the probability (likelihood) of obtaining this sequence is L(p) = pp(1 " p)p(1 " p)(1 " p)p = p 4 (1 " p) 3 . Now let's go back and ask the question: what value of p maximizes L(p) ? What is an intuitive interpretation of this value of p ? Hint: Consider taking the derivative of log L(p) . You can also directly take the derivative of L(p) , but it is cleaner and more natural to differentiate log L(p) . You can verify for yourself that the value of p which maximizes log L(p) must also maximize L(p) (you are not required to prove this in your solution). d. [Extra Credit] Let's practice taking gradients, which is a key operation for being able to optimize continuous
2/4 k=1 k n i=1 n j=1 i j λ 2 2 ) functions. For w $ d (represented as a column vector) and constants a i , b j $ d (also represented as column vectors) and λ $ , define the scalar-valued function. f (w) = ( ∑ ∑ (a ! w " b ! w) 2 ) + w 2 , where the vector is w = (w 1 , , w d ) ! and w 2 = ! d w 2 = w T w is known as the L 2 norm. Compute the gradient % f (w) . If you're not comfortable with vector calculus, first warm up by working out this problem using scalars in place of vectors and derivatives in place of gradients. Not everything for scalars goes through for vectors, but the two should at least be consistent with each other (when d = 1 ). Do not write out summations over dimensions, because that gets tedious Recall: the gradient is a d -dimensional vector of the partial derivatives with respect to each w i : % f (w) = ( & f (w) , & w 1 & f (w) ! . & w d
3/4 Problem 2: Complexity (10 points) When designing algorithms, it's useful to be able to do quick back of the envelope calculations to see how much time or space an algorithm needs. Hopefully, you'll start to get more intuition for this by being exposed to different types of problems. a. Suppose we have an image of a human face consisting of n × n pixels. In our simplified setting, a face consists of two eyes, two ears, one nose, and one mouth, each represented as an arbitrary axis-aligned rectangle (i.e. the axes of the rectangle are aligned with the axes of the image). As we'd like to handle Picasso portraits too, there are no constraints on the location or size of the rectangles. How many possible faces (choice of its component rectangles) are there? In general, we only care about asymptotic complexity, so give your answer in the form of O(n c ) or O(c n ) for some integer c . b. Suppose we have a staircase with n steps (we start on the ground, so we need n total steps to reach the top). We can take as many steps forward at a time, but we will never step backwards. How many ways are there to reach the top? Give your answer as a function of n . For example, if n = 3 , then the answer is 4 . The four options are the following: (1) take one step, take one step, take one step (2) take two steps, take one step (3) take one step, take two steps (4) take three steps. Problem 3: Ethical Issue Spotting (10 points) One of the goals of this course is to teach you how to tackle real-world problems with tools from AI. But real-world problems have real- world consequences. Along with technical skills, an important skill every practitioner of AI needs to develop is an awareness of the ethical issues associated with AI. The purpose of this exercise is to practice spotting potential ethical concerns in applications of AI - even seemingly innocuous ones. In this question, you will explore the ethics of four different real-world scenarios using the ethics guidelines produced by a machine learning research venue, the NeurIPS conference. The NeurIPS Ethical Guidelines list sixteen non-exhaustive concerns under Potential Negative Social Impacts and General Ethical Conduct (the numbered lists). For each scenario, you will write a potential negative impacts statement. To do so, you will first determine if the algorithm / dataset / technique could have a potential negative social impact or violate general ethical conduct (again, the sixteen numbered items taken from the NeurIPS Ethical Guidelines page). If the scenario does violate ethical conduct or has potential negative social impacts, list one concern it violates and justify why you think that concern applies to the scenario. If you do not think the scenario has an ethical concern, explain how you came to that decision. Unlike earlier problems in the homework there are many possible good answers. If you can justify your answer, then you should feel confident that you have answered the question well. Each of the scenarios is drawn from a real AI research paper. The ethics of AI research closely mirror the potential real-world consequences of deploying AI, and the lessons you’ll draw from this exercise will certainly be applicable to deploying AI at scale. As a note, you are not required to read the original papers, but we have linked to them in case they might be useful. Furthermore, you are welcome to respond to anything in the linked article that's not mentioned in the written scenario, but the scenarios as described here should provide enough detail to find at least one concern. What we expect: A 2-5 sentence paragraph for each of the scenarios where you either A. identify at least one ethical concern from the NeurIPS Ethical Guidelines and justify why you think it applies, or B. state that you don’t think a concern exists and justify why that’s the case. Chosen scenarios may have anywhere from zero to multiple concerns that match, but you are only required to pick one concern (if it exists) and justify your decision accordingly. We have also included a citation in the example solution below, but you are not required to add citations to your response. Example Scenario : You work for a U.S. hospital that has recently implemented a new intervention program that enrolls at-risk patients in programs to help address their chronic medical issues proactively before the patients end up in the hospital. The intervention program automatically identifies at-risk patients by predicting patients’ risk scores, which are measured in terms of healthcare costs. However, you notice that for a given risk score tier, the Black patients are considerably sicker when enrolled than white patients, even though their assigned illness risk score is identical. You manually re-assign patients’ risk scores based on their current symptoms and notice that the percentage of Black patients who would be enrolled has increased from 17% to over 45% [1] . Example Solution : This algorithm has likely encoded, contains, or potentially exacerbates bias against people of a certain race or ethnicity since the algorithm predicts healthcare costs. Because access to medical care in the U.S. is unequal, Black patients tend to have lower healthcare costs than their white counterparts [2] . Thus the algorithm will incorrectly predict that they are at lower risk. a. [2 points] An investment firm develops a simple machine learning model to predict whether an individual is likely to default on a loan from a variety of factors, including location, age, credit score, and public record. After looking through their results, you find that the model predicts mainly based on location and that the model mainly accepts loans from urban centers and denies loans from rural applicants [3] . Furthermore, looking at the gender and ethnicity of the applicants, you find that the model has a significantly higher false positive rate for Black and male applicants than for other groups. In a false positive prediction, a model misclassifies someone who does not default as likely to default.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
4/4 b. [2 points] Stylometry is a way of predicting the author of contested or anonymous text by analyzing the writing patterns in the anonymous text and other texts written by the potential authors. Recently, highly accurate machine learning algorithms have been developed for this task. While these models are typically used to analyze historical documents and literature, they could be used for deanonymizing a wide range of texts, including code [4] . c. [2 points] A research group scraped millions of faces of celebrities off of Google images to develop facial recognition technology [5] . The celebrities did not give permission for their images to be used in the dataset and many of the images are copyrighted. For copyrighted photos, the dataset provides URL links to the original image along with bounding boxes for the face. d. [2 points] Researchers have recently created a machine learning model that can predict plant species automatically directly from a single photo [6] . The model was trained using photos uploaded to the iNaturalist app by users who consented to use of their photos for research purposes, and the model is only used within the app to help users identify plants they might come across in the wild. Problem 4: Programming (30 points) In this problem, you will implement a bunch of short functions. The main purpose of this exercise is to familiarize yourself with Python, but as a bonus, the functions that you will implement will come in handy in subsequent homeworks. If you're new to Python, the following provide pointers to various tutorials and examples for the language: Python for Programmers Example programs of increasing complexity a. Implement findAlphabeticallyLastWord in submission.py . b. Implement euclideanDistance in submission.py . c. Implement sparseVectorDotProduct in submission.py . d. Implement incrementSparseVector in submission.py . e. Implement findSingletonWords in submission.py . Submission Submission is done on Canvas. Please submit cs256homework1.pdf and submission.py . [1] Obermeyer et al. Dissecting racial bias in an algorithm used to manage the health of populations. 2019. [2] Institue of Medicine of the National Academies. Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care. 2003. [3] Imperial College London. Loan Default Prediction Dataset. 2014. [4] Caliskan-Islam et. al. De-anonymizing programmers via code stylometry. 2015. [5] Parkhi et al. VGG Face Dataset. 2015. [6] iNaturalist. A new vision model. 2020.