Exam-3 Practice Exam

Final Exam CSCI 561 Fall 2022: Foundations of Artificial Intelligence

Instructions:
1. Maximum credits/points for this exam: 100 points.
2. No books (or any other material) are allowed.
3. All the questions in this exam are going to be auto-graded. This means that you should follow the instructions exactly when entering your results.
4. You are allowed to use a calculator.
5. Some questions have hints. Be sure to check them before solving the problem.
6. Adhere to the Academic Integrity Code.
7. Please make sure that you write the answers in the format discussed.

Problems (100% total):
1 - General AI Knowledge: 18%
2 - Decision Trees: 12%
3 - Neural Networks: 15%
4 - Bayesian Networks: 15%
5 - Probability Theory: 15%
6 - HMM, Temporal Models: 15%
7 - Naive Bayes: 10%
1. True/False [18%]
For each of the statements below, fill in the bubble T if the statement is always and unconditionally true, or fill in the bubble F if it is always false, sometimes false, or just does not make sense:

1. Both deductive and inductive learning agents learn new rules/facts from a dataset.
2. Learning is useful as a system construction method, because we only need to expose the agents to reality without any manual input.
3. In the ID3 algorithm, we need to choose the attribute that has the largest expected information gain.
4. Both perceptron and decision tree learning can learn the majority function (output 1 if and only if more than half of n binary variables are 1) easily. It is representable within a perceptron and only needs a few branches in DTL (Decision Tree Learning).
5. The learning process of a neural network happens in both the feed-forward (prediction) part and the back-propagation part.
6. The basic principles of deep learning are similar to those of basic neural networks, but deep learning has newer methods for larger datasets.
7. Probabilities of propositions may change with new evidence.
8. A complete probability model specifies every entry in the joint distribution for all the variables.
9. When calculating a probability distribution, normalization will be needed in the end to make the distribution sum to 1. However, even if you use the inference rules properly, the normalization may not be preserved.
10. Probabilistically speaking, two coin tosses are conditionally independent.
11. The reward for a probabilistic decision-making model can be given from states R(s), state-action pairs R(s, a), or transitions R(s, a, s').
12. The principle of MEU (Maximum Expected Utility) is that a rational agent should always choose the action that maximizes the utility.
13. The major difference between a POMDP (Partially Observable Markov Decision Process) and a general MDP (Markov Decision Process) is merely a sensor model P(e|s).
14. States transition randomly for Markov Chains and Hidden Markov Models.
15. In HMMs (Hidden Markov Models), there are two important independence properties. The first is that the future depends only on the present; the second is that observations are independent of each other.
16. The forward procedure computes all alpha_t(s_i) on the state trellis, while the Viterbi algorithm only computes the best one for each step.
17. Discrete-valued dynamic Bayes nets are not HMMs (Hidden Markov Models).
18. For Bayesian learning, we are given a set of new data D and background knowledge X, and are supposed to predict a concept C where P(C|D,X) is the most probable.
2. Decision Trees [12%]
Lyft wants to analyze whether a student at USC takes a Lyft depending on whether it is raining around the university, whether the destination is near or far, and whether or not the ride was free. They have provided the training data below and they need your help to train a machine to decide whether a student takes a Lyft.
Note: for calculations, always take digits up to 3 decimal places and drop the rest without rounding, e.g. 0.9737 becomes 0.973. For all the following questions, use log base 2. (Use Table 2.1 to answer Q1-3)

(Table 2.1)
 # | Rain | Free? | Near? | Takes Lyft
 1 | Yes  | No    | Yes   | Yes
 2 | No   | No    | Yes   | No
 3 | Yes  | No    | No    | Yes
 4 | Yes  | No    | No    | Yes
 5 | No   | Yes   | Yes   | Yes
 6 | Yes  | Yes   | Yes   | Yes
 7 | No   | Yes   | Yes   | No
 8 | Yes  | Yes   | No    | Yes
 9 | No   | Yes   | Yes   | No
10 | Yes  | Yes   | Yes   | Yes
Q1. Calculate the information conveyed by the distribution of the Takes Lyft column to 3 decimal places. [2%]
1. 0.879
2. 0.933
3. 1
4. 0.325

Q2. Which would be the best attribute to split on? (Assume this attribute to be X for further questions) [4%]
A. Rain
B. Free
C. Near

Q3. What is the value of Remainder(Free)? (Answer up to 3 decimal places) [2%]
a. 0.423
b. 0.634
c. 0.875
d. 0

Q4. Assume that the Entropy and Remainder values of the given training data are: (Use Table 2.2 to answer questions Q4a and Q4b)
Entropy = 0.910
Remainder(X) = 0.230
Remainder(Y) = 0.510
Remainder(Z) = 0.810

(Table 2.2)
SrNo | X     | Y     | Z     | Is Correct?
1    | TRUE  | TRUE  | FALSE | Yes
2    | FALSE | TRUE  | TRUE  | No
3    | FALSE | TRUE  | TRUE  | No
4    | TRUE  | FALSE | TRUE  | Yes
5    | TRUE  | FALSE | TRUE  | Yes
6    | FALSE | FALSE | TRUE  | No
7    | TRUE  | TRUE  | TRUE  | Yes
8    | FALSE | FALSE | FALSE | No
9    | TRUE  | TRUE  | FALSE | Yes

a. Which would be the worst attribute to split on for the given data? [2%]
1. X
2. Y
3. Z

b. What output (Is Correct?) would the machine give after being trained (assuming the calculations are correct for the root node) for the test data where X = False, Y = False, and Z = True? [Hint: The decision tree learned uses only one attribute] [2%]
1. Yes
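The entropy and remainder quantities asked for in Q1-Q4 can be checked mechanically. Below is a minimal Python sketch (not part of the exam) that computes the label entropy and the remainder of each attribute for Table 2.1; the same helpers apply to Table 2.2. It works in full floating-point precision, so the last digit can differ from the exam's truncate-every-intermediate-step convention (full precision gives about 0.881 for Q1, where the key's truncation rule yields 0.879).

```python
import math

def entropy(pos, neg):
    """Binary entropy of a pos/neg label split, in bits (log base 2)."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

# Table 2.1 rows as (Rain, Free, Near, TakesLyft), encoded as booleans
rows = [
    (True,  False, True,  True),  (False, False, True,  False),
    (True,  False, False, True),  (True,  False, False, True),
    (False, True,  True,  True),  (True,  True,  True,  True),
    (False, True,  True,  False), (True,  True,  False, True),
    (False, True,  True,  False), (True,  True,  True,  True),
]

def remainder(attr_idx):
    """Expected entropy of the label after splitting on attribute attr_idx."""
    rem = 0.0
    for value in (True, False):
        subset = [r for r in rows if r[attr_idx] == value]
        if not subset:
            continue
        pos = sum(1 for r in subset if r[3])
        neg = len(subset) - pos
        rem += len(subset) / len(rows) * entropy(pos, neg)
    return rem

pos = sum(1 for r in rows if r[3])
print("Entropy(Takes Lyft) =", round(entropy(pos, len(rows) - pos), 3))
for name, i in [("Rain", 0), ("Free", 1), ("Near", 2)]:
    print(f"Remainder({name}) =", round(remainder(i), 3))
# Remainder(Free) comes out near 0.875 (option c); Rain has the smallest
# remainder (largest information gain), so it is the best attribute to split on.
```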
3. Neural Networks [15%]
For the following perceptron model:
Loss function: P = -(d - o)^2 / 2
Learning rate: alpha = 100
Activation function: sigmoid(x) = 1 / (1 + e^(-x))
Expected output: d = 1

1. For the given inputs X0 = 1, X1 = 1, X2 = 1, X3 = 1, X4 = 0.5, X5 = 0, if the value of the predicted output "o" after forward propagation is 0.9, calculate the updated weights after running backpropagation:
Note: for calculations, always take digits up to 3 decimal places and drop the rest without rounding, e.g. 0.9737 becomes 0.973. [2% each]

Original Solution:
1. Minimizing with o = 0.9
Original Weight (w_i) | Rate (alpha) | Input (x_i) | alpha * dP/dw_i = alpha * (d - o) * o * (1 - o) * x_i | New Weight

2. Maximizing with o = 0.9
Original Weight (w_i) | Rate (alpha) | Input (x_i) | alpha * dP/dw_i = alpha * (d - o) * o * (1 - o) * x_i | New Weight

3. Minimizing with o = 0.88
Original Weight (w_i) | Rate (alpha) | Input (x_i) | alpha * dP/dw_i = alpha * (d - o) * o * (1 - o) * x_i | New Weight

4. Maximizing with o = 0.88
Original Weight (w_i) | Rate (alpha) | Input (x_i) | alpha * dP/dw_i = alpha * (d - o) * o * (1 - o) * x_i | New Weight

2. [3%] Calculate the predicted output for the updated weights using the same input values:
1. Using (1)
2. Using (2)
3. Using (3)
4. Using (4)
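The weight-update tables above are empty in this copy because the original weights come from the exam's perceptron diagram, which is not reproduced here. As a minimal sketch, the Python below applies the update rule named in the table header, alpha * (d - o) * o * (1 - o) * x_i, to a set of placeholder weights; the starting weight values are assumptions, not the exam's.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1, 1, 1, 1, 0.5, 0]      # inputs X0..X5 from the question
d, o, alpha = 1.0, 0.9, 100   # expected output, predicted output, learning rate

# Placeholder starting weights (illustrative only; the exam's diagram gives the real ones).
w = [0.2, 0.1, 0.3, 0.1, 0.4, 0.5]

# For the loss P = -(d - o)^2 / 2, the gradient with respect to each weight is
#   dP/dw_i = (d - o) * o * (1 - o) * x_i
grad = [(d - o) * o * (1 - o) * xi for xi in x]

w_min = [wi - alpha * g for wi, g in zip(w, grad)]   # "minimizing" update
w_max = [wi + alpha * g for wi, g in zip(w, grad)]   # "maximizing" update
print("delta per weight:", [round(alpha * g, 3) for g in grad])
print("minimizing:", [round(v, 3) for v in w_min])
print("maximizing:", [round(v, 3) for v in w_max])

# Part 2: re-run the forward pass with the updated weights and the same inputs.
new_o = sigmoid(sum(wi * xi for wi, xi in zip(w_max, x)))
print("predicted output with updated weights:", round(new_o, 3))
```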
4. Bayesian Networks [15%]
Truncate your final answer to three decimal places, e.g. 0.36899 should be chopped to 0.368, and 0.1 should be written as 0.100.

P(D)
+d | 0.1
-d | 0.9

P(X|D)
+d | +x | 0.7
+d | -x | 0.3
-d | +x | 0.8
-d | -x | 0.2

P(A|D,X)
+d | +x | +a | 0.9
+d | +x | -a | 0.1
+d | -x | +a | 0.8
+d | -x | -a | 0.2
-d | +x | +a | 0.6
-d | +x | -a | 0.4
-d | -x | +a | 0.1
-d | -x | -a | 0.9

P(B|D)
+d | +b | 0.7
-d | +b | 0.5

(a) [4 pts] What is the probability of having disease D and getting a positive result on test A?
P(+d, +a) = Σ_x P(+d, x, +a) = Σ_x P(+a | +d, x) P(x | +d) P(+d) = P(+d) Σ_x P(+a | +d, x) P(x | +d)
= (0.1)((0.9)(0.7) + (0.8)(0.3)) = 0.087

(b) [4 pts] What is the probability of not having disease D and getting a positive result on test A?
P(-d, +a) = Σ_x P(-d, x, +a) = Σ_x P(+a | -d, x) P(x | -d) P(-d) = P(-d) Σ_x P(+a | -d, x) P(x | -d)
= (0.9)((0.6)(0.8) + (0.1)(0.2)) = 0.450

(c) [3 pts] What is the probability of having disease D given a positive result on test A?
P(+d | +a) = P(+a, +d) / P(+a) = P(+a, +d) / Σ_d P(+a, d) = 0.087 / (0.087 + 0.450) = 0.162

(d) [4 pts] What is the probability of having disease D given a positive result on test B?
P(+d | +b) = P(+b | +d) P(+d) / P(+b) = P(+b | +d) P(+d) / Σ_d P(+b | d) P(d)
= (0.7)(0.1) / ((0.7)(0.1) + (0.5)(0.9)) = 0.1346 = 0.134
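Parts (a) through (d) can be checked by enumerating this small network directly. The sketch below is a minimal check, not part of the exam, and assumes the CPT values as laid out above.

```python
import math

def trunc3(v):
    """Truncate (not round) to 3 decimal places, as the question asks."""
    return math.floor(v * 1000 + 1e-9) / 1000

# CPT values from the tables above
P_D = {'+d': 0.1, '-d': 0.9}
P_X_given_D = {('+x', '+d'): 0.7, ('-x', '+d'): 0.3,
               ('+x', '-d'): 0.8, ('-x', '-d'): 0.2}
P_posA_given_DX = {('+d', '+x'): 0.9, ('+d', '-x'): 0.8,   # P(+a | d, x)
                   ('-d', '+x'): 0.6, ('-d', '-x'): 0.1}
P_posB_given_D = {'+d': 0.7, '-d': 0.5}                     # P(+b | d)

def p_joint_d_posa(d):
    """P(d, +a), summing out X."""
    return sum(P_D[d] * P_X_given_D[(x, d)] * P_posA_given_DX[(d, x)]
               for x in ('+x', '-x'))

pa_with_d = p_joint_d_posa('+d')       # part (a)
pa_without_d = p_joint_d_posa('-d')    # part (b)
print(f"(a) {trunc3(pa_with_d):.3f}  (b) {trunc3(pa_without_d):.3f}")      # 0.087  0.450
print(f"(c) {trunc3(pa_with_d / (pa_with_d + pa_without_d)):.3f}")         # 0.162

# Part (d): Bayes' rule with test B
num = P_posB_given_D['+d'] * P_D['+d']
den = num + P_posB_given_D['-d'] * P_D['-d']
print(f"(d) {trunc3(num / den):.3f}")                                      # 0.134
```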
5. Probability Theory [15%]
Ash enters a reality game where 30 different types of Pokémon are present, and he sets out to chase down powerful ones. Being a Pokémon-capturing expert, he can capture every Pokémon possible. Each one that he finds and battles against has an equally likely chance of being one of the 30 types.

1. (5%) If Ash captures 5 Pokémon, what is the probability that he has captured at least 2 of the same type?
Solution: 0.296
Probability of all 5 different types = (30/30) x (29/30) x (28/30) x (27/30) x (26/30) = 0.7037
Probability of at least 2 of the same type = 1 - 0.7037 = 0.296

2. (5%) Assuming there is no limit on the number of Pokémon that Ash can capture, how many would he have to take down to make the probability of at least 2 of the same type more likely than not? (Answer as a whole number without decimal-point precision)
Solution: 7
Since we know that P(at least 2 of the same type) = 1 - P(all k different types), and
P(all k different types) = (30/30) x (29/30) x (28/30) x ... x ((30 - i)/30), where 0 <= i < k,
we need to find the smallest k where P(all k different types) < 0.5, so that the first expression becomes > 0.5. We can find this iteratively:
Probability of all 5 different types = (30/30) x (29/30) x (28/30) x (27/30) x (26/30) = 0.70
Probability of all 6 different types = (30/30) x (29/30) x (28/30) x (27/30) x (26/30) x (25/30) = 0.59
Probability of all 7 different types = (30/30) x (29/30) x (28/30) x (27/30) x (26/30) x (25/30) x (24/30) = 0.47
The smallest k that makes P(all k different types) < 0.5 is 7.

3. (5%) Ash loses a battle after 2 powerful Pokémon types, Darkrai and Scizor, are added to the game. The probability that the Pokémon he lost to is Darkrai is 70% and Scizor is 30%. Darkrai has a special attack which it uses with a probability of 0.90. The same special attack is occasionally used by Scizor, with a probability of 0.08. If Ash loses because of the special attack, what is the probability that the Pokémon he encountered is Darkrai?
Solution: 0.963
Let D stand for the event of the Pokémon being Darkrai: P(D) = 0.7
Let S stand for the event of the Pokémon being Scizor: P(S) = 0.3
Let A stand for the event where the special attack is seen: P(A|D) = 0.90, P(A|S) = 0.08
P(D|A) = P(A|D) x P(D) / (P(A|D) x P(D) + P(A|S) x P(S))
P(D|A) = (0.90 x 0.7) / (0.90 x 0.7 + 0.08 x 0.3)
P(D|A) = 0.9633
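All three parts are short enough to verify numerically. The following minimal Python sketch (not part of the exam) reproduces the birthday-style collision probability for parts 1 and 2 and the Bayes' rule calculation for part 3.

```python
from math import prod

TYPES = 30

def p_all_different(k, n=TYPES):
    """Probability that k uniformly random draws from n types are all distinct."""
    return prod((n - i) / n for i in range(k))

# Part 1: probability of at least one repeated type among 5 captures
print(round(1 - p_all_different(5), 3))                        # ~0.296

# Part 2: smallest k for which a repeat is more likely than not
k = 1
while p_all_different(k) >= 0.5:
    k += 1
print(k)                                                        # 7

# Part 3: Bayes' rule for Darkrai vs. Scizor given the special attack
p_d, p_s = 0.7, 0.3
p_a_given_d, p_a_given_s = 0.90, 0.08
posterior = p_a_given_d * p_d / (p_a_given_d * p_d + p_a_given_s * p_s)
print(round(posterior, 3))                                      # ~0.963
```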
6. HMM - Temporal Modeling [15%]
An environment is defined as follows:
States = {S1, S2, S3}
Observations = {a, b, c}

Transition probabilities T(<initial state>, <final state>):
   |  S1 |  S2 |  S3
S1 | 0.0 | 0.5 | 0.5
S2 | 1.0 | 0.0 | 0.0
S3 | 0.0 | 1.0 | 0.0

Emission/Observation probabilities E(<state>, <observation>):
   |  a   |  b   |  c
S1 | 0.5  | 0.5  | 0.0
S2 | 0.3  | 0.3  | 0.4
S3 | 0.25 | 0.0  | 0.75

Initial state probabilities pi(<state>):
S1 | 0.25
S2 | 0.75
S3 | 0.0

Answer the questions below based on the information above:
1. [2%] How many non-zero edges will there be in the state diagram of this environment?
[ANS] 4, or 11 if including emission edges
Explanation: There are 4 non-zero transition edges (S1->S2, S1->S3, S2->S1, S3->S2); adding the 7 non-zero emission edges gives 11.

2. [4%] Select all the possible state paths for the observation sequence O = a, c, a in the list below.
a. S1, S2, S1
b. S1, S3, S2
c. S3, S2, S1
d. S1, S2, S1
e. S2, S1, S2
f. S2, S1, S3
g. S1, S2, S3

3. [5%] What is the probability of the observation sequence O = a, c, a?
[ANS] 0.027
Explanation:
P(O) = Σ_s P(O, s) = P(O, s = S1, S2, S1) + P(O, s = S1, S3, S2)
P(O) = pi(S1) E(S1, a) T(S1, S2) E(S2, c) T(S2, S1) E(S1, a) + pi(S1) E(S1, a) T(S1, S3) E(S3, c) T(S3, S2) E(S2, a)
P(O) = 0.25 x 0.5 x 0.5 x 0.4 x 1.0 x 0.5 + 0.25 x 0.5 x 0.5 x 0.75 x 1.0 x 0.3
P(O) = 0.01250 + 0.01406 = 0.02656

4. [2%] What is the most probable path for getting the observation sequence O = a, c, a?
a. S1, S2, S1
b. S1, S3, S2   [ANS: P(O, s = S1, S3, S2) = 0.01406 is the maximum]
c. S3, S2, S1
d. S1, S2, S1
e. S2, S1, S2
f. S2, S1, S3
g. S1, S2, S3

5. [2%] The classroom slides defined some general problems for temporal models. Which general problem does the above question (i.e., determining the most probable state path for getting the observation sequence O = a, c, a) come under?
a. State Explanation (P(X_1:t | E_1:t))
b. State Estimation (P(X_t | E_1:t))
c. State Prediction (P(X_t+k | E_1:t), k > 0)
d. State Smoothing (P(X_k | E_1:t), k < t)
e. Model Learning (P(M | E_1:t))
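The answers to Q2-Q4 can be reproduced by brute-force enumeration over all length-3 state sequences (the forward and Viterbi algorithms compute the same quantities more efficiently). A minimal sketch, not part of the exam, assuming the transition, emission, and initial-state tables above and the observation sequence a, c, a:

```python
from itertools import product

states = ['S1', 'S2', 'S3']
# Non-zero transition probabilities T[(i, j)] = P(next = j | current = i)
T = {('S1', 'S2'): 0.5, ('S1', 'S3'): 0.5, ('S2', 'S1'): 1.0, ('S3', 'S2'): 1.0}
# Emission probabilities E[(state, observation)]
E = {('S1', 'a'): 0.5,  ('S1', 'b'): 0.5, ('S1', 'c'): 0.0,
     ('S2', 'a'): 0.3,  ('S2', 'b'): 0.3, ('S2', 'c'): 0.4,
     ('S3', 'a'): 0.25, ('S3', 'b'): 0.0, ('S3', 'c'): 0.75}
pi = {'S1': 0.25, 'S2': 0.75, 'S3': 0.0}

obs = ['a', 'c', 'a']   # the observation sequence from the question

def path_prob(path):
    """Joint probability P(O, path) for one candidate state sequence."""
    p = pi[path[0]] * E[(path[0], obs[0])]
    for t in range(1, len(obs)):
        p *= T.get((path[t - 1], path[t]), 0.0) * E[(path[t], obs[t])]
    return p

# Enumerate every state sequence of length 3
probs = {path: path_prob(path) for path in product(states, repeat=len(obs))}
print("possible paths:", [p for p, v in probs.items() if v > 0])
print("P(O) =", round(sum(probs.values()), 3))                       # ~0.027
best = max(probs, key=probs.get)
print("most probable path:", best, round(probs[best], 5))            # S1, S3, S2
```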
7. Naive Bayes [10%]
As a star student in the AI class, you have been recruited by the local weather station to help predict the weather. They give you access to their secret data store, displayed below. Only consider data from this table when making your assessments. Simplify all fractions if possible. (Answers in red)

Temp above 75 | Humidity above 65 | Pressure below 40 | USC won football | Storm next day?
0 | 1 | 1 | 1 | 0
1 | 0 | 0 | 0 | 0
1 | 0 | 1 | 1 | 0
0 | 1 | 1 | 1 | 1
1 | 0 | 1 | 0 | 1
1 | 0 | 0 | 1 | 0
0 | 0 | 1 | 1 | 0

Use the following abbreviations for the columns:
T = temp above 75
H = humidity above 65
P = pressure below 40
W = USC won football
S = storm next day?

1. (45) What is the maximum likelihood estimate for a storm the next day? [2%]
P(S=1) = _/_   A: 2/7 [1%]
P(S=0) = _/_   A: 5/7 [1%]

2. (46) What is the conditional probability of USC winning football given there will be a storm the next day? [2%]
P(W=1 | S=1) = _/_   A: 1/2 [1%]
P(W=1 | S=0) = _/_   A: 4/5 [1%]
3. (47) Although the news team seems insistent, you recommend not using the USC won football column to predict storms. Select all the correct reasons you can give them as to why you ignore this data. [3%]
The USC won football column does not contain any information and won't produce better results.
Even if this column contains information, it is likely irrelevant to the predicted variable and may cause noisy predictions.
Using more data typically causes overfitting, leading to incorrect future predictions.
Using more data typically causes underfitting, decreasing future performance.

4. (48) Using the 3 approved columns (T, H, P), use the Naive Bayes method to determine the joint probabilities P(S=1, X) and P(S=0, X), where X is (T=1, H=1, P=0), then select the predicted class c* (S=1 or S=0). Report your answer to 3 significant digits (ex: 0.0123). [3%]
P(S=1, X) = 0
P(S=0, X) = 0.0342 (0.034 <= A <= 0.035)
c* = S=0

P(X | S=1) = P(T=1|S=1) x P(H=1|S=1) x P(P=0|S=1) = 1/2 x 1/2 x 0 = 0
P(X | S=0) = P(T=1|S=0) x P(H=1|S=0) x P(P=0|S=0) = 3/5 x 1/5 x 2/5 = 6/125
Use c* = argmax_c P(c) x P(X|c):
P(S=1, X) = P(S=1) x P(X|S=1) = 2/7 x 0 = 0
P(S=0, X) = P(S=0) x P(X|S=0) = 5/7 x 6/125 = 6/175 = 0.0342
c* = S=0
(Full marks given, original answer was incorrect)
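The maximum likelihood counts and the Naive Bayes joint probabilities above can be recomputed directly from the table. A minimal Python sketch, not part of the exam, assuming the seven rows shown:

```python
from fractions import Fraction

# Rows of the weather table as (T, H, P, W, S)
data = [
    (0, 1, 1, 1, 0), (1, 0, 0, 0, 0), (1, 0, 1, 1, 0),
    (0, 1, 1, 1, 1), (1, 0, 1, 0, 1), (1, 0, 0, 1, 0),
    (0, 0, 1, 1, 0),
]
COLS = {'T': 0, 'H': 1, 'P': 2, 'W': 3, 'S': 4}

def prior(s):
    """Maximum likelihood estimate of P(S = s)."""
    return Fraction(sum(1 for r in data if r[COLS['S']] == s), len(data))

def cond(col, val, s):
    """Maximum likelihood estimate of P(col = val | S = s)."""
    rows = [r for r in data if r[COLS['S']] == s]
    return Fraction(sum(1 for r in rows if r[COLS[col]] == val), len(rows))

print(prior(1), prior(0))                        # 2/7, 5/7
print(cond('W', 1, 1), cond('W', 1, 0))          # 1/2, 4/5

# Naive Bayes joint for X = (T=1, H=1, P=0), using only the approved columns
x = {'T': 1, 'H': 1, 'P': 0}
for s in (1, 0):
    joint = prior(s)
    for col, val in x.items():
        joint *= cond(col, val, s)
    print(f"P(S={s}, X) = {joint} = {float(joint):.4f}")
# The predicted class c* is the one with the larger joint probability, here S = 0.
```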