Midterm Solutions Duo Lu <duolu@asu.edu>, Yezhou Yang <yz.yang@asu.edu>
Question 1 In which of the following scenarios can you use a weak perspective camera model for the target object?
(a) A squirrel passing quickly in front of you.
(b) An airplane flying at a very high altitude.
(c) The Eiffel Tower when you are taking a photo of it.
(d) A car beside you when you are driving.
The best option is (b). Since the object is very far away (an airplane typically flies at around 30,000 feet, or about 10,000 meters), the depth variation within the object is insignificant compared to its distance from the camera ( Z ≫ ΔZ ). Hence, a weak perspective camera model approximates the actual perspective camera model with negligible error.
2D-1D Weak Perspective Projection [Figure: a point X at depth offset ΔZ is projected through camera center C; the weak perspective projection replaces the true perspective projection by an orthographic projection along the z-axis onto a reference plane, followed by scaling.]
Perspective Projection vs. Orthographic Projection [Figure] Perspective camera: camera center C at a finite point, finite focal length f. Weak perspective camera (affine camera): camera center at infinity and focal length is infinite ( f = ∞ ).
Perspective Projection vs. Orthographic Projection Perspective camera: x = X / Z , y = Y / Z ; the scaling factor is the depth. Weak perspective camera (affine camera): x = X , y = Y ; the scaling factor is arbitrary.
Image Courtesy: Richard Hartley and Andrew Zisserman
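To make the Z ≫ ΔZ intuition from Question 1 concrete, here is a minimal numerical sketch (not from the slides; the focal length, depths, and object sizes are made-up values) comparing true perspective projection with the weak perspective approximation that uses a single reference depth Z0 for the whole object.

```python
import numpy as np

# Minimal sketch: compare perspective projection (x = f*X/Z) with the
# weak perspective approximation (x = f*X/Z0, one common depth Z0 for
# the whole object).  All numbers below are illustrative assumptions.
f = 1.0

def perspective(X, Z):
    return f * X / Z

def weak_perspective(X, Z0):
    return f * X / Z0

# A distant object (airplane): Z0 ~ 10,000 m, depth extent dZ ~ 50 m.
Z0, dZ, X = 10_000.0, 50.0, 30.0
x_persp = perspective(X, Z0 + dZ)
x_weak = weak_perspective(X, Z0)
print("far object, relative error:", abs(x_weak - x_persp) / abs(x_persp))   # ~0.5%

# A nearby object (car): Z0 ~ 5 m, depth extent dZ ~ 2 m.
Z0, dZ, X = 5.0, 2.0, 1.5
x_persp = perspective(X, Z0 + dZ)
x_weak = weak_perspective(X, Z0)
print("near object, relative error:", abs(x_weak - x_persp) / abs(x_persp))  # ~40%
```

The weak perspective error stays negligible for the distant object but becomes large for the nearby one, which is why option (b) is the best choice.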
Question 5 A point Q is observed by a known (i.e., intrinsic parameters are calibrated) affine camera with image plane M1. Then you translate the camera parallel to the image plane by a known translation to a new image plane M2 and observe Q again. (a) Is it possible to find the depth of the 3D point Q in this scenario? Why? ( No. Q lies on a line parallel to the principal axis, and any point on this line is projected to the same image point, so the two observations carry no depth information. ) (b) What if this is a perspective camera? Is it possible to find the depth of the 3D point Q in this scenario? Why? ( Yes. The depth can be triangulated as long as the point has finite depth. )
Stereo Perspective Camera vs. Stereo Affine Camera [Figure] Perspective camera: depth can be triangulated by a rectified stereo camera system. Affine camera: depth cannot be determined by a rectified stereo camera system.
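The contrast can be checked numerically. The sketch below (illustrative focal length and baseline, not from the slides) shows that a rectified perspective pair gives a disparity that depends on depth, so Z = f b / d recovers it, while an affine pair gives the same disparity for every depth.

```python
# Sketch (assumed rectified stereo geometry): perspective disparity
# depends on depth, d = f*b/Z, so Z can be triangulated; affine
# (weak perspective / orthographic) disparity is independent of depth.
f, b = 700.0, 0.12          # focal length (px) and baseline (m), illustrative values

def perspective_disparity(X, Z):
    # x_left = f*X/Z, x_right = f*(X - b)/Z  ->  d = f*b/Z (depends on Z)
    return f * X / Z - f * (X - b) / Z

def affine_disparity(X, Z, s=1.0):
    # x_left = s*X, x_right = s*(X - b)      ->  d = s*b (independent of Z)
    return s * X - s * (X - b)

for Z in (2.0, 5.0, 20.0):
    d_p = perspective_disparity(0.5, Z)
    d_a = affine_disparity(0.5, Z)
    print(f"Z={Z:5.1f}  perspective d={d_p:6.2f} -> Z_hat={f * b / d_p:5.1f}   affine d={d_a:4.2f}")
```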
Question 2 What is the biggest benefit of image rectification for stereo matching?
(a) Image contents are uniformly scaled to a desirable size.
(b) All epipolar lines intersect at the vanishing point.
(c) All epipolar lines are perfectly vertical.
(d) All epipolar lines are perfectly horizontal.
(e) Epipoles are moved to the center of the image.
The best option is (d) or (c). For stereo matching, i.e., searching for correspondences along the epipolar lines of the images, it is desirable that the epipolar lines are simply rows or columns. Image rectification is essentially applying a perspective transformation (a homography) to each image, such that the transformed images are in row or column correspondence.
Stereo Rectification There is always a pair of rotations (R1, R2) that can make a pair of cameras in row correspondence. Computing (R1, R2) and the pixel mapping is stereo rectification . Note that (R1, R2) is not unique. A common method (Bouguet's algorithm) uses two rotations for each camera to minimize distortion. [Figure: the original pair has relative pose (R, t); after applying R1 and R2, the rectified pair has relative pose (I, t').]
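As a practical illustration, here is a hedged sketch of rectification with OpenCV, whose cv2.stereoRectify follows Bouguet's algorithm. The intrinsics K1, K2, distortion d1, d2, relative pose (R, t), and image size below are placeholder values standing in for the output of a prior stereo calibration.

```python
import cv2
import numpy as np

# Sketch of stereo rectification with OpenCV (cv2.stereoRectify follows
# Bouguet's algorithm).  K1, K2, d1, d2, R, t are assumed to come from a
# prior stereo calibration; the numbers here are placeholders.
K1 = K2 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
d1 = d2 = np.zeros(5)                    # assume no lens distortion
R = np.eye(3)                            # relative rotation between the cameras
t = np.array([[0.12], [0.0], [0.0]])     # relative translation (baseline)
image_size = (640, 480)

# R1, R2 are the rectifying rotations for each camera; P1, P2 the new
# projection matrices; Q reprojects disparities to 3D points.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2, image_size, R, t)

# Precompute the pixel mappings and warp each view into row correspondence.
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, image_size, cv2.CV_32FC1)
# left_rect  = cv2.remap(left_img,  map1x, map1y, cv2.INTER_LINEAR)
# right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)
```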
Dense Stereo Correspondences Dense stereo correspondence is matching each pixel between two views ( I , I ') to calculate a disparity image I_d , i.e., for each pixel ( x , y ) on the first view, find the pixel ( x ', y ) on the second view that has the best match. This is usually done by variants of block matching . [Figure: the first view, the second view, the computed disparity, and the ground truth. Image Courtesy: University of Tsukuba, CVLAB]
Block Match Given a window size W and a maximum disparity n , for every pixel ( x , y ) on the first view, search the same row on the second view from x up to n pixels; for each searched pixel, compute a distance between two blocks of W by W pixels, and find the x ' that has the smallest distance. Typical distance metrics are the sum of absolute differences (SAD), the sum of squared differences (SSD), cross correlation, the Birchfield–Tomasi dissimilarity, etc. [Figure: a W×W block around ( x , y ) in the first view is compared against W×W blocks along the same row of the second view over a search range of n pixels; the best matching score determines x '.]
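A minimal sketch of this procedure with an SAD cost is shown below. It assumes the first view is the left image of a rectified pair, so the match for (x, y) is searched at x' = x − d; real implementations add sub-pixel refinement, left-right consistency checks, and faster cost aggregation.

```python
import numpy as np

def block_match_sad(left, right, window=5, max_disparity=16):
    """Return a dense disparity map for two rectified grayscale images (SAD block matching)."""
    h, w = left.shape
    half = window // 2
    disparity = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            block_l = left[y - half:y + half + 1, x - half:x + half + 1]
            best_d, best_cost = 0, np.inf
            # Search the same row of the second view, up to max_disparity pixels.
            for d in range(0, min(max_disparity, x - half) + 1):
                block_r = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.abs(block_l.astype(np.int32) - block_r.astype(np.int32)).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity
```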
Question 3 (a) What is the rank of the fundamental matrix? ( The rank is 2. ) (b) In the 8-point algorithm, what math technique is used to enforce the estimated fundamental matrix to have the proper rank? ( SVD, by setting the smallest singular value to zero. ) If you consider the vector space R^{3×3} , any fundamental matrix F is certainly in this space. However, not every matrix in R^{3×3} is a fundamental matrix. All fundamental matrices form a manifold Ω . In the 8-point algorithm, the solution F is a matrix in R^{3×3} but not necessarily on the manifold Ω . Hence, it is desirable to find a matrix F' on the manifold which is "closest" to F . Here "closest" means minimizing the Frobenius norm ||F − F'|| , and the solution can be obtained through SVD.
Enforcing Singularity Constraint Usually we have more point correspondences than eight and A has full rank, i.e., the solution F or E is nonsingular, and we want to enforce the singularity constraint. Consider F with SVD F = U Σ V^T , Σ = diag{ σ1, σ2, σ3 } ; let F' = U Σ' V^T , Σ' = diag{ σ1, σ2, 0 } , and F' is the actual solution. Consider E with SVD E = U Σ V^T , Σ = diag{ σ1, σ2, σ3 } ; let E' = U Σ' V^T , Σ' = diag{ σ, σ, 0 } , where σ = ( σ1 + σ2 ) / 2 , and E' is the actual solution.
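The projection onto the proper singular-value structure is a few lines of NumPy; the sketch below uses a random matrix as a stand-in for an 8-point estimate.

```python
import numpy as np

# Sketch of the rank projection used after the 8-point algorithm: replace
# the estimate by the closest matrix (in Frobenius norm) with the proper
# singular values.
def enforce_fundamental(F):
    """Closest rank-2 matrix: zero out the smallest singular value."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt

def enforce_essential(E):
    """Closest essential matrix: two equal singular values and one zero."""
    U, s, Vt = np.linalg.svd(E)
    sigma = (s[0] + s[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

F_est = np.random.rand(3, 3)        # stand-in for a noisy 8-point estimate
F = enforce_fundamental(F_est)
print(np.linalg.matrix_rank(F))     # 2
```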
Question 4 A robot equipped with a camera and people-detection software is monitoring the entrance to a building, looking for approaching people. While testing the people-detection software, the system was presented with a training set of images, some with people present and some without. Given an image, define A = the image actually has a person , and ¬A = the image has no person . Define B = the robot detected a person in the image .
Question 4 Out of 100 images with people, the system successfully detected people in 75 cases: P(B|A) = 0.75 . Out of another set of 100 images without people, the system erroneously detected a person in 20 cases: P(B|¬A) = 0.2 . At the start of its operation, it assumes no prior knowledge of the likelihood that a person is in view or not: P(A) = P(¬A) = 0.5 . For the first image acquired by the robot, the people-detection system reports that a person is present. Now event B has happened.
Question 4 Given this observation, what is the probability that a person actually is visible in this image? Derive the formula for the conditional probability and calculate the actual percentage. This is essentially asking for P(A|B) . Using Bayes' rule, P(A|B) = P(B|A) P(A) / P(B) . Here the likelihood P(B|A) = 0.75 , and the prior P(A) = 0.5 . The marginal probability P(B) = P(B|A) P(A) + P(B|¬A) P(¬A) = 0.75 * 0.5 + 0.2 * 0.5 = 0.475 . Finally, P(A|B) = P(B|A) P(A) / P(B) = 0.75 * 0.5 / 0.475 ≈ 0.789 , i.e., about 78.9%.
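The arithmetic can be verified with a few lines of Python using the probabilities given in the question.

```python
# Verify the Bayes' rule arithmetic from Question 4.
p_b_given_a = 0.75      # P(B | A): detection rate with a person present
p_b_given_not_a = 0.20  # P(B | ¬A): false detection rate
p_a = p_not_a = 0.5     # uniform prior

p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a   # P(B) = 0.475
p_a_given_b = p_b_given_a * p_a / p_b                 # P(A | B)
print(round(p_b, 3), round(p_a_given_b, 3))           # 0.475 0.789
```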
Thank you! Duo Lu <duolu@asu.edu>, Yezhou Yang <yz.yang@asu.edu>