GEORGIA TECH’S CS 6476 COMPUTER VISION
Problem Set 3: Introduction to AR and Image Mosaic
January 17, 2025
ASSIGNMENT DESCRIPTION
Description
Problem Set 3 introduces basic concepts behind Augmented Reality, using material you will learn in modules 3A-3D and 4A-4C: projective geometry, corner detection, perspective imaging, and homographies. Additionally, you will learn how to insert images within images and stitch multiple images together.
Learning Objectives
Problem Overview
Refer to this problem set’s autograder post for a list of banned function calls.
Please do not use absolute paths in your submission code. All paths should be relative to the submission directory. Any submissions with absolute paths are in danger of receiving a penalty!
Obtaining the Starter Files:
Obtain the starter code from the PS3 GitHub repo.
Programming Instructions
Write-up Instructions
Create ps3_report.pdf - a PDF file that shows all your output for the problem set, including images labeled appropriately (by filename, e.g. ps3-1-a-1.png) so it is clear which section they are for and the small number of written responses necessary to answer some of the questions (as indicated). For a guide as to how to showcase your results, please refer to the Latex template for PS3.
Two assignments have been created on Gradescope: one for the report - PS3_report, and the other for the code - PS3_code.
- Report: the report (PDF only) must be submitted to the PS3_report assignment.
- Code: all files must be submitted to the PS3_code assignment. DO NOT upload zipped folders or any sub-folders; upload each file individually. Drag and drop all files into Gradescope.
- You can only submit to the autograder 10 times in an hour. You’ll receive a message like "You have exceeded the number of submissions in the last hour. Please wait for 36.0 mins before you submit again." when you exceed those 10 submissions. You’ll also receive a message "You can submit 8 times in the next 53.0 mins" with each submission so that you may keep track of your submissions.
- If you wish to modify the autograder functions, create a copy of those functions and DO NOT mess with the original function call.
Grading
The assignment will be graded out of 100 points. Only the last submission before the time limit will be considered. The code portion (autograder) represents 60% of the grade and the report the remaining 40%.
The images included in your report must be generated using experiment.py. This file should be set to run as is to verify your results. Your report grade will be affected if we cannot reproduce your output images.
The report grade breakdown is shown in the question heading. As for the code grade, you will be able to see it in the console message you receive when submitting.
A glass/windshield manufacturer wants to develop an interactive screen that can be used in cars and eyeglasses. They have partnered with a billboard manufacturer in order to render certain marketing products according to each customer’s preferences.
Their goal is to detect four points (markers) currently present in the screen’s field-of-view and insert an image or video in the scene. To help with this task, the advertising company is installing blank billboards with four distinct markers, which determine the area’s intended four corners. The advertising company plans to insert a target image/video into this space.
1 MARKER DETECTION IN A SIMULATED SCENE [40]
The first task is to identify the markers for this Augmented Reality exercise. In practice, markers in the form of distinctive pictures that stand out from an image’s background can be used. Below is an image with four markers.
Notice that they contain a cross-section bounded by a circle. The cross-section is useful in that it forms a distinguished corner. In this section, you will create a function or set of functions that detect these markers, as shown above. You will use the images provided to detect the (x, y) center coordinates of each marker in the image. The position should be represented by the center of the marker (where the cross-section is). To approach this problem, consider techniques such as detecting circles in the image, detecting corners, and/or detecting a template.
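For illustration only, one possible starting point is normalized template matching, sketched below. The function name, and the idea of cropping one marker as a template, are ours rather than the required starter-code interface, and you should check the autograder post before relying on any cv2 helper:

    import cv2

    def find_markers_sketch(image, template):
        # Correlate a grayscale marker template against the scene and take
        # the four strongest, mutually distant response peaks as centers.
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        tmpl = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
        response = cv2.matchTemplate(gray, tmpl, cv2.TM_CCOEFF_NORMED)
        th, tw = tmpl.shape
        centers = []
        for _ in range(4):
            _, _, _, (x, y) = cv2.minMaxLoc(response)
            centers.append((x + tw // 2, y + th // 2))
            # Suppress this peak's neighborhood so the next maximum
            # comes from a different marker.
            y0, y1 = max(0, y - th), min(response.shape[0], y + th)
            x0, x1 = max(0, x - tw), min(response.shape[1], x + tw)
            response[y0:y1, x0:x1] = -1.0
        return centers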
You will use the function mark_location(image, pt) in experiment.py to create a resulting image that highlights the center of each marker and overlays the marker coordinates in the image. Each marker should display its location similarly to this:
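mark_location(image, pt) is already provided in experiment.py; purely to show the kind of annotation intended, a hypothetical stand-in could be:

    import cv2

    def mark_location_sketch(image, pt):
        # Draw a dot at the marker center and print its coordinates beside it.
        x, y = int(pt[0]), int(pt[1])
        out = image.copy()
        cv2.circle(out, (x, y), 3, (0, 50, 255), -1)
        cv2.putText(out, "(x={}, y={})".format(x, y), (x - 50, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 0, 0), 1)
        return out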
Images like the one above may not be that hard to solve. However, in a real-life scene, it proves to be much more difficult. Make sure your methods are robust enough to also locate the markers in images like the one below, where there could be other objects in the scene:
Let’s now assume there is “noise” in the scene (e.g., rain, fog).
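One common way to keep detection robust under such noise is to denoise before detecting. A sketch, assuming speckle-like noise; the kernel sizes are illustrative:

    import cv2

    def preprocess_noisy_scene(image):
        # Median filtering removes salt-and-pepper-style speckle, and a mild
        # Gaussian blur suppresses remaining high-frequency noise before
        # corner/circle/template detection is applied.
        denoised = cv2.medianBlur(image, 5)
        return cv2.GaussianBlur(denoised, (5, 5), 0)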
Report: This part will be graded on autograder. Do not include this part in your report.
Now that you have a working method to detect markers in simulated scenes, you will adapt it to identify these same markers in real scenes like the image shown below. Use the images provided to essentially repeat the task of section 1 above and draw a box (four 1-pixel wide lines, RED color) where the box corners touch the marker centers.
Code: Complete draw_box(image, markers)
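A minimal sketch of what draw_box might do, assuming the markers list is ordered so that consecutive entries are adjacent corners (confirm the expected ordering in ps3.py):

    import cv2

    def draw_box_sketch(image, markers, thickness=1):
        # Connect the four marker centers with 1-pixel-wide red (BGR) lines.
        # `markers` is assumed to be ordered so that consecutive entries,
        # wrapping around, are adjacent corners of the box.
        out = image.copy()
        pts = [(int(x), int(y)) for x, y in markers]
        for i in range(4):
            cv2.line(out, pts[i], pts[(i + 1) % 4], (0, 0, 255), thickness)
        return out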
Now that you know where the billboard markers are located in the scene, we want to add the marketing image. The advertising company requires that their client’s billboard image be visible from all possible angles, since drivers are not just approaching the advertisements head-on. Unfazed, you know enough about computer vision to introduce projective geometry. The next task will use the information obtained in the previous section to compute a transformation matrix H. This matrix will allow you to project a set of points (x, y) to another plane represented by the points (x’, y’) in a 2D view. In other words, we are looking at the following operation:
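In homogeneous coordinates, with s an arbitrary nonzero scale factor, the operation is

\[
s \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]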
In this case, the 3x3 matrix is a homography, also known as a perspective transform or projective transform. There are eight unknowns, a through h, and i is 1. If we have four pairs of corresponding (x, y) ⟺ (x’, y’) points, we can solve for the homography.
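With i fixed to 1, find_four_point_transform reduces to solving an 8x8 linear system. A minimal NumPy sketch (illustrative; the autograder’s expected signature and any banned calls take precedence):

    import numpy as np

    def find_four_point_transform_sketch(src_points, dst_points):
        # Each correspondence (x, y) -> (x', y') contributes two rows:
        #   a*x + b*y + c - g*x*x' - h*y*x' = x'
        #   d*x + e*y + f - g*x*y' - h*y*y' = y'
        A, b = [], []
        for (x, y), (xp, yp) in zip(src_points, dst_points):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h = np.linalg.solve(np.array(A, dtype=np.float64),
                            np.array(b, dtype=np.float64))
        # Append i = 1 and reshape into the 3x3 homography.
        return np.append(h, 1.0).reshape(3, 3)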
The objective here is to insert an image in the rectangular area that the markers define. This insertion should be robust enough to support cases where the markers are not in an orthogonal plane from the point of view and present rotations. Here are two examples of what you should achieve:
When implementing project_imageA_onto_imageB() you will have to make the design choice between forward and backward warping. To make the best choice, you should test both approaches and comment in the report on what helped you choose one method over the other; a backward-warping sketch follows the function list below. (Note: to better see differences between the two methods, you should pick a marketing image with low resolution.)
• test_get_corners_list()
• find_four_point_transform(src_points, dst_points)
• project_imageA_onto_imageB(imageA, imageB, homography)
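For reference, here is a sketch of the backward-warping idea, assuming the homography maps imageA coordinates into imageB coordinates. It uses cv2.remap, which may well be on the banned list, so treat it strictly as an illustration of the concept:

    import cv2
    import numpy as np

    def project_imageA_onto_imageB_sketch(imageA, imageB, homography):
        # Backward warping: for every pixel of imageB, map its coordinates
        # through the inverse homography to sample imageA. Unlike forward
        # warping, this leaves no holes in the rendered region.
        h, w = imageB.shape[:2]
        H_inv = np.linalg.inv(homography)
        ys, xs = np.indices((h, w), dtype=np.float64)
        pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
        src = H_inv @ pts
        src /= src[2]  # back to inhomogeneous coordinates
        map_x = src[0].reshape(h, w).astype(np.float32)
        map_y = src[1].reshape(h, w).astype(np.float32)
        warped = cv2.remap(imageA, map_x, map_y, cv2.INTER_LINEAR)
        # Only replace pixels whose source location lies inside imageA.
        mask = ((map_x >= 0) & (map_x < imageA.shape[1]) &
                (map_y >= 0) & (map_y < imageA.shape[0]))
        out = imageB.copy()
        out[mask] = warped[mask]
        return out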
Report: Report what warping technique you have used and comment on what led you to choose this method.
In this part, you will work with a short video sequence of a similar scene. When processing videos, you will read the input file and obtain images (frames). Once the image is obtained, you will apply the same concept as explained in the previous sections. Unlike the static image, the input video will change in translation, rotation, and perspective. Additionally, there may be cases where a few markers are partially visible. Finally, you will assemble this collection of modified images into a new video. Your output must render each marker position relative to the current frame coordinates.
Besides making all the necessary modifications to make your code more robust, you will complete a function that outputs a video frame generator. This function is almost complete and it is placed so that you can learn how videos are read using OpenCV. Follow the instructions placed in ps3.py.
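For reference, a self-contained version of that read loop looks roughly like the following; defer to the instructions in ps3.py, which may expect extras such as a trailing sentinel value:

    import cv2

    def video_frame_generator_sketch(filename):
        # Open the video and yield one frame per iteration until exhausted.
        video = cv2.VideoCapture(filename)
        while video.isOpened():
            ok, frame = video.read()
            if not ok:
                break
            yield frame
        video.release()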
• First we will start with the following videos.
– Input: ps3-4-a.mp4
– Input: ps3-4-b.mp4
– Output: ps3-4-a-1.png, ps3-4-a-2.png, ps3-4-a-3.png, ps3-4-a-4.png, ps3-4-a-5.png, ps3-4-a-6.png
• Now work with noisy videos:
– Input: ps3-4-c.mp4
– Input: ps3-4-d.mp4
– Output: ps3-4-b-1.png, ps3-4-b-2.png, ps3-4-b-3.png, ps3-4-b-4.png, ps3-4-b-5.png, ps3-4-b-6.png
• First we will start with the following videos.
– Input: ps3-4-a.mp4
– Input: ps3-4-b.mp4
• Now work with noisy videos:
– Input: ps3-4-c.mp4
– Input: ps3-4-d.mp4
– Frames to record: 207, 367, and 737
– Output: ps3-5-b-4.png, ps3-5-b-5.png, ps3-5-b-6.png
Report: In order to grade your implementation, you should extract a few frames from your last generated video and add them to the corresponding slide in your report.
In the next few tasks, you will reuse the tools you have built to stitch together two images of the same object from different viewpoints, creating a combined panorama.
In this task, you will be completing the code to perform the final image stitching and create the output mosaic. So far, you have calculated the homography transform from one image to the other. Use perspective transformation to stitch the two images together.
NOTE: Ensure that you stitch (or attach) the destination image onto the source image and not the other way around. This is purely for the purpose of matching the convention on the autograder while evaluating your code.
• image_warp_inv()
• output_mosaic()
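As a sketch of that convention, the stitching step might look like the following, assuming H maps source-image coordinates into destination-image coordinates, the warped destination lands at non-negative coordinates, and cv2.warpPerspective is permitted; this is not the required implementation:

    import cv2
    import numpy as np

    def output_mosaic_sketch(img_src, img_dst, H):
        # Bring img_dst into the source frame with the inverse homography,
        # then overlay the unwarped source image on top.
        h_d, w_d = img_dst.shape[:2]
        H_inv = np.linalg.inv(H)
        corners = np.array([[0, 0, 1], [w_d - 1, 0, 1],
                            [w_d - 1, h_d - 1, 1], [0, h_d - 1, 1]],
                           dtype=np.float64).T
        warped = H_inv @ corners
        warped /= warped[2]
        # Size the canvas to hold both images (assumes no negative offsets;
        # a real implementation would also translate to handle those).
        w_out = int(max(warped[0].max(), img_src.shape[1]))
        h_out = int(max(warped[1].max(), img_src.shape[0]))
        mosaic = cv2.warpPerspective(img_dst, H_inv, (w_out, h_out))
        mosaic[:img_src.shape[0], :img_src.shape[1]] = img_src
        return mosaic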
• gradients()
• second_moments()
• harris_response_map()
• nms_maxpool()
• harris_corner()
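These functions assemble the classic Harris detector. A condensed sketch of the response computation, using Sobel gradients and a box window; the exact window size, k value, and the max-pool NMS should follow the docstrings in ps3.py:

    import cv2

    def harris_response_sketch(gray, k=0.05, window=5):
        # Image gradients.
        Ix = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
        Iy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
        # Second-moment matrix entries, accumulated over a local window.
        Sxx = cv2.boxFilter(Ix * Ix, -1, (window, window))
        Syy = cv2.boxFilter(Iy * Iy, -1, (window, window))
        Sxy = cv2.boxFilter(Ix * Iy, -1, (window, window))
        # Harris response R = det(M) - k * trace(M)^2, per pixel.
        return (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2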
Note: this problem is extra credit for the CS 4476 undergraduate section, but required for the graduate CS 6476 section.
Code: There is no template provided for this task and you are free to implement it as creatively as you like. There is also no autograder component for this section.
• Place in the report the two generated panorama images (one using RANSAC, and the other without using RANSAC).
• Comment on the quality difference between the two outputs and how it relates to the importance of choosing the correct correspondence points for the image.
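For the RANSAC variant, a minimal estimation loop might look like the sketch below. It reuses find_four_point_transform from the homography section and assumes (N, 2) NumPy arrays of matched points from a feature matcher; the iteration count and inlier tolerance are illustrative:

    import numpy as np

    def ransac_homography_sketch(src_pts, dst_pts, n_iters=1000, tol=3.0):
        # src_pts, dst_pts: (N, 2) float arrays of matched points.
        n = len(src_pts)
        src_h = np.hstack([src_pts, np.ones((n, 1))])  # homogeneous points
        best_H, best_inliers = None, -1
        for _ in range(n_iters):
            idx = np.random.choice(n, 4, replace=False)
            try:
                H = find_four_point_transform(src_pts[idx], dst_pts[idx])
            except np.linalg.LinAlgError:
                continue  # degenerate (e.g., collinear) sample
            proj = src_h @ H.T
            proj = proj[:, :2] / proj[:, 2:3]
            errors = np.linalg.norm(proj - dst_pts, axis=1)
            inliers = int((errors < tol).sum())
            # Keep the hypothesis that explains the most correspondences.
            if inliers > best_inliers:
                best_H, best_inliers = H, inliers
        return best_H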