CS231A

This is my blog.

My notes for CS231A. I'll read through the material whenever I have time and keep updating this post!

(To be continued)

Note

Camera Models

Pinhole cameras

The aperture is referred to as the pinhole O or center of the camera.

The distance between the image plane and the pinhole O is the focal length f.

Sometimes, the retinal plane is placed between O and the 3D object at a distance f from O. In this case, it is called the virtual image or virtual retinal plane.

Let $P = [x\ y\ z]^T$ be a point on some 3D object visible to the pinhole camera. P is mapped, or projected, onto the image plane Π', resulting in the point $P' = [x'\ y']^T$. Similarly, the pinhole itself can be projected onto the image plane, giving a new point C'.
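By similar triangles, the projection is $x' = f\frac{x}{z}$ and $y' = f\frac{y}{z}$, with z measured along the optical axis. A minimal Python sketch of this mapping; the function name and the numbers are illustrative only:

```python
def project_pinhole(point, f):
    """Project a 3D point [x, y, z] (camera coordinates, z > 0) onto the
    image plane at distance f from the pinhole, using similar triangles."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (f * x / z, f * y / z)


# Doubling the depth halves the projected coordinates.
print(project_pinhole([2.0, 1.0, 4.0], f=2.0))  # (1.0, 0.5)
print(project_pinhole([2.0, 1.0, 8.0], f=2.0))  # (0.5, 0.25)
```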

The line defined by C' and O is called the optical axis of the camera system.

As the aperture size decreases, the image gets sharper, but darker.

Cameras and lenses

In modern cameras, the above conflict between crispness and brightness is mitigated by using lenses, devices that can focus or disperse light.

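However, a lens brings into sharp focus only points at one particular distance. For a point closer to or farther from the camera than that distance, the corresponding projection into the image will be blurred, or out of focus.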
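The blur can be quantified with the standard Gaussian thin-lens equation $\frac{1}{f} = \frac{1}{z_o} + \frac{1}{z_i}$, which relates an object's distance $z_o$ to the distance $z_i$ behind the lens where its rays converge. This textbook formula is swapped in here for illustration; the helper names and numbers below are hypothetical. The sketch shows that, with the sensor at a fixed distance, only one object depth gives zero blur, and that halving the aperture halves the blur:

```python
def image_distance(f, z_o):
    """Thin-lens equation 1/f = 1/z_o + 1/z_i, solved for z_i (needs z_o > f)."""
    if z_o <= f:
        raise ValueError("object must be farther than the focal length")
    return 1.0 / (1.0 / f - 1.0 / z_o)


def blur_diameter(f, z_o, sensor_dist, aperture):
    """Diameter of the blur circle on a sensor placed sensor_dist behind the lens.

    Rays through an aperture of the given diameter converge at z_i; by similar
    triangles the light cone has diameter aperture * |sensor_dist - z_i| / z_i
    at the sensor.
    """
    z_i = image_distance(f, z_o)
    return aperture * abs(sensor_dist - z_i) / z_i


f = 0.05                          # 50 mm lens (units: meters)
sensor = image_distance(f, 2.0)   # sensor placed so objects 2 m away are sharp
for z_o in (1.0, 2.0, 10.0):
    print(z_o, blur_diameter(f, z_o, sensor, aperture=0.025))
```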

Because the paraxial refraction model relies on the thin lens assumption, a number of aberrations can occur in practice. The most common one is radial distortion, which causes the image magnification to increase or decrease as a function of the distance to the optical axis. We call it pincushion distortion when the magnification increases and barrel distortion when the magnification decreases. Radial distortion is caused by the fact that different portions of the lens have differing focal lengths.
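To make the two cases concrete, here is a sketch using one common low-order polynomial model, in which an ideal point at distance r from the optical axis is scaled by $1 + k_1 r^2$. This parameterization is an assumption for illustration, not necessarily the one used in the course:

```python
import math


def radially_distort(x, y, k1):
    """Apply one-parameter polynomial radial distortion to an ideal image
    point (x, y), measured from where the optical axis meets the image.

    The magnification is 1 + k1 * r^2, so it grows with r for k1 > 0
    (pincushion) and shrinks with r for k1 < 0 (barrel).
    """
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2
    return x * scale, y * scale


def magnification(r, k1):
    """Ratio of distorted to ideal distance from the optical axis."""
    xd, yd = radially_distort(r, 0.0, k1)
    return math.hypot(xd, yd) / r


for k1, name in ((0.1, "pincushion"), (-0.1, "barrel")):
    print(name, [round(magnification(r, k1), 3) for r in (0.5, 1.0, 1.5)])
```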

Going to digital image space

The parameters k and l, whose units are something like $\frac{\text{pixels}}{\text{cm}}$, convert metric distances on the image plane into pixels along its two axes. Note that k and l may differ, because the aspect ratio of a pixel is not guaranteed to be one.

If k = l, we often say that the camera has square pixels.
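A minimal sketch of the conversion to pixels, assuming the usual form $u = \alpha\frac{x}{z} + c_x$, $v = \beta\frac{y}{z} + c_y$ with $\alpha = fk$ and $\beta = fl$. The offset $(c_x, c_y)$, which moves the origin from the image center to a corner, is not discussed above and is an assumption here, as are the function name and numbers:

```python
def to_pixels(point, f, k, l, cx, cy):
    """Map a 3D point in camera coordinates to pixel coordinates.

    alpha = f * k and beta = f * l convert the metric image-plane coordinates
    (f * x / z, f * y / z) to pixels; (cx, cy) is the pixel where the optical
    axis meets the image (an assumed offset, since pixel indices usually start
    at a corner of the image).
    """
    x, y, z = point
    alpha, beta = f * k, f * l      # focal length in pixels along each axis
    return alpha * x / z + cx, beta * y / z + cy


# f = 2 cm, 500 pixels/cm on both axes (square pixels).
print(to_pixels([10.0, -5.0, 100.0], f=2.0, k=500.0, l=500.0, cx=320.0, cy=240.0))
# (420.0, 190.0)
# Non-square pixels (l = 400) change only the v coordinate.
print(to_pixels([10.0, -5.0, 100.0], f=2.0, k=500.0, l=400.0, cx=320.0, cy=240.0))
# (420.0, 200.0)
```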

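We see that this projection $P \rightarrow P'$ is not linear, because it divides by z. Switching to homogeneous coordinates, where $[x\ y\ z]^T$ becomes $[x\ y\ z\ 1]^T$, lets us write it as a matrix-vector product.

Note that a vector equals its homogeneous coordinates only when the final coordinate equals one; in general, we recover the Euclidean vector by dividing by the final coordinate.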
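A minimal sketch of the homogeneous form, assuming the pixel model $u = \alpha x/z + c_x$, $v = \beta y/z + c_y$ (the offset $(c_x, c_y)$ and all numbers are illustrative). The projection becomes a $3 \times 4$ matrix times $[x\ y\ z\ 1]^T$, and dividing by the last coordinate gives the pixel. Scaling the homogeneous input does not change the result, which is why homogeneous coordinates are only defined up to scale:

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def from_homogeneous(vec):
    """Recover Euclidean coordinates by dividing by the last coordinate."""
    w = vec[-1]
    return [c / w for c in vec[:-1]]


alpha, beta, cx, cy = 1000.0, 1000.0, 320.0, 240.0
# The projection written as a linear map on homogeneous coordinates.
P_matrix = [
    [alpha, 0.0, cx, 0.0],
    [0.0, beta, cy, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
point_h = [10.0, -5.0, 100.0, 1.0]      # [x, y, z, 1]
image_h = matvec(P_matrix, point_h)     # [alpha*x + cx*z, beta*y + cy*z, z]
print(image_h, from_homogeneous(image_h))  # [42000.0, 19000.0, 100.0] [420.0, 190.0]
```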

The matrix K is often referred to as the camera matrix. Two parameters are still missing from our formulation: skewness and distortion. Most cameras have zero skew, but some degree of skewness may occur because of sensor manufacturing errors.

Deriving the new camera matrix that accounts for skewness is outside the scope of the class, so the course notes simply state the result.
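For reference, the skewed camera matrix in its standard form is below, where θ is the angle between the two image axes, α and β are the focal length expressed in pixels along each axis, and $(c_x, c_y)$ is the pixel position where the optical axis meets the image. Setting θ = 90° recovers the zero-skew case:

$$K = \begin{bmatrix} \alpha & -\alpha\cot\theta & c_x \\ 0 & \dfrac{\beta}{\sin\theta} & c_y \\ 0 & 0 & 1 \end{bmatrix}$$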

All parameters contained in the camera matrix K are intrinsic parameters, which change as the type of camera changes. The extrinsic parameters are the rotation R and translation T between the world and camera reference frames, which do not depend on the camera's build.
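Putting the two together, the full projection from world coordinates to pixels is $M = K[R\ T]$, a $3 \times 4$ matrix. A minimal pure-Python sketch with a zero-skew K and made-up pose values; the helper names are hypothetical:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def projection_matrix(K, R, T):
    """Full camera matrix M = K [R | T] (3x4): intrinsics K (3x3) times the
    extrinsic world-to-camera transform [R | T]."""
    RT = [R[i] + [T[i]] for i in range(3)]
    return matmul(K, RT)


def project(M, point):
    """Project a 3D world point with M and convert to pixel coordinates."""
    x = matmul(M, [[c] for c in list(point) + [1.0]])
    u, v, w = (row[0] for row in x)
    return u / w, v / w


K = [[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # camera aligned with world
T = [0.0, 0.0, 100.0]                                    # world origin 100 units ahead
M = projection_matrix(K, R, T)
print(project(M, [10.0, -5.0, 0.0]))  # (420.0, 190.0)
```

Moving the camera changes only R and T while K stays the same, which is the practical meaning of the intrinsic/extrinsic split.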

Camera Calibration

This problem of estimating the extrinsic and intrinsic camera parameters is known as camera calibration.

We set up a linear system of equations from n correspondences $(P_i, p_i)$ between 3D points and their images, using the camera matrix M with rows $m_1, m_2, m_3$; each correspondence contributes two equations.
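Writing out one correspondence, with $P_i = [x_i\ y_i\ z_i\ 1]^T$ in homogeneous coordinates and each $m_j$ a $1 \times 4$ row, and clearing the denominator gives two equations that are linear in the entries of M:

$$p_i = \begin{bmatrix} u_i \\ v_i \end{bmatrix} = \begin{bmatrix} \dfrac{m_1 P_i}{m_3 P_i} \\[2ex] \dfrac{m_2 P_i}{m_3 P_i} \end{bmatrix} \quad\Longrightarrow\quad \begin{aligned} u_i (m_3 P_i) - m_1 P_i &= 0 \\ v_i (m_3 P_i) - m_2 P_i &= 0 \end{aligned}$$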

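Given n of these correspondences, stacking all 2n equations gives the homogeneous linear system $Pm = 0$, where P is a $2n \times 12$ matrix and m is the 12-vector formed from the entries of M.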
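Written out, with P the $2n \times 12$ matrix of stacked coefficients, m the 12-vector formed from the rows of M, and each $0^T$ a $1 \times 4$ row of zeros:

$$P m = \begin{bmatrix} P_1^T & 0^T & -u_1 P_1^T \\ 0^T & P_1^T & -v_1 P_1^T \\ \vdots & \vdots & \vdots \\ P_n^T & 0^T & -u_n P_n^T \\ 0^T & P_n^T & -v_n P_n^T \end{bmatrix} \begin{bmatrix} m_1^T \\ m_2^T \\ m_3^T \end{bmatrix} = 0$$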

When 2n > 11, our homogeneous linear system is overdetermined.

The camera matrix has 11 unknown parameters (12 entries, minus one because M is only defined up to scale). Since each correspondence gives two equations, we need at least 6 correspondences to solve for them.

To avoid the trivial solution m = 0, we minimize $\lVert Pm \rVert^2$ subject to $\lVert m \rVert^2 = 1$. If we let $P = UDV^T$ be the SVD of P, the solution to this minimization is to set m equal to the last column of V. The M recovered by SVD is only known up to scale, which means that the true camera matrix is some scalar multiple of it: $\rho M = K[R\ T]$.
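A minimal NumPy sketch of this SVD solution (often called the direct linear transform), checked on synthetic noise-free data. The function name, the made-up camera matrix, and the random points are illustrative assumptions:

```python
import numpy as np


def calibrate_dlt(points_3d, points_2d):
    """Estimate the 3x4 camera matrix M, up to scale, from n >= 6 point
    correspondences by minimizing ||P m|| subject to ||m|| = 1.

    points_3d: (n, 3) array of world points P_i.
    points_2d: (n, 2) array of pixel coordinates p_i = (u_i, v_i).
    """
    n = len(points_3d)
    if n < 6:
        raise ValueError("need at least 6 correspondences")
    P = np.zeros((2 * n, 12))
    for i, ((x, y, z), (u, v)) in enumerate(zip(points_3d, points_2d)):
        Ph = np.array([x, y, z, 1.0])      # homogeneous world point
        P[2 * i, 0:4] = Ph                 # m1 P_i - u_i (m3 P_i) = 0
        P[2 * i, 8:12] = -u * Ph
        P[2 * i + 1, 4:8] = Ph             # m2 P_i - v_i (m3 P_i) = 0
        P[2 * i + 1, 8:12] = -v * Ph
    # np.linalg.svd returns V^T, so the last column of V is the last row of vt.
    _, _, vt = np.linalg.svd(P)
    return vt[-1].reshape(3, 4)


# Synthetic, noise-free check with a made-up camera matrix.
rng = np.random.default_rng(0)
M_true = np.array([[800.0, 0.0, 320.0, 100.0],
                   [0.0, 780.0, 240.0, -50.0],
                   [0.0, 0.0, 1.0, 10.0]])
X = rng.uniform(-1.0, 1.0, size=(20, 3))           # 20 non-coplanar points
proj = np.hstack([X, np.ones((20, 1))]) @ M_true.T
uv = proj[:, :2] / proj[:, 2:3]
M_est = calibrate_dlt(X, uv)
# Compare after normalizing, since M is only defined up to scale and sign.
a = M_est / np.linalg.norm(M_est)
b = M_true / np.linalg.norm(M_true)
same = np.allclose(a, b, atol=1e-6) or np.allclose(a, -b, atol=1e-6)
print(same)  # True
```

On real, noisy measurements, normalizing the 2D and 3D coordinates before building P (Hartley's normalization) usually improves the conditioning of this system.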

In $\rho M = K[R\ T]$, ρ is the unknown scale factor, and $r_1^T, r_2^T$ and $r_3^T$ denote the three rows of R.
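Multiplying out $K[R\ T]$ with the skewed camera matrix (skew angle θ, focal lengths α and β in pixels, principal point $(c_x, c_y)$) and writing $T = [t_x\ t_y\ t_z]^T$ gives, as a direct expansion:

$$\rho M = \begin{bmatrix} \alpha r_1^T - \alpha\cot\theta\, r_2^T + c_x r_3^T & \alpha t_x - \alpha\cot\theta\, t_y + c_x t_z \\ \dfrac{\beta}{\sin\theta} r_2^T + c_y r_3^T & \dfrac{\beta}{\sin\theta} t_y + c_y t_z \\ r_3^T & t_z \end{bmatrix}$$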

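Writing $M = [A\ b]$, where A is the left $3 \times 3$ block with rows $a_1^T, a_2^T, a_3^T$ and b is the last column, we can solve for the intrinsics in closed form.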
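With $M = [A\ b]$ and $a_1^T, a_2^T, a_3^T$ the rows of A, comparing $\rho A = KR$ row by row and using the orthonormality of the rows of R gives the following, assuming α, β > 0 and 0 < θ < π (the sign of ρ has to be fixed separately, for example by requiring the scene to lie in front of the camera):

$$\rho = \frac{\pm 1}{\lVert a_3 \rVert}, \qquad c_x = \rho^2 (a_1 \cdot a_3), \qquad c_y = \rho^2 (a_2 \cdot a_3)$$

$$\theta = \cos^{-1}\!\left( -\frac{(a_1 \times a_3) \cdot (a_2 \times a_3)}{\lVert a_1 \times a_3 \rVert \, \lVert a_2 \times a_3 \rVert} \right), \qquad \alpha = \rho^2 \lVert a_1 \times a_3 \rVert \sin\theta, \qquad \beta = \rho^2 \lVert a_2 \times a_3 \rVert \sin\theta$$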

The extrinsics R and T then follow from the same quantities.
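With the same notation ($M = [A\ b]$, rows $a_i^T$ of A, scale ρ), the extrinsics are

$$r_1 = \frac{a_2 \times a_3}{\lVert a_2 \times a_3 \rVert}, \qquad r_2 = r_3 \times r_1, \qquad r_3 = \rho\, a_3, \qquad T = \rho K^{-1} b$$

The NumPy sketch below implements these closed-form intrinsic and extrinsic expressions and checks them on a synthetic camera. The function name, the made-up camera, and the sign rule for ρ (world origin in front of the camera, so $t_z = \rho b_3 > 0$) are assumptions of this sketch:

```python
import numpy as np


def decompose_camera_matrix(M):
    """Recover K, R, T from a 3x4 camera matrix known only up to scale.

    Assumes alpha, beta > 0 and 0 < theta < pi, and fixes the sign of rho by
    requiring t_z > 0 (world origin in front of the camera).
    """
    A, b = M[:, :3], M[:, 3]
    a1, a2, a3 = A
    rho = 1.0 / np.linalg.norm(a3)
    if rho * b[2] < 0:          # t_z = rho * b_3 because K's last row is [0, 0, 1]
        rho = -rho
    # Intrinsics.
    cx = rho**2 * (a1 @ a3)
    cy = rho**2 * (a2 @ a3)
    c13, c23 = np.cross(a1, a3), np.cross(a2, a3)
    cos_theta = -(c13 @ c23) / (np.linalg.norm(c13) * np.linalg.norm(c23))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    alpha = rho**2 * np.linalg.norm(c13) * np.sin(theta)
    beta = rho**2 * np.linalg.norm(c23) * np.sin(theta)
    K = np.array([[alpha, -alpha / np.tan(theta), cx],
                  [0.0, beta / np.sin(theta), cy],
                  [0.0, 0.0, 1.0]])
    # Extrinsics.
    r1 = c23 / np.linalg.norm(c23)
    r3 = rho * a3
    r2 = np.cross(r3, r1)
    R = np.vstack([r1, r2, r3])
    T = rho * np.linalg.solve(K, b)
    return K, R, T


# Synthetic check with made-up intrinsics (slight skew) and pose.
theta_true = np.deg2rad(89.0)
K_true = np.array([[800.0, -800.0 / np.tan(theta_true), 320.0],
                   [0.0, 780.0 / np.sin(theta_true), 240.0],
                   [0.0, 0.0, 1.0]])
cz, sz = np.cos(0.3), np.sin(0.3)
cxr, sxr = np.cos(0.2), np.sin(0.2)
R_true = (np.array([[1.0, 0.0, 0.0], [0.0, cxr, -sxr], [0.0, sxr, cxr]])
          @ np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]))
T_true = np.array([0.1, -0.2, 5.0])
M = -2.5 * K_true @ np.hstack([R_true, T_true[:, None]])  # arbitrary scale and sign
K_est, R_est, T_est = decompose_camera_matrix(M)
print(np.allclose(K_est, K_true), np.allclose(R_est, R_true), np.allclose(T_est, T_true))
# True True True
```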

Handling Distortion in Camera Calibration

Distortions are often radially symmetric because of the physical symmetry of the lens. We therefore model the radial distortion with an isotropic transformation, which scales both image coordinates by the same factor $1/\lambda$.
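Concretely, the distorted projection scales the first two homogeneous coordinates by $1/\lambda$, where λ grows with the distance d of the image point from the image center. The polynomial form of λ below is a common choice and is an assumption here:

$$Q P_i = \begin{bmatrix} \frac{1}{\lambda} & 0 & 0 \\ 0 & \frac{1}{\lambda} & 0 \\ 0 & 0 & 1 \end{bmatrix} M P_i \;\rightarrow\; \begin{bmatrix} u_i \\ v_i \end{bmatrix} = p_i, \qquad \lambda = 1 + \sum_{p=1}^{3} \kappa_p d^{2p}$$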

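If we rewrite this as a system of equations as before, with $q_1, q_2, q_3$ the rows of the distorted projection matrix Q (M with its first two rows scaled by $1/\lambda$), we get $u_i (q_3 P_i) = q_1 P_i$ and $v_i (q_3 P_i) = q_2 P_i$. Because λ depends on the unknown distortion parameters, this system is no longer linear, and solving it directly requires nonlinear optimization.

However, radial distortion does not change the ratio between the two image coordinates, since both are scaled by the same factor $1/\lambda$. This gives $\frac{u_i}{v_i} = \frac{m_1 P_i / m_3 P_i}{m_2 P_i / m_3 P_i} = \frac{m_1 P_i}{m_2 P_i}$, or equivalently $v_i (m_1 P_i) - u_i (m_2 P_i) = 0$, which is linear in the unknowns $m_1$ and $m_2$.

Similar to before, stacking these equations for all n correspondences gives a homogeneous system $L\mathbf{n} = 0$ with $\mathbf{n} = [m_1\ m_2]^T$, which we can solve via SVD.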
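A minimal NumPy sketch of this step, where each correspondence contributes the row $[v_i P_i^T\ \ -u_i P_i^T]$ of L. It assumes the image coordinates are measured from the center of distortion; the function name, the made-up camera, and the distortion coefficient $\kappa_1$ are illustrative:

```python
import numpy as np


def estimate_m1_m2(points_3d, points_2d):
    """Estimate the first two rows m1, m2 of M (up to a common scale) from
    radially distorted correspondences, using v_i (m1 P_i) - u_i (m2 P_i) = 0.

    Needs at least 7 correspondences (8 unknowns minus one for scale), and
    assumes (u_i, v_i) are measured from the center of distortion.
    """
    rows = []
    for (x, y, z), (u, v) in zip(points_3d, points_2d):
        Ph = np.array([x, y, z, 1.0])
        rows.append(np.concatenate([v * Ph, -u * Ph]))   # one row of L
    L = np.array(rows)
    _, _, vt = np.linalg.svd(L)
    n_vec = vt[-1]                  # minimizes ||L n|| subject to ||n|| = 1
    return n_vec[:4], n_vec[4:]


# Synthetic check: distort ideal projections by 1 / lambda with a made-up
# kappa_1; the ratio u_i / v_i is unchanged, so m1 and m2 are still recovered.
rng = np.random.default_rng(1)
M_true = np.array([[800.0, 5.0, 0.0, 100.0],
                   [0.0, 780.0, 0.0, -50.0],
                   [0.0, 0.0, 1.0, 10.0]])
X = rng.uniform(-1.0, 1.0, size=(30, 3))
proj = np.hstack([X, np.ones((30, 1))]) @ M_true.T
ideal = proj[:, :2] / proj[:, 2:3]
lam = 1.0 + 1e-3 * np.sum(ideal**2, axis=1)     # lambda = 1 + kappa_1 d^2
distorted = ideal / lam[:, None]
m1, m2 = estimate_m1_m2(X, distorted)
est = np.concatenate([m1, m2])
true = np.concatenate([M_true[0], M_true[1]])
cos_sim = abs(est @ true) / (np.linalg.norm(est) * np.linalg.norm(true))
print(cos_sim)  # approximately 1.0
```

Only $m_1$ and $m_2$ come out of this step; $m_3$ and the distortion coefficients still have to be estimated afterwards.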

Rigid Transformations

One intuitive way to think of rotations is as the amount we rotate around each axis, which is often referred to as Euler angles. However, this representation can suffer from singularities, known as gimbal lock, in which certain configurations lose one rotational degree of freedom.
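A small pure-Python check of gimbal lock, assuming the convention $R = R_x(\alpha) R_y(\beta) R_z(\gamma)$ (one of several Euler-angle conventions). At β = 90° the result depends only on α + γ, so two different angle triples give the same rotation; the helper names are hypothetical:

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_xyz(alpha, beta, gamma):
    """Rotation R_x(alpha) R_y(beta) R_z(gamma)."""
    return matmul(rot_x(alpha), matmul(rot_y(beta), rot_z(gamma)))


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))


# With beta = 90 degrees only alpha + gamma matters: a degree of freedom is lost.
lock = math.pi / 2
print(close(euler_xyz(0.3, lock, 0.5), euler_xyz(0.8, lock, 0.0)))  # True
print(close(euler_xyz(0.3, 0.2, 0.5), euler_xyz(0.8, 0.2, 0.0)))    # False
```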

One way to prevent this is to use rotation matrices, which are a more general form of representing rotations. Rotation matrices are square, orthogonal matrices with determinant one.

We can represent rotations by angles α, β, γ around the x, y and z axes with the matrices $R_x(\alpha)$, $R_y(\beta)$ and $R_z(\gamma)$, and combine them by matrix multiplication.
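For reference, the standard right-handed (counterclockwise) rotation matrices about each axis are below; the order in which they are multiplied is a convention and changes the result:

$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \quad R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$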

Finally, if we want to scale the vector along the coordinate axes by amounts $S_x, S_y, S_z$, we can construct the scaling matrix $S = \mathrm{diag}(S_x, S_y, S_z)$.

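Therefore, if we want to scale a vector, then rotate it, then translate it, our final transformation matrix in homogeneous coordinates is $T = \begin{bmatrix} RS & t \\ 0 & 1 \end{bmatrix}$, where t is the translation vector.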
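A minimal pure-Python sketch of this composition with a hypothetical 90° rotation about z. Because S is diagonal, $RS$ simply scales the columns of R:

```python
import math


def transform_matrix(scale, rotation, translation):
    """4x4 homogeneous matrix that scales, then rotates, then translates:
    [[R S, t], [0 0 0 1]] with S = diag(Sx, Sy, Sz)."""
    rs = [[rotation[i][j] * scale[j] for j in range(3)] for i in range(3)]  # R @ S
    return [rs[i] + [translation[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def apply(T, point):
    """Apply a 4x4 transform to a 3D point via homogeneous coordinates."""
    ph = list(point) + [1.0]
    x, y, z, w = (sum(T[i][k] * ph[k] for k in range(4)) for i in range(4))
    return [x / w, y / w, z / w]


theta = math.pi / 2
Rz = [[math.cos(theta), -math.sin(theta), 0.0],
      [math.sin(theta), math.cos(theta), 0.0],
      [0.0, 0.0, 1.0]]
T = transform_matrix([2.0, 1.0, 1.0], Rz, [0.0, 0.0, 5.0])
print([round(c, 9) for c in apply(T, [1.0, 0.0, 0.0])])  # [0.0, 2.0, 5.0]
```

Swapping the order (rotate first, then scale) would send the same point to (0, 1, 5) instead, since the non-uniform scale would then act along the rotated direction.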

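Projective transformations occur when the final row of T is not $\left[\begin{matrix}0&0&0&1\end{matrix}\right]$. The last homogeneous coordinate of a transformed point is then in general no longer one, so we must divide by it to recover the Euclidean point.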

Different Camera Models

In the weak perspective model, points are first projected onto a reference plane parallel to the image plane (typically at the object's average depth) using orthogonal projection, and then projected onto the image plane using a projective transformation.

Overall, the weak perspective model results in much simpler math at the cost of some precision. However, it is often very accurate when the object is small and distant from the camera, that is, when its depth variation is small compared with its distance.
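A minimal sketch comparing the two models on a made-up object, assuming the reference plane sits at the object's mean depth $z_0$, so that weak perspective computes $x' = f\frac{x}{z_0}$, $y' = f\frac{y}{z_0}$ for every point. Moving the same object ten times farther away shrinks the discrepancy sharply; the helper names are hypothetical:

```python
def perspective(point, f):
    x, y, z = point
    return f * x / z, f * y / z


def weak_perspective(point, f, z0):
    """Orthographic projection onto the reference plane z = z0, then a
    perspective projection of that plane: every point uses the same depth z0."""
    x, y, _ = point
    return f * x / z0, f * y / z0


def max_error(points, f):
    """Largest image-plane discrepancy when z0 is the points' mean depth."""
    z0 = sum(p[2] for p in points) / len(points)
    errs = []
    for p in points:
        (u1, v1), (u2, v2) = perspective(p, f), weak_perspective(p, f, z0)
        errs.append(max(abs(u1 - u2), abs(v1 - v2)))
    return max(errs)


# The same small object (depth extent of 1 unit), placed near and far.
obj = [(0.5, 0.5, -0.5), (-0.5, 0.5, 0.5), (0.5, -0.5, 0.0)]
f = 1.0
for dist in (2.0, 20.0):
    pts = [(x, y, z + dist) for x, y, z in obj]
    print(dist, max_error(pts, f))
```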

Lecture

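Afterword

Mother's Day is coming up soon, and I still haven't decided what to give.

Every time the day gets close, I get more and more torn about it.

And then I realize I still have quite a lot to do.

Keep going!

Lately I've been dealing with summer camps and the September round of graduate school recommendations, and I don't have much experience with either,

so I'm a little anxious, but I'm still working at it seriously.

Just two big problems left!

Please credit the source when reposting. Thank you.

May I be your little sun.

Off to buy some candy!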