This is my blog.

These are my CS231A notes. I read through the material when I have time and then update this page!

(To be continued)

# Note

## Camera Models

### Pinhole cameras

The aperture is referred to as the pinhole O or center of the camera.

The distance between the image plane and the pinhole O is the focal length f.

Sometimes, the retinal plane is placed between O and the 3D object at a distance f from O. In this case, the image formed on it is called the virtual image, and the plane itself the virtual retinal plane.

Let $P = [x\ y\ z]^T$ be a point on some 3D object visible to the pinhole camera. P will be mapped or projected onto the image plane $\Pi'$, resulting in the point $P' = [x'\ y']^T$. Similarly, the pinhole itself can be projected onto the image plane, giving a new point C''.

The line defined by C'' and O is called the optical axis of the camera system.

As the aperture size decreases, the image gets sharper, but darker.
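The pinhole mapping above can be sketched in a few lines of NumPy. Using the virtual-image convention (plane between O and the object, so there is no sign flip), each point scales by $f/z$; the function name and sample values are just for illustration:

```python
import numpy as np

def pinhole_project(P, f):
    """Project 3D points P (n, 3) through a pinhole at the origin onto a
    virtual image plane at focal length f: (x, y, z) -> (f*x/z, f*y/z)."""
    P = np.asarray(P, dtype=float)
    x, y, z = P[:, 0], P[:, 1], P[:, 2]
    return np.stack([f * x / z, f * y / z], axis=1)

# A point twice as far away projects half as large:
print(pinhole_project([[1.0, 2.0, 4.0], [1.0, 2.0, 8.0]], f=2.0))
# → [[0.5  1.  ]
#    [0.25 0.5 ]]
```

Note the division by z: this is exactly why the projection is not a linear map on the raw coordinates.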

### Cameras and lenses

In modern cameras, the above conflict between crispness and brightness is mitigated by using lenses, devices that can focus or disperse light.

For points at other distances, the corresponding projection into the image will be blurred or out of focus.

Because the paraxial refraction model approximates using the thin lens assumption, a number of aberrations can occur. The most common one is referred to as radial distortion, which causes the image magnification to decrease or increase as a function of the distance to the optical axis. We classify the radial distortion as pincushion distortion when the magnification increases and barrel distortion when the magnification decreases. Radial distortion is caused by the fact that different portions of the lens have differing focal lengths.

### Going to digital image space

The parameters k and l, whose units would be something like $\frac{\text{pixels}}{\text{cm}}$, correspond to the change of units in the two axes of the image plane. Note that k and l may be different because the aspect ratio of the unit element is not guaranteed to be one.

If k = l, we often say that the camera has square pixels.

We see that this projection $P \to P'$ is not linear, since it involves division by the coordinate $z$.

Note that the equality between a vector and its homogeneous coordinates only occurs when the final coordinate equals one.

The matrix K is often referred to as the camera matrix. Two parameters are currently missing from our formulation: skewness and distortion. Most cameras have zero-skew, but some degree of skewness may occur because of sensor manufacturing errors.

Deriving the new camera matrix accounting for skewness is outside the scope of this class and we give it to you below:

All parameters contained in the camera matrix K are the intrinsic parameters, which change as the type of camera changes. The extrinsic parameters include the rotation and translation, which do not depend on the camera's build.
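Putting the pieces together, a minimal sketch of projection with homogeneous coordinates (zero skew; all numeric values below are hypothetical, chosen only so the output is easy to check):

```python
import numpy as np

# Hypothetical intrinsics: focal length in meters, k = l pixels per meter
# (square pixels), and the principal point in pixels.
f, k, l = 0.05, 8000.0, 8000.0
cx, cy = 320.0, 240.0

K = np.array([[f * k, 0.0,   cx],
              [0.0,   f * l, cy],
              [0.0,   0.0,   1.0]])

# Extrinsics: identity rotation and a translation along the optical axis.
R = np.eye(3)
t = np.array([[0.0], [0.0], [1.0]])
M = K @ np.hstack([R, t])            # 3x4 projection matrix M = K [R t]

P = np.array([0.0, 0.0, 1.0, 1.0])   # homogeneous 3D point on the optical axis
p = M @ P
p = p[:2] / p[2]                      # divide so the final coordinate equals one
print(p)                              # → [320. 240.], the principal point
```

A point on the optical axis lands exactly at the principal point, as expected; the nonlinear division by depth is confined to the final normalization step, which is what makes the homogeneous formulation linear.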

### Camera Calibration

This problem of estimating the extrinsic and intrinsic camera parameters is known as camera calibration.

We set up a linear system of equations from n correspondences such that for each correspondence $P_i , p_i$ and camera matrix M whose rows are $m_1,m_2,m_3$:

Given n of these corresponding points, the entire linear system of equations becomes

When 2n > 11, our homogeneous linear system is overdetermined.

We know that the camera matrix has 11 unknown parameters (12 entries, defined up to scale). Since each correspondence contributes two equations, we need $2n \geq 11$, i.e., at least 6 correspondences, to solve this.

If we let $P = UDV^T$ , then the solution to the above minimization is to set m equal to the last column of V . We know our SVD-solved M is known up to scale, which means that the true values of the camera matrix are some scalar multiple of M:
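This direct linear transform (DLT) recipe can be sketched as follows. The helper name is hypothetical, and the correspondences are synthetic, generated from a known camera so the recovered matrix can be checked:

```python
import numpy as np

def calibrate_dlt(Ps, ps):
    """Estimate the 3x4 camera matrix M (up to scale) from n >= 6
    correspondences between 3D points Ps (n, 3) and pixels ps (n, 2)."""
    rows = []
    for (X, Y, Z), (u, v) in zip(Ps, ps):
        Ph = [X, Y, Z, 1.0]                       # homogeneous 3D point
        rows.append(Ph + [0.0] * 4 + [-u * c for c in Ph])
        rows.append([0.0] * 4 + Ph + [-v * c for c in Ph])
    A = np.array(rows)                            # 2n x 12 system A m = 0
    # The minimizer of ||A m|| subject to ||m|| = 1 is the last row of V^T.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

# Synthetic check: project points with a known M, then recover it.
M_true = np.hstack([np.eye(3), [[0.1], [0.2], [1.0]]])
Ps = np.random.rand(8, 3) + [0.0, 0.0, 2.0]       # points in front of the camera
ph = (M_true @ np.hstack([Ps, np.ones((8, 1))]).T).T
ps = ph[:, :2] / ph[:, 2:]
M_est = calibrate_dlt(Ps, ps)
M_est /= M_est[-1, -1]                            # fix the unknown scale
print(np.allclose(M_est, M_true, atol=1e-6))      # → True
```

Dividing by one entry fixes the scale ambiguity for the comparison; in practice the scale is recovered from the constraint that the third row of the rotation has unit norm.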

Here, $r_1^T,r_2^T$ and $r_3^T$ are the three rows of R.

Solving for the intrinsics gives

The extrinsics are

### Handling Distortion in Camera Calibration

Often, distortions are radially symmetric because of the physical symmetry of the lens. We model the radial distortion with an isotropic transformation:

We get

And from before, we know

Similar to before, this gives a matrix-vector product that we can solve via SVD:
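As a concrete illustration of an isotropic radial model, here is a common polynomial form (the function name and coefficients are illustrative; the notes' radial factor plays the same role as `scale` below):

```python
import numpy as np

def radial_distort(p, k1, k2, center):
    """Apply isotropic radial distortion: each point moves along its ray
    from the principal point, scaled by a polynomial in r^2."""
    d = np.asarray(p, dtype=float) - center
    r2 = np.sum(d ** 2, axis=1, keepdims=True)
    scale = 1.0 + k1 * r2 + k2 * r2 ** 2
    return center + scale * d

pts = np.array([[1.0, 0.0], [2.0, 0.0]])
# k1 > 0 magnifies more with distance from the axis: pincushion distortion.
print(radial_distort(pts, k1=0.1, k2=0.0, center=np.array([0.0, 0.0])))
# → [[1.1 0. ]
#    [2.8 0. ]]
```

With negative coefficients the magnification decreases away from the axis, giving barrel distortion instead.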

### Rigid Transformations

One intuitive way to think of rotations is how much we rotate around each degree of freedom, which is often referred to as Euler angles. However, this methodology can result in what is known as singularities, or gimbal lock, in which certain configurations result in a loss of a degree of freedom for the rotation.

One way to prevent this is to use rotation matrices, which are a more general form of representing rotations. Rotation matrices are square, orthogonal matrices with determinant one.

We can represent a rotation $\alpha, \beta, \gamma$ around each of the respective axes as follows:

Finally, if we want to scale the vector in certain directions by some amount $S_x,S_y,S_z$ , we can construct a scaling matrix

Therefore, if we want to scale a vector, then rotate, then translate, our final transformation matrix would be:
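The scale-then-rotate-then-translate composition can be sketched with 4x4 homogeneous matrices (only the z-axis rotation is shown for brevity; the other axes follow the same pattern):

```python
import numpy as np

def scale_m(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def rot_z(gamma):
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c, -s, 0.0, 0.0],
                     [s,  c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def translate_m(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

# Matrices compose right-to-left: the scaling is applied first.
T = translate_m(1.0, 0.0, 0.0) @ rot_z(np.pi / 2) @ scale_m(2.0, 2.0, 2.0)
v = np.array([1.0, 0.0, 0.0, 1.0])
print(T @ v)   # scale to (2,0,0), rotate to (0,2,0), translate to (1,2,0)
```

Note that all three component matrices keep the final row $[0\ 0\ 0\ 1]$, so their product is still an affine (not projective) transformation.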

Projective transformations occur when the final row of T is not $\left[\begin{matrix}0&0&0&1\end{matrix}\right]$.

### Different Camera Models

In the weak perspective model, points are first projected to the reference plane using orthogonal projection and then projected to the image plane using a projective transformation.

Overall, weak perspective models result in much simpler math, at the cost of being somewhat imprecise. However, it often yields results that are very accurate when the object is small and distant from the camera.
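The simplification is visible in code: replacing each point's depth with a common reference depth $z_0$ turns the per-point division into one constant magnification $f/z_0$, making the projection linear. A minimal sketch with an illustrative function name:

```python
import numpy as np

def weak_perspective(P, f, z0):
    """Weak perspective projection: drop each point's own depth and use a
    shared reference depth z0, so (x, y, z) -> (f/z0) * (x, y)."""
    P = np.asarray(P, dtype=float)
    return (f / z0) * P[:, :2]

# Two points whose depths differ only slightly from z0 = 10:
P = np.array([[1.0, 2.0, 9.9], [3.0, 1.0, 10.1]])
print(weak_perspective(P, f=1.0, z0=10.0))
# → [[0.1 0.2]
#    [0.3 0.1]]
```

Full perspective would divide by 9.9 and 10.1 respectively; because those depths are close to $z_0$, the uniform scaling is a very good approximation, which is exactly the small-and-distant-object regime described above.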