This is my blog.

## Suggestions

• Read papers widely
• Write code
• Develop your own intuition, and trust it

## What Are Neural Networks

### Activation Functions

• Rectified linear unit (ReLU): max(0, z)

For negative inputs its derivative is 0, which motivated the leaky ReLU, e.g. max(0.01z, z) (the 0.01 slope can be changed), but the leaky variant is less commonly used.

• Hyperbolic tangent (tanh)

Its outputs are roughly zero-centered, so it generally works better than sigmoid.
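As a quick reference, here is a minimal numpy sketch of these activations (the function names and test values are my own, not from the course):

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^{-z}); outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # max(0, z); derivative is 0 for z < 0
    return np.maximum(0, z)

def leaky_relu(z, slope=0.01):
    # max(slope * z, z); the 0.01 slope is adjustable
    return np.maximum(slope * z, z)

def tanh(z):
    # outputs in (-1, 1), so activations are roughly zero-centered
    return np.tanh(z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))        # negatives clipped to 0
print(leaky_relu(z))  # negatives scaled by 0.01 instead
```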

### Unstructured Data

The iterative process of developing DL systems: Idea → Code → Experiment → Idea → …

• Standard NN
• CNN
• RNN

## Logistic Regression as a Neural Network

Logistic regression solves a binary classification problem.

Some of these concepts were mentioned in Regression Model-Lesson6; they are roughly restated below.

### Optimization

Vectorization

Python's numpy package has many built-in functions that run computations in parallel, which is much faster than a for loop.

You can measure a program's running time with the time.time() method from the time package; it returns seconds as a float, with a resolution typically on the order of 10^{-6} s.
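A common way to see the speed-up from vectorization is to time a dot product both ways (the array size here is an illustrative choice):

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Vectorized dot product
tic = time.time()
c = np.dot(a, b)
toc = time.time()
print(f"vectorized: {1000 * (toc - tic):.3f} ms")

# Explicit for loop computing the same sum
tic = time.time()
c_loop = 0.0
for i in range(1_000_000):
    c_loop += a[i] * b[i]
toc = time.time()
print(f"for loop:   {1000 * (toc - tic):.3f} ms")
```

On typical hardware the vectorized version is faster by two to three orders of magnitude.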

In Python, adding a constant to a vector is equivalent to adding a vector of the same size whose entries all equal that constant; this is called broadcasting.
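A small sketch of broadcasting in numpy (the values are illustrative):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
# Adding the scalar 100 behaves like adding [100, 100, 100]
print(v + 100)

# The same rule extends to matrices: a (3, 4) matrix plus a length-4 row;
# the row is stretched across all 3 rows
A = np.arange(12.0).reshape(3, 4)
row = np.array([10.0, 20.0, 30.0, 40.0])
print(A + row)
```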

### Computation graph

One step of backward propagation on a computation graph yields the derivative of the final output variable. Computing derivatives backward through the graph is how we optimize the cost function.
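As a toy illustration (my own example, using J = 3(a + b·c)), the forward pass stores intermediates, and one backward step per node recovers each derivative via the chain rule:

```python
# Computation graph for J = 3 * (a + b*c)
a, b, c = 5.0, 3.0, 2.0

# Forward pass
u = b * c        # u = 6
v = a + u        # v = 11
J = 3 * v        # J = 33

# Backward pass: one chain-rule step per node
dJ_dv = 3.0              # J = 3v
dJ_du = dJ_dv * 1.0      # v = a + u  ->  dv/du = 1
dJ_da = dJ_dv * 1.0      # dv/da = 1
dJ_db = dJ_du * c        # u = b*c   ->  du/db = c
dJ_dc = dJ_du * b        # du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)  # 33.0 3.0 6.0 9.0
```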

## Standard NN (Standard Neural Network)

The general methodology to build a Neural Network is to:

1. Define the neural network structure (# of input units, # of hidden units, etc.)
2. Initialize the model's parameters
3. Loop:
• Implement forward propagation
• Compute loss
• Implement backward propagation to get the gradients
• Update parameters (gradient descent)
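The loop above can be sketched for a 1-hidden-layer binary classifier; the toy data, layer sizes, and learning rate below are my own illustrative choices, not the course's exact code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2 features, 200 examples, label = 1 if x0 + x1 > 0
X = rng.normal(size=(2, 200))
Y = (X[0] + X[1] > 0).astype(float).reshape(1, 200)

# 1. Define the structure; 2. initialize parameters (small random weights)
n_x, n_h, n_y = 2, 4, 1
W1 = rng.normal(size=(n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.normal(size=(n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

m = X.shape[1]
lr = 1.0
for i in range(5000):
    # 3a. Forward propagation (tanh hidden layer, sigmoid output)
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = 1.0 / (1.0 + np.exp(-Z2))

    # 3b. Cross-entropy loss (small epsilon keeps log finite)
    loss = -np.mean(Y * np.log(A2 + 1e-12) + (1 - Y) * np.log(1 - A2 + 1e-12))

    # 3c. Backward propagation (gradients via the chain rule)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.mean(axis=1, keepdims=True)

    # 3d. Gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

accuracy = np.mean((A2 > 0.5) == Y)
print(f"final loss {loss:.3f}, train accuracy {accuracy:.2f}")
```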

Logistic regression doesn't have a hidden layer. If you initialize the weights to zeros, the first example x fed into logistic regression will output zero, but the derivatives of logistic regression depend on the input x (because there's no hidden layer), which is not zero. So at the second iteration, the weight values follow x's distribution and differ from each other if x is not a constant vector.

The "cache" records values from the forward-propagation units and sends them to the backward-propagation units, because they are needed to compute the chain-rule derivatives.

### Hyperparameters

• Learning rate
• Number of iterations
• Number of hidden layers
• Number of hidden units
• Choice of activation function

• Momentum term
• Mini-batch size
• Regularization parameters
• ……

We cannot avoid a for loop that iterates over the layers.

To compute some functions with a shallow network circuit, you need a large network (where size is measured by the number of logic gates), but a deep network circuit can compute the same function with an exponentially smaller network.