This is the first programming assignment of Andrew Ng's course "Neural Networks and Deep Learning".

In this assignment we will not use any explicit loops (for/while) unless the instructions explicitly ask for one.

You will learn to:

  • Build the general architecture of a learning algorithm, including:
    • Initializing parameters
    • Computing the cost function and its gradient
    • Using an optimization algorithm (gradient descent)
  • Gather the three functions above into a main function called model


1. Required packages

  • Python 3.7
  • h5py 2.9
  • numpy 1.17
  • matplotlib 3.1
  • from lr_utils import load_dataset

The packages above run inside a conda environment; lr_utils is responsible for loading the dataset.
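
For reference, here is a minimal sketch of what lr_utils.load_dataset could look like, assuming "data.h5" stores the arrays under the keys train_set_x, train_set_y, test_set_x, test_set_y and list_classes; check the lr_utils.py shipped with the assignment for the actual file names and keys:

# A sketch only -- the file name and dataset keys are assumptions, not the official lr_utils.
import numpy as np
import h5py

def load_dataset():
    with h5py.File("data.h5", "r") as f:
        train_set_x_orig = np.array(f["train_set_x"][:])   # training images
        train_set_y_orig = np.array(f["train_set_y"][:])   # training labels
        test_set_x_orig = np.array(f["test_set_x"][:])     # test images
        test_set_y_orig = np.array(f["test_set_y"][:])     # test labels
        classes = np.array(f["list_classes"][:])            # class names stored as bytes

    # reshape the labels into row vectors of shape (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes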


2. Problem statement and data preprocessing

The given dataset "data.h5" contains:

  • a training set of \(m_{train}\) labeled images, where 1 means the picture is a cat and 0 means it is not
  • a test set of \(m_{test}\) images labeled in the same way
  • each image has shape \((num_{px}, num_{px}, 3)\), where 3 is the number of RGB channels; every image is therefore a square of \(num_{px} \times num_{px}\) pixels

We will build a simple image-recognition algorithm that tells which of these pictures are cats.

Let's take a look at the data:

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

Arrays with the _orig suffix are the ones we will preprocess later; after preprocessing we obtain the datasets train_set_x and test_set_x.

train_set_x_orig and test_set_x_orig store each picture as an array, which the following example illustrates:

import numpy as np
import matplotlib.pyplot as plt

index = 25
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:,index]) + ", it's a '" + classes[np.squeeze(train_set_y[:,index])].decode("utf-8") + "' picture.")

Output: the picture at index 25 is displayed along with its label.

Many bugs in deep learning come from matrix/vector shapes that do not match; checking shapes carefully will save you from a lot of them.
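
For example, a quick sanity check like the following (purely illustrative, not part of the assignment) will fail loudly as soon as a shape is not what you expect:

# labels should be row vectors with one entry per image
assert train_set_y.shape == (1, train_set_x_orig.shape[0])
assert test_set_y.shape == (1, test_set_x_orig.shape[0])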

The following code extracts:

  • \(m_{train}\) (the number of training examples)
  • \(m_{test}\) (the number of test examples)
  • \(num_{px}\) (the height/width of each image)

Remember that train_set_x_orig is a numpy array of shape \((m_{train}, num_{px}, num_{px}, 3)\), so you can get \(m_{train}\) with train_set_x_orig.shape[0].

### START CODE HERE ### (≈ 3 lines of code)
m_train = train_set_y.shape[1]
m_test = test_set_y.shape[1]
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

Output:

Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)


Before feeding the images to the learning algorithm, we preprocess them as follows:

each image of shape \((num_{px}, num_{px}, 3)\) is reshaped into a vector of shape \((num_{px} * num_{px} * 3, 1)\); in other words, every image matrix is flattened into a column vector.

# Reshape the training and test examples

### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
### END CODE HERE ###

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))

Output:

train_set_x_flatten shape: (12288, 209)
train_set_y shape: (1, 209)
test_set_x_flatten shape: (12288, 50)
test_set_y shape: (1, 50)
sanity check after reshaping: [17 31 56 22 33]

Finally, we standardize the data. For image data this simply means dividing every pixel value by 255 (the maximum value of an RGB channel).

train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.


3. General architecture of the learning algorithm

Next we design the algorithm that will tell cats apart.

We will build logistic regression with a neural-network mindset; logistic regression is in fact a very simple neural network, as the figure below shows:

Mathematical formulation of the algorithm:

For one example \(x^{(i)}\):
\[
z^{(i)} = w^T x^{(i)} + b
\]
\[
\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)})
\]
\[
\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1-y^{(i)})\log(1-a^{(i)})
\]
The cost over all examples is then:
\[
J = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(a^{(i)}, y^{(i)})
\]
In this exercise we will carry out the following steps:

  • Initialize the model parameters
  • Learn the parameters by minimizing the cost function
  • Use the learned parameters to make predictions
  • Analyze the results and conclude


4. Building the parts of the algorithm

The main steps for building the neural network are:

  • Define the model structure

  • Initialize the model parameters

  • Loop:

    • Compute the current cost (forward propagation)
    • Compute the current gradients (backward propagation)
    • Update the parameters (gradient descent)

4.1 Helper functions

The sigmoid function

# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-z))
    ### END CODE HERE ###

    return s

Test:

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(9.2) = " + str(sigmoid(9.2)))

Output:

sigmoid(0) = 0.5
sigmoid(9.2) = 0.999898970806

4.2 Parameter initialization

# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros(shape=(dim, 1))
    b = 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b

Test:

dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))

Output:

w = [[ 0.]
[ 0.]]
b = 0

4.3 Forward and backward propagation

Implement the propagate() function, which computes the cost (forward propagation) and its gradient (backward propagation).

The formulas you will need:
\[
\frac{\partial J}{\partial w} = \frac{1}{m} X (A-Y)^T
\]
\[
\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)
\]
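
These formulas follow from the chain rule; as a short sketch (not required by the assignment), for one example, since \(a = \sigma(z)\) and \(\frac{da}{dz} = a(1-a)\),
\[
\frac{\partial\mathcal{L}}{\partial z^{(i)}} = \left(-\frac{y^{(i)}}{a^{(i)}} + \frac{1-y^{(i)}}{1-a^{(i)}}\right) a^{(i)}\left(1-a^{(i)}\right) = a^{(i)} - y^{(i)}
\]
and because \(z^{(i)} = w^T x^{(i)} + b\), each example contributes \((a^{(i)}-y^{(i)})\,x^{(i)}\) to the gradient of \(w\) and \((a^{(i)}-y^{(i)})\) to the gradient of \(b\); averaging over the \(m\) columns of \(X\) gives the vectorized expressions above.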

# GRADED FUNCTION: propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation
    """

    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X) + b)                                       # compute activation
    cost = (- 1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A)))  # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = (1 / m) * np.dot(X, (A - Y).T)
    db = (1 / m) * np.sum(A - Y)
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost

Test:

w, b, X, Y = np.array([[1], [2]]), 2, np.array([[1,2], [3,4]]), np.array([[1, 0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

Output:

dw = [[ 0.99993216]
[ 1.99980262]]
db = 0.499935230625
cost = 6.00006477319

4.4 Optimization

We now optimize w and b with gradient descent.

# GRADED FUNCTION: optimize

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """

    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ###
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw  # need to broadcast
        b = b - learning_rate * db
        ### END CODE HERE ###

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs

Test:

params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))

Output:

w = [[ 0.1124579 ]
[ 0.23106775]]
b = 1.55930492484
dw = [[ 0.90158428]
[ 1.76250842]]
db = 0.430462071679

Next we implement the prediction function, which:

  • computes \(\hat{Y} = A = \sigma(w^T X + b)\)
  • converts each activation into 0 (if the activation is <= 0.5) or 1 (if it is > 0.5) and stores the result in Y_prediction
# GRADED FUNCTION: predict

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''

    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)
    ### END CODE HERE ###

    for i in range(A.shape[1]):
        # Convert probabilities a[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
        ### END CODE HERE ###

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
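
As a side note, the element-wise loop over A can also be written as a single vectorized line; this is just an equivalent alternative sketch, not the graded solution:

# vectorized thresholding (equivalent to the loop above)
Y_prediction = (A > 0.5).astype(float)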


5. Putting the functions together in model

We now assemble the functions implemented above into model() to build the full model.

# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """

    ### START CODE HERE ###
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train" : Y_prediction_train,
         "w" : w,
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}

    return d

Train the model:

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

Output:

Cost after iteration 0: 0.693147
Cost after iteration 100: 0.584508
Cost after iteration 200: 0.466949
Cost after iteration 300: 0.376007
Cost after iteration 400: 0.331463
Cost after iteration 500: 0.303273
Cost after iteration 600: 0.279880
Cost after iteration 700: 0.260042
Cost after iteration 800: 0.242941
Cost after iteration 900: 0.228004
Cost after iteration 1000: 0.214820
Cost after iteration 1100: 0.203078
Cost after iteration 1200: 0.192544
Cost after iteration 1300: 0.183033
Cost after iteration 1400: 0.174399
Cost after iteration 1500: 0.166521
Cost after iteration 1600: 0.159305
Cost after iteration 1700: 0.152667
Cost after iteration 1800: 0.146542
Cost after iteration 1900: 0.140872
train accuracy: 99.04306220095694 %
test accuracy: 70.0 %

The training accuracy is close to 100%, which shows that the model works and has enough capacity to fit the training data. The test accuracy is 70%; given the small dataset and the fact that logistic regression is a linear classifier, this simple model is actually not bad. Don't worry, you will build an even better classifier next week!

You can also see that the model is clearly overfitting the training data. Later you will learn how to reduce overfitting, for example with regularization; a small preview sketch is shown below.
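
One common technique is L2 regularization, which adds a penalty on the size of w to the cost. The sketch below shows how propagate() could be modified; the lambd hyperparameter and this variant are illustrative only and are not part of this assignment:

# illustrative only: L2-regularized cost and gradients (lambd is a hypothetical hyperparameter)
def propagate_l2(w, b, X, Y, lambd=0.1):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) \
           + (lambd / (2 * m)) * np.sum(np.square(w))       # penalty on the size of w
    dw = (1 / m) * np.dot(X, (A - Y).T) + (lambd / m) * w   # the penalty also enters dw
    db = (1 / m) * np.sum(A - Y)                             # b is usually not regularized
    return {"dw": dw, "db": db}, np.squeeze(cost)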

Plot how the cost evolves:

costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

You can see that the cost keeps decreasing, which shows that the parameters are indeed being learned.


6. Further analysis

Choice of learning rate

For gradient descent to work well, the learning rate must be chosen carefully: it determines how quickly we update the parameters. If it is too large we may overshoot the optimal value; if it is too small we need many more iterations, which slows learning down.

Let's compare how different learning rates affect the model:

learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

Output:

Different learning rates give different costs, and therefore different predictions.

If the learning rate is too large (0.01), the cost may oscillate up and down and can even diverge.

A lower cost does not necessarily mean a better model; you have to check whether the model is overfitting, which happens when the training accuracy is much higher than the test accuracy.

In deep learning, we usually recommend that you:

  • choose the learning rate that better minimizes the cost function;
  • if your model overfits, use other techniques to reduce the overfitting.


7. Testing with your own image

I grabbed a picture from the web:

Run the prediction:

from PIL import Image
# Load the image and resize it to (num_px, num_px, 3)
my_image = Image.open('my.jpeg')
my_image = my_image.convert("RGB")
my_image = my_image.resize((num_px, num_px), Image.ANTIALIAS)

# Convert it to a numpy array
image = np.array(my_image)
my_image = image.reshape((1, num_px*num_px*3)).T

# Predict
my_predicted_image = predict(d["w"], d["b"], my_image)

# Show the preprocessed image
plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")

Output: