Deep Learning-03-Linear Neural Networks-Softmax Regression

Gls

2024-03-19

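3.4 Softmax Regression

We previously introduced linear regression.

Regression answers "how much?" or "how many?" questions, such as predicting the price at which a house will sell or the number of games a baseball team might win.

There is also another kind of problem, classification, which asks not "how much?" but "which one?":

Does a given email belong in the spam folder?
Does a given image depict a donkey, a dog, a cat, or a chicken?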

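3.4.1 Classification Problems

Here is a simple way to represent categorical data: one-hot encoding.

A one-hot encoding is a vector with as many components as there are classes. The component corresponding to the example's class is set to 1 and all other components are set to 0.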
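
As a minimal, dependency-free sketch of the encoding just described, the helper below builds one-hot vectors as plain Python lists. The function name `one_hot` and the three-class label set are assumptions chosen for illustration; they do not come from the original notes.

```python
def one_hot(index, num_classes):
    """Return a list with 1.0 at position `index` and 0.0 elsewhere."""
    if not 0 <= index < num_classes:
        raise ValueError(f"index {index} is outside [0, {num_classes})")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


# Hypothetical label set; the order of the classes is an arbitrary choice.
classes = ["cat", "chicken", "dog"]
encoded = {name: one_hot(i, len(classes)) for i, name in enumerate(classes)}

for name, vector in encoded.items():
    print(name, vector)
```

Each vector has exactly one nonzero entry, so its components sum to 1, which is what lets it be read as a (degenerate) probability distribution over the classes.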

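3.4.2 Network Architecture

To estimate the conditional probabilities of all possible classes, we need a model with multiple outputs, one per class.

Conditional probability: the probability that y = cat given the example x, written P(y = cat | x).
One output per class: each output is a scalar that, once normalized, represents the probability of one class (for example, "cat"), so the outputs for all classes together form a vector.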

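3.4.3 Parameterization Cost of Fully Connected Layers

Specifically, for any fully connected layer with d inputs and q outputs, the parameterization cost is O(dq), which can be prohibitively high in practice. Fortunately, the cost of transforming d inputs into q outputs can be reduced to O(dq/n), where the hyperparameter n can be chosen flexibly to balance parameter savings against model effectiveness in real applications.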
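
To make the O(dq) figure concrete, here is a small sketch that counts the learnable scalars of a fully connected layer, assuming the layer has a d-by-q weight matrix plus one bias per output. The helper name `dense_param_count` is hypothetical; the sizes 784 and 10 are the ones used for Fashion-MNIST later in these notes.

```python
def dense_param_count(d, q):
    """Learnable scalars in a fully connected layer: d*q weights plus q biases."""
    return d * q + q


count = dense_param_count(784, 10)  # 784 * 10 weights + 10 biases
print(count)
```

The weight matrix dominates the total, which is why the cost is usually quoted as O(dq).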

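3.4.4 The Softmax Operation

$$\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{o}), \quad \text{where} \quad \hat{y}_i = \frac{\exp(o_i)}{\sum_{k} \exp(o_k)}$$

The softmax function takes a vector as input and returns a vector of the same length whose entries are nonnegative and sum to 1.

The softmax operation does not change the ordering among the unnormalized predictions o; it only determines the probability assigned to each class.

So although softmax is a nonlinear function, the outputs of softmax regression are still determined by an affine transformation of the input features, and softmax regression is therefore a linear model.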
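
The ordering claim can be checked with a few lines of plain Python. This is only an illustrative sketch: the scores in `o` are arbitrary, and the helper `ranking` is a hypothetical name introduced here to compare orderings.

```python
import math


def softmax(o):
    """Map a list of unnormalized scores to probabilities."""
    exps = [math.exp(v) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def ranking(v):
    """Indices of v sorted from smallest to largest value."""
    return sorted(range(len(v)), key=lambda i: v[i])


o = [2.0, -1.0, 0.5]  # arbitrary unnormalized predictions, one of them negative
y_hat = softmax(o)

print(y_hat)
print(ranking(o) == ranking(y_hat))  # the ordering is unchanged
```

Because exp is strictly increasing and every entry is divided by the same positive total, the largest score always receives the largest probability.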

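3.4.5 Vectorization for Minibatches

Previously we passed one example at a time to the model, which is too slow. Can we pass a whole batch of examples at once and compute on them with vectorized operations? Yes.

$$\mathbf{O} = \mathbf{X}\mathbf{W} + \mathbf{b}, \qquad \hat{\mathbf{Y}} = \mathrm{softmax}(\mathbf{O})$$

Here X holds one example per row, and the softmax is applied to each row of O separately.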
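
A short PyTorch sketch of the minibatch computation follows. The sizes n, d, and q and the random tensors are assumptions made up for the illustration; the point is only that one matrix product gives the same result as looping over examples.

```python
import torch

torch.manual_seed(0)   # only to make the sketch reproducible
n, d, q = 4, 5, 3      # batch size, number of features, number of classes (arbitrary)
X = torch.randn(n, d)  # one example per row
W = torch.randn(d, q)
b = torch.zeros(q)

O = X @ W + b                    # a single matrix product, shape (n, q)
Y_hat = torch.softmax(O, dim=1)  # softmax applied to each row

# The same computation, one example at a time
O_loop = torch.stack([X[i] @ W + b for i in range(n)])
print(O.shape, torch.allclose(O, O_loop, atol=1e-6))
```

The bias b has shape (q,) and is broadcast across the n rows of the product.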

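3.4.6 Loss Function

$$l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{i=1}^{q} y_i \log \hat{y}_i$$

This loss is called the cross-entropy loss.

Because y is a one-hot vector of length q, the sum keeps only the term for which y_i = 1, which picks out the corresponding prediction \hat{y}_i.

The loss therefore simplifies to the negative log of the probability predicted for the true class.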
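
The simplification can be verified numerically. In this sketch the label and the predicted probabilities are made-up values; `full` evaluates the sum over all classes and `simplified` keeps only the true-class term.

```python
import math

y = [0.0, 1.0, 0.0]      # one-hot label: the true class is index 1
y_hat = [0.1, 0.7, 0.2]  # hypothetical predicted probabilities

full = -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, y_hat))
simplified = -math.log(y_hat[y.index(1.0)])

print(full, simplified)
```

Both expressions give the same number, because every term with y_i = 0 contributes nothing to the sum.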

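3.4.6.2 Softmax and Its Derivative

$$\frac{\partial\, l(\mathbf{y}, \hat{\mathbf{y}})}{\partial o_i} = \frac{\exp(o_i)}{\sum_{k} \exp(o_k)} - y_i = \mathrm{softmax}(\mathbf{o})_i - y_i$$

The derivative is the difference between the probability assigned by our softmax model and what actually happened, as expressed by the one-hot label vector.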
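
One way to gain confidence in this statement is a finite-difference check. The sketch below, in plain Python with arbitrary example values, compares softmax(o) - y against a central-difference estimate of the gradient of the cross-entropy loss with respect to o.

```python
import math


def softmax(o):
    exps = [math.exp(v) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(o, y):
    """Cross-entropy between the one-hot label y and softmax(o)."""
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, softmax(o)))


o = [1.0, 2.0, 0.5]  # arbitrary unnormalized predictions
y = [0.0, 0.0, 1.0]  # one-hot label: the true class is index 2

# Claimed gradient: predicted probability minus label
analytic = [p_i - y_i for p_i, y_i in zip(softmax(o), y)]

# Central-difference estimate of each partial derivative
eps = 1e-6
numeric = []
for i in range(len(o)):
    up, down = list(o), list(o)
    up[i] += eps
    down[i] -= eps
    numeric.append((cross_entropy(up, y) - cross_entropy(down, y)) / (2 * eps))

max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_gap)
```

The two gradients agree up to the error of the finite-difference approximation.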

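3.4.8 Model Prediction and Evaluation

After training the softmax regression model, given the features of any example, we can predict the probability of each output class.

We normally use the class with the highest predicted probability as the output class. The prediction is correct if it matches the actual class (the label).

In the experiments that follow, we use accuracy to evaluate the model's performance. Accuracy is the ratio of the number of correct predictions to the total number of predictions.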

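3.4.9 Summary

The softmax operation takes a vector and maps it to probabilities.
Softmax regression applies to classification problems; it uses the probability distribution over output classes produced by the softmax operation.
Cross-entropy is a good measure of the difference between two probability distributions; it measures the number of bits needed to encode the data given our model.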

3.6 Implementing Softmax Regression from Scratch

We load the Fashion-MNIST dataset and set the batch size of the data iterators to 256.

import torch
from IPython import display
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

3.6.1 Initializing Model Parameters

num_inputs = 784
num_outputs = 10

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

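3.6.2 Defining the Softmax Operation

# Softmax operation
# params: X, a minibatch of unnormalized predictions, one row per example
# return: a matrix of probabilities, each row summing to 1
def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # broadcasting divides each row by its own sum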

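Summing along dimension 1 adds up the entries of each row, which collapses the columns so that only a single column remains, holding one sum per row.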
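
The effect of the dimension argument and of keepdim=True is easiest to see on a tiny matrix. The following sketch repeats the softmax function from above so that it runs on its own; the 2-by-3 input is an arbitrary example.

```python
import torch


def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)  # shape (rows, 1)
    return X_exp / partition


X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(X.sum(0, keepdim=True))  # sums down each column, shape (1, 3)
print(X.sum(1, keepdim=True))  # sums across each row, shape (2, 1)

X_prob = softmax(X)
print(X_prob.sum(1))           # each row sums to 1, up to rounding
```

Keeping the summed dimension (shape (2, 1) rather than (2,)) is what lets the division broadcast row by row.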

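3.6.3 Defining the Model

# Model
# params: X, a minibatch of inputs
# return: the softmax output, which is again a matrix
def net(X):
    # Debug aid: uncomment to print the unnormalized predictions on every call
    # print(torch.matmul(X.reshape((-1, W.shape[0])), W).detach().numpy())
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)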

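Tip: why is the softmax operation needed?

When we predict on an example, what we ultimately want is a prediction vector, that is, a vector of probabilities. A prediction vector that has not gone through softmax is called an unnormalized prediction vector. Can we use the unnormalized vector directly as the output? No.

On the one hand, we need these numbers to sum to 1; on the other hand, they may be negative. That is why we apply the softmax operation.

We can print the unnormalized matrix to inspect its values: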

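import numpy as np

X, y = next(iter(test_iter))  # take one minibatch to inspect
np.set_printoptions(threshold=np.inf)  # print the full matrix without truncation
print(torch.matmul(X.reshape((-1, W.shape[0])), W).detach().numpy())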

The raw values can be negative and the rows do not sum to 1, so the softmax operation is needed.

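3.6.4 Defining the Loss Function

Tip: because of the one-hot encoding, the loss for a single example reduces to one number.

Since a minibatch contains batch_size examples, the result is a vector of length batch_size.

# Cross-entropy loss
# params: y_hat, the matrix of predictions; y, a label vector of length batch_size
def cross_entropy(y_hat, y):
    # len(y_hat) is the number of rows in the matrix, i.e. batch_size
    return - torch.log(y_hat[range(len(y_hat)), y])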
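
The indexing expression is the step that is easy to misread, so here is a small standalone sketch. The two-example `y_hat` and `y` are made-up values; `picked` is a hypothetical intermediate name that shows which probabilities the index selects.

```python
import torch


def cross_entropy(y_hat, y):
    return -torch.log(y_hat[range(len(y_hat)), y])


y = torch.tensor([0, 2])                   # true classes of two examples
y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])    # predicted probabilities

# Row i, column y[i]: the probability assigned to each example's true class
picked = y_hat[[0, 1], y]
losses = cross_entropy(y_hat, y)

print(picked)   # 0.1 for the first example, 0.5 for the second
print(losses)   # -log(0.1) and -log(0.5)
```

The result has one entry per example, matching the note above that the loss over a minibatch is a vector of length batch_size.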

3.6.5 Classification Accuracy

# params: y_hat is a matrix (or a vector) of predictions, y is a vector of labels
# return: the number of correct predictions
def accuracy(y_hat, y):  #@save
    """Compute the number of correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:  # y_hat is a matrix
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())
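
A quick usage sketch, with the function repeated so the snippet runs on its own. The predictions and labels are made-up values: the first example is misclassified and the second is correct.

```python
import torch


def accuracy(y_hat, y):
    """Compute the number of correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())


y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6],   # predicts class 2, label is 0: wrong
                      [0.3, 0.2, 0.5]])  # predicts class 2, label is 2: right

num_correct = accuracy(y_hat, y)
print(num_correct / len(y))  # fraction of correct predictions
```

Note that the function returns a count, not a ratio; dividing by the number of examples gives the accuracy.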

Here we define a utility class, Accumulator, for accumulating sums over multiple variables.

# Stores running sums
class Accumulator:  #@save
    """Accumulate sums over n variables."""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
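
A brief usage sketch of the class, repeated here so the snippet is self-contained. The numbers passed to `add` are invented: they stand for (correct predictions, total predictions) in two minibatches.

```python
class Accumulator:
    """Accumulate sums over n variables."""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


metric = Accumulator(2)  # slot 0: correct predictions, slot 1: total predictions
metric.add(3, 4)         # first minibatch: 3 correct out of 4
metric.add(5, 6)         # second minibatch: 5 correct out of 6
print(metric[0] / metric[1])  # overall accuracy across both minibatches
```

Accumulating counts and dividing once at the end weights every example equally, even when the last minibatch is smaller than the others.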

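In the evaluate_accuracy function below, we create two variables in the Accumulator instance to store the number of correct predictions and the total number of predictions, respectively. Both accumulate as we iterate over the dataset.

# Compute the accuracy of a model on the given dataset
def evaluate_accuracy(net, data_iter):  #@save
    """Compute the accuracy of a model on the given dataset."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # set the model to evaluation mode
    metric = Accumulator(2)  # number of correct predictions, number of predictions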
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

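3.6.6 Training

Before showing the implementation of the training function, we define a utility class, Animator, that plots data in an animation.

# Plot data in an animation
class Animator:  #@save
    """Plot data in an animation."""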
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        # Incrementally plot multiple lines
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        # Use a lambda function to capture the arguments
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # Add multiple data points to the figure
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        d2l.plt.draw()
        # d2l.plt.show()
        d2l.plt.pause(0.001)
        display.clear_output(wait=True)

def train_epoch_ch3(net, train_iter, loss, updater):  #@save
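    """Train the model for one epoch (defined in Chapter 3)."""
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, number of correct predictions, number of examples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Use PyTorch's built-in optimizer and loss function
            updater.zero_grad()
            # Taking the mean of the loss also divides the gradient by batch_size,
            # so batch_size need not be passed when the updater updates the parameters
            l.mean().backward()
            updater.step()
        else:
            # Use the custom optimizer and loss function
            l.sum().backward()
            updater(X.shape[0])  # X.shape[0] is batch_size, used to divide the gradient
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return the training loss and training accuracy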
    return metric[0] / metric[2], metric[1] / metric[2]

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  #@save
    """Train the model (defined in Chapter 3)."""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

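As for optimizing the model's parameters, we can again use minibatch stochastic gradient descent to minimize the loss.

lr = 0.1  # learning rate

def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)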
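
For readers without the d2l package at hand, the following is a minimal sketch of what a minibatch SGD step of this kind does, assuming the gradients were produced from a summed loss and must therefore be divided by the batch size. The local function `sgd` is an illustrative stand-in, not the library's source, and the toy parameter `w` and its loss are made up.

```python
import torch


def sgd(params, lr, batch_size):
    """Illustrative minibatch SGD step: move each parameter against its averaged gradient."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()  # clear the gradient for the next iteration


w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()  # toy loss whose gradient is 2 * w = [2, 4]
loss.backward()
sgd([w], lr=0.1, batch_size=2)

# Each entry moved by lr * grad / batch_size: 1 - 0.1 * 2 / 2 and 2 - 0.1 * 4 / 2
print(w)
```

This division is why train_epoch_ch3 calls l.sum().backward() on the custom path and passes X.shape[0] to the updater.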

Complete code

import numpy as np
import torch
from IPython import display
from d2l import torch as d2l
from d2l.torch import Accumulator


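# Softmax operation
# params: X, a minibatch of unnormalized predictions, one row per example
# return: a matrix of probabilities, each row summing to 1
def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # broadcasting divides each row by its own sum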

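# Model
# params: X, a minibatch of inputs
# return: the softmax output, which is again a matrix
def net(X):
    # Debug aid: uncomment to print the unnormalized predictions on every call
    # print(torch.matmul(X.reshape((-1, W.shape[0])), W).detach().numpy())
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)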

# Cross-entropy loss
# params: y_hat, the matrix of predictions; y, a label vector of length batch_size
def cross_entropy(y_hat, y):
    # len(y_hat) is the number of rows in the matrix, i.e. batch_size
    return - torch.log(y_hat[range(len(y_hat)), y])

def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)

def train_epoch_ch3(net, train_iter, loss, updater):  #@save
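    """Train the model for one epoch (defined in Chapter 3)."""
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, number of correct predictions, number of examples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Use PyTorch's built-in optimizer and loss function
            updater.zero_grad()
            # Taking the mean of the loss also divides the gradient by batch_size,
            # so batch_size need not be passed when the updater updates the parameters
            l.mean().backward()
            updater.step()
        else:
            # Use the custom optimizer and loss function
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return the training loss and training accuracy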
    return metric[0] / metric[2], metric[1] / metric[2]

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  #@save
    """Train the model (defined in Chapter 3)."""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc


def predict_ch3(net, test_iter, n=6):  #@save
    """Predict labels (defined in Chapter 3)."""
    for X, y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(
        X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])
    d2l.plt.show()  # show the figure explicitly; otherwise it is not displayed

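# params: y_hat is a matrix (or a vector) of predictions, y is a vector of labels
# return: the number of correct predictions
def accuracy(y_hat, y):  #@save
    """Compute the number of correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:  # y_hat is a matrix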
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

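# Compute the accuracy of a model on the given dataset
def evaluate_accuracy(net, data_iter):  #@save
    """Compute the accuracy of a model on the given dataset."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # set the model to evaluation mode
    metric = Accumulator(2)  # number of correct predictions, number of predictions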
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

# Stores running sums
class Accumulator:  #@save
    """Accumulate sums over n variables."""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

# Plot data in an animation
class Animator:  #@save
    """Plot data in an animation."""
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        # Incrementally plot multiple lines
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        # Use a lambda function to capture the arguments
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # Add multiple data points to the figure
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        d2l.plt.draw()
        # d2l.plt.show()
        d2l.plt.pause(0.001)
        display.clear_output(wait=True)

if __name__ == '__main__':
    batch_size = 256
    # Training-set and test-set iterators
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

    for X, y in test_iter:
        break
    np.set_printoptions(threshold=np.inf)  # print full matrices without truncation
    # print(X.numpy())


    # Flatten each image into 784 features; there are 10 output classes
    num_inputs = 784
    num_outputs = 10
    # Initialize the parameters W and b
    W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
    b = torch.zeros(num_outputs, requires_grad=True)

    lr = 0.1  # learning rate
    num_epochs = 10   # number of epochs
    # Train the model
    train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)
    # Predict
    predict_ch3(net, test_iter)

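3.6.8 Summary

With softmax regression, we can train models for multiclass classification.
The training loop for softmax regression is very similar to that of linear regression: read the data, define the model and the loss function, then train the model with an optimization algorithm. Most common deep learning models have a similar training procedure.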

3.7 Concise Implementation of Softmax Regression

import torch
from torch import nn
from d2l import torch as d2l

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

if __name__ == '__main__':

    batch_size = 256
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
    # PyTorch does not implicitly reshape the inputs, so we place
    # a flatten layer before the linear layer to reshape the network input
    net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
    net.apply(init_weights)

    loss = nn.CrossEntropyLoss(reduction='none')

    trainer = torch.optim.SGD(net.parameters(), lr=0.1)

    num_epochs = 10
    d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

    d2l.predict_ch3(net, test_iter)



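    # Q: How is the softmax normalization implemented here? There seems to be no code that applies softmax to the network's output.
    # A: The network's unnormalized predictions are passed to the loss, and nn.CrossEntropyLoss applies softmax internally before computing the cross-entropy.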
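
The answer above can be checked directly. The sketch below uses made-up logits and labels and compares nn.CrossEntropyLoss on raw logits against an explicit softmax followed by the negative log of the true-class probability.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # unnormalized network outputs
labels = torch.tensor([0, 2])

# Built-in loss applied directly to the unnormalized outputs
builtin = nn.CrossEntropyLoss(reduction='none')(logits, labels)

# Manual route: softmax first, then the cross-entropy of the true class
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs[range(len(probs)), labels])

print(builtin)
print(manual)
```

The two results match, which is why the concise network ends with nn.Linear and has no explicit softmax layer.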

This article was reposted from: 学新通技术网
