CutMix: Principles and Code Walkthrough

00000cj

2024-03-28

Paper: CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

Preface

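Problems with earlier data augmentation methods:

mixup: the mixed images are locally ambiguous and unnatural, which confuses the model, especially for localization.

cutout: the cut-out region is usually filled with zeros or random noise, so the information in that region is wasted during training.

CutMix improves on cutout by filling the cut-out region with the corresponding patch from another image. This keeps cutout's advantage, namely that the model learns an object's features from partial views and pays more attention to its less discriminative parts. At the same time it is more efficient than cutout: because the removed region is filled with part of another image instead of uninformative pixels, the model learns the features of two objects at once.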
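To make the contrast concrete, here is a toy pure-Python sketch of how the three methods treat the same region. It is an illustration only, not code from the paper or torchvision: the function names are hypothetical, images are single-channel lists of lists, and the box is a fixed end-exclusive `(x1, y1, x2, y2)` tuple rather than a sampled one.

```python
def mixup(img_a, img_b, lam):
    """Mixup: blend every pixel of the two images, lam * A + (1 - lam) * B."""
    return [[lam * a + (1 - lam) * b for a, b in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]


def in_box(x, y, box):
    """True if pixel (x, y) lies inside box = (x1, y1, x2, y2), end-exclusive."""
    x1, y1, x2, y2 = box
    return x1 <= x < x2 and y1 <= y < y2


def cutout(img, box, fill=0.0):
    """Cutout: overwrite the box with a constant, discarding those pixels."""
    return [[fill if in_box(x, y, box) else v for x, v in enumerate(row)]
            for y, row in enumerate(img)]


def cutmix_pixels(img_a, img_b, box):
    """CutMix: fill the box with the pixels of img_b at the same location."""
    return [[img_b[y][x] if in_box(x, y, box) else v for x, v in enumerate(row)]
            for y, row in enumerate(img_a)]


# Two toy 4x4 single-channel "images": A is all 1s, B is all 2s.
img_a = [[1.0] * 4 for _ in range(4)]
img_b = [[2.0] * 4 for _ in range(4)]
box = (1, 1, 3, 3)  # central 2x2 region

for row in mixup(img_a, img_b, 0.5):
    print(row)  # every pixel becomes 1.5: a global blend of both images
for row in cutout(img_a, box):
    print(row)  # central pixels become 0.0: their information is lost
for row in cutmix_pixels(img_a, img_b, box):
    print(row)  # central pixels become 2.0: they now carry image B's content
```

Only the CutMix output keeps every pixel informative while leaving the rest of image A untouched.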

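As the figure below shows, although both Mixup and Cutout improve classification accuracy, their gains do not carry over to weakly supervised localization and object detection, where performance can even drop, whereas CutMix brings clear improvements on all of these tasks.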

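[Figure: Mixup, Cutout and CutMix compared on classification, weakly supervised localization and object detection]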

CutMix

The CutMix operation is defined as follows:

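\[
\tilde{x} = M \odot x_A + (\mathbf{1} - M) \odot x_B, \qquad \tilde{y} = \lambda y_A + (1 - \lambda) y_B
\]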

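where \((x_A, y_A)\) and \((x_B, y_B)\) are two training samples (image and label), \(\odot\) denotes element-wise multiplication, \(\mathbf{1}\) is a mask filled with ones, and \(M\in\{0,1\}^{W\times H}\) is a binary mask marking where a patch is dropped from one image and filled in from the other. As in mixup, \(\lambda\) is sampled from the \(\mathrm{Beta}(\alpha, \alpha)\) distribution; the authors set \(\alpha=1\), so \(\lambda\) is sampled from the uniform distribution on \((0,1)\).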
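The mixing step described above can be sketched for a single pair in pure Python. This is a minimal illustration under stated assumptions, not the authors' code: `sample_lambda` and `cutmix_combine` are hypothetical names, images are single-channel lists of lists, and labels are one-hot lists. In the worked example, \(\lambda\) is taken as the fraction of pixels kept from \(x_A\) so that it agrees with the mask.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """lambda ~ Beta(alpha, alpha); with alpha = 1 this is Uniform(0, 1)."""
    return rng.betavariate(alpha, alpha)


def cutmix_combine(x_a, x_b, mask, y_a, y_b, lam):
    """x~ = M * x_A + (1 - M) * x_B (element-wise) and y~ = lam * y_A + (1 - lam) * y_B.

    mask[y][x] == 1 keeps the pixel of x_A; mask[y][x] == 0 takes the pixel of x_B.
    y_a and y_b are one-hot label lists.
    """
    x_mix = [[m * a + (1 - m) * b for m, a, b in zip(mr, ar, br)]
             for mr, ar, br in zip(mask, x_a, x_b)]
    y_mix = [lam * ya + (1 - lam) * yb for ya, yb in zip(y_a, y_b)]
    return x_mix, y_mix


# 4x4 example: M is 0 on the central 2x2 patch, so 12 of 16 pixels come from x_A.
mask = [[0 if 1 <= y < 3 and 1 <= x < 3 else 1 for x in range(4)] for y in range(4)]
x_a = [[1.0] * 4 for _ in range(4)]
x_b = [[2.0] * 4 for _ in range(4)]
lam = sum(map(sum, mask)) / 16  # fraction of pixels kept from x_A
x_mix, y_mix = cutmix_combine(x_a, x_b, mask, [1.0, 0.0], [0.0, 1.0], lam)
print(y_mix)  # [0.75, 0.25]
```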

To obtain the mask, first sample the bounding box of the cut region, \(B=(r_{x},r_{y},r_{w},r_{h})\), as follows:

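\[
r_x \sim \mathrm{Unif}(0, W), \quad r_w = W\sqrt{1-\lambda}, \qquad r_y \sim \mathrm{Unif}(0, H), \quad r_h = H\sqrt{1-\lambda}
\]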

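This makes the cropped area ratio \(\frac{r_w r_h}{WH} = 1-\lambda\). In other words, \(\lambda\) determines the area ratio between the patch and the original image: the larger the region cut out of image A, the smaller A's weight in the mixed label. The binary mask \(M\) is then set to 0 inside \(B\) and 1 elsewhere.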
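A minimal pure-Python sketch of the box sampling and mask construction follows. Two assumptions go beyond the formulas: \((r_x, r_y)\) is treated as the box centre (which is how the torchvision code below uses it), and the box is clipped at the image border. The function names are hypothetical.

```python
import math
import random


def box_from_center(r_x, r_y, lam, W, H):
    """Box of size W*sqrt(1-lam) x H*sqrt(1-lam) centred at (r_x, r_y), clipped to the image.

    Returns integer corners (x1, y1, x2, y2), end-exclusive.
    """
    r_w = W * math.sqrt(1.0 - lam)
    r_h = H * math.sqrt(1.0 - lam)
    x1 = max(int(r_x - r_w / 2), 0)
    y1 = max(int(r_y - r_h / 2), 0)
    x2 = min(int(r_x + r_w / 2), W)
    y2 = min(int(r_y + r_h / 2), H)
    return x1, y1, x2, y2


def sample_box(lam, W, H, rng=random):
    """Draw r_x ~ Unif(0, W) and r_y ~ Unif(0, H), then build the clipped box."""
    return box_from_center(rng.uniform(0, W), rng.uniform(0, H), lam, W, H)


def box_mask(box, W, H):
    """M = 0 inside the box (pixels from x_B) and 1 elsewhere (pixels kept from x_A)."""
    x1, y1, x2, y2 = box
    return [[0 if x1 <= x < x2 and y1 <= y < y2 else 1 for x in range(W)] for y in range(H)]


# lam = 0.75 on a 32x32 image with the box centred: side 32 * sqrt(0.25) = 16.
box = box_from_center(16, 16, 0.75, 32, 32)
mask = box_mask(box, 32, 32)
print(box)                              # (8, 8, 24, 24)
print(sum(map(sum, mask)) / (32 * 32))  # 0.75, so the patch covers 1 - lam of the image
```

When the sampled centre lies near a border, clipping makes the patch smaller than \(1-\lambda\) of the image, which is why implementations recompute \(\lambda\) from the actual box area.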

Code Implementation

Below is the official implementation from torchvision's classification reference scripts:

import math
from typing import Tuple

import torch
from torch import Tensor
from torchvision.transforms import functional as F


class RandomCutmix(torch.nn.Module):
    """Randomly apply Cutmix to the provided batch and targets.
    The class implements the data augmentations as described in the paper
    `"CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features"
    <https://arxiv.org/abs/1905.04899>`_.

    Args:
        num_classes (int): number of classes used for one-hot encoding.
        p (float): probability of the batch being transformed. Default value is 0.5.
        alpha (float): hyperparameter of the Beta distribution used for cutmix.
            Default value is 1.0.
        inplace (bool): boolean to make this transform inplace. Default set to False.
    """

    def __init__(self, num_classes: int, p: float = 0.5, alpha: float = 1.0, inplace: bool = False) -> None:
        super().__init__()
        if num_classes < 1:
            raise ValueError("Please provide a valid positive value for the num_classes.")
        if alpha <= 0:
            raise ValueError("Alpha param can't be zero.")

        self.num_classes = num_classes
        self.p = p
        self.alpha = alpha
        self.inplace = inplace

    def forward(self, batch: Tensor, target: Tensor) -> Tuple[Tensor, Tensor]:
        """
        Args:
            batch (Tensor): Float tensor of size (B, C, H, W)
            target (Tensor): Integer tensor of size (B, )

        Returns:
            Tensor: Randomly transformed batch.
        """
        if batch.ndim != 4:
            raise ValueError(f"Batch ndim should be 4. Got {batch.ndim}")
        if target.ndim != 1:
            raise ValueError(f"Target ndim should be 1. Got {target.ndim}")
        if not batch.is_floating_point():
            raise TypeError(f"Batch dtype should be a float tensor. Got {batch.dtype}.")
        if target.dtype != torch.int64:
            raise TypeError(f"Target dtype should be torch.int64. Got {target.dtype}")

        if not self.inplace:
            batch = batch.clone()
            target = target.clone()

        # Convert integer labels to one-hot vectors so they can be mixed.
        if target.ndim == 1:
            target = torch.nn.functional.one_hot(target, num_classes=self.num_classes).to(dtype=batch.dtype)

        # With probability 1 - p the images are left unchanged (labels are still one-hot).
        if torch.rand(1).item() >= self.p:
            return batch, target

        # It's faster to roll the batch by one instead of shuffling it to create image pairs
        batch_rolled = batch.roll(1, 0)
        target_rolled = target.roll(1, 0)

        # Implemented as on cutmix paper, page 12 (with minor corrections on typos).
        # The first component of a Dirichlet(alpha, alpha) sample is Beta(alpha, alpha) distributed.
        lambda_param = float(torch._sample_dirichlet(torch.tensor([self.alpha, self.alpha]))[0])
        _, H, W = F.get_dimensions(batch)

        # Box centre (r_x, r_y); the full box side is sqrt(1 - lambda) times the image side.
        r_x = torch.randint(W, (1,))
        r_y = torch.randint(H, (1,))

        r = 0.5 * math.sqrt(1.0 - lambda_param)
        r_w_half = int(r * W)
        r_h_half = int(r * H)

        # Clip the box to the image boundaries.
        x1 = int(torch.clamp(r_x - r_w_half, min=0))
        y1 = int(torch.clamp(r_y - r_h_half, min=0))
        x2 = int(torch.clamp(r_x + r_w_half, max=W))
        y2 = int(torch.clamp(r_y + r_h_half, max=H))

        # Paste the patch from the paired image, then recompute lambda from the clipped box area.
        batch[:, :, y1:y2, x1:x2] = batch_rolled[:, :, y1:y2, x1:x2]
        lambda_param = float(1.0 - (x2 - x1) * (y2 - y1) / (W * H))

        # Mix labels: lambda * y_A + (1 - lambda) * y_B.
        target_rolled.mul_(1.0 - lambda_param)
        target.mul_(lambda_param).add_(target_rolled)

        return batch, target

    def __repr__(self) -> str:
        s = (
            f"{self.__class__.__name__}("
            f"num_classes={self.num_classes}"
            f", p={self.p}"
            f", alpha={self.alpha}"
            f", inplace={self.inplace}"
            f")"
        )
        return s
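The box arithmetic, the lambda recomputation and the roll-by-one pairing in `forward` can be checked without torch. The sketch below is a hypothetical pure-Python mirror of those lines, not part of torchvision: the centre `(r_x, r_y)` is passed in instead of sampled so the results are deterministic, and labels are plain one-hot lists.

```python
import math


def torchvision_style_box(r_x, r_y, lam, W, H):
    """Mirror of the box arithmetic in RandomCutmix.forward (pure Python, no torch)."""
    r = 0.5 * math.sqrt(1.0 - lam)
    r_w_half, r_h_half = int(r * W), int(r * H)
    x1, y1 = max(r_x - r_w_half, 0), max(r_y - r_h_half, 0)
    x2, y2 = min(r_x + r_w_half, W), min(r_y + r_h_half, H)
    # lambda is recomputed from the clipped box so the labels match the pasted area.
    lam_adj = 1.0 - (x2 - x1) * (y2 - y1) / (W * H)
    return (x1, y1, x2, y2), lam_adj


def roll_by_one(items):
    """Same pairing as tensor.roll(1, 0): sample i is paired with sample i - 1 (wrapping)."""
    return items[-1:] + items[:-1]


def mix_labels(one_hot, lam):
    """target * lam + target_rolled * (1 - lam), row by row."""
    rolled = roll_by_one(one_hot)
    return [[lam * a + (1.0 - lam) * b for a, b in zip(ya, yb)]
            for ya, yb in zip(one_hot, rolled)]


print(torchvision_style_box(16, 16, 0.75, 32, 32))  # ((8, 8, 24, 24), 0.75)
print(torchvision_style_box(0, 0, 0.75, 32, 32))    # ((0, 0, 8, 8), 0.9375)
print(mix_labels([[1, 0, 0], [0, 1, 0], [0, 0, 1]], 0.75))
# [[0.75, 0.0, 0.25], [0.25, 0.75, 0.0], [0.0, 0.25, 0.75]]
```

The second call shows why lambda is recomputed: a box centred at a corner loses three quarters of its area to clipping, so only 1/16 of the image is replaced and the label weight of the original image rises from 0.75 to 0.9375.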

Experimental Results

As the figure below shows, CutMix achieves higher accuracy on ImageNet than Cutout, Mixup and other data augmentation methods.

[Figure: ImageNet classification results]

In weakly supervised object localization, CutMix also outperforms Mixup and Cutout.

[Figure: weakly supervised object localization results]

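When the CutMix-trained model is used as a pretrained model for other downstream tasks such as object detection and image captioning, CutMix also gives the best results.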

[Figure: transfer results on object detection and image captioning]

This article is reposted from 学新通技术网.
