注意力机制代码详解（涨点技巧：注意力机制—Yolov8引入CBAM、GAM、Resnet_CBAM）

时间2025-07-10 02:29:11分类IT科技浏览4773

导读： 1.计算机视觉中的注意力机制一般来说，注意力机制通常被分为以下基本四大类：...

1.计算机视觉中的注意力机制

一般来说，注意力机制通常被分为以下基本四大类：

通道注意力 Channel Attention

空间注意力机制 Spatial Attention

时间注意力机制 Temporal Attention

分支注意力机制 Branch Attention

1.1.CBAM：通道注意力和空间注意力的集成者

轻量级的卷积注意力模块，它结合了通道和空间的注意力机制模块

论文题目：《CBAM: Convolutional Block Attention Module》

论文地址： https://arxiv.org/pdf/1807.06521.pdf

上图可以看到，CBAM包含CAM（Channel Attention Module）和SAM（Spartial Attention Module）两个子模块，分别进行通道和空间上的Attention。这样不只能够节约参数和计算力，并且保证了其能够做为即插即用的模块集成到现有的网络架构中去。

1.2 GAM：Global Attention Mechanism

超越CBAM，全新注意力GAM：不计成本提高精度！

论文题目：Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions

论文地址：https://paperswithcode.com/paper/global-attention-mechanism-retain-information

从整体上可以看出，GAM和CBAM注意力机制还是比较相似的，同样是使用了通道注意力机制和空间注意力机制。但是不同的是对通道注意力和空间注意力的处理。

1.3 ResBlock_CBAM

CBAM结构其实就是将通道注意力信息核空间注意力信息在一个block结构中进行运用。

在resnet中实现cbam：即在原始block和残差结构连接前，依次通过channel attention和spatial attention即可。

1.4性能评价

2.Yolov8加入CBAM、GAM

2.1 CBAM加入modules.py中（相当于yolov5中的common.py）

2.2 GAM_Attention加入modules.py中：

def channel_shuffle(x, groups=2): ##shuffle channel # RESHAPE----->transpose------->Flatten B, C, H, W = x.size() out = x.view(B, groups, C // groups, H, W).permute(0, 2, 1, 3, 4).contiguous() out = out.view(B, C, H, W) return out class GAM_Attention(nn.Module): # https://paperswithcode.com/paper/global-attention-mechanism-retain-information def __init__(self, c1, c2, group=True, rate=4): super(GAM_Attention, self).__init__() self.channel_attention = nn.Sequential( nn.Linear(c1, int(c1 / rate)), nn.ReLU(inplace=True), nn.Linear(int(c1 / rate), c1) ) self.spatial_attention = nn.Sequential( nn.Conv2d(c1, c1 // rate, kernel_size=7, padding=3, groups=rate) if group else nn.Conv2d(c1, int(c1 / rate), kernel_size=7, padding=3), nn.BatchNorm2d(int(c1 / rate)), nn.ReLU(inplace=True), nn.Conv2d(c1 // rate, c2, kernel_size=7, padding=3, groups=rate) if group else nn.Conv2d(int(c1 / rate), c2, kernel_size=7, padding=3), nn.BatchNorm2d(c2) ) def forward(self, x): b, c, h, w = x.shape x_permute = x.permute(0, 2, 3, 1).view(b, -1, c) x_att_permute = self.channel_attention(x_permute).view(b, h, w, c) x_channel_att = x_att_permute.permute(0, 3, 1, 2) # x_channel_att=channel_shuffle(x_channel_att,4) #last shuffle x = x * x_channel_att x_spatial_att = self.spatial_attention(x).sigmoid() x_spatial_att = channel_shuffle(x_spatial_att, 4) # last shuffle out = x * x_spatial_att # out=channel_shuffle(out,4) #last shuffle return out

2.3 ResBlock_CBAM加入modules.py中：

class ResBlock_CBAM(nn.Module): def __init__(self, in_places, places, stride=1, downsampling=False, expansion=4): super(ResBlock_CBAM, self).__init__() self.expansion = expansion self.downsampling = downsampling self.bottleneck = nn.Sequential( nn.Conv2d(in_channels=in_places, out_channels=places, kernel_size=1, stride=1, bias=False), nn.BatchNorm2d(places), nn.LeakyReLU(0.1, inplace=True), nn.Conv2d(in_channels=places, out_channels=places, kernel_size=3, stride=stride, padding=1, bias=False), nn.BatchNorm2d(places), nn.LeakyReLU(0.1, inplace=True), nn.Conv2d(in_channels=places, out_channels=places * self.expansion, kernel_size=1, stride=1, bias=False), nn.BatchNorm2d(places * self.expansion), ) self.cbam = CBAM(c1=places * self.expansion, c2=places * self.expansion, ) if self.downsampling: self.downsample = nn.Sequential( nn.Conv2d(in_channels=in_places, out_channels=places * self.expansion, kernel_size=1, stride=stride, bias=False), nn.BatchNorm2d(places * self.expansion) ) self.relu = nn.ReLU(inplace=True) def forward(self, x): residual = x out = self.bottleneck(x) out = self.cbam(out) if self.downsampling: residual = self.downsample(x) out += residual out = self.relu(out) return out

2.4 CBAM、GAM_Attention、ResBlock_CBAM加入tasks.py中（相当于yolov5中的yolo.py）

from ultralytics.nn.modules import (C1, C2, C3, C3TR, SPP, SPPF, Bottleneck, BottleneckCSP, C2f, C3Ghost, C3x, Classify, Concat, Conv, ConvTranspose, Detect, DWConv, DWConvTranspose2d, Ensemble, Focus, GhostBottleneck, GhostConv, Segment,CBAM, GAM_Attention , ResBlock_CBAM)

def parse_model(d, ch, verbose=True):函数中

if m in (Classify, Conv, ConvTranspose, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, Focus, BottleneckCSP, C1, C2, C2f, C3, C3TR, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x , CBAM , GAM_Attention ,ResBlock_CBAM):

2.4 CBAM、GAM修改对应yaml

2.4.1 CBAM加入yolov8

# Ultralytics YOLO 🚀, GPL-3.0 license # YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect # Parameters nc: 80 # number of classes scales: # model compound scaling constants, i.e. model=yolov8n.yaml will call yolov8.yaml with scale n # [depth, width, max_channels] n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs # YOLOv8.0n backbone backbone: # [from, repeats, module, args] - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2 - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4 - [-1, 3, C2f, [128, True]] - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8 - [-1, 6, C2f, [256, True]] - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16 - [-1, 6, C2f, [512, True]] - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32 - [-1, 3, C2f, [1024, True]] - [-1, 1, SPPF, [1024, 5]] # 9 # YOLOv8.0n head head: - [-1, 1, nn.Upsample, [None, 2, nearest]] - [[-1, 6], 1, Concat, [1]] # cat backbone P4 - [-1, 3, C2f, [512]] # 12 - [-1, 1, CBAM, [512]] - [-1, 1, nn.Upsample, [None, 2, nearest]] - [[-1, 4], 1, Concat, [1]] # cat backbone P3 - [-1, 3, C2f, [256]] # 16 (P3/8-small) - [-1, 1, CBAM, [256]] - [-1, 1, Conv, [256, 3, 2]] - [[-1, 13], 1, Concat, [1]] # cat head P4 - [-1, 3, C2f, [512]] # 20 (P4/16-medium) - [-1, 1, CBAM, [512]] - [-1, 1, Conv, [512, 3, 2]] - [[-1, 9], 1, Concat, [1]] # cat head P5 - [-1, 3, C2f, [1024]] # 24 (P5/32-large) - [-1, 1, CBAM, [1024]] - [[17, 21, 25], 1, Detect, [nc]] # Detect(P3, P4, P5)

2.4.2 GAM加入yolov8

# Ultralytics YOLO 🚀, GPL-3.0 license # YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect # Parameters nc: 80 # number of classes scales: # model compound scaling constants, i.e. model=yolov8n.yaml will call yolov8.yaml with scale n # [depth, width, max_channels] n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs # YOLOv8.0n backbone backbone: # [from, repeats, module, args] - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2 - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4 - [-1, 3, C2f, [128, True]] - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8 - [-1, 6, C2f, [256, True]] - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16 - [-1, 6, C2f, [512, True]] - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32 - [-1, 3, C2f, [1024, True]] - [-1, 1, SPPF, [1024, 5]] # 9 # YOLOv8.0n head head: - [-1, 1, nn.Upsample, [None, 2, nearest]] - [[-1, 6], 1, Concat, [1]] # cat backbone P4 - [-1, 3, C2f, [512]] # 12 - [-1, 1, GAM_Attention, [512,512]] - [-1, 1, nn.Upsample, [None, 2, nearest]] - [[-1, 4], 1, Concat, [1]] # cat backbone P3 - [-1, 3, C2f, [256]] # 16 (P3/8-small) - [-1, 1, GAM_Attention, [256,256]] - [-1, 1, Conv, [256, 3, 2]] - [[-1, 13], 1, Concat, [1]] # cat head P4 - [-1, 3, C2f, [512]] # 20 (P4/16-medium) - [-1, 1, GAM_Attention, [512,512]] - [-1, 1, Conv, [512, 3, 2]] - [[-1, 9], 1, Concat, [1]] # cat head P5 - [-1, 3, C2f, [1024]] # 24 (P5/32-large) - [-1, 1, GAM_Attention, [1024,1024]] - [[17, 21, 25], 1, Detect, [nc]] # Detect(P3, P4, P5)

声明：本站所有文章，如无特殊说明或标注，均为本站原创发布。任何个人或组织，在未征得本站同意时，禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益，可联系我们进行处理。