Non-Woven Fabric Defect Detection Based on the Combination of Swin Transformer and YOLOv5

doi:10.12422/j.issn.1672-6952.2024.03.011

Abstract

Defect detection for non-woven fabrics helps enterprises raise production efficiency and cut costs. However, CNN-based object detection algorithms are limited by the local nature of the convolution kernel: they lack global modeling of the image and perform poorly on defects whose scale varies widely. This paper therefore proposes a non-woven fabric defect detection method that combines Swin Transformer with YOLOv5. Swin Transformer encodes and decodes features through self-attention, which gives the network a larger receptive field and lets it make full use of contextual relationships. Its hierarchical, feature-pyramid-style structure closely resembles the design of the YOLOv5 neck, which helps the network predict targets on multi-scale feature maps. On this basis, the CBAM attention mechanism is introduced so that the network focuses more on important information; Mosaic and MixUp data augmentation enrich the data distribution and improve robustness; and the anchor sizes of the predicted bounding boxes are fine-tuned to make the regression more accurate. Experiments on a self-made dataset verify the effectiveness of the proposed method and show improved detection performance on non-woven fabric defects.

Keywords: Swin Transformer model, self-attention, CBAM attention mechanism, data augmentation, anchor size
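The MixUp step named in the abstract can be illustrated with a minimal sketch. It assumes the common detection variant in which two images are blended pixel-wise with a weight drawn from a Beta distribution and the box lists of both images are concatenated. The function name `mixup`, the default `alpha`, and the nested-list image format are illustrative assumptions, not the authors' implementation.

```python
import random

def mixup(img_a, boxes_a, img_b, boxes_b, alpha=8.0, rng=None):
    """Blend two equally sized grayscale images and merge their box lists.

    img_*   : list of rows, each row a list of pixel values
    boxes_* : list of (class_id, x_center, y_center, width, height) tuples
    alpha   : Beta(alpha, alpha) shape parameter (hypothetical default)
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight between 0 and 1
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    # For detection, the boxes of both source images remain valid targets.
    return mixed, boxes_a + boxes_b, lam

# Toy inputs: an all-black and an all-white 2x2 image with one box each.
img_a = [[0.0, 0.0], [0.0, 0.0]]
img_b = [[1.0, 1.0], [1.0, 1.0]]
mixed, boxes, lam = mixup(img_a, [(0, 0.5, 0.5, 0.2, 0.4)],
                          img_b, [(2, 0.3, 0.3, 0.05, 0.1)],
                          rng=random.Random(0))
```

Because the blended image keeps the annotations of both sources, small defects such as oil stains appear against more varied backgrounds, which is one plausible reading of how the augmentation enriches the data distribution.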

CLC number: TP391.1

Jiawei LIU, Jiangtao CAO, Xiaofei JI. Non-Woven Fabric Defect Detection Based on the Combination of Swin Transformer and YOLOv5[J]. Journal of Liaoning Petrochemical University, 2024, 44(3): 80-88.

Figures/Tables (14)

Fig.1 Swin Transformer backbone network architecture: (a) network architecture; (b) internal structure

Fig.2 Schematic diagram of the improved algorithm framework proposed in this paper

Fig.3 CBAM attention mechanism

Fig.4 Example of data augmentation mixing Mosaic and MixUp

Fig.5 Cloud spots, wrinkles, and oil stains in the self-made dataset

Table 1 Size statistics for each defect type

Defect type	$\bar{W}$/px	$\bar{H}$/px	$\bar{A}$/px²
Cloud spots	73.02	114.51	9 606.76
Wrinkles	25.30	256.67	6 735.77
Oil stains	7.97	18.35	155.59
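Table 1 hints at why generic anchor boxes may fit these defects poorly: the mean shapes are elongated and differ widely in scale. The short sketch below derives the height-to-width ratio of the mean box for each class from the table's values. The dictionary and variable names are illustrative, and a ratio of means is only a rough proxy for per-box statistics, which the table does not give.

```python
# Mean width and height in pixels, copied from Table 1.
mean_size = {
    "cloud spots": (73.02, 114.51),
    "wrinkles": (25.30, 256.67),
    "oil stains": (7.97, 18.35),
}

# Height-to-width ratio of the mean box for each defect class.
aspect = {name: h / w for name, (w, h) in mean_size.items()}

for name, ratio in aspect.items():
    print(f"{name}: H/W = {ratio:.2f}")
```

On these figures the mean wrinkle box is roughly ten times taller than it is wide, while oil stains are only a few pixels across, so anchors tuned to tall, narrow boxes and to very small boxes are a natural adjustment.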

Table 2 Performance of the Swin Transformer backbone paired with different detection networks

Model	AP (cloud spots)/%	AP (wrinkles)/%	AP (oil stains)/%	mAP@0.5/%	t/ms
Swin-t+YOLOv5	88.53	72.58	91.23	84.12	20.43
Swin-s+YOLOv5	91.14	73.16	94.46	86.25	30.15
Swin-t+Faster R-CNN	91.31	69.42	94.98	85.23	68.35
Swin-s+Faster R-CNN	93.92	70.79	95.15	86.62	83.42
Swin-t+Mask R-CNN	91.48	71.47	96.37	86.45	82.45
Swin-s+Mask R-CNN	92.67	71.92	96.49	87.04	95.62

Table 3 Impact of introducing attention mechanisms on network performance

Model	AP (cloud spots)/%	AP (wrinkles)/%	AP (oil stains)/%	mAP@0.5/%
Swin-t	88.53	72.58	91.23	84.12
Swin-t+CAM	90.93	75.98	91.85	86.25
Swin-t+SAM	92.08	81.97	92.17	88.76
Swin-t+CBAM	92.46	84.79	93.15	90.28
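The per-class effect of CBAM in Table 3 is easier to read as differences against the Swin-t baseline. The sketch below uses only the table's values; the variable names are illustrative.

```python
# Per-class AP (%) from Table 3, ordered as cloud spots, wrinkles, oil stains.
classes = ("cloud spots", "wrinkles", "oil stains")
baseline = (88.53, 72.58, 91.23)   # Swin-t
with_cbam = (92.46, 84.79, 93.15)  # Swin-t+CBAM

# Gain in percentage points for each class, rounded to two decimals.
gain = {c: round(b - a, 2) for c, a, b in zip(classes, baseline, with_cbam)}

for name, points in gain.items():
    print(f"{name}: +{points:.2f} points")
```

The largest gain falls on wrinkles (about 12 points), the class with the lowest baseline AP, while cloud spots and oil stains improve by about 4 and 2 points respectively.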

Table 4 Impact of data augmentation on network performance

Method	AP (cloud spots)/%	AP (wrinkles)/%	AP (oil stains)/%	mAP@0.5/%
Mosaic	95.51	86.73	95.32	92.52
MixUp	92.54	84.87	93.84	90.43
Mosaic and MixUp combined	95.04	87.96	96.03	93.01

Table 5 Impact of fine-tuning the anchor size on network performance

Adjustment method

Fig.6 Detection results before and after improvement: (a) original algorithm; (b) improved algorithm

Table 6 Detection results of different networks

Model	mAP@0.5/%	t/ms
SSD	82.93	20.46
YOLOv5	83.26	17.32
Faster R-CNN	84.38	65.42
Mask R-CNN	85.67	86.35
Improved YOLOv4^[7]	93.62	32.41
Improved YOLOv5^[10]	90.51	18.56
This work	95.76	25.67
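The trade-off between accuracy and speed in Table 6 can be made explicit with a small sketch. It assumes that t is the inference time per image in milliseconds (the table does not define it) and that all rows were measured under comparable conditions. The dictionary and the helper `dominated` are illustrative names.

```python
# (mAP@0.5 in %, time in ms) copied from Table 6.
results = {
    "SSD": (82.93, 20.46),
    "YOLOv5": (83.26, 17.32),
    "Faster R-CNN": (84.38, 65.42),
    "Mask R-CNN": (85.67, 86.35),
    "Improved YOLOv4 [7]": (93.62, 32.41),
    "Improved YOLOv5 [10]": (90.51, 18.56),
    "This work": (95.76, 25.67),
}

# Throughput in images per second, assuming t is the per-image time.
fps = {name: 1000.0 / t for name, (_, t) in results.items()}

def dominated(name):
    """True if another model is at least as accurate and as fast, and better in one."""
    m, t = results[name]
    return any(
        om >= m and ot <= t and (om > m or ot < t)
        for other, (om, ot) in results.items()
        if other != name
    )

# Models that no other row beats on both accuracy and speed.
pareto = [name for name in results if not dominated(name)]
print(pareto)
```

Under this reading the proposed model runs at roughly 39 images per second and, together with YOLOv5 and the improved YOLOv5 of [10], is not dominated by any other row: it gives up some speed relative to those two in exchange for the highest mAP@0.5.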

Fig.7 Examples of each defect category

Table 7 Comparison of detection performance of three models

Model	mAP@0.5/%	Parameters/MB	t/ms
ES-Net^[8]	76.2	147.98	18.87
Improved YOLOv5^[9]	76.8	25.70	10.70
This work	79.1	184.07	23.68

References (19)

1	KUMAR A. Computer-vision-based fabric defect detection: A survey[J]. IEEE Transactions on Industrial Electronics, 2008, 55(1): 348-363.
2	HAN J Y, CAO J T, WANG H N, et al. A review of fabric defect detection methods based on computer vision[J]. Journal of Liaoning Petrochemical University, 2022, 42(1): 70-77.
3	ZHAO Y, ZUO B Q. Analysis on application of machine vision in fabric defect detection[J]. Computer Engineering and Applications, 2020, 56(2): 11-17.
4	MENG Z Q, QIU J S. Defect detection algorithm of complex pattern fabric based on cascaded convolution neural network[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(12): 1135-1144.
5	CAI Z X, LI R X, DAI Y D, et al. Fabric defect recognition system based on faster RCNN[J]. Computer Systems & Applications, 2021, 30(2): 83-88.
6	XIE T J, LIN X W, HU L X, et al. Fabric defect detection system based on improved YOLOv5 algorithm[J]. Cotton Textile Technology, 2022, 50(11): 15-20.
7	YUE X, WANG Q, HE L, et al. Research on tiny target detection technology of fabric defects based on improved YOLO[J]. Applied Sciences, 2022, 12(13): 6823.
8	YU X Y, LYU W, ZHOU D, et al. ES-Net: Efficient scale-aware network for tiny defect detection[J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1-14.
9	GUO B, LÜ W T, YU X Y, et al. Fabric defect detection algorithm based on improved YOLOv5 model[J]. Journal of Zhejiang Sci-Tech University (Natural Sciences), 2022, 47(5): 755-763.
10	GAO M, ZOU Y L, CAO X W. Fabric defect detection based on improved YOLOv5 model[J]. Advanced Textile Technology, 2023, 31(4): 155-163.
11	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
12	CHEN M, RADFORD A, CHILD R, et al. Generative pretraining from pixels[C]//Proceedings of the 37th International Conference on Machine Learning. Red Hook: Curran Associates Inc., 2020: 1691-1703.
13	ESSER P, ROMBACH R, OMMER B. Taming transformers for high-resolution image synthesis[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Manhattan: IEEE, 2021: 12868-12878.
14	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Computer Vision – ECCV 2020. Cham: Springer, 2020: 213-229.
15	LIN Z H, FENG M W, DOS SANTOS C N, et al. A structured self-attentive sentence embedding[C]//5th International Conference on Learning Representations. Toulon: OpenReview.net, 2017: 1-15.
16	LI J, DU J Q, ZHU Y C, et al. Overview of transformer based object detection algorithms[J]. Computer Engineering and Applications, 2023, 59(10): 48-64.
17	LIU Z, LIN Y T, CAO Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Manhattan: IEEE, 2021: 9992-10002.
18	LIU Z, HU H, LIN Y T, et al. Swin transformer V2: Scaling up capacity and resolution[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Manhattan: IEEE, 2022: 11999-12009.
19	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. (2021-06-03)[2023-04-20]. https://arxiv.org/abs/2010.11929.