Chinese Fast-Food Restaurant Dish Images (A YOLOv5-Based Dish Recognition System for Chinese Fast-Food Restaurants)

Published: 2025-07-31 20:15:00 | Category: IT & Technology


A YOLOv5-Based Dish Recognition System for Chinese Fast-Food Restaurants [金鹰物联 Smart Canteen Project]

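Abstract

Based on YOLOv5 v6.1, this article presents a self-service dish recognition and payment system for Chinese fast-food restaurants. It reviews the state of food recognition research and briefly introduces the history, strengths, and network architecture of the YOLOv5 model. During dataset preprocessing, we parse UNIMIB2016 and build a workable pipeline for label format conversion and validation, which resolves the file path problem in YOLOv5, the label format conversion problem, and the label misalignment caused by EXIF information. For model training, we configure a cloud server and introduce the Weights & Biases visualization tool to monitor training online and tune hyperparameters with sweeps; the hyperband pruning algorithm is used to speed up the sweep, and fixes are given for problems that may arise during training. Finally, we introduce the evaluation metrics for object detection and the YOLOv5 loss function, analyze the results of the hyperparameter sweep, train the model with the best parameter combination, and choose the best prediction confidence threshold by analyzing the sample distribution, PR curves, and related plots. This substantially improves precision and recall. The model is then deployed and a client application is built.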

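Introduction

With the arrival of the intelligent information age, artificial intelligence and sensing technology have advanced greatly and have had a positive impact on areas of everyday life such as intelligent transportation, smart homes, and smart healthcare. Emerging technologies such as social networks, mobile networks, and the Internet of Things generate food-related big data. Combined with artificial intelligence, and especially with fast-developing deep learning, this data has given rise to a new interdisciplinary research field: food computing. Examples of combining food big data with artificial intelligence can now be found in smart health, intelligent food equipment, smart catering, smart retail, and smart homes.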

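Food image recognition is one of the important areas of current computer vision research. We aim to develop a campus dish recognition system that identifies dishes quickly and efficiently. Deployed in a campus canteen, such a system can shorten the time cashiers spend computing prices and simplify checkout; help managers prepare meals precisely and reduce inventory waste; let diners see the nutritional value of what they eat in real time and keep a balanced diet; and support rapid food safety tracing to help prevent food safety incidents.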

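Traditional food image recognition selects image features, extracts feature points with methods such as SIFT or HOG, represents them as vectors, and finally trains a machine learning model (e.g. SVM or K-Means) on them. These methods classify food from specific hand-crafted features or key points. In practice, however, captured images are affected by external factors such as illumination intensity and noise, so image quality varies and the final detection result suffers. The same food can differ in color and shape, while different foods can look alike in color and shape. Traditional image recognition methods therefore struggle to recognize food accurately.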

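With the development of deep learning, most current work uses convolutional neural networks: dish regions in the image are first detected or segmented, and each region is then recognized. Since 2014, deep-learning-based object detection networks have proliferated. Two-stage networks such as R-CNN, Fast R-CNN, and Mask R-CNN came first. After Joseph Redmon et al. proposed You Only Look Once (YOLOv1) in 2016, single-stage detectors came to researchers' attention and opened a new era of single-stage object detection. The YOLO family consists of successive improvements to the single-stage detection model; it provides faster and better object detection methods for many research fields and an important theoretical basis for the practical use of single-stage detection algorithms. For example, Aguilar et al. fine-tuned the object detection algorithm YOLOv2 to detect and recognize multiple foods. Pandey et al. fine-tuned three CNNs (AlexNet, GoogLeNet, and ResNet), then extracted and fused visual features from the fine-tuned networks and recognized dish images with an ensemble learning method. As deep learning has developed, convolutional neural networks (CNNs) have achieved strong results in many fields, and dish recognition research has likewise centered on CNNs, yielding both new methods and higher detection accuracy.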

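YOLOv5 was released on June 10, 2020 and, through successive versions, has become one of the most advanced object detection techniques available. It uses the PyTorch framework and is user-friendly: training on a custom dataset is straightforward; it can run inference directly on video or even webcam input, with detection speeds of up to 140 FPS; and PyTorch weight files can easily be converted to the ONNX format for use on Android, or to an iOS format via CoreML, for direct deployment in mobile apps.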

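The core idea of YOLO is to take the whole image as the network input and, in a divide-and-conquer manner, partition the image into a grid and regress the bounding box positions and their classes directly at the output layer. Compared with Fast R-CNN, YOLO makes far fewer background errors, and using YOLO to eliminate Fast R-CNN's background detections can significantly improve performance. Experiments indicate that YOLOv5 converges faster than Faster R-CNN and detects small objects more accurately than the SSD model.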
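To make the grid idea concrete, here is a minimal YOLOv1-style sketch of how a normalized box center maps to the grid cell that is responsible for it. The function name `responsible_cell` and the 7x7 grid size are illustrative assumptions; they are not taken from the YOLOv5 code base, which works with anchors on several feature-map scales.

```python
def responsible_cell(cx, cy, grid_size=7):
    """Return (row, col) of the grid cell containing a normalized box center.

    cx and cy are in [0, 1]; grid_size is the number of cells per side
    (7 is only an illustrative value).
    """
    if not (0.0 <= cx <= 1.0 and 0.0 <= cy <= 1.0):
        raise ValueError("center must be normalized to [0, 1]")
    # Clamp so a center exactly on the right or bottom edge stays in range.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


print(responsible_cell(0.5, 0.5))    # a box centered in the image
print(responsible_cell(0.05, 0.95))  # a box near the bottom-left corner
```

Each object is thus assigned to exactly one cell per grid, which is what lets the network predict all boxes in a single forward pass.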

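Dataset

Dataset Source and Description

The tray food dataset used in this article comes from the UNIMIB2016 Food Database. It was collected in a real canteen environment. Each photo is (3264, 2448) pixels and shows one tray with different foods on it; some foods are placed on placemats rather than on plates. Several dishes are sometimes served on the same plate, which makes image segmentation difficult. Image distortion and lighting conditions make segmentation and recognition more challenging still.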

The dataset has been collected in a real canteen environment. The particularities of this setting are that each image depicts different foods on a tray, and some foods (e.g. fruit, bread and dessert) are placed on the placemats rather than on plates. Sides are often served in the same plate as the main dish, making it difficult to separate the two. Moreover, the acquisition of the images has been performed in a semi-controlled setting, so the images present visual distortions as well as illumination changes due to shadows. These characteristics make this dataset challenging, requiring both the segmentation of the trays for food localization and a robust way to deal with multiple foods.

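As Figure 3 shows, many food classes in the dataset look very similar. For example, there are four different kinds of "Pasta al sugo", each with other main ingredients added (such as fish, vegetables, or meat). Finally, the tray may hold other distracting objects, such as smartphones, wallets, and campus cards.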

As shown in Figure 3, many food classes have a very similar appearance. For example, we have four different "Pasta al sugo", but with other main ingredients (e.g. fish, vegetables, or meat) added. Finally, on the tray there can be other "noisy" objects that must be ignored during the recognition. For example, we may find cell phones, wallets, id cards, and other personal items. For these reasons we need to design a very accurate recognition algorithm.

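Dataset Processing

The dataset authors collected 1,442 photos in total. After removing blurred and duplicate photos, the remaining valid images were saved in UNIMIB2016-images: 1,027 photos covering 73 dish classes and 3,616 dish instances in total. Some classes differ only in ingredients and are therefore named "FoodName 1", "FoodName 2", and so on.

Next, we process the annotations.mat file in UNIMIB2016-annotations.zip and convert it to YOLO format.

UNIMIB2016-annotations contains the annotation file annotations.mat. The .mat file stores a Matlab Map object, which is described as follows: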

A Map object is a data structure that allows you to retrieve values using a corresponding key. Keys can be real numbers or character vectors. As a result, they provide more flexibility for data access than array indices, which must be positive integers. Values can be scalar or nonscalar arrays.

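Parsing the MAT File

Loading annotations.mat with scipy.io.loadmat can produce unexpected errors when Map objects are nested at several levels, so we wrote a Matlab script that parses annotations.mat into the annotation file format needed for YOLOv5.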

    % .
    % ├── annotations.mat
    % ├── demo.m
    % ├── formatted_annotations
    % │   ├── 20151127_114556.txt
    % │   ├── 20151127_114946.txt
    % │   ├── 20151127_115133.txt
    % │   ├── ...
    % │   └── 20151221_135642.txt
    % └── load_annotations.m

    %% load_annotations.m

    clc; clear;

    % output path (created if it does not exist yet)
    output = './formatted_annotations/';
    if ~exist(output, 'dir')
        mkdir(output);
    end

    % Load the annotations in a map structure
    load('annotations.mat');

    % Each entry in the map corresponds to the annotations of an image.
    % Each entry contains many cell tuples as annotated food
    % A tuple is composed of 8 cells with the annotated:
    % - (1) item category (food for all tuples)
    % - (2) item class (e.g. pasta, patate, ...)
    % - (3) item name
    % - (4) boundary type (polygonal for all tuples)
    % - (5) items boundary points [x1,y1,x2,y2,...,xn,yn]
    % - (6) items bounding box [x1,y1,x2,y2,x3,y3,x4,y4]

    image_names = annotations.keys;
    n_images = numel(image_names);

    for j = 1 : n_images
        image_name = image_names{j};
        tuples = annotations(image_name);
        count = size(tuples,1);
        coordinate_mat = cell2mat(tuples(:,6));

        % open file
        file_path = [output image_name '.txt'];
        ffile = fopen(file_path, 'w');

        % write file: one line per annotated item
        for k = 1 : count
            item = tuples(k,:);
            fprintf(ffile, '%s %d %d %d %d %d %d %d %d\n', ...
                string(item(2)), ...      % item class
                coordinate_mat(k,:));     % items bounding box
        end

        % close file
        fclose(ffile);
    end

    %% fprintf
    % Write data to text file
    % https://www.mathworks.com/help/matlab/ref/fprintf.html

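Running the Matlab script above generates *.txt files named after the images in the ./formatted_annotations folder. Each line has the format class x1 y1 x2 y2 x3 y3 x4 y4.

The bounding box is shown in the figure: (x1, y1) is the top-left corner and (x3, y3) is the bottom-right corner.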

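Dataset Validity Check

Download and unzip [UNIMIB2016-images.zip]; the ./original folder holds all the image data. Rename the original folder to images; from now on this folder stores the image data. Otherwise YOLOv5 training fails, for the reasons explained in the post 一文彻底解决YOLOv5训练找不到标签问题 (on fixing the "labels not found" problem in YOLOv5 training). Then write check_dataset.py to check that the label files in formatted_annotations correspond one-to-one with the image files in images, deleting invalid and unmatched labels.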

    # UNIMIB2016
    # ├── UNIMIB2016-annotations
    # │   ├── check_dataset.py  <--
    # │   └── formatted_annotations
    # └── images

    # check_dataset.py

    import os

    # path of formatted_annotations
    f_path = os.path.join(os.getcwd(), 'formatted_annotations')
    # path of images
    img_path = os.path.join(os.getcwd(), os.pardir, 'images')


    def check_dataset():
        annotations = [os.path.splitext(i)[0] for i in os.listdir(f_path)]
        imgs = {os.path.splitext(i)[0] for i in os.listdir(img_path)}
        for annotation in annotations:
            label_path = os.path.join(f_path, annotation + '.txt')
            invalid = False
            if annotation not in imgs:
                # the annotation has no matching image
                print('not found image: {}, remove its annotation'.format(annotation))
                print(label_path)
                invalid = True
            else:
                # a class name containing spaces yields more than 9 fields
                with open(label_path) as f:
                    for line in f:
                        if len(line.split()) > 9:
                            print('wrong label format: {}, {}'.format(annotation, line))
                            invalid = True
                            break
            if invalid:
                os.remove(label_path)
                print('os.remove({})'.format(label_path))


    if __name__ == '__main__':
        check_dataset()

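According to the script's output, check_dataset.py found 21 *.txt label files with no corresponding image in images and 1 *.txt label file whose class label contains a space. After removing these 22 invalid label files, 1,005 valid label files remain in formatted_annotations.

Food Class Statistics

Write class_count.py to generate statistics for all food classes in formatted_annotations: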

    # UNIMIB2016
    # ├── UNIMIB2016-annotations
    # │   ├── check_dataset.py
    # │   ├── class_count.py  <--
    # │   └── formatted_annotations
    # └── images

    # class_count.py

    import os

    import pandas as pd

    # formatted_annotations path
    path = os.path.join(os.getcwd(), 'formatted_annotations')
    # output path
    output = os.path.join(os.getcwd(), 'class_counts_result.csv')
    # read file list of formatted_annotations
    annotations = os.listdir(path)

    if __name__ == '__main__':
        labels = []
        for annotation in annotations:
            with open(os.path.join(path, annotation)) as file:
                for line in file:
                    item = line.split()
                    cls = item[0]  # first field is the class name
                    labels.append(cls)
        # count occurrences per class, most frequent first
        counts = pd.Series(labels).value_counts()
        counts.to_csv(output, header=False)

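The per-class statistics are saved in class_counts_result.csv; part of the data is shown below. (Before the validity check in the previous subsection there were 73 classes.) Classes are numbered from 0 in descending order of occurrence.

    Class          Num
    pane           479
    mandarini      198
    carote         161
    patate/pure    151
    cotoletta      148
    fagiolini      131
    yogurt         130

Label Format Conversion

Next, we write a Python script to convert this data into the format YOLOv5 requires:

Write toYolo.py to convert every *.txt file in formatted_annotations to YOLO format, saving the results in labels.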

    # UNIMIB2016
    # ├── UNIMIB2016-annotations
    # │   ├── check_dataset.py
    # │   ├── class_count.py
    # │   ├── toYolo.py  <--
    # │   ├── class_counts_result.csv
    # │   └── formatted_annotations (1005)
    # ├── labels
    # └── images (1005)

    # toYolo.py

    import os

    from PIL import Image

    # formatted_annotations path
    path = os.path.join(os.getcwd(), 'formatted_annotations')
    # path of images
    img_path = os.path.join(os.getcwd(), os.pardir, 'images')
    # output path
    output_path = os.path.join(os.getcwd(), os.pardir, 'labels')
    # class count file path
    class_file_path = os.path.join(os.getcwd(), 'class_counts_result.csv')


    def convert_box(size, box):
        # convert VOC to yolo format; box = [xmin, xmax, ymin, ymax]
        dw, dh = 1. / size[0], 1. / size[1]
        x, y = (box[0] + box[1]) / 2.0, (box[2] + box[3]) / 2.0
        w, h = box[1] - box[0], box[3] - box[2]
        return [x * dw, y * dh, w * dw, h * dh]


    def convert_bbox(ibb):
        # convert ibb to VOC format
        # ibb = [x1,y1,x2,y2,x3,y3,x4,y4]
        X = ibb[0::2]
        Y = ibb[1::2]
        return min(X), min(Y), max(X), max(Y)  # xmin, ymin, xmax, ymax


    def get_classes():
        # output: class list, ordered as in class_counts_result.csv
        with open(class_file_path, 'r') as cf:
            return [line.split(',')[0] for line in cf.readlines()]


    def toYolo():
        # read file list of formatted_annotations
        annotations = os.listdir(path)
        # get class list
        clss = get_classes()
        # make sure the output folder exists
        os.makedirs(output_path, exist_ok=True)
        # convert every annotation in ./formatted_annotations/ to yolo format
        for annotation in annotations:
            with open(os.path.join(path, annotation)) as file, \
                    open(os.path.join(output_path, annotation), 'w') as opfile:
                # read img and get its size (width, height)
                img_f_path = os.path.join(img_path, annotation[:-4] + '.jpg')
                size = Image.open(img_f_path).size
                # process every item in ./formatted_annotations/*.txt
                for line in file:
                    item = line.split()
                    # get class num
                    cls_num = clss.index(item[0])
                    # get bbox coordinates
                    item_bounding_box = list(map(float, item[1:]))
                    xmin, ymin, xmax, ymax = convert_bbox(item_bounding_box)
                    bb = convert_box(size, [xmin, xmax, ymin, ymax])
                    # append item to output file: ../labels/*.txt
                    item_str = list(map(str, [cls_num] + bb))
                    opfile.write(' '.join(item_str) + '\n')
            print(annotation)


    if __name__ == '__main__':
        toYolo()
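As a quick sanity check of the conversion described above, the following self-contained sketch converts a four-corner box into YOLO's normalized (x_center, y_center, width, height) form and back again. The helper names and the sample coordinates are made up for illustration; only the image size (3264, 2448) comes from the dataset description.

```python
def corners_to_yolo(corners, img_w, img_h):
    """[x1, y1, ..., x4, y4] in pixels -> normalized (cx, cy, w, h)."""
    xs, ys = corners[0::2], corners[1::2]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    return ((xmin + xmax) / 2 / img_w,
            (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w,
            (ymax - ymin) / img_h)


def yolo_to_xyxy(box, img_w, img_h):
    """Normalized (cx, cy, w, h) -> pixel (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)


# Hypothetical annotation: top-left (816, 612), bottom-right (1632, 1224).
corners = [816, 612, 1632, 612, 1632, 1224, 816, 1224]
yolo_box = corners_to_yolo(corners, 3264, 2448)
print(yolo_box)
print(yolo_to_xyxy(yolo_box, 3264, 2448))
```

If the round trip does not return the original corner coordinates, the width/height order or the min/max pairing in the conversion is the first thing to check.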

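Dataset Verification

Image Rectification

Because of EXIF rotation information, the frame of reference chosen for an image is affected when YOLOv5 reads it with cv2, so the labels drift away from the original image. The images therefore need to be rectified; for details see the post yolov5踩坑记录：标签错位（PIL读取图片方向异常）(a YOLOv5 pitfall log on label misalignment caused by PIL reading images with an unexpected orientation).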

Figure: before rectification (labels misaligned) and after rectification.

Rectification code

    # UNIMIB2016
    # ├── UNIMIB2016-annotations
    # │   ├── check_dataset.py
    # │   ├── class_count.py
    # │   ├── toYolo.py
    # │   ├── class_counts_result.csv
    # │   └── formatted_annotations
    # ├── rectify_imgs.py  <--
    # ├── labels (1005)
    # └── images (1005)

    # rectify_imgs.py

    import os

    import numpy as np
    from PIL import Image

    # image type
    img_type = '.jpg'
    # image folder path
    path = os.path.join(os.getcwd(), 'images')


    def rectify_imgs():
        for img_name in os.listdir(path):
            if not img_name.endswith(img_type):
                continue
            img_path = os.path.join(path, img_name)
            img = Image.open(img_path)
            # rebuilding the image from its raw pixels drops the EXIF metadata
            img_rectified = Image.fromarray(np.asarray(img))
            img_rectified.save(img_path)
            print(img_name)


    if __name__ == '__main__':
        rectify_imgs()

Label Correctness Check

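After all the dataset preparation above, write the labels_shower.py module, which randomly picks n images and uses YOLOv5's own image loading and annotation functions to verify that the labels in the labels folder were converted correctly.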

    # .
    # ├── datasets
    # │   └── UNIMIB2016
    # │       ├── UNIMIB2016-annotations
    # │       ├── images
    # │       ├── labels
    # │       └── split
    # └── yolov5
    #     └── labels_shower.py  <--

    # labels_shower.py

    import os
    from random import sample

    import numpy as np
    import yaml

    from utils.datasets import LoadImages
    from utils.general import cv2, xywhn2xyxy
    from utils.plots import Annotator, Colors

    n = 5  # how many images you want to show

    # file path set
    # ../datasets/UNIMIB2016/labels/
    labels_path = os.path.join(os.path.pardir, 'datasets', 'UNIMIB2016', 'labels')
    # ../datasets/UNIMIB2016/images/
    imgs_path = os.path.join(os.path.pardir, 'datasets', 'UNIMIB2016', 'images')
    # data/UNIMIB2016.yaml
    cls_path = os.path.join(os.getcwd(), 'data', 'UNIMIB2016.yaml')

    # model data preparation
    # you shouldn't change them
    pt = True
    stride = 2
    imgsz = (640, 640)
    datasets = os.listdir(labels_path)

    line_thickness = 3  # bounding box thickness (pixels)
    colors = Colors()  # create instance for 'from utils.plots import colors'

    with open(cls_path, errors='ignore') as f:
        names = yaml.safe_load(f)['names']  # class names


    def labels_shower():
        sources = sample(datasets, n)
        for source in sources:
            img_name = source[:-4] + '.jpg'
            # read the YOLO-format labels of this image
            with open(os.path.join(labels_path, source)) as file:
                lines = file.readlines()
            dataset = LoadImages(os.path.join(imgs_path, img_name),
                                 img_size=imgsz, stride=stride, auto=pt)
            im0s = next(iter(dataset))[2]  # original (unresized) image
            im0 = im0s.copy()
            annotator = Annotator(im0, line_width=line_thickness, example=str(names))
            # add every bbox to the image
            for line in lines:
                annot = line.split()
                c = int(annot[0])  # integer class
                label = names[c]
                xywhn = np.asarray([[float(i) for i in annot[1:]]])
                xyxy = xywhn2xyxy(xywhn, w=annotator.im.shape[1], h=annotator.im.shape[0])
                annotator.box_label(xyxy.tolist()[0], label, color=colors(c, True))
            im0 = annotator.result()
            cv2.imshow(img_name, im0)
            # press ESC to destroy cv2 windows
            if cv2.waitKey(0) == 27:
                cv2.destroyAllWindows()


    if __name__ == '__main__':
        labels_shower()

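YOLOv5 Network Architecture

The YOLOv5 model integrates FPN multi-scale detection, Mosaic data augmentation, and the SPP structure. The overall architecture can be divided into four modules: the input (Input), the backbone feature extraction network (Backbone), the Neck, and the output layer (Prediction).

Input

The input stage mainly comprises three parts: Mosaic data augmentation, adaptive anchor box computation, and adaptive image scaling.

Mosaic data augmentation: stitches dataset images together using random scaling, random cropping, and random arrangement.
Adaptive anchor box computation: during training, the network outputs predicted boxes on the basis of the initial anchor boxes, compares them with the ground-truth boxes, computes the difference, and back-propagates to update the network parameters.
Adaptive image scaling: the usual approach is to scale the original images to one standard size before feeding them to the detection network.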
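The adaptive image scaling step can be illustrated with a small calculation. The sketch below is modeled on the general idea of letterbox resizing (one scale ratio, then minimal padding up to a multiple of the stride); the function name, the default stride of 32, and the return format are assumptions made for this example rather than the YOLOv5 implementation itself.

```python
def letterbox_shape(w, h, new_size=640, stride=32, auto=True):
    """Return ((resized_w, resized_h), (pad_w, pad_h)) for letterbox-style scaling.

    The image is scaled by a single ratio so that its longer side equals
    new_size. With auto=True only the minimum padding needed to reach a
    multiple of `stride` is added; otherwise it is padded to a full square.
    """
    r = min(new_size / w, new_size / h)
    resized_w, resized_h = round(w * r), round(h * r)
    pad_w, pad_h = new_size - resized_w, new_size - resized_h
    if auto:
        pad_w, pad_h = pad_w % stride, pad_h % stride
    return (resized_w, resized_h), (pad_w, pad_h)


# A UNIMIB2016 photo is 3264 x 2448 pixels.
print(letterbox_shape(3264, 2448))              # minimal padding
print(letterbox_shape(3264, 2448, auto=False))  # pad to a 640 x 640 square
```

Padding only up to the next stride multiple keeps the aspect ratio intact while avoiding the wasted computation of large gray borders.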

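Backbone Feature Extraction Network

The backbone consists of the Focus structure and the CSP structure. YOLOv5 designs and uses two different CSP structures: CSP1_X is used in the backbone feature extraction network, while CSP2_X is used in the Neck. Using CSP modules has the following advantages:

Stronger learning capacity, so the trained model stays lightweight while remaining accurate.
Fewer computational bottlenecks, giving higher detection performance with less computation.
Lower memory cost, so training can be completed on a single GPU.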

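Neck

The Neck consists of FPN and PAN. FPN propagates and fuses higher-level features through upsampling to produce the predicted feature maps. The Neck also contains two PAN structures, which use downsampling to fuse low-level feature information with high-level features and output the predicted feature maps.

FPN is top-down and conveys strong semantic features; the feature pyramid built by PAN is bottom-up and conveys strong localization features. Combined, the two aggregate features at every detection layer, which improves the network's feature extraction ability.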

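Output

The output stage (Prediction), i.e. the network's prediction layer, applies anchors to the feature maps, generates output vectors containing class probabilities, objectness scores, and box coordinates, and applies non-maximum suppression (NMS) before emitting the final predictions.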
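The NMS step mentioned above can be sketched in a few lines of plain Python. This is a minimal greedy version for a single class, written for illustration; the function names and the 0.45 IoU threshold are assumptions of this example, and YOLOv5's own implementation is vectorized and handles multiple classes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS: return the indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is dropped
```

For a tray of dishes this matters because several anchors usually fire on the same plate; NMS keeps only the most confident box per dish.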

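Adam Optimizer

This article uses Adam as the gradient descent optimizer for model training. Adam combines the AdaGrad and RMSProp optimizers and has the following advantages:

Simple to implement, computationally efficient, and low in memory requirements.
Parameter updates are invariant to rescaling of the gradient.
Hyperparameters are well interpretable and usually need little or no tuning.
The update step size is bounded within an approximate range.
The learning rate is adjusted automatically.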
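The properties listed above follow from the form of the Adam update. Below is a minimal single-parameter sketch of the standard Adam step (first and second moment estimates with bias correction); the function name and the toy objective f(x) = x^2 are illustrative assumptions and are unrelated to the training code used in this project.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Minimize the toy objective f(x) = x^2, whose gradient is 2x.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
print(theta)
```

Because the step is m_hat divided by the square root of v_hat, multiplying every gradient by a constant leaves the update essentially unchanged, and the first step has magnitude close to lr regardless of the gradient's scale.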

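Activation Function Selection

Hidden-Layer Activation Function

The hidden layers use the leaky ReLU activation function. When the input $x<0$, it keeps a small gradient $\gamma$, so that an inactive neuron still has a non-zero gradient with which to update its parameters and does not remain permanently inactive.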
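A minimal sketch of the leaky ReLU and its derivative makes this behavior explicit. The value gamma = 0.01 is an assumption chosen for illustration, and treating the derivative at exactly x = 0 as 1 is a convention of this example.

```python
def leaky_relu(x, gamma=0.01):
    """Leaky ReLU: identity for x >= 0, slope gamma for x < 0."""
    return x if x >= 0 else gamma * x


def leaky_relu_grad(x, gamma=0.01):
    """Derivative of leaky ReLU: 1 for x >= 0, gamma for x < 0."""
    return 1.0 if x >= 0 else gamma


for x in (-2.0, 0.5):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```

With a plain ReLU the gradient for negative inputs would be exactly zero, so a neuron stuck in that region could never recover; the small slope gamma avoids that.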

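Computing ReLU requires only addition, subtraction, multiplication, and comparison, so it is computationally efficient. ReLU is also considered biologically plausible, exhibiting, for example, one-sided inhibition and a wide excitation boundary (i.e. the degree of excitation can be very high). Sigmoid-type activation functions produce a non-sparse neural network, whereas ReLU gives good sparsity.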

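In terms of optimization, whereas Sigmoid-type functions saturate at both ends, ReLU saturates only on the left and has derivative $1$ for $x>0$, which alleviates the vanishing gradient problem to some extent and speeds up the convergence of gradient descent.

Output-Layer Activation Function

The output layer uses a Sigmoid-type activation function. Its output can be read directly as a probability, which lets the neural network combine more readily with statistical learning models. It can also be seen as a soft gate that controls how much information other neurons pass on.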

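Model Optimization

The model optimizations in YOLOv5 include:

Focus layer: the Focus layer in the backbone is replaced by a single convolution layer Conv(k=6, s=2, p=2).

SPP layer: a spatial pyramid pooling (SPP) layer lets a convolutional neural network (CNN) accept input images of any size. Adding an SPP layer after the last convolutional layer turns feature maps of arbitrary size into a fixed-length vector, which is then fed to the fully connected layers for the subsequent classification and detection tasks. Here the SPP layer only specifies three kernel sizes: data from the CBL module is pooled three times and concatenated, then passed through another CBL. This avoids the image distortion caused by cropping and scaling image regions, avoids repeated feature extraction by the CNN, greatly speeds up candidate box generation, saves computation, and strengthens the expressive power of the feature maps.

C3 layer: Bottleneck is the basic residual block, stacked inside the C3 module for feature learning. It uses two Conv modules to first reduce and then restore the number of channels while extracting features, and a shortcut flag controls whether the residual connection is used. In the C3 module, the input feature map passes through two branches: the first goes through a Conv module and then the stacked Bottleneck modules; the other acts as a residual connection and passes through a single Conv module. The two branches are concatenated along the channel dimension and output through one more Conv module. The last C3 layer of the backbone now uses the shortcut connection, because the backbone's last layer used to be C3 and is now the SPPF layer; with this change the network trains normally.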

Local Environment Setup

1. Create a virtual environment
2. Clone the YOLOv5 project
3. Install the dependencies

    git clone https://github.com/ultralytics/yolov5
    cd yolov5
    pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/

Current Project Structure

    .
    ├── venv
    ├── datasets
    │   └── UNIMIB2016
    │       ├── images (1005)
    │       └── labels (1005)
    └── yolov5

Note: the directory structure above lists only the project's key files and folders.

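W&B Configuration

Weights & Biases (W&B) can replace TensorBoard as a visualization tool for monitoring model training. It has the following advantages:

It is compatible with the major deep learning frameworks (PyTorch/TensorFlow/Keras).
The interface is clean and no connection to the training server is needed; you can log in to your account even on a mobile device and check training progress anywhere, anytime.
It monitors not only scalars closely tied to training, such as loss and reward, but also hardware metrics such as CPU and GPU utilization.
Beyond displaying curves on a dashboard, it can visualize model weights to guide subsequent tuning strategies.
Reports can be created directly from the analysis dashboards or visualizations produced during training and exported as PDF for sharing.

W&B consists of the following four components:

Dashboard: experiment tracking
Artifacts: dataset versioning and model versioning
Sweeps: hyperparameter optimization
Reports: saving and sharing reproducible results

Given these advantages, this project uses W&B as the platform for managing model training and visualizing results. The version used is listed below, although YOLOv5 v6.1 recommends wandb version 0.12.10 or below.

Version: 0.12.11

    # From the command line, install and log in to wandb.
    # Copy your API key and paste it into the command line when asked to authorize your account.
    pip install wandb==0.12.11
    wandb login

Environment Configuration

    YOLOv5    v6.1
    wandb     0.12.11
    IDE       PyCharm
    Python    3.8
    OS        macOS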

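Model Training Preparation

Choosing the Pretrained Model

To balance recognition speed and accuracy, we chose the recently released pretrained model YOLOv5s6 (22 Feb 2022, v6.1).

On the COCO dataset, YOLOv5n is far faster than the other models but comparatively less accurate. YOLOv5s remains fast while being more accurate than YOLOv5n. In the recently updated version, YOLOv5s6 is more accurate still, its speed has improved, and its parameter count has dropped substantially, so we chose this pretrained model.

Download the model yolov5s6.pt and place it in the yolov5/ folder.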

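Splitting the Training and Validation Sets

Write a script that splits the valid data in datasets/UNIMIB2016/labels into a training set and a validation set at a ratio of 7:3; the validation set also serves as the test set. The training set ends up with 703 samples and the validation set with 302. The results are stored in train.txt and test.txt under the UNIMIB2016 directory.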

    # .
    # ├── venv
    # ├── datasets
    # │   └── UNIMIB2016
    # │       ├── splitDataset.py  <--
    # │       ├── images (1005)
    # │       └── labels (1005)
    # └── yolov5

    # splitDataset.py

    import os
    import random

    # labels path
    # ./labels
    ya_path = os.path.join(os.getcwd(), 'labels')
    # images path
    # ./images/
    img_path = os.path.join(os.getcwd(), 'images')
    # output files name
    output_train = 'train.txt'
    output_test = 'test.txt'
    # the percentage of train set
    train_percent = .7


    def write_set(file_name, samples):
        # write one image path per line
        with open(os.path.join(os.getcwd(), file_name), 'w') as f:
            for item in samples:
                f.write(os.path.join(img_path, item[:-4] + '.jpg') + '\n')


    def splitDataset():
        # sort first so the seeded shuffle does not depend on listing order
        all_samples = sorted(os.listdir(ya_path))
        num = len(all_samples)
        train_num = int(train_percent * num)
        # shuffle samples list
        random.seed(82322)
        random.shuffle(all_samples)
        # generate train set and test set files
        write_set(output_train, all_samples[:train_num])
        write_set(output_test, all_samples[train_num:])
        print('train set num = ' + str(train_num))
        print('test set num = ' + str(num - train_num))


    if __name__ == '__main__':
        splitDataset()

Model Training File Configuration

UNIMIB2016.yaml

Create yolov5/data/UNIMIB2016.yaml with the following content:

    # UNIMIB2016 dataset http://www.ivl.disco.unimib.it/activities/food-recognition/ (1027 available photos)
    # parent
    # ├── yolov5
    # └── datasets
    #     └── UNIMIB2016  ← downloads here

    path: ../datasets/UNIMIB2016  # dataset root dir
    train: train.txt  # train images (relative to path) 703 images
    val: test.txt  # val images (relative to path) 302 images
    test: test.txt  # test images (optional) 302 images

    # Classes
    nc: 73  # number of classes
    names: [ 'pane', 'mandarini', 'carote', 'patate/pure', 'cotoletta', 'fagiolini', 'yogurt', 'budino',
             'spinaci', 'scaloppine', 'pizza', 'pasta_sugo_vegetariano', 'mele',
             'pasta_pesto_besciamella_e_cornetti', 'zucchine_umido', 'lasagna_alla_bolognese', 'arancia',
             'pasta_sugo_pesce', 'patatine_fritte', 'pasta_cozze_e_vongole', 'arrosto', 'riso_bianco',
             'medaglioni_di_carne', 'torta_salata_spinaci_e_ricotta', 'pasta_zafferano_e_piselli',
             'patate/pure_prosciutto', 'torta_salata_rustica_(zucchine)', 'insalata_mista',
             'pasta_mare_e_monti', 'polpette_di_carne', 'pasta_pancetta_e_zucchine',
             'pasta_ricotta_e_salsiccia', 'orecchiette_(ragu)', 'pizzoccheri', 'finocchi_gratinati', 'pere',
             'pasta_tonno', 'riso_sugo', 'pasta_tonno_e_piselli', 'piselli', 'torta_salata_3',
             'torta_salata_(alla_valdostana)', 'banane', 'salmone_(da_menu_sembra_spada_in_realta)',
             'pesce_2_(filetto)', 'bruscitt', 'guazzetto_di_calamari', 'pasta_e_fagioli', 'pasta_sugo',
             'arrosto_di_vitello', 'stinco_di_maiale', 'minestra_lombarda', 'finocchi_in_umido',
             'pasta_bianco', 'cavolfiore', 'merluzzo_alle_olive', 'zucchine_impanate', 'pesce_(filetto)',
             'torta_crema_2', 'roastbeef', 'rosbeef', 'cibo_bianco_non_identificato', 'torta_crema',
             'passato_alla_piemontese', 'pasta_e_ceci', 'crema_zucca_e_fagioli', 'focaccia_bianca',
             'minestra', 'torta_cioccolato_e_pere', 'torta_ananas', 'rucola', 'strudel',
             'insalata_2_(uova' ]  # class names

my_train.py

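Create yolov5/my_train.py, a launcher for a single training run that sets the model's parameters. (This step can also be folded into the hyperparameter tuning described in a later subsection.)

my_train.py uses the preset hyperparameters in data/hyps/hyp.scratch-myself.yaml, the Adam optimizer, an input image size of 640, and batch size = 16.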

    # my_train

    import train

    params = {'weights': 'yolov5s6.pt',
              'cfg': 'hub/yolov5s6.yaml',
              'data': 'UNIMIB2016.yaml',
              'hyp': 'data/hyps/hyp.scratch-myself.yaml',
              'epochs': 300,
              'batch_size': 16,
              'imgsz': 640,
              'optimizer': 'Adam'}

    if __name__ == '__main__':
        train.run(**params)

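Image Augmentation

Data augmentation means making limited data deliver value equivalent to that of more data without substantively adding any.

In the yolov5/data/hyps directory, the initial hyperparameters provided by the YOLOv5 authors already include image augmentation parameters, as shown in the figure below (hyp.scratch-med.yaml):

The example shows the augmented images from one run (batch_size=16) after mosaic, HSV, up-down flip, and left-right flip.

Hyperparameter Tuning

The YOLOv5 developers added support for W&B sweeps in PR #3938, so we use the sweep tool provided by W&B to tune the hyperparameters of the pretrained YOLOv5s6 model.

Parameters and Configuration

Write yolov5/utils/loggers/wandb/sweep.yaml to set the project path configuration and the hyperparameter search ranges, method, and so on: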

    # sweep.yaml
    # Hyperparameters for training

    program: utils/loggers/wandb/sweep.py
    method: random
    metric:
      name: metrics/mAP_0.5
      goal: maximize
    early_terminate:
      type: hyperband
      min_iter: 3
      eta: 3

    parameters:
      # hyperparameters: set either min, max range or values list
      data:
        value: "data/UNIMIB2016.yaml"
      weights:
        value: "yolov5s6.pt"
      cfg:
        value: "models/hub/yolov5s6.yaml"
      epochs:
        value: 100
      imgsz:
        value: 640
      optimizer:
        value: "Adam"
      batch_size:
        values: [4, 8, 16]
      lr0:
        distribution: uniform
        min: 0.005
        max: 0.015
      lrf:
        distribution: uniform
        min: 0.005
        max: 0.015
      momentum:
        distribution: uniform
        min: 0.92
        max: 0.95
      weight_decay:
        distribution: uniform
        min: 4e-4
        max: 5e-4
      warmup_epochs:
        value: 3.0
      warmup_momentum:
        value: 0.8
      warmup_bias_lr:
        value: 0.1
      box:
        distribution: uniform
        min: 0.045
        max: 0.055
      cls:
        distribution: uniform
        min: 0.45
        max: 0.55
      cls_pw:
        value: 1.0
      obj:
        distribution: uniform
        min: 0.95
        max: 1.05
      obj_pw:
        value: 1.0
      iou_t:
        distribution: uniform
        min: 0.18
        max: 0.22
      anchor_t:
        value: 4.0
      fl_gamma:
        value: 0.0
      hsv_h:
        value: 0.015
      hsv_s:
        value: 0.7
      hsv_v:
        value: 0.4
      degrees:
        value: 8.0
      translate:
        value: 0.005
      scale:
        value: 0.20
      shear:
        value: 0.0
      perspective:
        value: 0.0
      flipud:
        value: 0.7
      fliplr:
        value: 0.7
      mosaic:
        value: 0.95
      mixup:
        value: 0
      copy_paste:
        value: 0

The goal of hyperparameter tuning is to maximize mAP@0.5.
The search method is random: on each iteration a set of parameters is chosen at random from the search ranges.
The parameter ranges are based on data/hyps/hyp.scratch-low.yaml, which also serves as the baseline: the model is trained with those parameters before the sweep starts.
During the sweep, hyperband prunes poorly performing runs, ending those trials early to speed up hyperparameter optimization. With $\eta=3$ and $min\_iter=3$, each run is evaluated against the optimization target at the brackets [3, 9, 27, 81], and unpromising runs are terminated promptly.

Random search chooses a random set of values on each iteration.

Hyperparameters. Default hyperparameters are in hyp.scratch.yaml. We recommend you train with default hyperparameters first before thinking of modifying any. In general, increasing augmentation hyperparameters will reduce and delay overfitting, allowing for longer trainings and higher final mAP.

Hyperband stopping evaluates whether a program should be stopped or permitted to continue at one or more pre-set iteration counts, called "brackets". When a run reaches a bracket, its metric value is compared to all previous reported metric values and the run is terminated if its value is too high (when the goal is minimization) or low (when the goal is maximization).

Running the Tuning Program (sweep)

Run the hyperparameter tuning program for 100 iterations.

    # get the sweep id
    wandb sweep --project YOLOv5 utils/loggers/wandb/sweep.yaml

    # set a target to automatically stop the sweep
    NUM=100
    # input the sweep id got in preceding step
    SWEEPID="xxxxxxxx"

    # run an agent by nohup
    nohup wandb agent --count $NUM sylvanding/YOLOv5/$SWEEPID > ./sweeping.log 2>&1 &
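As a side note on the early-termination setting in sweep.yaml (eta = 3, min_iter = 3), the bracket schedule quoted earlier can be reproduced with a short calculation. The helper name `hyperband_brackets` is hypothetical, and capping the schedule at the configured 100 epochs is an assumption of this sketch rather than documented W&B behavior.

```python
def hyperband_brackets(min_iter, eta, max_iter):
    """Iteration counts at which a run is checked: min_iter * eta**k <= max_iter."""
    brackets = []
    b = min_iter
    while b <= max_iter:
        brackets.append(b)
        b *= eta
    return brackets


# eta = 3 and min_iter = 3 as in sweep.yaml; runs last at most 100 epochs.
print(hyperband_brackets(3, 3, 100))
```

A larger eta gives fewer, more widely spaced checkpoints, so runs are pruned less often but each pruning decision rests on more training.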

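Model Training

Choosing a Cloud Server

Model training for this project uses a GPU-accelerated host provided by the MistGPU platform. The server configuration is as follows:

    Operating system    Linux-4.18.0-15-generic-x86_64-with-glibc2.27
    GPU                 NVIDIA GeForce GTX 1080 Ti
    GPU memory          11 GB
    CPU                 Intel Xeon CPU E5-2678 v3 @ 2.50GHz

The YOLOv5 development environment is configured as follows:

    Python version      3.8.13
    W&B CLI version     0.12.11
    PyTorch             1.11.0
    OpenCV              4.5.5
    CUDA/cuDNN          CUDA 10.1 / cuDNN 7.6.5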

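Server Environment Configuration

Install Python 3.8

1. Run the following commands as root or as a user with sudo access to update the package list and install the prerequisites:

    $ sudo apt update
    $ sudo apt install software-properties-common

2. Add the Deadsnakes PPA to the system's sources list:

    $ sudo add-apt-repository ppa:deadsnakes/ppa

3. Once the repository is enabled, install Python 3.8 with the following command:

    $ sudo apt install python3.8

4. Verify that the installation succeeded by typing:

    $ python3.8 --version

Upload the Project

The project files are organized as follows (all files the project needs are packed into the model_training/ folder):

The labels/ folder holds the 1,005 YOLOv5-format .txt label files obtained earlier.
test.txt and train.txt hold the image paths of the test and training sets split earlier.
yolov5/ holds the YOLOv5 project as modified above.
Initially the images folder is empty; a script is needed to download, unzip, and rectify the images from the archive UNIMIB2016-images.zip.
rectify_imgs.py is missing from UNIMIB2016 in the figure above and should be added.

    scp -r -P61500 /Users/sylvanding/Downloads/food_detect/model_training.zip mist@ygg.mistgpu.xyz:~/

Create a Virtual Environment and Install the Project Dependencies

    pip install virtualenv
    whereis python3.8  # get python3.8 path
    virtualenv -p /usr/bin/python3.8 venv  # use python3.8 as interpreter
    source venv/bin/activate
    cd yolov5
    pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/

Download the Data and Initialize W&B

    # Note: the download may be very slow
    bash datasets/UNIMIB2016/imageSets_downloads.sh
    # Initialize W&B
    wandb login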

Server Image Processing Script

Write imageSets_downloads.sh to download, unzip, and rectify the images:

    #!/bin/bash
    # Download UNIMIB2016 dataset
    # http://www.ivl.disco.unimib.it/activities/food-recognition/
    # created by Sylvan Ding -- https://blog.csdn.net/IYXUAN
    # 2022.04.22 -- sylvanding@qq.com, sylvanding.online
    # Example usage: bash datasets/UNIMIB2016/imageSets_downloads.sh
    # before execution, you need to install wget and zip!
    # parent  ← you should be here
    # ├── yolov5
    # └── datasets
    #     └── UNIMIB2016
    #         ├── labels
    #         └── images  ← downloads here

    # Download/unzip images
    d=./datasets/UNIMIB2016/images  # unzip directory
    file=UNIMIB2016-images.zip  # images.zip
    url=http://www.ivl.disco.unimib.it/download/http://www.ivl.disco.unimib.it/minisites/UNIMIB2016/UNIMIB2016-images.zip
    wget "$url"
    echo 'Unzipping...'
    unzip -q -j -d "$d" "$file"
    echo 'Downloaded successfully!'

    # rectify_imgs.py resolves images/ relative to the working directory,
    # so run it from the dataset folder
    (cd "$d/.." && python3.8 rectify_imgs.py)
    echo 'Rectified successfully!'

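Run screenshot

Common Errors

Arial.ttf: on first launch the Arial.ttf font has to be downloaded, and the process hangs.

Solution:

Download the font on your own machine and upload it to the server, or download it on the server with wget and move it to the designated font folder.

    wget https://ultralytics.com/assets/Arial.ttf
    scp -r -P61500 /Users/sylvanding/Downloads/Arial.ttf mist@ygg.mistgpu.xyz:/home/mist/.config/Ultralytics/

Slow image download: the image dataset downloads slowly. Download it with a Tencent Cloud server instance, transfer it to your own machine, and then upload it through the cloud storage provided by MistGPU.

    # copy the dataset from the "cloud" into the project folder
    cp -v /data/UNIMIB2016-images.zip ~/model_training

CUDA out of memory: when "CUDA out of memory" appears, RAM or GPU memory is insufficient; kill other processes that are occupying memory to free it.

results.png generation issue #7650: when generating results.png, a bug occurs in which the Y-axis of val/box_loss, val/obj_loss and val/cls_loss is out of order.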

Results Analysis

Metrics

precision & recall

$$precision = \frac{TP}{TP+FP} = \frac{TP}{\mathrm{all\ detections}}$$

$$recall = \frac{TP}{TP+FN} = \frac{TP}{\mathrm{all\ ground\ truths}}$$

Precision is the proportion of predicted samples that are true positives.
Recall is the proportion of all positives that are correctly predicted.
all detections is the total number of predicted bounding boxes.
all ground truths is the total number of ground-truth boxes.
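The two formulas above translate directly into code. The counts in the example (8 true positives, 2 false positives, 4 false negatives) are invented for illustration and are not results from this project.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0  # TP / all detections
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0     # TP / all ground truths
    return precision, recall


# Hypothetical counts: 10 detections, 8 of them correct, 12 ground-truth dishes.
print(precision_recall(tp=8, fp=2, fn=4))
```

Raising the confidence threshold usually trades recall for precision, which is why the best threshold is later chosen from the PR curve rather than fixed in advance.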

In object detection, the confusion matrix is defined as follows:

                             $Confidence \ge Threshold_2$    $Confidence \lt Threshold_2$
    $IoU \ge Threshold_1$    TP                              FN
    $IoU \lt Threshold_1$    FP                              TN
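The confusion-matrix definition above can be written as a small decision function. The name `confusion_cell` and the default thresholds (0.5 for IoU, 0.25 for confidence) are assumptions made for this sketch; the mapping of the four cases follows the table as given in this article.

```python
def confusion_cell(iou, confidence, iou_thres=0.5, conf_thres=0.25):
    """Classify one predicted box by IoU (Threshold_1) and confidence (Threshold_2)."""
    high_iou = iou >= iou_thres
    high_conf = confidence >= conf_thres
    if high_iou and high_conf:
        return "TP"  # well-localized and confident
    if high_iou:
        return "FN"  # well-localized but discarded for low confidence
    if high_conf:
        return "FP"  # confident but poorly localized
    return "TN"      # poorly localized and discarded


for iou_value, conf in [(0.7, 0.9), (0.7, 0.1), (0.3, 0.9), (0.3, 0.1)]:
    print(iou_value, conf, confusion_cell(iou_value, conf))
```

Counting these labels over all predictions of a validation run yields the TP, FP, and FN totals that feed the precision and recall formulas.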