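The previous articles in this series showed how to run classic machine learning algorithms on the Phytium Pi and measured how long each of them takes. Starting with this article, the series turns to deploying and using deep learning algorithms on the Phytium Pi.
Deep learning has proven itself in object detection, object recognition, and image segmentation, so its importance to machine vision is beyond doubt. As an edge device, the Phytium Pi does not need to train deep models; what users care about is its inference performance.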
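The Phytium Pi is built around a quad-core Phytium processor with two FTC664 cores at 1.8 GHz and two FTC310 cores at 1.5 GHz. It is compatible with the ARMv8 instruction set, is paired with 64-bit DDR memory, and in practice performs close to a Raspberry Pi 4B.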
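Unlike SoCs such as the RK3588, RK3568, or Allwinner V853, it has no dedicated NPU for inference. However, the Phytium Pi can accelerate neural networks through an edge inference card on the onboard Mini PCIe interface, or through a USB 3.0 dongle accelerator such as the SONGKE® TPU, Intel® VPU, or Intel® NCS2. This article shows how to implement digit recognition on the Phytium Pi.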
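1. Environment setup
Deploying a deep learning model on the Phytium Pi generally involves two tasks: 1) obtaining and training the model, and 2) deploying it. To keep the workflow consistent, this article uses Google TensorFlow as the deep learning framework. Compared with PyTorch, TensorFlow has a longer track record in industrial deployment and ships with a set of tools for training and deployment. The model is trained with TensorFlow on the host and then deployed to the edge device for inference with TensorFlow Lite or TensorFlow Micro (TensorFlow Lite for Microcontrollers).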
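1.1 Installing TensorFlow
Installing TensorFlow on a PC requires at least the following:
- Python 3.8-3.11
- Ubuntu 16.04 or later / Windows 10
- NVIDIA GPU driver (if GPU support is needed)
Note: TensorFlow 2.10 is the last release that supports NVIDIA GPUs on native Windows; later releases do not.
Because the author's everyday machine runs Windows 10, this article covers installing TensorFlow 2.10 on Windows 10. For Linux, see the TensorFlow installation guide.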
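- Install Miniconda
Miniconda is a slimmed-down version of Anaconda that creates isolated Python development environments on the system. Download the Miniconda installer and double-click it to install.
- Create a conda environment
- Press Win + X to open the Start context menu, choose Run, type pwsh, and press Enter to open PowerShell.
- Change to the Miniconda installation directory and activate the conda environment: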
$ cd D:\miniconda\shell\condabin
$ .\conda-hook.ps1
$ conda activate base
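- Create the tf environment. Note: once an environment is activated, the PowerShell prompt carries a base or tf prefix that indicates the current conda environment.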
(base) PS C:\Users> conda create --name tf python=3.9
(base) PS C:\Users> conda activate tf
(tf) PS C:\Users>
- Install CUDA and cuDNN
(tf) PS C:\Users> conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
- Install TensorFlow and verify the installation
(tf) PS C:\Users> pip install --upgrade pip -i https://mirrors.cloud.tencent.com/pypi/simple
(tf) PS C:\Users> pip install "tensorflow<2.11" -i https://mirrors.cloud.tencent.com/pypi/simple
(tf) PS C:\Users> python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
1.2 Installing TensorFlow Lite
Google already publishes a Linux aarch64 pip package of TFLite built for Debian ARM64, so installing TensorFlow Lite on the Phytium Pi is relatively simple.
- Install pip
$ sudo apt install python3-pip
- Install tflite-runtime
$ python3 -m pip install -U pip -i https://mirrors.cloud.tencent.com/pypi/simple
$ python3 -m pip install tflite-runtime -i https://mirrors.cloud.tencent.com/pypi/simple
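Before moving on, it can be worth confirming that the package is visible to the Python interpreter on the board. The following is a minimal sketch that uses only the standard library; the helper name `has_module` is hypothetical, and it only checks that the module can be located, not that inference works.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # tflite_runtime is the import name of the tflite-runtime pip package
    print("tflite_runtime found:", has_module("tflite_runtime"))
```

If this reports False, the package was most likely installed for a different Python interpreter than the one running the script.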
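2. Code time
2.1 Training the LeNet model
LeNet, designed and published by Yann LeCun in 1998, is a classic convolutional neural network. We start by implementing it with the TensorFlow API.
The MNIST dataset can be downloaded online. Its images are 28x28 pixels, so they have to be padded to 32x32 before being fed to LeNet.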
import tensorflow as tf
from tensorflow import keras

# LeNet-style CNN. Note: this variant uses 3x3 kernels and ReLU activations,
# whereas the original LeNet-5 used 5x5 kernels.
LeNet = keras.models.Sequential(
    [
        keras.layers.Conv2D(
            input_shape=(32, 32, 1), filters=6, kernel_size=(3, 3), activation="relu"
        ),
        keras.layers.AveragePooling2D(),
        keras.layers.Conv2D(filters=16, kernel_size=(3, 3), activation="relu"),
        keras.layers.AveragePooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(units=120, activation="relu"),
        keras.layers.Dense(units=84, activation="relu"),
        keras.layers.Dense(units=10, activation="softmax"),
    ]
)
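To make the layer sizes of this network easier to follow, the sketch below works out the feature-map sizes and the number of trainable parameters by hand. It assumes the Keras defaults used in the model definition (stride-1 convolutions without padding, 2x2 average pooling); the function names are illustrative and not part of any library.

```python
def conv_out(size: int, kernel: int) -> int:
    """Output size of a stride-1 convolution without padding ('valid')."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Output size of a non-overlapping pooling layer (stride == pool size)."""
    return (size - pool) // pool + 1


def lenet_summary(input_size: int = 32):
    """Return (flattened feature count, total trainable parameters)."""
    s = pool_out(conv_out(input_size, 3))  # Conv2D(6, 3x3) + AveragePooling2D
    s = pool_out(conv_out(s, 3))           # Conv2D(16, 3x3) + AveragePooling2D
    flat = s * s * 16                      # Flatten over 16 feature maps
    params = (
        (3 * 3 * 1 * 6 + 6)      # first convolution: weights + biases
        + (3 * 3 * 6 * 16 + 16)  # second convolution
        + (flat * 120 + 120)     # Dense(120)
        + (120 * 84 + 84)        # Dense(84)
        + (84 * 10 + 10)         # Dense(10)
    )
    return flat, params


if __name__ == "__main__":
    flat, params = lenet_summary()
    print(f"flattened features: {flat}, trainable parameters: {params}")
```

The spatial size shrinks from 32 to 30, 15, 13, and finally 6, so the first dense layer sees 6 x 6 x 16 = 576 inputs. Most of the parameters sit in that first dense layer.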
Load the data:
import gzip

import numpy as np


def read_mnist(images_path: str, labels_path: str):
    # The label file has an 8-byte header, the image file a 16-byte header
    with gzip.open(labels_path, "rb") as labelsFile:
        labels = np.frombuffer(labelsFile.read(), dtype=np.uint8, offset=8)
    with gzip.open(images_path, "rb") as imagesFile:
        length = len(labels)
        # Load flat 28x28 px images (784 px) and reshape them to 28x28x1
        features = (
            np.frombuffer(imagesFile.read(), dtype=np.uint8, offset=16)
            .reshape(length, 784)
            .reshape(length, 28, 28, 1)
        )
    # LeNet expects 32x32 px inputs but MNIST images are 28x28 px,
    # so pad each image with 2 px of zeros on every side
    features = np.pad(features, ((0, 0), (2, 2), (2, 2), (0, 0)), "constant")
    return features, labels


x_train, y_train = read_mnist(
    "train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz"
)
x_test, y_test = read_mnist("t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz")
# Scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0
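The `offset=8` and `offset=16` arguments in the loader skip the IDX file headers: big-endian 32-bit integers holding a magic number followed by one size per dimension. The sketch below parses such a header with the standard library so the offsets can be checked rather than taken on faith. The function name `parse_idx_header` is hypothetical, and the example uses synthetic header bytes, not the real dataset.

```python
import struct


def parse_idx_header(data: bytes):
    """Parse an IDX header and return (magic, dims, header_size_in_bytes)."""
    magic = struct.unpack(">I", data[:4])[0]
    ndim = magic & 0xFF  # the lowest byte of the magic number is the dimension count
    dims = struct.unpack(f">{ndim}I", data[4 : 4 + 4 * ndim])
    return magic, dims, 4 + 4 * ndim


if __name__ == "__main__":
    # Synthetic headers shaped like the MNIST test files (no pixel data attached)
    labels_header = struct.pack(">II", 2049, 10000)
    images_header = struct.pack(">IIII", 2051, 10000, 28, 28)
    print(parse_idx_header(labels_header))
    print(parse_idx_header(images_header))
```

On the real files, the same function could be applied to the first 16 bytes read through `gzip.open` to confirm the image count and size before reshaping.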
Set the loss function and optimizer, then start training:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
LeNet.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
LeNet.fit(x_train, y_train, epochs=5)
Evaluate the model:
LeNet.evaluate(x_test, y_test, verbose=2)
Epoch 1/5
1875/1875 [==============================] - 9s 3ms/step - loss: 0.2255 - accuracy: 0.9328
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0749 - accuracy: 0.9767
Epoch 3/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0529 - accuracy: 0.9836
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0408 - accuracy: 0.9869
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0329 - accuracy: 0.9895
313/313 - 1s - loss: 0.0341 - accuracy: 0.9897 - 628ms/epoch - 2ms/step
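As the output shows, our LeNet model reaches 98.97% accuracy on the MNIST test set.
To run inference on the Phytium Pi, the model still has to be exported to the TFLite format: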
# Convert the Keras model to a TFLite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(LeNet)
tflite_model = converter.convert()
# Save the model
with open("lenet5.tflite", "wb") as f:
    f.write(tflite_model)
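After copying `lenet5.tflite` to the board, a quick sanity check can catch a truncated or mistaken file before any inference code runs. TFLite models are FlatBuffers files that carry the file identifier `TFL3` in bytes 4 to 7. The sketch below relies only on that fact; the helper names are hypothetical, and passing this check does not prove that the model is valid.

```python
def looks_like_tflite(data: bytes) -> bool:
    """Cheap sanity check: FlatBuffers file identifier 'TFL3' at bytes 4..7."""
    return len(data) >= 8 and data[4:8] == b"TFL3"


def check_model_file(path: str) -> bool:
    """Read only the first 8 bytes of the file and test the identifier."""
    with open(path, "rb") as f:
        return looks_like_tflite(f.read(8))


if __name__ == "__main__":
    # Synthetic 8-byte prefix used for illustration instead of a real model file
    print(looks_like_tflite(b"\x1c\x00\x00\x00TFL3"))
    print(looks_like_tflite(b"notamodel"))
```

On the board, `check_model_file("lenet5.tflite")` would perform the same test on the exported file.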
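2.2 Deploying the LeNet model on the Phytium Pi
Now let's teach the Phytium Pi to recognize digits.
First, define a timer decorator in the track module so that we can measure the Phytium Pi's inference speed: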
import functools
import logging as log
import sys
import time

log.basicConfig(format='[ %(levelname)s ] %(message)s', level=log.INFO, stream=sys.stdout)


def timer(label: str, ms_: bool = True):
    """Decorator factory: log how long each call of the wrapped function takes."""
    def out_wrapper(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            st = time.perf_counter()
            ret = func(*args, **kwargs)
            measure = time.perf_counter() - st
            if ms_:
                measure *= 1e3  # seconds -> milliseconds
            tag = "ms" if ms_ else "s"
            log.info(f"{label} elapsed {(measure):.3f} {tag}.")
            return ret
        return wrapper
    return out_wrapper
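To see what the decorator does without a model at hand, here is a self-contained sketch that repeats a condensed copy of `timer` and applies it to a stand-in function. `busy_sum` is a hypothetical workload introduced only for this illustration.

```python
import functools
import logging as log
import sys
import time

log.basicConfig(format="[ %(levelname)s ] %(message)s", level=log.INFO, stream=sys.stdout)


def timer(label: str, ms_: bool = True):
    """Condensed copy of the decorator above, repeated so this example runs on its own."""
    def out_wrapper(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            st = time.perf_counter()
            ret = func(*args, **kwargs)
            measure = time.perf_counter() - st
            unit = "ms" if ms_ else "s"
            log.info(f"{label} elapsed {measure * (1e3 if ms_ else 1):.3f} {unit}.")
            return ret
        return wrapper
    return out_wrapper


@timer("busy_sum")
def busy_sum(n: int) -> int:
    """Hypothetical workload used only for this illustration."""
    return sum(range(n))


if __name__ == "__main__":
    print(busy_sum(1_000_000))
```

Each call logs one line in the same `[ INFO ] <label> elapsed ... ms.` format that appears in the results section, and the decorated function still returns its normal value.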
Inference function:
@timer("infer")
def infer(model: str, input_blob: np.ndarray) -> int:
    # A new interpreter is created on every call, so the measured time
    # covers model loading and tensor allocation as well as inference
    interpreter = tflite.Interpreter(model_path=model, num_threads=4)
    interpreter.allocate_tensors()
    # Get input and output tensors
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.set_tensor(input_details[0]['index'], input_blob)
    interpreter.invoke()
    # The predicted digit is the index of the largest softmax output
    return int(np.argmax(interpreter.get_tensor(output_details[0]['index'])))
The complete code is as follows:
import gzip

import matplotlib.pyplot as plt
import numpy as np
import tflite_runtime.interpreter as tflite

from track import timer


def read_mnist(images_path: str, labels_path: str):
    # The label file has an 8-byte header, the image file a 16-byte header
    with gzip.open(labels_path, "rb") as labelsFile:
        labels = np.frombuffer(labelsFile.read(), dtype=np.uint8, offset=8)
    with gzip.open(images_path, "rb") as imagesFile:
        length = len(labels)
        # Load flat 28x28 px images (784 px) and reshape them to 28x28x1
        features = (
            np.frombuffer(imagesFile.read(), dtype=np.uint8, offset=16)
            .reshape(length, 784)
            .reshape(length, 28, 28, 1)
        )
    # LeNet expects 32x32 px inputs but MNIST images are 28x28 px,
    # so pad each image with 2 px of zeros on every side
    features = np.pad(features, ((0, 0), (2, 2), (2, 2), (0, 0)), "constant")
    return features, labels


@timer("infer")
def infer(model: str, input_blob: np.ndarray) -> int:
    # A new interpreter is created on every call, so the measured time
    # covers model loading and tensor allocation as well as inference
    interpreter = tflite.Interpreter(model_path=model, num_threads=4)
    interpreter.allocate_tensors()
    # Get input and output tensors
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.set_tensor(input_details[0]['index'], input_blob)
    interpreter.invoke()
    return int(np.argmax(interpreter.get_tensor(output_details[0]['index'])))


def display_grid(images, model_path):
    AUTO_RANGE = 100
    AUTO_SIZE = 9
    # Pick 9 random images among the first 100 test samples
    indices = np.random.randint(AUTO_RANGE, size=AUTO_SIZE)
    figure = plt.figure(figsize=(8, 8))
    figure.subplots_adjust(hspace=0.36)
    for i, idx in enumerate(indices):
        image = images[idx]
        input_blob = np.expand_dims(image, 0)  # add the batch dimension
        pred = infer(model_path, input_blob)
        figure.add_subplot(3, 3, i + 1)
        plt.grid(False)
        plt.title(f"Pred: {pred}")
        plt.imshow(image, cmap=plt.cm.gray_r)


if __name__ == "__main__":
    model_path = r"lenet5.tflite"
    x_test, y_test = read_mnist("t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz")
    x_test = x_test / 255.0
    x_test = np.float32(x_test)  # the TFLite model expects float32 input
    display_grid(x_test, model_path)
    plt.show()
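Because the first call to `infer` includes one-time initialization, it can help to time warm-up calls separately from the calls that are reported. The following is a minimal sketch of a generic helper for that purpose. `benchmark` and its parameters are hypothetical names, and the workload in the example is a stand-in, not a real inference call.

```python
import time
from typing import Callable, List


def benchmark(fn: Callable[[], object], warmup: int = 1, runs: int = 8) -> List[float]:
    """Call fn warmup + runs times and return the last `runs` durations in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not recorded
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1e3)
    return durations


if __name__ == "__main__":
    # Stand-in workload; on the board this would wrap a single inference call
    times = benchmark(lambda: sum(range(100_000)), warmup=2, runs=5)
    print(f"min {min(times):.3f} ms, max {max(times):.3f} ms over {len(times)} runs")
```

Keeping the warm-up outside the recorded samples makes the reported numbers comparable across runs, since the slow first call no longer skews the average.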
3. Results
user@phytiumpi:~/Documents/cv/dl/lenet$ python3 inference.py
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[ INFO ] infer elapsed 319.551 ms.
[ INFO ] infer elapsed 2.013 ms.
[ INFO ] infer elapsed 1.966 ms.
[ INFO ] infer elapsed 1.957 ms.
[ INFO ] infer elapsed 1.966 ms.
[ INFO ] infer elapsed 1.900 ms.
[ INFO ] infer elapsed 1.905 ms.
[ INFO ] infer elapsed 1.910 ms.
[ INFO ] infer elapsed 1.888 ms.
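Apart from the first call, which takes about 320 ms because of one-time initialization, each inference takes roughly 1.9 to 2.0 ms, so the Phytium Pi's inference speed is quite respectable.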
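The timings in the log can be summarized with a few lines of standard-library Python. This is a small sketch: the numbers are copied from the log above, and the function name `summarize` is illustrative.

```python
import statistics

# Timings in milliseconds, copied from the log above
first_call_ms = 319.551
steady_ms = [2.013, 1.966, 1.957, 1.966, 1.900, 1.905, 1.910, 1.888]


def summarize(samples):
    """Return (mean, population standard deviation, calls per second) for ms samples."""
    mean = statistics.fmean(samples)
    return mean, statistics.pstdev(samples), 1000.0 / mean


if __name__ == "__main__":
    mean, stdev, rate = summarize(steady_ms)
    print(f"steady state: {mean:.3f} ms +/- {stdev:.3f} ms, about {rate:.0f} calls/s")
    print(f"first call is {first_call_ms / mean:.0f}x slower than the steady state")
```

The steady-state mean works out to about 1.94 ms per call for a single 32x32 image. That figure includes the interpreter setup performed inside `infer`.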
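4. Summary
This article showed how to "teach" the Phytium Pi to recognize digits, covering both the training of a deep learning model and its deployment on the Phytium Pi. The next article will use YOLO to let the Phytium Pi "recognize objects".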