Table of Contents
- 1. Configuring TensorRT in a Conda Virtual Environment
- 2. Installing onnx, onnxslim, and onnxruntime-gpu
- 2.1 Overview
- 2.2 Installing onnx and onnxslim
- 2.3 Installing onnxruntime-gpu
- 3. TensorRT Export & Inference Verification
- 3.1 Exporting the Model to TensorRT Format
- 3.2 Inference Verification
- 3.3 Speed Comparison
1. Configuring TensorRT in a Conda Virtual Environment
TensorRT is NVIDIA's deep learning inference accelerator. It significantly speeds up model inference and supports multiple precisions and data types.
Using TensorRT on an NVIDIA Jetson
To use tensorrt inside the virtual environment: the system-wide tensorrt installation cannot be located from the pytorch virtual environment, so we need to symlink it in by running:
sudo ln -s /usr/lib/python3.8/dist-packages/tensorrt* /home/nx/miniconda3/envs/pytorch/lib/python3.8/site-packages/
Test it with:
python -c "import tensorrt;print(tensorrt.__version__)"
If version 8.5.2.2 is printed, the link works.
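The symlink works because Python only imports from directories on its search path (the env's site-packages), and a symlink placed there resolves transparently to the system copy. A minimal, self-contained sketch of the same mechanism, using throwaway temp directories and a dummy module named fake_trt (all paths and names here are illustrative stand-ins, not the real TensorRT files):

```python
import importlib
import os
import sys
import tempfile

# Stand-ins for /usr/lib/python3.8/dist-packages (system packages)
# and the conda env's site-packages.
root = tempfile.mkdtemp()
dist = os.path.join(root, "dist-packages")
site = os.path.join(root, "site-packages")
os.makedirs(dist)
os.makedirs(site)

# A dummy "tensorrt" that exists only in the system dist-packages.
with open(os.path.join(dist, "fake_trt.py"), "w") as f:
    f.write('__version__ = "8.5.2.2"\n')

# The symlink step, mirroring:
#   sudo ln -s /usr/lib/python3.8/dist-packages/tensorrt* .../site-packages/
os.symlink(os.path.join(dist, "fake_trt.py"),
           os.path.join(site, "fake_trt.py"))

# Only site-packages is on the import path (as in the virtual env),
# yet the module now resolves through the symlink.
sys.path.insert(0, site)
fake_trt = importlib.import_module("fake_trt")
print(fake_trt.__version__)  # 8.5.2.2
```

The same idea scales to the real paths: any file or package symlinked into the env's site-packages becomes importable from that env.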
2. Installing onnx, onnxslim, and onnxruntime-gpu
2.1 Overview
onnx: an open format for defining, manipulating, and converting deep learning models, enabling seamless model exchange and validation across frameworks.
onnxslim: optimizes ONNX models by simplifying the graph and removing redundant operations, improving inference speed and reducing model size.
onnxruntime: a high-performance inference engine that runs ONNX models efficiently on many hardware platforms, enabling cross-platform deployment and optimization.
If you export an engine without installing these three packages first, Ultralytics will try to auto-install them, but the install fails because pip cannot find the package:
(pytorch) nx@nx-desktop:~$ yolo export model=yolo11n.pt format=engine
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt to 'yolo11n.pt'...
100%|████████████████████████████████████████████████████████████████████████████████████████████| 5.35M/5.35M [00:00<00:00, 11.2MB/s]
WARNING ⚠️ TensorRT requires GPU export, automatically assigning device=0
Ultralytics 8.3.70 🚀 Python-3.8.20 torch-2.1.0a0+41361538.nv23.06 CUDA:0 (Xavier, 6854MiB)
YOLO11n summary (fused): 238 layers, 2,616,248 parameters, 0 gradients, 6.5 GFLOPs
PyTorch: starting from 'yolo11n.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (5.4 MB)
requirements: Ultralytics requirements ['onnx>=1.12.0', 'onnxslim', 'onnxruntime-gpu'] not found, attempting AutoUpdate...
ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu
Retry 1/2 failed: Command 'pip install --no-cache-dir "onnx>=1.12.0" "onnxslim" "onnxruntime-gpu" ' returned non-zero exit status 1.
ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu
Retry 2/2 failed: Command 'pip install --no-cache-dir "onnx>=1.12.0" "onnxslim" "onnxruntime-gpu" ' returned non-zero exit status 1.
requirements: ❌ Command 'pip install --no-cache-dir "onnx>=1.12.0" "onnxslim" "onnxruntime-gpu" ' returned non-zero exit status 1.
ONNX: export failure ❌ 14.7s: No module named 'onnx'
TensorRT: export failure ❌ 14.7s: No module named 'onnx'
Traceback (most recent call last):
File "/home/nx/miniconda3/envs/pytorch/bin/yolo", line 8, in <module>
sys.exit(entrypoint())
File "/home/nx/Desktop/yolov11/ultralytics-main/ultralytics/cfg/__init__.py", line 986, in entrypoint
getattr(model, mode)(**overrides) # default args from model
File "/home/nx/Desktop/yolov11/ultralytics-main/ultralytics/engine/model.py", line 740, in export
return Exporter(overrides=args, _callbacks=self.callbacks)(model=self.model)
File "/home/nx/Desktop/yolov11/ultralytics-main/ultralytics/engine/exporter.py", line 402, in __call__
f[1], _ = self.export_engine(dla=dla)
File "/home/nx/Desktop/yolov11/ultralytics-main/ultralytics/engine/exporter.py", line 180, in outer_func
raise e
File "/home/nx/Desktop/yolov11/ultralytics-main/ultralytics/engine/exporter.py", line 175, in outer_func
f, model = inner_func(*args, **kwargs)
File "/home/nx/Desktop/yolov11/ultralytics-main/ultralytics/engine/exporter.py", line 820, in export_engine
f_onnx, _ = self.export_onnx() # run before TRT import https://github.com/ultralytics/ultralytics/issues/7016
File "/home/nx/Desktop/yolov11/ultralytics-main/ultralytics/engine/exporter.py", line 180, in outer_func
raise e
File "/home/nx/Desktop/yolov11/ultralytics-main/ultralytics/engine/exporter.py", line 175, in outer_func
f, model = inner_func(*args, **kwargs)
File "/home/nx/Desktop/yolov11/ultralytics-main/ultralytics/engine/exporter.py", line 503, in export_onnx
import onnx # noqa
ModuleNotFoundError: No module named 'onnx'
The root cause: pip cannot find any version that satisfies the onnxruntime-gpu requirement, because PyPI does not ship Jetson-compatible aarch64 wheels for it.
We therefore need to install onnxruntime-gpu manually.
2.2 Installing onnx and onnxslim
First, install the two packages that pip can resolve normally, onnx and onnxslim:
pip install --no-cache-dir "onnx>=1.12.0" "onnxslim"
2.3 Installing onnxruntime-gpu
Download the .whl file matching your JetPack version from the official listing: Jetson Zoo - eLinux.org
For example, with my setup (JetPack 5.1.4 + Python 3.8), download the wheel below to a local PC, then transfer it to the Jetson.
Install onnxruntime-gpu:
pip install /home/nx/Downloads/onnxruntime_gpu-1.17.0-cp38-cp38-linux_aarch64.whl
Installing onnxruntime-gpu automatically upgrades numpy to its latest version as a side effect, so numpy must be pinned back to 1.23.5:
pip install numpy==1.23.5
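After the manual installs, it is worth confirming what is actually resolvable in the environment. A small stdlib-only sketch that queries package metadata for the names used in the pip commands above (it prints "not installed" for anything missing rather than crashing):

```python
from importlib import metadata

# Names match the pip install commands in this guide; tensorrt is included
# because it was linked in rather than pip-installed, so it may or may not
# show up in pip metadata.
packages = ["onnx", "onnxslim", "onnxruntime-gpu", "numpy", "tensorrt"]

report = {}
for name in packages:
    try:
        report[name] = metadata.version(name)
    except metadata.PackageNotFoundError:
        report[name] = "not installed"

for name, ver in report.items():
    print(f"{name:16s} {ver}")
```

On the Jetson setup described here, numpy should report 1.23.5 after the pin; anything listed as "not installed" needs revisiting before attempting the export.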
3. TensorRT Export & Inference Verification
3.1 Exporting the Model to TensorRT Format
Export to TensorRT's .engine format (the export took 386.1 s on my device):
cd /home/nx/Desktop/yolov11/ultralytics-main/
yolo export model=yolo11n.pt format=engine
If output like the following appears, the export succeeded.
3.2 Inference Verification
yolo predict task=detect model=yolo11n.engine imgsz=640 source='https://ultralytics.com/images/bus.jpg'
If output like the following appears, inference succeeded. At (1, 3, 640, 640) input resolution, the inference latency is only 16.4 ms.
For comparison, running the .pt model directly at (1, 3, 640, 480) takes 239.4 ms per inference.
3.3 Speed Comparison
yolo predict task=detect model=yolo11n.pt imgsz=640 source=videos/街道.mp4 # original PyTorch model
yolo predict task=detect model=yolo11n.engine imgsz=640 source=videos/街道.mp4 # exported TensorRT model
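Besides eyeballing the per-frame times printed by the two commands, the latencies reported earlier already give a back-of-envelope speedup estimate. Treat it as rough: the .pt measurement was taken at 640x480 input while the engine ran at 640x640, so the true gap at equal resolution is somewhat larger.

```python
# Latencies reported in section 3.2 (milliseconds per inference).
engine_ms = 16.4    # TensorRT .engine at (1, 3, 640, 640)
pytorch_ms = 239.4  # raw .pt model at (1, 3, 640, 480)

engine_fps = 1000 / engine_ms     # throughput of the TensorRT engine
pytorch_fps = 1000 / pytorch_ms   # throughput of the PyTorch model
speedup = pytorch_ms / engine_ms  # how many times faster the engine is

print(f"engine:  {engine_fps:.1f} FPS")   # ~61.0 FPS
print(f"pytorch: {pytorch_fps:.1f} FPS")  # ~4.2 FPS
print(f"speedup: {speedup:.1f}x")         # ~14.6x
```

Even with the resolution caveat, a roughly 15x reduction in inference latency is what makes real-time detection practical on a Jetson-class device.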