CUDA on ARM 1

Motivation

This series is about using CUDA-capable GPUs with ARM development boards or ARM PCs. The goal is to build, at the very least, a platform that can be used for compiling and testing. I am starting the series now because my evaluation so far has turned up a plan that might actually work.

This series documents my tinkering, so it will not be updated on a regular schedule. This first post briefly describes the current situation. Because I could find almost nothing about this subject on the Internet, the series was written in both Chinese and English, with the Chinese text taking precedence.

ARM boards with PCIe

Every PCIe-capable board I found through a search engine is based on Rockchip's RK3399. According to the official product page, the RK3399 provides four full-duplex PCIe 2.1 lanes, so the PCIe interface on any of these boards cannot exceed that. Unsurprisingly, I found no ARM development board that supports NVLink.
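To put the four PCIe 2.1 lanes in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the function name `link_bandwidth_gbytes` is made up for this post, the PCIe 3.0 x16 figure is my own point of comparison (a typical desktop GPU slot) and not something from the RK3399 documentation, and the numbers ignore protocol overhead, so real throughput will be lower.

```python
# Theoretical one-direction bandwidth of a PCIe link.
# Packet and flow-control overhead are ignored, so these are upper bounds.

# Per-lane transfer rate in GT/s and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding (covers PCIe 2.0 and 2.1)
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gbytes(generation: int, lanes: int) -> float:
    """Return the theoretical bandwidth in GB/s for one direction."""
    gigatransfers, efficiency = PCIE_GENERATIONS[generation]
    gigabits = gigatransfers * efficiency * lanes
    return gigabits / 8  # bits -> bytes


if __name__ == "__main__":
    rk3399 = link_bandwidth_gbytes(generation=2, lanes=4)
    desktop = link_bandwidth_gbytes(generation=3, lanes=16)
    print(f"RK3399, PCIe 2.1 x4 : {rk3399:.2f} GB/s per direction")
    print(f"PCIe 3.0 x16        : {desktop:.2f} GB/s per direction")
```

So the link should be enough for a compile-and-test platform, but host-to-device transfers will be far slower than on a desktop.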

Among the many RK3399-based development boards, PCIe is exposed in several different ways: a PCIe x4 slot, a mini PCIe slot, an M.2 slot, or even bare I/O pins (amazing!).

These boards include:

  • RockPro64
  • FriendlyARM NanoPi and NanoPC series
  • etc.

After browsing Taobao, I found that the RockPro64 offers the best value. It has a full PCIe 2.1 x4 slot, supports up to 4 GB of memory, and is inexpensive. My tentative plan is to use this board, although I have not decided to buy one yet.

The RockPro64 looks roughly like this (the link is a Taobao purchase link):

CUDA on ARM

NVIDIA provides an official CUDA toolkit for Linux aarch64, although it is not yet a final release. The installer can be downloaded from the CUDA on ARM page.

The official installer also bundles a driver. However, because I tested it on Huawei Cloud, the driver installation did not succeed.
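Since the minimum goal is a platform that can compile, a quick sanity check after installing the toolkit could look like the Python sketch below. It assumes the installer has put `nvcc` on `PATH`, which may not be true for every setup, and the function name `describe_toolchain` is hypothetical. It only reports what it finds and does not need a GPU or a working driver.

```python
import platform
import shutil
import subprocess


def describe_toolchain() -> dict:
    """Collect the facts that matter for a compile-only CUDA setup."""
    nvcc_path = shutil.which("nvcc")  # None when nvcc is not on PATH
    info = {
        "machine": platform.machine(),  # "aarch64" on 64-bit ARM Linux
        "nvcc_path": nvcc_path,
        "nvcc_version": None,
    }
    if nvcc_path is not None:
        # "nvcc --version" prints the toolkit release without touching a GPU.
        result = subprocess.run(
            [nvcc_path, "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
        info["nvcc_version"] = result.stdout.strip() or None
    return info


if __name__ == "__main__":
    for key, value in describe_toolchain().items():
        print(f"{key}: {value}")
```

If `nvcc_path` comes back empty, the toolkit's `bin` directory is probably just missing from `PATH`.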

Driver

The CUDA on ARM installer bundles a version 435 driver for Linux aarch64. After unpacking the documentation shipped inside the driver package, I found that it lists all current GPUs as supported, though I do not know how that holds up in practice. For comparison, NVIDIA's official position is that only the Tesla V100 is supported on POWER9, yet the documentation for that platform lists every GPU as well, so the list alone may not mean much.

PCIe adapter

A GPU can be plugged directly into the RockPro64, because the board's PCIe slot is open-ended. Powering the card, however, is a problem.

Fortunately, I found a PCIe x4 to x16 adapter that also supplies power. See ADT R23SG.

Next step

With a driver and a powered adapter available, the next step is to buy a development board, find a GPU, and try it out.
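Once the hardware is on the desk, the first thing to check is whether the board sees the card on the PCIe bus at all, before any driver is involved. The Python sketch below is one way to do that on Linux. It assumes sysfs is mounted at the usual `/sys/bus/pci/devices` location. The function name `find_nvidia_devices` is hypothetical, while `0x10de` is NVIDIA's PCI vendor ID. I have not run this on a RockPro64.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def find_nvidia_devices(sysfs_root: str = "/sys/bus/pci/devices") -> list:
    """Return the PCI addresses of devices whose vendor ID is NVIDIA's."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not Linux, or sysfs is not mounted here
    found = []
    for device in sorted(root.iterdir()):
        try:
            vendor = (device / "vendor").read_text().strip().lower()
        except OSError:
            continue  # entry without a readable vendor file
        if vendor == NVIDIA_VENDOR_ID:
            found.append(device.name)
    return found


if __name__ == "__main__":
    devices = find_nvidia_devices()
    if devices:
        print("NVIDIA device(s) on the PCIe bus:", ", ".join(devices))
    else:
        print("No NVIDIA device visible on the PCIe bus.")
```

An empty result would point at the adapter, the power supply, or the PCIe link, not at the driver.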