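This article describes how to verify whether an image supports RDMA. Follow the steps below to check whether an image meets the requirements for RDMA on either of two instance types: V100 RDMA (ml.hpcg1v.21xlarge or ml.hpcg1ve.21xlarge) and A100 RDMA (ml.hpcpni2.28xlarge).
The V100 and A100 instance types use different RDMA NIC hardware, and the cloud server virtualizes the two NICs differently, so the required versions of the related libraries and packages inside the image also differ slightly between instance types.
Note
Installation commands may differ slightly between distributions. Most training container images are currently built on Ubuntu (the Ubuntu version used below is 20.04). This document will be updated as images based on other distributions become available.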
Run cat /etc/os-release inside the container. Sample output:
    root@iv-ybqs2pif757grbqpwubx:/workspace# cat /etc/os-release
    NAME="Ubuntu"
    VERSION="18.04.5 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.5 LTS"
    VERSION_ID="18.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=bionic
    UBUNTU_CODENAME=bionic
apt update && apt install -y infiniband-diags
Run the ibstatus command to check the NIC rate. In this example the rate of the NIC (mlx5_1) is 100 Gb/sec, which is the expected value for the V100 RDMA instance type.
    # ibstatus
    Infiniband device 'mlx5_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0216:3eff:fe5a:2a70
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            25 Gb/sec (1X EDR)
        link_layer:      Ethernet
    Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:0216:3fff:fe0e:db1b
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      Ethernet
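The rate check above can also be scripted. The following is a minimal sketch, assuming the ibstatus output keeps the layout shown in the sample; the function name ib_rates is made up for illustration.

```shell
# Print "<device> <rate in Gb/sec>" for every port in ibstatus-style text read from stdin.
# Assumes each port block starts with "Infiniband device '<name>' ..." and has a "rate:" line.
ib_rates() {
    awk -v q="'" '
        /^Infiniband device/ { dev = $3; gsub(q, "", dev) }   # strip the quotes around the name
        $1 == "rate:"        { print dev, $2 }                # $2 is the numeric rate
    '
}
```

Typical use would be ibstatus | ib_rates, then comparing the value for mlx5_1 with the rate expected for the instance type.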
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
Sample output:
    # dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
    // output follows
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                     Version          Architecture Description
    +++-========================-================-============-===========================================================
    ii  ibverbs-providers:amd64  17.1-1ubuntu0.2  amd64        User space provider drivers for libibverbs
    ii  libibverbs1:amd64        17.1-1ubuntu0.2  amd64        Library for direct userspace use of RDMA (InfiniBand/iWARP)
    ii  libnl-3-200:amd64        3.2.29-0ubuntu3  amd64        library for dealing with netlink sockets
    ii  libnl-route-3-200:amd64  3.2.29-0ubuntu3  amd64        library for dealing with netlink sockets - route interface
    dpkg-query: no packages found matching perftest
    dpkg-query: no packages found matching libibumad3
    dpkg-query: no packages found matching librdmacm1
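The output above lists both installed packages (such as ibverbs-providers:amd64 and libibverbs1:amd64) and packages that are not installed (such as perftest and libibumad3).
If any package is missing, continue with the following steps. Otherwise these packages can be used right away to verify whether the image supports RDMA.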
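To pick out the missing packages without reading the table by eye, a small filter can be applied to the dpkg output. This is only a sketch: the function name missing_deb_pkgs is hypothetical, and it relies on the exact "no packages found matching" wording shown in the sample.

```shell
# Read dpkg -l output (stdout and stderr combined) from stdin and
# print the name of each package reported as not found, one per line.
missing_deb_pkgs() {
    sed -n 's/^dpkg-query: no packages found matching \(.*\)$/\1/p'
}
```

Because dpkg-query writes these messages to stderr, the call would look like dpkg -l perftest libibumad3 librdmacm1 2>&1 | missing_deb_pkgs. An empty result means no package was reported as missing.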
4. Run the following command:
apt update && apt install -y perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
Sample output:
    # dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                     Version          Architecture Description
    +++-========================-================-============-===========================================================
    ii  ibverbs-providers:amd64  28.0-1ubuntu1    amd64        User space provider drivers for libibverbs
    ii  libibumad3:amd64         28.0-1ubuntu1    amd64        InfiniBand Userspace Management Datagram (uMAD) library
    ii  libibverbs1:amd64        28.0-1ubuntu1    amd64        Library for direct userspace use of RDMA (InfiniBand/iWARP)
    ii  libnl-3-200:amd64        3.4.0-1          amd64        library for dealing with netlink sockets
    ii  libnl-route-3-200:amd64  3.4.0-1          amd64        library for dealing with netlink sockets - route interface
    ii  librdmacm1:amd64         28.0-1ubuntu1    amd64        Library for managing RDMA connections
    ii  perftest                 4.4+0.5-1        amd64        Infiniband verbs performance tests
If no dpkg-query: no packages found matching error appears, the packages are ready to use. The version numbers do not need to match this example.
6. If the NCCL version is lower than 2.12, you can try installing the Sharp plugin to enable GDR (without GDR, performance drops by about 10%):
    apt install automake autoconf libtool libibverbs-dev=28.0-1ubuntu1 libibverbs1=28.0-1ubuntu1
    cd /tmp \
      && git clone https://github.com/Mellanox/nccl-rdma-sharp-plugins.git \
      && cd nccl-rdma-sharp-plugins \
      && ./autogen.sh \
      && ./configure --prefix=/usr/local/nccl-rdma-sharp-plugins --with-cuda=/usr/local/cuda \
      && make && make install \
      && rm -rf /tmp/nccl-rdma-sharp-plugins
    export LD_LIBRARY_PATH="/usr/local/nccl-rdma-sharp-plugins/lib:${LD_LIBRARY_PATH}"
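The "lower than 2.12" condition is a dotted-version comparison, which plain string comparison gets wrong (for example, "2.9" sorts after "2.12" as text). The sketch below compares versions field by field. The helper name version_lt is hypothetical, and how the installed NCCL version is obtained is deliberately left out because the source does not say.

```shell
# Succeed (exit 0) when dotted version $1 is strictly lower than dotted version $2.
# Fields are compared numerically from left to right; missing fields count as 0.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1   # versions are equal
    }'
}
```

With a version string in hand, version_lt "$nccl_version" 2.12 && echo "consider the plugin" would express the condition in this step, where nccl_version is a placeholder variable.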
Run cat /etc/os-release. Sample output:
    [root@ncggrd8mrsfegjm28qvqg /]# cat /etc/os-release
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"
Run the following command to install the test package:
yum install -y infiniband-diags
Note: Because CentOS 8 has migrated to CentOS Stream 8, the command above may fail with the following error:
    # yum install -y infiniband-diags
    Failed to set locale, defaulting to C.UTF-8
    CentOS Linux 8 - AppStream                      89 B/s |  38  B     00:00
    Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist
In that case, run the following two commands first, then run yum install -y infiniband-diags again.
sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-Linux-*
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' /etc/yum.repos.d/CentOS-Linux-*
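Before editing the repo files in place, it can help to preview what the two substitutions do. The sketch below applies the same two expressions without -i, so the result is printed and the file is left untouched. The function name preview_centos8_repo_fix is made up for illustration.

```shell
# Print repo file $1 with both substitutions applied, without modifying the file:
# every "mirrorlist" is prefixed with "#", and the commented-out baseurl is
# re-enabled and pointed at the vault host.
preview_centos8_repo_fix() {
    sed -e 's/mirrorlist/#mirrorlist/g' \
        -e 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' "$1"
}
```

Note that the g flag also rewrites "mirrorlist" inside the URL of the commented line, which is harmless because that line is no longer read.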
Run the ibstatus command to check the NIC rate. In this example the rate of the NIC (mlx5_1) is 100 Gb/sec, which is the expected value for the V100 RDMA instance type.
    # ibstatus
    Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:d069:89ff:fe00:e864
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      Ethernet
rpm -q perftest libibumad libibverbs libnl3 librdmacm
Sample output:
    # rpm -q perftest libibumad libibverbs libnl3 librdmacm
    // output follows
    package perftest is not installed
    libibumad-22.4-6.el7_9.x86_64
    package libibverbs is not installed
    package libnl3 is not installed
    package librdmacm is not installed
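The output above lists both installed packages (such as libibumad) and packages that are not installed (such as perftest and libibverbs). If any package is missing, continue with the following steps. Otherwise these packages can be used right away to verify whether the image supports RDMA.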
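The same kind of filter works for the rpm output. This is a sketch that depends on the "package ... is not installed" wording shown in the sample; the function name missing_rpm_pkgs is hypothetical.

```shell
# Read rpm -q output from stdin and print the name of each package
# reported as not installed, one per line.
missing_rpm_pkgs() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}
```

For example, rpm -q perftest libibumad libibverbs libnl3 librdmacm | missing_rpm_pkgs prints only the names that still need to be installed.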
yum install -y perftest libibumad libibverbs libnl3 librdmacm
rpm -q perftest libibumad libibverbs libnl3 librdmacm
Sample output:
    # rpm -q perftest libibumad libibverbs libnl3 librdmacm
    perftest-4.2-2.el7.x86_64
    libibumad-22.4-6.el7_9.x86_64
    libibverbs-22.4-6.el7_9.x86_64
    libnl3-3.2.28-4.el7.x86_64
    librdmacm-22.4-6.el7_9.x86_64
If no package x is not installed error appears, the packages are ready to use. The version numbers do not need to match this example.
lsb_release -c
Sample output:
    # lsb_release -c
    // output follows
    Codename:       bionic
The codenames map to Ubuntu versions as follows:
Codename | Version |
---|---|
bionic | 18.04 |
focal | 20.04 |
impish | 21.10 |
jammy | 22.04 |
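The table can be written as a lookup for use in scripts. This is a minimal sketch covering only the four codenames listed above; the function name ubuntu_version_for is an illustrative choice.

```shell
# Print the Ubuntu version for a release codename, using the table above.
# Returns a non-zero status for codenames that are not in the table.
ubuntu_version_for() {
    case "$1" in
        bionic) echo 18.04 ;;
        focal)  echo 20.04 ;;
        impish) echo 21.10 ;;
        jammy)  echo 22.04 ;;
        *)      return 1 ;;
    esac
}
```

It could be combined with lsb_release -cs, which prints the codename alone, as in ubuntu_version_for "$(lsb_release -cs)".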
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
Sample output:
    # dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
    // output follows
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                     Version          Architecture Description
    +++-========================-================-============-===========================================================
    ii  ibverbs-providers:amd64  17.1-1ubuntu0.2  amd64        User space provider drivers for libibverbs
    ii  libibverbs1:amd64        17.1-1ubuntu0.2  amd64        Library for direct userspace use of RDMA (InfiniBand/iWARP)
    ii  libnl-3-200:amd64        3.2.29-0ubuntu3  amd64        library for dealing with netlink sockets
    ii  libnl-route-3-200:amd64  3.2.29-0ubuntu3  amd64        library for dealing with netlink sockets - route interface
    dpkg-query: no packages found matching perftest
    dpkg-query: no packages found matching libibumad3
    dpkg-query: no packages found matching librdmacm1
The output above lists both installed packages (such as ibverbs-providers:amd64 and libibverbs1:amd64) and packages that are not installed (such as perftest and libibumad3).
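If any package is missing, or if the first two digits of the version number of ibverbs-providers:amd64 and libibverbs1:amd64 are lower than 23, continue with the following steps. Otherwise, skip to step 8 and compare the output. If nothing is wrong, the packages are ready to use.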
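The "first two digits lower than 23" rule can be checked mechanically. The sketch below takes a Debian version string such as 17.1-1ubuntu0.2 and tests the number before the first dot. The function name is hypothetical, and version strings with an epoch prefix (such as 1:2.3) are not handled.

```shell
# Succeed when the leading number of version string $1 (e.g. 28.0-1ubuntu1) is at least 23.
# Defined with ( ) so the helper variable does not leak into the calling shell.
rdma_core_new_enough() (
    major=${1%%.*}          # keep everything before the first dot
    [ "$major" -ge 23 ]
)
```

The installed version of a package can be read with dpkg-query -W -f='${Version}' libibverbs1 and passed to the function.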
4. If the version obtained in step 2 is lower than 20.04, continue from step 5. If it is higher than 20.04, run the following command directly and then skip to step 8.
apt update && apt install -y perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
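The branch in this step can be sketched as a small decision helper. The text only covers versions lower and higher than 20.04; treating 20.04 itself like the higher versions is an assumption made here, on the grounds that the focal packages are then available without an extra source. The function name and the two labels it prints are invented for illustration.

```shell
# Print which path applies to Ubuntu version $1 (format NN.NN):
#   add-focal-source  versions lower than 20.04 (continue from step 5)
#   direct-install    20.04 and later (20.04 itself is an assumption, see above)
install_path_for() (
    v=$(printf '%s' "$1" | tr -d '.')   # 18.04 -> 1804
    if [ "$v" -lt 2004 ]; then
        echo add-focal-source
    else
        echo direct-install
    fi
)
```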
Add deb http://mirrors.ivolces.com/ubuntu/ focal main universe to /etc/apt/sources.list, or run the following command directly (it only needs to be added once):
echo "deb http://mirrors.ivolces.com/ubuntu/ focal main universe" >> /etc/apt/sources.list
Add APT::Default-Release "Codename"; to /etc/apt/apt.conf.d/01-vendor-ubuntu, replacing Codename with the result obtained in step 1. Taking 18.04 as an example, run the following command (it only needs to be added once):
echo "APT::Default-Release \"bionic\";" >> /etc/apt/apt.conf.d/01-vendor-ubuntu
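Because both lines should be added only once, a re-runnable script can guard the append. This is a minimal sketch; the helper name append_once is hypothetical.

```shell
# Append line $1 to file $2 unless the file already contains exactly that line.
# grep -x matches whole lines only and -F treats the text literally.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}
```

For example, append_once 'APT::Default-Release "bionic";' /etc/apt/apt.conf.d/01-vendor-ubuntu can be run repeatedly without duplicating the entry.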
apt update && apt install -t focal perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
Sample output:
    # dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                     Version          Architecture Description
    +++-========================-================-============-===========================================================
    ii  ibverbs-providers:amd64  28.0-1ubuntu1    amd64        User space provider drivers for libibverbs
    ii  libibumad3:amd64         28.0-1ubuntu1    amd64        InfiniBand Userspace Management Datagram (uMAD) library
    ii  libibverbs1:amd64        28.0-1ubuntu1    amd64        Library for direct userspace use of RDMA (InfiniBand/iWARP)
    ii  libnl-3-200:amd64        3.4.0-1          amd64        library for dealing with netlink sockets
    ii  libnl-route-3-200:amd64  3.4.0-1          amd64        library for dealing with netlink sockets - route interface
    ii  librdmacm1:amd64         28.0-1ubuntu1    amd64        Library for managing RDMA connections
    ii  perftest                 4.4+0.5-1        amd64        Infiniband verbs performance tests
Check the version numbers of ibverbs-providers:amd64, libibumad3:amd64, libibverbs1:amd64, and librdmacm1:amd64 (28.0-1ubuntu1 in this example). If the first two digits are not lower than 23, the packages are ready to use.
Run cat /etc/os-release. Sample output:
    [root@ncggrd8mrsfegjm28qvqg /]# cat /etc/os-release
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"
Run the following command to install the test package:
yum install -y infiniband-diags
Note: Because CentOS 8 has migrated to CentOS Stream 8, the command above may fail with the following error:
    # yum install -y infiniband-diags
    Failed to set locale, defaulting to C.UTF-8
    CentOS Linux 8 - AppStream                      89 B/s |  38  B     00:00
    Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist
In that case, run the following two commands first, then run yum install -y infiniband-diags again.
sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-Linux-*
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' /etc/yum.repos.d/CentOS-Linux-*
Run the ibstatus command to check the NIC rate. In this example the rate of the NIC (mlx5_1) is 100 Gb/sec, which is the expected value for the V100 RDMA instance type.
    # ibstatus
    Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:d069:89ff:fe00:e864
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      Ethernet
rpm -q perftest libibumad libibverbs libnl3 librdmacm
Sample output:
    # rpm -q perftest libibumad libibverbs libnl3 librdmacm
    // output follows
    package perftest is not installed
    libibumad-22.4-6.el7_9.x86_64
    package libibverbs is not installed
    package libnl3 is not installed
    package librdmacm is not installed
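The output above lists both installed packages (such as libibumad) and packages that are not installed (such as perftest and libibverbs). If any package is missing, continue with the following steps. Otherwise these packages can be used right away to verify whether the image supports RDMA.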
yum install -y perftest libibumad libibverbs libnl3 librdmacm
rpm -q perftest libibumad libibverbs libnl3 librdmacm
Sample output:
    # rpm -q perftest libibumad libibverbs libnl3 librdmacm
    perftest-4.2-2.el7.x86_64
    libibumad-22.4-6.el7_9.x86_64
    libibverbs-22.4-6.el7_9.x86_64
    libnl3-3.2.28-4.el7.x86_64
    librdmacm-22.4-6.el7_9.x86_64
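If no package x is not installed error appears, the packages are ready to use. The version numbers do not need to match this example: taking libibverbs-22.4-6.el7_9.x86_64 as an example, the version string is 22.4-6.el7_9.x86_64, and only its first two digits need to be no lower than 22.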
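After setting up the environment as described above, follow the steps below to verify the image configuration. The verification steps are the same for the V100 RDMA and A100 RDMA instance types.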
ib_write_bw -d mlx5_1 &
Sample output:
    # ib_write_bw -d mlx5_1 &
    [1] 104777
    root@iv-ybrf933mwd8rx7gs2na5:/workspace#
    ************************************
    * Waiting for client to connect... *
    ************************************
ib_write_bw -d mlx5_1 127.0.0.1 --report_gbits
Sample output (the server started earlier runs in the background of the same shell, so its output is interleaved with the client's):
    # ib_write_bw -d mlx5_1 127.0.0.1 --report_gbits
    ---------------------------------------------------------------------------------------
                        RDMA_Write BW Test
     Dual-port       : OFF          Device         : mlx5_1
     Number of qps   : 1            Transport type : IB
     Connection type : RC           Using SRQ      : OFF
     CQ Moderation   : 100
     Mtu             : 4096[B]
     Link type       : Ethernet
     GID index       : 2
     Max inline data : 0[B]
     rdma_cm QPs     : OFF
     Data ex. method : Ethernet
    ---------------------------------------------------------------------------------------
     local address: LID 0000 QPN 0x090a PSN 0x723cbf RKey 0x082200 VAddr 0x007f67e0c4c000
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:198:18:06:59
    ---------------------------------------------------------------------------------------
                        RDMA_Write BW Test
     Dual-port       : OFF          Device         : mlx5_1
     Number of qps   : 1            Transport type : IB
     Connection type : RC           Using SRQ      : OFF
     TX depth        : 128
     CQ Moderation   : 100
     Mtu             : 4096[B]
     Link type       : Ethernet
     GID index       : 2
     Max inline data : 0[B]
     rdma_cm QPs     : OFF
     Data ex. method : Ethernet
    ---------------------------------------------------------------------------------------
     local address: LID 0000 QPN 0x090b PSN 0xe78073 RKey 0x082300 VAddr 0x007fd7b287f000
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:198:18:06:59
     remote address: LID 0000 QPN 0x090b PSN 0xe78073 RKey 0x082300 VAddr 0x007fd7b287f000
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:198:18:06:59
     remote address: LID 0000 QPN 0x090a PSN 0x723cbf RKey 0x082200 VAddr 0x007f67e0c4c000
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:198:18:06:59
    ---------------------------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
    ---------------------------------------------------------------------------------------
     #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
     65536      5000             93.90              93.67              0.178661
    ---------------------------------------------------------------------------------------
     65536      5000             93.90              93.67              0.178661
    ---------------------------------------------------------------------------------------
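For the V100 RDMA instance type, the bandwidth values (BW peak and BW average) should be close to 100 Gb/s. For the A100 RDMA instance type they should be close to 200 Gb/s. If the values meet this expectation, the configuration is correct. If there is no output or an error is reported, go back to the environment setup section for your instance type and check whether any configuration item was missed.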
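The bandwidth check can be automated by reading the BW average column from the result row. The sketch below is illustrative only: both function names are made up, and since the text says only "close to", the 90 percent threshold used in the usage note is an assumption rather than a documented limit.

```shell
# Print the "BW average" value of the last result row in ib_write_bw output read from stdin.
# A result row is a line with exactly five fields whose first field is a byte count.
bw_average() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ { avg = $4 } END { if (avg != "") print avg }'
}

# Succeed when measured bandwidth $1 reaches at least $3 percent of nominal rate $2.
bw_ok() {
    awk -v m="$1" -v n="$2" -v p="$3" 'BEGIN { exit !(m * 100 >= n * p) }'
}
```

With the client output saved to a file named client.log (a placeholder), bw_ok "$(bw_average < client.log)" 100 90 would test a V100 RDMA result against a 90 percent threshold. For A100 RDMA the nominal rate would be 200.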
ib_write_bw -d mlx5_1 -x $NCCL_IB_GID_INDEX
Sample output:
    # ib_write_bw -d mlx5_1 -x $NCCL_IB_GID_INDEX
    ************************************
    * Waiting for client to connect... *
    ************************************
Replace <MACHINE_A_HOST> with the IP address of machine A's RDMA network interface.
ib_write_bw -d mlx5_1 -x $NCCL_IB_GID_INDEX <MACHINE_A_HOST> --report_gbits
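Both commands expand $NCCL_IB_GID_INDEX unquoted, so if the variable is unset the -x option silently takes the next word as its argument. The sketch below builds the client command only when the variable is set. The function name client_cmd is hypothetical, and the address in the usage note is a documentation placeholder.

```shell
# Print the client-side command for server address $1, or fail with a message
# on stderr when NCCL_IB_GID_INDEX is unset or empty.
client_cmd() {
    if [ -z "${NCCL_IB_GID_INDEX:-}" ]; then
        echo "NCCL_IB_GID_INDEX is not set" >&2
        return 1
    fi
    echo "ib_write_bw -d mlx5_1 -x $NCCL_IB_GID_INDEX $1 --report_gbits"
}
```

For example, client_cmd 192.0.2.10 prints the command to run on machine B, or reports the missing variable instead of producing a malformed command line.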