The NVIDIA Fabric Manager service enables multiple A100/A800 GPUs to interconnect through NVSwitch. For more information about NVSwitch, see the NVIDIA official website.
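Note
You can install the NVIDIA Fabric Manager service either from a downloaded installation package or from a package source (the NVIDIA repository). The steps below use GPU driver version 470.57.02 as an example to show how to install and start the service. To install a different version, replace the version number in the commands with your GPU driver version. To check the GPU driver version, run the nvidia-smi command.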
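As a minimal sketch of the version lookup described in the note, the helpers below read the driver version through the query mode of nvidia-smi and derive the driver branch (the leading number, 470 in this example). The function names driver_version and driver_branch are hypothetical and are not part of any NVIDIA tool.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the installed GPU driver version, e.g. 470.57.02.
# nvidia-smi prints one line per GPU, so only the first line is kept.
driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Print the driver branch (the part before the first dot), e.g. 470.
driver_branch() {
    local version="$1"
    echo "${version%%.*}"
}
```

For example, driver_branch "$(driver_version)" would print the branch number that appears in the Ubuntu and Debian package names.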
CentOS 8.x
    wget https://developer.download.nvidia.cn/compute/cuda/repos/rhel8/x86_64/nvidia-fabric-manager-470.57.02-1.x86_64.rpm
    rpm -ivh nvidia-fabric-manager-470.57.02-1.x86_64.rpm
CentOS 7.x
    wget https://developer.download.nvidia.cn/compute/cuda/repos/rhel7/x86_64/nvidia-fabric-manager-470.57.02-1.x86_64.rpm
    rpm -ivh nvidia-fabric-manager-470.57.02-1.x86_64.rpm
Ubuntu 20.04
    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/nvidia-fabricmanager-470_470.57.02-1_amd64.deb
    dpkg -i nvidia-fabricmanager-470_470.57.02-1_amd64.deb
Ubuntu 18.04
    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64/nvidia-fabricmanager-470_470.57.02-1_amd64.deb
    dpkg -i nvidia-fabricmanager-470_470.57.02-1_amd64.deb
Debian 10, veLinux 1.0
    wget https://developer.download.nvidia.cn/compute/cuda/repos/debian10/x86_64/nvidia-fabricmanager-470_470.57.02-1_amd64.deb
    dpkg -i nvidia-fabricmanager-470_470.57.02-1_amd64.deb
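The direct-download commands above differ only in the repository directory and the package file name. The sketch below shows one way to derive both, assuming the ID and VERSION_ID fields of /etc/os-release and the "-1" release suffix used in the examples above. The function names fm_repo_dir and fm_package_url are hypothetical; veLinux is not mapped automatically because its os-release values are not stated here, so pass debian10 by hand in that case.

```shell
# Hypothetical helpers; the names are illustrative only.

# Map an os-release ID and VERSION_ID to the repository directory
# that appears in the download URLs.
fm_repo_dir() {
    local id="$1" version_id="$2"
    case "$id:$version_id" in
        centos:8*)    echo rhel8 ;;
        centos:7*)    echo rhel7 ;;
        ubuntu:20.04) echo ubuntu2004 ;;
        ubuntu:18.04) echo ubuntu1804 ;;
        debian:10*)   echo debian10 ;;
        *) echo "no mapping for $id $version_id" >&2; return 1 ;;
    esac
}

# Build the package URL for a repository directory and driver version.
# Assumes the "-1" release suffix shown in the examples; other versions
# may be published with a different suffix.
fm_package_url() {
    local repo_dir="$1" version="$2"
    local base="https://developer.download.nvidia.cn/compute/cuda/repos/${repo_dir}/x86_64"
    case "$repo_dir" in
        rhel*) echo "${base}/nvidia-fabric-manager-${version}-1.x86_64.rpm" ;;
        *)     echo "${base}/nvidia-fabricmanager-${version%%.*}_${version}-1_amd64.deb" ;;
    esac
}
```

A typical use would be to source /etc/os-release, call fm_repo_dir "$ID" "$VERSION_ID", and pass the result together with the driver version to fm_package_url before running wget.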
CentOS 8.x
    dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
    dnf module enable -y nvidia-driver:470
    dnf install -y nvidia-fabric-manager-0:470.57.02-1
CentOS 7.x
    yum -y install yum-utils
    yum-config-manager --add-repo https://developer.download.nvidia.cn/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
    yum install -y nvidia-fabric-manager-470.57.02-1
Ubuntu 20.04
    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
    mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
    apt-key add 7fa2af80.pub
    rm 7fa2af80.pub
    echo "deb http://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
    apt-get update
    apt-get -y install nvidia-fabricmanager-470=470.57.02-1
Ubuntu 18.04
    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
    mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
    apt-key add 7fa2af80.pub
    rm 7fa2af80.pub
    echo "deb http://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
    apt-get update
    apt-get -y install nvidia-fabricmanager-470=470.57.02-1
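Each package manager in the repository-based commands above pins the version with a slightly different syntax. The sketch below collects the three forms in one place so that only the driver version needs to change. The function name fm_pkg_spec is hypothetical, and the "-1" release suffix is an assumption carried over from the examples.

```shell
# Hypothetical helper; the name is illustrative only.

# Print the version-pinned package argument for a package manager.
# $1 = dnf | yum | apt, $2 = driver version (e.g. 470.57.02)
fm_pkg_spec() {
    local manager="$1" version="$2"
    local branch="${version%%.*}"   # driver branch, e.g. 470
    case "$manager" in
        dnf) echo "nvidia-fabric-manager-0:${version}-1" ;;
        yum) echo "nvidia-fabric-manager-${version}-1" ;;
        apt) echo "nvidia-fabricmanager-${branch}=${version}-1" ;;
        *) echo "unsupported package manager: $manager" >&2; return 1 ;;
    esac
}
```

For example, apt-get -y install "$(fm_pkg_spec apt 470.57.02)" would expand to the same install command as the Ubuntu steps.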
CentOS 7.x/8.x
    yum install nvidia-fabric-manager-devel-470.57.02-1 -y
Ubuntu 20.04/18.04, Debian 10, veLinux 1.0
    dpkg -i nvidia-fabric-manager-devel-470.57.02-1_amd64.deb
Run the following command to start the Fabric Manager service.
    sudo systemctl start nvidia-fabricmanager
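Run the following command to check whether the Fabric Manager service started correctly. If the output contains active (running), the service started successfully.
    sudo systemctl status nvidia-fabricmanager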
Run the following command to configure the Fabric Manager service to start automatically when the instance boots.
    sudo systemctl enable nvidia-fabricmanager
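The start, status, and enable steps can also be verified from a script. The sketch below uses the systemctl is-active and is-enabled subcommands and succeeds only when the unit is both running and enabled. The function name fm_service_report is hypothetical.

```shell
# Hypothetical helper; the name is illustrative only.

# Report whether the Fabric Manager unit is running and enabled at boot.
# Returns 0 only when both checks pass.
fm_service_report() {
    local unit="nvidia-fabricmanager"
    local active enabled
    # Both subcommands exit non-zero for inactive or disabled units,
    # so the exit status is ignored and only the printed state is kept.
    active="$(systemctl is-active "$unit" 2>/dev/null)" || true
    enabled="$(systemctl is-enabled "$unit" 2>/dev/null)" || true
    echo "active=${active:-unknown} enabled=${enabled:-unknown}"
    [ "$active" = "active" ] && [ "$enabled" = "enabled" ]
}
```

Calling fm_service_report after the three commands above prints one summary line, and its exit status can gate later steps such as launching a multi-GPU job.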