Introduction
Part one: HOST
Part two: Guest
 
Introduction
 
These instructions are based on the Tesla V100 and cover GPU compute use only.
 
Please make sure you are using an NVIDIA Tesla product (Maxwell, Pascal,
or Volta generation). We do not yet have a hardware support matrix from NVIDIA.
 
Please also make sure the host has an additional display card, or at least SSH access.
 
Part one: HOST
 
1. HOST environment verification
1.1 Make sure your HOST is SLES 12 SP3 or later
baird:~/:[0]# cat /etc/issue
 
Welcome to SUSE Linux Enterprise Server 15  (x86_64) - Kernel \r (\l).
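The version check can also be scripted. A minimal sketch, not part of the original procedure: `sles_version_ok` is a hypothetical helper, and it assumes /etc/os-release uses ID="sles" and VERSION_ID values such as "12.3" (SLES 12 SP3) or "15", and that GNU `sort -V` is available.

```shell
#!/bin/sh
# Succeed if an os-release file describes SLES 12 SP3 or later.
# Assumption: SLES sets ID="sles" and VERSION_ID="12.3" (12 SP3), "15", etc.
sles_version_ok() {
    osrel=${1:-/etc/os-release}
    # Read ID and VERSION_ID, dropping optional surrounding double quotes
    id=$(sed -n 's/^ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$osrel")
    ver=$(sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$osrel")
    [ "$id" = "sles" ] && [ -n "$ver" ] || return 1
    # GNU sort -V orders version strings; if 12.3 sorts first, ver >= 12.3
    lowest=$(printf '%s\n%s\n' 12.3 "$ver" | sort -V | head -n 1)
    [ "$lowest" = "12.3" ]
}
```

Usage: `sles_version_ok && echo "host version OK"`.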
 
1.2 Make sure your HOST supports VT-d and that it is enabled in the BIOS:
baird:~/:[0]# dmesg | grep -e "Directed I/O"
[   12.819760] DMAR: Intel(R) Virtualization Technology for Directed I/O
 
1.3 Make sure you have an extra GPU or VGA card:
 
baird:~/:[0]# lspci | grep -i "vga"
07:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05)
 
baird:~/:[0]# lspci | grep -i nvidia
03:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1)
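The two `lspci` checks above can be combined into one test that some display device other than the passthrough GPU remains on the host. A sketch only: `other_display_present` is a hypothetical helper, and it assumes `lspci` labels display devices with one of the three class names matched below.

```shell
#!/bin/sh
# stdin: plain `lspci` output; $1: slot of the GPU to pass through (e.g. 03:00.0)
# Succeeds if at least one other display-class device is present.
other_display_present() {
    pass=$1
    # Assumption: display devices use one of these lspci class names
    grep -E 'VGA compatible controller|3D controller|Display controller' |
        awk -v pass="$pass" '$1 != pass { found = 1 } END { exit !found }'
}
```

Usage: `lspci | other_display_present 03:00.0 && echo "host keeps a display device"`.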
 
2. Enable IOMMU
 
vim /etc/default/grub
 
# Make this line look like this
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt rd.driver.pre=vfio-pci"
 
grub2-mkconfig -o /boot/grub2/grub.cfg
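Instead of editing the file by hand, the parameters can be appended with a small idempotent helper. This is a sketch, not part of the original procedure: `add_kernel_params` is hypothetical, and it assumes GRUB_CMDLINE_LINUX is written with double quotes and that no parameter contains a backslash.

```shell
#!/bin/sh
# Append kernel parameters to GRUB_CMDLINE_LINUX in a grub defaults file,
# skipping parameters that are already present.
# Assumes GRUB_CMDLINE_LINUX="..." uses double quotes and values have no backslashes.
add_kernel_params() {
    file=$1; shift
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/\1/p' "$file")
    new=$current
    for p in "$@"; do
        case " $new " in
            *" $p "*) ;;                  # already present, leave it
            *) new="${new:+$new }$p" ;;   # append, separated by one space
        esac
    done
    if grep -q '^GRUB_CMDLINE_LINUX=' "$file"; then
        # awk avoids sed escaping problems with '/' or '&' in values;
        # copying back with cat keeps the original file's permissions
        awk -v v="$new" '/^GRUB_CMDLINE_LINUX=/ { print "GRUB_CMDLINE_LINUX=\"" v "\""; next }
                         { print }' "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    else
        printf 'GRUB_CMDLINE_LINUX="%s"\n' "$new" >> "$file"
    fi
}
```

Usage: `add_kernel_params /etc/default/grub intel_iommu=on iommu=pt rd.driver.pre=vfio-pci && grub2-mkconfig -o /boot/grub2/grub.cfg`. Running it twice leaves the line unchanged.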
 
After rebooting, you can verify with:
dmesg | grep -e DMAR -e IOMMU
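As a complement to the dmesg check, the running kernel's command line and the IOMMU group directory in sysfs can be inspected. A sketch under stated assumptions: `iommu_check` is a hypothetical helper, and it treats a non-empty /sys/kernel/iommu_groups as a sign the IOMMU is active.

```shell
#!/bin/sh
# Check that the IOMMU parameters reached the kernel and that groups exist.
# $1: cmdline file (default /proc/cmdline); $2: groups dir (default /sys/kernel/iommu_groups)
iommu_check() {
    cmdline=${1:-/proc/cmdline}
    groups=${2:-/sys/kernel/iommu_groups}
    params=" $(cat "$cmdline") "
    for p in intel_iommu=on iommu=pt; do
        case "$params" in
            *" $p "*) ;;
            *) echo "missing kernel parameter: $p"; return 1 ;;
        esac
    done
    # With the IOMMU active, the kernel creates one directory per IOMMU group
    n=$(ls "$groups" 2>/dev/null | wc -l | tr -d ' ')
    [ "$n" -gt 0 ] || { echo "no IOMMU groups in $groups"; return 1; }
    echo "IOMMU active: $n groups"
}
```

Usage: run `iommu_check` with no arguments on the host after the reboot.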
 
3. Add nouveau to the blacklist
baird:~/:[0]# vim /etc/modprobe.d/50-blacklist.conf
 
Add the line "blacklist nouveau".
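The blacklist entry can be added non-interactively and without duplicating it on repeated runs. A sketch only; `ensure_line` is a hypothetical helper, not a SLES tool.

```shell
#!/bin/sh
# Append a line to a file unless an identical line is already present.
ensure_line() {
    file=$1
    line=$2
    # -x: match the whole line, -F: treat it as a fixed string, not a regex
    if grep -qxF "$line" "$file" 2>/dev/null; then
        return 0
    fi
    # Add a newline first if the file does not already end with one
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}
```

Usage: `ensure_line /etc/modprobe.d/50-blacklist.conf "blacklist nouveau"`. After the next reboot, `lsmod | grep nouveau` should print nothing.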
 
4. Setup VFIO and isolate the GPU used for pass-through
 
Create a file under /etc/modprobe.d:
 
baird:~/:[0]# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1db4
 
10de:1db4 is the vendor ID and device ID; lspci -nn shows these values:
 
baird:~/:[0]# lspci -nn | grep 03:00.0
03:00.0 3D controller [0302]: NVIDIA Corporation GV100 [Tesla V100 PCIe] [10de:1db4] (rev a1)
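The ID list for vfio.conf can be derived from `lspci -nn` instead of being copied by hand. A sketch: `vfio_ids_from_lspci` is a hypothetical helper, and it assumes the vendor:device pair is the last `[xxxx:xxxx]` bracket on each line, as in the output above.

```shell
#!/bin/sh
# stdin: `lspci -nn` lines for the device(s) to isolate.
# Prints unique vendor:device IDs joined by commas, e.g. 10de:1db4
vfio_ids_from_lspci() {
    # The greedy leading .* makes sed capture the last [xxxx:xxxx] on the line
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' |
        sort -u | paste -s -d, -
}
```

Usage: `printf 'options vfio-pci ids=%s\n' "$(lspci -nn -s 03:00 | vfio_ids_from_lspci)" > /etc/modprobe.d/vfio.conf`. Selecting `-s 03:00` without a function number picks up every function of the card, so a GPU that also exposes an audio function gets both IDs.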
 
 
5. Load the VFIO driver
baird:~/:[0]# modprobe vfio-pci
 
or add the modules to your initrd:
 
baird:~/:[0]# cat /etc/dracut.conf.d/gpu-passthrough.conf
add_drivers+="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"
 
dracut --force /boot/initrd $(uname -r)
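After rebuilding, the initrd contents can be checked with dracut's `lsinitrd`. A sketch: `missing_initrd_modules` is a hypothetical helper that reads the `lsinitrd` file listing and prints any requested module without a matching .ko file. A module built into the kernel will not appear as a .ko file, so a reported miss is worth double-checking with `modinfo`.

```shell
#!/bin/sh
# stdin: output of `lsinitrd /boot/initrd`; args: module names.
# Prints each module with no matching .ko file; returns 1 if any are missing.
missing_initrd_modules() {
    listing=$(cat)
    status=0
    for m in "$@"; do
        # Module files may use '-' or '_' and may be compressed (.ko.xz, .ko.zst)
        alt=$(printf '%s' "$m" | tr '_' '-')
        if ! printf '%s\n' "$listing" | grep -Eq "/($m|$alt)\.ko(\.[a-z]+)?\$"; then
            echo "$m"
            status=1
        fi
    done
    return $status
}
```

Usage: `lsinitrd /boot/initrd | missing_initrd_modules vfio vfio_iommu_type1 vfio_pci vfio_virqfd`.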
 
 
6. Reboot the host and check that the GPU is isolated in its own IOMMU group
and that the vfio-pci driver is in use
 
find /sys/kernel/iommu_groups/*/devices/*
 
/sys/kernel/iommu_groups/47/devices/0000:03:00.0
 
/sys/kernel/iommu_groups/49/devices/0000:07:00.0
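To confirm isolation, it helps to list every device sharing the GPU's group, since all devices in one IOMMU group have to be handed to the guest (or bound to vfio-pci) together. A sketch: `iommu_group_peers` is a hypothetical helper that parses the `find` output above.

```shell
#!/bin/sh
# stdin: output of `find /sys/kernel/iommu_groups/*/devices/*`
# $1: full device address, e.g. 0000:03:00.0
# Prints "<group>: <devices...>"; exits 1 if the device is not listed.
iommu_group_peers() {
    # Path fields: "" sys kernel iommu_groups <group> devices <device>
    awk -F/ -v dev="$1" '
        { members[$5] = members[$5] " " $7
          if ($7 == dev) grp = $5 }
        END { if (grp == "") exit 1
              print grp ":" members[grp] }'
}
```

Usage: `find /sys/kernel/iommu_groups/*/devices/* | iommu_group_peers 0000:03:00.0`. A single device in the output means the GPU is alone in its group.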
 
 
lspci -k
 
03:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1)
        Subsystem: NVIDIA Corporation Device 1214
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau
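The driver check can be reduced to a single value per device. A sketch: `driver_in_use` is a hypothetical helper that assumes device header lines in `lspci -k` output start with a hex digit and detail lines are indented.

```shell
#!/bin/sh
# stdin: output of `lspci -k`; $1: slot such as 03:00.0
# Prints the "Kernel driver in use" value for that device.
driver_in_use() {
    awk -v slot="$1" '
        /^[0-9a-f]/ { cur = (index($0, slot " ") == 1) }   # device header line
        cur && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use: */, "")
            print
            exit
        }'
}
```

Usage: `[ "$(lspci -k | driver_in_use 03:00.0)" = vfio-pci ] && echo "GPU is bound to vfio-pci"`.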
 
Part two: Guest installation with virt-manager
 
 
1.1 Make sure you install the VM in UEFI mode
 
1.2 Make sure your guest is SLES 12 SP2 or later
 
1.3 You still need to add an emulated display device during installation:
Graphic: spice
Device: qxl
 
1.4 Add the PCI host device identified above:
03:00.0
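virt-manager is the route described here; for a scripted setup, libvirt's `virsh attach-device` accepts a `<hostdev>` XML fragment instead. A sketch assuming the lspci address has no domain prefix (domain 0000 is filled in); `hostdev_xml` is a hypothetical helper.

```shell
#!/bin/sh
# Print a libvirt <hostdev> element for a PCI address like 03:00.0
# (bus:slot.function, domain 0000 assumed).
hostdev_xml() {
    bus=${1%%:*}      # "03"
    rest=${1#*:}      # "00.0"
    slot=${rest%%.*}  # "00"
    func=${rest#*.}   # "0"
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x$bus' slot='0x$slot' function='0x$func'/>
  </source>
</hostdev>
EOF
}
```

Usage: `hostdev_xml 03:00.0 > gpu.xml && virsh attach-device <vm-name> gpu.xml --config`. With `managed='yes'`, libvirt detaches the device from its host driver when the guest starts.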
 
1.5 Install the driver
 
 
1.5.2 Follow the steps below:
i) `rpm -i nvidia-diag-driver-local-repo-sles123-390.30-1.0-1.x86_64.rpm`
ii) `zypper refresh`
iii) `zypper install cuda-drivers`
iv) `reboot`
 
For verification:
cd /usr/local/cuda-9.1/samples/0_Simple/simpleTemplates
make
/usr/local/cuda-9.1/samples/0_Simple/simpleTemplates/:[0]# ./simpleTemplates
runTest<float,32>
GPU Device 0: "Tesla V100-PCIE-16GB" with compute capability 7.0
 
CUDA device [Tesla V100-PCIE-16GB] has 80 Multi-Processors
Processing time: 495.006000 (ms)
Compare OK
runTest<int,64>
GPU Device 0: "Tesla V100-PCIE-16GB" with compute capability 7.0
 
CUDA device [Tesla V100-PCIE-16GB] has 80 Multi-Processors
Processing time: 0.203000 (ms)
Compare OK
 
[simpleTemplates] -> Test Results: 0 Failures
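When the check is run repeatedly (for example after a driver update), the sample output can be summarised automatically. A sketch: `check_sample_output` is a hypothetical helper that assumes the output format shown above.

```shell
#!/bin/sh
# stdin: output of a CUDA sample such as ./simpleTemplates
# Prints "<GPU name> (<compute capability>)" for device 0 and
# succeeds only if the sample reports "Test Results: 0 Failures".
check_sample_output() {
    out=$(cat)
    printf '%s\n' "$out" |
        sed -n 's/^GPU Device 0: "\(.*\)" with compute capability \(.*\)$/\1 (\2)/p' |
        head -n 1
    printf '%s\n' "$out" | grep -q 'Test Results: 0 Failures'
}
```

Usage: `./simpleTemplates | check_sample_output && echo "CUDA sample passed"`.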
 
 
1.5.3 You may need to sign nvidia.ko and nvidia-uvm.ko on SLES (for example, when UEFI Secure Boot is enabled).

We are working on a better solution. For now, see:
8.2 Signing Module Object Files (UEFI Secure Boot)
 
 
1.6 Display issue
Once the NVIDIA driver is installed, the virt-manager display will lose its
connection. You will need to log in over SSH, switch to the console interface,
or install a dedicated VNC server inside the VM.