[Qemu-devel] vfio-pci: Report on a hack to successfully pass through a boot GPU

* [Qemu-devel] vfio-pci: Report on a hack to successfully pass through a boot GPU
From: Robert Ou @ 2016-07-12  9:30 UTC (permalink / raw)
  To: qemu-devel

I would like to report a hack that lets me pass through a boot GPU
with vfio-pci. TL;DR: the BOOTFB framebuffer memory region seems to
cause a "BAR <n>: can't reserve [mem <...>]" error, and this can be
worked around (hackily) by calling __release_region on the BOOTFB
region. Someone on IRC suggested that I send this hack to this list.

My system setup is as follows. I have a Xeon E5-2630 v4 on an ASRock
X99 Extreme6 motherboard. The GPU I am attempting to pass through is
an NVIDIA GTX 1080 plugged into the slot closest to the CPU. A second
GPU, an AMD R5 240 OEM (Oland), is used as the "initial" GPU for
Linux: the text consoles and the X graphical login appear on the
monitor connected to it, and after logging in I run additional
commands to either start a VM or start a new X server on the NVIDIA
GPU. Each GPU has its own monitor cable; there is no attempt to
forward the output from one GPU to the other. Linux is booted using
UEFI, not BIOS boot, and the CSM is disabled. The UEFI splash and the
GRUB bootloader display on the NVIDIA GPU, and there does not appear
to be an option to change the boot GPU. Linux is nevertheless
configured to display its output on the AMD GPU by a) describing only
the AMD GPU in xorg.conf and b) passing "video=simplefb:off" on the
kernel command line and putting radeon in the initrd so that it loads
before the nvidia driver does. I am running Debian sid with kernel 4.6.

I activate vfio-pci manually by writing the device IDs to
/sys/bus/pci/drivers/vfio-pci/new_id, unbinding the existing driver,
and then binding vfio-pci. This works most of the time (more on this
later). Without my hack, when I try to launch a qemu-kvm guest (using
virt-manager; the guest OS is Windows 10, booting via OVMF on an
i440fx machine), the host kernel log is flooded with the error
"vfio-pci 0000:04:00.0: BAR 1: can't reserve [mem
0xc0000000-0xcfffffff 64bit pref]". /proc/iomem shows that the memory
region vfio-pci is trying to claim overlaps a region named BOOTFB,
which is apparently the UEFI framebuffer (this region is apparently
still created even though simplefb is disabled). As a really terrible
hack, I wrote a kernel module that calls
"__release_region(&iomem_resource, <start of bootfb>, <size of
bootfb>)". This fixed the issue for me, and I was able to pass the
boot GPU through to the guest.
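
The overlap described above can be checked by hand with a little shell
arithmetic. This is only a sketch: ranges_overlap is a hypothetical
helper name, not part of the hack, and it assumes the hex ranges are
given exactly as /proc/iomem prints them (inclusive, no 0x prefix).

```shell
# ranges_overlap START1 END1 START2 END2
# Exit status 0 when the two inclusive address ranges share at least one
# byte, non-zero otherwise. Arguments are hex strings without a 0x prefix.
ranges_overlap() {
    # Two inclusive ranges overlap when each one starts at or before
    # the point where the other one ends.
    [ $((0x$1)) -le $((0x$4)) ] && [ $((0x$3)) -le $((0x$2)) ]
}

# Example (values taken from the /proc/iomem dump in this report):
#   ranges_overlap c0000000 cfffffff c0000000 c086ffff && echo "BAR 1 overlaps BOOTFB"
```

With the values from this report, BAR 1 (c0000000-cfffffff) overlaps
BOOTFB (c0000000-c086ffff), while the other region of the same device
(d0000000-d1ffffff) does not.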

The source code of this hacky kernel module is below. To use it, run
"insmod forcefully-remove-bootfb.ko bootfb_start=<addr>
bootfb_end=<addr>" with the addresses found in /proc/iomem, then
immediately unload the module with rmmod. (The module cannot find
BOOTFB by itself because I did not figure out how to traverse
iomem_resource from a kernel module; the resource_lock lock does not
seem to be accessible from modules.)
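
Since the module cannot locate BOOTFB itself, the two parameters have
to be copied out of /proc/iomem. A rough sketch of automating that
step follows; bootfb_insmod_args is a made-up helper name, and it
assumes the entry is labelled exactly "BOOTFB" as in the dump further
down. It only prints the parameters and does not load anything.

```shell
# bootfb_insmod_args [IOMEM_FILE]
# Prints "bootfb_start=0x... bootfb_end=0x..." for the first BOOTFB entry
# in IOMEM_FILE (default: /proc/iomem). Returns non-zero if none is found.
# Note: the dump in this report was read from a root prompt.
bootfb_insmod_args() {
    awk '$NF == "BOOTFB" {
        # $1 looks like "c0000000-c086ffff"; split it into start and end.
        split($1, r, "-")
        printf "bootfb_start=0x%s bootfb_end=0x%s\n", r[1], r[2]
        found = 1
        exit
    }
    END { exit !found }' "${1:-/proc/iomem}"
}
```

The printed string can then be appended to the insmod command line by
hand, after checking that the addresses look sane.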

As for activating vfio-pci, I do not have the nvidia/snd_hda_intel
drivers blacklisted. I let them load normally at boot and unbind them
when I run a VM. I also attempt to rebind the normal drivers after
shutting down the VM. The idea is that I can either run a Windows VM
on the NVIDIA GPU or start a second X server on the NVIDIA GPU with a
separate xorg.nv.conf, and switch between these two modes without
rebooting the host (restarting the second X server is still
required). Most of the time this works correctly. Occasionally,
however, the kernel hits a general protection fault, but that issue is
unrelated to the hack I am describing.

A dump of various pieces of information follows (this probably isn't
directly useful and is for reference only):

$ lspci -nn
<snip>
00:1b.0 Audio device [0403]: Intel Corporation C610/X99 series chipset HD Audio Controller [8086:8d20] (rev 05)
<snip>
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
04:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f0] (rev a1)
<snip>
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611]
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series] [1002:aab0]
<snip>

$ uname -a
Linux <hostname> 4.6.0-1-amd64 #1 SMP Debian 4.6.2-2 (2016-06-25) x86_64 GNU/Linux

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.6.0-1-amd64 root=UUID=<snip> ro rootflags=subvol=@ cgroup_enable=memory intremap=no_x2apic_optout intel_iommu=on video=simplefb:off quiet

# cat /proc/iomem    # before hack
<snip>
60000000-6fffffff : PCI MMCONFIG 0000 [bus 00-ff]
  60000000-6fffffff : reserved
70000000-fbffbfff : PCI Bus 0000:00
  c0000000-d1ffffff : PCI Bus 0000:04
    c0000000-cfffffff : 0000:04:00.0
      c0000000-c086ffff : BOOTFB
    d0000000-d1ffffff : 0000:04:00.0
<snip>

# cat /proc/iomem    # after hack
<snip>
60000000-6fffffff : PCI MMCONFIG 0000 [bus 00-ff]
  60000000-6fffffff : reserved
70000000-fbffbfff : PCI Bus 0000:00
  c0000000-d1ffffff : PCI Bus 0000:04
    c0000000-cfffffff : 0000:04:00.0
    d0000000-d1ffffff : 0000:04:00.0
<snip>

---------- full commands to prep for running VM ----------
sudo insmod forcefully-remove-bootfb.ko bootfb_start=0xc0000000 bootfb_end=0xc086ffff
sudo rmmod forcefully_remove_bootfb
echo "8086 8d20" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id # Intel HD Audio, unrelated to this hack
echo "10de 1b80" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
echo "10de 10f0" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/devices/0000\:00\:1b.0/driver/unbind # Intel HD Audio, unrelated to this hack
echo "0000:04:00.0" | sudo tee /sys/bus/pci/devices/0000\:04\:00.0/driver/unbind
echo "0000:04:00.1" | sudo tee /sys/bus/pci/devices/0000\:04\:00.1/driver/unbind
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
echo "0000:04:00.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
echo "0000:04:00.1" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
# Can run virt-manager and launch VM now

---------- full commands to switch back to Linux ----------
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/devices/0000\:00\:1b.0/driver/unbind
echo "0000:04:00.0" | sudo tee /sys/bus/pci/devices/0000\:04\:00.0/driver/unbind
echo "0000:04:00.1" | sudo tee /sys/bus/pci/devices/0000\:04\:00.1/driver/unbind
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/bind
echo "0000:04:00.0" | sudo tee /sys/bus/pci/drivers/nvidia/bind
echo "0000:04:00.1" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/bind
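
The unbind/bind pairs above all follow the same pattern, so they could
be folded into one helper. The following is a sketch only: pci_rebind
and the SYSFS_PCI override are hypothetical names introduced here (the
override exists so the function can be dry-run against a scratch
directory), and the new_id registration step for vfio-pci is not
covered.

```shell
# pci_rebind BDF DRIVER
# Unbinds the PCI device BDF (e.g. 0000:04:00.0) from its current driver,
# if it has one, and binds it to DRIVER. Must run as root on a real
# system because it writes directly to sysfs.
# SYSFS_PCI (hypothetical) overrides the sysfs root for dry runs.
pci_rebind() {
    # Detach from whatever driver currently owns the device.
    if [ -e "${SYSFS_PCI:-/sys/bus/pci}/devices/$1/driver" ]; then
        printf '%s\n' "$1" > "${SYSFS_PCI:-/sys/bus/pci}/devices/$1/driver/unbind" || return 1
    fi
    # Attach to the requested driver.
    printf '%s\n' "$1" > "${SYSFS_PCI:-/sys/bus/pci}/drivers/$2/bind"
}

# Example, mirroring the commands above:
#   pci_rebind 0000:04:00.0 vfio-pci   # before starting the VM
#   pci_rebind 0000:04:00.0 nvidia     # after shutting it down
```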

---------- forcefully-remove-bootfb.c ----------
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/ioport.h>   /* iomem_resource, __release_region() */

/* Inclusive physical address range of BOOTFB, as shown in /proc/iomem. */
static unsigned long long bootfb_start;
static unsigned long long bootfb_end;

static int __init remover_module_init(void)
{
    printk(KERN_INFO "forcefully-remove-bootfb loaded\n");

    /* The 64-bit parameters above must fit in resource_size_t. */
    if (sizeof(resource_size_t) != 8) {
        printk(KERN_ERR "forcefully-remove-bootfb: resource_size_t is not 64 bits wide\n");
        return -EINVAL;
    }

    printk(KERN_INFO "forcefully-remove-bootfb 0x%llx-0x%llx\n",
        bootfb_start, bootfb_end);
    if ((bootfb_start == 0 && bootfb_end == 0) || bootfb_end < bootfb_start) {
        printk(KERN_ERR "forcefully-remove-bootfb needs valid addresses!\n");
        return -EINVAL;
    }

    /* Do the actual removal here. The range is inclusive, hence the + 1. */
    __release_region(&iomem_resource,
        bootfb_start, bootfb_end - bootfb_start + 1);
    return 0;
}

static void __exit remover_module_exit(void)
{
    printk(KERN_INFO "forcefully-remove-bootfb unloaded\n");
}

module_init(remover_module_init);
module_exit(remover_module_exit);

module_param(bootfb_start, ullong, 0000);
MODULE_PARM_DESC(bootfb_start, "First address of the BOOTFB region");
module_param(bootfb_end, ullong, 0000);
MODULE_PARM_DESC(bootfb_end, "Last address (inclusive) of the BOOTFB region");

MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Robert Ou <rqou@robertou.com>");
MODULE_DESCRIPTION("Forcefully removes BOOTFB I/O resource");
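
The size passed to __release_region is derived from an inclusive
range, hence the + 1. A quick way to double-check that arithmetic from
a shell, as a sketch (region_size is a made-up helper name and takes
addresses with a 0x prefix):

```shell
# region_size START END
# Prints, in hex, the size in bytes of the inclusive range START..END.
# Both arguments must carry a 0x prefix, e.g. 0xc0000000.
region_size() {
    printf '0x%x\n' $(( $2 - $1 + 1 ))
}

# Example with the BOOTFB range from this report:
#   region_size 0xc0000000 0xc086ffff
```

For the BOOTFB range in this report this gives 0x870000, i.e.
8847360 bytes.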

* Re: [Qemu-devel] vfio-pci: Report on a hack to successfully pass through a boot GPU
From: Alex Williamson @ 2016-07-12 15:04 UTC (permalink / raw)
  To: Robert Ou; +Cc: qemu-devel

On Tue, 12 Jul 2016 02:30:44 -0700
Robert Ou <rqou@robertou.com> wrote:

> I would like to report on a hack that I created to successfully use
> vfio-pci to pass through a boot GPU. The short TL;DR summary is that
> the BOOTFB framebuffer memory region seems to cause a "BAR <n>: can't
> reserve [mem <...>]" error, and this can be hackily worked around by
> calling __release_region on the BOOTFB framebuffer. I was told by
> someone on IRC to send this hack to this list.
> 
> My system setup is as follows: I have a Xeon E5-2630 v4 on an Asrock
> X99 Extreme6 motherboard. The GPU I am attempting to pass through is
> an NVIDIA GTX 1080 plugged into the slot closest to the CPU. There is
> a second GPU, an AMD R5 240 OEM (Oland) being used as the "initial"
> GPU for Linux ("Initial" in this case means that the text consoles and
> the X graphical login appear on the monitor connected to this GPU.
> After logging in, additional commands are run to either run a VM or
> run a new X server using the NVIDIA GPU.). Each GPU has separate
> monitor cables connected to them - there is no attempt to somehow
> forward the output from one GPU to another. Linux is booted using
> UEFI, not BIOS boot. The CSM is disabled. The UEFI splash and the GRUB
> bootloader display using the NVIDIA GPU. There does not appear to be
> an option to change the boot GPU. However, Linux is configured to
> display its output on the AMD GPU by a) only describing the AMD GPU in
> xorg.conf and b) passing "video=simplefb:off" on the command line as
> well as putting radeon in the initrd so that it can load before the
> nvidia driver does. I am running Debian sid with kernel 4.6.
> 
> I activate the vfio-pci drivers manually by writing to
> /sys/bus/pci/drivers/vfio-pci/new_id and then unbinding the existing
> driver and binding vfio-pci. This actually works most of the time
> (more on this later). When I initially (without my hack) try to launch
> a qemu-kvm guest (using virt-manager; guest OS is Windows 10; guest is
> booting via OVMF; guest is using i440fx), the host kernel log gets
> flooded with an error "vfio-pci 0000:04:00.0: BAR 1: can't reserve
> [mem 0xc0000000-0xcfffffff 64bit pref]". Examining /proc/iomem shows
> the memory region vfio-pci is trying to claim overlaps with a memory
> region named BOOTFB which is apparently the UEFI framebuffer (despite
> the fact that simplefb is disabled, apparently this memory region is
> still created). As a really terrible hack, I wrote a kernel module
> that calls "__release_region(&iomem_resource, <start of bootfb>, <size
> of bootfb>)". This fixed the issue for me, and I was successfully able
> to pass through the boot GPU to the guest.
> 
> The source code of this hacky kernel module is below. It is used by
> running "insmod forcefully-remove-bootfb.ko bootfb_start=<addr>
> bootfb_end=<addr>" using addresses found from /proc/iomem. The module
> is then immediately unloaded with rmmod. (The kernel module can't find
> BOOTFB by itself because I couldn't and didn't bother to figure out
> how to actually traverse iomem_resource from a kernel module. The
> resource_lock lock doesn't seem to be accessible from modules.)

Can't you simply boot with video=efifb:off (or video=vesafb:off if you
were running BIOS rather than UEFI)?  This is what I do for IGD
assignment.  I'm sure nvidia.ko causes more problems than i915 though;
maybe that's where simplefb comes into play.
 
> Regarding activating the vfio-pci drivers, I actually do not have the
> nvidia/snd_hda_intel drivers blacklisted. I allow them to load
> normally on boot and unbind them when I run a VM. I also attempt to
> rebind the normal drivers after shutting down the VM. The idea is that
> I can either run a Windows VM using the NVIDIA GPU, or I can start a
> second X server using the NVIDIA GPU and a separate xorg.nv.conf, and
> I can switch between these two modes without rebooting the host
> (restarting (the second) X is still required). Most of the time, this
> actually works correctly. Occasionally however, the kernel will
> encounter a general protection fault, but this is an unrelated issue
> to this hack I am describing.

It would be a new development if nvidia.ko were to properly release
device resources on unload.  I filed a bug with them about that a while
ago that got closed WONTFIX.  They simply don't support dynamically
unbinding devices from their driver, but maybe they've fixed
something.  Even i915 isn't great at this: we can unbind devices from
it, but occasionally on re-bind the kernel freaks out.  Someone needs to
spend some time debugging each driver for the unbind/re-bind use case,
but unfortunately that's impossible to do on nvidia.ko.  Thanks,

Alex

* Re: [Qemu-devel] vfio-pci: Report on a hack to successfully pass through a boot GPU
From: Robert Ou @ 2016-07-12 19:12 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel

On Jul 12, 2016 08:12, "Alex Williamson" <alex.williamson@redhat.com> wrote:
>
> On Tue, 12 Jul 2016 02:30:44 -0700
> Robert Ou <rqou@robertou.com> wrote:
>
> > I would like to report on a hack that I created to successfully use
> > vfio-pci to pass through a boot GPU. The short TL;DR summary is that
> > the BOOTFB framebuffer memory region seems to cause a "BAR <n>: can't
> > reserve [mem <...>]" error, and this can be hackily worked around by
> > calling __release_region on the BOOTFB framebuffer. I was told by
> > someone on IRC to send this hack to this list.
> >
> > My system setup is as follows: I have a Xeon E5-2630 v4 on an Asrock
> > X99 Extreme6 motherboard. The GPU I am attempting to pass through is
> > an NVIDIA GTX 1080 plugged into the slot closest to the CPU. There is
> > a second GPU, an AMD R5 240 OEM (Oland) being used as the "initial"
> > GPU for Linux ("Initial" in this case means that the text consoles and
> > the X graphical login appear on the monitor connected to this GPU.
> > After logging in, additional commands are run to either run a VM or
> > run a new X server using the NVIDIA GPU.). Each GPU has separate
> > monitor cables connected to them - there is no attempt to somehow
> > forward the output from one GPU to another. Linux is booted using
> > UEFI, not BIOS boot. The CSM is disabled. The UEFI splash and the GRUB
> > bootloader display using the NVIDIA GPU. There does not appear to be
> > an option to change the boot GPU. However, Linux is configured to
> > display its output on the AMD GPU by a) only describing the AMD GPU in
> > xorg.conf and b) passing "video=simplefb:off" on the command line as
> > well as putting radeon in the initrd so that it can load before the
> > nvidia driver does. I am running Debian sid with kernel 4.6.
> >
> > I activate the vfio-pci drivers manually by writing to
> > /sys/bus/pci/drivers/vfio-pci/new_id and then unbinding the existing
> > driver and binding vfio-pci. This actually works most of the time
> > (more on this later). When I initially (without my hack) try to launch
> > a qemu-kvm guest (using virt-manager; guest OS is Windows 10; guest is
> > booting via OVMF; guest is using i440fx), the host kernel log gets
> > flooded with an error "vfio-pci 0000:04:00.0: BAR 1: can't reserve
> > [mem 0xc0000000-0xcfffffff 64bit pref]". Examining /proc/iomem shows
> > the memory region vfio-pci is trying to claim overlaps with a memory
> > region named BOOTFB which is apparently the UEFI framebuffer (despite
> > the fact that simplefb is disabled, apparently this memory region is
> > still created). As a really terrible hack, I wrote a kernel module
> > that calls "__release_region(&iomem_resource, <start of bootfb>, <size
> > of bootfb>)". This fixed the issue for me, and I was successfully able
> > to pass through the boot GPU to the guest.
> >
> > The source code of this hacky kernel module is below. It is used by
> > running "insmod forcefully-remove-bootfb.ko bootfb_start=<addr>
> > bootfb_end=<addr>" using addresses found from /proc/iomem. The module
> > is then immediately unloaded with rmmod. (The kernel module can't find
> > BOOTFB by itself because I couldn't and didn't bother to figure out
> > how to actually traverse iomem_resource from a kernel module. The
> > resource_lock lock doesn't seem to be accessible from modules.)
>
> Can't you simply boot with video=efifb:off (or video=vesafb:off if you
> were running BIOS rather than UEFI).  This is what I do for IGD
> assignment.  I'm sure nvidia.ko causes more problems than i915 though,
> maybe that's where simplefb comes into play.

What I'm saying is that this doesn't work. The BOOTFB region is always
created, even when video=simplefb:off (which seems to have replaced
efifb) is passed. I got the idea of forcefully deleting BOOTFB from
this nouveau-related email thread:
https://lists.freedesktop.org/archives/nouveau/2013-October/014667.html
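
For anyone wanting to reproduce this, it helps to confirm which video=
options the running kernel actually received before looking for BOOTFB
in /proc/iomem. A minimal sketch, where video_opts is a hypothetical
helper name:

```shell
# video_opts [CMDLINE_FILE]
# Prints every video=... option found in CMDLINE_FILE (default:
# /proc/cmdline), one per line. Returns non-zero if there is none.
video_opts() {
    # Put each space-separated parameter on its own line, then filter.
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep '^video='
}

# Example: list the options, then check whether BOOTFB still exists.
#   video_opts
#   grep BOOTFB /proc/iomem
```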

> > Regarding activating the vfio-pci drivers, I actually do not have the
> > nvidia/snd_hda_intel drivers blacklisted. I allow them to load
> > normally on boot and unbind them when I run a VM. I also attempt to
> > rebind the normal drivers after shutting down the VM. The idea is that
> > I can either run a Windows VM using the NVIDIA GPU, or I can start a
> > second X server using the NVIDIA GPU and a separate xorg.nv.conf, and
> > I can switch between these two modes without rebooting the host
> > (restarting (the second) X is still required). Most of the time, this
> > actually works correctly. Occasionally however, the kernel will
> > encounter a general protection fault, but this is an unrelated issue
> > to this hack I am describing.
>
> It would be a new development if nvidia.ko were to properly release
> device resources on unload, I filed a bug with them about that awhile
> ago that got closed WONTFIX.  They simply don't support dynamically
> unbinding devices from their driver, but maybe they've fixed
> something.  Even i915 isn't great at this, we can unbind devices from
> it, but occasionally on re-bind the kernel freaks out.  Someone needs to
> spend some time debugging each driver for the unbind/re-bind use case,
> but unfortunately that's impossible to do on nvidia.ko.  Thanks,

Interestingly, I don't recall having any issues that are obviously
related to nvidia.ko. I have had more issues unbinding and rebinding
the snd_hda_intel driver. I am running NVIDIA driver version 367.27.

> Alex
