* [Qemu-devel] AMD video card passthrough reset issues
From: Lucio Andrés Illanes Albornoz @ 2014-12-02 12:53 UTC (permalink / raw)
To: qemu-devel
Hello,
I'm doing secondary VGA passthrough with an AMD Radeon R7 260X using QEMU v2.1.2 w/ KVM and VFIO on Debian v7.7 (wheezy) (qemu v2.1+dfsg-5~bpo70+1 from wheezy-backports) and kernel version 3.16.5 (from wheezy-backports as well) and Windows 8.1 Update 1 (x64) as the guest OS.
At present, rebooting the VM reproducibly causes Windows to fail to start the video card at boot with error code 43, which appears to be the case for almost everyone running a comparable configuration. As others have found, disabling/ejecting the card from within the guest before rebooting or powering down the VM is a reliable mitigation. However, there are scenarios where this is impractical or outright impossible unless it is done through a service or kernel-mode driver (and perhaps not even then), so I had intended to investigate the cause of this issue.
Unfortunately, the flu got to me first (so to speak). I did notice, though, that removing the PCI device in question and then triggering a PCI bus rescan, both through sysfs on the host, in between VM reboots/power cycles is effectively equivalent to disabling the card within the guest. So I find myself wondering what exactly takes place in that case as opposed to when QEMU performs a 'hot reset' through the corresponding interface in drivers/vfio/pci/. The difference must evidently be significant, since the latter leaves my video card unusable for subsequent VM runs until the next host reboot.
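The host-side remove/rescan described above boils down to two sysfs writes. A minimal Python sketch, assuming root privileges and a full domain-qualified PCI address such as 0000:01:00.0 (a hypothetical example); `remove_and_rescan` and its `sysfs_root` parameter are illustrative names, the latter existing only so the function can be exercised against a scratch directory instead of the real /sys:

```python
import re
import time
from pathlib import Path

# Full PCI address as used in sysfs: domain:bus:device.function
PCI_ADDR_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def remove_and_rescan(pci_addr: str, sysfs_root: Path = Path("/sys"),
                      wait: float = 0.0) -> None:
    """Hot-remove one PCI function, then rescan all PCI buses (hypothetical helper)."""
    if not PCI_ADDR_RE.match(pci_addr):
        raise ValueError(f"not a full PCI address: {pci_addr!r}")
    bus = sysfs_root / "bus" / "pci"
    # Equivalent to: echo 1 > /sys/bus/pci/devices/<addr>/remove
    (bus / "devices" / pci_addr / "remove").write_text("1")
    if wait:
        time.sleep(wait)  # optional pause before the device is re-enumerated
    # Equivalent to: echo 1 > /sys/bus/pci/rescan
    (bus / "rescan").write_text("1")
```

Run it only while no guest has the device open. After the rescan the kernel enumerates and probes the device afresh, so it ends up bound to whichever driver matches it first; check that this is the driver you want before starting the next guest.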
I would very much appreciate any hints as to whether QEMU and VFIO could together be made to do whatever needs to be done, or whether either of them could do it on its own.
Cheers
Lucio Andrés Illanes Albornoz
* Re: [Qemu-devel] AMD video card passthrough reset issues
From: Alex Williamson @ 2014-12-02 15:26 UTC (permalink / raw)
To: Lucio Andrés Illanes Albornoz; +Cc: Alex Deucher, qemu-devel
On Tue, 2014-12-02 at 13:53 +0100, Lucio Andrés Illanes Albornoz wrote:
> I would very much appreciate any hints as to whether QEMU and VFIO
> could together be made to do whatever needs to be done, or whether
> either of them could do it on its own.
All of the Bonaire-based AMD GPUs seem to have issues with reset
(HD 7790, R7 260/X). I've tried to engage AMD on this, but haven't
gotten any response yet. For devices like this that don't support any
kind of function level reset (FLR), VFIO will try to do a PCI bus
reset on guest reboot. This is as close as we can get to how the BIOS
resets the device on a host reboot. Unfortunately, on these cards
there seems to be some sort of disconnect between the PCI bus
interface reset and resetting the rest of the GPU. I believe I've even
seen cases where a PCI bus reset appears to have no effect on the GPU
when running in VGA mode. My best guess is that some firmware running
in the card isn't clearing itself on reset, and attempting to reload
it causes errors. Note that a guest can be reset multiple times and
the device continues to work if the guest is restricted to standard
VGA drivers (in VGA passthrough mode, of course).
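Whether a device advertises FLR can be read from its config space: the PCI Express capability's Device Capabilities register has an FLR bit (bit 28), and the conventional-PCI Advanced Features capability has one as well. The bus reset VFIO falls back to is, in its simplest form, a pulse of the Secondary Bus Reset bit (0x40) in the Bridge Control register (offset 0x3E) of the bridge above the device. A simplified Python sketch of both, assuming root access to the sysfs `config` files (unprivileged reads are truncated to 64 bytes, which would make the FLR check report False). The function names are hypothetical; the kernel's own reset path also saves and restores the config space of the affected devices, which this sketch does not, so it must never be pointed at a bridge whose devices are in use.

```python
import struct
import time
from pathlib import Path

# Config-space offsets and bits (values as in Linux's pci_regs.h).
PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10
PCI_CAPABILITY_LIST = 0x34
PCI_CAP_ID_EXP = 0x10          # PCI Express capability
PCI_CAP_ID_AF = 0x13           # Advanced Features capability
PCI_EXP_DEVCAP = 0x04          # Device Capabilities, relative to the PCIe capability
PCI_EXP_DEVCAP_FLR = 1 << 28   # Function Level Reset capable
PCI_AF_CAP = 0x03              # AF capabilities byte, relative to the AF capability
PCI_AF_CAP_FLR = 0x02
PCI_BRIDGE_CONTROL = 0x3E      # 16-bit Bridge Control register (type 1 header)
PCI_BRIDGE_CTL_BUS_RESET = 0x40


def capabilities(cfg: bytes):
    """Yield (cap_id, offset) for each entry of the standard capability list."""
    if len(cfg) < 0x40:
        return
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while pos and pos not in seen and pos + 1 < len(cfg):
        seen.add(pos)  # guard against malformed, looping lists
        yield cfg[pos], pos
        pos = cfg[pos + 1] & 0xFC


def advertises_flr(cfg: bytes) -> bool:
    """True if the PCIe or AF capability reports Function Level Reset support."""
    for cap_id, pos in capabilities(cfg):
        if cap_id == PCI_CAP_ID_EXP and pos + PCI_EXP_DEVCAP + 4 <= len(cfg):
            (devcap,) = struct.unpack_from("<I", cfg, pos + PCI_EXP_DEVCAP)
            if devcap & PCI_EXP_DEVCAP_FLR:
                return True
        elif cap_id == PCI_CAP_ID_AF and pos + PCI_AF_CAP < len(cfg):
            if cfg[pos + PCI_AF_CAP] & PCI_AF_CAP_FLR:
                return True
    return False


def secondary_bus_reset(bridge_config: Path, settle: float = 1.0) -> None:
    """Pulse Secondary Bus Reset on a bridge; resets every device below it."""
    with open(bridge_config, "r+b", buffering=0) as f:
        f.seek(PCI_BRIDGE_CONTROL)
        (ctl,) = struct.unpack("<H", f.read(2))
        f.seek(PCI_BRIDGE_CONTROL)
        f.write(struct.pack("<H", ctl | PCI_BRIDGE_CTL_BUS_RESET))  # assert reset
        time.sleep(0.002)
        f.seek(PCI_BRIDGE_CONTROL)
        f.write(struct.pack("<H", ctl & ~PCI_BRIDGE_CTL_BUS_RESET))  # deassert
    time.sleep(settle)  # give devices time to come back before touching them
```

For example, `advertises_flr(Path("/sys/bus/pci/devices/0000:01:00.0/config").read_bytes())` checks a (placeholder) device; the bridge above it is the parent directory that the `/sys/bus/pci/devices/<addr>` symlink resolves into.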
In your experiment with removing and rescanning the device, are you
simply doing 'echo 1 > remove; echo 1 > /sys/bus/pci/rescan'? Thanks,
Alex
* Re: [Qemu-devel] AMD video card passthrough reset issues
From: Lucio Andrés Illanes Albornoz @ 2014-12-02 16:14 UTC (permalink / raw)
To: Alex Williamson; +Cc: Alex Deucher, qemu-devel
On Tue, 02 Dec 2014 08:26:20 -0700 Alex Williamson <alex.williamson@redhat.com> wrote:
> All of the Bonaire-based AMD GPUs seem to have issues with reset
> (HD 7790, R7 260/X). I've tried to engage AMD on this, but haven't
> gotten any response yet. For devices like this that don't support any
> kind of function level reset (FLR), VFIO will try to do a PCI bus
> reset on guest reboot. This is as close as we can get to how the BIOS
> resets the device on a host reboot. Unfortunately, on these cards
> there seems to be some sort of disconnect between the PCI bus
> interface reset and resetting the rest of the GPU. I believe I've even
> seen cases where a PCI bus reset appears to have no effect on the GPU
> when running in VGA mode. My best guess is that some firmware running
> in the card isn't clearing itself on reset, and attempting to reload
> it causes errors. Note that a guest can be reset multiple times and
> the device continues to work if the guest is restricted to standard
> VGA drivers (in VGA passthrough mode, of course).
My experience is consistent with that description: the bus reset initiated through the hotplug reset interface appears to leave some part(s) of my video card in a state the AMD driver is not prepared to handle on the second boot (i.e. the first VM reboot). From then on, the card is completely gone: endless IOTLB_INV_TIMEOUT and 'Completion-Wait loop timed out' kernel messages, no VGA output at all when doing primary passthrough (which I no longer need, since vgacon isn't too fond of it), and possibly even hangs when running lspci(8) afterwards (if I remember correctly).
I had originally intended to have QEMU trace MMIO in general and PCI/PCIe bus/device traffic (as relevant) in order to establish what arcane incantations Windows might be performing, but that only showed me PCI configuration space reads and IRQ reassignments when disabling the video card. WinDbg/Kd* is far too slow for tracing PCI/PCIe traffic through breakpoints, and even if I had the Windows Research Kernel source code (speaking purely hypothetically), I would then find that QEMU with KVM plus AMD's drivers doesn't get along well with Windows Server 2003. I then figured that having drivers/vfio/pci/* log that information should eventually lead me to the solution, but I haven't managed that yet; for now, the remove/rescan dance is the only thing that, pragmatically speaking, actually works for me.
> In your experiment with removing and rescanning the device, are you
> simply doing 'echo 1 > remove; echo 1 > /sys/bus/pci/rescan'? Thanks,
Yes.
* Re: [Qemu-devel] AMD video card passthrough reset issues
From: Lucio Andrés Illanes Albornoz @ 2014-12-08 14:11 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Williamson
Has there been any progress on this matter?