Date: Tue, 2 Dec 2014 17:14:56 +0100
From: Lucio Andrés Illanes Albornoz
To: Alex Williamson
Cc: Alex Deucher, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] AMD video card passthrough reset issues
Message-ID: <20141202171456.5c3477c6@lucio-pc>
In-Reply-To: <1417533980.6539.11.camel@ul30vt.home>
References: <20141202135311.4a4fdf04@lucio-pc> <1417533980.6539.11.camel@ul30vt.home>

On Tue, 02 Dec 2014 08:26:20 -0700 Alex Williamson wrote:

> All of the Bonaire-based AMD GPUs seem to have issues with reset
> (R7790, R7 260/X). I've tried to engage AMD on this, but haven't gotten
> any response on this topic yet. For devices like this that don't
> support any kind of function level reset (FLR), VFIO will try to do a
> PCI bus reset on guest reboot. This is as close as we can get to how
> the BIOS resets the device on a host reboot. Unfortunately on these
> cards there seems to be some sort of disconnect between the PCI bus
> interface reset and resetting the rest of the GPU. I believe I've even
> seen cases where a PCI bus reset appears to have no effect on the GPU
> when running in VGA mode.
> My best guess is that some firmware running
> in the card isn't clearing itself on reset and attempting to reload it
> causes errors. Note that a guest can be reset multiple times and the
> device continues to work if the guest is restricted to standard VGA
> drivers (in VGA passthrough mode of course).

My experience is consistent with that description. The bus reset
initiated through the hotplug reset interface appears to leave some
part(s) of my video card in a state the AMD driver is not prepared to
handle on the second boot (i.e. the first VM reboot). From then on the
card is completely gone: endless IOTLB_INV_TIMEOUT and `Completion-Wait
loop timed out' kernel messages, no VGA output at all when doing primary
passthrough (which I no longer require, since vgacon isn't too fond of
that), and possibly even hangs when running lspci(8) afterwards, if I
remember correctly.

I had originally intended to have QEMU trace MMIO in general and
PCI{,-E} bus/device traffic (as relevant) in order to establish what
arcane incantations Windows could possibly be performing, but that only
ended up showing me PCI configuration space reads and IRQ reassignments
upon disabling my video card. WinDbg/Kd* is far too slow to facilitate
tracing PCI{,-E} traffic through breakpoints. And were I to possess the
Windows Research Kernel source code, speaking completely hypothetically
here, I would then unfortunately have to find out that QEMU w/ KVM plus
AMD's drivers don't get along too well w/ Windows Server 2003.

I then figured that having drivers/vfio/pci/* produce that information
should ultimately lead me towards the solution, but I can't quite attend
to that just yet. The remove/rescan dance is the only thing that,
pragmatically speaking, actually works for me at present.

> In your experiment with removing and rescanning the device, are you
> simply doing 'echo 1 > remove; echo 1 > /sys/bus/pci/rescan'? Thanks,

Yes.
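
For reference, the remove/rescan sequence confirmed above can be sketched
as a small shell function. This is only a sketch under assumptions: the
function name pci_remove_rescan, the SYSFS_PCI override (there so the
function can be dry-run against a scratch directory) and the example
address are hypothetical and not from this thread. Only the two writes,
to the device's remove node and to /sys/bus/pci/rescan, come from the
exchange. Against the real sysfs tree it needs root.

```shell
#!/bin/sh
# Sketch of the remove/rescan workaround discussed in this thread.
# Assumption: SYSFS_PCI is a hypothetical override for dry runs;
# it defaults to the real sysfs PCI bus directory.

pci_remove_rescan() {
    bdf="$1"                                  # PCI address, e.g. 0000:01:00.0 (placeholder)
    root="${SYSFS_PCI:-/sys/bus/pci}"
    dev="$root/devices/$bdf"

    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $bdf" >&2
        return 1
    fi

    # Detach the function from the PCI bus ...
    echo 1 > "$dev/remove" || return 1
    # ... then ask the kernel to rescan so the device is rediscovered.
    echo 1 > "$root/rescan"
}
```

It would be invoked as `pci_remove_rescan 0000:01:00.0`, with the
placeholder replaced by the address lspci(8) reports for the card.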