From: Gordan Bobic
Subject: Re: Device Reset on Nvidia GPUs
Date: Mon, 25 Nov 2013 15:25:08 +0000
Message-ID: <52936BD4.9050707@bobich.net>
References: <87b7d4387d7ab0bbb220efa864170381@mail.shatteredsilicon.net>
 <20131115144004.GA28448@phenom.dumpdata.com>
 <22c4bfecbc50b939dc8795864fe766ce@mail.shatteredsilicon.net>
 <20131125151751.GB6095@phenom.dumpdata.com>
In-Reply-To: <20131125151751.GB6095@phenom.dumpdata.com>
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xen.org, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On 11/25/2013 03:17 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 15, 2013 at 02:52:56PM +0000, Gordan Bobic wrote:
>> On Fri, 15 Nov 2013 09:40:04 -0500, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Nov 15, 2013 at 02:29:39PM +0000, Gordan Bobic wrote:
>>>> On Fri, 15 Nov 2013 14:27:21 +0000, Stefano Stabellini wrote:
>>>>> On Fri, 15 Nov 2013, Gordan Bobic wrote:
>>>>>> I've noticed that the nouveau driver has a sysfs reset implemented
>>>>>> (although I'm not sure whether it is just a stub or whether it
>>>>>> does anything).
>>>>>>
>>>>>> Now, I fully understand that this is not actually necessary,
>>>>>> based purely on empirical evidence:
>>>>>>
>>>>>> My ATI cards reliably crash the host when the domU they are passed
>>>>>> to is rebooted, and the xen-pciback driver does have the sysfs
>>>>>> reset implemented for ATI cards.
>>>>>>
>>>>>> OTOH, my (modified) Nvidia cards handle domU reboots perfectly
>>>>>> and the xen-pciback driver has no sysfs reset implementation
>>>>>> for those.
>>>>>>
>>>>>> So I'm kind of torn between:
>>>>>> 1) It's not broken so don't even think about trying to fix it.
>>>>>> 2) Since a FOSS reset implementation seems to exist, it might be
>>>>>> handy to port it into the xen-pciback feature list (caveat:
>>>>>> this may impact 1), which would be embarrassing).
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> libxl is capable of using the sysfs reset node, so there shouldn't
>>>>> be any need to port the reset code to pciback.
>>>>
>>>> Not quite - when the device is owned by xen-pciback, there is
>>>> no reset node. When it is owned by nouveau, the reset node in
>>>> sysfs is there.
>>>
>>> Sure, but pciback does the reset:
>>>
>>>     /* We need the device active to save the state. */
>>>     dev_dbg(&dev->dev, "save state of device\n");
>>>     pci_save_state(dev);
>>>     dev_data->pci_saved_state = pci_store_saved_state(dev);
>>>     if (!dev_data->pci_saved_state)
>>>         dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
>>>     else {
>>>         dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
>>>         __pci_reset_function_locked(dev);
>>>         pci_restore_state(dev);
>>>     }
>>>
>>> pci_reset_function() (the non-locked variant) is called when you
>>> write to the 'reset' node in sysfs.
>>>
>>> Unless the nouveau driver does some extra 'reset'?
>>
>> I don't know for sure at the moment. I was just basing this on the
>> observation that xl complains that there is no sysfs reset for the
>> device when instantiating the VM, and that there is no reset node
>> for the device in sysfs.
>>
>> It seems oddly inconsistent that there is a reset node in sysfs for
>> ATI cards (which crash the host on domU reboot) but no reset node for
>> Nvidia cards (which work fine on domU reboot).
>>
>> There is no FLR or D3 PM on my Nvidia cards, so that could
>> be why (there is D3hot on ATI). But the nouveau driver still exposes
>> a reset node for the device.
>
> You said that the nvidia cards have no reset (above) but then they do
> have a reset? Or is it that the nvidia driver has no reset, but
> nouveau does?
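For reference, the sysfs reset path discussed in the quoted thread amounts to checking for a `reset` file under the device's sysfs directory and writing "1" to it, which makes the kernel call pci_reset_function(). Below is a minimal userspace sketch of that, assuming the usual /sys/bus/pci/devices/<domain:bus:dev.fn>/reset layout. The helper names (reset_node_path, pci_has_reset_node, pci_sysfs_reset) are hypothetical, and this is not libxl's actual code, only the same kind of write the thread says libxl can perform.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define RESET_PATH_MAX 4096

/* Hypothetical helper: build "<devices_dir>/<bdf>/reset".
 * Returns 0 on success or -ENAMETOOLONG if the path does not fit. */
static int reset_node_path(const char *devices_dir, const char *bdf,
                           char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s/%s/reset", devices_dir, bdf);
    return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Returns 1 if the device exposes a sysfs reset node, 0 otherwise. */
static int pci_has_reset_node(const char *devices_dir, const char *bdf)
{
    char path[RESET_PATH_MAX];
    struct stat st;

    assert(devices_dir != NULL && bdf != NULL);
    if (reset_node_path(devices_dir, bdf, path, sizeof path) != 0)
        return 0;
    /* sysfs attributes show up as regular files */
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Write "1" to the reset node; on a real sysfs this asks the kernel to
 * run pci_reset_function() on the device. Returns 0 on success, -ENOENT
 * if there is no reset node, or another negative errno value. */
static int pci_sysfs_reset(const char *devices_dir, const char *bdf)
{
    char path[RESET_PATH_MAX];
    ssize_t written;
    int fd, err;

    assert(devices_dir != NULL && bdf != NULL);
    err = reset_node_path(devices_dir, bdf, path, sizeof path);
    if (err != 0)
        return err;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;          /* -ENOENT: no reset node exposed */

    written = write(fd, "1", 1);
    if (written == 1)
        err = 0;
    else
        err = (written < 0) ? -errno : -EIO;
    if (close(fd) != 0 && err == 0)
        err = -errno;
    return err;
}
```

On a real host this would be called as pci_sysfs_reset("/sys/bus/pci/devices", "0000:0e:00.0") with root privileges. The devices directory is a parameter only so the logic can be exercised against an ordinary directory tree.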
I'm saying that the nouveau module appears to expose a reset node in
sysfs, but only for the current console device:

# lspci -nn -qq | grep NVIDIA | grep VGA
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104GL [GRID K2] [10de:11bf] (rev a1)
0e:00.0 VGA compatible controller [0300]: NVIDIA Corporation G92 [GeForce 8800 GT] [10de:0611] (rev a2)
# find . -name reset | grep 07:00.0
# find . -name reset | grep 0e:00.0
./devices/pci0000:00/0000:00:03.0/0000:0b:00.0/0000:0c:00.0/0000:0e:00.0/reset

So the GeForce 8800 GT at 0e:00.0 has a reset node, while the GRID K2
at 07:00.0 has none.

Gordan
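The `find` check above can also be done programmatically. Here is a minimal sketch in C, assuming the standard /sys/bus/pci/devices directory, whose entries are symlinks named by domain:bus:device.function. The function list_pci_reset_nodes() is a hypothetical helper, not an existing tool or library API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: walk a PCI devices directory (normally
 * /sys/bus/pci/devices) and report, for each device entry, whether it
 * has a "reset" node. Returns the number of devices with a reset node,
 * or -1 if the directory cannot be opened. */
static int list_pci_reset_nodes(const char *devices_dir, FILE *out)
{
    DIR *dir;
    struct dirent *de;
    int with_reset = 0;

    assert(devices_dir != NULL && out != NULL);
    dir = opendir(devices_dir);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        char path[4096];
        struct stat st;
        int n;

        if (de->d_name[0] == '.')
            continue;               /* skip "." and ".." */
        n = snprintf(path, sizeof path, "%s/%s/reset",
                     devices_dir, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;               /* path too long; skip entry */

        /* stat() follows the per-device symlink into /sys/devices/... */
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode)) {
            fprintf(out, "%s: reset node present\n", de->d_name);
            with_reset++;
        } else {
            fprintf(out, "%s: no reset node\n", de->d_name);
        }
    }
    closedir(dir);
    return with_reset;
}
```

Calling list_pci_reset_nodes("/sys/bus/pci/devices", stdout) lists every PCI function, not just the GPUs. For the two NVIDIA functions one would expect the same split as the `find` output: 0000:0e:00.0 with a reset node, 0000:07:00.0 without.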