* Device Reset on Nvidia GPUs
From: Gordan Bobic @ 2013-11-15 12:18 UTC
To: xen-devel
I've noticed that the nouveau driver has a sysfs reset implemented
(although I'm not sure whether it is just a stub or whether it
does anything).
Now, I fully understand that this is not actually necessary,
based purely on empirical evidence:
My ATI cards reliably crash the host when the domU they are passed
to is rebooted, and the xen-pciback driver does have the sysfs
reset implemented for ATI cards.
OTOH, my (modified) Nvidia cards handle domU reboots perfectly
and the xen-pciback driver has no sysfs reset implementation
for those.
So I'm kind of torn between:
1) It's not broken so don't even think about trying to fix it.
2) Since a FOSS reset implementation seems to exist, it might be
handy to port it into the xen-pciback feature list (caveat:
this may impact 1), which would be embarrassing).
Thoughts?
Gordan
* Re: Device Reset on Nvidia GPUs
From: Stefano Stabellini @ 2013-11-15 14:27 UTC
To: Gordan Bobic; +Cc: xen-devel
On Fri, 15 Nov 2013, Gordan Bobic wrote:
> I've noticed that the nouveau driver has a sysfs reset implemented
> (although I'm not sure whether it is just a stub or whether it
> does anything).
>
> Now, I fully understand that this is not actually necessary,
> based purely on empirical evidence:
>
> My ATI cards reliably crash the host when the domU they are passed
> to is rebooted, and the xen-pciback driver does have the sysfs
> reset implemented for ATI cards.
>
> OTOH, my (modified) Nvidia cards handle domU reboots perfectly
> and the xen-pciback driver has no sysfs reset implementation
> for those.
>
> So I'm kind of torn between:
> 1) It's not broken so don't even think about trying to fix it.
> 2) Since a FOSS reset implementation seems to exist, it might be
> handy to port it into the xen-pciback feature list (caveat:
> this may impact 1), which would be embarrassing).
>
> Thoughts?
libxl is capable of using the sysfs reset node, so there shouldn't be
any need to port the reset code to pciback.
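A minimal C sketch of what "using the sysfs reset node" amounts to, assuming the standard /sys/bus/pci/devices/<BDF>/reset attribute, which accepts only the value 1. This is not libxl's actual code: the function name pci_sysfs_reset is hypothetical, and sysfs_root is a parameter only so the sketch can be tried against a scratch directory (on a real host it is "/sys").

```c
#define _POSIX_C_SOURCE 200809L  /* for open(), write(), close() */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: reset a PCI function through its sysfs "reset"
 * attribute, the C equivalent of
 *     echo 1 > /sys/bus/pci/devices/<bdf>/reset
 * Returns 0 on success or a negative errno value; -ENOENT means the
 * device has no reset node at all.
 */
int pci_sysfs_reset(const char *sysfs_root, const char *bdf)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/bus/pci/devices/%s/reset",
                     sysfs_root, bdf);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    /* The kernel accepts only the value 1 for this attribute. */
    ssize_t w = write(fd, "1", 1);
    int rc = 0;
    if (w < 0)
        rc = -errno;            /* capture errno before close() can change it */
    else if (w != 1)
        rc = -EIO;
    close(fd);
    return rc;
}
```

On a real host, pci_sysfs_reset("/sys", "0000:0e:00.0") needs root privileges; a return of -ENOENT is the situation xl complains about later in this thread.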
* Re: Device Reset on Nvidia GPUs
From: Gordan Bobic @ 2013-11-15 14:29 UTC
To: Stefano Stabellini; +Cc: xen-devel
On Fri, 15 Nov 2013 14:27:21 +0000, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> libxl is capable of using the sysfs reset node, so there shouldn't be
> any need to port the reset code to pciback.
Not quite - when the device is owned by xen-pciback, there is
no reset node. When it is owned by nouveau, the reset node in
sysfs is there.
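To check this observation on a given host, a small diagnostic can report which driver is bound (the last component of the device's driver symlink) and whether a reset attribute exists. This is a hedged C sketch. pci_driver_and_reset is a hypothetical helper, sysfs_root is normally "/sys", and driver_len is assumed to be at least 1.

```c
#define _POSIX_C_SOURCE 200809L  /* for readlink() */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic for one PCI function: copy the name of the bound
 * driver (last component of the "driver" symlink, e.g. "pciback" or
 * "nouveau") into driver, or "" if unbound, and set *has_reset to 1 if a
 * "reset" attribute exists. Returns 0, or -1 if the device is not found.
 */
int pci_driver_and_reset(const char *sysfs_root, const char *bdf,
                         char *driver, size_t driver_len, int *has_reset)
{
    char dev[512], path[600], target[1024];
    struct stat st;

    int n = snprintf(dev, sizeof(dev), "%s/bus/pci/devices/%s",
                     sysfs_root, bdf);
    if (n < 0 || (size_t)n >= sizeof(dev) || stat(dev, &st) != 0)
        return -1;

    snprintf(path, sizeof(path), "%s/driver", dev);
    ssize_t len = readlink(path, target, sizeof(target) - 1);
    if (len < 0) {
        driver[0] = '\0';                 /* no driver bound */
    } else {
        target[len] = '\0';               /* readlink() does not terminate */
        const char *base = strrchr(target, '/');
        snprintf(driver, driver_len, "%s", base ? base + 1 : target);
    }

    snprintf(path, sizeof(path), "%s/reset", dev);
    *has_reset = (stat(path, &st) == 0);
    return 0;
}
```

Running it once with the card bound to nouveau and once with it bound to pciback would show directly whether the reset node follows the driver or the device.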
Gordan
* Re: Device Reset on Nvidia GPUs
From: Konrad Rzeszutek Wilk @ 2013-11-15 14:40 UTC
To: Gordan Bobic; +Cc: xen-devel, Stefano Stabellini
On Fri, Nov 15, 2013 at 02:29:39PM +0000, Gordan Bobic wrote:
> Not quite - when the device is owned by xen-pciback, there is
> no reset node. When it is owned by nouveau, the reset node in
> sysfs is there.
Sure, but pciback does the reset:
        /* We need the device active to save the state. */
        dev_dbg(&dev->dev, "save state of device\n");
        pci_save_state(dev);
        dev_data->pci_saved_state = pci_store_saved_state(dev);
        if (!dev_data->pci_saved_state)
                dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
        else {
                dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
                __pci_reset_function_locked(dev);
                pci_restore_state(dev);
        }
pci_reset_function() (the non-locked variant) is what gets called when
you write to the 'reset' node in sysfs.
Unless the nouveau driver does some extra 'reset'?
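The difference between the two entry points can be modelled in a few lines of userspace C. This is a simplified model based on my reading of kernels of this era, not kernel code. Every name below is hypothetical, a bool stands in for the device lock, and an integer stands in for config space. The model reflects this behaviour: pci_reset_function() saves state, resets under the device lock and restores state, while __pci_reset_function_locked() expects the caller to hold the lock already and does no save/restore. That is why the pciback sequence above wraps it in pci_save_state() and pci_restore_state().

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PCI function; not a kernel structure. */
struct model_dev {
    bool lock_held;    /* stands in for the device lock */
    unsigned config;   /* stands in for config space */
    unsigned resets;   /* number of resets performed */
};

static void model_lock(struct model_dev *d)   { d->lock_held = true; }
static void model_unlock(struct model_dev *d) { d->lock_held = false; }

/* Models __pci_reset_function_locked(): the caller must already hold the
 * lock, and config space is neither saved nor restored. Returns -1 if
 * called without the lock. */
static int model_reset_locked(struct model_dev *d)
{
    if (!d->lock_held)
        return -1;
    d->config = 0;     /* the reset wipes config space */
    d->resets++;
    return 0;
}

/* Models pci_reset_function(), reached via the sysfs "reset" node:
 * save state, take the lock, reset, drop the lock, restore state. */
static int model_reset_function(struct model_dev *d)
{
    unsigned saved = d->config;    /* ~ pci_save_state() */
    model_lock(d);
    int rc = model_reset_locked(d);
    model_unlock(d);
    d->config = saved;             /* ~ pci_restore_state() */
    return rc;
}

/* Models the pciback sequence: it runs with the lock already held
 * (probe context), so it brackets the locked reset with its own
 * save/restore. */
static int model_pciback_reset(struct model_dev *d)
{
    unsigned saved = d->config;    /* ~ pci_save_state() */
    int rc = model_reset_locked(d);
    d->config = saved;             /* ~ pci_restore_state() */
    return rc;
}
```

In both paths the same underlying reset runs. The only differences are who takes the lock and who preserves config space.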
>
> Gordan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
* Re: Device Reset on Nvidia GPUs
From: Gordan Bobic @ 2013-11-15 14:52 UTC
To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Stefano Stabellini
On Fri, 15 Nov 2013 09:40:04 -0500, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> Sure, but pciback does the reset:
>
>         /* We need the device active to save the state. */
>         dev_dbg(&dev->dev, "save state of device\n");
>         pci_save_state(dev);
>         dev_data->pci_saved_state = pci_store_saved_state(dev);
>         if (!dev_data->pci_saved_state)
>                 dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
>         else {
>                 dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
>                 __pci_reset_function_locked(dev);
>                 pci_restore_state(dev);
>         }
>
> pci_reset_function() (the non-locked variant) is what gets called when
> you write to the 'reset' node in sysfs.
>
> Unless the nouveau driver does some extra 'reset'?
I don't know for sure at the moment. I was basing this on two
observations: when instantiating the VM, xl complains that there is
no sysfs reset for the device, and there is no reset node for the
device in sysfs.
It seems oddly inconsistent that there is a reset node in sysfs for
ATI cards (that crash the host on domU reboot) but no reset node for
Nvidia cards (which work fine on domU reboot).
There is no FLR or D3 PM on my Nvidia cards, so that could be why
there is no reset node (the ATI cards do have D3hot). But the nouveau
driver still exposes a reset node for the device.
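As far as I can tell from kernels of this period, whether a reset node appears depends on which reset methods the kernel can find for the function when the device is added. The bound driver does not seem to matter. Below is a simplified C model of that probe order: device-specific quirk, PCIe FLR, AF FLR, PM reset via D3hot (only when No_Soft_Reset is clear), and finally a secondary bus reset, which requires the function to be the only one on a non-root bus behind a bridge. Some kernels also try a hotplug slot reset before the bus reset; the model omits it. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what can be probed for one PCI function. */
struct reset_caps {
    bool dev_specific;    /* a device-specific reset quirk exists */
    bool pcie_flr;        /* PCIe Function Level Reset advertised */
    bool af_flr;          /* Advanced Features FLR (conventional PCI) */
    bool pm_d3hot_reset;  /* PM capability with No_Soft_Reset clear */
    bool alone_on_bus;    /* only function on a non-root bus behind a bridge */
};

/* Returns the first usable method in the (simplified) probe order, or
 * NULL when none applies, in which case no sysfs "reset" node would be
 * created for the function. */
static const char *pick_reset_method(const struct reset_caps *c)
{
    if (c->dev_specific)   return "device-specific";
    if (c->pcie_flr)       return "pcie-flr";
    if (c->af_flr)         return "af-flr";
    if (c->pm_d3hot_reset) return "pm-d3hot";
    if (c->alone_on_bus)   return "secondary-bus-reset";
    return NULL;
}
```

The model only predicts whether a node would exist. It says nothing about whether a given method actually leaves a GPU in a usable state, which is the separate question the ATI crashes raise.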
Gordan
* Re: Device Reset on Nvidia GPUs
From: Konrad Rzeszutek Wilk @ 2013-11-25 15:17 UTC
To: Gordan Bobic; +Cc: xen-devel, Stefano Stabellini
On Fri, Nov 15, 2013 at 02:52:56PM +0000, Gordan Bobic wrote:
> I don't know for sure at the moment. I was basing this on two
> observations: when instantiating the VM, xl complains that there is
> no sysfs reset for the device, and there is no reset node for the
> device in sysfs.
>
> It seems oddly inconsistent that there is a reset node in sysfs for
> ATI cards (that crash the host on domU reboot) but no reset node for
> Nvidia cards (which work fine on domU reboot).
>
> There is no FLR or D3 PM on my Nvidia cards, so that could be why
> there is no reset node (the ATI cards do have D3hot). But the nouveau
> driver still exposes a reset node for the device.
You said (above) that the Nvidia cards have no reset, but now that
they do have one? Or is it that the nvidia driver has no reset, but
nouveau does?
* Re: Device Reset on Nvidia GPUs
From: Gordan Bobic @ 2013-11-25 15:25 UTC
To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Stefano Stabellini
On 11/25/2013 03:17 PM, Konrad Rzeszutek Wilk wrote:
> You said (above) that the Nvidia cards have no reset, but now that
> they do have one? Or is it that the nvidia driver has no reset, but
> nouveau does?
I'm saying that the nouveau module seems to expose a reset node in
sysfs, but only for the current console device:
# lspci -nn -qq | grep NVIDIA | grep VGA
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104GL [GRID K2] [10de:11bf] (rev a1)
0e:00.0 VGA compatible controller [0300]: NVIDIA Corporation G92 [GeForce 8800 GT] [10de:0611] (rev a2)
# find . -name reset | grep 07:00.0
# find . -name reset | grep 0e:00.0
./devices/pci0000:00/0000:00:03.0/0000:0b:00.0/0000:0c:00.0/0000:0e:00.0/reset
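The two find commands can be generalised into a scan that lists every VGA-class device with its boot_vga flag and whether it has a reset node. That would help test the "only the console device" hypothesis across more cards. This hedged C sketch assumes the kernel's boot_vga attribute, which is present for VGA-class devices and reads 1 for the device the firmware booted from (usually, but not always, the console). scan_vga_reset and read_attr_char are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L  /* for opendir()/readdir() and stat() */
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns the first character of <dev>/<attr>, or 0 if it cannot be read. */
static char read_attr_char(const char *dev, const char *attr)
{
    char path[800];
    snprintf(path, sizeof(path), "%s/%s", dev, attr);
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    int c = fgetc(f);
    fclose(f);
    return (c == EOF) ? 0 : (char)c;
}

/*
 * Walks <sysfs_root>/bus/pci/devices and, for each device that has a
 * boot_vga attribute, prints "<bdf> boot_vga=<0|1> reset=<yes|no>".
 * Returns the number of such devices, or -1 if the directory is missing.
 */
int scan_vga_reset(const char *sysfs_root, FILE *out)
{
    char base[512], dev[768], reset[800];
    struct stat st;

    snprintf(base, sizeof(base), "%s/bus/pci/devices", sysfs_root);
    DIR *d = opendir(base);
    if (!d)
        return -1;

    int found = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                       /* skip "." and ".." */
        snprintf(dev, sizeof(dev), "%s/%s", base, e->d_name);
        char vga = read_attr_char(dev, "boot_vga");
        if (!vga)
            continue;                       /* not a VGA-class device */
        snprintf(reset, sizeof(reset), "%s/reset", dev);
        int has_reset = (stat(reset, &st) == 0);
        fprintf(out, "%s boot_vga=%c reset=%s\n", e->d_name, vga,
                has_reset ? "yes" : "no");
        found++;
    }
    closedir(d);
    return found;
}
```

On a real host this would be called as scan_vga_reset("/sys", stdout). The output order follows readdir() and is not sorted.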
Gordan
Thread overview: 7 messages
2013-11-15 12:18 Device Reset on Nvidia GPUs Gordan Bobic
2013-11-15 14:27 ` Stefano Stabellini
2013-11-15 14:29 ` Gordan Bobic
2013-11-15 14:40 ` Konrad Rzeszutek Wilk
2013-11-15 14:52 ` Gordan Bobic
2013-11-25 15:17 ` Konrad Rzeszutek Wilk
2013-11-25 15:25 ` Gordan Bobic