public inbox for linux-kernel@vger.kernel.org
From: Andreas Hartmann <andihartmann@freenet.de>
To: Alex Williamson <alex.williamson@redhat.com>,
	Andreas Hartmann <andihartmann@freenet.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset
Date: Mon, 12 Jan 2015 20:15:16 +0100
Message-ID: <54B41D44.5080606@maya.org>
In-Reply-To: <1421081344.6130.54.camel@redhat.com>

Hello Alex!

Alex Williamson wrote:
> On Mon, 2015-01-12 at 16:20 +0100, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> On Thu, 2015-01-08 at 09:07 -0700, Bjorn Helgaas wrote:
>>>> On Fri, Nov 21, 2014 at 11:24:27AM -0700, Alex Williamson wrote:
>>>>> Reports against the TL-WDN4800 card indicate that PCI bus reset of
>>>>> this Atheros device cause system lock-ups and resets.  I've also
>>>>> been able to confirm this behavior on multiple systems.  The device
>>>>> never returns from reset and attempts to access config space of the
>>>>> device after reset result in hangs.  Blacklist bus reset for the
>>>>> device to avoid this issue.
>>>>>
>>>>> Reported-by: Andreas Hartmann <andihartmann@freenet.de>
>>>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>>>> Tested-by: Andreas Hartmann <andihartmann@freenet.de>
>>>>
>>>> If I understand correctly, these two (patches 3 & 4) fix a v3.14 regression
>>>> caused by 425c1b223dac ("PCI: Add Virtual Channel to save/restore support").
>>>>
>>>> If so, these should go to for-linus for v3.19.  What about patches 1 & 2?
>>>> Do they fix a regression?  Is there a pointer to a bugzilla or problem
>>>> report about that issue?
>>>>
>>>> I don't understand the connection between 425c1b223dac and
>>>> PCI_DEV_FLAGS_NO_BUS_RESET, because 425c1b223dac doesn't seem to do any
>>>> resets.  Is that the wrong commit, or can you outline the connection for
>>>> me?
>>>
>>> TBH, I don't have a lot of faith in associating this to 425c1b223dac,
>>> I'm not sure how Andreas' bisect landed there. 
>>
>> Because removing this patch made it working again :-)
>>
>> And too:
>> http://thread.gmane.org/gmane.linux.kernel.pci/35170/focus=35984
>>
>> Kernel 2.10. and 2.12. and 2.13. did work fine for me. 2.14 is the first
>> kernel, which hangs the machine at startup of the VM. The userland
>> (qemu) didn't change in between.
> 
> s/2\./3\./

Thanks :-) It seems I don't like the number 3 :-)

> Ok, so what about VC save/restore (425c1b223dac) is the problem then?
> When we tried to determine that, you found that if we continue from the
> top of the save loop, everything works (ie. no VC state saved), but if
> you continue after the variable declaration of the same loop (ie. still
> no VC state saved), it breaks:
> 
> http://www.spinics.net/lists/linux-pci/msg36166.html
> 
> So, please forgive me if I don't have a whole lot of faith that
> 425c1b223dac is involved.

It's hard for me, too. Really. It's kind of a mystery.

> We also both independently determined that this particular device never
> recovers from a PCI bus reset, even when done from userspace with setpci
> and absolutely no save/restore wrappers.

Yes.

>  Config space on the device is
> never accessible after the reset.

Yes.

>  Therefore, how could any sort of bus
> reset with save/restore ever work for this device?

I can't say. What I can definitely say is that I never had problems
running VMs w/ qemu until 3.14 came along. Do you think I'm lying? I
used 3.10 and 3.12 for a long time w/o (known!) problems (with 3.12, only
on the first start of the VM). Otherwise I would have been here a long
time ago :-))).

>> Therefore: from my point of view, it is a regression, because things
>> have been working < 2.14.
>>
>> Besides that: It is undoubted, that there is a problem with resetting
>> this card. But the difference between >= 3.14 and < 3.14 is, that < 3.14
>> has been working nevertheless. The patch
>> 425c1b223dac456d00a61fd6b451b6d1cf00d065 obviously changed something
>> which I can't say and I don't know off. Therefore, the quirk-patch is
>> definitely required, because things work completely fine again w/ this
>> patch.
>>
>> "Working" means for me here: I was able to start (and use) the VM w/o
>> crashing the machine and this isn't possible w/ unpatched 2.14+ any
>> more. Yes, w/ 2.12, I wasn't able to restart the VM (it then crashed the
>> machine), but w/ 2.10 even this was possible.
> 
> What?!  So v3.12 still had a machine crash when assigning this device.

Yes, if you *re*start the VM (for a long time I didn't know that at all;
I only discovered it during testing while analyzing the problem :-)).
The first start (after a reboot) was not a problem, and that was the
usual use case here :-)).

Believe me, I'm really convinced that this card has a problem with
resets. I'm just wondering why it worked for me up to 3.13. That's all.

> The vfio hot reset interface was added in v3.12, so v3.10 didn't have
> any way to do a reset other than what pci_reset_function() decided to
> do.  That all seems to associate the machine crash to the ability to do
> a bus reset on the device.  I'm not sure why the behavior changed
> between v3.14 and v3.12 (maybe the try-reset addition), but there's some
> sort of pre-existing issue before we even got to 425c1b223dac.

Most probably.

> I'm perfectly happy tagging this for stable,

Thanks!! I'm really very happy with your patch and your support!
Really! Thanks a lot! It's just odd to me that it partly worked (the first
start of the VM worked) w/ 3.12 and 3.13, and then w/ 3.14 suddenly not at all.

You were just the accidental victim here - most probably any other change
could have been hit, too. Sorry for that :-(. Therefore: kudos for
fixing the problem anyway. That is really not something to take for granted!

> but it seems like a
> hardware bug exposed by allowing userspace the ability to select a bus
> reset.  Whether or not that's a kernel regression isn't exactly clear to
> me ("new functionality exposes broken hardware, news at 11").  Thanks,
> 
> Alex


Kind regards,
Andreas

>>> IME, this device cannot,
>>> and has never been able to handle a bus reset.  A simple setpci
>>> experiment on the commandline can confirm this.  What I think happened
>>> is that with the PCI bus reset infrastructure we added, we switched QEMU
>>> to prefer PCI bus resets over things like PM D3hot->D0 resets.  So it's
>>> just more prolific use of bus resets by userspace.
>>>
>>> There's also no regression in 1 & 2, PM reset has never done anything
>>> useful on those devices.  Thanks,
>>>
>>> Alex
>>>
>>>>> ---
>>>>>
>>>>>  drivers/pci/quirks.c |   14 ++++++++++++++
>>>>>  1 file changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>> index 561e10d..ebbd5b4 100644
>>>>> --- a/drivers/pci/quirks.c
>>>>> +++ b/drivers/pci/quirks.c
>>>>> @@ -3029,6 +3029,20 @@ static void quirk_no_pm_reset(struct pci_dev *dev)
>>>>>  DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
>>>>>  			       PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
>>>>>  
>>>>> +static void quirk_no_bus_reset(struct pci_dev *dev)
>>>>> +{
>>>>> +	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Atheros AR93xx chips do not behave after a bus reset.  The device will
>>>>> + * throw a Link Down error on AER capable system and regardless of AER,
>>>>> + * config space of the device is never accessible again and typically
>>>>> + * causes the system to hang or reset when access is attempted.
>>>>> + * http://www.spinics.net/lists/linux-pci/msg34797.html
>>>>> + */
>>>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
>>>>> +
>>>>>  #ifdef CONFIG_ACPI
>>>>>  /*
>>>>>   * Apple: Shutdown Cactus Ridge Thunderbolt controller.
>>>>>
>>
> 
> 
> 
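
The quoted patch only sets a flag on the device; how the reset paths consult that flag is not shown in this message. The following user-space sketch illustrates the general idea under stated assumptions: the struct, the `sketch_*` names, and the flag bit values are hypothetical stand-ins, not the kernel's definitions. Only the Atheros vendor ID (0x168c) and the device ID 0x0030 from the patch are real values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's dev_flags bits (values illustrative). */
enum sketch_dev_flags {
	SKETCH_NO_PM_RESET  = 1u << 0,
	SKETCH_NO_BUS_RESET = 1u << 1,
};

/* Hypothetical, heavily reduced model of a PCI device. */
struct sketch_dev {
	unsigned short vendor;
	unsigned short device;
	unsigned int dev_flags;
};

#define SKETCH_VENDOR_ATHEROS 0x168c	/* Atheros PCI vendor ID */
#define SKETCH_DEVICE_AR93XX  0x0030	/* device ID used in the patch */

/* Same effect as quirk_no_bus_reset() in the patch: mark the device. */
void sketch_quirk_no_bus_reset(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->dev_flags |= SKETCH_NO_BUS_RESET;
}

/* Assumed fixup pass: run the quirk for matching vendor/device IDs only. */
void sketch_apply_fixups(struct sketch_dev *dev)
{
	assert(dev != NULL);
	if (dev->vendor == SKETCH_VENDOR_ATHEROS &&
	    dev->device == SKETCH_DEVICE_AR93XX)
		sketch_quirk_no_bus_reset(dev);
}

/* Assumed gate: a reset path would skip bus reset when the flag is set. */
bool sketch_bus_reset_allowed(const struct sketch_dev *dev)
{
	assert(dev != NULL);
	return !(dev->dev_flags & SKETCH_NO_BUS_RESET);
}
```

In this model, a caller that wants to reset a device would run the fixups once at enumeration time and check `sketch_bus_reset_allowed()` before choosing a bus reset, falling back to another reset method (or to none) when it returns false.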


Thread overview: 16+ messages
2014-11-21 18:24 [PATCH 0/4] PCI: Reset exclusions Alex Williamson
2014-11-21 18:24 ` [PATCH 1/4] PCI: Allow device quirks to exclude D3->D0 PM reset Alex Williamson
2014-11-21 18:24 ` [PATCH 2/4] PCI: quirk AMD/ATI VGA cards to avoid " Alex Williamson
2014-11-21 19:00   ` Deucher, Alexander
2014-11-21 18:24 ` [PATCH 3/4] PCI: Allow device quirks to exclude bus reset Alex Williamson
2014-11-21 18:24 ` [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid " Alex Williamson
2014-12-26  7:56   ` Andreas Hartmann
2015-01-08 16:07   ` Bjorn Helgaas
2015-01-08 19:30     ` Alex Williamson
2015-01-08 23:10       ` Bjorn Helgaas
2015-01-12 15:20       ` Andreas Hartmann
2015-01-12 16:49         ` Alex Williamson
2015-01-12 19:15           ` Andreas Hartmann [this message]
2015-01-13  0:37             ` Bjorn Helgaas
2015-01-16  0:28 ` [PATCH 0/4] PCI: Reset exclusions Bjorn Helgaas
2015-01-16 16:15   ` Bjorn Helgaas
