From: Basavaraj Natikar <bnatikar@amd.com>
To: Mario Limonciello <mario.limonciello@amd.com>,
Bjorn Helgaas <helgaas@kernel.org>
Cc: "Natikar, Basavaraj" <Basavaraj.Natikar@amd.com>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"thomas@glanzmann.de" <thomas@glanzmann.de>
Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X
Date: Fri, 10 Mar 2023 13:11:40 +0530 [thread overview]
Message-ID: <ddbbfb50-24b6-202f-7452-c8959901c739@amd.com> (raw)
In-Reply-To: <0e1bd2cd-ea0e-7f2f-3d4a-62e9dea892b8@amd.com>
On 3/10/2023 6:27 AM, Mario Limonciello wrote:
> On 3/9/23 16:30, Bjorn Helgaas wrote:
>> On Thu, Mar 09, 2023 at 12:32:41PM -0600, Limonciello, Mario wrote:
>>> On 3/9/2023 12:25, Bjorn Helgaas wrote:
>>> ...
>>
>>>>>> https://gitlab.freedesktop.org/agd5f/linux/-/commit/07494a25fc8881e122c242a46b5c53e0e4403139
>>>>>>
>>>>
>>>> That nbio_v7.2.c patch and this patch don't look anything alike. It
>>>> looks like the nbio_v7.2.c patch might run once? Could *this* be done
>>>> once at enumeration-time, too?
>>>
>>> They don't look anything alike because they're attacking the problem
>>> from
>>> different angles.
>>
>> Why do we need different angles?
>
> The GPU driver approach only works if the GPU is enabled. If the GPU
> could never be disabled then it alone would be sufficient.
>
>>
>>> The NBIO patch fixes the initialization value for the internal
>>> registers.
>>> This is what the BIOS "should" have done. When the internal
>>> registers are
>>> configured properly then the behavior the kernel expects works as well.
>>>
>>> The NBIO patch will run both at amdgpu startup as well as when
>>> resuming from
>>> suspend.
>>
>> If initializing something as BIOS should have done makes the hardware
>> work correctly, isn't once enough? Why does the NBIO patch need to
>> run at resume-time?
>
> During suspend some internal registers are in a power domain that the
> state will be lost. These are typically restored by the BIOS to the
> values defined in initialization tables before handing control back to
> the OS.
>
>
>>
>>> This patch we're discussing treats the symptoms of the deficiency
>>> and avoids
>>> the impact.
>>> This patch runs any time the controller is runtime resumed. So, yes
>>> it will
>>> run more frequently. Because this patch is treating the symptoms it
>>> needs
>>> to be applied every single time the controller exits D3.
>>
>> This patch runs at *suspend*-time (DECLARE_PCI_FIXUP_SUSPEND), not
>> resume-time.
>>
>> The difference is important because with this broken BIOS, MSI-X is
>> disabled between the suspend quirk and some distant point in resume.
>> With non-broken BIOS, MSI-X remains *enabled* for at least part of
>> that period, and I don't want to have to figure out whether that
>> difference is important.
>
> I'll let Basavaraj comment on the timing here with the behavior
> workaround and sequence of events.
As replied in the previous mail, Bjron's suggestion works well and holds good so I will
change the quirk to apply in resume instead of suspend which also resolves the issue
as below i.e. restoring during resume if MSI-X is enabled works.
static void quirk_restore_msix_en(struct pci_dev *dev)
{
u16 ctrl;
pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
if (!(ctrl & PCI_MSIX_FLAGS_ENABLE))
return;
ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
ctrl |= PCI_MSIX_FLAGS_ENABLE;
pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
}
DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_AMD, 0x15b8, quirk_restore_msix_en);
Will change the commit message accordingly.
I guess for the questions below we already answered.
Please let us know if you need more clarifications.
Thanks,
--
Basavaraj
>
>>
>> We have fragments of a coherent commit log, but it's not quite a
>> complete story yet. I think so far we have:
>>
>> - Issue affects only the 1022:15b8 USB controller (well, I guess it
>> also affects some GPU device?)
>
> Same device. It's just a way to access the internal registers.
>
>> - Only a problem when BIOS doesn't initialize controller correctly
>> - Controller claims to preserve internal state on D3hot->D0
>> transition, but it doesn't
>> - D0->D3hot->D0 transitions do preserve external PCI_MSIX_FLAGS
>> state; only internal state is lost
>> - When MSI-X is enabled and controller transitions D0->D3hot->D0,
>> MSI-X appears enabled per PCI_MSIX_FLAGS, but is actually
>> *disabled* because the internal state was lost
>> - MSI-X being disabled leads to xhci_hcd command timeouts because
>> interrupts are missed
>> - Not possible for an enumeration-time quirk to fix the controller
>> initialization problem (why not?)
>> - Writing PCI_MSIX_FLAGS with a *different* value fixes the internal
>> state; writing the same value does nothing
>> - A suspend- or resume-time quirk can work around this, and this is
>> safe on *all* 1022:15b8 devices regardless of whether the BIOS is
>> broken
>> - The same approach can't be used for both 1022:15b8 and the GPU
>> device because ...?
>>
>> Bjorn
>
next prev parent reply other threads:[~2023-03-10 7:44 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-06 7:23 [PATCH] PCI: Add quirk to clear MSI-X Basavaraj Natikar
2023-03-06 8:14 ` Thomas Glanzmann
2023-03-08 22:44 ` Bjorn Helgaas
2023-03-08 23:04 ` Limonciello, Mario
2023-03-09 7:34 ` Basavaraj Natikar
2023-03-09 18:25 ` Bjorn Helgaas
2023-03-09 18:32 ` Limonciello, Mario
2023-03-09 22:30 ` Bjorn Helgaas
2023-03-10 0:57 ` Mario Limonciello
2023-03-10 7:41 ` Basavaraj Natikar [this message]
2023-03-10 22:13 ` Bjorn Helgaas
2023-03-20 1:32 ` Limonciello, Mario
2023-03-20 17:14 ` Bjorn Helgaas
2023-03-20 17:20 ` Limonciello, Mario
2023-03-20 19:36 ` Bjorn Helgaas
2023-03-20 19:47 ` Limonciello, Mario
2023-03-20 21:30 ` Bjorn Helgaas
2023-03-20 21:37 ` Limonciello, Mario
2023-03-20 22:08 ` Bjorn Helgaas
2023-03-20 22:52 ` Mario Limonciello
2023-03-21 11:07 ` Bjorn Helgaas
2023-03-28 13:15 ` Basavaraj Natikar
2023-03-28 13:25 ` Limonciello, Mario
2023-03-28 17:42 ` Bjorn Helgaas
2023-03-10 7:22 ` Basavaraj Natikar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ddbbfb50-24b6-202f-7452-c8959901c739@amd.com \
--to=bnatikar@amd.com \
--cc=Basavaraj.Natikar@amd.com \
--cc=bhelgaas@google.com \
--cc=helgaas@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=mario.limonciello@amd.com \
--cc=thomas@glanzmann.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).