linux-pci.vger.kernel.org archive mirror
From: Basavaraj Natikar <bnatikar@amd.com>
To: Mario Limonciello <mario.limonciello@amd.com>,
	Bjorn Helgaas <helgaas@kernel.org>
Cc: "Natikar, Basavaraj" <Basavaraj.Natikar@amd.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"thomas@glanzmann.de" <thomas@glanzmann.de>
Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X
Date: Fri, 10 Mar 2023 13:11:40 +0530
Message-ID: <ddbbfb50-24b6-202f-7452-c8959901c739@amd.com>
In-Reply-To: <0e1bd2cd-ea0e-7f2f-3d4a-62e9dea892b8@amd.com>


On 3/10/2023 6:27 AM, Mario Limonciello wrote:
> On 3/9/23 16:30, Bjorn Helgaas wrote:
>> On Thu, Mar 09, 2023 at 12:32:41PM -0600, Limonciello, Mario wrote:
>>> On 3/9/2023 12:25, Bjorn Helgaas wrote:
>>> ...
>>
>>>>>> https://gitlab.freedesktop.org/agd5f/linux/-/commit/07494a25fc8881e122c242a46b5c53e0e4403139
>>>>>>
>>>>
>>>> That nbio_v7.2.c patch and this patch don't look anything alike.  It
>>>> looks like the nbio_v7.2.c patch might run once?  Could *this* be done
>>>> once at enumeration-time, too?
>>>
>>> They don't look anything alike because they're attacking the problem
>>> from different angles.
>>
>> Why do we need different angles?
>
> The GPU driver approach only works if the GPU is enabled.  If the GPU
> could never be disabled then it alone would be sufficient.
>
>>
>>> The NBIO patch fixes the initialization value for the internal
>>> registers.  This is what the BIOS "should" have done.  When the
>>> internal registers are configured properly then the behavior the
>>> kernel expects works as well.
>>>
>>> The NBIO patch will run both at amdgpu startup as well as when
>>> resuming from suspend.
>>
>> If initializing something as BIOS should have done makes the hardware
>> work correctly, isn't once enough?  Why does the NBIO patch need to
>> run at resume-time?
>
> During suspend, some internal registers sit in a power domain whose
> state is lost.  These are typically restored by the BIOS to the
> values defined in initialization tables before handing control back to
> the OS.
>
>
>>
>>> This patch we're discussing treats the symptoms of the deficiency
>>> and avoids the impact.  This patch runs any time the controller is
>>> runtime resumed.  So, yes it will run more frequently.  Because this
>>> patch is treating the symptoms it needs to be applied every single
>>> time the controller exits D3.
>>
>> This patch runs at *suspend*-time (DECLARE_PCI_FIXUP_SUSPEND), not
>> resume-time.
>>
>> The difference is important because with this broken BIOS, MSI-X is
>> disabled between the suspend quirk and some distant point in resume.
>> With non-broken BIOS, MSI-X remains *enabled* for at least part of
>> that period, and I don't want to have to figure out whether that
>> difference is important.
>
> I'll let Basavaraj comment on the timing here with the behavior
> workaround and sequence of events.

As I replied in the previous mail, Bjorn's suggestion works well, so I will
change the quirk to run at resume instead of suspend. That also resolves the
issue: if MSI-X is enabled, toggling the enable bit during resume restores the
controller's internal state, as shown below.

static void quirk_restore_msix_en(struct pci_dev *dev)
{
        u16 ctrl;

        /* Nothing to restore unless MSI-X was enabled before D3hot. */
        pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
        if (!(ctrl & PCI_MSIX_FLAGS_ENABLE))
                return;

        /*
         * Rewriting the same value has no effect on the internal state,
         * so clear the enable bit first and then set it again.
         */
        ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
        pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
        ctrl |= PCI_MSIX_FLAGS_ENABLE;
        pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
}
DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_AMD, 0x15b8, quirk_restore_msix_en);
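
To make the reasoning behind the clear-then-set sequence easier to follow,
here is a small userspace model of the behavior described in this thread. It
is only a sketch under assumptions: struct model_dev, model_write_flags(),
model_d3hot_cycle(), model_restore_msix_en() and model_selfcheck() are
hypothetical names invented for illustration, not kernel APIs, and the rule
"internal state latches only when the enable bit changes" is my simplified
reading of "writing the same value does nothing", not a hardware
specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSIX_FLAGS_ENABLE 0x8000u /* same bit position as PCI_MSIX_FLAGS_ENABLE */

/* Hypothetical model of the controller; not a kernel structure. */
struct model_dev {
	uint16_t msix_flags;       /* value config space reads back */
	bool     internal_msix_on; /* what the hardware actually does */
};

/* Assumption: internal state is updated only when the enable bit changes. */
static void model_write_flags(struct model_dev *d, uint16_t val)
{
	if ((val ^ d->msix_flags) & MSIX_FLAGS_ENABLE)
		d->internal_msix_on = (val & MSIX_FLAGS_ENABLE) != 0;
	d->msix_flags = val;
}

/* D0 -> D3hot -> D0: config space is preserved, internal state is lost. */
static void model_d3hot_cycle(struct model_dev *d)
{
	d->internal_msix_on = false;
}

/* Same sequence as quirk_restore_msix_en(), applied to the model. */
static void model_restore_msix_en(struct model_dev *d)
{
	uint16_t ctrl = d->msix_flags;

	if (!(ctrl & MSIX_FLAGS_ENABLE))
		return;

	ctrl &= (uint16_t)~MSIX_FLAGS_ENABLE;
	model_write_flags(d, ctrl);
	ctrl |= MSIX_FLAGS_ENABLE;
	model_write_flags(d, ctrl);
}

/* Walks through the failure and the workaround; returns 0 if all steps hold. */
int model_selfcheck(void)
{
	struct model_dev d = { .msix_flags = 0, .internal_msix_on = false };

	model_write_flags(&d, MSIX_FLAGS_ENABLE); /* driver enables MSI-X */
	if (!d.internal_msix_on)
		return 1;

	model_d3hot_cycle(&d); /* flags still say enabled, hardware disagrees */
	if (!(d.msix_flags & MSIX_FLAGS_ENABLE) || d.internal_msix_on)
		return 2;

	model_write_flags(&d, d.msix_flags); /* same value: nothing changes */
	if (d.internal_msix_on)
		return 3;

	model_restore_msix_en(&d); /* clear then set: state is restored */
	if (!d.internal_msix_on)
		return 4;

	return 0;
}
```

Calling model_selfcheck() from a test program should return 0 in this model.
The third step is the one that matters: a plain restore of the saved
PCI_MSIX_FLAGS value would not be enough, which is why the quirk writes the
register twice.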


I will update the commit message accordingly.

I believe we have already answered the questions below.
Please let us know if you need any further clarification.


Thanks,
--
Basavaraj

>
>>
>> We have fragments of a coherent commit log, but it's not quite a
>> complete story yet.  I think so far we have:
>>
>>    - Issue affects only the 1022:15b8 USB controller (well, I guess it
>>      also affects some GPU device?)
>
> Same device.  It's just a way to access the internal registers.
>
>>    - Only a problem when BIOS doesn't initialize controller correctly
>>    - Controller claims to preserve internal state on D3hot->D0
>>      transition, but it doesn't
>>    - D0->D3hot->D0 transitions do preserve external PCI_MSIX_FLAGS
>>      state; only internal state is lost
>>    - When MSI-X is enabled and controller transitions D0->D3hot->D0,
>>      MSI-X appears enabled per PCI_MSIX_FLAGS, but is actually
>>      *disabled* because the internal state was lost
>>    - MSI-X being disabled leads to xhci_hcd command timeouts because
>>      interrupts are missed
>>    - Not possible for an enumeration-time quirk to fix the controller
>>      initialization problem (why not?)
>>    - Writing PCI_MSIX_FLAGS with a *different* value fixes the internal
>>      state; writing the same value does nothing
>>    - A suspend- or resume-time quirk can work around this, and this is
>>      safe on *all* 1022:15b8 devices regardless of whether the BIOS is
>>      broken
>>    - The same approach can't be used for both 1022:15b8 and the GPU
>>      device because ...?
>>
>> Bjorn
>

