linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Norris <briannorris@chromium.org>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Rajat Jain <rajatja@google.com>,
	Guenter Roeck <linux@roeck-us.net>,
	Jeffy Chen <jeffy.chen@rock-chips.com>,
	Shawn Lin <shawn.lin@rock-chips.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [RFC PATCH] PCI: disable MSI/MSI-X before resetting
Date: Fri, 12 May 2017 14:09:24 -0700	[thread overview]
Message-ID: <20170512210923.GA23103@google.com> (raw)
In-Reply-To: <CAKgT0UdqfCjUvz4=FNEnBudJhSBEeYCfD66owO8LFVg6T_io0A@mail.gmail.com>

On Fri, May 12, 2017 at 10:07:29AM -0700, Alexander Duyck wrote:
> On Thu, May 11, 2017 at 2:54 PM, Brian Norris <briannorris@chromium.org> wrote:
> > Despite the claims in the associated comment block, it seems that
> > clearing the command register is not enough to guarantee that no
> > MSI interrupts get triggered during Function Level Reset. Through code
> > instrumentation, I'm able to clearly trace cases like this:
> >
> > (0) reset a device:
> >         echo 1 > /sys/bus/pci/devices/xxx/reset
> > (1) disable an MSI interrupt for device 'xxx' in a PCI reset handler
> >     (disable_irq())
> > (2) pcie_flr() initiates reset:
> >         pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR));
> > (3) about 0.5 ms later, kernel handles IRQ:
> >         -> __pci_msi_desc_mask_irq()
> >            -> pci_write_config_dword()
> 
> Is this an irq from the device being reset, or could this be an
> interrupt from something else such as the root port?

Hmm, I'm pretty sure it was from the device, but I'll double check to be
sure.

> > (4) this is well before the device is actually ready, and the system
> >     sees a bus abort
> >
> > Tested with MSI, but presumably MSI-X could have the same issue.
> >
> > Tested on Samsung Chromebook Plus, with RK3399 OP1.
> >
> > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > ---
> > RFC, because I'm not really sure this is the right approach, or if
> > there's something else that's misconfigured or buggy.
> >
> > Note that right now, configuration space aborts trigger SError aborts
> > (and panics) on RK3399, so this sort of problem is fatal. It's not clear
> > to me if that's a spec violation (many other PCI controllers manage to
> > mask such aborts), or an acceptable behavior. But FWIW, that means that
> > polling behavior like in commit 5adecf817dd6 ("PCI: Wait for up to
> > 1000ms after FLR reset") cannot work; the first read when the device is
> > not ready will cause a panic.
> 
> I'm adding Alex Williamson since he is the author of the patch you
> call out here. It is also possible that he may have already submitted
> a patch somewhere to fix something like this that I might not be aware
> of.

I haven't seen anything, but I haven't been looking specifically for
(non-merged) submissions related to that commit.

> Odds are what probably needs to happen is that the we should probably
> take the same steps in to disable master abort mode that we do in
> pci_scan_bridge. That way if we don't get a response from the device
> in time we don't trigger a master abort which will then feed back up
> the bus and cause other issues. I'll see if I can put together a quick
> patch to address the issue if you are up for testing it.

Ah, that could make some sense. I'll give your patch a test. That
doesn't really address my main reported issue though, AFAICT from a
quick read (and test).

> > ---
> >  drivers/pci/pci.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b01bd5bba8e6..861a3b2d7026 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4156,6 +4156,17 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> >         pci_set_power_state(dev, PCI_D0);
> >
> >         pci_save_state(dev);
> > +
> > +       /*
> > +        * Disable MSI/MSI-X before resetting. Some devices have been found to
> > +        * trigger interrupts while in the middle of Function Level Reset. The
> > +        * MSI/MSI-X state will get restored after we reset.
> > +        */
> > +       if (dev->msi_enabled)
> > +               pci_msi_set_enable(dev, 0);
> > +       if (dev->msix_enabled)
> > +               pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> > +
> 
> So if your device is still issuing MSI or MSI-X interrupts after
> completing the line below which overwrites the command register then
> there is definitely an issue in the hardware. All bus master
> transactions should have been flushed out and blocked after clearing
> the bus master enable bit and verifying hat the transactions pending
> bit is cleared in the status register.

Yeah, I got this impression :) I'll see if I can double check what
exactly is triggering this interrupt before passing judgment.

> >         /*
> >          * Disable the device by clearing the Command register, except for
> >          * INTx-disable which is set.  This not only disables MMIO and I/O port

Brian

      reply	other threads:[~2017-05-12 21:09 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-11 21:54 [RFC PATCH] PCI: disable MSI/MSI-X before resetting Brian Norris
2017-05-12 17:07 ` Alexander Duyck
2017-05-12 21:09   ` Brian Norris [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170512210923.GA23103@google.com \
    --to=briannorris@chromium.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexander.duyck@gmail.com \
    --cc=bhelgaas@google.com \
    --cc=jeffy.chen@rock-chips.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=rajatja@google.com \
    --cc=shawn.lin@rock-chips.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).