linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Sheng Yang <sheng@linux.intel.com>
Subject: Re: [PATCH] pci: Disable slot presence detection around bus reset
Date: Thu, 14 Feb 2013 20:53:59 -0700	[thread overview]
Message-ID: <1360900439.3248.310.camel@bling.home> (raw)
In-Reply-To: <CAErSpo4uopqKwXY-6HrSZ9mk08tgY9L-iUG2ykmKcMK077tDrQ@mail.gmail.com>

On Thu, 2013-02-14 at 16:47 -0700, Bjorn Helgaas wrote:
> On Thu, Feb 14, 2013 at 11:37 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > A bus reset can trigger a presence detection change and result in a
> > suprise hotplug.  This is generally not what we want to happen when
> > trying to reset a device.  Disable the presence detection control on
> > on bridges around bus reset.
> 
> This is a really interesting situation, and I'm not quite ready to
> sign up to the idea that this is really a problem and that if it is,
> this is the way we want to fix it.
> 
> What would happen if we *did* handle this as a hotplug event, with a
> removal followed by an add?
> 
> The scheme where pci_reset_function() does "pci_save_state(dev);
> pci_dev_reset(dev); pci_restore_state(dev);" makes me nervous.
> 
> We're saving and restoring some of PCI config space around the reset,
> but there's no guarantee that we're preserving *all* the important
> state in config space because I think devices can have non-architected
> device-specific things in config space that we don't know how to
> save/restore.
> 
> Devices also have internal state not exposed via config space.  That
> state is lost during the reset but can't be restored by
> pci_restore_state().  So it seems like pci_reset_function() is
> pretending to do something it can't really do reliably.
> 
> If we make it so a reset is always handled as a remove+add, then we'll
> use a more generic path, and we'll get all the stuff you expect when
> initializing a new device -- resource assignment, IRQ setup, quirks,
> etc.  Quirks in particular seem like something we want, but don't
> currently get with pci_reset_function().
> 
> Oh, and the "disable presence detect" approach below only works for
> things below a PCIe bridge with native hotplug, right?  I wonder what
> happens if we reset devices below a bridge using SHPC or acpiphp.

Triggering a remove+add is not useful for the way we use it today.  The
users I'm aware of are KVM device assignment and VFIO, where we trigger
it in an attempt to get the device to a known state so that we have some
hope of repeatability.  In those scenarios the reset is initiated by the
driver.  The interface isn't meant to guarantee the device is returned
to an identical state as it was before reset.  If it did, why would we
call it?  We want to get to a state as near to power on, but still with
config programming, as we can.

Being driver directed, having the reset initiate a remove is pretty near
the last thing we want.  That limits the scope of calling it to only
when the driver can readily release the device.  If we have the device
attached to a guest or userspace driver, that's potentially a lot of
setup and teardown and effectively extending a surprise removal all the
way up the stack.

Obviously a bus reset is a big hammer and we do exhaust all the little
hammers of flr and pm reset before we try it, but in this case, we know
the device that's going away and with all likelihood, it's coming right
back at the same location.  If we take the path of forcing a remove+add,
let's just remove it from the reset_function call path and we'll do
without reset for those devices.  Thanks,

Alex

> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/pci/pci.c |   29 ++++++++++++++++++++++++-----
> >  1 file changed, 24 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 5cb5820..c1f7d77 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3229,8 +3229,8 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
> >
> >  static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
> >  {
> > -       u16 ctrl;
> > -       struct pci_dev *pdev;
> > +       u16 ctrl, flags, sltctl = 0;
> > +       struct pci_dev *pdev, *bridge;
> >
> >         if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
> >                 return -ENOTTY;
> > @@ -3242,15 +3242,34 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
> >         if (probe)
> >                 return 0;
> >
> > -       pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
> > +       bridge = dev->bus->self;
> > +
> > +       /*
> > +        * If the parent device supports a slot with presence detection
> > +        * change enabled, holding the bus in reset can trigger that and
> > +        * cause an unwanted surprise removal.  Disable presence detection
> > +        * around the bus reset.
> > +        */
> > +       pcie_capability_read_word(bridge, PCI_EXP_FLAGS, &flags);
> > +       if (flags & PCI_EXP_FLAGS_SLOT) {
> > +               pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
> > +               if (sltctl & PCI_EXP_SLTCTL_PDCE)
> > +                       pcie_capability_write_word(bridge, PCI_EXP_SLTCTL,
> > +                                               sltctl & ~PCI_EXP_SLTCTL_PDCE);
> > +       }
> > +
> > +       pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);
> >         ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
> > -       pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> > +       pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
> >         msleep(100);
> >
> >         ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
> > -       pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> > +       pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
> >         msleep(100);
> >
> > +       if (sltctl & PCI_EXP_SLTCTL_PDCE)
> > +               pcie_capability_write_word(bridge, PCI_EXP_SLTCTL, sltctl);
> > +
> >         return 0;
> >  }
> >
> >




  reply	other threads:[~2013-02-15  3:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-14 18:37 [PATCH] pci: Disable slot presence detection around bus reset Alex Williamson
2013-02-14 22:17 ` Alex Williamson
2013-02-14 23:18   ` Martin Mokrejs
2013-02-14 23:47 ` Bjorn Helgaas
2013-02-15  3:53   ` Alex Williamson [this message]
2013-04-24 21:33     ` Alex Williamson
2013-04-26 19:49       ` Bjorn Helgaas
2013-04-29 16:08         ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1360900439.3248.310.camel@bling.home \
    --to=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=rjw@sisk.pl \
    --cc=sheng@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).