From: helgaas@kernel.org (Bjorn Helgaas)
Subject: [Bug 112121] New: Some PCIe options cause devices to be removed after suspend
Date: Mon, 21 Mar 2016 11:36:37 -0500
Message-ID: <20160321163637.GA12288@localhost>
In-Reply-To: <CAHbf0-GBiMVr4=n3ZT1NSH8Xgdh3TcdDd=Di+pok6Ep6h42ZOQ@mail.gmail.com>

Hi Mike,

I'm sorry this slipped through the cracks.   I apologize for the
inability of Google Inbox to send plaintext email; I use mutt
because that's a hassle for me, too.

On Sat, Feb 13, 2016 at 11:39:52PM +0000, Mike Lothian wrote:
> On 8 February 2016 at 13:51, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > [+cc linux-pci, NVMe folks, power management folks]
> >
> > On Sun, Feb 7, 2016 at 11:04 AM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=112121
> >>
> >>             Bug ID: 112121
> >>            Summary: Some PCIe options cause devices to be removed after
> >>                     syspend
> >>            Product: Drivers
> >>            Version: 2.5
> >>     Kernel Version: 4.5-rc2
> >>           Hardware: All
> >>                 OS: Linux
> >>               Tree: Mainline
> >>             Status: NEW
> >>           Severity: normal
> >>           Priority: P1
> >>          Component: PCI
> >>           Assignee: drivers_pci@kernel-bugs.osdl.org
> >>           Reporter: mike@fireburn.co.uk
> >>         Regression: No
> >>
> >> Created attachment 203091
> >>   --> https://bugzilla.kernel.org/attachment.cgi?id=203091&action=edit
> >> Dmesg showing PCIe device removals
> >>
> >> I was having issues with suspend, when the machine was being resumed iommu
> >> started removing devices - including my PCIe NVMe drive which contained my root
> >> partition
> >>
> >> The problem showed up with:
> >>
> >> [*] PCI support
> >> [*]   Support mmconfig PCI config space access
> >> [*]   PCI Express Port Bus support
> >> [*]     PCI Express Hotplug driver
> >> [*]     Root Port Advanced Error Reporting support
> >> [*]       PCI Express ECRC settings control
> >> < >       PCIe AER error injector support
> >> -*-     PCI Express ASPM control
> >> [ ]       Debug PCI Express ASPM
> >>           Default ASPM policy (BIOS default)  --->
> >> [*]   Message Signaled Interrupts (MSI and MSI-X)
> >> [ ]   PCI Debugging
> >> [*]   Enable PCI resource re-allocation detection
> >> < >   PCI Stub driver
> >> [*]   Interrupts on hypertransport devices
> >> [ ] PCI IOV support
> >> [*] PCI PRI support
> >> -*- PCI PASID support
> >>     PCI host controller drivers  ----
> >> < > PCCard (PCMCIA/CardBus) support  ----
> >> [*] Support for PCI Hotplug  --->
> >> < > RapidIO support
> >>
> >>
> >> This is what I have now:
> >>
> >> [*] PCI support
> >> [*]   Support mmconfig PCI config space access
> >> [*]   PCI Express Port Bus support
> >> [ ]     Root Port Advanced Error Reporting support
> >> -*-     PCI Express ASPM control
> >> [ ]       Debug PCI Express ASPM
> >>           Default ASPM policy (BIOS default)  --->
> >> [*]   Message Signaled Interrupts (MSI and MSI-X)
> >> [*]   PCI Debugging
> >> [ ]   Enable PCI resource re-allocation detection
> >> < >   PCI Stub driver
> >> [*]   Interrupts on hypertransport devices
> >> [ ] PCI IOV support
> >> [ ] PCI PRI support
> >> [ ] PCI PASID support
> >>     PCI host controller drivers  ----
> >> < > PCCard (PCMCIA/CardBus) support  ----
> >> [ ] Support for PCI Hotplug  ----
> >> < > RapidIO support
> >>
> >> I tried disabling the iommu driver first but it had no effect
> >>
> >> If people are interested I could play with the above options to see which one
> >> causes the issue
> >
> > My guess is that PCI hotplug is the important one.  It would be nice
> > if dmesg contained enough information to connect nvme0n1 to a PCI
> > device.  It'd be even nicer if the PCI core noted device removals or
> > whatever happened here.
> >
> > You don't get any more details if you boot with "ignore_loglevel", do you?
> >
> > Mike, you didn't mark this as a regression, so I assume it's always
> > been this way, and we just haven't noticed it because most people
> > enable PCI hotplug (or whatever the relevant config option is).
> 
> I've just tested this again, I enabled PCI Hotplug & PCIe Hotplug and
> nothing - then I noticed I hadn't enabled the ACPI Hotplug driver -
> once I did the issue re-appeared
> 
> I then had to use testdisk to restore my partition table :'(
> 
> I've attached the updated dmesg & my .config

Correct me if I'm wrong:

  - With CONFIG_HOTPLUG_PCI_ACPI not set, suspend/resume works fine
  - With CONFIG_HOTPLUG_PCI_ACPI=y, resume fails as shown in your dmesg log
    (https://bugzilla.kernel.org/attachment.cgi?id=203621)
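
To tell the two cases above apart on a given build, one option is to look the
option up in the kernel .config. The following is only a minimal sketch: the
helper name config_state and the sample text are hypothetical, and it assumes
the usual .config layout where an enabled option appears as "CONFIG_FOO=y" and
a disabled one as "# CONFIG_FOO is not set".

```python
import re


def config_state(config_text, option):
    """Return the value assigned to `option` (e.g. 'y' or 'm'),
    or None when the option is absent or marked "is not set"."""
    # Anchor at line start so the commented-out "# CONFIG_FOO is not set"
    # form and longer option names sharing a prefix do not match.
    pattern = re.compile(r"^" + re.escape(option) + r"=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Hypothetical sample, not taken from the attached .config.
sample = """\
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_ACPI is not set
CONFIG_HOTPLUG_PCI_PCIE=y
"""

for opt in ("CONFIG_HOTPLUG_PCI", "CONFIG_HOTPLUG_PCI_ACPI"):
    print(opt, config_state(sample, opt))
```

In practice the text would come from the .config used for the build under
test rather than from an inline string.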

WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <helgaas@kernel.org>
To: Mike Lothian <mike@fireburn.co.uk>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Keith Busch <keith.busch@intel.com>, Jens Axboe <axboe@fb.com>,
	linux-nvme <linux-nvme@lists.infradead.org>,
	Rafael Wysocki <rjw@rjwysocki.net>,
	Linux PM list <linux-pm@vger.kernel.org>
Subject: Re: [Bug 112121] New: Some PCIe options cause devices to be removed after suspend
Date: Mon, 21 Mar 2016 11:36:37 -0500
Message-ID: <20160321163637.GA12288@localhost>
In-Reply-To: <CAHbf0-GBiMVr4=n3ZT1NSH8Xgdh3TcdDd=Di+pok6Ep6h42ZOQ@mail.gmail.com>


Thread overview: 6+ messages
     [not found] <bug-112121-41252@https.bugzilla.kernel.org/>
2016-02-08 13:51 ` [Bug 112121] New: Some PCIe options cause devices to be removed after syspend Bjorn Helgaas
2016-02-08 13:51   ` Bjorn Helgaas
2016-02-13 23:39   ` Mike Lothian
2016-02-13 23:39     ` Mike Lothian
2016-03-21 16:36     ` Bjorn Helgaas [this message]
2016-03-21 16:36       ` [Bug 112121] New: Some PCIe options cause devices to be removed after suspend Bjorn Helgaas
