From: Jonathan Derrick <jonathan.derrick@linux.dev>
To: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>,
Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org,
Blazej Kucman <blazej.kucman@intel.com>,
Hans de Goede <hdegoede@redhat.com>,
Lukas Wunner <lukas@wunner.de>,
Naveen Naidu <naveennaidu479@gmail.com>,
Keith Busch <kbusch@kernel.org>,
Nirmal Patel <nirmal.patel@linux.intel.com>
Subject: Re: [Bug 215525] New: HotPlug does not work on upstream kernel 5.17.0-rc1
Date: Thu, 27 Jan 2022 13:47:08 -0700
Message-ID: <154fcaf2-18cd-9ea9-eee2-bc8b8ee3468d@linux.dev>
In-Reply-To: <20220127154615.00003df8@linux.intel.com>

On 1/27/2022 7:46 AM, Mariusz Tkaczyk wrote:
> On Mon, 24 Jan 2022 15:46:35 -0600
> Bjorn Helgaas <helgaas@kernel.org> wrote:
>
>> [+cc linux-pci, Hans, Lukas, Naveen, Keith, Nirmal, Jonathan]
>>
>> On Mon, Jan 24, 2022 at 11:46:14AM +0000,
>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=215525
>>>
>>> Bug ID: 215525
>>> Summary: HotPlug does not work on upstream kernel 5.17.0-rc1
>>> Product: Drivers
>>> Version: 2.5
>>> Kernel Version: 5.17.0-rc1 upstream
>>> Hardware: x86-64
>>> OS: Linux
>>> Tree: Mainline
>>> Status: NEW
>>> Severity: normal
>>> Priority: P1
>>> Component: PCI
>>> Assignee: drivers_pci@kernel-bugs.osdl.org
>>> Reporter: blazej.kucman@intel.com
>>> Regression: No
>>>
>>> Created attachment 300308
>>> -->
>>> https://bugzilla.kernel.org/attachment.cgi?id=300308&action=edit
>>> dmesg
>>>
>>> While testing on latest upstream
>>> kernel(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/)
>>> we noticed that with the merge commit
>>> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0a231f01e5b25bacd23e6edc7c979a18a517b2b)
>>> hotplug and hotunplug of nvme drives stopped working.
>>>
>>> Rescan PCI does not help.
>>> echo "1" > /sys/bus/pci/rescan
>>>
>>> Issue does not reproduce on a kernel built on an antecedent
>>> commit(88db8458086b1dcf20b56682504bdb34d2bca0e2).
>>>
>>>
>>> During hot-remove device does not disappear, however when we try to
>>> do I/O on the disk then there is an I/O error, and the device
>>> disappears.
>>>
>>> Before I/O no logs regarding the disk appeared in the dmesg, only
>>> after I/O the entries appeared like below:
>>> [ 177.943703] nvme nvme5: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
>>> [ 177.971661] nvme 10000:0b:00.0: can't change power state from D3cold to D0 (config space inaccessible)
>>> [ 177.981121] pcieport 10000:00:02.0: can't derive routing for PCI INT A
>>> [ 177.987749] nvme 10000:0b:00.0: PCI INT A: no GSI
>>> [ 177.992633] nvme nvme5: Removing after probe failure status: -19
>>> [ 178.004633] nvme5n1: detected capacity change from 83984375 to 0
>>> [ 178.004677] I/O error, dev nvme5n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>>
>>>
>>> OS: RHEL 8.4 GA
>>> Platform: Intel Purley
>>>
>>> The logs were collected on an older upstream kernel, but the issue
>>> also occurs on the newest upstream
>>> kernel (dd81e1c7d5fb126e5fbc5c9e334d7b3ec29a16a0)
>>
>> Apparently worked immediately before merging the PCI changes for
>> v5.17 and failed immediately after:
>>
>> good: 88db8458086b ("Merge tag 'exfat-for-5.17-rc1' of
>>       git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat")
>> bad:  d0a231f01e5b ("Merge tag 'pci-v5.17-changes' of
>>       git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci")
>>
>> Only three commits touch pciehp:
>>
>> 085a9f43433f ("PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors")
>> 23584c1ed3e1 ("PCI: pciehp: Fix infinite loop in IRQ handler upon power fault")
>> a3b0f10db148 ("PCI: pciehp: Use PCI_POSSIBLE_ERROR() to check config reads")
>>
>> None seems obviously related to me. Blazej, could you try setting
>> CONFIG_DYNAMIC_DEBUG=y and booting with 'dyndbg="file pciehp* +p"' to
>> enable more debug messages?
>>
>
> Hi Bjorn,
>
> Thanks for your suggestions. Blazej did some tests and the results were
> inconclusive. He tested on two identical platforms. On the first one
> hotplug didn't work, even after he reverted all of the suggested
> patches. On the second one hotplug always worked.
>
> He noticed that the first platform, where the issue was initially found,
> was booted with the parameter "pci=nommconf". After adding this parameter
> on the second platform, hotplug stopped working there too.
>
> Tested on tag pci-v5.17-changes. He has CONFIG_HOTPLUG_PCI_PCIE
> and CONFIG_DYNAMIC_DEBUG enabled in the config. He also attached two
> dmesg logs to bugzilla, booted with 'dyndbg="file pciehp* +p"' as
> requested: one with "pci=nommconf" and one without.
>
> The issue seems to be related to "pci=nommconf" and is probably caused
> by a change outside pciehp.

Could it be related to this?

	int raw_pci_read(unsigned int domain, unsigned int bus,
			 unsigned int devfn, int reg, int len, u32 *val)
	{
		if (domain == 0 && reg < 256 && raw_pci_ops)
			return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
		if (raw_pci_ext_ops)
			return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
		return -EINVAL;
	}

It looks like raw_pci_ext_ops won't be set with pci=nommconf, and the VMD
subdevice domain will be > 0. If so, neither branch is taken for those
devices and raw_pci_read() returns -EINVAL.

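To make the concern concrete, here is a minimal userspace sketch of that
dispatch logic. It is not kernel code and proves nothing about the real
cause: struct pci_raw_ops is reduced to a single read hook,
fake_conf1_read() and model_raw_pci_read() are hypothetical names, the
value it returns is made up, and leaving raw_pci_ext_ops NULL is only my
assumption about what pci=nommconf does. 0x10000 is the VMD domain from
the dmesg quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct pci_raw_ops (read hook only). */
struct pci_raw_ops {
	int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
		    int reg, int len, uint32_t *val);
};

/* Hypothetical basic accessor: always succeeds with a made-up value. */
static int fake_conf1_read(unsigned int domain, unsigned int bus,
			   unsigned int devfn, int reg, int len, uint32_t *val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len;
	*val = 0x12345678;
	return 0;
}

static const struct pci_raw_ops fake_conf1 = { .read = fake_conf1_read };

/* Basic accessor is present; extended accessor is left NULL on purpose
 * to model the assumed pci=nommconf situation. */
static const struct pci_raw_ops *raw_pci_ops = &fake_conf1;
static const struct pci_raw_ops *raw_pci_ext_ops = NULL;

/* Same branch structure as the raw_pci_read() quoted above. */
int model_raw_pci_read(unsigned int domain, unsigned int bus,
		       unsigned int devfn, int reg, int len, uint32_t *val)
{
	if (domain == 0 && reg < 256 && raw_pci_ops)
		return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
	if (raw_pci_ext_ops)
		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
	return -EINVAL;
}
```

In this model a read in domain 0 below offset 256 succeeds through the
basic accessor, while any read in domain 0x10000 (or at offset >= 256 in
domain 0) falls through both branches and returns -EINVAL.
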
>
> He is currently working on email client setup to answer himself.
>
> Thanks,
> Mariusz
>
>