* [bugzilla-daemon@kernel.org: [Bug 217251] New: pciehp: nvme not visible after re-insert to tbt port]
@ 2023-03-27 14:33 Bjorn Helgaas
2023-03-27 16:18 ` Keith Busch
0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2023-03-27 14:33 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Lukas Wunner
Cc: Aleksander Trofimowicz, linux-pci, linux-nvme, linux-kernel
Forwarding to NVMe folks, lists for visibility.
----- Forwarded message from bugzilla-daemon@kernel.org -----
https://bugzilla.kernel.org/show_bug.cgi?id=217251
...
Created attachment 304031
--> https://bugzilla.kernel.org/attachment.cgi?id=304031&action=edit
the tracing of nvme_pci_enable() during re-insertion
Hi,
There is a JHL7540-based device that may host an NVMe device. After the first
insertion an NVMe drive is properly discovered and handled by the relevant
modules. Once disconnected, any further insertion attempts are unsuccessful.
The device is visible on the PCI bus, but nvme_pci_enable() ends up calling
pci_disable_device() every time; the runtime PM status of the device is
"suspended", and the power state of the 04:01.0 PCI bridge is D3. Preventing the
device from being power managed ("on" -> /sys/devices/../power/control),
combined with device removal and a PCI rescan, changes nothing. A host reboot
restores the initial state.
I would appreciate any suggestions on how to debug it further.
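[The manual steps described above correspond to sysfs operations roughly like this; a sketch only, hardware-dependent. The bridge/endpoint addresses (0000:04:01.0, 0000:05:00.0) are taken from this report, and the exact BDFs on another system will differ.]

```shell
# BDFs from this report; substitute your own from lspci.
BRIDGE=0000:04:01.0
NVME=0000:05:00.0

# Runtime PM state of the bridge; the report shows "suspended" here
cat /sys/bus/pci/devices/$BRIDGE/power/runtime_status

# Forbid runtime PM ("on" disables autosuspend)...
echo on > /sys/bus/pci/devices/$BRIDGE/power/control

# ...then retry removing the endpoint and rescanning the bus
echo 1 > /sys/bus/pci/devices/$NVME/remove
echo 1 > /sys/bus/pci/rescan
```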
* Re: [bugzilla-daemon@kernel.org: [Bug 217251] New: pciehp: nvme not visible after re-insert to tbt port]
2023-03-27 14:33 [bugzilla-daemon@kernel.org: [Bug 217251] New: pciehp: nvme not visible after re-insert to tbt port] Bjorn Helgaas
@ 2023-03-27 16:18 ` Keith Busch
2023-03-27 17:43 ` Aleksander Trofimowicz
0 siblings, 1 reply; 4+ messages in thread
From: Keith Busch @ 2023-03-27 16:18 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Lukas Wunner,
Aleksander Trofimowicz, linux-pci, linux-nvme, linux-kernel
On Mon, Mar 27, 2023 at 09:33:59AM -0500, Bjorn Helgaas wrote:
> Forwarding to NVMe folks, lists for visibility.
>
> ----- Forwarded message from bugzilla-daemon@kernel.org -----
>
> https://bugzilla.kernel.org/show_bug.cgi?id=217251
> ...
>
> Created attachment 304031
> --> https://bugzilla.kernel.org/attachment.cgi?id=304031&action=edit
> the tracing of nvme_pci_enable() during re-insertion
>
> Hi,
>
> There is a JHL7540-based device that may host an NVMe device. After the first
> insertion an NVMe drive is properly discovered and handled by the relevant
> modules. Once disconnected, any further insertion attempts are unsuccessful.
> The device is visible on the PCI bus, but nvme_pci_enable() ends up calling
> pci_disable_device() every time; the runtime PM status of the device is
> "suspended", and the power state of the 04:01.0 PCI bridge is D3. Preventing the
> device from being power managed ("on" -> /sys/devices/../power/control),
> combined with device removal and a PCI rescan, changes nothing. A host reboot
> restores the initial state.
>
> I would appreciate any suggestions on how to debug it further.
Sounds the same as this report:
http://lists.infradead.org/pipermail/linux-nvme/2023-March/038259.html
The driver is bailing on the device because we can't read its status register
out of the remapped BAR. There's nothing we can do about that from the nvme
driver level. Memory-mapped I/O has to work in order to proceed.
* Re: [bugzilla-daemon@kernel.org: [Bug 217251] New: pciehp: nvme not visible after re-insert to tbt port]
2023-03-27 16:18 ` Keith Busch
@ 2023-03-27 17:43 ` Aleksander Trofimowicz
2023-03-27 18:25 ` Keith Busch
0 siblings, 1 reply; 4+ messages in thread
From: Aleksander Trofimowicz @ 2023-03-27 17:43 UTC (permalink / raw)
To: Keith Busch
Cc: Bjorn Helgaas, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Lukas Wunner, linux-pci, linux-nvme, linux-kernel
Keith Busch <kbusch@kernel.org> writes:
> On Mon, Mar 27, 2023 at 09:33:59AM -0500, Bjorn Helgaas wrote:
>> Forwarding to NVMe folks, lists for visibility.
>>
>> ----- Forwarded message from bugzilla-daemon@kernel.org -----
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=217251
>> ...
>>
>> Created attachment 304031
>> --> https://bugzilla.kernel.org/attachment.cgi?id=304031&action=edit
>> the tracing of nvme_pci_enable() during re-insertion
>>
>> Hi,
>>
>> There is a JHL7540-based device that may host an NVMe device. After the first
>> insertion an NVMe drive is properly discovered and handled by the relevant
>> modules. Once disconnected, any further insertion attempts are unsuccessful.
>> The device is visible on the PCI bus, but nvme_pci_enable() ends up calling
>> pci_disable_device() every time; the runtime PM status of the device is
>> "suspended", and the power state of the 04:01.0 PCI bridge is D3. Preventing the
>> device from being power managed ("on" -> /sys/devices/../power/control),
>> combined with device removal and a PCI rescan, changes nothing. A host reboot
>> restores the initial state.
>>
>> I would appreciate any suggestions on how to debug it further.
>
> Sounds the same as this report:
>
> http://lists.infradead.org/pipermail/linux-nvme/2023-March/038259.html
>
> > The driver is bailing on the device because we can't read its status register
> > out of the remapped BAR. There's nothing we can do about that from the nvme
> > driver level. Memory-mapped I/O has to work in order to proceed.
>
Thanks. I can confirm it is the same problem:
a) the platform is Intel Alder Lake
b) readl(dev->bar + NVME_REG_CSTS) in nvme_pci_enable() fails
c) reading BAR0 via setpci gives 0x00000004
--
at
* Re: [bugzilla-daemon@kernel.org: [Bug 217251] New: pciehp: nvme not visible after re-insert to tbt port]
2023-03-27 17:43 ` Aleksander Trofimowicz
@ 2023-03-27 18:25 ` Keith Busch
0 siblings, 0 replies; 4+ messages in thread
From: Keith Busch @ 2023-03-27 18:25 UTC (permalink / raw)
To: Aleksander Trofimowicz
Cc: Bjorn Helgaas, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Lukas Wunner, linux-pci, linux-nvme, linux-kernel
On Mon, Mar 27, 2023 at 05:43:18PM +0000, Aleksander Trofimowicz wrote:
>
> Keith Busch <kbusch@kernel.org> writes:
>
> > On Mon, Mar 27, 2023 at 09:33:59AM -0500, Bjorn Helgaas wrote:
> >> Forwarding to NVMe folks, lists for visibility.
> >>
> >> ----- Forwarded message from bugzilla-daemon@kernel.org -----
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217251
> >> ...
> >>
> >> Created attachment 304031
> >> --> https://bugzilla.kernel.org/attachment.cgi?id=304031&action=edit
> >> the tracing of nvme_pci_enable() during re-insertion
> >>
> >> Hi,
> >>
> >> There is a JHL7540-based device that may host an NVMe device. After the first
> >> insertion an NVMe drive is properly discovered and handled by the relevant
> >> modules. Once disconnected, any further insertion attempts are unsuccessful.
> >> The device is visible on the PCI bus, but nvme_pci_enable() ends up calling
> >> pci_disable_device() every time; the runtime PM status of the device is
> >> "suspended", and the power state of the 04:01.0 PCI bridge is D3. Preventing the
> >> device from being power managed ("on" -> /sys/devices/../power/control),
> >> combined with device removal and a PCI rescan, changes nothing. A host reboot
> >> restores the initial state.
> >>
> >> I would appreciate any suggestions on how to debug it further.
> >
> > Sounds the same as this report:
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2023-March/038259.html
> >
> > The driver is bailing on the device because we can't read its status register
> > out of the remapped BAR. There's nothing we can do about that from the nvme
> > driver level. Memory-mapped I/O has to work in order to proceed.
> >
> Thanks. I can confirm it is the same problem:
>
> a) the platform is Intel Alder Lake
> b) readl(dev->bar + NVME_REG_CSTS) in nvme_pci_enable() fails
> c) reading BAR0 via setpci gives 0x00000004
It's strange, though. In your example, the kernel says:
0000:05:00.0: BAR 0: assigned [mem 0x54000000-0x54003fff 64bit]
There is a check right after that message that ensures the kernel reads back
what it wrote. Since no failure was reported, the device really did have the
expected BAR value at one point.