public inbox for linux-nvme@lists.infradead.org
* Re: [PATCH] x86/PCI: Revert: "Clip only host bridge windows for E820 regions"
       [not found] <6eae37ce-dd44-9c32-3f68-2b4e102dce8e@igalia.com>
@ 2022-06-14 23:01 ` Bjorn Helgaas
  2022-06-14 23:47   ` Keith Busch
  0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2022-06-14 23:01 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Hans de Goede, Rafael J . Wysocki, Mika Westerberg,
	Krzysztof Wilczyński, Bjorn Helgaas, Myron Stowe,
	Juha-Pekka Heikkila, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Benoit Grégoire, Hui Wang,
	linux-acpi, linux-pci, x86, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

[+cc NVMe folks]

On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote:
> On 14/06/2022 12:47, Hans de Goede wrote:
> > [...]
> > 
> > Have you looked at the log of the failed boot in the Steam Deck kernel
> > bugzilla? Everything there seems to work just fine and then the system
> > just hangs. I think that maybe it cannot find its root disk, so maybe
> > an NVME issue ?
> 
> *Exactly* that - NVMe device is the root disk, it cannot boot since the
> device doesn't work, hence no rootfs =)

Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com

Steam Deck broke because we erroneously trimmed out the PCI host
bridge window where BIOS had placed most devices, successfully
reassigned all the PCI bridge windows and BARs, but some devices,
apparently including NVMe, didn't work at the new addresses.

Do you NVMe folks know of gotchas in this area?  I want to know
because we'd like to be able to move devices around someday to make
room for hot-added devices.

This reassignment happened before drivers claimed the devices, so from
a PCI point of view, I don't know why the NVMe device wouldn't work at
the new address.

Bjorn



* Re: [PATCH] x86/PCI: Revert: "Clip only host bridge windows for E820 regions"
  2022-06-14 23:01 ` [PATCH] x86/PCI: Revert: "Clip only host bridge windows for E820 regions" Bjorn Helgaas
@ 2022-06-14 23:47   ` Keith Busch
  2022-06-15 15:11     ` Bjorn Helgaas
  0 siblings, 1 reply; 4+ messages in thread
From: Keith Busch @ 2022-06-14 23:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Guilherme G. Piccoli, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, Krzysztof Wilczyński, Bjorn Helgaas,
	Myron Stowe, Juha-Pekka Heikkila, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Benoit Grégoire, Hui Wang,
	linux-acpi, linux-pci, x86, linux-kernel, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote:
> [+cc NVMe folks]
> 
> On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote:
> > On 14/06/2022 12:47, Hans de Goede wrote:
> > > [...]
> > > 
> > > Have you looked at the log of the failed boot in the Steam Deck kernel
> > > bugzilla? Everything there seems to work just fine and then the system
> > > just hangs. I think that maybe it cannot find its root disk, so maybe
> > > an NVME issue ?
> > 
> > *Exactly* that - NVMe device is the root disk, it cannot boot since the
> > device doesn't work, hence no rootfs =)
> 
> Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
> 
> Steam Deck broke because we erroneously trimmed out the PCI host
> bridge window where BIOS had placed most devices, successfully
> reassigned all the PCI bridge windows and BARs, but some devices,
> apparently including NVMe, didn't work at the new addresses.
> 
> Do you NVMe folks know of gotchas in this area?  I want to know
> because we'd like to be able to move devices around someday to make
> room for hot-added devices.
> 
> This reassignment happened before drivers claimed the devices, so from
> a PCI point of view, I don't know why the NVMe device wouldn't work at
> the new address.

The probe status quickly returns ENODEV. Based on the output (we don't log
much, so this is just an educated guess), I think that means the driver read
all F's from the status register, which indicates we can't read it when using
the reassigned memory window.
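
To illustrate the kind of check being described, here is a minimal user-space
sketch, not the driver's actual code. It assumes the BAR can be modeled as a
plain array of 32-bit registers; mmio_read32() and
check_controller_reachable() are hypothetical names made up for this example.
The only NVMe detail relied on is that the Controller Status register (CSTS)
sits at byte offset 0x1c. A read from a device that does not respond
typically comes back as all ones, which is what the check looks for.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* CSTS (Controller Status) byte offset in the NVMe register map. */
#define NVME_REG_CSTS 0x1c

/*
 * Hypothetical stand-in for a 32-bit MMIO read through a mapped BAR.
 * Here "bar" is ordinary memory so the sketch can run in user space.
 */
static uint32_t mmio_read32(const volatile uint32_t *bar, size_t offset)
{
	assert(bar != NULL && offset % sizeof(uint32_t) == 0);
	return bar[offset / sizeof(uint32_t)];
}

/*
 * Returns 0 if the status register holds something other than all F's,
 * -ENODEV if it reads 0xffffffff (assumed to mean "device unreachable").
 */
static int check_controller_reachable(const volatile uint32_t *bar)
{
	if (mmio_read32(bar, NVME_REG_CSTS) == UINT32_MAX)
		return -ENODEV;
	return 0;
}
```

With a zero-filled register array the check passes; setting the CSTS slot to
0xffffffff makes it fail with -ENODEV, which matches the quick probe failure
seen in the log.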

Why changing memory windows may not work tends to be platform or device
specific. Considering the renumbered windows didn't cause a problem for other
devices, it sounds like this nvme device may be broken.



* Re: [PATCH] x86/PCI: Revert: "Clip only host bridge windows for E820 regions"
  2022-06-14 23:47   ` Keith Busch
@ 2022-06-15 15:11     ` Bjorn Helgaas
  2022-06-17 20:27       ` Keith Busch
  0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2022-06-15 15:11 UTC (permalink / raw)
  To: Keith Busch
  Cc: Guilherme G. Piccoli, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, Krzysztof Wilczyński, Bjorn Helgaas,
	Myron Stowe, Juha-Pekka Heikkila, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Benoit Grégoire, Hui Wang,
	linux-acpi, linux-pci, x86, linux-kernel, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On Tue, Jun 14, 2022 at 04:47:35PM -0700, Keith Busch wrote:
> On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote:
> > [+cc NVMe folks]
> > 
> > On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote:
> > > On 14/06/2022 12:47, Hans de Goede wrote:
> > > > [...]
> > > > 
> > > > Have you looked at the log of the failed boot in the Steam Deck kernel
> > > > bugzilla? Everything there seems to work just fine and then the system
> > > > just hangs. I think that maybe it cannot find its root disk, so maybe
> > > > an NVME issue ?
> > > 
> > > *Exactly* that - NVMe device is the root disk, it cannot boot since the
> > > device doesn't work, hence no rootfs =)
> > 
> > Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
> > 
> > Steam Deck broke because we erroneously trimmed out the PCI host
> > bridge window where BIOS had placed most devices, successfully
> > reassigned all the PCI bridge windows and BARs, but some devices,
> > apparently including NVMe, didn't work at the new addresses.
> > 
> > Do you NVMe folks know of gotchas in this area?  I want to know
> > because we'd like to be able to move devices around someday to
> > make room for hot-added devices.
> > 
> > This reassignment happened before drivers claimed the devices, so
> > from a PCI point of view, I don't know why the NVMe device
> > wouldn't work at the new address.
> 
> The probe status quickly returns ENODEV. Based on the output (we
> > don't log much, so this is just an educated guess), I think that
> means the driver read all F's from the status register, which
> indicates we can't read it when using the reassigned memory window.
> 
> Why changing memory windows may not work tends to be platform or
> device specific. Considering the renumbered windows didn't cause a
> problem for other devices, it sounds like this nvme device may be
> broken.

It sounds like you've seen this sort of problem before, so we
shouldn't assume that it's safe to reassign BARs.

I think Windows supports rebalancing, but it does look like drivers
have the ability to veto it:

  https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/stopping-a-device-to-rebalance-resources
  https://docs.microsoft.com/en-us/windows-hardware/drivers/wdf/the-pnp-manager-redistributes-system-resources

So I suppose if/when we support rebalancing, it'll have to be an
opt-in thing for each driver.



* Re: [PATCH] x86/PCI: Revert: "Clip only host bridge windows for E820 regions"
  2022-06-15 15:11     ` Bjorn Helgaas
@ 2022-06-17 20:27       ` Keith Busch
  0 siblings, 0 replies; 4+ messages in thread
From: Keith Busch @ 2022-06-17 20:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Guilherme G. Piccoli, Hans de Goede, Rafael J . Wysocki,
	Mika Westerberg, Krzysztof Wilczyński, Bjorn Helgaas,
	Myron Stowe, Juha-Pekka Heikkila, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Benoit Grégoire, Hui Wang,
	linux-acpi, linux-pci, x86, linux-kernel, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme

On Wed, Jun 15, 2022 at 10:11:00AM -0500, Bjorn Helgaas wrote:
> On Tue, Jun 14, 2022 at 04:47:35PM -0700, Keith Busch wrote:
> > On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote:
> > > [+cc NVMe folks]
> > > 
> > > On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote:
> > > > On 14/06/2022 12:47, Hans de Goede wrote:
> > > > > [...]
> > > > > 
> > > > > Have you looked at the log of the failed boot in the Steam Deck kernel
> > > > > bugzilla? Everything there seems to work just fine and then the system
> > > > > just hangs. I think that maybe it cannot find its root disk, so maybe
> > > > > an NVME issue ?
> > > > 
> > > > *Exactly* that - NVMe device is the root disk, it cannot boot since the
> > > > device doesn't work, hence no rootfs =)
> > > 
> > > Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
> > > 
> > > Steam Deck broke because we erroneously trimmed out the PCI host
> > > bridge window where BIOS had placed most devices, successfully
> > > reassigned all the PCI bridge windows and BARs, but some devices,
> > > apparently including NVMe, didn't work at the new addresses.
> > > 
> > > Do you NVMe folks know of gotchas in this area?  I want to know
> > > because we'd like to be able to move devices around someday to
> > > make room for hot-added devices.
> > > 
> > > This reassignment happened before drivers claimed the devices, so
> > > from a PCI point of view, I don't know why the NVMe device
> > > wouldn't work at the new address.
> > 
> > The probe status quickly returns ENODEV. Based on the output (we
> > > don't log much, so this is just an educated guess), I think that
> > means the driver read all F's from the status register, which
> > indicates we can't read it when using the reassigned memory window.
> > 
> > Why changing memory windows may not work tends to be platform or
> > device specific. Considering the renumbered windows didn't cause a
> > problem for other devices, it sounds like this nvme device may be
> > broken.
> 
> It sounds like you've seen this sort of problem before, so we
> shouldn't assume that it's safe to reassign BARs.

I haven't seen this type of problem in years, but as I recall, it was always
low-end consumer crap that couldn't deal with changing BARs; you're stuck with
whatever was set after it was initially powered on. The PCI topology will
reflect the expected renumbering, but whatever is happening on the other side
of the PCI function seems to be unaware of the change.


