Re: PCI: hotplug_event: PCIe PLDA Device BAR Reset

From: Bjorn Helgaas @ 2025-02-19 17:06 UTC
  To: Naveen Kumar P; +Cc: linux-pci, linux-acpi, linux-kernel, kernelnewbies

[+cc linux-acpi]

On Wed, Feb 19, 2025 at 05:52:47PM +0530, Naveen Kumar P wrote:
> Hi all,
> 
> I am writing to seek assistance with an issue we are experiencing with
> a PCIe device (PLDA Device 5555) connected through PCI Express Root
> Port 1 to the host bridge.
> 
> We have observed that after booting the system, the Base Address
> Register (BAR0) memory of this device gets reset to 0x0 after
> approximately one hour or more (the timing is inconsistent). This was
> verified using the lspci output and the setpci -s 01:00.0
> BASE_ADDRESS_0 command.
> 
> To diagnose the issue, we checked the dmesg log, but it did not
> provide any relevant information. I then enabled dynamic debugging for
> the PCI subsystem (drivers/pci/*) and noticed the following messages
> related to ACPI hotplug in the dmesg log:
> 
> [    0.465144] pci 0000:01:00.0: reg 0x10: [mem 0xb0400000-0xb07fffff]
> ...
> [ 6710.000355] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [ 7916.250868] perf: interrupt took too long (4072 > 3601), lowering kernel.perf_event_max_sample_rate to 49000
> [ 7984.719647] perf: interrupt took too long (5378 > 5090), lowering kernel.perf_event_max_sample_rate to 37000
> [11051.409115] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [11755.388727] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [12223.885715] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [14303.465636] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> 
> After these messages appear, reading the device BAR memory results in
> 0x0 instead of the expected value.
> 
> I would like to understand the following:
> 
> 1. What could be causing these hotplug_event debug messages?

This is an ACPI Notify event.  Basically the platform is telling us to
re-enumerate the hierarchy below RP01 because a device might have been
added or removed.

Unfortunately the only real information we get is the ACPI device
(RP01) and the notification value (ACPI_NOTIFY_BUS_CHECK).
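
Since the device path and the timestamp are all each message carries, it
may help to pull them out of a saved dmesg log and compare them with the
time the BAR is first seen as zero.  A minimal sketch, assuming the
message format is exactly what is quoted above; the names BUS_CHECK and
bus_checks() are made up for illustration and are not part of any kernel
tool:

```python
import re

# Matches lines such as:
# [ 6710.000355] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
# The format is assumed from the log excerpt in this thread.
BUS_CHECK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ACPI:\s+(?P<dev>\S+):\s+acpiphp_glue:\s+"
    r"Bus check in hotplug_event\(\)"
)


def bus_checks(lines):
    """Return a (timestamp_seconds, acpi_path) tuple for each Bus check line."""
    events = []
    for line in lines:
        m = BUS_CHECK.search(line)  # search() tolerates a leading "> " quote
        if m:
            events.append((float(m.group("ts")), m.group("dev")))
    return events
```

Feeding it the output of "dmesg" split into lines gives one tuple per
Bus check, which makes it easy to see which root port is notified and
how irregular the intervals are.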

You could instrument acpiphp_check_bridge() to see what path we take.
The main paths look like enable_slot() or disable_slot(), but those
both include a pr_debug() that you apparently don't see.

A remove followed by add would definitely reset the device, including
its BARs.  But you would normally see some messages related to
enumerating a new device.
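
To pin down when the BAR actually changes relative to the Bus check
messages, something like the following could poll BAR0 from userspace.
This is only a sketch: it assumes the device is 0000:01:00.0 as in the
log above, reads the standard sysfs "config" file, and decodes the
32-bit register at offset 0x10.  The helper names are hypothetical, and
time.monotonic() is only roughly comparable to dmesg timestamps.

```python
import struct
import time

BAR0_OFFSET = 0x10  # BAR0 sits at offset 0x10 in the PCI config header


def decode_bar0(config: bytes) -> int:
    """Return the raw 32-bit little-endian BAR0 value from config space bytes."""
    (value,) = struct.unpack_from("<I", config, BAR0_OFFSET)
    return value


def read_bar0(bdf: str) -> int:
    """Read BAR0 of the device at domain:bus:dev.fn, e.g. "0000:01:00.0"."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return decode_bar0(f.read(64))  # first 64 bytes hold the header


def watch(bdf: str, interval: float = 1.0) -> None:
    """Print a timestamped line every time the BAR0 value changes."""
    last = None
    while True:
        value = read_bar0(bdf)
        if value != last:
            print(f"[{time.monotonic():12.6f}] {bdf} BAR0 = {value:#010x}")
            last = value
        time.sleep(interval)
```

Calling watch("0000:01:00.0") prints the initial value and then a new
line whenever it changes.  If the device is actually removed, the sysfs
file disappears and open() raises FileNotFoundError, which is itself a
useful data point.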

If this doesn't help, try to reproduce the problem with a recent
kernel, e.g., v6.13, and post the complete dmesg log.

> 2. Why does this result in the BAR memory being reset?
> 3. How can we resolve this issue?
> 
> I have verified that the issue occurs even without loading the driver
> for the PLDA Device 5555, so it does not appear to be related to the
> device driver.
> 
> Any help or guidance on debugging this issue would be greatly appreciated.
> 
> Thank you for your assistance.
> 
> Best regards,
> Naveen
