All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mario Limonciello <superm1@kernel.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"open list:PCI SUBSYSTEM" <linux-pci@vger.kernel.org>,
	linux-pm@vger.kernel.org,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Mario Limonciello <mario.limonciello@amd.com>
Subject: Re: [PATCH v3 2/2] PCI: Fix runtime PM usage count underflow on device unplug
Date: Sun, 22 Jun 2025 13:39:26 -0500	[thread overview]
Message-ID: <295bf182-7fed-4ffd-93a4-fb5ddf5f1bb4@kernel.org> (raw)
In-Reply-To: <aFeJ83O9PRUrM2Ir@wunner.de>

On 6/21/2025 11:43 PM, Lukas Wunner wrote:
> On Sat, Jun 21, 2025 at 02:56:08PM -0500, Mario Limonciello wrote:
>> On 6/21/25 2:05 PM, Lukas Wunner wrote:
>>> In the dmesg output attached to...
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=220216
>>>
>>> ... the device exhibiting the refcount underflow is a PCIe port.
>>> Are you also seeing this on a PCIe port or is it a different device?
>>
>> The device with the underflow is the disconnected PCIe bridge.
>>
>> In my case it was this bridge that was downstream.
> 
> Okay, in both cases the refcount underflow occurs on a PCIe port.
> So it seems likely the gratuitous refcount decrement is in portdrv.c
> or one of the port services drivers.
> 
> Your patch changes the code path for *all* PCI devices.
> Not just PCIe ports.  Hence it's likely not the right fix.
> 
> It may fix the issue on this particular PCIe port but
> I strongly suspect it'll leak a runtime PM ref on all other devices.
> 

Thanks, I see your point.

> 
>>> So the refcount decrement happens in pcie_portdrv_probe() and
>>> the refcount increment happens in pcie_portdrv_remove().
>>> Both times it's conditional on pci_bridge_d3_possible().
>>> Does that return a different value on probe versus remove?
> 
> Could you please answer this?

I did this check and yes specifically on this PCIe port with the 
underflow the d3 possible lookup returns false during 
pcie_portdrv_remove().  It returns true during pcie_portdrv_probe().

> 
> 
>>> Does any of the port service drivers decrement the refcount
>>> once too often?  I've just looked through pciehp but cannot
>>> find anything out of the ordinary.
>>>
>>> Looking through recent changes, 002bf2fbc00e and bca84a7b93fd
>>> look like potential candidates causing a regression, but the
>>> former is for AER (which isn't used in the dmesg attached to
>>> the bugzilla) and the latter touches suspend on system sleep,
>>> not runtime suspend.
>>>
>>> Can you maybe instrument the pm_runtime_{get,put}*() functions
>>> with a printk() and/or dump_stack() to see where a gratuitous
>>> refcount decrement occurs?
>>
>> That's exactly what I did to conclude this call was an extra one.
>>
>> Here's the drop to 0:
> 
> The drop to 0 is uninteresting.  You need to record *all*
> refcount increments/decrements so that we can see where the
> gratuitous one occurs.  It happens earlier than the drop to 0.
> 
> However, please first check whether the pci_bridge_d3_possible()
> return value changes on probe versus remove of the PCIe port
> in question.  If it does, then that's the root cause and there's
> no need to look any further.
> 

That was a great hypothesis that's spot on.

Just for posterity this was all the increment/decrement calls that I saw 
happen.

pci 0000:02:04.0: inc usage cnt from 0, caller: pci_pm_init+0x84/0x2d0
pci 0000:02:04.0: inc usage cnt from 1, caller: 
pci_scan_bridge_extend+0x6d/0x710
pci 0000:02:04.0: dec usage cnt from 2, caller: 
pci_scan_bridge_extend+0x19e/0x710
pci 0000:02:04.0: inc usage cnt from 1, caller: 
pci_scan_bridge_extend+0x6d/0x710
pci 0000:02:04.0: dec usage cnt from 2, caller: 
pci_scan_bridge_extend+0x19e/0x710
pcieport 0000:02:04.0: inc usage cnt from 1, caller: 
local_pci_probe+0x2d/0xa0
pcieport 0000:02:04.0: inc usage cnt from 2, caller: 
__device_attach+0x9c/0x1b0
pcieport 0000:02:04.0: inc usage cnt from 3, caller: 
__driver_probe_device+0x5c/0x150
pcieport 0000:02:04.0: dec usage cnt from 4, caller: 
__driver_probe_device+0x9a/0x150
pcieport 0000:02:04.0: dec usage cnt from 3, caller: 
__device_attach+0x145/0x1b0
pcieport 0000:02:04.0: dec usage cnt from 2, caller: 
pcie_portdrv_probe+0x19d/0x6d0
pcieport 0000:02:04.0: dec usage cnt from 1, caller: 
pcie_portdrv_probe+0x1a5/0x6d0
pcieport 0000:02:04.0: inc usage cnt from 0, caller: 
device_release_driver_internal+0xac/0x200
pcieport 0000:02:04.0: dec usage cnt from 1, caller: 
device_release_driver_internal+0x197/0x200
pcieport 0000:02:04.0: inc usage cnt from 0, caller: 
pci_device_remove+0x2d/0xb0
pcieport 0000:02:04.0: dec usage cnt from 0, caller: 
pci_device_remove+0x7e/0xb0
pcieport 0000:02:04.0: Runtime PM usage cnt underflow!

What's your suggestion on what to actually do here then?


  reply	other threads:[~2025-06-22 18:39 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-20  2:55 [PATCH v3 0/2] Don't make noise about disconnected USB4 devices Mario Limonciello
2025-06-20  2:55 ` [PATCH v3 1/2] PCI/PM: Skip resuming to D0 if disconnected Mario Limonciello
2025-06-23 17:48   ` Lukas Wunner
2025-06-20  2:55 ` [PATCH v3 2/2] PCI: Fix runtime PM usage count underflow on device unplug Mario Limonciello
2025-06-21 19:05   ` Lukas Wunner
2025-06-21 19:56     ` Mario Limonciello
2025-06-22  4:43       ` Lukas Wunner
2025-06-22 18:39         ` Mario Limonciello [this message]
2025-06-23  1:47           ` Mario Limonciello
2025-06-23  6:53             ` Lukas Wunner
2025-06-23  6:43           ` Lukas Wunner
2025-06-23  7:37             ` Lukas Wunner
2025-06-23 10:05               ` Lukas Wunner
2025-06-23 10:11                 ` Rafael J. Wysocki
2025-06-23 11:37                   ` Mario Limonciello
2025-06-23 12:19                     ` Lukas Wunner
2025-06-23 12:45                       ` Mario Limonciello
2025-06-23 17:23                     ` Lukas Wunner
2025-06-23 17:25                       ` Mario Limonciello

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=295bf182-7fed-4ffd-93a4-fb5ddf5f1bb4@kernel.org \
    --to=superm1@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mario.limonciello@amd.com \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.