From: Lukas Wunner <lukas@wunner.de>
To: Mario Limonciello <superm1@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"open list:PCI SUBSYSTEM" <linux-pci@vger.kernel.org>,
	linux-pm@vger.kernel.org,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Mario Limonciello <mario.limonciello@amd.com>
Subject: Re: [PATCH v3 2/2] PCI: Fix runtime PM usage count underflow on device unplug
Date: Sat, 21 Jun 2025 21:05:15 +0200
Message-ID: <aFcCaw_IZr-JuUYY@wunner.de>
In-Reply-To: <20250620025535.3425049-3-superm1@kernel.org>

On Thu, Jun 19, 2025 at 09:55:35PM -0500, Mario Limonciello wrote:
> When a USB4 dock is unplugged the PCIe bridge it's connected to will
> issue a "Link Down" and a "Card not detected" event. The PCI core
> will treat this as a surprise hotplug event and unconfigure all downstream
> devices.
> 
> pci_stop_bus_device() will call device_release_driver(). As part of device
> release sequence pm_runtime_put_sync() is called for the device which will
> decrement the runtime counter to 0. After this, the device remove callback
> (pci_device_remove()) will be called which again calls pm_runtime_put_sync()
> but as the counter is already 0 will cause an underflow.
> 
> This behavior was introduced in commit 967577b062417 ("PCI/PM: Keep runtime
> PM enabled for unbound PCI devices") to prevent asymmetrical get/put from
> probe/remove, but this misses out on the point that when releasing a driver
> the usage count is decremented from the device core.
> 
> Drop the extra call from pci_device_remove().
> 
> Fixes: 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")

This doesn't look right.  The refcount underflow issue seems new;
we surely haven't been doing the wrong thing since 2012.


> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -478,9 +478,6 @@ static void pci_device_remove(struct device *dev)
>  	pci_dev->driver = NULL;
>  	pci_iov_remove(pci_dev);
>  
> -	/* Undo the runtime PM settings in local_pci_probe() */
> -	pm_runtime_put_sync(dev);
> -

local_pci_probe() increases the refcount to keep the device in D0.
If the driver wants to use runtime suspend, it needs to decrement
the refcount on ->probe() and re-increment on ->remove().

In the dmesg output attached to...

https://bugzilla.kernel.org/show_bug.cgi?id=220216

... the device exhibiting the refcount underflow is a PCIe port.
Are you also seeing this on a PCIe port or is it a different device?

So the refcount decrement happens in pcie_portdrv_probe() and
the refcount increment happens in pcie_portdrv_remove().
Both times it's conditional on pci_bridge_d3_possible().
Does that return a different value on probe versus remove?
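
To illustrate why that question matters, here is a minimal userspace
sketch of the get/put pairing described above. It is not kernel code:
struct model_dev, model_get(), model_put() and model_bind_unbind() are
hypothetical names, and the real port driver does more than a single
put and get. The model only counts references and records an underflow
when a put arrives at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a device's runtime PM usage count. */
struct model_dev {
	int usage_count;
	int underflows;
};

static void model_get(struct model_dev *d)
{
	d->usage_count++;
}

/* Refuse to go below zero and record the attempt instead. */
static void model_put(struct model_dev *d)
{
	assert(d->usage_count >= 0);
	if (d->usage_count == 0) {
		d->underflows++;
		fprintf(stderr, "model: usage count underflow\n");
		return;
	}
	d->usage_count--;
}

/*
 * One bind/unbind cycle. The two flags model the result of the
 * d3-possible check at probe time and at remove time.
 */
static struct model_dev model_bind_unbind(bool d3_on_probe, bool d3_on_remove)
{
	struct model_dev d = { 0, 0 };

	model_get(&d);		/* bus probe path: keep the device in D0 */
	if (d3_on_probe)
		model_put(&d);	/* driver ->probe(): allow runtime suspend */

	if (d3_on_remove)
		model_get(&d);	/* driver ->remove(): take the reference back */
	model_put(&d);		/* bus remove path: undo the probe-time get */

	return d;
}
```

In this model the count stays balanced whenever both flags agree. If
the check is true on probe but false on remove, the final put hits
zero and an underflow is recorded; the opposite mismatch leaks a
reference instead.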

Do any of the port service drivers decrement the refcount
once too often?  I've just looked through pciehp but cannot
find anything out of the ordinary.

Looking through recent changes, 002bf2fbc00e and bca84a7b93fd
look like potential candidates causing a regression, but the
former is for AER (which isn't used in the dmesg attached to
the bugzilla) and the latter touches suspend on system sleep,
not runtime suspend.

Can you maybe instrument the pm_runtime_{get,put}*() functions
with a printk() and/or dump_stack() to see where a gratuitous
refcount decrement occurs?
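
As a rough illustration of that kind of instrumentation, the userspace
sketch below wraps every count change in a helper that logs the call
site and flags a put that would take the count below zero. All names
here (struct traced_dev, traced_adjust(), traced_get(), traced_put())
are hypothetical; in the kernel the logging would be a printk() and/or
dump_stack() placed in the real helpers rather than fprintf().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical traced counter standing in for a device's usage count. */
struct traced_dev {
	const char *name;
	int usage_count;
	int underflows;
};

/* Log every change with its call site; reject changes that go negative. */
static int traced_adjust(struct traced_dev *d, int delta,
			 const char *caller, int line)
{
	int before = d->usage_count;
	int after = before + delta;

	fprintf(stderr, "%s: %+d usage_count %d -> %d (%s:%d)\n",
		d->name, delta, before, after, caller, line);

	if (after < 0) {
		d->underflows++;
		fprintf(stderr, "%s: gratuitous put from %s:%d\n",
			d->name, caller, line);
		return -1;
	}

	d->usage_count = after;
	return 0;
}

/* The macros capture the caller's function name and line number. */
#define traced_get(d) traced_adjust((d), +1, __func__, __LINE__)
#define traced_put(d) traced_adjust((d), -1, __func__, __LINE__)
```

Reading the log from the last balanced state forward shows which call
site issued the extra decrement.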

Alternatively, is there a known-good kernel version which does
not exhibit the issue and which could serve as an anchor for
git bisect?

Thanks,

Lukas

