From: Matthew Brost <matthew.brost@intel.com>
To: "Piotr Piórkowski" <piotr.piorkowski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>,
<intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs
Date: Mon, 2 Mar 2026 12:36:15 -0800
Message-ID: <aaX0v0pezDvqt9mT@lstrano-desk.jf.intel.com>
In-Reply-To: <20260302094336.tdcyzx6kqkthfp4v@intel.com>
On Mon, Mar 02, 2026 at 10:43:36AM +0100, Piotr Piórkowski wrote:
> Michal Wajdeczko <michal.wajdeczko@intel.com> wrote on Fri [2026-Feb-27 22:40:47 +0100]:
> > Since the recent commit a5338e365c45 ("PCI/IOV: Fix race between SR-IOV
> > enable/disable and hotplug"), the driver's pci_driver.sriov_configure
> > hook is called with the pci_rescan_remove_lock mutex already taken.
> >
> > As we are using this hook as-is during driver removal, we get:
> >
> > [ ] xe 0000:4d:00.0: [drm:xe_pci_sriov_configure [xe]] PF: disabling 1 VF
> > [ ] ------------[ cut here ]------------
> > [ ] debug_locks && !(lock_is_held(&(&pci_rescan_remove_lock)->dep_map) != 0)
> > [ ] WARNING: drivers/pci/remove.c:130 at pci_stop_and_remove_bus_device+0x4c/0x50, CPU#32: rmmod/6476
> > [ ] RIP: 0010:pci_stop_and_remove_bus_device+0x4c/0x50
> > [ ] Call Trace:
> > [ ] <TASK>
> > [ ] pci_iov_remove_virtfn+0xd1/0x140
> > [ ] sriov_disable+0x42/0x100
> > [ ] pci_disable_sriov+0x34/0x50
> > [ ] xe_pci_sriov_configure+0x2d0/0x1150 [xe]
> > [ ] xe_pci_remove+0x7c/0x190 [xe]
> > [ ] pci_device_remove+0x41/0xb0
> > [ ] device_remove+0x43/0x80
> > [ ] device_release_driver_internal+0x215/0x280
> > [ ] driver_detach+0x50/0xb0
> > [ ] bus_remove_driver+0x86/0x120
> > [ ] driver_unregister+0x2f/0x60
> > [ ] pci_unregister_driver+0x22/0xc0
> > [ ] xe_unregister_pci_driver+0x15/0x20 [xe]
> > [ ] xe_exit+0x1f/0x34 [xe]
> >
> > Fix that by taking pci_rescan_remove_lock, as is now expected.
> >
> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_pci.c | 2 +-
> > drivers/gpu/drm/xe/xe_pci_sriov.c | 20 ++++++++++++++++++++
> > 2 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> > index 3ac99472d6dd..fb0abd768e67 100644
> > --- a/drivers/gpu/drm/xe/xe_pci.c
> > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > @@ -1010,7 +1010,7 @@ static void xe_pci_remove(struct pci_dev *pdev)
> >  	struct xe_device *xe = pdev_to_xe_device(pdev);
> >
> >  	if (IS_SRIOV_PF(xe))
> > -		xe_pci_sriov_configure(pdev, 0);
> > +		xe_pci_sriov_disable_vfs(pdev);
> >
> >  	if (xe_survivability_mode_is_boot_enabled(xe))
> >  		return;
> > diff --git a/drivers/gpu/drm/xe/xe_pci_sriov.c b/drivers/gpu/drm/xe/xe_pci_sriov.c
> > index 3fd22034f03e..2a3fd3578ef2 100644
> > --- a/drivers/gpu/drm/xe/xe_pci_sriov.c
> > +++ b/drivers/gpu/drm/xe/xe_pci_sriov.c
> > @@ -239,6 +239,26 @@ int xe_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
> >  	return pf_disable_vfs(xe);
> >  }
> >
> > +/**
> > + * xe_pci_sriov_disable_vfs() - Disable all VFs.
> > + * @pdev: the PF's &pci_dev
> > + *
> > + * This is a simple wrapper around our function that implements the
> > + * pci_driver.sriov_configure hook but also takes a required mutex.
> > + *
> > + * Return: 0 on success or a negative error code on failure.
> > + */
> > +int xe_pci_sriov_disable_vfs(struct pci_dev *pdev)
> > +{
> > +	int ret;
> > +
> > +	pci_lock_rescan_remove();
> > +	ret = xe_pci_sriov_configure(pdev, 0);
> > +	pci_unlock_rescan_remove();
> > +
> > +	return ret;
> > +}
> > +
> LGTM:
> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
>
The lockdep reasoning and the placement of the lock make sense to me.

I do have a question though, as I’m a little concerned about our driver
having to take a lock like pci_rescan_remove_lock...

Why is the xe_pci_sriov_disable_vfs call needed in pci_driver.remove?
In other words, why doesn’t the PCI core call pci_driver.sriov_configure
first?

I don’t see many examples of device drivers having to call
pci_lock_rescan_remove [1], which is why I’m asking. I’m wondering whether we
are missing an accepted flow for SR-IOV, and whether the need to take
pci_lock_rescan_remove just to silence lockdep is pointing to a larger
issue.

Matt

[1] https://elixir.bootlin.com/linux/v6.19.3/A/ident/pci_lock_rescan_remove
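
To make the locking contract under discussion concrete, here is a minimal
userspace sketch, not kernel code. It assumes a single-threaded model in which
a pthread mutex plus a boolean stand in for pci_rescan_remove_lock and for
lockdep's held-lock check. Every function name in it (core_store_numvfs,
driver_remove_unlocked, driver_remove_locked, and so on) is a hypothetical
stand-in, and contract_violations only counts the points where the real kernel
would emit the WARNING quoted in the commit message.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for pci_rescan_remove_lock; this is NOT the kernel mutex. */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy substitute for lockdep's lock_is_held(); valid for one thread only. */
static bool rescan_remove_held;

/* Number of times the "lock must be held" contract was broken. */
static int contract_violations;

static void lock_rescan_remove(void)
{
	pthread_mutex_lock(&rescan_remove_lock);
	rescan_remove_held = true;
}

static void unlock_rescan_remove(void)
{
	rescan_remove_held = false;
	pthread_mutex_unlock(&rescan_remove_lock);
}

/* Models a device-removal helper that expects the caller to hold the lock. */
static void stop_and_remove_device(void)
{
	if (!rescan_remove_held)
		contract_violations++;	/* the kernel would WARN here */
}

/* Models the sriov_configure hook: it assumes the lock is already held. */
static int sriov_configure(int num_vfs)
{
	if (num_vfs == 0)
		stop_and_remove_device();
	return 0;
}

/* Models the core path: the lock is taken before the hook is invoked. */
static int core_store_numvfs(int num_vfs)
{
	int ret;

	lock_rescan_remove();
	ret = sriov_configure(num_vfs);
	unlock_rescan_remove();
	return ret;
}

/* Models driver removal reusing the hook as-is, without the lock. */
static int driver_remove_unlocked(void)
{
	return sriov_configure(0);
}

/* Models the wrapper from the patch: take the lock, then reuse the hook. */
static int driver_remove_locked(void)
{
	int ret;

	lock_rescan_remove();
	ret = sriov_configure(0);
	unlock_rescan_remove();
	return ret;
}
```

In this model, calling driver_remove_unlocked() bumps contract_violations,
mirroring the splat, while core_store_numvfs() and driver_remove_locked() leave
it unchanged. The sketch says nothing about whether a driver should be
disabling VFs from its remove callback in the first place.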
>
> > /**
> > * xe_pci_sriov_get_vf_pdev() - Lookup the VF's PCI device using the VF identifier.
> > * @pdev: the PF's &pci_dev
> > --
> > 2.47.1
> >
>
> --