* [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs
@ 2026-02-27 21:40 Michal Wajdeczko
2026-02-27 21:54 ` ✗ CI.checkpatch: warning for " Patchwork
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Michal Wajdeczko @ 2026-02-27 21:40 UTC (permalink / raw)
To: intel-xe; +Cc: Michal Wajdeczko
Since recent commit a5338e365c45 ("PCI/IOV: Fix race between SR-IOV
enable/disable and hotplug") the driver pci_driver.sriov_configure
hook is called with the mutex pci_rescan_remove_lock already taken.
As we are using this hook as-is during driver removal, we get:
[ ] xe 0000:4d:00.0: [drm:xe_pci_sriov_configure [xe]] PF: disabling 1 VF
[ ] ------------[ cut here ]------------
[ ] debug_locks && !(lock_is_held(&(&pci_rescan_remove_lock)->dep_map) != 0)
[ ] WARNING: drivers/pci/remove.c:130 at pci_stop_and_remove_bus_device+0x4c/0x50, CPU#32: rmmod/6476
[ ] RIP: 0010:pci_stop_and_remove_bus_device+0x4c/0x50
[ ] Call Trace:
[ ] <TASK>
[ ] pci_iov_remove_virtfn+0xd1/0x140
[ ] sriov_disable+0x42/0x100
[ ] pci_disable_sriov+0x34/0x50
[ ] xe_pci_sriov_configure+0x2d0/0x1150 [xe]
[ ] xe_pci_remove+0x7c/0x190 [xe]
[ ] pci_device_remove+0x41/0xb0
[ ] device_remove+0x43/0x80
[ ] device_release_driver_internal+0x215/0x280
[ ] driver_detach+0x50/0xb0
[ ] bus_remove_driver+0x86/0x120
[ ] driver_unregister+0x2f/0x60
[ ] pci_unregister_driver+0x22/0xc0
[ ] xe_unregister_pci_driver+0x15/0x20 [xe]
[ ] xe_exit+0x1f/0x34 [xe]
Fix that by taking a pci_rescan_remove_lock as it is now expected.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
drivers/gpu/drm/xe/xe_pci.c | 2 +-
drivers/gpu/drm/xe/xe_pci_sriov.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 3ac99472d6dd..fb0abd768e67 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -1010,7 +1010,7 @@ static void xe_pci_remove(struct pci_dev *pdev)
struct xe_device *xe = pdev_to_xe_device(pdev);
if (IS_SRIOV_PF(xe))
- xe_pci_sriov_configure(pdev, 0);
+ xe_pci_sriov_disable_vfs(pdev);
if (xe_survivability_mode_is_boot_enabled(xe))
return;
diff --git a/drivers/gpu/drm/xe/xe_pci_sriov.c b/drivers/gpu/drm/xe/xe_pci_sriov.c
index 3fd22034f03e..2a3fd3578ef2 100644
--- a/drivers/gpu/drm/xe/xe_pci_sriov.c
+++ b/drivers/gpu/drm/xe/xe_pci_sriov.c
@@ -239,6 +239,26 @@ int xe_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
return pf_disable_vfs(xe);
}
+/**
+ * xe_pci_sriov_disable_vfs() - Disable all VFs.
+ * @pdev: the PF's &pci_dev
+ *
+ * This is a simple wrapper around our function that implements the
+ * pci_driver.sriov_configure hook but also takes a required mutex.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_pci_sriov_disable_vfs(struct pci_dev *pdev)
+{
+ int ret;
+
+ pci_lock_rescan_remove();
+ ret = xe_pci_sriov_configure(pdev, 0);
+ pci_unlock_rescan_remove();
+
+ return ret;
+}
+
/**
* xe_pci_sriov_get_vf_pdev() - Lookup the VF's PCI device using the VF identifier.
* @pdev: the PF's &pci_dev
--
2.47.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* ✗ CI.checkpatch: warning for drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs 2026-02-27 21:40 [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs Michal Wajdeczko @ 2026-02-27 21:54 ` Patchwork 2026-02-27 21:54 ` ✗ CI.KUnit: failure " Patchwork 2026-03-02 9:43 ` [PATCH] " Piotr Piórkowski 2 siblings, 0 replies; 7+ messages in thread From: Patchwork @ 2026-02-27 21:54 UTC (permalink / raw) To: Michal Wajdeczko; +Cc: intel-xe == Series Details == Series: drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs URL : https://patchwork.freedesktop.org/series/162346/ State : warning == Summary == + KERNEL=/kernel + git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt Cloning into 'mt'... warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/ + git -C mt rev-list -n1 origin/master 1f57ba1afceae32108bd24770069f764d940a0e4 + cd /kernel + git config --global --add safe.directory /kernel + git log -n1 commit 302a032f50b095b3e934f202c05299d12e008c71 Author: Michal Wajdeczko <michal.wajdeczko@intel.com> Date: Fri Feb 27 22:40:47 2026 +0100 drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs Since recent commit a5338e365c45 ("PCI/IOV: Fix race between SR-IOV enable/disable and hotplug") the driver pci_driver.sriov_configure hook is called with the mutex pci_rescan_remove_lock already taken. As we are using this hook as-is during driver removal, we get: [ ] xe 0000:4d:00.0: [drm:xe_pci_sriov_configure [xe]] PF: disabling 1 VF [ ] ------------[ cut here ]------------ [ ] debug_locks && !(lock_is_held(&(&pci_rescan_remove_lock)->dep_map) != 0) [ ] WARNING: drivers/pci/remove.c:130 at pci_stop_and_remove_bus_device+0x4c/0x50, CPU#32: rmmod/6476 [ ] RIP: 0010:pci_stop_and_remove_bus_device+0x4c/0x50 [ ] Call Trace: [ ] <TASK> [ ] pci_iov_remove_virtfn+0xd1/0x140 [ ] sriov_disable+0x42/0x100 [ ] pci_disable_sriov+0x34/0x50 [ ] xe_pci_sriov_configure+0x2d0/0x1150 [xe] [ ] xe_pci_remove+0x7c/0x190 [xe] [ ] pci_device_remove+0x41/0xb0 [ ] device_remove+0x43/0x80 [ ] device_release_driver_internal+0x215/0x280 [ ] driver_detach+0x50/0xb0 [ ] bus_remove_driver+0x86/0x120 [ ] driver_unregister+0x2f/0x60 [ ] pci_unregister_driver+0x22/0xc0 [ ] xe_unregister_pci_driver+0x15/0x20 [xe] [ ] xe_exit+0x1f/0x34 [xe] Fix that by taking a pci_rescan_remove_lock as it is now expected. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> + /mt/dim checkpatch a0a6a2fc5a08dc2e936c43e5d181dd0975a251a1 drm-intel 302a032f50b0 drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs -:15: WARNING:COMMIT_LOG_LONG_LINE: Prefer a maximum 75 chars per line (possible unwrapped commit description?) #15: [ ] debug_locks && !(lock_is_held(&(&pci_rescan_remove_lock)->dep_map) != 0) -:82: WARNING:MISSING_FIXES_TAG: The commit message has 'Call Trace:', perhaps it also needs a 'Fixes:' tag? total: 0 errors, 2 warnings, 0 checks, 34 lines checked ^ permalink raw reply [flat|nested] 7+ messages in thread
* ✗ CI.KUnit: failure for drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs 2026-02-27 21:40 [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs Michal Wajdeczko 2026-02-27 21:54 ` ✗ CI.checkpatch: warning for " Patchwork @ 2026-02-27 21:54 ` Patchwork 2026-03-02 9:43 ` [PATCH] " Piotr Piórkowski 2 siblings, 0 replies; 7+ messages in thread From: Patchwork @ 2026-02-27 21:54 UTC (permalink / raw) To: Michal Wajdeczko; +Cc: intel-xe == Series Details == Series: drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs URL : https://patchwork.freedesktop.org/series/162346/ State : failure == Summary == + trap cleanup EXIT + /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig ERROR:root:../drivers/gpu/drm/xe/xe_pci.c: In function ‘xe_pci_remove’: ../drivers/gpu/drm/xe/xe_pci.c:1013:17: error: implicit declaration of function ‘xe_pci_sriov_disable_vfs’ [-Werror=implicit-function-declaration] 1013 | xe_pci_sriov_disable_vfs(pdev); | ^~~~~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors make[7]: *** [../scripts/Makefile.build:289: drivers/gpu/drm/xe/xe_pci.o] Error 1 make[7]: *** Waiting for unfinished jobs.... make[6]: *** [../scripts/Makefile.build:546: drivers/gpu/drm/xe] Error 2 make[5]: *** [../scripts/Makefile.build:546: drivers/gpu/drm] Error 2 make[4]: *** [../scripts/Makefile.build:546: drivers/gpu] Error 2 make[3]: *** [../scripts/Makefile.build:546: drivers] Error 2 make[2]: *** [/kernel/Makefile:2101: .] Error 2 make[1]: *** [/kernel/Makefile:248: __sub-make] Error 2 make: *** [Makefile:248: __sub-make] Error 2 [21:54:15] Configuring KUnit Kernel ... Generating .config ... Populating config with: $ make ARCH=um O=.kunit olddefconfig [21:54:19] Building KUnit Kernel ... Populating config with: $ make ARCH=um O=.kunit olddefconfig Building with: $ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48 + cleanup ++ stat -c %u:%g /kernel + chown -R 1003:1003 /kernel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs 2026-02-27 21:40 [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs Michal Wajdeczko 2026-02-27 21:54 ` ✗ CI.checkpatch: warning for " Patchwork 2026-02-27 21:54 ` ✗ CI.KUnit: failure " Patchwork @ 2026-03-02 9:43 ` Piotr Piórkowski 2026-03-02 20:36 ` Matthew Brost 2 siblings, 1 reply; 7+ messages in thread From: Piotr Piórkowski @ 2026-03-02 9:43 UTC (permalink / raw) To: Michal Wajdeczko; +Cc: intel-xe Michal Wajdeczko <michal.wajdeczko@intel.com> wrote on pią [2026-lut-27 22:40:47 +0100]: > Since recent commit a5338e365c45 ("PCI/IOV: Fix race between SR-IOV > enable/disable and hotplug") the driver pci_driver.sriov_configure > hook is called with the mutex pci_rescan_remove_lock already taken. > > As we are using this hook as-is during driver removal, we get: > > [ ] xe 0000:4d:00.0: [drm:xe_pci_sriov_configure [xe]] PF: disabling 1 VF > [ ] ------------[ cut here ]------------ > [ ] debug_locks && !(lock_is_held(&(&pci_rescan_remove_lock)->dep_map) != 0) > [ ] WARNING: drivers/pci/remove.c:130 at pci_stop_and_remove_bus_device+0x4c/0x50, CPU#32: rmmod/6476 > [ ] RIP: 0010:pci_stop_and_remove_bus_device+0x4c/0x50 > [ ] Call Trace: > [ ] <TASK> > [ ] pci_iov_remove_virtfn+0xd1/0x140 > [ ] sriov_disable+0x42/0x100 > [ ] pci_disable_sriov+0x34/0x50 > [ ] xe_pci_sriov_configure+0x2d0/0x1150 [xe] > [ ] xe_pci_remove+0x7c/0x190 [xe] > [ ] pci_device_remove+0x41/0xb0 > [ ] device_remove+0x43/0x80 > [ ] device_release_driver_internal+0x215/0x280 > [ ] driver_detach+0x50/0xb0 > [ ] bus_remove_driver+0x86/0x120 > [ ] driver_unregister+0x2f/0x60 > [ ] pci_unregister_driver+0x22/0xc0 > [ ] xe_unregister_pci_driver+0x15/0x20 [xe] > [ ] xe_exit+0x1f/0x34 [xe] > > Fix that by taking a pci_rescan_remove_lock as it is now expected. > > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > --- > drivers/gpu/drm/xe/xe_pci.c | 2 +- > drivers/gpu/drm/xe/xe_pci_sriov.c | 20 ++++++++++++++++++++ > 2 files changed, 21 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c > index 3ac99472d6dd..fb0abd768e67 100644 > --- a/drivers/gpu/drm/xe/xe_pci.c > +++ b/drivers/gpu/drm/xe/xe_pci.c > @@ -1010,7 +1010,7 @@ static void xe_pci_remove(struct pci_dev *pdev) > struct xe_device *xe = pdev_to_xe_device(pdev); > > if (IS_SRIOV_PF(xe)) > - xe_pci_sriov_configure(pdev, 0); > + xe_pci_sriov_disable_vfs(pdev); > > if (xe_survivability_mode_is_boot_enabled(xe)) > return; > diff --git a/drivers/gpu/drm/xe/xe_pci_sriov.c b/drivers/gpu/drm/xe/xe_pci_sriov.c > index 3fd22034f03e..2a3fd3578ef2 100644 > --- a/drivers/gpu/drm/xe/xe_pci_sriov.c > +++ b/drivers/gpu/drm/xe/xe_pci_sriov.c > @@ -239,6 +239,26 @@ int xe_pci_sriov_configure(struct pci_dev *pdev, int num_vfs) > return pf_disable_vfs(xe); > } > > +/** > + * xe_pci_sriov_disable_vfs() - Disable all VFs. > + * @pdev: the PF's &pci_dev > + * > + * This is a simple wrapper around our function that implements the > + * pci_driver.sriov_configure hook but also takes a required mutex. > + * > + * Return: 0 on success or a negative error code on failure. > + */ > +int xe_pci_sriov_disable_vfs(struct pci_dev *pdev) > +{ > + int ret; > + > + pci_lock_rescan_remove(); > + ret = xe_pci_sriov_configure(pdev, 0); > + pci_unlock_rescan_remove(); > + > + return ret; > +} > + LGTM: Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> > /** > * xe_pci_sriov_get_vf_pdev() - Lookup the VF's PCI device using the VF identifier. > * @pdev: the PF's &pci_dev > -- > 2.47.1 > -- ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs 2026-03-02 9:43 ` [PATCH] " Piotr Piórkowski @ 2026-03-02 20:36 ` Matthew Brost 2026-03-02 21:47 ` Michal Wajdeczko 0 siblings, 1 reply; 7+ messages in thread From: Matthew Brost @ 2026-03-02 20:36 UTC (permalink / raw) To: Piotr Piórkowski; +Cc: Michal Wajdeczko, intel-xe On Mon, Mar 02, 2026 at 10:43:36AM +0100, Piotr Piórkowski wrote: > Michal Wajdeczko <michal.wajdeczko@intel.com> wrote on pią [2026-lut-27 22:40:47 +0100]: > > Since recent commit a5338e365c45 ("PCI/IOV: Fix race between SR-IOV > > enable/disable and hotplug") the driver pci_driver.sriov_configure > > hook is called with the mutex pci_rescan_remove_lock already taken. > > > > As we are using this hook as-is during driver removal, we get: > > > > [ ] xe 0000:4d:00.0: [drm:xe_pci_sriov_configure [xe]] PF: disabling 1 VF > > [ ] ------------[ cut here ]------------ > > [ ] debug_locks && !(lock_is_held(&(&pci_rescan_remove_lock)->dep_map) != 0) > > [ ] WARNING: drivers/pci/remove.c:130 at pci_stop_and_remove_bus_device+0x4c/0x50, CPU#32: rmmod/6476 > > [ ] RIP: 0010:pci_stop_and_remove_bus_device+0x4c/0x50 > > [ ] Call Trace: > > [ ] <TASK> > > [ ] pci_iov_remove_virtfn+0xd1/0x140 > > [ ] sriov_disable+0x42/0x100 > > [ ] pci_disable_sriov+0x34/0x50 > > [ ] xe_pci_sriov_configure+0x2d0/0x1150 [xe] > > [ ] xe_pci_remove+0x7c/0x190 [xe] > > [ ] pci_device_remove+0x41/0xb0 > > [ ] device_remove+0x43/0x80 > > [ ] device_release_driver_internal+0x215/0x280 > > [ ] driver_detach+0x50/0xb0 > > [ ] bus_remove_driver+0x86/0x120 > > [ ] driver_unregister+0x2f/0x60 > > [ ] pci_unregister_driver+0x22/0xc0 > > [ ] xe_unregister_pci_driver+0x15/0x20 [xe] > > [ ] xe_exit+0x1f/0x34 [xe] > > > > Fix that by taking a pci_rescan_remove_lock as it is now expected. > > > > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > > --- > > drivers/gpu/drm/xe/xe_pci.c | 2 +- > > drivers/gpu/drm/xe/xe_pci_sriov.c | 20 ++++++++++++++++++++ > > 2 files changed, 21 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c > > index 3ac99472d6dd..fb0abd768e67 100644 > > --- a/drivers/gpu/drm/xe/xe_pci.c > > +++ b/drivers/gpu/drm/xe/xe_pci.c > > @@ -1010,7 +1010,7 @@ static void xe_pci_remove(struct pci_dev *pdev) > > struct xe_device *xe = pdev_to_xe_device(pdev); > > > > if (IS_SRIOV_PF(xe)) > > - xe_pci_sriov_configure(pdev, 0); > > + xe_pci_sriov_disable_vfs(pdev); > > > > if (xe_survivability_mode_is_boot_enabled(xe)) > > return; > > diff --git a/drivers/gpu/drm/xe/xe_pci_sriov.c b/drivers/gpu/drm/xe/xe_pci_sriov.c > > index 3fd22034f03e..2a3fd3578ef2 100644 > > --- a/drivers/gpu/drm/xe/xe_pci_sriov.c > > +++ b/drivers/gpu/drm/xe/xe_pci_sriov.c > > @@ -239,6 +239,26 @@ int xe_pci_sriov_configure(struct pci_dev *pdev, int num_vfs) > > return pf_disable_vfs(xe); > > } > > > > +/** > > + * xe_pci_sriov_disable_vfs() - Disable all VFs. > > + * @pdev: the PF's &pci_dev > > + * > > + * This is a simple wrapper around our function that implements the > > + * pci_driver.sriov_configure hook but also takes a required mutex. > > + * > > + * Return: 0 on success or a negative error code on failure. > > + */ > > +int xe_pci_sriov_disable_vfs(struct pci_dev *pdev) > > +{ > > + int ret; > > + > > + pci_lock_rescan_remove(); > > + ret = xe_pci_sriov_configure(pdev, 0); > > + pci_unlock_rescan_remove(); > > + > > + return ret; > > +} > > + > LGTM: > Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> > The lockdep reasons and the placement of the lock make sense to me. I do have a question though, as I’m a little concerned about our driver having to take a lock like pci_lock_rescan_remove... Why is the xe_pci_sriov_disable_vfs call needed in pci_driver.remove? In other words, why doesn’t the PCI core call pci_driver.sriov_configure first? I don’t see many examples of device drivers having to call pci_lock_rescan_remove [1], which is why I’m asking. I’m wondering whether we are missing an accepted flow for SR-IOV, and whether the need to take pci_lock_rescan_remove just to silence lockdep is pointing to a larger issue. Matt [1] https://elixir.bootlin.com/linux/v6.19.3/A/ident/pci_lock_rescan_remove > > > /** > > * xe_pci_sriov_get_vf_pdev() - Lookup the VF's PCI device using the VF identifier. > > * @pdev: the PF's &pci_dev > > -- > > 2.47.1 > > > > -- ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs 2026-03-02 20:36 ` Matthew Brost @ 2026-03-02 21:47 ` Michal Wajdeczko 2026-03-02 22:51 ` Matthew Brost 0 siblings, 1 reply; 7+ messages in thread From: Michal Wajdeczko @ 2026-03-02 21:47 UTC (permalink / raw) To: Matthew Brost; +Cc: intel-xe, Piotr Piórkowski On 3/2/2026 9:36 PM, Matthew Brost wrote: > On Mon, Mar 02, 2026 at 10:43:36AM +0100, Piotr Piórkowski wrote: >> Michal Wajdeczko <michal.wajdeczko@intel.com> wrote on pią [2026-lut-27 22:40:47 +0100]: >>> Since recent commit a5338e365c45 ("PCI/IOV: Fix race between SR-IOV >>> enable/disable and hotplug") the driver pci_driver.sriov_configure >>> hook is called with the mutex pci_rescan_remove_lock already taken. >>> >>> As we are using this hook as-is during driver removal, we get: >>> >>> [ ] xe 0000:4d:00.0: [drm:xe_pci_sriov_configure [xe]] PF: disabling 1 VF >>> [ ] ------------[ cut here ]------------ >>> [ ] debug_locks && !(lock_is_held(&(&pci_rescan_remove_lock)->dep_map) != 0) >>> [ ] WARNING: drivers/pci/remove.c:130 at pci_stop_and_remove_bus_device+0x4c/0x50, CPU#32: rmmod/6476 >>> [ ] RIP: 0010:pci_stop_and_remove_bus_device+0x4c/0x50 >>> [ ] Call Trace: >>> [ ] <TASK> >>> [ ] pci_iov_remove_virtfn+0xd1/0x140 >>> [ ] sriov_disable+0x42/0x100 >>> [ ] pci_disable_sriov+0x34/0x50 >>> [ ] xe_pci_sriov_configure+0x2d0/0x1150 [xe] >>> [ ] xe_pci_remove+0x7c/0x190 [xe] >>> [ ] pci_device_remove+0x41/0xb0 >>> [ ] device_remove+0x43/0x80 >>> [ ] device_release_driver_internal+0x215/0x280 >>> [ ] driver_detach+0x50/0xb0 >>> [ ] bus_remove_driver+0x86/0x120 >>> [ ] driver_unregister+0x2f/0x60 >>> [ ] pci_unregister_driver+0x22/0xc0 >>> [ ] xe_unregister_pci_driver+0x15/0x20 [xe] >>> [ ] xe_exit+0x1f/0x34 [xe] >>> >>> Fix that by taking a pci_rescan_remove_lock as it is now expected. >>> >>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >>> --- >>> drivers/gpu/drm/xe/xe_pci.c | 2 +- >>> drivers/gpu/drm/xe/xe_pci_sriov.c | 20 ++++++++++++++++++++ >>> 2 files changed, 21 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c >>> index 3ac99472d6dd..fb0abd768e67 100644 >>> --- a/drivers/gpu/drm/xe/xe_pci.c >>> +++ b/drivers/gpu/drm/xe/xe_pci.c >>> @@ -1010,7 +1010,7 @@ static void xe_pci_remove(struct pci_dev *pdev) >>> struct xe_device *xe = pdev_to_xe_device(pdev); >>> >>> if (IS_SRIOV_PF(xe)) >>> - xe_pci_sriov_configure(pdev, 0); >>> + xe_pci_sriov_disable_vfs(pdev); >>> >>> if (xe_survivability_mode_is_boot_enabled(xe)) >>> return; >>> diff --git a/drivers/gpu/drm/xe/xe_pci_sriov.c b/drivers/gpu/drm/xe/xe_pci_sriov.c >>> index 3fd22034f03e..2a3fd3578ef2 100644 >>> --- a/drivers/gpu/drm/xe/xe_pci_sriov.c >>> +++ b/drivers/gpu/drm/xe/xe_pci_sriov.c >>> @@ -239,6 +239,26 @@ int xe_pci_sriov_configure(struct pci_dev *pdev, int num_vfs) >>> return pf_disable_vfs(xe); >>> } >>> >>> +/** >>> + * xe_pci_sriov_disable_vfs() - Disable all VFs. >>> + * @pdev: the PF's &pci_dev >>> + * >>> + * This is a simple wrapper around our function that implements the >>> + * pci_driver.sriov_configure hook but also takes a required mutex. >>> + * >>> + * Return: 0 on success or a negative error code on failure. >>> + */ >>> +int xe_pci_sriov_disable_vfs(struct pci_dev *pdev) >>> +{ >>> + int ret; >>> + >>> + pci_lock_rescan_remove(); >>> + ret = xe_pci_sriov_configure(pdev, 0); >>> + pci_unlock_rescan_remove(); >>> + >>> + return ret; >>> +} >>> + >> LGTM: >> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> >> > > The lockdep reasons and the placement of the lock make sense to me. > > I do have a question though, as I’m a little concerned about our driver > having to take a lock like pci_lock_rescan_remove... > > Why is the xe_pci_sriov_disable_vfs call needed in pci_driver.remove? > In other words, why doesn’t the PCI core call pci_driver.sriov_configure > first? dunno, but it does complain [1] when PF leaves VFs enabled [1] https://elixir.bootlin.com/linux/v6.19.3/source/drivers/pci/iov.c#L1027 > > I don’t see many examples of device drivers having to call > pci_lock_rescan_remove [1], which is why I’m asking. I’m wondering whether we > are missing an accepted flow for SR-IOV, and whether the need to take > pci_lock_rescan_remove just to silence lockdep is pointing to a larger > issue. at first I was assuming that there is just a new expectation for the driver, but indeed it looks that we have a larger problem, as our CI also found that in slightly different scenario [2] this mutex is already taken by the PCI subsystem when calling the .remove callback: [ ] Possible unsafe locking scenario: [ ] CPU0 [ ] ---- [ ] lock(pci_rescan_remove_lock); [ ] lock(pci_rescan_remove_lock); [ ] *** DEADLOCK *** [ ] Call Trace: [ ] dump_stack_lvl+0x91/0xf0 [ ] dump_stack+0x10/0x20 [ ] print_deadlock_bug+0x23f/0x320 [ ] __lock_acquire+0x146e/0x2790 [ ] lock_acquire+0xc4/0x2f0 [ ] __mutex_lock+0xb2/0x10e0 [ ] mutex_lock_nested+0x1b/0x30 [ ] pci_lock_rescan_remove+0x17/0x30 [ ] xe_pci_sriov_disable_vfs+0x12/0x40 [xe] [ ] xe_pci_remove+0x7a/0x180 [xe] [ ] pci_device_remove+0x41/0xb0 [ ] device_remove+0x43/0x80 [ ] device_release_driver_internal+0x215/0x280 [ ] device_release_driver+0x12/0x20 [ ] pci_stop_bus_device+0x69/0x90 [ ] pci_stop_and_remove_bus_device_locked+0x24/0x60 [ ] remove_store+0x85/0xa0 [ ] dev_attr_store+0x17/0x40 but it looks that this issue was already noticed and discussed [3] in the PCI level, so lets wait and see how this will be solved. [2] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162346v2/shard-bmg-7/igt@core_hotunplug@unplug-rescan.html [3] https://lore.kernel.org/linux-pci/20260228120138.51197-4-ionut.nechita@windriver.com/ > > Matt > > [1] https://elixir.bootlin.com/linux/v6.19.3/A/ident/pci_lock_rescan_remove > >> >>> /** >>> * xe_pci_sriov_get_vf_pdev() - Lookup the VF's PCI device using the VF identifier. >>> * @pdev: the PF's &pci_dev >>> -- >>> 2.47.1 >>> >> >> -- ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs 2026-03-02 21:47 ` Michal Wajdeczko @ 2026-03-02 22:51 ` Matthew Brost 0 siblings, 0 replies; 7+ messages in thread From: Matthew Brost @ 2026-03-02 22:51 UTC (permalink / raw) To: Michal Wajdeczko; +Cc: intel-xe, Piotr Piórkowski On Mon, Mar 02, 2026 at 10:47:41PM +0100, Michal Wajdeczko wrote: > > > On 3/2/2026 9:36 PM, Matthew Brost wrote: > > On Mon, Mar 02, 2026 at 10:43:36AM +0100, Piotr Piórkowski wrote: > >> Michal Wajdeczko <michal.wajdeczko@intel.com> wrote on pią [2026-lut-27 22:40:47 +0100]: > >>> Since recent commit a5338e365c45 ("PCI/IOV: Fix race between SR-IOV > >>> enable/disable and hotplug") the driver pci_driver.sriov_configure > >>> hook is called with the mutex pci_rescan_remove_lock already taken. > >>> > >>> As we are using this hook as-is during driver removal, we get: > >>> > >>> [ ] xe 0000:4d:00.0: [drm:xe_pci_sriov_configure [xe]] PF: disabling 1 VF > >>> [ ] ------------[ cut here ]------------ > >>> [ ] debug_locks && !(lock_is_held(&(&pci_rescan_remove_lock)->dep_map) != 0) > >>> [ ] WARNING: drivers/pci/remove.c:130 at pci_stop_and_remove_bus_device+0x4c/0x50, CPU#32: rmmod/6476 > >>> [ ] RIP: 0010:pci_stop_and_remove_bus_device+0x4c/0x50 > >>> [ ] Call Trace: > >>> [ ] <TASK> > >>> [ ] pci_iov_remove_virtfn+0xd1/0x140 > >>> [ ] sriov_disable+0x42/0x100 > >>> [ ] pci_disable_sriov+0x34/0x50 > >>> [ ] xe_pci_sriov_configure+0x2d0/0x1150 [xe] > >>> [ ] xe_pci_remove+0x7c/0x190 [xe] > >>> [ ] pci_device_remove+0x41/0xb0 > >>> [ ] device_remove+0x43/0x80 > >>> [ ] device_release_driver_internal+0x215/0x280 > >>> [ ] driver_detach+0x50/0xb0 > >>> [ ] bus_remove_driver+0x86/0x120 > >>> [ ] driver_unregister+0x2f/0x60 > >>> [ ] pci_unregister_driver+0x22/0xc0 > >>> [ ] xe_unregister_pci_driver+0x15/0x20 [xe] > >>> [ ] xe_exit+0x1f/0x34 [xe] > >>> > >>> Fix that by taking a pci_rescan_remove_lock as it is now expected. > >>> > >>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > >>> --- > >>> drivers/gpu/drm/xe/xe_pci.c | 2 +- > >>> drivers/gpu/drm/xe/xe_pci_sriov.c | 20 ++++++++++++++++++++ > >>> 2 files changed, 21 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c > >>> index 3ac99472d6dd..fb0abd768e67 100644 > >>> --- a/drivers/gpu/drm/xe/xe_pci.c > >>> +++ b/drivers/gpu/drm/xe/xe_pci.c > >>> @@ -1010,7 +1010,7 @@ static void xe_pci_remove(struct pci_dev *pdev) > >>> struct xe_device *xe = pdev_to_xe_device(pdev); > >>> > >>> if (IS_SRIOV_PF(xe)) > >>> - xe_pci_sriov_configure(pdev, 0); > >>> + xe_pci_sriov_disable_vfs(pdev); > >>> > >>> if (xe_survivability_mode_is_boot_enabled(xe)) > >>> return; > >>> diff --git a/drivers/gpu/drm/xe/xe_pci_sriov.c b/drivers/gpu/drm/xe/xe_pci_sriov.c > >>> index 3fd22034f03e..2a3fd3578ef2 100644 > >>> --- a/drivers/gpu/drm/xe/xe_pci_sriov.c > >>> +++ b/drivers/gpu/drm/xe/xe_pci_sriov.c > >>> @@ -239,6 +239,26 @@ int xe_pci_sriov_configure(struct pci_dev *pdev, int num_vfs) > >>> return pf_disable_vfs(xe); > >>> } > >>> > >>> +/** > >>> + * xe_pci_sriov_disable_vfs() - Disable all VFs. > >>> + * @pdev: the PF's &pci_dev > >>> + * > >>> + * This is a simple wrapper around our function that implements the > >>> + * pci_driver.sriov_configure hook but also takes a required mutex. > >>> + * > >>> + * Return: 0 on success or a negative error code on failure. > >>> + */ > >>> +int xe_pci_sriov_disable_vfs(struct pci_dev *pdev) > >>> +{ > >>> + int ret; > >>> + > >>> + pci_lock_rescan_remove(); > >>> + ret = xe_pci_sriov_configure(pdev, 0); > >>> + pci_unlock_rescan_remove(); > >>> + > >>> + return ret; > >>> +} > >>> + > >> LGTM: > >> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> > >> > > > > The lockdep reasons and the placement of the lock make sense to me. > > > > I do have a question though, as I’m a little concerned about our driver > > having to take a lock like pci_lock_rescan_remove... > > > > Why is the xe_pci_sriov_disable_vfs call needed in pci_driver.remove? > > In other words, why doesn’t the PCI core call pci_driver.sriov_configure > > first? > > dunno, but it does complain [1] when PF leaves VFs enabled > > [1] https://elixir.bootlin.com/linux/v6.19.3/source/drivers/pci/iov.c#L1027 > > > > > I don’t see many examples of device drivers having to call > > pci_lock_rescan_remove [1], which is why I’m asking. I’m wondering whether we > > are missing an accepted flow for SR-IOV, and whether the need to take > > pci_lock_rescan_remove just to silence lockdep is pointing to a larger > > issue. > > at first I was assuming that there is just a new expectation for the > driver, but indeed it looks that we have a larger problem, as our CI > also found that in slightly different scenario [2] this mutex is already > taken by the PCI subsystem when calling the .remove callback: > > [ ] Possible unsafe locking scenario: > [ ] CPU0 > [ ] ---- > [ ] lock(pci_rescan_remove_lock); > [ ] lock(pci_rescan_remove_lock); > [ ] > *** DEADLOCK *** > > [ ] Call Trace: > [ ] dump_stack_lvl+0x91/0xf0 > [ ] dump_stack+0x10/0x20 > [ ] print_deadlock_bug+0x23f/0x320 > [ ] __lock_acquire+0x146e/0x2790 > [ ] lock_acquire+0xc4/0x2f0 > [ ] __mutex_lock+0xb2/0x10e0 > [ ] mutex_lock_nested+0x1b/0x30 > [ ] pci_lock_rescan_remove+0x17/0x30 > [ ] xe_pci_sriov_disable_vfs+0x12/0x40 [xe] > [ ] xe_pci_remove+0x7a/0x180 [xe] > [ ] pci_device_remove+0x41/0xb0 > [ ] device_remove+0x43/0x80 > [ ] device_release_driver_internal+0x215/0x280 > [ ] device_release_driver+0x12/0x20 > [ ] pci_stop_bus_device+0x69/0x90 > [ ] pci_stop_and_remove_bus_device_locked+0x24/0x60 > [ ] remove_store+0x85/0xa0 > [ ] dev_attr_store+0x17/0x40 > > but it looks that this issue was already noticed and discussed [3] > in the PCI level, so lets wait and see how this will be solved. > > [2] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-162346v2/shard-bmg-7/igt@core_hotunplug@unplug-rescan.html > [3] https://lore.kernel.org/linux-pci/20260228120138.51197-4-ionut.nechita@windriver.com/ > +1. Generally taking PCI level (or core) locks in drivers is a big red flag. Let's see how the above gets resolved. Matt > > > > > Matt > > > > [1] https://elixir.bootlin.com/linux/v6.19.3/A/ident/pci_lock_rescan_remove > > > >> > >>> /** > >>> * xe_pci_sriov_get_vf_pdev() - Lookup the VF's PCI device using the VF identifier. > >>> * @pdev: the PF's &pci_dev > >>> -- > >>> 2.47.1 > >>> > >> > >> -- > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-03-02 22:52 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-27 21:40 [PATCH] drm/xe/pf: Take pci_rescan_remove_lock mutex when disabling VFs Michal Wajdeczko 2026-02-27 21:54 ` ✗ CI.checkpatch: warning for " Patchwork 2026-02-27 21:54 ` ✗ CI.KUnit: failure " Patchwork 2026-03-02 9:43 ` [PATCH] " Piotr Piórkowski 2026-03-02 20:36 ` Matthew Brost 2026-03-02 21:47 ` Michal Wajdeczko 2026-03-02 22:51 ` Matthew Brost
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox