* [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free @ 2025-05-29 15:34 Yi Sun 2025-05-29 15:34 ` [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module unload Yi Sun 2025-05-29 16:56 ` [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free Vinicius Costa Gomes 0 siblings, 2 replies; 10+ messages in thread From: Yi Sun @ 2025-05-29 15:34 UTC (permalink / raw) To: dave.jiang, vinicius.gomes, dmaengine, linux-kernel Cc: yi.sun, xueshuai, gordon.jin The put_device() call can be asynchronous cleanup via schedule_delayed_work when CONFIG_DEBUG_KOBJECT_RELEASE is set. This results in a use-after-free failure during module unloading if invoking idxd_free() immediately afterward. Removes the improper call idxd_free() to prevent potential memory corruption. Signed-off-by: Yi Sun <yi.sun@intel.com> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 760b7d81fcd8..504aca0fd597 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -1324,7 +1324,6 @@ static void idxd_remove(struct pci_dev *pdev) idxd_cleanup(idxd); pci_iounmap(pdev, idxd->reg_base); put_device(idxd_confdev(idxd)); - idxd_free(idxd); pci_disable_device(pdev); } -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module unload 2025-05-29 15:34 [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free Yi Sun @ 2025-05-29 15:34 ` Yi Sun 2025-05-29 17:04 ` Vinicius Costa Gomes 2025-05-29 16:56 ` [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free Vinicius Costa Gomes 1 sibling, 1 reply; 10+ messages in thread From: Yi Sun @ 2025-05-29 15:34 UTC (permalink / raw) To: dave.jiang, vinicius.gomes, dmaengine, linux-kernel Cc: yi.sun, xueshuai, gordon.jin A recent refactor introduced a misplaced put_device() call, resulting in reference count underflow when the module is unloaded. Expand the idxd_cleanup() function to handle proper cleanup, and remove idxd_cleanup_internals() as it was not part of the driver unload path. Fixes: a409e919ca32 ("dmaengine: idxd: Refactor remove call with idxd_cleanup() helper") Signed-off-by: Yi Sun <yi.sun@intel.com> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 504aca0fd597..a5eabeb6a8bd 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -1321,7 +1321,12 @@ static void idxd_remove(struct pci_dev *pdev) device_unregister(idxd_confdev(idxd)); idxd_shutdown(pdev); idxd_device_remove_debugfs(idxd); - idxd_cleanup(idxd); + perfmon_pmu_remove(idxd); + idxd_cleanup_interrupts(idxd); + if (device_pasid_enabled(idxd)) + idxd_disable_system_pasid(idxd); + if (device_user_pasid_enabled(idxd)) + idxd_disable_sva(idxd->pdev); pci_iounmap(pdev, idxd->reg_base); put_device(idxd_confdev(idxd)); pci_disable_device(pdev); -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module unload 2025-05-29 15:34 ` [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module unload Yi Sun @ 2025-05-29 17:04 ` Vinicius Costa Gomes 2025-05-30 3:06 ` Yi Sun 0 siblings, 1 reply; 10+ messages in thread From: Vinicius Costa Gomes @ 2025-05-29 17:04 UTC (permalink / raw) To: Yi Sun, dave.jiang, dmaengine, linux-kernel; +Cc: yi.sun, xueshuai, gordon.jin Yi Sun <yi.sun@intel.com> writes: > A recent refactor introduced a misplaced put_device() call, resulting in > reference count underflow when the module is unloaded. > > Expand the idxd_cleanup() function to handle proper cleanup, and remove > idxd_cleanup_internals() as it was not part of the driver unload path. > 'idxd_cleanup_internals()' frees a bunch of stuff. I would expect an explanation of when those things are being free'd now that removed that call. > Fixes: a409e919ca32 ("dmaengine: idxd: Refactor remove call with idxd_cleanup() helper") > Signed-off-by: Yi Sun <yi.sun@intel.com> > > diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c > index 504aca0fd597..a5eabeb6a8bd 100644 > --- a/drivers/dma/idxd/init.c > +++ b/drivers/dma/idxd/init.c > @@ -1321,7 +1321,12 @@ static void idxd_remove(struct pci_dev *pdev) > device_unregister(idxd_confdev(idxd)); > idxd_shutdown(pdev); > idxd_device_remove_debugfs(idxd); > - idxd_cleanup(idxd); > + perfmon_pmu_remove(idxd); > + idxd_cleanup_interrupts(idxd); > + if (device_pasid_enabled(idxd)) > + idxd_disable_system_pasid(idxd); > + if (device_user_pasid_enabled(idxd)) > + idxd_disable_sva(idxd->pdev); > pci_iounmap(pdev, idxd->reg_base); > put_device(idxd_confdev(idxd)); > pci_disable_device(pdev); > -- > 2.43.0 > -- Vinicius ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module unload 2025-05-29 17:04 ` Vinicius Costa Gomes @ 2025-05-30 3:06 ` Yi Sun 2025-05-30 5:39 ` Shuai Xue 0 siblings, 1 reply; 10+ messages in thread From: Yi Sun @ 2025-05-30 3:06 UTC (permalink / raw) To: Vinicius Costa Gomes Cc: dave.jiang, dmaengine, linux-kernel, xueshuai, gordon.jin On 29.05.2025 10:04, Vinicius Costa Gomes wrote: >Yi Sun <yi.sun@intel.com> writes: > >> A recent refactor introduced a misplaced put_device() call, resulting in >> reference count underflow when the module is unloaded. >> >> Expand the idxd_cleanup() function to handle proper cleanup, and remove >> idxd_cleanup_internals() as it was not part of the driver unload path. >> > >'idxd_cleanup_internals()' frees a bunch of stuff. I would expect an >explanation of when those things are being free'd now that removed that >call. > I believe the call to idxd_unregister_devices(), which is invoked at the very beginning of idxd_remove(), already takes care of the necessary put_device() through the following call path: idxd_unregister_devices() -> device_unregister() -> put_device() Therefore, there's no need to add additional put_device() calls for idxd groups, engines, or workqueues. While the commit message for a409e919ca3 states: "Note, this also fixes the missing put_device() for idxd groups, engines, and wqs." it appears that no such omission actually existed, this part of the flow was already correctly handled. Moreover, this refcount underflow issue appears to be a a clear regression. Prior to this commit, idxd_cleanup_internals() was not part of the driver unload path. The commit did not provide a strong justification for calling idxd_cleanup_internals() within idxd_cleanup(). For reference, the both two related bugs produce nearly identical call traces, and I think both are blocking issues. Thanks --Sun, Yi ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module unload 2025-05-30 3:06 ` Yi Sun @ 2025-05-30 5:39 ` Shuai Xue 2025-05-30 5:58 ` Yi Sun 0 siblings, 1 reply; 10+ messages in thread From: Shuai Xue @ 2025-05-30 5:39 UTC (permalink / raw) To: Yi Sun, Vinicius Costa Gomes Cc: dave.jiang, dmaengine, linux-kernel, gordon.jin 在 2025/5/30 11:06, Yi Sun 写道: > On 29.05.2025 10:04, Vinicius Costa Gomes wrote: >> Yi Sun <yi.sun@intel.com> writes: >> >>> A recent refactor introduced a misplaced put_device() call, resulting in >>> reference count underflow when the module is unloaded. >>> >>> Expand the idxd_cleanup() function to handle proper cleanup, and remove >>> idxd_cleanup_internals() as it was not part of the driver unload path. >>> >> >> 'idxd_cleanup_internals()' frees a bunch of stuff. I would expect an >> explanation of when those things are being free'd now that removed that >> call. >> > > I believe the call to idxd_unregister_devices(), which is invoked at the > very beginning of idxd_remove(), already takes care of the necessary > put_device() through the following call path: > idxd_unregister_devices() -> device_unregister() -> put_device() > > Therefore, there's no need to add additional put_device() calls for idxd > groups, engines, or workqueues. While the commit message for a409e919ca3 > states: "Note, this also fixes the missing put_device() for idxd groups, > engines, and wqs." > it appears that no such omission actually existed, this part of the flow > was already correctly handled. > > Moreover, this refcount underflow issue appears to be a a clear > regression. Prior to this commit, idxd_cleanup_internals() was not part of > the driver unload path. The commit did not provide a strong justification > for calling idxd_cleanup_internals() within idxd_cleanup(). > > For reference, the both two related bugs produce nearly identical call > traces, and I think both are blocking issues. > > Thanks > --Sun, Yi Hi, Sun, Yi idxd_pci_probe_alloc() { rc = idxd_probe(idxd); rc = idxd_register_devices(idxd); if (rc) { dev_err(dev, "IDXD sysfs setup failed\n"); goto err_dev_register; } // ... err_dev_register: <- idxd_cleanup(idxd); } We use idxd_cleanup() when register devices failed to undo the idxd_probe(). idxd_probe() sets up idxd groups, engine and wqs and get reference counts by device_initialize(). what's the difference between idxd_cleanup_internals() and idxd_unregister_devices()? Thanks. Shuai ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module unload 2025-05-30 5:39 ` Shuai Xue @ 2025-05-30 5:58 ` Yi Sun 0 siblings, 0 replies; 10+ messages in thread From: Yi Sun @ 2025-05-30 5:58 UTC (permalink / raw) To: Shuai Xue Cc: Vinicius Costa Gomes, dave.jiang, dmaengine, linux-kernel, gordon.jin On 30.05.2025 13:39, Shuai Xue wrote: > > >在 2025/5/30 11:06, Yi Sun 写道: >>On 29.05.2025 10:04, Vinicius Costa Gomes wrote: >>>Yi Sun <yi.sun@intel.com> writes: >>> >>>>A recent refactor introduced a misplaced put_device() call, resulting in >>>>reference count underflow when the module is unloaded. >>>> >>>>Expand the idxd_cleanup() function to handle proper cleanup, and remove >>>>idxd_cleanup_internals() as it was not part of the driver unload path. >>>> >>> >>>'idxd_cleanup_internals()' frees a bunch of stuff. I would expect an >>>explanation of when those things are being free'd now that removed that >>>call. >>> >> >>I believe the call to idxd_unregister_devices(), which is invoked at the >>very beginning of idxd_remove(), already takes care of the necessary >>put_device() through the following call path: >>idxd_unregister_devices() -> device_unregister() -> put_device() >> >>Therefore, there's no need to add additional put_device() calls for idxd >>groups, engines, or workqueues. While the commit message for a409e919ca3 >>states: "Note, this also fixes the missing put_device() for idxd groups, >>engines, and wqs." >>it appears that no such omission actually existed, this part of the flow >>was already correctly handled. >> >>Moreover, this refcount underflow issue appears to be a a clear >>regression. Prior to this commit, idxd_cleanup_internals() was not part of >>the driver unload path. The commit did not provide a strong justification >>for calling idxd_cleanup_internals() within idxd_cleanup(). >> >>For reference, the both two related bugs produce nearly identical call >>traces, and I think both are blocking issues. >> >>Thanks >> --Sun, Yi > >Hi, Sun, Yi > >idxd_pci_probe_alloc() { > rc = idxd_probe(idxd); > rc = idxd_register_devices(idxd); > if (rc) { > dev_err(dev, "IDXD sysfs setup failed\n"); > goto err_dev_register; > } >// ... > err_dev_register: <- > idxd_cleanup(idxd); >} > >We use idxd_cleanup() when register devices failed to undo the >idxd_probe(). idxd_probe() sets up idxd groups, engine and wqs and >get reference counts by device_initialize(). > >what's the difference between idxd_cleanup_internals() and >idxd_unregister_devices()? Hi Shuai, Prior to the commit, I meant the function idxd_remove() did not invoke idxd_cleanup() or idxd_cleanup_internals(), and it was able to handle reference counts correctly via idxd_unregister_devices(). However, the code refactor introduced the use of the idxd_cleanup() helper, which internally calls idxd_cleanup_internals(). This results in a duplicate put_device() call, leading to a reference count underflow issue. Thanks --Sun, Yi ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free 2025-05-29 15:34 [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free Yi Sun 2025-05-29 15:34 ` [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module unload Yi Sun @ 2025-05-29 16:56 ` Vinicius Costa Gomes 2025-05-30 0:24 ` Yi Sun 1 sibling, 1 reply; 10+ messages in thread From: Vinicius Costa Gomes @ 2025-05-29 16:56 UTC (permalink / raw) To: Yi Sun, dave.jiang, dmaengine, linux-kernel; +Cc: yi.sun, xueshuai, gordon.jin Hi, Yi Sun <yi.sun@intel.com> writes: > The put_device() call can be asynchronous cleanup via schedule_delayed_work > when CONFIG_DEBUG_KOBJECT_RELEASE is set. This results in a use-after-free > failure during module unloading if invoking idxd_free() immediately > afterward. > I think that adding the relevant part of the log would be helpful. (I am looking at either a similar, or this exact problem, so at least to me it would be helpful) > Removes the improper call idxd_free() to prevent potential memory > corruption. Thinking if it would be worth a Fixes: tag. > > Signed-off-by: Yi Sun <yi.sun@intel.com> > > diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c > index 760b7d81fcd8..504aca0fd597 100644 > --- a/drivers/dma/idxd/init.c > +++ b/drivers/dma/idxd/init.c > @@ -1324,7 +1324,6 @@ static void idxd_remove(struct pci_dev *pdev) > idxd_cleanup(idxd); > pci_iounmap(pdev, idxd->reg_base); > put_device(idxd_confdev(idxd)); > - idxd_free(idxd); > pci_disable_device(pdev); > } > > -- > 2.43.0 > -- Vinicius ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free 2025-05-29 16:56 ` [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free Vinicius Costa Gomes @ 2025-05-30 0:24 ` Yi Sun 2025-05-30 1:07 ` Vinicius Costa Gomes 0 siblings, 1 reply; 10+ messages in thread From: Yi Sun @ 2025-05-30 0:24 UTC (permalink / raw) To: Vinicius Costa Gomes Cc: dave.jiang, dmaengine, linux-kernel, xueshuai, gordon.jin On 29.05.2025 09:56, Vinicius Costa Gomes wrote: >Hi, > >Yi Sun <yi.sun@intel.com> writes: > >> The put_device() call can be asynchronous cleanup via schedule_delayed_work >> when CONFIG_DEBUG_KOBJECT_RELEASE is set. This results in a use-after-free >> failure during module unloading if invoking idxd_free() immediately >> afterward. >> > >I think that adding the relevant part of the log would be helpful. (I am >looking at either a similar, or this exact problem, so at least to me it >would be helpful) > The issue is easily reproducible: unloading the module with 'modprobe -r idxd' can trigger the call trace so long as a idxd_free() is called immediately after the put_device(). I can include the call trace in the next version commit log if it's helpful. [ 1957.463315] refcount_t: underflow; use-after-free. [ 1957.463337] WARNING: CPU: 15 PID: 4428 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110 ... ... [ 1957.463424] RIP: 0010:refcount_warn_saturate+0xbe/0x110 ... ... [ 1957.463445] Call Trace: [ 1957.463450] <TASK> [ 1957.463458] idxd_remove+0xe4/0x120 [idxd] [ 1957.463497] pci_device_remove+0x3f/0xb0 [ 1957.463505] device_release_driver_internal+0x197/0x200 [ 1957.463513] driver_detach+0x48/0x90 [ 1957.463515] bus_remove_driver+0x74/0xf0 [ 1957.463521] pci_unregister_driver+0x2e/0xb0 [ 1957.463524] idxd_exit_module+0x34/0x7a0 [idxd] [ 1957.463529] __do_sys_delete_module.constprop.0+0x183/0x280 [ 1957.463536] ? syscall_trace_enter+0x163/0x1c0 [ 1957.463540] do_syscall_64+0x54/0xd70 [ 1957.463549] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 1957.463555] RIP: 0033:0x7fb52b10ee2b >> Removes the improper call idxd_free() to prevent potential memory >> corruption. > >Thinking if it would be worth a Fixes: tag. > Yes, sure. Will add Fixes: tag next version. Thanks --Sun, Yi ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free 2025-05-30 0:24 ` Yi Sun @ 2025-05-30 1:07 ` Vinicius Costa Gomes 2025-05-30 1:42 ` Yi Sun 0 siblings, 1 reply; 10+ messages in thread From: Vinicius Costa Gomes @ 2025-05-30 1:07 UTC (permalink / raw) To: Yi Sun; +Cc: dave.jiang, dmaengine, linux-kernel, xueshuai, gordon.jin Yi Sun <yi.sun@intel.com> writes: > On 29.05.2025 09:56, Vinicius Costa Gomes wrote: >>Hi, >> >>Yi Sun <yi.sun@intel.com> writes: >> >>> The put_device() call can be asynchronous cleanup via schedule_delayed_work >>> when CONFIG_DEBUG_KOBJECT_RELEASE is set. This results in a use-after-free >>> failure during module unloading if invoking idxd_free() immediately >>> afterward. >>> >> >>I think that adding the relevant part of the log would be helpful. (I am >>looking at either a similar, or this exact problem, so at least to me it >>would be helpful) >> > The issue is easily reproducible: unloading the module with 'modprobe -r idxd' > can trigger the call trace so long as a idxd_free() is called immediately > after the put_device(). > Most probably the same issue I am looking at then. > I can include the call trace in the next version commit log if it's helpful. > Yes, that's helpful. Usually it's better to include more information in the commit message. > [ 1957.463315] refcount_t: underflow; use-after-free. > [ 1957.463337] WARNING: CPU: 15 PID: 4428 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110 > ... ... > [ 1957.463424] RIP: 0010:refcount_warn_saturate+0xbe/0x110 > ... ... > [ 1957.463445] Call Trace: > [ 1957.463450] <TASK> > [ 1957.463458] idxd_remove+0xe4/0x120 [idxd] > [ 1957.463497] pci_device_remove+0x3f/0xb0 > [ 1957.463505] device_release_driver_internal+0x197/0x200 > [ 1957.463513] driver_detach+0x48/0x90 > [ 1957.463515] bus_remove_driver+0x74/0xf0 > [ 1957.463521] pci_unregister_driver+0x2e/0xb0 > [ 1957.463524] idxd_exit_module+0x34/0x7a0 [idxd] > [ 1957.463529] __do_sys_delete_module.constprop.0+0x183/0x280 > [ 1957.463536] ? syscall_trace_enter+0x163/0x1c0 > [ 1957.463540] do_syscall_64+0x54/0xd70 > [ 1957.463549] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [ 1957.463555] RIP: 0033:0x7fb52b10ee2b > >>> Removes the improper call idxd_free() to prevent potential memory >>> corruption. >> >>Thinking if it would be worth a Fixes: tag. >> > Yes, sure. Will add Fixes: tag next version. > > Thanks > --Sun, Yi Thank you, -- Vinicius ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free 2025-05-30 1:07 ` Vinicius Costa Gomes @ 2025-05-30 1:42 ` Yi Sun 0 siblings, 0 replies; 10+ messages in thread From: Yi Sun @ 2025-05-30 1:42 UTC (permalink / raw) To: Vinicius Costa Gomes Cc: dave.jiang, dmaengine, linux-kernel, xueshuai, gordon.jin On 29.05.2025 18:07, Vinicius Costa Gomes wrote: >Yi Sun <yi.sun@intel.com> writes: > >> On 29.05.2025 09:56, Vinicius Costa Gomes wrote: >>>Hi, >>> >>>Yi Sun <yi.sun@intel.com> writes: >>> >>>> The put_device() call can be asynchronous cleanup via schedule_delayed_work >>>> when CONFIG_DEBUG_KOBJECT_RELEASE is set. This results in a use-after-free >>>> failure during module unloading if invoking idxd_free() immediately >>>> afterward. >>>> >>> >>>I think that adding the relevant part of the log would be helpful. (I am >>>looking at either a similar, or this exact problem, so at least to me it >>>would be helpful) >>> >> The issue is easily reproducible: unloading the module with 'modprobe -r idxd' >> can trigger the call trace so long as a idxd_free() is called immediately >> after the put_device(). >> > >Most probably the same issue I am looking at then. > Yes, and actually, the other patch (patch 2/2: "dmaengine: idxd: Fix refcount underflow on module unload") addresses a similar issue which is triggered in the same function and resulting in nearly identical call traces. There are two separate code changes in idxd/init.c that can lead to a 'refcount_t: underflow; use-after-free' issue, both caused by incorrect use of put_device(). Thanks --Sun, Yi ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-05-30 5:58 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-29 15:34 [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free Yi Sun 2025-05-29 15:34 ` [PATCH 2/2] dmaengine: idxd: Fix refcount underflow on module unload Yi Sun 2025-05-29 17:04 ` Vinicius Costa Gomes 2025-05-30 3:06 ` Yi Sun 2025-05-30 5:39 ` Shuai Xue 2025-05-30 5:58 ` Yi Sun 2025-05-29 16:56 ` [PATCH 1/2] dmaengine: idxd: Remove improper idxd_free Vinicius Costa Gomes 2025-05-30 0:24 ` Yi Sun 2025-05-30 1:07 ` Vinicius Costa Gomes 2025-05-30 1:42 ` Yi Sun
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).