* [PATCH] dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc() error paths
@ 2026-06-15 10:39 Bogdan Codres (Wind River)
From: Bogdan Codres (Wind River) @ 2026-06-15 10:39 UTC (permalink / raw)
To: dmaengine, linux-kernel
Cc: vkoul, dave.jiang, vinicius.gomes, xueshuai, yi.sun, fenghuay,
dan.carpenter, gregkh, stable
To: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Vinod Koul <vkoul@kernel.org>,
Dave Jiang <dave.jiang@intel.com>,
Vinicius Costa Gomes <vinicius.gomes@intel.com>,
Shuai Xue <xueshuai@linux.alibaba.com>,
Yi Sun <yi.sun@intel.com>,
Fenghua Yu <fenghuay@nvidia.com>,
Dan Carpenter <dan.carpenter@linaro.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org
Hi,
This patch fixes a double-free / use-after-free bug in the IDXD driver's
probe error path that corrupts the slab allocator and crashes the kernel.
The bug was introduced by commit 90022b3a6981 ("dmaengine: idxd: fix memory
leak in error handling path of idxd_pci_probe") which added the idxd_free()
helper.
Root Cause
----------
idxd_free() performs:

    static void idxd_free(struct idxd_device *idxd)
    {
        if (!idxd)
            return;

        put_device(idxd_confdev(idxd));   // (1) triggers release callback
        bitmap_free(idxd->opcap_bmap);    // (2) USE AFTER FREE
        ida_free(&idxd_ida, idxd->id);    // (3) DOUBLE ida_free
        kfree(idxd);                      // (4) DOUBLE kfree
    }

Since device_initialize() was called in idxd_alloc(), conf_dev has
refcount=1. Step (1) drops it to 0 and synchronously triggers:
put_device() -> kobject_put() -> kobject_release() -> kobject_cleanup()
-> device_release() -> dev->type->release -> idxd_conf_device_release()
idxd_conf_device_release() (in sysfs.c) already does:

    static void idxd_conf_device_release(struct device *dev)
    {
        struct idxd_device *idxd = confdev_to_idxd(dev);

        kfree(idxd->groups);
        bitmap_free(idxd->wq_enable_map);
        kfree(idxd->wqs);
        kfree(idxd->engines);
        kfree(idxd->evl);
        kmem_cache_destroy(idxd->evl_cache);
        ida_free(&idxd_ida, idxd->id);    // <- FIRST ida_free
        bitmap_free(idxd->opcap_bmap);    // <- FIRST bitmap_free
        kfree(idxd);                      // <- FIRST kfree
    }

So after put_device() returns in idxd_free():
- idxd pointer is dangling (memory freed)
- idxd->opcap_bmap is dangling
- idxd->id has already been freed from the IDA
Steps 2-4 then operate on freed memory, corrupting the slab allocator.
The same pattern exists in idxd_alloc() at the err_name label.
How to Reproduce
----------------
This occurs during kdump (crash dump collection) on systems with
Intel IDXD hardware:
1. System has Intel IDXD (DSA/IAX) -- e.g., Granite Rapids / Sapphire
Rapids platforms
2. Original kernel panics (any reason)
3. Kdump kernel boots with: reset_devices nr_cpus=1
4. IDXD device is in HALTED state due to reset_devices
5. IDXD driver probes the device -> probe fails -> idxd_free() ->
double-free -> slab corruption
6. systemd-udevd loads next module -> module signature verification
allocates memory -> hits corrupted slab -> kernel oops
Console Output (kdump kernel)
-----------------------------
[ 18.628791] idxd 0000:00:01.0: Device is HALTED!
[ 18.631447] idxd 0000:00:01.0: Intel(R) IDXD DMA Engine init failed
[ 18.631450] ------------[ cut here ]------------
[ 18.631451] ida_free called for id=0 which is not allocated.
[ 18.631462] WARNING: CPU: 0 PID: 11 at lib/idr.c:525 ida_free+0xd3/0x130
[ 18.631502] idxd_pci_probe+0x1b0/0x1860 [idxd]
...
[ 18.898798] BUG: unable to handle page fault for address: ff2c9dd300000010
[ 18.931865] RIP: 0010:___slab_alloc+0x168/0xa10
...
[ 19.097220] __kmalloc_cache_noprof+0x82/0x230
[ 19.102683] mpi_alloc+0x20/0x80
[ 19.106676] rsa_enc+0x2f/0x120
[ 19.110549] pkcs1pad_verify+0x13b/0x1a0
...
[ 19.161968] module_sig_check+0x87/0xe0
[ 19.166709] load_module+0x3c/0x1e80
Affected Versions
-----------------
- Mainline: present at HEAD (introduced Apr 2025)
- Stable: v6.12.30+ (backport commit 017d4012dc05)
- Also present in other stable branches that received the backport
Test Platform
-------------
- Dell PowerEdge XR8720t
- Intel Xeon 6716P-B (Granite Rapids)
- Kernel: 6.12.0-1-rt-amd64 (StarlingX 6.12.40-1.stx.140)
- RT: PREEMPT_RT
Why This Was Not Caught Earlier
-------------------------------
1. The error path only triggers when IDXD device is HALTED -- this
only happens with reset_devices (kdump) or hardware error
2. On normal boot, IDXD probe always succeeds
3. Most kdump configurations blacklist IDXD via module_blacklist=
4. Systems without IDXD hardware are unaffected
5. The ida_free WARNING alone doesn't crash -- it's the subsequent
slab corruption that causes the fatal oops, which may appear as
an unrelated bug
Workaround
----------
Add idxd to module_blacklist in the kdump kernel command line:
module_blacklist=idxd,idxd_bus
Fix
---
Remove the duplicate bitmap_free/ida_free/kfree from idxd_free()
since idxd_conf_device_release() (triggered by put_device()) already
handles all resource deallocation. Similarly fix idxd_alloc() err_name
path.
Related Commits
---------------
- 90022b3a6981 ("dmaengine: idxd: fix memory leak in error handling
path of idxd_pci_probe") -- introduces the bug
- 46a5cca76c76 ("dmaengine: idxd: fix memory leak in error handling
path of idxd_alloc") -- same pattern in idxd_alloc
- f41c538881ee ("dmaengine: idxd: Remove improper idxd_free") -- fixes
the same function but only in idxd_remove(), not probe error path
- c311f5e9248471 ("dmaengine: idxd: Fix freeing the allocated ida too
late") -- establishes the correct pattern for cdev (ida_free before
put_device, not in .release())
Thanks,
Bogdan

Bogdan Codres (1):
  dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc()
    error paths

 drivers/dma/idxd/init.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

--
2.43.0

* [PATCH] dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc() error paths
@ 2026-06-15 10:39 Bogdan Codres (Wind River)
From: Bogdan Codres (Wind River) @ 2026-06-15 10:39 UTC (permalink / raw)
To: dmaengine, linux-kernel
Cc: vkoul, dave.jiang, vinicius.gomes, xueshuai, yi.sun, fenghuay,
dan.carpenter, gregkh, stable, Bogdan Codres
From: Bogdan Codres <bogdan.codres@windriver.com>
We have the following backtrace:
[ 18.628791] idxd 0000:00:01.0: Device is HALTED!
[ 18.631447] idxd 0000:00:01.0: Intel(R) IDXD DMA Engine init failed
[ 18.631450] ------------[ cut here ]------------
[ 18.631451] ida_free called for id=0 which is not allocated.
[ 18.631462] WARNING: CPU: 0 PID: 11 at lib/idr.c:525 ida_free+0xd3/0x130
[ 18.631467] Modules linked in: idxd(+) idxd_bus wmi zl3073x_spi regmap_spi zl3073x_i2c zl3073x i2c_mux_pca954x i2c_mux ipmi_si acpi_power_meter i2c_designware_platform i2c_designware_core acpi_ipmi ipmi_devintf ipmi_msghandler
[ 18.631474] CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.12.0-1-rt-amd64 #1 Debian 6.12.40-1.stx.140
[ 18.631477] Hardware name: Dell Inc. PowerEdge XR8720t/0J91KV, BIOS 1.1.3 02/03/2026
[ 18.631478] Workqueue: events work_for_cpu_fn
[ 18.631480] RIP: 0010:ida_free+0xd3/0x130
[ 18.631482] Code: 62 ff 31 f6 48 89 e7 e8 bb 1b 02 00 eb 5a 83 fb 3e 76 36 48 8b 3c 24 e8 ab 74 03 00 89 ee 48 c7 c7 70 d6 bd b4 e8 7d 1e 36 ff <0f> 0b 48 8b 44 24 38 65 48 2b 04 25 28 00 00 00 75 37 48 83 c4 40
[ 18.631484] RSP: 0018:ff59485680267d58 EFLAGS: 00010282
[ 18.631485] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb53064c8
[ 18.631486] RDX: 0000000000020940 RSI: 0000000000000000 RDI: ffffffffb53365d0
[ 18.631487] RBP: 0000000000000000 R08: 0000000000000000 R09: ff59485680267b40
[ 18.631487] R10: ff59485680267b38 R11: ffffffffb5336508 R12: 0000000000000000
[ 18.631488] R13: ff2c9dd3800730c8 R14: 0000000000000000 R15: ff2c9dd38385d800
[ 18.631489] FS: 0000000000000000(0000) GS:ff2c9dd3fdc00000(0000) knlGS:0000000000000000
[ 18.631490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 18.631491] CR2: 000055e2e7678098 CR3: 0000002003450005 CR4: 0000000000771ef0
[ 18.631492] PKRU: 55555554
[ 18.631492] Call Trace:
[ 18.631494] <TASK>
[ 18.631495] idxd_pci_probe+0x1b0/0x1860 [idxd]
[ 18.631502] ? set_next_entity+0xcb/0x1b0
[ 18.631506] local_pci_probe+0x43/0xa0
[ 18.631508] work_for_cpu_fn+0x13/0x20
[ 18.631510] process_one_work+0x179/0x390
[ 18.631512] worker_thread+0x237/0x340
[ 18.631515] ? __pfx_worker_thread+0x10/0x10
[ 18.631517] kthread+0xc6/0x100
[ 18.631519] ? __pfx_kthread+0x10/0x10
[ 18.631520] ret_from_fork+0x2d/0x50
[ 18.631523] ? __pfx_kthread+0x10/0x10
[ 18.631524] ret_from_fork_asm+0x1a/0x30
[ 18.631526] </TASK>
[ 18.631527] ---[ end trace 0000000000000000 ]---
When an IDXD device probe fails (e.g., device is HALTED), the error
path in idxd_pci_probe() calls idxd_free() which performs:
1. put_device(idxd_confdev(idxd))
2. bitmap_free(idxd->opcap_bmap)
3. ida_free(&idxd_ida, idxd->id)
4. kfree(idxd)
However, since device_initialize() was already called in idxd_alloc(),
the conf_dev has a refcount of 1. The put_device() in step 1 drops
this to 0 and synchronously invokes idxd_conf_device_release() via:
put_device() -> kobject_put() -> kobject_release() -> kobject_cleanup()
-> device_release() -> dev->type->release -> idxd_conf_device_release()
idxd_conf_device_release() already performs:
ida_free(&idxd_ida, idxd->id);
bitmap_free(idxd->opcap_bmap);
kfree(idxd);
Therefore steps 2-4 in idxd_free() operate on already-freed memory:
- step 2: bitmap_free on dangling pointer (use-after-free)
- step 3: ida_free on already-released ID, triggering:
"ida_free called for id=0 which is not allocated"
- step 4: double kfree() corrupts slab freelist metadata
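Fix this by dropping the duplicate bitmap_free(), ida_free() and kfree()
calls from idxd_free(), and by returning straight after put_device() at
the err_name label in idxd_alloc(), so that idxd_conf_device_release()
is the only place these resources are freed.

This follows the rule applied in commit c311f5e9248471a950 ("dmaengine:
idxd: Fix freeing the allocated ida too late"), where ida_free() was
removed from the cdev .release() callback: a resource must not be freed
both in the .release() callback and by the caller of put_device().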
The path is extremely rare in normal operation because:
1. IDXD probe only fails when the device is in HALTED state
2. The device enters HALTED state exclusively after reset_devices
(kdump boot parameter) or unrecoverable hardware error
3. On a normally running system, IDXD probe always succeeds
Fixes: 90022b3a6981 ("dmaengine: idxd: fix memory leak in error handling path of idxd_pci_probe")
Fixes: 46a5cca76c76 ("dmaengine: idxd: fix memory leak in error handling path of idxd_alloc")
Cc: stable@vger.kernel.org
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Yi Sun <yi.sun@intel.com>
Cc: Fenghua Yu <fenghuay@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Bogdan Codres <bogdan.codres@windriver.com>
---
drivers/dma/idxd/init.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index e55136bb5..b76f0d12b 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -586,15 +586,18 @@ static void idxd_read_caps(struct idxd_device *idxd)
 		idxd->hw.iaa_cap.bits = ioread64(idxd->reg_base + IDXD_IAACAP_OFFSET);
 }
 
+/*
+ * Release an idxd device that was allocated (device_initialize() was called)
+ * but never successfully registered. put_device() drops the last reference and
+ * triggers idxd_conf_device_release() which frees all resources including the
+ * ida, opcap_bmap, and the idxd structure itself.
+ */
 static void idxd_free(struct idxd_device *idxd)
 {
 	if (!idxd)
 		return;
 
 	put_device(idxd_confdev(idxd));
-	bitmap_free(idxd->opcap_bmap);
-	ida_free(&idxd_ida, idxd->id);
-	kfree(idxd);
 }
 
 static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_data *data)
@@ -634,13 +637,16 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
 	return idxd;
 
 err_name:
+	/* device_initialize() was called, so put_device() will trigger
+	 * idxd_conf_device_release() which frees ida, opcap_bmap, and idxd.
+	 * Do not fall through to err_opcap/err_ida.
+	 */
 	put_device(conf_dev);
-	bitmap_free(idxd->opcap_bmap);
+	return NULL;
 err_opcap:
 	ida_free(&idxd_ida, idxd->id);
 err_ida:
 	kfree(idxd);
-
 	return NULL;
 }
 
--
2.43.0

* Re: [PATCH] dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc() error paths
@ 2026-06-15 11:02 sashiko-bot
From: sashiko-bot @ 2026-06-15 11:02 UTC (permalink / raw)
Cc: vkoul, Frank.Li, dmaengine
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
Pre-existing issues:
- [High] Unconditional call to `destroy_workqueue(idxd->wq)` in `idxd_conf_device_release()` causes a guaranteed kernel panic on early probe or allocation failures.
- [High] Late probe failures trigger sequential manual cleanup (`idxd_cleanup_internals`) and `idxd_conf_device_release()`, resulting in massive double-frees and a double-destroy of the workqueue.
- [High] Sub-component configuration and cleanup functions (`idxd_clean_wqs`, `idxd_clean_groups`, `idxd_clean_engines`) suffer from the exact same double-free anti-pattern that the patch intended to fix in the main device.
--
commit 14c25ba1a63a6d7ab71af3fd079660a9cd76f396
Author: Bogdan Codres <bogdan.codres@windriver.com>
dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc() error paths
This commit removes redundant cleanup calls in the error paths of idxd_alloc
and idxd_free. By relying on put_device to synchronously trigger the release
handler idxd_conf_device_release, which already frees these resources, it
prevents double-free scenarios during initialization failures.
> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index f1cfc7790d950..e2b5fe072d4ac 100644
> --- a/drivers/dma/idxd/init.c
> +++ b/drivers/dma/idxd/init.c
[ ... ]
> @@ -601,15 +601,18 @@ static void idxd_read_caps(struct idxd_device *idxd)
[ ... ]
> +/*
> + * Release an idxd device that was allocated (device_initialize() was called)
> + * but never successfully registered. put_device() drops the last reference and
> + * triggers idxd_conf_device_release() which frees all resources including the
> + * ida, opcap_bmap, and the idxd structure itself.
> + */
>  static void idxd_free(struct idxd_device *idxd)
>  {
>  	if (!idxd)
>  		return;
>
>  	put_device(idxd_confdev(idxd));
> -	bitmap_free(idxd->opcap_bmap);
> -	ida_free(&idxd_ida, idxd->id);
> -	kfree(idxd);
>  }
[Severity: High]
This is a pre-existing issue, but does this same double-free pattern still
exist for late probe failures?
If idxd_probe() fails late in the initialization sequence, the error path
calls idxd_cleanup(), which delegates to idxd_cleanup_internals().
In drivers/dma/idxd/init.c:idxd_cleanup_internals():
...
destroy_workqueue(idxd->wq);
The error path then falls through to idxd_free(), which calls put_device().
This triggers idxd_conf_device_release() in drivers/dma/idxd/sysfs.c, which
blindly executes destroy_workqueue(idxd->wq) and kfree() on internal arrays
a second time. Can this cause a double-destroy and memory corruption?
[Severity: High]
This isn't a bug introduced by this patch, but do the sub-component cleanup
functions suffer from the same double-free pattern?
Looking at idxd_clean_wqs() in drivers/dma/idxd/init.c:
...
put_device(conf_dev);
kfree(wq);
The put_device() call drops the reference to zero and invokes the release
handler idxd_conf_wq_release(), which already performs kfree(wq). Does
calling kfree(wq) immediately after put_device() result in a double-free?
Similar patterns appear to exist in idxd_clean_groups() and
idxd_clean_engines().
[ ... ]
> @@ -649,13 +652,16 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
>  	return idxd;
>
>  err_name:
> +	/* device_initialize() was called, so put_device() will trigger
> +	 * idxd_conf_device_release() which frees ida, opcap_bmap, and idxd.
> +	 * Do not fall through to err_opcap/err_ida.
> +	 */
>  	put_device(conf_dev);
> -	bitmap_free(idxd->opcap_bmap);
> +	return NULL;
[Severity: High]
This is a pre-existing issue, but will this cause a NULL pointer dereference
during early failures?
If dev_set_name() fails, idxd->wq has not been allocated yet and remains NULL.
The error path calls put_device(conf_dev) which invokes the release callback:
drivers/dma/idxd/sysfs.c:idxd_conf_device_release() {
...
destroy_workqueue(idxd->wq);
Since the upstream kernel's destroy_workqueue() does not safely handle NULL
pointers, won't this unconditionally crash the system on early probe or
allocation failures?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260615103932.61828-2-bogdan.codres@windriver.com?part=1