DMA Engine development
* [PATCH] dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc() error paths
@ 2026-06-15 10:39 Bogdan Codres (Wind River)
  2026-06-15 10:39 ` Bogdan Codres (Wind River)
  0 siblings, 1 reply; 3+ messages in thread
From: Bogdan Codres (Wind River) @ 2026-06-15 10:39 UTC (permalink / raw)
  To: dmaengine, linux-kernel
  Cc: vkoul, dave.jiang, vinicius.gomes, xueshuai, yi.sun, fenghuay,
	dan.carpenter, gregkh, stable

To: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Vinod Koul <vkoul@kernel.org>,
    Dave Jiang <dave.jiang@intel.com>,
    Vinicius Costa Gomes <vinicius.gomes@intel.com>,
    Shuai Xue <xueshuai@linux.alibaba.com>,
    Yi Sun <yi.sun@intel.com>,
    Fenghua Yu <fenghuay@nvidia.com>,
    Dan Carpenter <dan.carpenter@linaro.org>,
    Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
    stable@vger.kernel.org

Hi,

This patch fixes a double-free / use-after-free bug in the IDXD driver's
probe error path that corrupts the slab allocator and crashes the kernel.
The bug was introduced by commit 90022b3a6981 ("dmaengine: idxd: fix memory
leak in error handling path of idxd_pci_probe") which added the idxd_free()
helper.

Root Cause
----------

idxd_free() performs:

  static void idxd_free(struct idxd_device *idxd)
  {
      if (!idxd)
          return;
      put_device(idxd_confdev(idxd));   // (1) triggers release callback
      bitmap_free(idxd->opcap_bmap);    // (2) USE AFTER FREE
      ida_free(&idxd_ida, idxd->id);    // (3) DOUBLE ida_free
      kfree(idxd);                      // (4) DOUBLE kfree
  }

Since device_initialize() was called in idxd_alloc(), conf_dev has
refcount=1. Step (1) drops it to 0 and synchronously triggers:

  put_device() -> kobject_put() -> kobject_release() -> kobject_cleanup()
    -> device_release() -> dev->type->release -> idxd_conf_device_release()

idxd_conf_device_release() (in sysfs.c) already does:

  static void idxd_conf_device_release(struct device *dev)
  {
      struct idxd_device *idxd = confdev_to_idxd(dev);
      kfree(idxd->groups);
      bitmap_free(idxd->wq_enable_map);
      kfree(idxd->wqs);
      kfree(idxd->engines);
      kfree(idxd->evl);
      kmem_cache_destroy(idxd->evl_cache);
      ida_free(&idxd_ida, idxd->id);    // <- FIRST ida_free
      bitmap_free(idxd->opcap_bmap);    // <- FIRST bitmap_free
      kfree(idxd);                      // <- FIRST kfree
  }

So after put_device() returns in idxd_free():
  - idxd pointer is dangling (memory freed)
  - idxd->opcap_bmap is dangling
  - idxd->id has already been freed from the IDA

Steps 2-4 then operate on freed memory, corrupting the slab allocator.

The same pattern exists in idxd_alloc() at the err_name label.
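
The ownership rule described above can be modelled in user space. The
following is only a sketch under stated assumptions: mock_dev, mock_idxd,
mock_put_device() and the other mock_* names are hypothetical stand-ins
invented for illustration, not kernel APIs. It shows that once the release
callback owns every free, the caller of the final put must not touch the
object again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct mock_dev {
	int refcount;
	void (*release)(struct mock_dev *dev);
};

struct mock_idxd {
	struct mock_dev conf_dev;	/* first member, so a cast recovers the parent */
	unsigned long *opcap_bmap;
};

static int release_calls;

/* Models idxd_conf_device_release(): the only place that frees anything. */
static void mock_release(struct mock_dev *dev)
{
	struct mock_idxd *idxd = (struct mock_idxd *)dev;

	release_calls++;
	free(idxd->opcap_bmap);
	free(idxd);
}

/* Models put_device(): the release callback runs synchronously at zero. */
static void mock_put_device(struct mock_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}

static struct mock_idxd *mock_idxd_alloc(void)
{
	struct mock_idxd *idxd = calloc(1, sizeof(*idxd));

	if (!idxd)
		return NULL;
	idxd->opcap_bmap = calloc(4, sizeof(*idxd->opcap_bmap));
	if (!idxd->opcap_bmap) {
		free(idxd);
		return NULL;
	}
	idxd->conf_dev.refcount = 1;	/* models device_initialize() */
	idxd->conf_dev.release = mock_release;
	return idxd;
}

/* Corrected shape of idxd_free(): drop the reference and nothing else. */
static void mock_idxd_free(struct mock_idxd *idxd)
{
	if (!idxd)
		return;
	mock_put_device(&idxd->conf_dev);
	/* idxd is dangling here; any further free would be the reported bug. */
}

/* Returns how many times the release callback ran for one failed probe. */
int simulate_probe_failure(void)
{
	struct mock_idxd *idxd = mock_idxd_alloc();

	if (!idxd)
		return -1;
	release_calls = 0;
	mock_idxd_free(idxd);
	return release_calls;
}
```

Running the model under a sanitizer with the three extra frees put back into
mock_idxd_free() is one way to see the use-after-free and double free
reported as such.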

How to Reproduce
----------------

This occurs during kdump (crash dump collection) on systems with
Intel IDXD hardware:

  1. System has Intel IDXD (DSA/IAX) -- e.g., Granite Rapids / Sapphire
     Rapids platforms
  2. Original kernel panics (any reason)
  3. Kdump kernel boots with: reset_devices nr_cpus=1
  4. IDXD device is in HALTED state due to reset_devices
  5. IDXD driver probes the device -> probe fails -> idxd_free() ->
     double-free -> slab corruption
  6. systemd-udevd loads next module -> module signature verification
     allocates memory -> hits corrupted slab -> kernel oops

Console Output (kdump kernel)
-----------------------------

  [   18.628791] idxd 0000:00:01.0: Device is HALTED!
  [   18.631447] idxd 0000:00:01.0: Intel(R) IDXD DMA Engine init failed
  [   18.631450] ------------[ cut here ]------------
  [   18.631451] ida_free called for id=0 which is not allocated.
  [   18.631462] WARNING: CPU: 0 PID: 11 at lib/idr.c:525 ida_free+0xd3/0x130
  [   18.631502]  idxd_pci_probe+0x1b0/0x1860 [idxd]
    ...
  [   18.898798] BUG: unable to handle page fault for address: ff2c9dd300000010
  [   18.931865] RIP: 0010:___slab_alloc+0x168/0xa10
    ...
  [   19.097220]  __kmalloc_cache_noprof+0x82/0x230
  [   19.102683]  mpi_alloc+0x20/0x80
  [   19.106676]  rsa_enc+0x2f/0x120
  [   19.110549]  pkcs1pad_verify+0x13b/0x1a0
    ...
  [   19.161968]  module_sig_check+0x87/0xe0
  [   19.166709]  load_module+0x3c/0x1e80

Affected Versions
-----------------

  - Mainline: present at HEAD (introduced Apr 2025)
  - Stable: v6.12.30+ (backport commit 017d4012dc05)
  - Also present in other stable branches that received the backport

Test Platform
-------------

  - Dell PowerEdge XR8720t
  - Intel Xeon 6716P-B (Granite Rapids)
  - Kernel: 6.12.0-1-rt-amd64 (StarlingX 6.12.40-1.stx.140)
  - RT: PREEMPT_RT

Why This Was Not Caught Earlier
-------------------------------

  1. The error path only triggers when IDXD device is HALTED -- this
     only happens with reset_devices (kdump) or hardware error
  2. On a normal boot, IDXD probe is expected to succeed
  3. Most kdump configurations blacklist IDXD via module_blacklist=
  4. Systems without IDXD hardware are unaffected
  5. The ida_free WARNING alone doesn't crash -- it's the subsequent
     slab corruption that causes the fatal oops, which may appear as
     an unrelated bug

Workaround
----------

Add idxd to module_blacklist in the kdump kernel command line:

  module_blacklist=idxd,idxd_bus

Fix
---

Remove the duplicate bitmap_free/ida_free/kfree from idxd_free()
since idxd_conf_device_release() (triggered by put_device()) already
handles all resource deallocation. Fix the idxd_alloc() err_name path
the same way: return NULL right after put_device() instead of falling
through to err_opcap/err_ida.
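
The err_name change can be illustrated with a user-space model of the unwind
ladder. This is a sketch only: enum fail_point, struct counters and every
mock_* name are hypothetical and exist purely to count frees. It mirrors the
label order of idxd_alloc() but is not the driver code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical failure-injection points mirroring the idxd_alloc() labels. */
enum fail_point { FAIL_NONE, FAIL_IDA, FAIL_OPCAP, FAIL_NAME };

/* Counts how often each resource is released, to expose double frees. */
struct counters {
	int id_frees;
	int bmap_frees;
	int obj_frees;
};

struct mock_idxd {
	int id;
	unsigned long *opcap_bmap;
	struct counters *stats;
};

/* Models idxd_conf_device_release(): frees the id, the bitmap and the object. */
static void mock_release(struct mock_idxd *idxd)
{
	idxd->stats->id_frees++;
	idxd->stats->bmap_frees++;
	idxd->stats->obj_frees++;
	free(idxd->opcap_bmap);
	free(idxd);
}

struct mock_idxd *mock_idxd_alloc(enum fail_point fail, struct counters *stats)
{
	struct mock_idxd *idxd = calloc(1, sizeof(*idxd));

	if (!idxd)
		return NULL;
	idxd->stats = stats;

	if (fail == FAIL_IDA)		/* models the id allocation failing */
		goto err_ida;
	idxd->id = 0;

	if (fail == FAIL_OPCAP)		/* models the bitmap allocation failing */
		goto err_opcap;
	idxd->opcap_bmap = calloc(4, sizeof(*idxd->opcap_bmap));
	if (!idxd->opcap_bmap)
		goto err_opcap;

	/* From here on (device_initialize() in the driver) release owns it all. */
	if (fail == FAIL_NAME)
		goto err_name;
	return idxd;

err_name:
	mock_release(idxd);		/* models put_device() dropping the last ref */
	return NULL;			/* do not fall through to the manual frees */
err_opcap:
	stats->id_frees++;
err_ida:
	stats->obj_frees++;
	free(idxd);
	return NULL;
}
```

In this model every failure point releases each resource at most once.
Deleting the early return under err_name brings back the fall-through and
pushes the id and object counts to two.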

Related Commits
---------------

  - 90022b3a6981 ("dmaengine: idxd: fix memory leak in error handling
    path of idxd_pci_probe") -- introduces the bug
  - 46a5cca76c76 ("dmaengine: idxd: fix memory leak in error handling
    path of idxd_alloc") -- same pattern in idxd_alloc
  - f41c538881ee ("dmaengine: idxd: Remove improper idxd_free") -- fixes
    the same function but only in idxd_remove(), not probe error path
  - c311f5e9248471 ("dmaengine: idxd: Fix freeing the allocated ida too
    late") -- establishes the correct pattern for cdev (ida_free before
    put_device, not in .release())

Thanks,
Bogdan

Bogdan Codres (1):
  dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc()
    error paths

 drivers/dma/idxd/init.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

-- 
2.43.0

* [PATCH] dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc() error paths
  2026-06-15 10:39 [PATCH] dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc() error paths Bogdan Codres (Wind River)
@ 2026-06-15 10:39 ` Bogdan Codres (Wind River)
  2026-06-15 11:02   ` sashiko-bot
  0 siblings, 1 reply; 3+ messages in thread
From: Bogdan Codres (Wind River) @ 2026-06-15 10:39 UTC (permalink / raw)
  To: dmaengine, linux-kernel
  Cc: vkoul, dave.jiang, vinicius.gomes, xueshuai, yi.sun, fenghuay,
	dan.carpenter, gregkh, stable, Bogdan Codres

From: Bogdan Codres <bogdan.codres@windriver.com>

We have the following backtrace:
[   18.628791] idxd 0000:00:01.0: Device is HALTED!
[   18.631447] idxd 0000:00:01.0: Intel(R) IDXD DMA Engine init failed
[   18.631450] ------------[ cut here ]------------
[   18.631451] ida_free called for id=0 which is not allocated.
[   18.631462] WARNING: CPU: 0 PID: 11 at lib/idr.c:525 ida_free+0xd3/0x130
[   18.631467] Modules linked in: idxd(+) idxd_bus wmi zl3073x_spi regmap_spi zl3073x_i2c zl3073x i2c_mux_pca954x i2c_mux ipmi_si acpi_power_meter i2c_designware_platform i2c_designware_core acpi_ipmi ipmi_devintf ipmi_msghandler
[   18.631474] CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.12.0-1-rt-amd64 #1  Debian 6.12.40-1.stx.140
[   18.631477] Hardware name: Dell Inc. PowerEdge XR8720t/0J91KV, BIOS 1.1.3 02/03/2026
[   18.631478] Workqueue: events work_for_cpu_fn
[   18.631480] RIP: 0010:ida_free+0xd3/0x130
[   18.631482] Code: 62 ff 31 f6 48 89 e7 e8 bb 1b 02 00 eb 5a 83 fb 3e 76 36 48 8b 3c 24 e8 ab 74 03 00 89 ee 48 c7 c7 70 d6 bd b4 e8 7d 1e 36 ff <0f> 0b 48 8b 44 24 38 65 48 2b 04 25 28 00 00 00 75 37 48 83 c4 40
[   18.631484] RSP: 0018:ff59485680267d58 EFLAGS: 00010282
[   18.631485] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb53064c8
[   18.631486] RDX: 0000000000020940 RSI: 0000000000000000 RDI: ffffffffb53365d0
[   18.631487] RBP: 0000000000000000 R08: 0000000000000000 R09: ff59485680267b40
[   18.631487] R10: ff59485680267b38 R11: ffffffffb5336508 R12: 0000000000000000
[   18.631488] R13: ff2c9dd3800730c8 R14: 0000000000000000 R15: ff2c9dd38385d800
[   18.631489] FS:  0000000000000000(0000) GS:ff2c9dd3fdc00000(0000) knlGS:0000000000000000
[   18.631490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.631491] CR2: 000055e2e7678098 CR3: 0000002003450005 CR4: 0000000000771ef0
[   18.631492] PKRU: 55555554
[   18.631492] Call Trace:
[   18.631494]  <TASK>
[   18.631495]  idxd_pci_probe+0x1b0/0x1860 [idxd]
[   18.631502]  ? set_next_entity+0xcb/0x1b0
[   18.631506]  local_pci_probe+0x43/0xa0
[   18.631508]  work_for_cpu_fn+0x13/0x20
[   18.631510]  process_one_work+0x179/0x390
[   18.631512]  worker_thread+0x237/0x340
[   18.631515]  ? __pfx_worker_thread+0x10/0x10
[   18.631517]  kthread+0xc6/0x100
[   18.631519]  ? __pfx_kthread+0x10/0x10
[   18.631520]  ret_from_fork+0x2d/0x50
[   18.631523]  ? __pfx_kthread+0x10/0x10
[   18.631524]  ret_from_fork_asm+0x1a/0x30
[   18.631526]  </TASK>
[   18.631527] ---[ end trace 0000000000000000 ]---

When an IDXD device probe fails (e.g., device is HALTED), the error
path in idxd_pci_probe() calls idxd_free() which performs:

  1. put_device(idxd_confdev(idxd))
  2. bitmap_free(idxd->opcap_bmap)
  3. ida_free(&idxd_ida, idxd->id)
  4. kfree(idxd)

However, since device_initialize() was already called in idxd_alloc(),
the conf_dev has a refcount of 1. The put_device() in step 1 drops
this to 0 and synchronously invokes idxd_conf_device_release() via:

  put_device() -> kobject_put() -> kobject_release() -> kobject_cleanup()
    -> device_release() -> dev->type->release -> idxd_conf_device_release()

idxd_conf_device_release() already performs:

  ida_free(&idxd_ida, idxd->id);
  bitmap_free(idxd->opcap_bmap);
  kfree(idxd);

Therefore steps 2-4 in idxd_free() operate on already-freed memory:
  - step 2: reads idxd->opcap_bmap from the freed idxd (use-after-free)
    and passes the stale pointer to bitmap_free() (double free)
  - step 3: ida_free on already-released ID, triggering:
    "ida_free called for id=0 which is not allocated"
  - step 4: double kfree() corrupts slab freelist metadata

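Remove the duplicate bitmap_free()/ida_free()/kfree() calls from
idxd_free(), and return NULL directly after put_device() in the
idxd_alloc() err_name path, so that idxd_conf_device_release() is the
only place these resources are freed.

This follows the rule applied in commit c311f5e9248471a950 ("dmaengine:
idxd: Fix freeing the allocated ida too late"), where ida_free() was
removed from the cdev .release() callback: a resource must not be freed
both in the .release() callback and by the caller of put_device().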

The path is rare in normal operation because:
  1. In practice, IDXD probe fails mainly when the device is in HALTED
     state
  2. The device typically enters HALTED state after reset_devices
     (kdump boot parameter) or an unrecoverable hardware error
  3. On a normally running system, IDXD probe is expected to succeed

Fixes: 90022b3a6981 ("dmaengine: idxd: fix memory leak in error handling path of idxd_pci_probe")
Fixes: 46a5cca76c76 ("dmaengine: idxd: fix memory leak in error handling path of idxd_alloc")
Cc: stable@vger.kernel.org
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Yi Sun <yi.sun@intel.com>
Cc: Fenghua Yu <fenghuay@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Bogdan Codres <bogdan.codres@windriver.com>
---
 drivers/dma/idxd/init.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index e55136bb5..b76f0d12b 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -586,15 +586,18 @@ static void idxd_read_caps(struct idxd_device *idxd)
 		idxd->hw.iaa_cap.bits = ioread64(idxd->reg_base + IDXD_IAACAP_OFFSET);
 }
 
+/*
+ * Release an idxd device that was allocated (device_initialize() was called)
+ * but never successfully registered. put_device() drops the last reference and
+ * triggers idxd_conf_device_release() which frees all resources including the
+ * ida, opcap_bmap, and the idxd structure itself.
+ */
 static void idxd_free(struct idxd_device *idxd)
 {
 	if (!idxd)
 		return;
 
 	put_device(idxd_confdev(idxd));
-	bitmap_free(idxd->opcap_bmap);
-	ida_free(&idxd_ida, idxd->id);
-	kfree(idxd);
 }
 
 static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_data *data)
@@ -634,13 +637,16 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
 	return idxd;
 
 err_name:
+	/* device_initialize() was called, so put_device() will trigger
+	 * idxd_conf_device_release() which frees ida, opcap_bmap, and idxd.
+	 * Do not fall through to err_opcap/err_ida.
+	 */
 	put_device(conf_dev);
-	bitmap_free(idxd->opcap_bmap);
+	return NULL;
 err_opcap:
 	ida_free(&idxd_ida, idxd->id);
 err_ida:
 	kfree(idxd);
-
 	return NULL;
 }
 
-- 
2.43.0


* Re: [PATCH] dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc() error paths
  2026-06-15 10:39 ` Bogdan Codres (Wind River)
@ 2026-06-15 11:02   ` sashiko-bot
  0 siblings, 0 replies; 3+ messages in thread
From: sashiko-bot @ 2026-06-15 11:02 UTC (permalink / raw)
  Cc: vkoul, Frank.Li, dmaengine

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [High] Unconditional call to `destroy_workqueue(idxd->wq)` in `idxd_conf_device_release()` causes a guaranteed kernel panic on early probe or allocation failures.
- [High] Late probe failures trigger sequential manual cleanup (`idxd_cleanup_internals`) and `idxd_conf_device_release()`, resulting in massive double-frees and a double-destroy of the workqueue.
- [High] Sub-component configuration and cleanup functions (`idxd_clean_wqs`, `idxd_clean_groups`, `idxd_clean_engines`) suffer from the exact same double-free anti-pattern that the patch intended to fix in the main device.
--

commit 14c25ba1a63a6d7ab71af3fd079660a9cd76f396
Author: Bogdan Codres <bogdan.codres@windriver.com>

dmaengine: idxd: fix use-after-free in idxd_free() and idxd_alloc() error paths

This commit removes redundant cleanup calls in the error paths of idxd_alloc
and idxd_free. By relying on put_device to synchronously trigger the release
handler idxd_conf_device_release, which already frees these resources, it
prevents double-free scenarios during initialization failures.

> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index f1cfc7790d950..e2b5fe072d4ac 100644
> --- a/drivers/dma/idxd/init.c
> +++ b/drivers/dma/idxd/init.c
[ ... ]
> @@ -601,15 +601,18 @@ static void idxd_read_caps(struct idxd_device *idxd)
[ ... ]
> +/*
> + * Release an idxd device that was allocated (device_initialize() was called)
> + * but never successfully registered. put_device() drops the last reference and
> + * triggers idxd_conf_device_release() which frees all resources including the
> + * ida, opcap_bmap, and the idxd structure itself.
> + */
>  static void idxd_free(struct idxd_device *idxd)
>  {
>  	if (!idxd)
>  		return;
>  
>  	put_device(idxd_confdev(idxd));
> -	bitmap_free(idxd->opcap_bmap);
> -	ida_free(&idxd_ida, idxd->id);
> -	kfree(idxd);
>  }

[Severity: High]
This is a pre-existing issue, but does this same double-free pattern still
exist for late probe failures?

If idxd_probe() fails late in the initialization sequence, the error path
calls idxd_cleanup(), which delegates to idxd_cleanup_internals().

In drivers/dma/idxd/init.c:idxd_cleanup_internals():
    ...
    destroy_workqueue(idxd->wq);

The error path then falls through to idxd_free(), which calls put_device().
This triggers idxd_conf_device_release() in drivers/dma/idxd/sysfs.c, which
blindly executes destroy_workqueue(idxd->wq) and kfree() on internal arrays
a second time. Can this cause a double-destroy and memory corruption?
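
One way such a double-destroy could be avoided is to make the manual cleanup
idempotent by clearing each pointer after it is released. The user-space
sketch below only illustrates that idea; the mock_* names and the int-based
workqueue stand-in are hypothetical, and it is not a claim about how the
driver is or should be fixed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in: an int allocation plays the role of the workqueue. */
struct mock_idxd {
	int *wq;
};

static int destroy_calls;

/* Models idxd_cleanup_internals(): destroy once, then clear the pointer. */
static void mock_cleanup_internals(struct mock_idxd *idxd)
{
	if (idxd->wq) {
		destroy_calls++;
		free(idxd->wq);
		idxd->wq = NULL;	/* a later cleanup pass becomes a no-op */
	}
}

/* Models the release callback reached through put_device(). */
static void mock_release(struct mock_idxd *idxd)
{
	mock_cleanup_internals(idxd);
	free(idxd);
}

/* Late probe failure: manual cleanup first, then the final put. */
int late_failure_destroy_count(void)
{
	struct mock_idxd *idxd = calloc(1, sizeof(*idxd));

	if (!idxd)
		return -1;
	idxd->wq = malloc(sizeof(*idxd->wq));
	if (!idxd->wq) {
		free(idxd);
		return -1;
	}
	destroy_calls = 0;
	mock_cleanup_internals(idxd);
	mock_release(idxd);
	return destroy_calls;
}
```

In the model the workqueue stand-in is destroyed exactly once even though
both paths run. Without the NULL assignment the second pass would free it
again.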

[Severity: High]
This isn't a bug introduced by this patch, but do the sub-component cleanup
functions suffer from the same double-free pattern?

Looking at idxd_clean_wqs() in drivers/dma/idxd/init.c:
    ...
    put_device(conf_dev);
    kfree(wq);

The put_device() call drops the reference to zero and invokes the release
handler idxd_conf_wq_release(), which already performs kfree(wq). Does
calling kfree(wq) immediately after put_device() result in a double-free?
Similar patterns appear to exist in idxd_clean_groups() and
idxd_clean_engines().

[ ... ]
> @@ -649,13 +652,16 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
>  	return idxd;
>  
>  err_name:
> +	/* device_initialize() was called, so put_device() will trigger
> +	 * idxd_conf_device_release() which frees ida, opcap_bmap, and idxd.
> +	 * Do not fall through to err_opcap/err_ida.
> +	 */
>  	put_device(conf_dev);
> -	bitmap_free(idxd->opcap_bmap);
> +	return NULL;

[Severity: High]
This is a pre-existing issue, but will this cause a NULL pointer dereference
during early failures?

If dev_set_name() fails, idxd->wq has not been allocated yet and remains NULL.
The error path calls put_device(conf_dev) which invokes the release callback:

drivers/dma/idxd/sysfs.c:idxd_conf_device_release() {
    ...
    destroy_workqueue(idxd->wq);

Since the upstream kernel's destroy_workqueue() does not safely handle NULL
pointers, won't this unconditionally crash the system on early probe or
allocation failures?
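
If destroy_workqueue() indeed cannot take NULL, a guard in the release
callback would be one possible mitigation. The sketch below models that in
user space. mock_wq, mock_destroy_workqueue() and the other mock_* names are
hypothetical, and the assert only stands in for the NULL dereference
described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the workqueue and the device. */
struct mock_wq {
	int unused;
};

struct mock_idxd {
	struct mock_wq *wq;
};

static int destroyed;

/* Models a destroy routine that must never be handed NULL. */
static void mock_destroy_workqueue(struct mock_wq *wq)
{
	assert(wq != NULL);	/* stands in for the NULL dereference */
	destroyed++;
	free(wq);
}

/* Release callback with the guard: skip the destroy if wq was never set. */
static void mock_release(struct mock_idxd *idxd)
{
	if (idxd->wq)
		mock_destroy_workqueue(idxd->wq);
	free(idxd);
}

/* Early failure: wq was never allocated, so release must not destroy it. */
int release_without_wq(void)
{
	struct mock_idxd *idxd = calloc(1, sizeof(*idxd));

	if (!idxd)
		return -1;
	destroyed = 0;
	mock_release(idxd);
	return destroyed;
}

/* Normal teardown: wq exists and is destroyed exactly once. */
int release_with_wq(void)
{
	struct mock_idxd *idxd = calloc(1, sizeof(*idxd));

	if (!idxd)
		return -1;
	idxd->wq = calloc(1, sizeof(*idxd->wq));
	if (!idxd->wq) {
		free(idxd);
		return -1;
	}
	destroyed = 0;
	mock_release(idxd);
	return destroyed;
}
```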

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260615103932.61828-2-bogdan.codres@windriver.com?part=1
