DMA Engine development
 help / color / mirror / Atom feed
* [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq
@ 2026-05-22 20:34 Steve Wahl
  2026-05-22 20:34 ` [PATCH v2 2/2] dmaengine: idxd: fix duplicate memory frees on initialization error path Steve Wahl
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Steve Wahl @ 2026-05-22 20:34 UTC (permalink / raw)
  To: Steve Wahl, Vinicius Costa Gomes, Dave Jiang, Vinod Koul,
	Frank Li, dmaengine, linux-kernel
  Cc: Russ Anderson, Dimitri Sivanich

Error paths within idxd_pci_probe_alloc and related functions end up
calling destroy_workqueue with a null pointer, from
idxd_conf_device_release via put_device, because that allocation has
not yet occurred when the error is hit.

This was encountered running in a kexec'd kdump kernel with reduced
resources, causing the "Device is HALTED!" branch in
idxd_device_init_reset to be taken.

In idxd_conf_device_release, check that the workqueue has been
allocated before trying to destroy it.

Fixes: 3d33de353b1f ("dmaengine: idxd: Fix not releasing workqueue on .release()")

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
---
v2: split into two patches as requested by Vinicius Costa

 drivers/dma/idxd/sysfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 6d251095c350..d5ffc641c856 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1836,7 +1836,8 @@ static void idxd_conf_device_release(struct device *dev)
 {
 	struct idxd_device *idxd = confdev_to_idxd(dev);
 
-	destroy_workqueue(idxd->wq);
+	if (idxd->wq)
+		destroy_workqueue(idxd->wq);
 	kfree(idxd->groups);
 	bitmap_free(idxd->wq_enable_map);
 	kfree(idxd->wqs);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/2] dmaengine: idxd: fix duplicate memory frees on initialization error path.
  2026-05-22 20:34 [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq Steve Wahl
@ 2026-05-22 20:34 ` Steve Wahl
  2026-05-22 22:51   ` sashiko-bot
  2026-06-27  0:57   ` Vinicius Costa Gomes
  2026-05-22 21:59 ` [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq sashiko-bot
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 7+ messages in thread
From: Steve Wahl @ 2026-05-22 20:34 UTC (permalink / raw)
  To: Steve Wahl, Vinicius Costa Gomes, Dave Jiang, Vinod Koul,
	Frank Li, dmaengine, linux-kernel
  Cc: Russ Anderson, Dimitri Sivanich

Error paths within idxd_pci_probe_alloc and related functions end up
attempting to free memory already freed from idxd_conf_device_release
via put_device.

This was encountered running in a kexec'd kdump kernel with reduced
resources, causing the "Device is HALTED!" branch in
idxd_device_init_reset to be taken.

In idxd_free and idxd_alloc, do not attempt to free allocations that
will already have been freed.

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
---
v2: split into two patches as requested by Vinicius Costa

 drivers/dma/idxd/init.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index f1cfc7790d95..227e323cc5a0 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -607,9 +607,6 @@ static void idxd_free(struct idxd_device *idxd)
 		return;
 
 	put_device(idxd_confdev(idxd));
-	bitmap_free(idxd->opcap_bmap);
-	ida_free(&idxd_ida, idxd->id);
-	kfree(idxd);
 }
 
 static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_data *data)
@@ -649,8 +646,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
 	return idxd;
 
 err_name:
+	/*
+	 * once device_initialize(conf_dev) is called,
+	 * put_device(conf_dev) will end up calling
+	 * idxd_conf_device_release() which will free the rest.
+	 */
 	put_device(conf_dev);
-	bitmap_free(idxd->opcap_bmap);
+	return NULL;
 err_opcap:
 	ida_free(&idxd_ida, idxd->id);
 err_ida:
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq
  2026-05-22 20:34 [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq Steve Wahl
  2026-05-22 20:34 ` [PATCH v2 2/2] dmaengine: idxd: fix duplicate memory frees on initialization error path Steve Wahl
@ 2026-05-22 21:59 ` sashiko-bot
  2026-06-27  0:47 ` Vinicius Costa Gomes
  2026-06-27  0:49 ` Vinicius Costa Gomes
  3 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-05-22 21:59 UTC (permalink / raw)
  To: Steve Wahl; +Cc: vkoul, Frank.Li, dmaengine

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [Critical] The driver violates Linux device model lifetime rules, causing widespread double-free and use-after-free bugs in error paths and cleanup routines.
- [Critical] Error cleanup loops for sysfs device registration incorrectly use `while (j--)`, causing an out-of-bounds access or a resource leak depending on the failure iteration.
- [High] The patch adds a NULL check for `idxd->wq` in `idxd_conf_device_release()` but fails to handle dangling pointers, leaving a double-destroy vulnerability.
--

commit b2f5425f47723d096641d78d72f8c45caffb8a7a
Author: Steve Wahl <steve.wahl@hpe.com>
Subject: dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq

This patch prevents calling destroy_workqueue() with a null pointer in
idxd_conf_device_release() by checking that idxd->wq is allocated. This
avoids a crash during error paths where the allocation has not yet occurred.

> diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
> index 6d251095c350f..d5ffc641c8567 100644
> --- a/drivers/dma/idxd/sysfs.c
> +++ b/drivers/dma/idxd/sysfs.c
> @@ -1836,7 +1836,8 @@ static void idxd_conf_device_release(struct device *dev)
>  {
>  	struct idxd_device *idxd = confdev_to_idxd(dev);
>  
> -	destroy_workqueue(idxd->wq);
> +	if (idxd->wq)
> +		destroy_workqueue(idxd->wq);

[Severity: High]
This is a pre-existing issue, but is there a double-destroy regression
here if idxd_cleanup_internals() was called?

If idxd_register_devices() fails, the cleanup code calls
idxd_cleanup_internals() which explicitly destroys idxd->wq but does not set
it to null. When the error propagates to idxd_free() and put_device() is
called, this release callback will be triggered. Because idxd->wq retains its
stale pointer, this check will pass and we will call destroy_workqueue() a
second time.

Should idxd_cleanup_internals() set idxd->wq to null after destroying it?

>  	kfree(idxd->groups);
>  	bitmap_free(idxd->wq_enable_map);
>  	kfree(idxd->wqs);

[Severity: Critical]
This isn't a bug introduced by this patch, but are there use-after-free and
double-free regressions due to device model lifetime violations?

When put_device() drops the refcount to 0, it automatically triggers this
release callback which correctly frees the parent structures and nested
allocations like idxd->wqs and idxd->groups.

However, in init.c routines like idxd_clean_wqs(), the code explicitly calls
kfree(wq) and bitmap_free(wq->opcap_bmap) before or after calling
put_device(). Does this cause a double-free when the release callbacks also
free these resources?

Additionally, when arrays like idxd->wqs are freed in those cleanup functions
but not set to null, won't this release callback free them a second time when
called via idxd_free()?

[Severity: Critical]
This is a pre-existing issue, but does the error cleanup loop for sysfs
device registration in idxd_register_engine_devices() and similar functions
cause an out-of-bounds access or leak?

In those functions, the error paths do this:

    j = i - 1;
    while (j--) {
        device_unregister(...);
    }

If device_add() fails on the very first element (i = 0), j initializes to -1.
Because -1 evaluates to true, the loop executes and decrements j to -2,
which could trigger an out-of-bounds array read (e.g., idxd->wqs[-2]) and a
crash.

If device_add() fails on the next iteration (i = 1), j initializes to 0.
The condition while (0--) evaluates to false, terminating immediately without
unregistering the successfully added element at index 0, leaking the device.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260522203414.336549-1-steve.wahl@hpe.com?part=1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: idxd: fix duplicate memory frees on initialization error path.
  2026-05-22 20:34 ` [PATCH v2 2/2] dmaengine: idxd: fix duplicate memory frees on initialization error path Steve Wahl
@ 2026-05-22 22:51   ` sashiko-bot
  2026-06-27  0:57   ` Vinicius Costa Gomes
  1 sibling, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-05-22 22:51 UTC (permalink / raw)
  To: Steve Wahl; +Cc: dmaengine, Frank.Li, vkoul

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] Double frees of child objects (workqueues, engines, groups) remain unaddressed in their respective cleanup functions and setup error paths.
- [High] Dangling pointers left by `idxd_cleanup_internals()` result in a second wave of double frees when `idxd_conf_device_release()` is triggered.
--

commit 4ee711bb61961f88e9ddb1a67af212cce348ee26
Author: Steve Wahl <steve.wahl@hpe.com>

dmaengine: idxd: fix duplicate memory frees on initialization error path.

This commit removes duplicate memory frees in idxd_alloc() and idxd_free()
by relying on the device release callback triggered by put_device() to
handle the cleanup of the idxd structure and its internal allocations.

> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index f1cfc7790d950..227e323cc5a0b 100644
> --- a/drivers/dma/idxd/init.c
> +++ b/drivers/dma/idxd/init.c
> @@ -607,9 +607,6 @@ static void idxd_free(struct idxd_device *idxd)
>  		return;
>  
>  	put_device(idxd_confdev(idxd));

[Severity: High]
This is a pre-existing issue, but does idxd_cleanup_internals() leave
dangling pointers that could cause a second wave of double frees here?

During device initialization in idxd_probe(), if an error occurs after
idxd_setup_internals() succeeds (e.g., idxd_device_load_config() fails),
idxd_cleanup_internals() is called. This function explicitly frees
sub-allocations (idxd->wqs, idxd->engines, idxd->groups) and calls
destroy_workqueue(idxd->wq), but it doesn't set these pointers to NULL.

When the error propagates back to idxd_pci_probe_alloc(), it calls
idxd_free(), which reaches this put_device() call. This triggers the
idxd_conf_device_release() callback, which blindly attempts to free the
exact same dangling pointers again and calls destroy_workqueue(idxd->wq) a
second time.

> -	bitmap_free(idxd->opcap_bmap);
> -	ida_free(&idxd_ida, idxd->id);
> -	kfree(idxd);
>  }

[ ... ]

> @@ -649,8 +646,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
>  	return idxd;
>  
>  err_name:
> +	/*
> +	 * once device_initialize(conf_dev) is called,
> +	 * put_device(conf_dev) will end up calling
> +	 * idxd_conf_device_release() which will free the rest.
> +	 */
>  	put_device(conf_dev);

[Severity: High]
This isn't a bug introduced by this patch, but are there similar duplicate
frees remaining for child objects like workqueues, engines, and groups?

While this patch correctly removes the duplicate frees for the main idxd
object, the exact same pattern appears to exist for child objects in functions
like idxd_setup_wqs(), idxd_setup_engines(), and idxd_setup_groups(), as well
as their cleanup counterparts.

In those functions, put_device(conf_dev) is called to drop the reference
count, which synchronously invokes the device's release callback (e.g.,
idxd_conf_wq_release()). The callback explicitly frees the inner members and
the object itself (kfree(wq)).

However, immediately after put_device() returns, the manual cleanup code
incorrectly calls kfree() and bitmap_free() on those exact same pointers
again. Could this lead to double frees during initialization failure or device
removal?

> -	bitmap_free(idxd->opcap_bmap);
> +	return NULL;
>  err_opcap:
>  	ida_free(&idxd_ida, idxd->id);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260522203414.336549-1-steve.wahl@hpe.com?part=2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq
  2026-05-22 20:34 [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq Steve Wahl
  2026-05-22 20:34 ` [PATCH v2 2/2] dmaengine: idxd: fix duplicate memory frees on initialization error path Steve Wahl
  2026-05-22 21:59 ` [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq sashiko-bot
@ 2026-06-27  0:47 ` Vinicius Costa Gomes
  2026-06-27  0:49 ` Vinicius Costa Gomes
  3 siblings, 0 replies; 7+ messages in thread
From: Vinicius Costa Gomes @ 2026-06-27  0:47 UTC (permalink / raw)
  To: Steve Wahl, Steve Wahl, Dave Jiang, Vinod Koul, Frank Li,
	dmaengine, linux-kernel
  Cc: Russ Anderson, Dimitri Sivanich

Steve Wahl <steve.wahl@hpe.com> writes:

> Error paths within idxd_pci_probe_alloc and related functions end up
> calling destroy_workqueue with a null pointer, from
> idxd_conf_device_release via put_device, because that allocation has
> not yet occurred when the error is hit.
>
> This was encountered running in a kexec'd kdump kernel with reduced
> resources, causing the "Device is HALTED!" branch in
> idxd_device_init_reset to be taken.
>
> In idxd_conf_device_release, check that the workqueue has been
> allocated before trying to destroy it.
>
> Fixes: 3d33de353b1f ("dmaengine: idxd: Fix not releasing workqueue on .release()")
>
> Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
> ---



> v2: split into two patches as requested by Vinicius Costa
>
>  drivers/dma/idxd/sysfs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
> index 6d251095c350..d5ffc641c856 100644
> --- a/drivers/dma/idxd/sysfs.c
> +++ b/drivers/dma/idxd/sysfs.c
> @@ -1836,7 +1836,8 @@ static void idxd_conf_device_release(struct device *dev)
>  {
>  	struct idxd_device *idxd = confdev_to_idxd(dev);
>  
> -	destroy_workqueue(idxd->wq);
> +	if (idxd->wq)
> +		destroy_workqueue(idxd->wq);
>  	kfree(idxd->groups);
>  	bitmap_free(idxd->wq_enable_map);
>  	kfree(idxd->wqs);
> -- 
> 2.51.0
>

-- 
Vinicius

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq
  2026-05-22 20:34 [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq Steve Wahl
                   ` (2 preceding siblings ...)
  2026-06-27  0:47 ` Vinicius Costa Gomes
@ 2026-06-27  0:49 ` Vinicius Costa Gomes
  3 siblings, 0 replies; 7+ messages in thread
From: Vinicius Costa Gomes @ 2026-06-27  0:49 UTC (permalink / raw)
  To: Steve Wahl, Steve Wahl, Dave Jiang, Vinod Koul, Frank Li,
	dmaengine, linux-kernel
  Cc: Russ Anderson, Dimitri Sivanich

Steve Wahl <steve.wahl@hpe.com> writes:

> Error paths within idxd_pci_probe_alloc and related functions end up
> calling destroy_workqueue with a null pointer, from
> idxd_conf_device_release via put_device, because that allocation has
> not yet occurred when the error is hit.
>
> This was encountered running in a kexec'd kdump kernel with reduced
> resources, causing the "Device is HALTED!" branch in
> idxd_device_init_reset to be taken.
>
> In idxd_conf_device_release, check that the workqueue has been
> allocated before trying to destroy it.
>
> Fixes: 3d33de353b1f ("dmaengine: idxd: Fix not releasing workqueue on .release()")
>
> Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
> ---

(for the earlier email, I meant to add this)

Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/2] dmaengine: idxd: fix duplicate memory frees on initialization error path.
  2026-05-22 20:34 ` [PATCH v2 2/2] dmaengine: idxd: fix duplicate memory frees on initialization error path Steve Wahl
  2026-05-22 22:51   ` sashiko-bot
@ 2026-06-27  0:57   ` Vinicius Costa Gomes
  1 sibling, 0 replies; 7+ messages in thread
From: Vinicius Costa Gomes @ 2026-06-27  0:57 UTC (permalink / raw)
  To: Steve Wahl, Steve Wahl, Dave Jiang, Vinod Koul, Frank Li,
	dmaengine, linux-kernel
  Cc: Russ Anderson, Dimitri Sivanich, bogdan.codres

Steve Wahl <steve.wahl@hpe.com> writes:

> Error paths within idxd_pci_probe_alloc and related functions end up
> attempting to free memory already freed from idxd_conf_device_release
> via put_device.
>
> This was encountered running in a kexec'd kdump kernel with reduced
> resources, causing the "Device is HALTED!" branch in
> idxd_device_init_reset to be taken.
>
> In idxd_free and idxd_alloc, do not attempt to free allocations that
> will already have been freed.
>
> Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
> ---

Bogdan Codres series (but submitted a bit later), tries to fix this
issue, has a better splat and a reproducer, I think it makes sense to
add the splat and reproducer here, and add Bogdan as Reported-by (if
agreed, of course).

I am wondering about adding a third patch for the dangling ->wq pointer, as
reported by Sashiko would make sense.

Something like this might do the job (totally untested):

diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index f1cfc7790d95..0a74018f31a8 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -415,6 +415,7 @@ static void idxd_cleanup_internals(struct idxd_device *idxd)
 	idxd_clean_engines(idxd);
 	idxd_clean_wqs(idxd);
 	destroy_workqueue(idxd->wq);
+	idxd->wq = NULL;
 }

How does it sound?

> v2: split into two patches as requested by Vinicius Costa
>
>  drivers/dma/idxd/init.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index f1cfc7790d95..227e323cc5a0 100644
> --- a/drivers/dma/idxd/init.c
> +++ b/drivers/dma/idxd/init.c
> @@ -607,9 +607,6 @@ static void idxd_free(struct idxd_device *idxd)
>  		return;
>  
>  	put_device(idxd_confdev(idxd));
> -	bitmap_free(idxd->opcap_bmap);
> -	ida_free(&idxd_ida, idxd->id);
> -	kfree(idxd);
>  }
>  
>  static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_data *data)
> @@ -649,8 +646,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
>  	return idxd;
>  
>  err_name:
> +	/*
> +	 * once device_initialize(conf_dev) is called,
> +	 * put_device(conf_dev) will end up calling
> +	 * idxd_conf_device_release() which will free the rest.
> +	 */
>  	put_device(conf_dev);
> -	bitmap_free(idxd->opcap_bmap);
> +	return NULL;
>  err_opcap:
>  	ida_free(&idxd_ida, idxd->id);
>  err_ida:
> -- 
> 2.51.0
>

-- 
Vinicius

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-27  0:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-22 20:34 [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq Steve Wahl
2026-05-22 20:34 ` [PATCH v2 2/2] dmaengine: idxd: fix duplicate memory frees on initialization error path Steve Wahl
2026-05-22 22:51   ` sashiko-bot
2026-06-27  0:57   ` Vinicius Costa Gomes
2026-05-22 21:59 ` [PATCH v2 1/2] dmaengine: idxd: Do not call destroy_workqueue with null idxd->wq sashiko-bot
2026-06-27  0:47 ` Vinicius Costa Gomes
2026-06-27  0:49 ` Vinicius Costa Gomes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox