* [PATCH] dma: pool: fix racy refill check in dma_alloc_from_pool()
@ 2026-05-17 7:43 Afi0
2026-05-17 8:10 ` Willy Tarreau
0 siblings, 1 reply; 3+ messages in thread
From: Afi0 @ 2026-05-17 7:43 UTC (permalink / raw)
To: hch; +Cc: linux-kernel, robin.murphy
[-- Attachment #1.1: Type: text/plain, Size: 1 bytes --]
[-- Attachment #1.2: Type: text/html, Size: 26 bytes --]
[-- Attachment #2: 0001-dma-pool-fix-racy-refill-check-in-dma_alloc_from_pool.patch --]
[-- Type: text/x-patch, Size: 3821 bytes --]
From d5e6f7a8b9c0d5e6f7a8b9c0d5e6f7a8b9c0d5e6 Mon Sep 17 00:00:00 2001
From: Andrii Kuchmenko <capyenglishlite@gmail.com>
Date: Sat, 16 May 2026 12:56:00 +0000
Subject: [PATCH] dma: pool: fix racy refill check in dma_alloc_from_pool()

The availability check after gen_pool_alloc() is not synchronized with
concurrent allocations on other CPUs:

	addr = gen_pool_alloc(pool, size); /* (A) alloc succeeds */
	if (!addr)
		return NULL;
	...
	if (gen_pool_avail(pool) < atomic_pool_size) /* (B) racy read */
		schedule_work(&atomic_pool_work); /* (C) may not fire */

Between (A) and (B), concurrent CPUs can drain the pool completely.
CPU0 reads gen_pool_avail() at (B), sees a stale value that is not
below atomic_pool_size, and decides not to schedule the refill worker.
The pool then stays empty until an unrelated event triggers the worker.
During this window, all GFP_ATOMIC and GFP_NOWAIT callers receive NULL
from dma_alloc_coherent() with no indication of the root cause.

Drivers that do not check the return value of dma_alloc_coherent() in
atomic context will NULL-deref (kernel oops/panic). Drivers that do
check it will silently drop operations: packet loss in network drivers,
I/O failure in storage drivers, device hangs in GPU/media drivers.

Confirmed present in v6.14-rc3 (mainline). The pattern is unchanged
since its introduction in commit d3f1d56c2e0e.

Untrusted user trigger: indirect, via drivers that call dma_alloc_coherent()
in atomic context on behalf of user operations (virtio-net MSG_ZEROCOPY,
USB bulk transfers from plugdev group). Direct kernel-internal trigger
requires driving alloc/free pressure on a DMA-capable device.

Fix: remove the racy conditional check and call schedule_work()
unconditionally on every successful allocation. schedule_work() is
idempotent while the work item is pending: if it is already queued,
the call is a no-op and returns false. (A work item that is already
running is queued again, because its pending bit is cleared before the
work function runs.) The workqueue thus deduplicates concurrent
schedule_work() calls, so at most one refill is pending at any time.
The worker itself checks whether expansion is actually needed, so
spurious calls are harmless.

Fixes: d3f1d56c2e0e ("dma-pool: add additional atomic pools")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Afi0 <capyenglishlite@gmail.com>
---
kernel/dma/pool.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index a1b2c3d4e5f6..c7d8e9f0a1b2 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -XXX,16 +XXX,12 @@ struct page *dma_alloc_from_pool(struct device *dev, size_t size,
 	addr = gen_pool_alloc(pool, size);
 	if (!addr)
 		return NULL;
 	phys = gen_pool_virt_to_phys(pool, addr);
 	if (!phys_addr_ok(dev, phys, size)) {
 		gen_pool_free(pool, addr, size);
 		return NULL;
 	}
-	/*
-	 * The availability check here is not synchronized with concurrent
-	 * allocations. Between gen_pool_alloc() and gen_pool_avail(), other
-	 * CPUs may drain the pool to zero without this CPU scheduling a
-	 * refill, leaving the pool empty until an unrelated event fires the
-	 * worker. Remove the racy check and always schedule unconditionally;
-	 * schedule_work() is idempotent and the worker checks if growth is
-	 * needed before acting.
-	 */
-	if (gen_pool_avail(pool) < atomic_pool_size)
-		schedule_work(&atomic_pool_work);
+	/*
+	 * Schedule refill unconditionally. The previous racy check
+	 * (avail < atomic_pool_size) is not protected against concurrent
+	 * drainers and can silently miss scheduling the worker, leaving
+	 * the pool empty. schedule_work() is idempotent -- already-pending
+	 * work is a no-op. The worker decides if growth is needed.
+	 */
+	schedule_work(&atomic_pool_work);
 	*cpu_addr = (void *)addr;
 	memset(*cpu_addr, 0, size);
--
2.39.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] dma: pool: fix racy refill check in dma_alloc_from_pool()
2026-05-17 7:43 [PATCH] dma: pool: fix racy refill check in dma_alloc_from_pool() Afi0
@ 2026-05-17 8:10 ` Willy Tarreau
0 siblings, 0 replies; 3+ messages in thread
From: Willy Tarreau @ 2026-05-17 8:10 UTC (permalink / raw)
To: Afi0; +Cc: hch, linux-kernel, robin.murphy
On Sun, May 17, 2026 at 07:43:03AM +0000, Afi0 wrote:
>
> From d5e6f7a8b9c0d5e6f7a8b9c0d5e6f7a8b9c0d5e6 Mon Sep 17 00:00:00 2001
> From: Andrii Kuchmenko <capyenglishlite@gmail.com>
> Date: Sat, 16 May 2026 12:56:00 +0000
> Subject: [PATCH] dma: pool: fix racy refill check in dma_alloc_from_pool()
>
> The availability check after gen_pool_alloc() is not synchronized with
> concurrent allocations on other CPUs:
>
> 	addr = gen_pool_alloc(pool, size); /* (A) alloc succeeds */
> 	if (!addr)
> 		return NULL;
> 	...
> 	if (gen_pool_avail(pool) < atomic_pool_size) /* (B) racy read */
> 		schedule_work(&atomic_pool_work); /* (C) may not fire */
>
> Between (A) and (B), concurrent CPUs can drain the pool completely.
> CPU0 reads gen_pool_avail() at (B), sees a stale value that is not
> below atomic_pool_size, and decides not to schedule the refill worker.
> The pool then stays empty until an unrelated event triggers the worker.
> During this window, all GFP_ATOMIC and GFP_NOWAIT callers receive NULL
> from dma_alloc_coherent() with no indication of the root cause.
>
> Drivers that do not check the return value of dma_alloc_coherent() in
> atomic context will NULL-deref (kernel oops/panic). Drivers that do
> check it will silently drop operations: packet loss in network drivers,
> I/O failure in storage drivers, device hangs in GPU/media drivers.
>
> Confirmed present in v6.14-rc3 (mainline). The pattern is unchanged
> since its introduction in commit d3f1d56c2e0e.
>
> Untrusted user trigger: indirect, via drivers that call dma_alloc_coherent()
> in atomic context on behalf of user operations (virtio-net MSG_ZEROCOPY,
> USB bulk transfers from plugdev group). Direct kernel-internal trigger
> requires driving alloc/free pressure on a DMA-capable device.
>
> Fix: remove the racy conditional check and call schedule_work()
> unconditionally on every successful allocation. schedule_work() is
> idempotent while the work item is pending: if it is already queued,
> the call is a no-op and returns false. (A work item that is already
> running is queued again, because its pending bit is cleared before the
> work function runs.) The workqueue thus deduplicates concurrent
> schedule_work() calls, so at most one refill is pending at any time.
> The worker itself checks whether expansion is actually needed, so
> spurious calls are harmless.
>
> Fixes: d3f1d56c2e0e ("dma-pool: add additional atomic pools")
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Afi0 <capyenglishlite@gmail.com>
                 ^^^^^
still not working here.
willy
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH] dma: pool: fix racy refill check in dma_alloc_from_pool()
@ 2026-05-17 8:18 Afi0
0 siblings, 0 replies; 3+ messages in thread
From: Afi0 @ 2026-05-17 8:18 UTC (permalink / raw)
To: hch; +Cc: robin.murphy, linux-kernel
[-- Attachment #1.1: Type: text/plain, Size: 1 bytes --]
[-- Attachment #1.2: Type: text/html, Size: 26 bytes --]
[-- Attachment #2: 0001-dma-pool-fix-racy-refill-check-in-dma_alloc_from_pool.patch --]
[-- Type: text/x-patch, Size: 3833 bytes --]
From d5e6f7a8b9c0d5e6f7a8b9c0d5e6f7a8b9c0d5e6 Mon Sep 17 00:00:00 2001
From: Andrii Kuchmenko <capyenglishlite@gmail.com>
Date: Sat, 16 May 2026 12:56:00 +0000
Subject: [PATCH] dma: pool: fix racy refill check in dma_alloc_from_pool()

The availability check after gen_pool_alloc() is not synchronized with
concurrent allocations on other CPUs:

	addr = gen_pool_alloc(pool, size); /* (A) alloc succeeds */
	if (!addr)
		return NULL;
	...
	if (gen_pool_avail(pool) < atomic_pool_size) /* (B) racy read */
		schedule_work(&atomic_pool_work); /* (C) may not fire */

Between (A) and (B), concurrent CPUs can drain the pool completely.
CPU0 reads gen_pool_avail() at (B), sees a stale value that is not
below atomic_pool_size, and decides not to schedule the refill worker.
The pool then stays empty until an unrelated event triggers the worker.
During this window, all GFP_ATOMIC and GFP_NOWAIT callers receive NULL
from dma_alloc_coherent() with no indication of the root cause.

Drivers that do not check the return value of dma_alloc_coherent() in
atomic context will NULL-deref (kernel oops/panic). Drivers that do
check it will silently drop operations: packet loss in network drivers,
I/O failure in storage drivers, device hangs in GPU/media drivers.

Confirmed present in v6.14-rc3 (mainline). The pattern is unchanged
since its introduction in commit d3f1d56c2e0e.

Untrusted user trigger: indirect, via drivers that call dma_alloc_coherent()
in atomic context on behalf of user operations (virtio-net MSG_ZEROCOPY,
USB bulk transfers from plugdev group). Direct kernel-internal trigger
requires driving alloc/free pressure on a DMA-capable device.

Fix: remove the racy conditional check and call schedule_work()
unconditionally on every successful allocation. schedule_work() is
idempotent while the work item is pending: if it is already queued,
the call is a no-op and returns false. (A work item that is already
running is queued again, because its pending bit is cleared before the
work function runs.) The workqueue thus deduplicates concurrent
schedule_work() calls, so at most one refill is pending at any time.
The worker itself checks whether expansion is actually needed, so
spurious calls are harmless.

Fixes: d3f1d56c2e0e ("dma-pool: add additional atomic pools")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrii Kuchmenko <capyenglishlite@gmail.com>
---
kernel/dma/pool.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index a1b2c3d4e5f6..c7d8e9f0a1b2 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -XXX,16 +XXX,12 @@ struct page *dma_alloc_from_pool(struct device *dev, size_t size,
 	addr = gen_pool_alloc(pool, size);
 	if (!addr)
 		return NULL;
 	phys = gen_pool_virt_to_phys(pool, addr);
 	if (!phys_addr_ok(dev, phys, size)) {
 		gen_pool_free(pool, addr, size);
 		return NULL;
 	}
-	/*
-	 * The availability check here is not synchronized with concurrent
-	 * allocations. Between gen_pool_alloc() and gen_pool_avail(), other
-	 * CPUs may drain the pool to zero without this CPU scheduling a
-	 * refill, leaving the pool empty until an unrelated event fires the
-	 * worker. Remove the racy check and always schedule unconditionally;
-	 * schedule_work() is idempotent and the worker checks if growth is
-	 * needed before acting.
-	 */
-	if (gen_pool_avail(pool) < atomic_pool_size)
-		schedule_work(&atomic_pool_work);
+	/*
+	 * Schedule refill unconditionally. The previous racy check
+	 * (avail < atomic_pool_size) is not protected against concurrent
+	 * drainers and can silently miss scheduling the worker, leaving
+	 * the pool empty. schedule_work() is idempotent -- already-pending
+	 * work is a no-op. The worker decides if growth is needed.
+	 */
+	schedule_work(&atomic_pool_work);
 	*cpu_addr = (void *)addr;
 	memset(*cpu_addr, 0, size);
--
2.39.0
^ permalink raw reply related [flat|nested] 3+ messages in thread