* [PATCH 1/7] fault-inject: make enum fault_flags available unconditionally
2025-11-11 13:52 mempool_alloc_bulk and various mempool improvements Christoph Hellwig
@ 2025-11-11 13:52 ` Christoph Hellwig
2025-11-11 13:52 ` [PATCH 2/7] mempool: update kerneldoc comments Christoph Hellwig
` (6 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-11 13:52 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
This will allow using should_fail_ex from code without having to
make it conditional on CONFIG_FAULT_INJECTION.
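As a rough illustration (hypothetical caller and names, not part of this
patch), code can then reference FAULT_NOWARN without any #ifdef, relying on
the existing should_fail_ex() stub when fault injection is disabled:

  #include <linux/fault-inject.h>
  #include <linux/slab.h>

  static DECLARE_FAULT_ATTR(fail_example);

  static void *example_alloc(gfp_t gfp_mask)
  {
  	/* builds with and without CONFIG_FAULT_INJECTION */
  	if (should_fail_ex(&fail_example, 1, FAULT_NOWARN))
  		return NULL;
  	return kmalloc(32, gfp_mask);
  }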
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/fault-inject.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h
index 8c829d28dcf3..58fd14c82270 100644
--- a/include/linux/fault-inject.h
+++ b/include/linux/fault-inject.h
@@ -8,6 +8,10 @@
struct dentry;
struct kmem_cache;
+enum fault_flags {
+ FAULT_NOWARN = 1 << 0,
+};
+
#ifdef CONFIG_FAULT_INJECTION
#include <linux/atomic.h>
@@ -36,10 +40,6 @@ struct fault_attr {
struct dentry *dname;
};
-enum fault_flags {
- FAULT_NOWARN = 1 << 0,
-};
-
#define FAULT_ATTR_INITIALIZER { \
.interval = 1, \
.times = ATOMIC_INIT(1), \
--
2.47.3
* [PATCH 2/7] mempool: update kerneldoc comments
2025-11-11 13:52 mempool_alloc_bulk and various mempool improvements Christoph Hellwig
2025-11-11 13:52 ` [PATCH 1/7] fault-inject: make enum fault_flags available unconditionally Christoph Hellwig
@ 2025-11-11 13:52 ` Christoph Hellwig
2025-11-11 13:52 ` [PATCH 3/7] mempool: add error injection support Christoph Hellwig
` (5 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-11 13:52 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
Use proper formatting, write full sentences, and reduce some of the
verbosity in the function parameter descriptions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 41 ++++++++++++++++++++++-------------------
1 file changed, 22 insertions(+), 19 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 1c38e873e546..1f4701713203 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -372,18 +372,20 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
EXPORT_SYMBOL(mempool_resize);
/**
- * mempool_alloc - allocate an element from a specific memory pool
- * @pool: pointer to the memory pool which was allocated via
- * mempool_create().
- * @gfp_mask: the usual allocation bitmask.
+ * mempool_alloc - allocate an element from a memory pool
+ * @pool: pointer to the memory pool
+ * @gfp_mask: GFP_* flags. %__GFP_ZERO is not supported.
*
- * this function only sleeps if the alloc_fn() function sleeps or
- * returns NULL. Note that due to preallocation, this function
- * *never* fails when called from process contexts. (it might
- * fail if called from an IRQ context.)
- * Note: using __GFP_ZERO is not supported.
+ * Allocate an element from @pool. This is done by first calling into the
+ * alloc_fn supplied at pool initialization time, and dipping into the reserved
+ * pool when alloc_fn fails to allocate an element.
*
- * Return: pointer to the allocated element or %NULL on error.
+ * This function only sleeps if the alloc_fn callback sleeps, or when waiting
+ * for elements to become available in the pool.
+ *
+ * Return: pointer to the allocated element or %NULL when failing to allocate
+ * an element. Allocation failure can only happen when @gfp_mask does not
+ * include %__GFP_DIRECT_RECLAIM.
*/
void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
{
@@ -456,11 +458,10 @@ EXPORT_SYMBOL(mempool_alloc_noprof);
/**
* mempool_alloc_preallocated - allocate an element from preallocated elements
- * belonging to a specific memory pool
- * @pool: pointer to the memory pool which was allocated via
- * mempool_create().
+ * belonging to a memory pool
+ * @pool: pointer to the memory pool
*
- * This function is similar to mempool_alloc, but it only attempts allocating
+ * This function is similar to mempool_alloc(), but it only attempts allocating
* an element from the preallocated elements. It does not sleep and immediately
* returns if no preallocated elements are available.
*
@@ -492,12 +493,14 @@ void *mempool_alloc_preallocated(mempool_t *pool)
EXPORT_SYMBOL(mempool_alloc_preallocated);
/**
- * mempool_free - return an element to the pool.
- * @element: pool element pointer.
- * @pool: pointer to the memory pool which was allocated via
- * mempool_create().
+ * mempool_free - return an element to a mempool
+ * @element: pointer to element
+ * @pool: pointer to the memory pool
+ *
+ * Returns @element to @pool if it needs replenishing, else frees it using
+ * the free_fn callback in @pool.
*
- * this function only sleeps if the free_fn() function sleeps.
+ * This function only sleeps if the free_fn callback sleeps.
*/
void mempool_free(void *element, mempool_t *pool)
{
--
2.47.3
* [PATCH 3/7] mempool: add error injection support
2025-11-11 13:52 mempool_alloc_bulk and various mempool improvements Christoph Hellwig
2025-11-11 13:52 ` [PATCH 1/7] fault-inject: make enum fault_flags available unconditionally Christoph Hellwig
2025-11-11 13:52 ` [PATCH 2/7] mempool: update kerneldoc comments Christoph Hellwig
@ 2025-11-11 13:52 ` Christoph Hellwig
2025-11-11 13:52 ` [PATCH 4/7] mempool: factor out a mempool_adjust_gfp helper Christoph Hellwig
` (4 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-11 13:52 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
Add a call to should_fail_ex that forces the mempool to actually allocate
from the pool to stress the mempool implementation when enabled through
debugfs. By default should_fail{,_ex} prints a very verbose stack trace
that clutters the kernel log, slows down execution and triggers the
kernel bug detection in xfstests. Pass FAULT_NOWARN and print a
single-line message identifying the caller instead so that full tests
can be run with fault injection.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/mempool.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 1f4701713203..76ba4d5e656f 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -9,7 +9,7 @@
* started by Ingo Molnar, Copyright (C) 2001
* debugging by David Rientjes, Copyright (C) 2015
*/
-
+#include <linux/fault-inject.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/highmem.h>
@@ -20,6 +20,15 @@
#include <linux/writeback.h>
#include "slab.h"
+static DECLARE_FAULT_ATTR(fail_mempool_alloc);
+
+static int __init mempool_fault_inject_init(void)
+{
+ return PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
+ NULL, &fail_mempool_alloc));
+}
+late_initcall(mempool_fault_inject_init);
+
#ifdef CONFIG_SLUB_DEBUG_ON
static void poison_error(mempool_t *pool, void *element, size_t size,
size_t byte)
@@ -404,9 +413,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
repeat_alloc:
+ if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
+ pr_info("forcing mempool usage for pool %pS\n",
+ (void *)_RET_IP_);
+ element = NULL;
+ } else {
+ element = pool->alloc(gfp_temp, pool->pool_data);
+ }
- element = pool->alloc(gfp_temp, pool->pool_data);
- if (likely(element != NULL))
+ if (likely(element))
return element;
spin_lock_irqsave(&pool->lock, flags);
--
2.47.3
* [PATCH 4/7] mempool: factor out a mempool_adjust_gfp helper
2025-11-11 13:52 mempool_alloc_bulk and various mempool improvements Christoph Hellwig
` (2 preceding siblings ...)
2025-11-11 13:52 ` [PATCH 3/7] mempool: add error injection support Christoph Hellwig
@ 2025-11-11 13:52 ` Christoph Hellwig
2025-11-11 13:52 ` [PATCH 5/7] mempool: factor out a mempool_alloc_from_pool helper Christoph Hellwig
` (3 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-11 13:52 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
Add a helper to better isolate and document the gfp flags adjustments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 76ba4d5e656f..912364e279e9 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -380,6 +380,19 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
}
EXPORT_SYMBOL(mempool_resize);
+/*
+ * Adjust the gfp flags for mempool allocations, as we never want to dip into
+ * the global emergency reserves or retry in the page allocator.
+ *
+ * The first pass also doesn't want to go into direct reclaim, but the next
+ * passes do, so return a separate subset for that first iteration.
+ */
+static inline gfp_t mempool_adjust_gfp(gfp_t *gfp_mask)
+{
+ *gfp_mask |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
+ return *gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);
+}
+
/**
* mempool_alloc - allocate an element from a memory pool
* @pool: pointer to the memory pool
@@ -398,20 +411,14 @@ EXPORT_SYMBOL(mempool_resize);
*/
void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
{
+ gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
void *element;
unsigned long flags;
wait_queue_entry_t wait;
- gfp_t gfp_temp;
VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
might_alloc(gfp_mask);
- gfp_mask |= __GFP_NOMEMALLOC; /* don't allocate emergency reserves */
- gfp_mask |= __GFP_NORETRY; /* don't loop in __alloc_pages */
- gfp_mask |= __GFP_NOWARN; /* failures are OK */
-
- gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
-
repeat_alloc:
if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
pr_info("forcing mempool usage for pool %pS\n",
--
2.47.3
* [PATCH 5/7] mempool: factor out a mempool_alloc_from_pool helper
2025-11-11 13:52 mempool_alloc_bulk and various mempool improvements Christoph Hellwig
` (3 preceding siblings ...)
2025-11-11 13:52 ` [PATCH 4/7] mempool: factor out a mempool_adjust_gfp helper Christoph Hellwig
@ 2025-11-11 13:52 ` Christoph Hellwig
2025-11-11 13:52 ` [PATCH 6/7] mempool: fix a wakeup race when sleeping for elements Christoph Hellwig
` (2 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-11 13:52 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
Add a helper for the mempool_alloc slowpath to better separate it from the
fast path, and also use it to implement mempool_alloc_preallocated which
shares the same logic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 121 ++++++++++++++++++++++++---------------------------
1 file changed, 57 insertions(+), 64 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 912364e279e9..850362f4ca7a 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -380,6 +380,50 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
}
EXPORT_SYMBOL(mempool_resize);
+static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
+{
+ unsigned long flags;
+ void *element;
+
+ spin_lock_irqsave(&pool->lock, flags);
+ if (unlikely(!pool->curr_nr))
+ goto fail;
+ element = remove_element(pool);
+ spin_unlock_irqrestore(&pool->lock, flags);
+
+ /* Paired with rmb in mempool_free(), read comment there. */
+ smp_wmb();
+
+ /*
+ * Update the allocation stack trace as this is more useful for
+ * debugging.
+ */
+ kmemleak_update_trace(element);
+ return element;
+
+fail:
+ if (gfp_mask & __GFP_DIRECT_RECLAIM) {
+ DEFINE_WAIT(wait);
+
+ prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
+ spin_unlock_irqrestore(&pool->lock, flags);
+
+ /*
+ * Wait for someone else to return an element to @pool.
+ *
+ * FIXME: this should be io_schedule(). The timeout is there as
+ * a workaround for some DM problems in 2.6.18.
+ */
+ io_schedule_timeout(5 * HZ);
+ finish_wait(&pool->wait, &wait);
+ } else {
+ /* We must not sleep if __GFP_DIRECT_RECLAIM is not set. */
+ spin_unlock_irqrestore(&pool->lock, flags);
+ }
+
+ return NULL;
+}
+
/*
* Adjust the gfp flags for mempool allocations, as we never want to dip into
* the global emergency reserves or retry in the page allocator.
@@ -413,8 +457,6 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
{
gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
void *element;
- unsigned long flags;
- wait_queue_entry_t wait;
VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
might_alloc(gfp_mask);
@@ -428,53 +470,22 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
element = pool->alloc(gfp_temp, pool->pool_data);
}
- if (likely(element))
- return element;
-
- spin_lock_irqsave(&pool->lock, flags);
- if (likely(pool->curr_nr)) {
- element = remove_element(pool);
- spin_unlock_irqrestore(&pool->lock, flags);
- /* paired with rmb in mempool_free(), read comment there */
- smp_wmb();
+ if (unlikely(!element)) {
/*
- * Update the allocation stack trace as this is more useful
- * for debugging.
+ * Try to allocate an element from the pool.
+ *
+ * The first pass won't have __GFP_DIRECT_RECLAIM and won't
+ * sleep in mempool_alloc_from_pool. Retry the allocation
+ * with all flags set in that case.
*/
- kmemleak_update_trace(element);
- return element;
- }
-
- /*
- * We use gfp mask w/o direct reclaim or IO for the first round. If
- * alloc failed with that and @pool was empty, retry immediately.
- */
- if (gfp_temp != gfp_mask) {
- spin_unlock_irqrestore(&pool->lock, flags);
- gfp_temp = gfp_mask;
- goto repeat_alloc;
- }
-
- /* We must not sleep if !__GFP_DIRECT_RECLAIM */
- if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
- spin_unlock_irqrestore(&pool->lock, flags);
- return NULL;
+ element = mempool_alloc_from_pool(pool, gfp_mask);
+ if (!element && gfp_temp != gfp_mask) {
+ gfp_temp = gfp_mask;
+ goto repeat_alloc;
+ }
}
- /* Let's wait for someone else to return an element to @pool */
- init_wait(&wait);
- prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
-
- spin_unlock_irqrestore(&pool->lock, flags);
-
- /*
- * FIXME: this should be io_schedule(). The timeout is there as a
- * workaround for some DM problems in 2.6.18.
- */
- io_schedule_timeout(5*HZ);
-
- finish_wait(&pool->wait, &wait);
- goto repeat_alloc;
+ return element;
}
EXPORT_SYMBOL(mempool_alloc_noprof);
@@ -492,25 +503,7 @@ EXPORT_SYMBOL(mempool_alloc_noprof);
*/
void *mempool_alloc_preallocated(mempool_t *pool)
{
- void *element;
- unsigned long flags;
-
- spin_lock_irqsave(&pool->lock, flags);
- if (likely(pool->curr_nr)) {
- element = remove_element(pool);
- spin_unlock_irqrestore(&pool->lock, flags);
- /* paired with rmb in mempool_free(), read comment there */
- smp_wmb();
- /*
- * Update the allocation stack trace as this is more useful
- * for debugging.
- */
- kmemleak_update_trace(element);
- return element;
- }
- spin_unlock_irqrestore(&pool->lock, flags);
-
- return NULL;
+ return mempool_alloc_from_pool(pool, GFP_NOWAIT);
}
EXPORT_SYMBOL(mempool_alloc_preallocated);
--
2.47.3
* [PATCH 6/7] mempool: fix a wakeup race when sleeping for elements
2025-11-11 13:52 mempool_alloc_bulk and various mempool improvements Christoph Hellwig
` (4 preceding siblings ...)
2025-11-11 13:52 ` [PATCH 5/7] mempool: factor out a mempool_alloc_from_pool helper Christoph Hellwig
@ 2025-11-11 13:52 ` Christoph Hellwig
2025-11-12 10:53 ` Vlastimil Babka
2025-11-11 13:52 ` [PATCH 7/7] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
2025-11-12 12:22 ` mempool_alloc_bulk and various mempool improvements Vlastimil Babka
7 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-11 13:52 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
Waiters always need to re-check their condition after adding themselves
to the waitqueue, as otherwise they might miss a wakeup. Check
for elements in the pool and use them before going to sleep.
The timeout workaround mentioned in the FIXME comment was probably due to
this, but it seems genuinely useful for other reasons, so keep the timeout
and update the comment describing it.
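For reference, the canonical waiter pattern being referred to looks roughly
like this (generic sketch, where wq and condition are placeholders and not
mempool code):

  DEFINE_WAIT(wait);

  prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
  if (!condition)		/* re-check after queueing ourselves */
  	schedule();
  finish_wait(&wq, &wait);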
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 850362f4ca7a..8cf3b5705b7f 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -388,6 +388,7 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
spin_lock_irqsave(&pool->lock, flags);
if (unlikely(!pool->curr_nr))
goto fail;
+alloc:
element = remove_element(pool);
spin_unlock_irqrestore(&pool->lock, flags);
@@ -406,13 +407,17 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
DEFINE_WAIT(wait);
prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
+ if (pool->curr_nr) {
+ finish_wait(&pool->wait, &wait);
+ goto alloc;
+ }
spin_unlock_irqrestore(&pool->lock, flags);
/*
- * Wait for someone else to return an element to @pool.
- *
- * FIXME: this should be io_schedule(). The timeout is there as
- * a workaround for some DM problems in 2.6.18.
+ * Wait for someone else to return an element to @pool, but wake
+ * up occasionally as memory pressure might have reduced in the
+ * meantime and the normal allocation in alloc_fn could succeed
+ * even if no element was returned.
*/
io_schedule_timeout(5 * HZ);
finish_wait(&pool->wait, &wait);
--
2.47.3
* Re: [PATCH 6/7] mempool: fix a wakeup race when sleeping for elements
2025-11-11 13:52 ` [PATCH 6/7] mempool: fix a wakeup race when sleeping for elements Christoph Hellwig
@ 2025-11-12 10:53 ` Vlastimil Babka
2025-11-12 15:38 ` Christoph Hellwig
0 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2025-11-12 10:53 UTC (permalink / raw)
To: Christoph Hellwig, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
On 11/11/25 14:52, Christoph Hellwig wrote:
> Waiters always need to re-check their condition after adding themselves
> to the waitqueue, as otherwise they might miss a wakeup. Check
> for elements in the pool and use them before going to sleep.
>
> The timeout workaround mentioned in the FIXME comment was probably due to
> this, but it seems genuinely useful for other reasons, so keep the timeout
> and update the comment describing it.
Thanks for looking into that historic comment :)
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> mm/mempool.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/mempool.c b/mm/mempool.c
> index 850362f4ca7a..8cf3b5705b7f 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -388,6 +388,7 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
> spin_lock_irqsave(&pool->lock, flags);
> if (unlikely(!pool->curr_nr))
> goto fail;
> +alloc:
> element = remove_element(pool);
> spin_unlock_irqrestore(&pool->lock, flags);
>
> @@ -406,13 +407,17 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
> DEFINE_WAIT(wait);
>
> prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
> + if (pool->curr_nr) {
> + finish_wait(&pool->wait, &wait);
> + goto alloc;
> + }
> spin_unlock_irqrestore(&pool->lock, flags);
I think the race already cannot exist, thanks to the pool->lock being
unlocked after prepare_to_wait()?
The freeing path can't bump pool->curr_nr without the lock, so the condition
you added can't even be true, no?
>
> /*
> - * Wait for someone else to return an element to @pool.
> - *
> - * FIXME: this should be io_schedule(). The timeout is there as
> - * a workaround for some DM problems in 2.6.18.
> + * Wait for someone else to return an element to @pool, but wake
> + * up occasionally as memory pressure might have reduced in the
> + * meantime and the normal allocation in alloc_fn could succeed
> + * even if no element was returned.
> */
> io_schedule_timeout(5 * HZ);
> finish_wait(&pool->wait, &wait);
* Re: [PATCH 6/7] mempool: fix a wakeup race when sleeping for elements
2025-11-12 10:53 ` Vlastimil Babka
@ 2025-11-12 15:38 ` Christoph Hellwig
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-12 15:38 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Hellwig, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, Eric Biggers, linux-mm,
linux-kernel
On Wed, Nov 12, 2025 at 11:53:39AM +0100, Vlastimil Babka wrote:
> > +alloc:
> > element = remove_element(pool);
> > spin_unlock_irqrestore(&pool->lock, flags);
> >
> > @@ -406,13 +407,17 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
> > DEFINE_WAIT(wait);
> >
> > prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
> > + if (pool->curr_nr) {
> > + finish_wait(&pool->wait, &wait);
> > + goto alloc;
> > + }
> > spin_unlock_irqrestore(&pool->lock, flags);
> I think the race already cannot exist, thanks to the pool->lock being
> unlocked after prepare_to_wait()?
> The freeing path can't bump pool->curr_nr without the lock, so the condition
> you added can't even be true, no?
You're right, I'll drop this again.
* [PATCH 7/7] mempool: add mempool_{alloc,free}_bulk
2025-11-11 13:52 mempool_alloc_bulk and various mempool improvements Christoph Hellwig
` (5 preceding siblings ...)
2025-11-11 13:52 ` [PATCH 6/7] mempool: fix a wakeup race when sleeping for elements Christoph Hellwig
@ 2025-11-11 13:52 ` Christoph Hellwig
2025-11-12 12:20 ` Vlastimil Babka
2025-11-12 12:22 ` mempool_alloc_bulk and various mempool improvements Vlastimil Babka
7 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-11 13:52 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
Add a version of the mempool allocator that works for batch allocations
of multiple objects. Calling mempool_alloc in a loop is not safe because
it could deadlock if multiple threads are performing such an allocation
at the same time.
As an extra benefit the interface is built so that the same array can be
used for alloc_pages_bulk / release_pages so that at least for page-backed
mempools the fast path can use a nice batch optimization.
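A rough usage sketch for a page backed pool (hypothetical caller, not part
of this series; error handling and the actual use of the pages are omitted,
and the element count must not exceed the pool's min_nr):

  void *elems[4] = { NULL, };
  unsigned int freed;

  /* fill every NULL slot, dipping into the reserved pool if alloc_fn fails */
  mempool_alloc_bulk(pool, elems, ARRAY_SIZE(elems), GFP_NOIO);

  /* ... use the pages ... */

  /* hand back what the pool needs, free the rest directly */
  freed = mempool_free_bulk(pool, elems, ARRAY_SIZE(elems));
  if (freed < ARRAY_SIZE(elems))
  	release_pages((struct page **)(elems + freed),
  		      ARRAY_SIZE(elems) - freed);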
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/mempool.h | 7 ++
mm/mempool.c | 182 ++++++++++++++++++++++++++++++----------
2 files changed, 145 insertions(+), 44 deletions(-)
diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 34941a4b9026..486ed50776db 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -66,9 +66,16 @@ extern void mempool_destroy(mempool_t *pool);
extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
#define mempool_alloc(...) \
alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
+int mempool_alloc_bulk_noprof(mempool_t *pool, void **elem,
+ unsigned int count, gfp_t gfp_mask, unsigned long caller_ip);
+#define mempool_alloc_bulk(pool, elem, count, gfp_mask) \
+ alloc_hooks(mempool_alloc_bulk_noprof(pool, elem, count, gfp_mask, \
+ _RET_IP_))
extern void *mempool_alloc_preallocated(mempool_t *pool) __malloc;
extern void mempool_free(void *element, mempool_t *pool);
+unsigned int mempool_free_bulk(mempool_t *pool, void **elem,
+ unsigned int count);
/*
* A mempool_alloc_t and mempool_free_t that get the memory from
diff --git a/mm/mempool.c b/mm/mempool.c
index 8cf3b5705b7f..e2f05bec633b 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -21,11 +21,16 @@
#include "slab.h"
static DECLARE_FAULT_ATTR(fail_mempool_alloc);
+static DECLARE_FAULT_ATTR(fail_mempool_alloc_bulk);
static int __init mempool_fault_inject_init(void)
{
- return PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
- NULL, &fail_mempool_alloc));
+ if (IS_ERR(fault_create_debugfs_attr("fail_mempool_alloc", NULL,
+ &fail_mempool_alloc)) ||
+ IS_ERR(fault_create_debugfs_attr("fail_mempool_alloc_bulk", NULL,
+ &fail_mempool_alloc_bulk)))
+ return -ENOMEM;
+ return 0;
}
late_initcall(mempool_fault_inject_init);
@@ -380,16 +385,21 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
}
EXPORT_SYMBOL(mempool_resize);
-static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
+static bool mempool_alloc_from_pool(struct mempool *pool, void **elems,
+ unsigned int count, unsigned int allocated,
+ gfp_t gfp_mask)
{
unsigned long flags;
- void *element;
+ unsigned int i;
spin_lock_irqsave(&pool->lock, flags);
- if (unlikely(!pool->curr_nr))
+ if (unlikely(pool->curr_nr < count - allocated))
goto fail;
alloc:
- element = remove_element(pool);
+ for (; allocated < count; allocated++) {
+ if (!elems[allocated])
+ elems[allocated] = remove_element(pool);
+ }
spin_unlock_irqrestore(&pool->lock, flags);
/* Paired with rmb in mempool_free(), read comment there. */
@@ -399,15 +409,16 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
* Update the allocation stack trace as this is more useful for
* debugging.
*/
- kmemleak_update_trace(element);
- return element;
+ for (i = 0; i < count; i++)
+ kmemleak_update_trace(elems[i]);
+ return true;
fail:
if (gfp_mask & __GFP_DIRECT_RECLAIM) {
DEFINE_WAIT(wait);
prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
- if (pool->curr_nr) {
+ if (pool->curr_nr >= count - allocated) {
finish_wait(&pool->wait, &wait);
goto alloc;
}
@@ -426,7 +437,7 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
spin_unlock_irqrestore(&pool->lock, flags);
}
- return NULL;
+ return false;
}
/*
@@ -442,6 +453,72 @@ static inline gfp_t mempool_adjust_gfp(gfp_t *gfp_mask)
return *gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);
}
+/**
+ * mempool_alloc_bulk - allocate multiple elements from a memory pool
+ * @pool: pointer to the memory pool
+ * @elems: partially or fully populated elements array
+ * @count: size (in entries) of @elem
+ * @gfp_mask: GFP_* flags. %__GFP_ZERO is not supported.
+ *
+ * Allocate an element for each slot in @elems that is %NULL. This is done by
+ * first calling into the alloc_fn supplied at pool initialization time, and
+ * dipping into the reserved pool when alloc_fn fails to allocate an element.
+ *
+ * This function only sleeps if the alloc_fn callback sleeps, or when waiting
+ * for elements to become available in the pool.
+ *
+ * Return: Always 0. If it wasn't for %$#^$ alloc tags, it would return void.
+ */
+int mempool_alloc_bulk_noprof(struct mempool *pool, void **elems,
+ unsigned int count, gfp_t gfp_mask, unsigned long caller_ip)
+{
+ gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
+ unsigned int i = 0;
+
+ VM_WARN_ON_ONCE(count > pool->min_nr);
+ VM_WARN_ON_ONCE(!(gfp_mask & __GFP_DIRECT_RECLAIM));
+ VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
+ might_alloc(gfp_mask);
+
+ /*
+ * If an error is injected, fail all elements in a bulk allocation so
+ * that we stress the multiple elements missing path.
+ */
+ if (should_fail_ex(&fail_mempool_alloc_bulk, 1, FAULT_NOWARN)) {
+ pr_info("forcing mempool usage for pool %pS\n",
+ (void *)_RET_IP_);
+ goto use_pool;
+ }
+
+repeat_alloc:
+ /*
+ * Try to allocate the elements using the allocation callback. If that
+ * succeeds or we were not allowed to sleep, return now. Don't dip into
+ * the reserved pools for !__GFP_DIRECT_RECLAIM allocations as they
+ * aren't guaranteed to succeed and chances of getting an allocation
+ * from the allocators using GFP_ATOMIC is higher than stealing one of
+ * the few items from our usually small pool.
+ */
+ for (; i < count; i++) {
+ if (!elems[i]) {
+ elems[i] = pool->alloc(gfp_temp, pool->pool_data);
+ if (unlikely(!elems[i]))
+ goto use_pool;
+ }
+ }
+
+ return 0;
+
+use_pool:
+ if (!mempool_alloc_from_pool(pool, elems, count, i, gfp_temp)) {
+ gfp_temp = gfp_mask;
+ goto repeat_alloc;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mempool_alloc_bulk_noprof);
+
/**
* mempool_alloc - allocate an element from a memory pool
* @pool: pointer to the memory pool
@@ -483,8 +560,8 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
* sleep in mempool_alloc_from_pool. Retry the allocation
* with all flags set in that case.
*/
- element = mempool_alloc_from_pool(pool, gfp_mask);
- if (!element && gfp_temp != gfp_mask) {
+ if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_mask) &&
+ gfp_temp != gfp_mask) {
gfp_temp = gfp_mask;
goto repeat_alloc;
}
@@ -508,26 +585,33 @@ EXPORT_SYMBOL(mempool_alloc_noprof);
*/
void *mempool_alloc_preallocated(mempool_t *pool)
{
- return mempool_alloc_from_pool(pool, GFP_NOWAIT);
+ void *element = NULL;
+
+ mempool_alloc_from_pool(pool, &element, 1, 0, GFP_NOWAIT);
+ return element;
}
EXPORT_SYMBOL(mempool_alloc_preallocated);
/**
- * mempool_free - return an element to a mempool
- * @element: pointer to element
+ * mempool_free_bulk - return elements to a mempool
* @pool: pointer to the memory pool
+ * @elems: elements to return
+ * @count: number of elements to return
*
- * Returns @element to @pool if it needs replenishing, else frees it using
- * the free_fn callback in @pool.
+ * Returns a number of elements from the start of @elems to @pool if @pool needs
+ * replenishing and sets their slots in @elems to NULL. Other elements are left
+ * in @elems.
*
- * This function only sleeps if the free_fn callback sleeps.
+ * Return: number of elements transferred to @pool. Elements are always
+ * transferred from the beginning of @elems, so the return value can be used as
+ * an offset into @elems when freeing the remaining elements in the caller.
*/
-void mempool_free(void *element, mempool_t *pool)
+unsigned int mempool_free_bulk(struct mempool *pool, void **elems,
+ unsigned int count)
{
unsigned long flags;
-
- if (unlikely(element == NULL))
- return;
+ unsigned int freed = 0;
+ bool added = false;
/*
* Paired with the wmb in mempool_alloc(). The preceding read is
@@ -561,21 +645,6 @@ void mempool_free(void *element, mempool_t *pool)
* Waiters happen iff curr_nr is 0 and the above guarantee also
* ensures that there will be frees which return elements to the
* pool waking up the waiters.
- */
- if (unlikely(READ_ONCE(pool->curr_nr) < pool->min_nr)) {
- spin_lock_irqsave(&pool->lock, flags);
- if (likely(pool->curr_nr < pool->min_nr)) {
- add_element(pool, element);
- spin_unlock_irqrestore(&pool->lock, flags);
- if (wq_has_sleeper(&pool->wait))
- wake_up(&pool->wait);
- return;
- }
- spin_unlock_irqrestore(&pool->lock, flags);
- }
-
- /*
- * Handle the min_nr = 0 edge case:
*
* For zero-minimum pools, curr_nr < min_nr (0 < 0) never succeeds,
* so waiters sleeping on pool->wait would never be woken by the
@@ -583,20 +652,45 @@ void mempool_free(void *element, mempool_t *pool)
* allocation of element when both min_nr and curr_nr are 0, and
* any active waiters are properly awakened.
*/
- if (unlikely(pool->min_nr == 0 &&
+ if (unlikely(READ_ONCE(pool->curr_nr) < pool->min_nr)) {
+ spin_lock_irqsave(&pool->lock, flags);
+ while (pool->curr_nr < pool->min_nr && freed < count) {
+ add_element(pool, elems[freed++]);
+ added = true;
+ }
+ spin_unlock_irqrestore(&pool->lock, flags);
+ } else if (unlikely(pool->min_nr == 0 &&
READ_ONCE(pool->curr_nr) == 0)) {
+ /* Handle the min_nr = 0 edge case: */
spin_lock_irqsave(&pool->lock, flags);
if (likely(pool->curr_nr == 0)) {
- add_element(pool, element);
- spin_unlock_irqrestore(&pool->lock, flags);
- if (wq_has_sleeper(&pool->wait))
- wake_up(&pool->wait);
- return;
+ add_element(pool, elems[freed++]);
+ added = true;
}
spin_unlock_irqrestore(&pool->lock, flags);
}
- pool->free(element, pool->pool_data);
+ if (unlikely(added) && wq_has_sleeper(&pool->wait))
+ wake_up(&pool->wait);
+
+ return freed;
+}
+EXPORT_SYMBOL_GPL(mempool_free_bulk);
+
+/**
+ * mempool_free - return an element to the pool.
+ * @element: element to return
+ * @pool: pointer to the memory pool
+ *
+ * Returns @element to @pool if it needs replenishing, else frees it using
+ * the free_fn callback in @pool.
+ *
+ * This function only sleeps if the free_fn callback sleeps.
+ */
+void mempool_free(void *element, struct mempool *pool)
+{
+ if (likely(element) && !mempool_free_bulk(pool, &element, 1))
+ pool->free(element, pool->pool_data);
}
EXPORT_SYMBOL(mempool_free);
--
2.47.3
* Re: [PATCH 7/7] mempool: add mempool_{alloc,free}_bulk
2025-11-11 13:52 ` [PATCH 7/7] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
@ 2025-11-12 12:20 ` Vlastimil Babka
2025-11-12 15:47 ` Christoph Hellwig
0 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2025-11-12 12:20 UTC (permalink / raw)
To: Christoph Hellwig, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
On 11/11/25 14:52, Christoph Hellwig wrote:
> Add a version of the mempool allocator that works for batch allocations
> of multiple objects. Calling mempool_alloc in a loop is not safe because
> it could deadlock if multiple threads are performing such an allocation
> at the same time.
>
> As an extra benefit the interface is built so that the same array can be
> used for alloc_pages_bulk / release_pages so that at least for page-backed
> mempools the fast path can use a nice batch optimization.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> include/linux/mempool.h | 7 ++
> mm/mempool.c | 182 ++++++++++++++++++++++++++++++----------
> 2 files changed, 145 insertions(+), 44 deletions(-)
>
> diff --git a/include/linux/mempool.h b/include/linux/mempool.h
> index 34941a4b9026..486ed50776db 100644
> --- a/include/linux/mempool.h
> +++ b/include/linux/mempool.h
> @@ -66,9 +66,16 @@ extern void mempool_destroy(mempool_t *pool);
> extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
> #define mempool_alloc(...) \
> alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
> +int mempool_alloc_bulk_noprof(mempool_t *pool, void **elem,
> + unsigned int count, gfp_t gfp_mask, unsigned long caller_ip);
> +#define mempool_alloc_bulk(pool, elem, count, gfp_mask) \
> + alloc_hooks(mempool_alloc_bulk_noprof(pool, elem, count, gfp_mask, \
> + _RET_IP_))
>
> extern void *mempool_alloc_preallocated(mempool_t *pool) __malloc;
> extern void mempool_free(void *element, mempool_t *pool);
> +unsigned int mempool_free_bulk(mempool_t *pool, void **elem,
> + unsigned int count);
>
> /*
> * A mempool_alloc_t and mempool_free_t that get the memory from
> diff --git a/mm/mempool.c b/mm/mempool.c
> index 8cf3b5705b7f..e2f05bec633b 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -21,11 +21,16 @@
> #include "slab.h"
>
> static DECLARE_FAULT_ATTR(fail_mempool_alloc);
> +static DECLARE_FAULT_ATTR(fail_mempool_alloc_bulk);
>
> static int __init mempool_fault_inject_init(void)
> {
> - return PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
> - NULL, &fail_mempool_alloc));
> + if (IS_ERR(fault_create_debugfs_attr("fail_mempool_alloc", NULL,
> + &fail_mempool_alloc)) ||
> + IS_ERR(fault_create_debugfs_attr("fail_mempool_alloc_bulk", NULL,
> + &fail_mempool_alloc_bulk)))
> + return -ENOMEM;
Pedantically speaking the error (from debugfs_create_dir()) might be
different, probably doesn't matter in practice.
> + return 0;
> }
> late_initcall(mempool_fault_inject_init);
>
> @@ -380,16 +385,21 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
> }
> EXPORT_SYMBOL(mempool_resize);
>
> -static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
> +static bool mempool_alloc_from_pool(struct mempool *pool, void **elems,
> + unsigned int count, unsigned int allocated,
> + gfp_t gfp_mask)
> {
> unsigned long flags;
> - void *element;
> + unsigned int i;
>
> spin_lock_irqsave(&pool->lock, flags);
> - if (unlikely(!pool->curr_nr))
> + if (unlikely(pool->curr_nr < count - allocated))
So we might be pessimistic here when some of the elements in the array
already are not NULL so we need in fact less. Might be an issue if callers
were relying on this for forward progress? It would be simpler to just tell
them not to...
> goto fail;
> alloc:
> - element = remove_element(pool);
> + for (; allocated < count; allocated++) {
> + if (!elems[allocated])
> + elems[allocated] = remove_element(pool);
> + }
> spin_unlock_irqrestore(&pool->lock, flags);
>
> /* Paired with rmb in mempool_free(), read comment there. */
> @@ -399,15 +409,16 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
> * Update the allocation stack trace as this is more useful for
> * debugging.
> */
> - kmemleak_update_trace(element);
> - return element;
> + for (i = 0; i < count; i++)
> + kmemleak_update_trace(elems[i]);
> + return true;
>
> fail:
> if (gfp_mask & __GFP_DIRECT_RECLAIM) {
> DEFINE_WAIT(wait);
>
> prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
> - if (pool->curr_nr) {
> + if (pool->curr_nr >= count - allocated) {
> finish_wait(&pool->wait, &wait);
> goto alloc;
> }
> @@ -426,7 +437,7 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
> spin_unlock_irqrestore(&pool->lock, flags);
> }
>
> - return NULL;
> + return false;
> }
>
> /*
> @@ -442,6 +453,72 @@ static inline gfp_t mempool_adjust_gfp(gfp_t *gfp_mask)
> return *gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);
> }
>
> +/**
> + * mempool_alloc_bulk - allocate multiple elements from a memory pool
> + * @pool: pointer to the memory pool
> + * @elems: partially or fully populated elements array
> + * @count: size (in entries) of @elem
> + * @gfp_mask: GFP_* flags. %__GFP_ZERO is not supported.
We should say __GFP_DIRECT_RECLAIM is mandatory...
> + *
> + * Allocate an element for each slot in @elems that is %NULL. This is done by
> + * first calling into the alloc_fn supplied at pool initialization time, and
> + * dipping into the reserved pool when alloc_fn fails to allocate an element.
> + *
> + * This function only sleeps if the alloc_fn callback sleeps, or when waiting
> + * for elements to become available in the pool.
> + *
> + * Return: Always 0. If it wasn't for %$#^$ alloc tags, it would return void.
My sympathies :) But see below.
> + */
> +int mempool_alloc_bulk_noprof(struct mempool *pool, void **elems,
> + unsigned int count, gfp_t gfp_mask, unsigned long caller_ip)
> +{
> + gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
> + unsigned int i = 0;
> +
> + VM_WARN_ON_ONCE(count > pool->min_nr);
> + VM_WARN_ON_ONCE(!(gfp_mask & __GFP_DIRECT_RECLAIM));
... per here.
> + VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
> + might_alloc(gfp_mask);
> +
> + /*
> + * If an error is injected, fail all elements in a bulk allocation so
> + * that we stress the multiple elements missing path.
> + */
> + if (should_fail_ex(&fail_mempool_alloc_bulk, 1, FAULT_NOWARN)) {
> + pr_info("forcing mempool usage for pool %pS\n",
> + (void *)_RET_IP_);
> + goto use_pool;
> + }
> +
> +repeat_alloc:
> + /*
> + * Try to allocate the elements using the allocation callback. If that
> + * succeeds or we were not allowed to sleep, return now. Don't dip into
> + * the reserved pools for !__GFP_DIRECT_RECLAIM allocations as they
> + * aren't guaranteed to succeed and chances of getting an allocation
> + * from the allocators using GFP_ATOMIC is higher than stealing one of
> + * the few items from our usually small pool.
> + */
Hm but the code doesn't do what the comment says, AFAICS? It will try
dipping into the pool and might succeed if there are elements, only will not
wait for them there?
> + for (; i < count; i++) {
> + if (!elems[i]) {
> + elems[i] = pool->alloc(gfp_temp, pool->pool_data);
> + if (unlikely(!elems[i]))
> + goto use_pool;
> + }
> + }
> +
> + return 0;
> +
> +use_pool:
So should we bail out here with -ENOMEM when !(gfp_mask & __GFP_DIRECT_RECLAIM)?
> + if (!mempool_alloc_from_pool(pool, elems, count, i, gfp_temp)) {
> + gfp_temp = gfp_mask;
> + goto repeat_alloc;
Because this seems to be an infinite loop otherwise?
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(mempool_alloc_bulk_noprof);
> +
* Re: [PATCH 7/7] mempool: add mempool_{alloc,free}_bulk
2025-11-12 12:20 ` Vlastimil Babka
@ 2025-11-12 15:47 ` Christoph Hellwig
2025-11-12 15:56 ` Vlastimil Babka
0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-12 15:47 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Hellwig, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, Eric Biggers, linux-mm,
linux-kernel
On Wed, Nov 12, 2025 at 01:20:21PM +0100, Vlastimil Babka wrote:
> > + if (IS_ERR(fault_create_debugfs_attr("fail_mempool_alloc", NULL,
> > + &fail_mempool_alloc)) ||
> > + IS_ERR(fault_create_debugfs_attr("fail_mempool_alloc_bulk", NULL,
> > + &fail_mempool_alloc_bulk)))
> > + return -ENOMEM;
>
> Pedantically speaking the error (from debugfs_create_dir()) might be
> different, probably doesn't matter in practice.
Yeah, this is an initcall, so the exact error really does not matter.
But adding an error variable isn't that annoying, so I'll switch to
that.
> > unsigned long flags;
> > - void *element;
> > + unsigned int i;
> >
> > spin_lock_irqsave(&pool->lock, flags);
> > - if (unlikely(!pool->curr_nr))
> > + if (unlikely(pool->curr_nr < count - allocated))
>
> So we might be pessimistic here when some of the elements in the array
> already are not NULL so we need in fact less. Might be an issue if callers
> were relying on this for forward progress? It would be simpler to just tell
> them not to...
Yes. I think alloc_pages_bulk always allocates from the beginning
and doesn't leave random holes? That's what a quick look at the code
suggests, unfortunately the documentation for it totally sucks.
> > + * @pool: pointer to the memory pool
> > + * @elems: partially or fully populated elements array
> > + * @count: size (in entries) of @elem
> > + * @gfp_mask: GFP_* flags. %__GFP_ZERO is not supported.
>
> We should say __GFP_DIRECT_RECLAIM is mandatory...
It's not really going to fit in there :) Maybe I should have ignored
Eric's request to mention __GFP_ZERO here and just keep everything
together.
> > +repeat_alloc:
> > + /*
> > + * Try to allocate the elements using the allocation callback. If that
> > + * succeeds or we were not allowed to sleep, return now. Don't dip into
> > + * the reserved pools for !__GFP_DIRECT_RECLAIM allocations as they
> > + * aren't guaranteed to succeed and chances of getting an allocation
> > + * from the allocators using GFP_ATOMIC is higher than stealing one of
> > + * the few items from our usually small pool.
> > + */
>
> Hm but the code doesn't do what the comment says, AFAICS? It will try
> dipping into the pool and might succeed if there are elements, only will not
> wait for them there?
Yeah, that comment is actually stale from an older version.
>
> > + for (; i < count; i++) {
> > + if (!elems[i]) {
> > + elems[i] = pool->alloc(gfp_temp, pool->pool_data);
> > + if (unlikely(!elems[i]))
> > + goto use_pool;
> > + }
> > + }
> > +
> > + return 0;
> > +
> > +use_pool:
>
> So should we bail out here with -ENOMEM when !(gfp_mask & __GFP_DIRECT_RECLAIM)?
No, I don't want the !__GFP_DIRECT_RECLAIM handling. It's a mess,
and while for mempool_alloc having it for compatibility might make some
sense, I'd rather avoid it for this new interface where the semantics
of failing allocations are going to be really annoying.
> > + if (!mempool_alloc_from_pool(pool, elems, count, i, gfp_temp)) {
> > + gfp_temp = gfp_mask;
> > + goto repeat_alloc;
>
> Because this seems to be an infinite loop otherwise?
You mean if someone passed in !__GFP_DIRECT_RECLAIM and got the warning
above? Yes, IFF that code makes it to production and then runs into
a low-memory situation it would. But it's an API abuse. The other
option would be to just force __GFP_DIRECT_RECLAIM.
* Re: [PATCH 7/7] mempool: add mempool_{alloc,free}_bulk
2025-11-12 15:47 ` Christoph Hellwig
@ 2025-11-12 15:56 ` Vlastimil Babka
2025-11-12 15:58 ` Christoph Hellwig
0 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2025-11-12 15:56 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
Harry Yoo, Eric Biggers, linux-mm, linux-kernel
On 11/12/25 16:47, Christoph Hellwig wrote:
> On Wed, Nov 12, 2025 at 01:20:21PM +0100, Vlastimil Babka wrote:
>> > + if (IS_ERR(fault_create_debugfs_attr("fail_mempool_alloc", NULL,
>> > + &fail_mempool_alloc)) ||
>> > + IS_ERR(fault_create_debugfs_attr("fail_mempool_alloc_bulk", NULL,
>> > + &fail_mempool_alloc_bulk)))
>> > + return -ENOMEM;
>>
>> Pedantically speaking the error (from debugfs_create_dir()) might be
>> different, probably doesn't matter in practice.
>
> Yeah, this is an initcall, so the exact error really does not matter.
> But adding an error variable isn't that annoying, so I'll switch to
> that.
>
>> > unsigned long flags;
>> > - void *element;
>> > + unsigned int i;
>> >
>> > spin_lock_irqsave(&pool->lock, flags);
>> > - if (unlikely(!pool->curr_nr))
>> > + if (unlikely(pool->curr_nr < count - allocated))
>>
>> So we might be pessimistic here when some of the elements in the array
>> already are not NULL so we need in fact less. Might be an issue if callers
>> were relying on this for forward progress? It would be simpler to just tell
>> them not to...
>
> Yes. I think alloc_pages_bulk always allocates from the beginning
> and doesn't leave random holes? That's what a quick look at the code
> suggests, unfortunately the documentation for it totally sucks.
Yeah I think it's fine with alloc_pages_bulk. It would only be a concern if
the bulk alloc+mempool user used up part of the allocated array, NULLing
some earlier pointers but leaving later ones alone, and then attempted to
refill it.
>> > + * @pool: pointer to the memory pool
>> > + * @elems: partially or fully populated elements array
>> > + * @count: size (in entries) of @elem
>> > + * @gfp_mask: GFP_* flags. %__GFP_ZERO is not supported.
>>
>> We should say __GFP_DIRECT_RECLAIM is mandatory...
>
> It's not really going to fit in there :) Maybe I should have ignored
> Eric's request to mention __GFP_ZERO here and just keep everything
> together.
I thought it could be multiline, but if not, it can refer to the notes below
and be explained there, yeah.
>> > +repeat_alloc:
>> > + /*
>> > + * Try to allocate the elements using the allocation callback. If that
>> > + * succeeds or we were not allowed to sleep, return now. Don't dip into
>> > + * the reserved pools for !__GFP_DIRECT_RECLAIM allocations as they
>> > + * aren't guaranteed to succeed and chances of getting an allocation
>> > + * from the allocators using GFP_ATOMIC is higher than stealing one of
>> > + * the few items from our usually small pool.
>> > + */
>>
>> Hm but the code doesn't do what the comment says, AFAICS? It will try
>> dipping into the pool and might succeed if there are elements, only will not
>> wait for them there?
> Yeah, that comment is actually stale from an older version.
>
>>
>> > + for (; i < count; i++) {
>> > + if (!elems[i]) {
>> > + elems[i] = pool->alloc(gfp_temp, pool->pool_data);
>> > + if (unlikely(!elems[i]))
>> > + goto use_pool;
>> > + }
>> > + }
>> > +
>> > + return 0;
>> > +
>> > +use_pool:
>>
>> So should we bail out here with -ENOMEM when !(gfp_mask & __GFP_DIRECT_RECLAIM)?
>
> No, I don't want the !__GFP_DIRECT_RECLAIM handling. It's a mess,
> and while for mempool_alloc having it for compatibility might make some
> sense, I'd rather avoid it for this new interface where the semantics
> of failing allocations are going to be really annoying.
OK.
>> > + if (!mempool_alloc_from_pool(pool, elems, count, i, gfp_temp)) {
>> > + gfp_temp = gfp_mask;
>> > + goto repeat_alloc;
>>
>> Because this seems to be an infinite loop otherwise?
>
> You mean if someone passed in !__GFP_DIRECT_RECLAIM and got the warning
> above? Yes, IFF that code makes it to production and then runs into
> a low-memory situation it would. But it's an API abuse. The other
> option would be to just force __GFP_DIRECT_RECLAIM.
True, so let's ignore it.
* Re: [PATCH 7/7] mempool: add mempool_{alloc,free}_bulk
2025-11-12 15:56 ` Vlastimil Babka
@ 2025-11-12 15:58 ` Christoph Hellwig
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-12 15:58 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Hellwig, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, Eric Biggers, linux-mm,
linux-kernel
On Wed, Nov 12, 2025 at 04:56:19PM +0100, Vlastimil Babka wrote:
> > and doesn't leave random holes? That's what a quick look at the code
> > suggest, unfortunately the documentation for it totally sucks.
>
> Yeah I think it's fine with alloc_pages_bulk. It would only be a concern if
> the bulk alloc+mempool user used up part of the allocated array, NULLing
> some earlier pointers but leaving later ones alone, and then attempted to
> refill it.
We could have a sanity check for that. The cache line is hot anyway,
so another few NULL checks aren't going to cost us much.
* Re: mempool_alloc_bulk and various mempool improvements
2025-11-11 13:52 mempool_alloc_bulk and various mempool improvements Christoph Hellwig
` (6 preceding siblings ...)
2025-11-11 13:52 ` [PATCH 7/7] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
@ 2025-11-12 12:22 ` Vlastimil Babka
2025-11-12 15:50 ` Christoph Hellwig
7 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2025-11-12 12:22 UTC (permalink / raw)
To: Christoph Hellwig, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Eric Biggers, linux-mm, linux-kernel
On 11/11/25 14:52, Christoph Hellwig wrote:
> Hi all,
>
> this series adds a bulk version of mempool_alloc that makes allocating
> multiple objects deadlock safe.
>
> The initial users is the blk-crypto-fallback code:
>
> https://lore.kernel.org/linux-block/20251031093517.1603379-1-hch@lst.de/
>
> with which v1 was posted, but I also have a few other users in mind.
How do we handle this then once it's settled?
> Changes since v1:
> - fix build for !CONFIG_FAULT_INJECTION
> - improve the kerneldoc comments further
> - refactor the code so that the mempool_alloc fastpath does not
> have to deal with arrays
> - don't support !__GFP_DIRECT_RECLAIM for bulk allocations
> - do pool allocations even if not all elements are available
> - s/elem/elems/
> - use a separate failure injection knob for the bulk allocator
>
> diffstat:
> include/linux/fault-inject.h | 8 -
> include/linux/mempool.h | 7
> mm/mempool.c | 341 ++++++++++++++++++++++++++++---------------
> 3 files changed, 240 insertions(+), 116 deletions(-)
* Re: mempool_alloc_bulk and various mempool improvements
2025-11-12 12:22 ` mempool_alloc_bulk and various mempool improvements Vlastimil Babka
@ 2025-11-12 15:50 ` Christoph Hellwig
2025-11-12 15:57 ` Vlastimil Babka
2025-11-12 17:34 ` Eric Biggers
0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-12 15:50 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Hellwig, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, Eric Biggers, linux-mm,
linux-kernel
On Wed, Nov 12, 2025 at 01:22:01PM +0100, Vlastimil Babka wrote:
> On 11/11/25 14:52, Christoph Hellwig wrote:
> > Hi all,
> >
> > this series adds a bulk version of mempool_alloc that makes allocating
> > multiple objects deadlock safe.
> >
> > The initial users is the blk-crypto-fallback code:
> >
> > https://lore.kernel.org/linux-block/20251031093517.1603379-1-hch@lst.de/
> >
> > with which v1 was posted, but I also have a few other users in mind.
>
> How do we handle this then once it's settled?
Good question. I think there is enough mm material now for a merge
through an mm tree. If we get it sorted out for this merge window just
merging it through mm even when there is no user yet would be easiest as
the blk-crypto work is probably not going to make this merge window with
additional review from Eric and I'll most like have another user for the
next merge window as well (block bounce buffering for !STABLE_WRITES
dio). Otherwise we'll have to operate with shared branches.
* Re: mempool_alloc_bulk and various mempool improvements
2025-11-12 15:50 ` Christoph Hellwig
@ 2025-11-12 15:57 ` Vlastimil Babka
2025-11-12 17:34 ` Eric Biggers
1 sibling, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2025-11-12 15:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
Harry Yoo, Eric Biggers, linux-mm, linux-kernel
On 11/12/25 16:50, Christoph Hellwig wrote:
> On Wed, Nov 12, 2025 at 01:22:01PM +0100, Vlastimil Babka wrote:
>> On 11/11/25 14:52, Christoph Hellwig wrote:
>> > Hi all,
>> >
>> > this series adds a bulk version of mempool_alloc that makes allocating
>> > multiple objects deadlock safe.
>> >
>> > The initial users is the blk-crypto-fallback code:
>> >
>> > https://lore.kernel.org/linux-block/20251031093517.1603379-1-hch@lst.de/
>> >
>> > with which v1 was posted, but I also have a few other users in mind.
>>
>> How do we handle this then once it's settled?
>
> Good question. I think there is enough mm material now for a merge
> through an mm tree. If we get it sorted out for this merge window just
> merging it through mm even when there is no user yet would be easiest as
OK should work hopefully...
> the blk-crypto work is probably not going to make this merge window with
> additional review from Eric and I'll most likely have another user for the
> next merge window as well (block bounce buffering for !STABLE_WRITES
> dio). Otherwise we'll have to operate with shared branches.
... and we can avoid the need for shared branches that way.
* Re: mempool_alloc_bulk and various mempool improvements
2025-11-12 15:50 ` Christoph Hellwig
2025-11-12 15:57 ` Vlastimil Babka
@ 2025-11-12 17:34 ` Eric Biggers
2025-11-13 5:52 ` Christoph Hellwig
1 sibling, 1 reply; 19+ messages in thread
From: Eric Biggers @ 2025-11-12 17:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, linux-mm, linux-kernel
On Wed, Nov 12, 2025 at 04:50:11PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 12, 2025 at 01:22:01PM +0100, Vlastimil Babka wrote:
> > On 11/11/25 14:52, Christoph Hellwig wrote:
> > > Hi all,
> > >
> > > this series adds a bulk version of mempool_alloc that makes allocating
> > > multiple objects deadlock safe.
> > >
> > > The initial users is the blk-crypto-fallback code:
> > >
> > > https://lore.kernel.org/linux-block/20251031093517.1603379-1-hch@lst.de/
> > >
> > > with which v1 was posted, but I also have a few other users in mind.
> >
> > How do we handle this then once it's settled?
>
> Good question. I think there is enough mm material now for a merge
> through an mm tree. If we get it sorted out for this merge window just
> merging it through mm even when there is no user yet would be easiest as
> the blk-crypto work is probably not going to make this merge window with
> additional review from Eric and I'll most likely have another user for the
> next merge window as well (block bounce buffering for !STABLE_WRITES
> dio).
That sounds good to me. Sorry for the slow review on the
blk-crypto-fallback changes. How about I also take the two fscrypt
cleanups "fscrypt: pass a real sector_t to
fscrypt_zeroout_range_inline_crypt" and "fscrypt: keep multiple bios in
flight in fscrypt_zeroout_range_inline_crypt" through the fscrypt tree
for 6.19 to get them out of the way? They don't depend on anything
else.
- Eric
* Re: mempool_alloc_bulk and various mempool improvements
2025-11-12 17:34 ` Eric Biggers
@ 2025-11-13 5:52 ` Christoph Hellwig
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-11-13 5:52 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-mm, linux-kernel
On Wed, Nov 12, 2025 at 05:34:05PM +0000, Eric Biggers wrote:
> That sounds good to me. Sorry for the slow review on the
> blk-crypto-fallback changes.
No problem. I've got enough on my plate myself..
> How about I also take the two fscrypt
> cleanups "fscrypt: pass a real sector_t to
> fscrypt_zeroout_range_inline_crypt" and "fscrypt: keep multiple bios in
> flight in fscrypt_zeroout_range_inline_crypt" through the fscrypt tree
> for 6.19 to get them out of the way? They don't depend on anything
> else.
Sure. I'll resend them with the review comments addressed. I'll also
see if any of the other API cleanups from my experimental branch might
be suitable.