* move blk-crypto-fallback to sit above the block layer
@ 2025-10-31 9:34 Christoph Hellwig
2025-10-31 9:34 ` [PATCH 1/9] mempool: update kerneldoc comments Christoph Hellwig
` (8 more replies)
0 siblings, 9 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
Hi all,
in the past we had various discussions about how doing the blk-crypto
fallback below the block layer causes all kinds of problems due to very
late bio splitting and the need to communicate features up the stack.
This series turns that call chain upside down by requiring callers to
submit bios through a new submit_bio wrapper that calls into blk-crypto,
so that only hardware inline encryption bios are passed through the block
layer as such.
While doing this I also noticed that the existing blk-crypto-fallback
code does various unprotected memory allocations, which this series
converts to mempools, or from loops of mempool allocations to the new
deadlock-safe batch mempool allocator.
There might be future avenues for optimization by using high order folio
allocations that match the file system's preferred folio size, but for
that we'd probably want a batch folio allocator first, so I'm deferring
it to avoid scope creep.
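For illustration, the caller-visible change boils down to filesystems
submitting potentially encrypted bios through the new wrapper instead of
calling submit_bio directly. A minimal sketch based on the helper added
in patch 7; the function name and surrounding context here are made up
and not taken from any of the converted filesystems:

#include <linux/bio.h>
#include <linux/blk-crypto.h>
#include <linux/fscrypt.h>
#include <linux/gfp.h>

/* hypothetical write-out helper, only meant to show the call pattern */
static void fs_submit_encrypted_write(struct inode *inode, struct bio *bio,
                                      u64 first_lblk)
{
        /* attach the encryption context as before ... */
        fscrypt_set_bio_crypt_ctx(bio, inode, first_lblk, GFP_NOFS);

        /*
         * ... but submit through the blk-crypto aware wrapper, which
         * either passes the bio straight to submit_bio (no crypt context,
         * or the device supports inline encryption) or runs the fallback
         * en/decryption above the block layer.
         */
        blk_crypto_submit_bio(bio);
}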
Diffstat:
block/blk-core.c | 10 -
block/blk-crypto-fallback.c | 437 ++++++++++++++++++++------------------------
block/blk-crypto-internal.h | 31 +--
block/blk-crypto.c | 65 ++----
fs/buffer.c | 3
fs/crypto/bio.c | 89 +++++---
fs/ext4/page-io.c | 3
fs/ext4/readpage.c | 9
fs/f2fs/data.c | 4
fs/f2fs/file.c | 3
fs/iomap/direct-io.c | 3
include/linux/blk-crypto.h | 16 +
include/linux/mempool.h | 7
mm/mempool.c | 173 ++++++++++++-----
14 files changed, 464 insertions(+), 389 deletions(-)
* [PATCH 1/9] mempool: update kerneldoc comments
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
@ 2025-10-31 9:34 ` Christoph Hellwig
2025-11-05 14:02 ` Vlastimil Babka
2025-11-07 3:26 ` Eric Biggers
2025-10-31 9:34 ` [PATCH 2/9] mempool: add error injection support Christoph Hellwig
` (7 subsequent siblings)
8 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
Use proper formatting, use full sentences and reduce some verbosity in
function parameter descriptions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 36 +++++++++++++++++-------------------
1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 1c38e873e546..d7c55a98c2be 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -372,18 +372,15 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
EXPORT_SYMBOL(mempool_resize);
/**
- * mempool_alloc - allocate an element from a specific memory pool
- * @pool: pointer to the memory pool which was allocated via
- * mempool_create().
- * @gfp_mask: the usual allocation bitmask.
+ * mempool_alloc - allocate an element from a memory pool
+ * @pool: pointer to the memory pool
+ * @gfp_mask: GFP_* flags.
*
- * this function only sleeps if the alloc_fn() function sleeps or
- * returns NULL. Note that due to preallocation, this function
- * *never* fails when called from process contexts. (it might
- * fail if called from an IRQ context.)
- * Note: using __GFP_ZERO is not supported.
+ * Note: This function only sleeps if the alloc_fn callback sleeps or returns
+ * %NULL. Using __GFP_ZERO is not supported.
*
- * Return: pointer to the allocated element or %NULL on error.
+ * Return: pointer to the allocated element or %NULL on error. This function
+ * never returns %NULL when @gfp_mask allows sleeping.
*/
void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
{
@@ -456,11 +453,10 @@ EXPORT_SYMBOL(mempool_alloc_noprof);
/**
* mempool_alloc_preallocated - allocate an element from preallocated elements
- * belonging to a specific memory pool
- * @pool: pointer to the memory pool which was allocated via
- * mempool_create().
+ * belonging to a memory pool
+ * @pool: pointer to the memory pool
*
- * This function is similar to mempool_alloc, but it only attempts allocating
+ * This function is similar to mempool_alloc(), but it only attempts allocating
* an element from the preallocated elements. It does not sleep and immediately
* returns if no preallocated elements are available.
*
@@ -492,12 +488,14 @@ void *mempool_alloc_preallocated(mempool_t *pool)
EXPORT_SYMBOL(mempool_alloc_preallocated);
/**
- * mempool_free - return an element to the pool.
- * @element: pool element pointer.
- * @pool: pointer to the memory pool which was allocated via
- * mempool_create().
+ * mempool_free - return an element to a mempool
+ * @element: pointer to element
+ * @pool: pointer to the memory pool
+ *
+ * Returns @element to @pool if it needs replenishing, else frees it using
+ * the free_fn callback in @pool.
*
- * this function only sleeps if the free_fn() function sleeps.
+ * This function only sleeps if the free_fn callback sleeps.
*/
void mempool_free(void *element, mempool_t *pool)
{
--
2.47.3
* [PATCH 2/9] mempool: add error injection support
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
2025-10-31 9:34 ` [PATCH 1/9] mempool: update kerneldoc comments Christoph Hellwig
@ 2025-10-31 9:34 ` Christoph Hellwig
2025-11-05 14:04 ` Vlastimil Babka
2025-11-07 3:29 ` Eric Biggers
2025-10-31 9:34 ` [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
` (6 subsequent siblings)
8 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
Add a call to should_fail_ex that forces mempool to actually allocate
from the pool's reserved elements to stress the mempool implementation
when enabled through debugfs. By default should_fail{,_ex} prints a very
verbose stack trace that clutters the kernel log, slows down execution
and triggers the kernel bug detection in xfstests. Pass FAULT_NOWARN and
print a single-line message noting the caller instead so that full test
runs can be performed with fault injection enabled.
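As a usage illustration (a hypothetical test module, not part of this
patch): once the fail_mempool_alloc attribute is enabled through debugfs,
any mempool user is affected, and the allocation below is served from the
pool's reserved elements instead of the slab allocator:

#include <linux/gfp.h>
#include <linux/init.h>
#include <linux/mempool.h>

static int __init mempool_fault_demo_init(void)
{
        mempool_t *pool;
        void *elem;

        /* four reserved 256-byte elements backed by kmalloc */
        pool = mempool_create_kmalloc_pool(4, 256);
        if (!pool)
                return -ENOMEM;

        /*
         * With the knob enabled this dips into the reserve and exercises
         * the refill and wakeup paths on mempool_free.
         */
        elem = mempool_alloc(pool, GFP_NOIO);
        mempool_free(elem, pool);
        mempool_destroy(pool);
        return 0;
}
late_initcall(mempool_fault_demo_init);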
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index d7c55a98c2be..15581179c8b9 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -9,7 +9,7 @@
* started by Ingo Molnar, Copyright (C) 2001
* debugging by David Rientjes, Copyright (C) 2015
*/
-
+#include <linux/fault-inject.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/highmem.h>
@@ -20,6 +20,15 @@
#include <linux/writeback.h>
#include "slab.h"
+static DECLARE_FAULT_ATTR(fail_mempool_alloc);
+
+static int __init mempool_fault_inject_init(void)
+{
+ return PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
+ NULL, &fail_mempool_alloc));
+}
+late_initcall(mempool_fault_inject_init);
+
#ifdef CONFIG_SLUB_DEBUG_ON
static void poison_error(mempool_t *pool, void *element, size_t size,
size_t byte)
@@ -399,10 +408,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
repeat_alloc:
-
- element = pool->alloc(gfp_temp, pool->pool_data);
- if (likely(element != NULL))
- return element;
+ if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
+ pr_info("forcing mempool usage for pool %pS\n",
+ (void *)_RET_IP_);
+ element = NULL;
+ } else {
+ element = pool->alloc(gfp_temp, pool->pool_data);
+ if (likely(element != NULL))
+ return element;
+ }
spin_lock_irqsave(&pool->lock, flags);
if (likely(pool->curr_nr)) {
--
2.47.3
* [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
2025-10-31 9:34 ` [PATCH 1/9] mempool: update kerneldoc comments Christoph Hellwig
2025-10-31 9:34 ` [PATCH 2/9] mempool: add error injection support Christoph Hellwig
@ 2025-10-31 9:34 ` Christoph Hellwig
2025-11-05 15:04 ` Vlastimil Babka
2025-11-07 3:52 ` Eric Biggers
2025-10-31 9:34 ` [PATCH 4/9] fscrypt: pass a real sector_t to fscrypt_zeroout_range_inline_crypt Christoph Hellwig
` (5 subsequent siblings)
8 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
Add a version of the mempool allocator that works for batch allocations
of multiple objects. Calling mempool_alloc in a loop is not safe because
it could deadlock if multiple threads are performing such an allocation
at the same time.
As an extra benefit the interface is built so that the same array can be
used for alloc_pages_bulk / release_pages, which means that at least for
page backed mempools the fast path can use a nice batch optimization.
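A hedged sketch of the intended caller pattern; the names and the batch
size are made up, and the pool is assumed to have min_nr greater than or
equal to the batch size so the whole batch can be satisfied from the
reserve:

#include <linux/gfp.h>
#include <linux/mempool.h>

#define BATCH_SIZE      8       /* assumed to be <= the pool's min_nr */

static int process_with_batch(mempool_t *pool)
{
        void *elem[BATCH_SIZE] = { };   /* NULL slots get filled in */
        unsigned int absorbed, i;
        int ret;

        /* one call instead of a deadlock-prone mempool_alloc() loop */
        ret = mempool_alloc_bulk(pool, elem, BATCH_SIZE, GFP_NOIO);
        if (ret)        /* only possible without __GFP_DIRECT_RECLAIM */
                return ret;

        /* ... use the elements ... */

        /* the pool keeps what it needs; the caller frees the rest */
        absorbed = mempool_free_bulk(pool, elem, BATCH_SIZE);
        for (i = absorbed; i < BATCH_SIZE; i++)
                mempool_free(elem[i], pool);
        return 0;
}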
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/mempool.h | 7 ++
mm/mempool.c | 145 ++++++++++++++++++++++++++++------------
2 files changed, 111 insertions(+), 41 deletions(-)
diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 34941a4b9026..486ed50776db 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -66,9 +66,16 @@ extern void mempool_destroy(mempool_t *pool);
extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
#define mempool_alloc(...) \
alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
+int mempool_alloc_bulk_noprof(mempool_t *pool, void **elem,
+ unsigned int count, gfp_t gfp_mask, unsigned long caller_ip);
+#define mempool_alloc_bulk(pool, elem, count, gfp_mask) \
+ alloc_hooks(mempool_alloc_bulk_noprof(pool, elem, count, gfp_mask, \
+ _RET_IP_))
extern void *mempool_alloc_preallocated(mempool_t *pool) __malloc;
extern void mempool_free(void *element, mempool_t *pool);
+unsigned int mempool_free_bulk(mempool_t *pool, void **elem,
+ unsigned int count);
/*
* A mempool_alloc_t and mempool_free_t that get the memory from
diff --git a/mm/mempool.c b/mm/mempool.c
index 15581179c8b9..c980a0396986 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -381,23 +381,29 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
EXPORT_SYMBOL(mempool_resize);
/**
- * mempool_alloc - allocate an element from a memory pool
+ * mempool_alloc_bulk - allocate multiple elements from a memory pool
* @pool: pointer to the memory pool
+ * @elem: partially or fully populated elements array
+ * @count: size (in entries) of @elem
* @gfp_mask: GFP_* flags.
*
+ * Allocate an element for each slot in @elem that is %NULL.
+ *
* Note: This function only sleeps if the alloc_fn callback sleeps or returns
* %NULL. Using __GFP_ZERO is not supported.
*
- * Return: pointer to the allocated element or %NULL on error. This function
- * never returns %NULL when @gfp_mask allows sleeping.
+ * Return: 0 if successful, else -ENOMEM. This function never returns -ENOMEM
+ * when @gfp_mask allows sleeping.
*/
-void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
+int mempool_alloc_bulk_noprof(struct mempool *pool, void **elem,
+ unsigned int count, gfp_t gfp_mask, unsigned long caller_ip)
{
- void *element;
unsigned long flags;
wait_queue_entry_t wait;
gfp_t gfp_temp;
+ unsigned int i;
+ VM_WARN_ON_ONCE(count > pool->min_nr);
VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
might_alloc(gfp_mask);
@@ -407,20 +413,31 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
+ i = 0;
repeat_alloc:
- if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
- pr_info("forcing mempool usage for pool %pS\n",
- (void *)_RET_IP_);
- element = NULL;
- } else {
- element = pool->alloc(gfp_temp, pool->pool_data);
- if (likely(element != NULL))
- return element;
+ for (; i < count; i++) {
+ if (!elem[i]) {
+ if (should_fail_ex(&fail_mempool_alloc, 1,
+ FAULT_NOWARN)) {
+ pr_info("forcing pool usage for pool %pS\n",
+ (void *)caller_ip);
+ goto use_pool;
+ }
+ elem[i] = pool->alloc(gfp_temp, pool->pool_data);
+ if (unlikely(!elem[i]))
+ goto use_pool;
+ }
}
+ return 0;
+
+use_pool:
spin_lock_irqsave(&pool->lock, flags);
- if (likely(pool->curr_nr)) {
- element = remove_element(pool);
+ if (likely(pool->curr_nr >= count - i)) {
+ for (; i < count; i++) {
+ if (!elem[i])
+ elem[i] = remove_element(pool);
+ }
spin_unlock_irqrestore(&pool->lock, flags);
/* paired with rmb in mempool_free(), read comment there */
smp_wmb();
@@ -428,8 +445,9 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
* Update the allocation stack trace as this is more useful
* for debugging.
*/
- kmemleak_update_trace(element);
- return element;
+ for (i = 0; i < count; i++)
+ kmemleak_update_trace(elem[i]);
+ return 0;
}
/*
@@ -445,10 +463,12 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
/* We must not sleep if !__GFP_DIRECT_RECLAIM */
if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
spin_unlock_irqrestore(&pool->lock, flags);
- return NULL;
+ if (i > 0)
+ mempool_free_bulk(pool, elem + i, count - i);
+ return -ENOMEM;
}
- /* Let's wait for someone else to return an element to @pool */
+ /* Let's wait for someone else to return elements to @pool */
init_wait(&wait);
prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
@@ -463,6 +483,27 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
finish_wait(&pool->wait, &wait);
goto repeat_alloc;
}
+EXPORT_SYMBOL_GPL(mempool_alloc_bulk_noprof);
+
+/**
+ * mempool_alloc - allocate an element from a memory pool
+ * @pool: pointer to the memory pool
+ * @gfp_mask: GFP_* flags.
+ *
+ * Note: This function only sleeps if the alloc_fn callback sleeps or returns
+ * %NULL. Using __GFP_ZERO is not supported.
+ *
+ * Return: pointer to the allocated element or %NULL on error. This function
+ * never returns %NULL when @gfp_mask allows sleeping.
+ */
+void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
+{
+ void *elem[1] = { };
+
+ if (mempool_alloc_bulk_noprof(pool, elem, 1, gfp_mask, _RET_IP_) < 0)
+ return NULL;
+ return elem[0];
+}
EXPORT_SYMBOL(mempool_alloc_noprof);
/**
@@ -502,21 +543,26 @@ void *mempool_alloc_preallocated(mempool_t *pool)
EXPORT_SYMBOL(mempool_alloc_preallocated);
/**
- * mempool_free - return an element to a mempool
- * @element: pointer to element
+ * mempool_free_bulk - return elements to a mempool
* @pool: pointer to the memory pool
+ * @elem: elements to return
+ * @count: number of elements to return
*
- * Returns @elem to @pool if its needs replenishing, else free it using
- * the free_fn callback in @pool.
+ * Returns elements from @elem to @pool if it needs replenishing and sets
+ * their slot in @elem to NULL. Other elements are left in @elem.
+ *
+ * Return: number of elements transferred to @pool. Elements are always
+ * transferred from the beginning of @elem, so the return value can be used as
+ * an offset into @elem for freeing the remaining elements in the caller.
*
* This function only sleeps if the free_fn callback sleeps.
*/
-void mempool_free(void *element, mempool_t *pool)
+unsigned int mempool_free_bulk(struct mempool *pool, void **elem,
+ unsigned int count)
{
unsigned long flags;
-
- if (unlikely(element == NULL))
- return;
+ bool added = false;
+ unsigned int freed = 0;
/*
* Paired with the wmb in mempool_alloc(). The preceding read is
@@ -553,15 +599,11 @@ void mempool_free(void *element, mempool_t *pool)
*/
if (unlikely(READ_ONCE(pool->curr_nr) < pool->min_nr)) {
spin_lock_irqsave(&pool->lock, flags);
- if (likely(pool->curr_nr < pool->min_nr)) {
- add_element(pool, element);
- spin_unlock_irqrestore(&pool->lock, flags);
- if (wq_has_sleeper(&pool->wait))
- wake_up(&pool->wait);
- return;
+ while (pool->curr_nr < pool->min_nr && freed < count) {
+ add_element(pool, elem[freed++]);
+ added = true;
}
spin_unlock_irqrestore(&pool->lock, flags);
- }
/*
* Handle the min_nr = 0 edge case:
@@ -572,20 +614,41 @@ void mempool_free(void *element, mempool_t *pool)
* allocation of element when both min_nr and curr_nr are 0, and
* any active waiters are properly awakened.
*/
- if (unlikely(pool->min_nr == 0 &&
+ } else if (unlikely(pool->min_nr == 0 &&
READ_ONCE(pool->curr_nr) == 0)) {
spin_lock_irqsave(&pool->lock, flags);
if (likely(pool->curr_nr == 0)) {
- add_element(pool, element);
- spin_unlock_irqrestore(&pool->lock, flags);
- if (wq_has_sleeper(&pool->wait))
- wake_up(&pool->wait);
- return;
+ add_element(pool, elem[freed++]);
+ added = true;
}
spin_unlock_irqrestore(&pool->lock, flags);
}
- pool->free(element, pool->pool_data);
+ if (unlikely(added) && wq_has_sleeper(&pool->wait))
+ wake_up(&pool->wait);
+
+ return freed;
+}
+EXPORT_SYMBOL_GPL(mempool_free_bulk);
+
+/**
+ * mempool_free - return an element to the pool.
+ * @element: element to return
+ * @pool: pointer to the memory pool
+ *
+ * Returns @element to @pool if it needs replenishing, else frees it using
+ * the free_fn callback in @pool.
+ *
+ * This function only sleeps if the free_fn callback sleeps.
+ */
+void mempool_free(void *element, struct mempool *pool)
+{
+ if (likely(element)) {
+ void *elem[1] = { element };
+
+ if (!mempool_free_bulk(pool, elem, 1))
+ pool->free(element, pool->pool_data);
+ }
}
EXPORT_SYMBOL(mempool_free);
--
2.47.3
* [PATCH 4/9] fscrypt: pass a real sector_t to fscrypt_zeroout_range_inline_crypt
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
` (2 preceding siblings ...)
2025-10-31 9:34 ` [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
@ 2025-10-31 9:34 ` Christoph Hellwig
2025-11-07 3:55 ` Eric Biggers
2025-10-31 9:34 ` [PATCH 5/9] fscrypt: keep multiple bios in flight in fscrypt_zeroout_range_inline_crypt Christoph Hellwig
` (4 subsequent siblings)
8 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
While the pblk argument to fscrypt_zeroout_range_inline_crypt is declared
as a sector_t, it is actually interpreted in logical block size units,
which is highly unusual. Switch to passing the 512-byte units that
sector_t is defined for.
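For reference, the conversion that callers are now expected to do looks
roughly like the sketch below (the actual call sites are not visible in
this hunk, so the helper name is made up):

#include <linux/blkdev.h>
#include <linux/fs.h>

/* file system block number (1 << i_blkbits units) to 512-byte sectors */
static inline sector_t fs_pblk_to_sector(const struct inode *inode,
                                         sector_t pblk)
{
        return pblk << (inode->i_blkbits - SECTOR_SHIFT);
}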
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/crypto/bio.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 5f5599020e94..68b0424d879a 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -48,7 +48,7 @@ bool fscrypt_decrypt_bio(struct bio *bio)
EXPORT_SYMBOL(fscrypt_decrypt_bio);
static int fscrypt_zeroout_range_inline_crypt(const struct inode *inode,
- pgoff_t lblk, sector_t pblk,
+ pgoff_t lblk, sector_t sector,
unsigned int len)
{
const unsigned int blockbits = inode->i_blkbits;
@@ -67,8 +67,7 @@ static int fscrypt_zeroout_range_inline_crypt(const struct inode *inode,
if (num_pages == 0) {
fscrypt_set_bio_crypt_ctx(bio, inode, lblk, GFP_NOFS);
- bio->bi_iter.bi_sector =
- pblk << (blockbits - SECTOR_SHIFT);
+ bio->bi_iter.bi_sector = sector;
}
ret = bio_add_page(bio, ZERO_PAGE(0), bytes_this_page, 0);
if (WARN_ON_ONCE(ret != bytes_this_page)) {
@@ -78,7 +77,7 @@ static int fscrypt_zeroout_range_inline_crypt(const struct inode *inode,
num_pages++;
len -= blocks_this_page;
lblk += blocks_this_page;
- pblk += blocks_this_page;
+ sector += (bytes_this_page >> SECTOR_SHIFT);
if (num_pages == BIO_MAX_VECS || !len ||
!fscrypt_mergeable_bio(bio, inode, lblk)) {
err = submit_bio_wait(bio);
@@ -132,7 +131,7 @@ int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
return 0;
if (fscrypt_inode_uses_inline_crypto(inode))
- return fscrypt_zeroout_range_inline_crypt(inode, lblk, pblk,
+ return fscrypt_zeroout_range_inline_crypt(inode, lblk, sector,
len);
BUILD_BUG_ON(ARRAY_SIZE(pages) > BIO_MAX_VECS);
--
2.47.3
* [PATCH 5/9] fscrypt: keep multiple bios in flight in fscrypt_zeroout_range_inline_crypt
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
` (3 preceding siblings ...)
2025-10-31 9:34 ` [PATCH 4/9] fscrypt: pass a real sector_t to fscrypt_zeroout_range_inline_crypt Christoph Hellwig
@ 2025-10-31 9:34 ` Christoph Hellwig
2025-11-07 4:06 ` Eric Biggers
2025-10-31 9:34 ` [PATCH 6/9] blk-crypto: optimize bio splitting in blk_crypto_fallback_encrypt_bio Christoph Hellwig
` (3 subsequent siblings)
8 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
This should slightly improve performance for large zeroing operations,
but more importantly it prepares for the blk-crypto refactoring that
requires all fscrypt users to call submit_bio directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/crypto/bio.c | 84 +++++++++++++++++++++++++++++++------------------
1 file changed, 53 insertions(+), 31 deletions(-)
diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 68b0424d879a..e59d342b4240 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -47,49 +47,71 @@ bool fscrypt_decrypt_bio(struct bio *bio)
}
EXPORT_SYMBOL(fscrypt_decrypt_bio);
+struct fscrypt_zero_done {
+ atomic_t pending;
+ blk_status_t status;
+ struct completion done;
+};
+
+static void fscrypt_zeroout_range_done(struct fscrypt_zero_done *done)
+{
+ if (atomic_dec_and_test(&done->pending))
+ complete(&done->done);
+}
+
+static void fscrypt_zeroout_range_end_io(struct bio *bio)
+{
+ struct fscrypt_zero_done *done = bio->bi_private;
+
+ if (bio->bi_status)
+ cmpxchg(&done->status, 0, bio->bi_status);
+ fscrypt_zeroout_range_done(done);
+ bio_put(bio);
+}
+
static int fscrypt_zeroout_range_inline_crypt(const struct inode *inode,
pgoff_t lblk, sector_t sector,
unsigned int len)
{
const unsigned int blockbits = inode->i_blkbits;
const unsigned int blocks_per_page = 1 << (PAGE_SHIFT - blockbits);
- struct bio *bio;
- int ret, err = 0;
- int num_pages = 0;
+ struct fscrypt_zero_done done = {};
- /* This always succeeds since __GFP_DIRECT_RECLAIM is set. */
- bio = bio_alloc(inode->i_sb->s_bdev, BIO_MAX_VECS, REQ_OP_WRITE,
- GFP_NOFS);
+ atomic_set(&done.pending, 1);
+ init_completion(&done.done);
while (len) {
- unsigned int blocks_this_page = min(len, blocks_per_page);
- unsigned int bytes_this_page = blocks_this_page << blockbits;
+ struct bio *bio;
+ unsigned int n;
- if (num_pages == 0) {
- fscrypt_set_bio_crypt_ctx(bio, inode, lblk, GFP_NOFS);
- bio->bi_iter.bi_sector = sector;
- }
- ret = bio_add_page(bio, ZERO_PAGE(0), bytes_this_page, 0);
- if (WARN_ON_ONCE(ret != bytes_this_page)) {
- err = -EIO;
- goto out;
- }
- num_pages++;
- len -= blocks_this_page;
- lblk += blocks_this_page;
- sector += (bytes_this_page >> SECTOR_SHIFT);
- if (num_pages == BIO_MAX_VECS || !len ||
- !fscrypt_mergeable_bio(bio, inode, lblk)) {
- err = submit_bio_wait(bio);
- if (err)
- goto out;
- bio_reset(bio, inode->i_sb->s_bdev, REQ_OP_WRITE);
- num_pages = 0;
+ bio = bio_alloc(inode->i_sb->s_bdev, BIO_MAX_VECS, REQ_OP_WRITE,
+ GFP_NOFS);
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_private = &done;
+ bio->bi_end_io = fscrypt_zeroout_range_end_io;
+ fscrypt_set_bio_crypt_ctx(bio, inode, lblk, GFP_NOFS);
+
+ for (n = 0; n < BIO_MAX_VECS; n++) {
+ unsigned int blocks_this_page =
+ min(len, blocks_per_page);
+ unsigned int bytes_this_page = blocks_this_page << blockbits;
+
+ __bio_add_page(bio, ZERO_PAGE(0), bytes_this_page, 0);
+ len -= blocks_this_page;
+ lblk += blocks_this_page;
+ sector += (bytes_this_page >> SECTOR_SHIFT);
+ if (!len || !fscrypt_mergeable_bio(bio, inode, lblk))
+ break;
}
+
+ atomic_inc(&done.pending);
+ submit_bio(bio);
}
-out:
- bio_put(bio);
- return err;
+
+ fscrypt_zeroout_range_done(&done);
+
+ wait_for_completion(&done.done);
+ return blk_status_to_errno(done.status);
}
/**
--
2.47.3
* [PATCH 6/9] blk-crypto: optimize bio splitting in blk_crypto_fallback_encrypt_bio
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
` (4 preceding siblings ...)
2025-10-31 9:34 ` [PATCH 5/9] fscrypt: keep multiple bios in flight in fscrypt_zeroout_range_inline_crypt Christoph Hellwig
@ 2025-10-31 9:34 ` Christoph Hellwig
2025-11-14 0:22 ` Eric Biggers
2025-10-31 9:34 ` [PATCH 7/9] blk-crypto: handle the fallback above the block layer Christoph Hellwig
` (2 subsequent siblings)
8 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
The current code in blk_crypto_fallback_encrypt_bio is inefficient and
prone to deadlocks under memory pressure: it first walks the passed-in
plaintext bio to see how much of it can fit into a single encrypted bio
using up to BIO_MAX_VECS PAGE_SIZE segments, and then allocates a
plaintext clone of that size, only to allocate yet another bio for the
ciphertext later. While the plaintext clone uses a bioset to avoid
deadlocks when allocations could otherwise fail, the ciphertext one uses
bio_kmalloc, which is a no-go in the file system I/O path.
Switch blk_crypto_fallback_encrypt_bio to walk the source plaintext bio
while consuming its bi_iter without cloning it, and instead allocate a
ciphertext bio at the beginning and whenever the previous one is filled
up. The existing bio_set for the plaintext clones is reused for the
ciphertext bios to remove the deadlock risk.
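The rework below relies on consuming the source bio's bi_iter in place; a
distilled sketch of that iteration idiom (not the fallback code itself,
which additionally allocates ciphertext bios and bounce pages along the
way):

#include <linux/bio.h>

/* walk a bio segment by segment, destructively advancing bi_iter */
static void consume_bio_segments(struct bio *bio)
{
        while (bio->bi_iter.bi_size) {
                struct bio_vec bv = bio_iter_iovec(bio, bio->bi_iter);

                /* process bv.bv_page at bv.bv_offset for bv.bv_len bytes */

                bio_advance_iter_single(bio, &bio->bi_iter, bv.bv_len);
        }
}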
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-crypto-fallback.c | 162 ++++++++++++++----------------------
1 file changed, 63 insertions(+), 99 deletions(-)
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 86b27f96051a..1f58010fb437 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -152,35 +152,26 @@ static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
src_bio->bi_status = enc_bio->bi_status;
- bio_uninit(enc_bio);
- kfree(enc_bio);
+ bio_put(enc_bio);
bio_endio(src_bio);
}
-static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
+static struct bio *blk_crypto_alloc_enc_bio(struct bio *bio_src,
+ unsigned int nr_segs)
{
- unsigned int nr_segs = bio_segments(bio_src);
- struct bvec_iter iter;
- struct bio_vec bv;
struct bio *bio;
- bio = bio_kmalloc(nr_segs, GFP_NOIO);
- if (!bio)
- return NULL;
- bio_init_inline(bio, bio_src->bi_bdev, nr_segs, bio_src->bi_opf);
+ bio = bio_alloc_bioset(bio_src->bi_bdev, nr_segs, bio_src->bi_opf,
+ GFP_NOIO, &crypto_bio_split);
if (bio_flagged(bio_src, BIO_REMAPPED))
bio_set_flag(bio, BIO_REMAPPED);
+ bio->bi_private = bio_src;
+ bio->bi_end_io = blk_crypto_fallback_encrypt_endio;
bio->bi_ioprio = bio_src->bi_ioprio;
bio->bi_write_hint = bio_src->bi_write_hint;
bio->bi_write_stream = bio_src->bi_write_stream;
bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
- bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
-
- bio_for_each_segment(bv, bio_src, iter)
- bio->bi_io_vec[bio->bi_vcnt++] = bv;
-
bio_clone_blkg_association(bio, bio_src);
-
return bio;
}
@@ -208,32 +199,6 @@ blk_crypto_fallback_alloc_cipher_req(struct blk_crypto_keyslot *slot,
return true;
}
-static bool blk_crypto_fallback_split_bio_if_needed(struct bio **bio_ptr)
-{
- struct bio *bio = *bio_ptr;
- unsigned int i = 0;
- unsigned int num_sectors = 0;
- struct bio_vec bv;
- struct bvec_iter iter;
-
- bio_for_each_segment(bv, bio, iter) {
- num_sectors += bv.bv_len >> SECTOR_SHIFT;
- if (++i == BIO_MAX_VECS)
- break;
- }
-
- if (num_sectors < bio_sectors(bio)) {
- bio = bio_submit_split_bioset(bio, num_sectors,
- &crypto_bio_split);
- if (!bio)
- return false;
-
- *bio_ptr = bio;
- }
-
- return true;
-}
-
union blk_crypto_iv {
__le64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE];
u8 bytes[BLK_CRYPTO_MAX_IV_SIZE];
@@ -257,34 +222,22 @@ static void blk_crypto_dun_to_iv(const u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
*/
static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
{
- struct bio *src_bio, *enc_bio;
- struct bio_crypt_ctx *bc;
- struct blk_crypto_keyslot *slot;
- int data_unit_size;
+ struct bio *src_bio = *bio_ptr;
+ struct bio_crypt_ctx *bc = src_bio->bi_crypt_context;
+ int data_unit_size = bc->bc_key->crypto_cfg.data_unit_size;
struct skcipher_request *ciph_req = NULL;
+ struct blk_crypto_keyslot *slot;
DECLARE_CRYPTO_WAIT(wait);
u64 curr_dun[BLK_CRYPTO_DUN_ARRAY_SIZE];
struct scatterlist src, dst;
union blk_crypto_iv iv;
- unsigned int i, j;
+ struct bio *enc_bio = NULL;
+ unsigned int nr_segs;
+ unsigned int enc_idx = 0;
+ unsigned int j;
bool ret = false;
blk_status_t blk_st;
- /* Split the bio if it's too big for single page bvec */
- if (!blk_crypto_fallback_split_bio_if_needed(bio_ptr))
- return false;
-
- src_bio = *bio_ptr;
- bc = src_bio->bi_crypt_context;
- data_unit_size = bc->bc_key->crypto_cfg.data_unit_size;
-
- /* Allocate bounce bio for encryption */
- enc_bio = blk_crypto_fallback_clone_bio(src_bio);
- if (!enc_bio) {
- src_bio->bi_status = BLK_STS_RESOURCE;
- return false;
- }
-
/*
* Get a blk-crypto-fallback keyslot that contains a crypto_skcipher for
* this bio's algorithm and key.
@@ -293,7 +246,7 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
bc->bc_key, &slot);
if (blk_st != BLK_STS_OK) {
src_bio->bi_status = blk_st;
- goto out_put_enc_bio;
+ return false;
}
/* and then allocate an skcipher_request for it */
@@ -309,61 +262,72 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
skcipher_request_set_crypt(ciph_req, &src, &dst, data_unit_size,
iv.bytes);
- /* Encrypt each page in the bounce bio */
- for (i = 0; i < enc_bio->bi_vcnt; i++) {
- struct bio_vec *enc_bvec = &enc_bio->bi_io_vec[i];
- struct page *plaintext_page = enc_bvec->bv_page;
- struct page *ciphertext_page =
- mempool_alloc(blk_crypto_bounce_page_pool, GFP_NOIO);
-
- enc_bvec->bv_page = ciphertext_page;
+ /* Encrypt each page in the origin bio */
+ nr_segs = bio_segments(src_bio);
+ for (;;) {
+ struct bio_vec src_bv =
+ bio_iter_iovec(src_bio, src_bio->bi_iter);
+ struct page *enc_page;
- if (!ciphertext_page) {
- src_bio->bi_status = BLK_STS_RESOURCE;
- goto out_free_bounce_pages;
+ if (!enc_bio) {
+ enc_bio = blk_crypto_alloc_enc_bio(src_bio,
+ min(nr_segs, BIO_MAX_VECS));
}
- sg_set_page(&src, plaintext_page, data_unit_size,
- enc_bvec->bv_offset);
- sg_set_page(&dst, ciphertext_page, data_unit_size,
- enc_bvec->bv_offset);
+ enc_page = mempool_alloc(blk_crypto_bounce_page_pool,
+ GFP_NOIO);
+ __bio_add_page(enc_bio, enc_page, src_bv.bv_len,
+ src_bv.bv_offset);
+
+ sg_set_page(&src, src_bv.bv_page, data_unit_size,
+ src_bv.bv_offset);
+ sg_set_page(&dst, enc_page, data_unit_size, src_bv.bv_offset);
/* Encrypt each data unit in this page */
- for (j = 0; j < enc_bvec->bv_len; j += data_unit_size) {
+ for (j = 0; j < src_bv.bv_len; j += data_unit_size) {
blk_crypto_dun_to_iv(curr_dun, &iv);
if (crypto_wait_req(crypto_skcipher_encrypt(ciph_req),
- &wait)) {
- i++;
- src_bio->bi_status = BLK_STS_IOERR;
- goto out_free_bounce_pages;
- }
+ &wait))
+ goto out_ioerror;
bio_crypt_dun_increment(curr_dun, 1);
src.offset += data_unit_size;
dst.offset += data_unit_size;
}
+
+ bio_advance_iter_single(src_bio, &src_bio->bi_iter,
+ src_bv.bv_len);
+ if (!src_bio->bi_iter.bi_size)
+ break;
+
+ if (++enc_idx == enc_bio->bi_max_vecs) {
+ /*
+ * Each encrypted bio will call bio_endio in the
+ * completion handler, so ensure the remaining count
+ * matches the number of submitted bios.
+ */
+ bio_inc_remaining(src_bio);
+ submit_bio(enc_bio);
+ enc_bio = NULL;
+ enc_idx = 0;
+ }
+ nr_segs--;
}
- enc_bio->bi_private = src_bio;
- enc_bio->bi_end_io = blk_crypto_fallback_encrypt_endio;
*bio_ptr = enc_bio;
ret = true;
-
- enc_bio = NULL;
- goto out_free_ciph_req;
-
-out_free_bounce_pages:
- while (i > 0)
- mempool_free(enc_bio->bi_io_vec[--i].bv_page,
- blk_crypto_bounce_page_pool);
out_free_ciph_req:
skcipher_request_free(ciph_req);
out_release_keyslot:
blk_crypto_put_keyslot(slot);
-out_put_enc_bio:
- if (enc_bio)
- bio_uninit(enc_bio);
- kfree(enc_bio);
return ret;
+
+out_ioerror:
+ while (enc_idx > 0)
+ mempool_free(enc_bio->bi_io_vec[enc_idx--].bv_page,
+ blk_crypto_bounce_page_pool);
+ bio_put(enc_bio);
+ src_bio->bi_status = BLK_STS_IOERR;
+ goto out_free_ciph_req;
}
/*
@@ -537,7 +501,7 @@ static int blk_crypto_fallback_init(void)
get_random_bytes(blank_key, sizeof(blank_key));
- err = bioset_init(&crypto_bio_split, 64, 0, 0);
+ err = bioset_init(&crypto_bio_split, 64, 0, BIOSET_NEED_BVECS);
if (err)
goto out;
--
2.47.3
* [PATCH 7/9] blk-crypto: handle the fallback above the block layer
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
` (5 preceding siblings ...)
2025-10-31 9:34 ` [PATCH 6/9] blk-crypto: optimize bio splitting in blk_crypto_fallback_encrypt_bio Christoph Hellwig
@ 2025-10-31 9:34 ` Christoph Hellwig
2025-11-07 4:42 ` Eric Biggers
2025-11-14 0:37 ` Eric Biggers
2025-10-31 9:34 ` [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption Christoph Hellwig
2025-10-31 9:34 ` [PATCH 9/9] blk-crypto: use mempool_alloc_bulk for encrypted bio page allocation Christoph Hellwig
8 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
Add a blk_crypto_submit_bio helper that submits the bio directly when it
is not encrypted or when the device supports inline encryption, and
otherwise handles the encryption before going down into the block layer
and the low-level driver.
This reduces the risk of bio reordering and keeps memory allocation as
high up in the stack as possible.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-core.c | 10 ++--
block/blk-crypto-fallback.c | 97 ++++++++++++++++---------------------
block/blk-crypto-internal.h | 31 ++++++------
block/blk-crypto.c | 65 +++++++++----------------
fs/buffer.c | 3 +-
fs/crypto/bio.c | 2 +-
fs/ext4/page-io.c | 3 +-
fs/ext4/readpage.c | 9 ++--
fs/f2fs/data.c | 4 +-
fs/f2fs/file.c | 3 +-
fs/iomap/direct-io.c | 3 +-
include/linux/blk-crypto.h | 16 ++++++
12 files changed, 116 insertions(+), 130 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 14ae73eebe0d..108b85eb3b9d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -628,9 +628,6 @@ static void __submit_bio(struct bio *bio)
/* If plug is not used, add new plug here to cache nsecs time. */
struct blk_plug plug;
- if (unlikely(!blk_crypto_bio_prep(&bio)))
- return;
-
blk_start_plug(&plug);
if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
@@ -794,6 +791,13 @@ void submit_bio_noacct(struct bio *bio)
if ((bio->bi_opf & REQ_NOWAIT) && !bdev_nowait(bdev))
goto not_supported;
+ if (bio_has_crypt_ctx(bio)) {
+ if (WARN_ON_ONCE(!bio_has_data(bio)))
+ goto end_io;
+ if (!blk_crypto_supported(bio))
+ goto not_supported;
+ }
+
if (should_fail_bio(bio))
goto end_io;
bio_check_ro(bio);
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 1f58010fb437..16a1809e2667 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -141,6 +141,26 @@ static const struct blk_crypto_ll_ops blk_crypto_fallback_ll_ops = {
.keyslot_evict = blk_crypto_fallback_keyslot_evict,
};
+static bool blk_crypto_fallback_bio_valid(struct bio *bio)
+{
+ struct bio_crypt_ctx *bc = bio->bi_crypt_context;
+
+ if (WARN_ON_ONCE(!tfms_inited[bc->bc_key->crypto_cfg.crypto_mode])) {
+ /* User didn't call blk_crypto_start_using_key() first */
+ bio_io_error(bio);
+ return false;
+ }
+
+ if (!__blk_crypto_cfg_supported(blk_crypto_fallback_profile,
+ &bc->bc_key->crypto_cfg)) {
+ bio->bi_status = BLK_STS_NOTSUPP;
+ bio_endio(bio);
+ return false;
+ }
+
+ return true;
+}
+
static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
{
struct bio *src_bio = enc_bio->bi_private;
@@ -214,15 +234,12 @@ static void blk_crypto_dun_to_iv(const u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
}
/*
- * The crypto API fallback's encryption routine.
- * Allocate a bounce bio for encryption, encrypt the input bio using crypto API,
- * and replace *bio_ptr with the bounce bio. May split input bio if it's too
- * large. Returns true on success. Returns false and sets bio->bi_status on
- * error.
+ * The crypto API fallback's encryption routine. Allocates a bounce bio for
+ * encryption, encrypts the input bio using crypto API and submits the bounce
+ * bio.
*/
-static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
+void blk_crypto_fallback_encrypt_bio(struct bio *src_bio)
{
- struct bio *src_bio = *bio_ptr;
struct bio_crypt_ctx *bc = src_bio->bi_crypt_context;
int data_unit_size = bc->bc_key->crypto_cfg.data_unit_size;
struct skcipher_request *ciph_req = NULL;
@@ -235,9 +252,11 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
unsigned int nr_segs;
unsigned int enc_idx = 0;
unsigned int j;
- bool ret = false;
blk_status_t blk_st;
+ if (!blk_crypto_fallback_bio_valid(src_bio))
+ return;
+
/*
* Get a blk-crypto-fallback keyslot that contains a crypto_skcipher for
* this bio's algorithm and key.
@@ -246,7 +265,7 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
bc->bc_key, &slot);
if (blk_st != BLK_STS_OK) {
src_bio->bi_status = blk_st;
- return false;
+ goto endio;
}
/* and then allocate an skcipher_request for it */
@@ -313,13 +332,10 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
nr_segs--;
}
- *bio_ptr = enc_bio;
- ret = true;
-out_free_ciph_req:
skcipher_request_free(ciph_req);
-out_release_keyslot:
blk_crypto_put_keyslot(slot);
- return ret;
+ submit_bio(enc_bio);
+ return;
out_ioerror:
while (enc_idx > 0)
@@ -327,7 +343,11 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
blk_crypto_bounce_page_pool);
bio_put(enc_bio);
src_bio->bi_status = BLK_STS_IOERR;
- goto out_free_ciph_req;
+ skcipher_request_free(ciph_req);
+out_release_keyslot:
+ blk_crypto_put_keyslot(slot);
+endio:
+ bio_endio(src_bio);
}
/*
@@ -428,60 +448,25 @@ static void blk_crypto_fallback_decrypt_endio(struct bio *bio)
queue_work(blk_crypto_wq, &f_ctx->work);
}
-/**
- * blk_crypto_fallback_bio_prep - Prepare a bio to use fallback en/decryption
- *
- * @bio_ptr: pointer to the bio to prepare
- *
- * If bio is doing a WRITE operation, this splits the bio into two parts if it's
- * too big (see blk_crypto_fallback_split_bio_if_needed()). It then allocates a
- * bounce bio for the first part, encrypts it, and updates bio_ptr to point to
- * the bounce bio.
- *
- * For a READ operation, we mark the bio for decryption by using bi_private and
- * bi_end_io.
- *
- * In either case, this function will make the bio look like a regular bio (i.e.
- * as if no encryption context was ever specified) for the purposes of the rest
- * of the stack except for blk-integrity (blk-integrity and blk-crypto are not
- * currently supported together).
- *
- * Return: true on success. Sets bio->bi_status and returns false on error.
+/*
+ * bio READ case: Set up a f_ctx in the bio's bi_private and set the bi_end_io
+ * appropriately to trigger decryption when the bio is ended.
*/
-bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr)
+bool blk_crypto_fallback_prep_decrypt_bio(struct bio *bio)
{
- struct bio *bio = *bio_ptr;
- struct bio_crypt_ctx *bc = bio->bi_crypt_context;
struct bio_fallback_crypt_ctx *f_ctx;
- if (WARN_ON_ONCE(!tfms_inited[bc->bc_key->crypto_cfg.crypto_mode])) {
- /* User didn't call blk_crypto_start_using_key() first */
- bio->bi_status = BLK_STS_IOERR;
+ if (!blk_crypto_fallback_bio_valid(bio))
return false;
- }
- if (!__blk_crypto_cfg_supported(blk_crypto_fallback_profile,
- &bc->bc_key->crypto_cfg)) {
- bio->bi_status = BLK_STS_NOTSUPP;
- return false;
- }
-
- if (bio_data_dir(bio) == WRITE)
- return blk_crypto_fallback_encrypt_bio(bio_ptr);
-
- /*
- * bio READ case: Set up a f_ctx in the bio's bi_private and set the
- * bi_end_io appropriately to trigger decryption when the bio is ended.
- */
f_ctx = mempool_alloc(bio_fallback_crypt_ctx_pool, GFP_NOIO);
- f_ctx->crypt_ctx = *bc;
+ f_ctx->crypt_ctx = *bio->bi_crypt_context;
f_ctx->crypt_iter = bio->bi_iter;
f_ctx->bi_private_orig = bio->bi_private;
f_ctx->bi_end_io_orig = bio->bi_end_io;
bio->bi_private = (void *)f_ctx;
bio->bi_end_io = blk_crypto_fallback_decrypt_endio;
bio_crypt_free_ctx(bio);
-
return true;
}
diff --git a/block/blk-crypto-internal.h b/block/blk-crypto-internal.h
index ccf6dff6ff6b..22ac87f44b46 100644
--- a/block/blk-crypto-internal.h
+++ b/block/blk-crypto-internal.h
@@ -86,6 +86,12 @@ bool __blk_crypto_cfg_supported(struct blk_crypto_profile *profile,
int blk_crypto_ioctl(struct block_device *bdev, unsigned int cmd,
void __user *argp);
+static inline bool blk_crypto_supported(struct bio *bio)
+{
+ return blk_crypto_config_supported_natively(bio->bi_bdev,
+ &bio->bi_crypt_context->bc_key->crypto_cfg);
+}
+
#else /* CONFIG_BLK_INLINE_ENCRYPTION */
static inline int blk_crypto_sysfs_register(struct gendisk *disk)
@@ -139,6 +145,11 @@ static inline int blk_crypto_ioctl(struct block_device *bdev, unsigned int cmd,
return -ENOTTY;
}
+static inline bool blk_crypto_supported(struct bio *bio)
+{
+ return false;
+}
+
#endif /* CONFIG_BLK_INLINE_ENCRYPTION */
void __bio_crypt_advance(struct bio *bio, unsigned int bytes);
@@ -165,14 +176,6 @@ static inline void bio_crypt_do_front_merge(struct request *rq,
#endif
}
-bool __blk_crypto_bio_prep(struct bio **bio_ptr);
-static inline bool blk_crypto_bio_prep(struct bio **bio_ptr)
-{
- if (bio_has_crypt_ctx(*bio_ptr))
- return __blk_crypto_bio_prep(bio_ptr);
- return true;
-}
-
blk_status_t __blk_crypto_rq_get_keyslot(struct request *rq);
static inline blk_status_t blk_crypto_rq_get_keyslot(struct request *rq)
{
@@ -215,12 +218,13 @@ static inline int blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
return 0;
}
+bool blk_crypto_fallback_prep_decrypt_bio(struct bio *bio);
+void blk_crypto_fallback_encrypt_bio(struct bio *orig_bio);
+
#ifdef CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK
int blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num);
-bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr);
-
int blk_crypto_fallback_evict_key(const struct blk_crypto_key *key);
#else /* CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK */
@@ -232,13 +236,6 @@ blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num)
return -ENOPKG;
}
-static inline bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr)
-{
- pr_warn_once("crypto API fallback disabled; failing request.\n");
- (*bio_ptr)->bi_status = BLK_STS_NOTSUPP;
- return false;
-}
-
static inline int
blk_crypto_fallback_evict_key(const struct blk_crypto_key *key)
{
diff --git a/block/blk-crypto.c b/block/blk-crypto.c
index 4b1ad84d1b5a..a8f2dc2a75f6 100644
--- a/block/blk-crypto.c
+++ b/block/blk-crypto.c
@@ -258,57 +258,36 @@ void __blk_crypto_free_request(struct request *rq)
rq->crypt_ctx = NULL;
}
-/**
- * __blk_crypto_bio_prep - Prepare bio for inline encryption
- *
- * @bio_ptr: pointer to original bio pointer
- *
- * If the bio crypt context provided for the bio is supported by the underlying
- * device's inline encryption hardware, do nothing.
- *
- * Otherwise, try to perform en/decryption for this bio by falling back to the
- * kernel crypto API. When the crypto API fallback is used for encryption,
- * blk-crypto may choose to split the bio into 2 - the first one that will
- * continue to be processed and the second one that will be resubmitted via
- * submit_bio_noacct. A bounce bio will be allocated to encrypt the contents
- * of the aforementioned "first one", and *bio_ptr will be updated to this
- * bounce bio.
- *
- * Caller must ensure bio has bio_crypt_ctx.
- *
- * Return: true on success; false on error (and bio->bi_status will be set
- * appropriately, and bio_endio() will have been called so bio
- * submission should abort).
- */
-bool __blk_crypto_bio_prep(struct bio **bio_ptr)
+bool __blk_crypto_submit_bio(struct bio *bio)
{
- struct bio *bio = *bio_ptr;
const struct blk_crypto_key *bc_key = bio->bi_crypt_context->bc_key;
- /* Error if bio has no data. */
- if (WARN_ON_ONCE(!bio_has_data(bio))) {
- bio->bi_status = BLK_STS_IOERR;
- goto fail;
+ if (!bio_crypt_check_alignment(bio)) {
+ bio_io_error(bio);
+ return false;
}
- if (!bio_crypt_check_alignment(bio)) {
- bio->bi_status = BLK_STS_IOERR;
- goto fail;
+ if (!blk_crypto_config_supported_natively(bio->bi_bdev,
+ &bc_key->crypto_cfg)) {
+ if (!IS_ENABLED(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK)) {
+ pr_warn_once("crypto API fallback disabled; failing request.\n");
+ bio->bi_status = BLK_STS_NOTSUPP;
+ bio_endio(bio);
+ return false;
+ }
+
+ if (bio_data_dir(bio) == WRITE) {
+ blk_crypto_fallback_encrypt_bio(bio);
+ return false;
+ }
+
+ if (!blk_crypto_fallback_prep_decrypt_bio(bio))
+ return false;
}
- /*
- * Success if device supports the encryption context, or if we succeeded
- * in falling back to the crypto API.
- */
- if (blk_crypto_config_supported_natively(bio->bi_bdev,
- &bc_key->crypto_cfg))
- return true;
- if (blk_crypto_fallback_bio_prep(bio_ptr))
- return true;
-fail:
- bio_endio(*bio_ptr);
- return false;
+ return true;
}
+EXPORT_SYMBOL_GPL(__blk_crypto_submit_bio);
int __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask)
diff --git a/fs/buffer.c b/fs/buffer.c
index 6a8752f7bbed..f82d2ef4276f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -29,6 +29,7 @@
#include <linux/slab.h>
#include <linux/capability.h>
#include <linux/blkdev.h>
+#include <linux/blk-crypto.h>
#include <linux/file.h>
#include <linux/quotaops.h>
#include <linux/highmem.h>
@@ -2821,7 +2822,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
wbc_account_cgroup_owner(wbc, bh->b_folio, bh->b_size);
}
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
}
void submit_bh(blk_opf_t opf, struct buffer_head *bh)
diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index e59d342b4240..f53eb0a21912 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -105,7 +105,7 @@ static int fscrypt_zeroout_range_inline_crypt(const struct inode *inode,
}
atomic_inc(&done.pending);
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
}
fscrypt_zeroout_range_done(&done);
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 39abfeec5f36..a8c95eee91b7 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -7,6 +7,7 @@
* Written by Theodore Ts'o, 2010.
*/
+#include <linux/blk-crypto.h>
#include <linux/fs.h>
#include <linux/time.h>
#include <linux/highuid.h>
@@ -401,7 +402,7 @@ void ext4_io_submit(struct ext4_io_submit *io)
if (bio) {
if (io->io_wbc->sync_mode == WB_SYNC_ALL)
io->io_bio->bi_opf |= REQ_SYNC;
- submit_bio(io->io_bio);
+ blk_crypto_submit_bio(io->io_bio);
}
io->io_bio = NULL;
}
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index f329daf6e5c7..ad785dcba826 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -36,6 +36,7 @@
#include <linux/bio.h>
#include <linux/fs.h>
#include <linux/buffer_head.h>
+#include <linux/blk-crypto.h>
#include <linux/blkdev.h>
#include <linux/highmem.h>
#include <linux/prefetch.h>
@@ -348,7 +349,7 @@ int ext4_mpage_readpages(struct inode *inode,
if (bio && (last_block_in_bio != first_block - 1 ||
!fscrypt_mergeable_bio(bio, inode, next_block))) {
submit_and_realloc:
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
bio = NULL;
}
if (bio == NULL) {
@@ -374,14 +375,14 @@ int ext4_mpage_readpages(struct inode *inode,
if (((map.m_flags & EXT4_MAP_BOUNDARY) &&
(relative_block == map.m_len)) ||
(first_hole != blocks_per_folio)) {
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
bio = NULL;
} else
last_block_in_bio = first_block + blocks_per_folio - 1;
continue;
confused:
if (bio) {
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
bio = NULL;
}
if (!folio_test_uptodate(folio))
@@ -392,7 +393,7 @@ int ext4_mpage_readpages(struct inode *inode,
; /* A label shall be followed by a statement until C23 */
}
if (bio)
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
return 0;
}
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 775aa4f63aa3..b5816f229ae2 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -513,7 +513,7 @@ void f2fs_submit_read_bio(struct f2fs_sb_info *sbi, struct bio *bio,
trace_f2fs_submit_read_bio(sbi->sb, type, bio);
iostat_update_submit_ctx(bio, type);
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
}
static void f2fs_submit_write_bio(struct f2fs_sb_info *sbi, struct bio *bio,
@@ -522,7 +522,7 @@ static void f2fs_submit_write_bio(struct f2fs_sb_info *sbi, struct bio *bio,
WARN_ON_ONCE(is_read_io(bio_op(bio)));
trace_f2fs_submit_write_bio(sbi->sb, type, bio);
iostat_update_submit_ctx(bio, type);
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
}
static void __submit_merged_bio(struct f2fs_bio_info *io)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ffa045b39c01..11b394dc2d5e 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -5,6 +5,7 @@
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*/
+#include <linux/blk-crypto.h>
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/stat.h>
@@ -5042,7 +5043,7 @@ static void f2fs_dio_write_submit_io(const struct iomap_iter *iter,
enum temp_type temp = f2fs_get_segment_temp(sbi, type);
bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi, DATA, temp);
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
}
static const struct iomap_dio_ops f2fs_iomap_dio_write_ops = {
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5d5d63efbd57..c69dd24be663 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -3,6 +3,7 @@
* Copyright (C) 2010 Red Hat, Inc.
* Copyright (c) 2016-2025 Christoph Hellwig.
*/
+#include <linux/blk-crypto.h>
#include <linux/fscrypt.h>
#include <linux/pagemap.h>
#include <linux/iomap.h>
@@ -82,7 +83,7 @@ static void iomap_dio_submit_bio(const struct iomap_iter *iter,
dio->dops->submit_io(iter, bio, pos);
} else {
WARN_ON_ONCE(iter->iomap.flags & IOMAP_F_ANON_WRITE);
- submit_bio(bio);
+ blk_crypto_submit_bio(bio);
}
}
diff --git a/include/linux/blk-crypto.h b/include/linux/blk-crypto.h
index 58b0c5254a67..ffe815c09696 100644
--- a/include/linux/blk-crypto.h
+++ b/include/linux/blk-crypto.h
@@ -171,6 +171,22 @@ static inline bool bio_has_crypt_ctx(struct bio *bio)
#endif /* CONFIG_BLK_INLINE_ENCRYPTION */
+bool __blk_crypto_submit_bio(struct bio *bio);
+
+/**
+ * blk_crypto_submit_bio - Submit a bio using inline encryption
+ * @bio: bio to submit
+ *
+ * If the bio crypt context attached to @bio is supported by the underlying
+ * device's inline encryption hardware, just submit @bio. Otherwise, try to
+ * perform en/decryption for this bio by falling back to the kernel crypto API.
+ */
+static inline void blk_crypto_submit_bio(struct bio *bio)
+{
+ if (!bio_has_crypt_ctx(bio) || __blk_crypto_submit_bio(bio))
+ submit_bio(bio);
+}
+
int __bio_crypt_clone(struct bio *dst, struct bio *src, gfp_t gfp_mask);
/**
* bio_crypt_clone - clone bio encryption context
--
2.47.3
* [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
` (6 preceding siblings ...)
2025-10-31 9:34 ` [PATCH 7/9] blk-crypto: handle the fallback above the block layer Christoph Hellwig
@ 2025-10-31 9:34 ` Christoph Hellwig
2025-11-07 4:18 ` Eric Biggers
2025-11-14 0:32 ` Eric Biggers
2025-10-31 9:34 ` [PATCH 9/9] blk-crypto: use mempool_alloc_bulk for encrypted bio page allocation Christoph Hellwig
8 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
Allocating an skcipher request dynamically can deadlock or cause
unexpected I/O failures when done from writeback context. Sidestep the
allocation by using on-stack sync skcipher requests, similar to what the
non-blk-crypto fscrypt code already does.
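A minimal, self-contained illustration of the on-stack pattern this
switches to (the buffer handling is a placeholder, not the fallback code,
and the explicit skcipher_request_set_sync_tfm() call is kept here for
clarity even where the stack macro may already cover it):

#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/scatterlist.h>

static int encrypt_one_unit(struct crypto_sync_skcipher *tfm, void *buf,
                            unsigned int len, u8 *iv)
{
        SYNC_SKCIPHER_REQUEST_ON_STACK(req, tfm);
        DECLARE_CRYPTO_WAIT(wait);
        struct scatterlist sg;

        sg_init_one(&sg, buf, len);
        skcipher_request_set_sync_tfm(req, tfm);
        skcipher_request_set_callback(req,
                        CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
                        crypto_req_done, &wait);
        skcipher_request_set_crypt(req, &sg, &sg, len, iv);

        /* no per-request allocation anywhere in this path */
        return crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
}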
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-crypto-fallback.c | 170 +++++++++++++++++-------------------
1 file changed, 78 insertions(+), 92 deletions(-)
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 16a1809e2667..33aa7b26ed37 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -75,7 +75,7 @@ static bool tfms_inited[BLK_ENCRYPTION_MODE_MAX];
static struct blk_crypto_fallback_keyslot {
enum blk_crypto_mode_num crypto_mode;
- struct crypto_skcipher *tfms[BLK_ENCRYPTION_MODE_MAX];
+ struct crypto_sync_skcipher *tfms[BLK_ENCRYPTION_MODE_MAX];
} *blk_crypto_keyslots;
static struct blk_crypto_profile *blk_crypto_fallback_profile;
@@ -98,7 +98,7 @@ static void blk_crypto_fallback_evict_keyslot(unsigned int slot)
WARN_ON(slotp->crypto_mode == BLK_ENCRYPTION_MODE_INVALID);
/* Clear the key in the skcipher */
- err = crypto_skcipher_setkey(slotp->tfms[crypto_mode], blank_key,
+ err = crypto_sync_skcipher_setkey(slotp->tfms[crypto_mode], blank_key,
blk_crypto_modes[crypto_mode].keysize);
WARN_ON(err);
slotp->crypto_mode = BLK_ENCRYPTION_MODE_INVALID;
@@ -119,7 +119,7 @@ blk_crypto_fallback_keyslot_program(struct blk_crypto_profile *profile,
blk_crypto_fallback_evict_keyslot(slot);
slotp->crypto_mode = crypto_mode;
- err = crypto_skcipher_setkey(slotp->tfms[crypto_mode], key->bytes,
+ err = crypto_sync_skcipher_setkey(slotp->tfms[crypto_mode], key->bytes,
key->size);
if (err) {
blk_crypto_fallback_evict_keyslot(slot);
@@ -195,28 +195,13 @@ static struct bio *blk_crypto_alloc_enc_bio(struct bio *bio_src,
return bio;
}
-static bool
-blk_crypto_fallback_alloc_cipher_req(struct blk_crypto_keyslot *slot,
- struct skcipher_request **ciph_req_ret,
- struct crypto_wait *wait)
+static struct crypto_sync_skcipher *
+blk_crypto_fallback_tfm(struct blk_crypto_keyslot *slot)
{
- struct skcipher_request *ciph_req;
- const struct blk_crypto_fallback_keyslot *slotp;
- int keyslot_idx = blk_crypto_keyslot_index(slot);
-
- slotp = &blk_crypto_keyslots[keyslot_idx];
- ciph_req = skcipher_request_alloc(slotp->tfms[slotp->crypto_mode],
- GFP_NOIO);
- if (!ciph_req)
- return false;
-
- skcipher_request_set_callback(ciph_req,
- CRYPTO_TFM_REQ_MAY_BACKLOG |
- CRYPTO_TFM_REQ_MAY_SLEEP,
- crypto_req_done, wait);
- *ciph_req_ret = ciph_req;
+ const struct blk_crypto_fallback_keyslot *slotp =
+ &blk_crypto_keyslots[blk_crypto_keyslot_index(slot)];
- return true;
+ return slotp->tfms[slotp->crypto_mode];
}
union blk_crypto_iv {
@@ -238,12 +223,12 @@ static void blk_crypto_dun_to_iv(const u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
* encryption, encrypts the input bio using crypto API and submits the bounce
* bio.
*/
-void blk_crypto_fallback_encrypt_bio(struct bio *src_bio)
+static blk_status_t __blk_crypto_fallback_encrypt_bio(struct bio *src_bio,
+ struct crypto_sync_skcipher *tfm)
{
struct bio_crypt_ctx *bc = src_bio->bi_crypt_context;
int data_unit_size = bc->bc_key->crypto_cfg.data_unit_size;
- struct skcipher_request *ciph_req = NULL;
- struct blk_crypto_keyslot *slot;
+ SYNC_SKCIPHER_REQUEST_ON_STACK(ciph_req, tfm);
DECLARE_CRYPTO_WAIT(wait);
u64 curr_dun[BLK_CRYPTO_DUN_ARRAY_SIZE];
struct scatterlist src, dst;
@@ -252,27 +237,11 @@ void blk_crypto_fallback_encrypt_bio(struct bio *src_bio)
unsigned int nr_segs;
unsigned int enc_idx = 0;
unsigned int j;
- blk_status_t blk_st;
- if (!blk_crypto_fallback_bio_valid(src_bio))
- return;
-
- /*
- * Get a blk-crypto-fallback keyslot that contains a crypto_skcipher for
- * this bio's algorithm and key.
- */
- blk_st = blk_crypto_get_keyslot(blk_crypto_fallback_profile,
- bc->bc_key, &slot);
- if (blk_st != BLK_STS_OK) {
- src_bio->bi_status = blk_st;
- goto endio;
- }
-
- /* and then allocate an skcipher_request for it */
- if (!blk_crypto_fallback_alloc_cipher_req(slot, &ciph_req, &wait)) {
- src_bio->bi_status = BLK_STS_RESOURCE;
- goto out_release_keyslot;
- }
+ skcipher_request_set_callback(ciph_req,
+ CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP,
+ crypto_req_done, &wait);
memcpy(curr_dun, bc->bc_dun, sizeof(curr_dun));
sg_init_table(&src, 1);
@@ -332,62 +301,61 @@ void blk_crypto_fallback_encrypt_bio(struct bio *src_bio)
nr_segs--;
}
- skcipher_request_free(ciph_req);
- blk_crypto_put_keyslot(slot);
submit_bio(enc_bio);
- return;
+ return BLK_STS_OK;
out_ioerror:
while (enc_idx > 0)
mempool_free(enc_bio->bi_io_vec[enc_idx--].bv_page,
blk_crypto_bounce_page_pool);
bio_put(enc_bio);
- src_bio->bi_status = BLK_STS_IOERR;
- skcipher_request_free(ciph_req);
-out_release_keyslot:
- blk_crypto_put_keyslot(slot);
-endio:
- bio_endio(src_bio);
+ return BLK_STS_IOERR;
+}
+
+void blk_crypto_fallback_encrypt_bio(struct bio *src_bio)
+{
+ struct bio_crypt_ctx *bc = src_bio->bi_crypt_context;
+ struct blk_crypto_keyslot *slot;
+ blk_status_t status;
+
+ if (!blk_crypto_fallback_bio_valid(src_bio))
+ return;
+
+ status = blk_crypto_get_keyslot(blk_crypto_fallback_profile,
+ bc->bc_key, &slot);
+ if (status == BLK_STS_OK) {
+ status = __blk_crypto_fallback_encrypt_bio(src_bio,
+ blk_crypto_fallback_tfm(slot));
+ blk_crypto_put_keyslot(slot);
+ }
+ if (status != BLK_STS_OK) {
+ src_bio->bi_status = status;
+ bio_endio(src_bio);
+ return;
+ }
}
/*
* The crypto API fallback's main decryption routine.
* Decrypts input bio in place, and calls bio_endio on the bio.
*/
-static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
+static blk_status_t __blk_crypto_fallback_decrypt_bio(struct bio *bio,
+ struct bio_crypt_ctx *bc, struct bvec_iter iter,
+ struct crypto_sync_skcipher *tfm)
{
- struct bio_fallback_crypt_ctx *f_ctx =
- container_of(work, struct bio_fallback_crypt_ctx, work);
- struct bio *bio = f_ctx->bio;
- struct bio_crypt_ctx *bc = &f_ctx->crypt_ctx;
- struct blk_crypto_keyslot *slot;
- struct skcipher_request *ciph_req = NULL;
+ SYNC_SKCIPHER_REQUEST_ON_STACK(ciph_req, tfm);
DECLARE_CRYPTO_WAIT(wait);
u64 curr_dun[BLK_CRYPTO_DUN_ARRAY_SIZE];
union blk_crypto_iv iv;
struct scatterlist sg;
struct bio_vec bv;
- struct bvec_iter iter;
const int data_unit_size = bc->bc_key->crypto_cfg.data_unit_size;
unsigned int i;
- blk_status_t blk_st;
- /*
- * Get a blk-crypto-fallback keyslot that contains a crypto_skcipher for
- * this bio's algorithm and key.
- */
- blk_st = blk_crypto_get_keyslot(blk_crypto_fallback_profile,
- bc->bc_key, &slot);
- if (blk_st != BLK_STS_OK) {
- bio->bi_status = blk_st;
- goto out_no_keyslot;
- }
-
- /* and then allocate an skcipher_request for it */
- if (!blk_crypto_fallback_alloc_cipher_req(slot, &ciph_req, &wait)) {
- bio->bi_status = BLK_STS_RESOURCE;
- goto out;
- }
+ skcipher_request_set_callback(ciph_req,
+ CRYPTO_TFM_REQ_MAY_BACKLOG |
+ CRYPTO_TFM_REQ_MAY_SLEEP,
+ crypto_req_done, &wait);
memcpy(curr_dun, bc->bc_dun, sizeof(curr_dun));
sg_init_table(&sg, 1);
@@ -395,7 +363,7 @@ static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
iv.bytes);
/* Decrypt each segment in the bio */
- __bio_for_each_segment(bv, bio, iter, f_ctx->crypt_iter) {
+ __bio_for_each_segment(bv, bio, iter, iter) {
struct page *page = bv.bv_page;
sg_set_page(&sg, page, data_unit_size, bv.bv_offset);
@@ -404,19 +372,36 @@ static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
for (i = 0; i < bv.bv_len; i += data_unit_size) {
blk_crypto_dun_to_iv(curr_dun, &iv);
if (crypto_wait_req(crypto_skcipher_decrypt(ciph_req),
- &wait)) {
- bio->bi_status = BLK_STS_IOERR;
- goto out;
- }
+ &wait))
+ return BLK_STS_IOERR;
bio_crypt_dun_increment(curr_dun, 1);
sg.offset += data_unit_size;
}
}
-out:
- skcipher_request_free(ciph_req);
- blk_crypto_put_keyslot(slot);
-out_no_keyslot:
+ return BLK_STS_OK;
+}
+
+static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
+{
+ struct bio_fallback_crypt_ctx *f_ctx =
+ container_of(work, struct bio_fallback_crypt_ctx, work);
+ struct bio *bio = f_ctx->bio;
+ struct bio_crypt_ctx *bc = &f_ctx->crypt_ctx;
+ struct blk_crypto_keyslot *slot;
+ blk_status_t status;
+
+ status = blk_crypto_get_keyslot(blk_crypto_fallback_profile,
+ bc->bc_key, &slot);
+ if (status == BLK_STS_OK) {
+ status = __blk_crypto_fallback_decrypt_bio(f_ctx->bio,
+ &f_ctx->crypt_ctx, f_ctx->crypt_iter,
+ blk_crypto_fallback_tfm(slot));
+ blk_crypto_put_keyslot(slot);
+ }
+
+ if (status != BLK_STS_OK)
+ bio->bi_status = status;
mempool_free(f_ctx, bio_fallback_crypt_ctx_pool);
bio_endio(bio);
}
@@ -590,7 +575,8 @@ int blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num)
for (i = 0; i < blk_crypto_num_keyslots; i++) {
slotp = &blk_crypto_keyslots[i];
- slotp->tfms[mode_num] = crypto_alloc_skcipher(cipher_str, 0, 0);
+ slotp->tfms[mode_num] = crypto_alloc_sync_skcipher(cipher_str,
+ 0, 0);
if (IS_ERR(slotp->tfms[mode_num])) {
err = PTR_ERR(slotp->tfms[mode_num]);
if (err == -ENOENT) {
@@ -602,7 +588,7 @@ int blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num)
goto out_free_tfms;
}
- crypto_skcipher_set_flags(slotp->tfms[mode_num],
+ crypto_sync_skcipher_set_flags(slotp->tfms[mode_num],
CRYPTO_TFM_REQ_FORBID_WEAK_KEYS);
}
@@ -616,7 +602,7 @@ int blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num)
out_free_tfms:
for (i = 0; i < blk_crypto_num_keyslots; i++) {
slotp = &blk_crypto_keyslots[i];
- crypto_free_skcipher(slotp->tfms[mode_num]);
+ crypto_free_sync_skcipher(slotp->tfms[mode_num]);
slotp->tfms[mode_num] = NULL;
}
out:
--
2.47.3
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 9/9] blk-crypto: use mempool_alloc_bulk for encrypted bio page allocation
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
` (7 preceding siblings ...)
2025-10-31 9:34 ` [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption Christoph Hellwig
@ 2025-10-31 9:34 ` Christoph Hellwig
2025-11-05 15:12 ` Vlastimil Babka
8 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-10-31 9:34 UTC (permalink / raw)
To: Jens Axboe, Eric Biggers, Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
Calling mempool_alloc in a loop is not safe unless the maximum allocation
size times the maximum number of threads using it is less than the
minimum pool size. Use the new mempool_alloc_bulk helper to allocate
all missing elements in one pass to remove this deadlock risk. This
also means that non-pool allocations now use alloc_pages_bulk which can
be significantly faster than a loop over individual page allocations.
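In rough pseudo-code, and with illustrative names rather than the actual
fallback code, the allocation-side change is:

	/*
	 * Before: each iteration can block on the pool while other threads
	 * hold pages taken in earlier iterations, so concurrent callers can
	 * deadlock once the pool runs dry.
	 */
	for (i = 0; i < nr_pages; i++)
		pages[i] = mempool_alloc(pool, GFP_NOIO);

	/*
	 * After: all missing (NULL) slots are filled in a single call, which
	 * cannot fail as long as __GFP_DIRECT_RECLAIM is set.
	 */
	memset(pages, 0, sizeof(*pages) * nr_pages);
	mempool_alloc_bulk(pool, (void **)pages, nr_pages, GFP_NOIO);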
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-crypto-fallback.c | 54 ++++++++++++++++++++++++++++++-------
1 file changed, 45 insertions(+), 9 deletions(-)
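Note on the bounce page array in the hunk below: it is carved out of the
tail of the bi_io_vec allocation, the same trick used by
bio_iov_iter_get_pages, relying on a bio_vec being at least twice the size
of a page pointer:

	/*
	 * On 64-bit, bi_io_vec provides nr_segs bio_vecs, i.e. 2 * nr_segs
	 * page-pointer sized slots (PAGE_PTRS_PER_BVEC == 2).  pages[] is
	 * placed in the upper half, so pages[i] lives in slot nr_segs + i,
	 * while filling bvec i overwrites slots 2i and 2i + 1.  pages[i] is
	 * read just before bvec i is written, and slots 2i / 2i + 1 can only
	 * alias pages[j] with j <= i, so no unread page pointer is ever
	 * clobbered.
	 */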
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 33aa7b26ed37..2f78027f0cce 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -22,7 +22,7 @@
#include "blk-cgroup.h"
#include "blk-crypto-internal.h"
-static unsigned int num_prealloc_bounce_pg = 32;
+static unsigned int num_prealloc_bounce_pg = BIO_MAX_VECS;
module_param(num_prealloc_bounce_pg, uint, 0);
MODULE_PARM_DESC(num_prealloc_bounce_pg,
"Number of preallocated bounce pages for the blk-crypto crypto API fallback");
@@ -164,11 +164,21 @@ static bool blk_crypto_fallback_bio_valid(struct bio *bio)
static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
{
struct bio *src_bio = enc_bio->bi_private;
- int i;
+ struct page **pages = (struct page **)enc_bio->bi_io_vec;
+ struct bio_vec *bv;
+ unsigned int i;
- for (i = 0; i < enc_bio->bi_vcnt; i++)
- mempool_free(enc_bio->bi_io_vec[i].bv_page,
- blk_crypto_bounce_page_pool);
+ /*
+ * Use the same trick as the alloc side to avoid the need for an extra
+ * pages array.
+ */
+ bio_for_each_bvec_all(bv, enc_bio, i)
+ pages[i] = bv->bv_page;
+
+ i = mempool_free_bulk(blk_crypto_bounce_page_pool, (void **)pages,
+ enc_bio->bi_vcnt);
+ if (i < enc_bio->bi_vcnt)
+ release_pages(pages + i, enc_bio->bi_vcnt - i);
src_bio->bi_status = enc_bio->bi_status;
@@ -176,9 +186,12 @@ static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
bio_endio(src_bio);
}
+#define PAGE_PTRS_PER_BVEC (sizeof(struct bio_vec) / sizeof(struct page *))
+
static struct bio *blk_crypto_alloc_enc_bio(struct bio *bio_src,
- unsigned int nr_segs)
+ unsigned int nr_segs, struct page ***pages_ret)
{
+ struct page **pages;
struct bio *bio;
bio = bio_alloc_bioset(bio_src->bi_bdev, nr_segs, bio_src->bi_opf,
@@ -192,6 +205,29 @@ static struct bio *blk_crypto_alloc_enc_bio(struct bio *bio_src,
bio->bi_write_stream = bio_src->bi_write_stream;
bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
bio_clone_blkg_association(bio, bio_src);
+
+ /*
+ * Move page array up in the allocated memory for the bio vecs as far as
+ * possible so that we can start filling biovecs from the beginning
+ * without overwriting the temporary page array.
+ */
+ static_assert(PAGE_PTRS_PER_BVEC > 1);
+ pages = (struct page **)bio->bi_io_vec;
+ pages += nr_segs * (PAGE_PTRS_PER_BVEC - 1);
+
+ /*
+ * Try a bulk allocation first. This could leave random pages in the
+ * array unallocated, but we'll fix that up later in mempool_alloc_bulk.
+ *
+ * Note: alloc_pages_bulk needs the array to be zeroed, as it assumes
+ * any non-zero slot already contains a valid allocation.
+ */
+ memset(pages, 0, sizeof(struct page *) * nr_segs);
+ if (alloc_pages_bulk(GFP_NOFS, nr_segs, pages) < nr_segs) {
+ mempool_alloc_bulk(blk_crypto_bounce_page_pool, (void **)pages,
+ nr_segs, GFP_NOIO);
+ }
+ *pages_ret = pages;
return bio;
}
@@ -234,6 +270,7 @@ static blk_status_t __blk_crypto_fallback_encrypt_bio(struct bio *src_bio,
struct scatterlist src, dst;
union blk_crypto_iv iv;
struct bio *enc_bio = NULL;
+ struct page **enc_pages;
unsigned int nr_segs;
unsigned int enc_idx = 0;
unsigned int j;
@@ -259,11 +296,10 @@ static blk_status_t __blk_crypto_fallback_encrypt_bio(struct bio *src_bio,
if (!enc_bio) {
enc_bio = blk_crypto_alloc_enc_bio(src_bio,
- min(nr_segs, BIO_MAX_VECS));
+ min(nr_segs, BIO_MAX_VECS), &enc_pages);
}
- enc_page = mempool_alloc(blk_crypto_bounce_page_pool,
- GFP_NOIO);
+ enc_page = enc_pages[enc_idx];
__bio_add_page(enc_bio, enc_page, src_bv.bv_len,
src_bv.bv_offset);
--
2.47.3
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 1/9] mempool: update kerneldoc comments
2025-10-31 9:34 ` [PATCH 1/9] mempool: update kerneldoc comments Christoph Hellwig
@ 2025-11-05 14:02 ` Vlastimil Babka
2025-11-05 14:14 ` Vlastimil Babka
2025-11-07 3:26 ` Eric Biggers
1 sibling, 1 reply; 41+ messages in thread
From: Vlastimil Babka @ 2025-11-05 14:02 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Eric Biggers, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On 10/31/25 10:34, Christoph Hellwig wrote:
> Use proper formatting, use full sentences and reduce some verbosity in
> function parameter descriptions.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> mm/mempool.c | 36 +++++++++++++++++-------------------
> 1 file changed, 17 insertions(+), 19 deletions(-)
>
> diff --git a/mm/mempool.c b/mm/mempool.c
> index 1c38e873e546..d7c55a98c2be 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -372,18 +372,15 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
> EXPORT_SYMBOL(mempool_resize);
>
> /**
> - * mempool_alloc - allocate an element from a specific memory pool
> - * @pool: pointer to the memory pool which was allocated via
> - * mempool_create().
> - * @gfp_mask: the usual allocation bitmask.
> + * mempool_alloc - allocate an element from a memory pool
> + * @pool: pointer to the memory pool
> + * @gfp_mask: GFP_* flags.
> *
> - * this function only sleeps if the alloc_fn() function sleeps or
> - * returns NULL. Note that due to preallocation, this function
> - * *never* fails when called from process contexts. (it might
> - * fail if called from an IRQ context.)
Why remove this part? Isn't it the most important behavior of mempools?
> - * Note: using __GFP_ZERO is not supported.
> + * Note: This function only sleeps if the alloc_fn callback sleeps or returns
> + * %NULL. Using __GFP_ZERO is not supported.
> *
> - * Return: pointer to the allocated element or %NULL on error.
> + * Return: pointer to the allocated element or %NULL on error. This function
> + * never returns %NULL when @gfp_mask allows sleeping.
> */
> void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> {
> @@ -456,11 +453,10 @@ EXPORT_SYMBOL(mempool_alloc_noprof);
>
> /**
> * mempool_alloc_preallocated - allocate an element from preallocated elements
> - * belonging to a specific memory pool
> - * @pool: pointer to the memory pool which was allocated via
> - * mempool_create().
> + * belonging to a memory pool
> + * @pool: pointer to the memory pool
> *
> - * This function is similar to mempool_alloc, but it only attempts allocating
> + * This function is similar to mempool_alloc(), but it only attempts allocating
> * an element from the preallocated elements. It does not sleep and immediately
> * returns if no preallocated elements are available.
> *
> @@ -492,12 +488,14 @@ void *mempool_alloc_preallocated(mempool_t *pool)
> EXPORT_SYMBOL(mempool_alloc_preallocated);
>
> /**
> - * mempool_free - return an element to the pool.
> - * @element: pool element pointer.
> - * @pool: pointer to the memory pool which was allocated via
> - * mempool_create().
> + * mempool_free - return an element to a mempool
> + * @element: pointer to element
> + * @pool: pointer to the memory pool
> + *
> + * Returns @elem to @pool if its needs replenishing, else free it using
> + * the free_fn callback in @pool.
> *
> - * this function only sleeps if the free_fn() function sleeps.
> + * This function only sleeps if the free_fn callback sleeps.
> */
> void mempool_free(void *element, mempool_t *pool)
> {
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 2/9] mempool: add error injection support
2025-10-31 9:34 ` [PATCH 2/9] mempool: add error injection support Christoph Hellwig
@ 2025-11-05 14:04 ` Vlastimil Babka
2025-11-07 3:29 ` Eric Biggers
1 sibling, 0 replies; 41+ messages in thread
From: Vlastimil Babka @ 2025-11-05 14:04 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Eric Biggers, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On 10/31/25 10:34, Christoph Hellwig wrote:
> Add a call to should_fail_ex that forces mempool to actually allocate
> from the pool to stress the mempool implementation when enabled through
> debugfs. By default should_fail{,_ex} prints a very verbose stack trace
> that clutters the kernel log, slows down execution and triggers the
> kernel bug detection in xfstests. Pass FAULT_NOWARN and print a
> single-line message notating the caller instead so that full tests
> can be run with fault injection.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/mempool.c | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mempool.c b/mm/mempool.c
> index d7c55a98c2be..15581179c8b9 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -9,7 +9,7 @@
> * started by Ingo Molnar, Copyright (C) 2001
> * debugging by David Rientjes, Copyright (C) 2015
> */
> -
> +#include <linux/fault-inject.h>
> #include <linux/mm.h>
> #include <linux/slab.h>
> #include <linux/highmem.h>
> @@ -20,6 +20,15 @@
> #include <linux/writeback.h>
> #include "slab.h"
>
> +static DECLARE_FAULT_ATTR(fail_mempool_alloc);
> +
> +static int __init mempool_faul_inject_init(void)
> +{
> + return PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
> + NULL, &fail_mempool_alloc));
> +}
> +late_initcall(mempool_faul_inject_init);
> +
> #ifdef CONFIG_SLUB_DEBUG_ON
> static void poison_error(mempool_t *pool, void *element, size_t size,
> size_t byte)
> @@ -399,10 +408,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
>
> repeat_alloc:
> -
> - element = pool->alloc(gfp_temp, pool->pool_data);
> - if (likely(element != NULL))
> - return element;
> + if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
> + pr_info("forcing mempool usage for pool %pS\n",
> + (void *)_RET_IP_);
> + element = NULL;
> + } else {
> + element = pool->alloc(gfp_temp, pool->pool_data);
> + if (likely(element != NULL))
> + return element;
> + }
>
> spin_lock_irqsave(&pool->lock, flags);
> if (likely(pool->curr_nr)) {
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/9] mempool: update kerneldoc comments
2025-11-05 14:02 ` Vlastimil Babka
@ 2025-11-05 14:14 ` Vlastimil Babka
0 siblings, 0 replies; 41+ messages in thread
From: Vlastimil Babka @ 2025-11-05 14:14 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Eric Biggers, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On 11/5/25 15:02, Vlastimil Babka wrote:
> On 10/31/25 10:34, Christoph Hellwig wrote:
>> Use proper formatting, use full sentences and reduce some verbosity in
>> function parameter descriptions.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>> mm/mempool.c | 36 +++++++++++++++++-------------------
>> 1 file changed, 17 insertions(+), 19 deletions(-)
>>
>> diff --git a/mm/mempool.c b/mm/mempool.c
>> index 1c38e873e546..d7c55a98c2be 100644
>> --- a/mm/mempool.c
>> +++ b/mm/mempool.c
>> @@ -372,18 +372,15 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
>> EXPORT_SYMBOL(mempool_resize);
>>
>> /**
>> - * mempool_alloc - allocate an element from a specific memory pool
>> - * @pool: pointer to the memory pool which was allocated via
>> - * mempool_create().
>> - * @gfp_mask: the usual allocation bitmask.
>> + * mempool_alloc - allocate an element from a memory pool
>> + * @pool: pointer to the memory pool
>> + * @gfp_mask: GFP_* flags.
>> *
>> - * this function only sleeps if the alloc_fn() function sleeps or
>> - * returns NULL. Note that due to preallocation, this function
>> - * *never* fails when called from process contexts. (it might
>> - * fail if called from an IRQ context.)
>
> Why remove this part? Isn't it the most important behavior of mempools?
>
>> - * Note: using __GFP_ZERO is not supported.
>> + * Note: This function only sleeps if the alloc_fn callback sleeps or returns
>> + * %NULL. Using __GFP_ZERO is not supported.
>>
>> - * Return: pointer to the allocated element or %NULL on error.
>> + * Return: pointer to the allocated element or %NULL on error. This function
>> + * never returns %NULL when @gfp_mask allows sleeping.
Oh I see, it's here.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-10-31 9:34 ` [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
@ 2025-11-05 15:04 ` Vlastimil Babka
2025-11-06 14:13 ` Christoph Hellwig
2025-11-07 3:52 ` Eric Biggers
1 sibling, 1 reply; 41+ messages in thread
From: Vlastimil Babka @ 2025-11-05 15:04 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Eric Biggers, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On 10/31/25 10:34, Christoph Hellwig wrote:
> Add a version of the mempool allocator that works for batch allocations
> of multiple objects. Calling mempool_alloc in a loop is not safe because
> it could deadlock if multiple threads are performing such an allocation
> at the same time.
>
> As an extra benefit the interface is build so that the same array can be
> used for alloc_pages_bulk / release_pages so that at least for page
> backed mempools the fast path can use a nice batch optimization.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> include/linux/mempool.h | 7 ++
> mm/mempool.c | 145 ++++++++++++++++++++++++++++------------
> 2 files changed, 111 insertions(+), 41 deletions(-)
>
> diff --git a/include/linux/mempool.h b/include/linux/mempool.h
> index 34941a4b9026..486ed50776db 100644
> --- a/include/linux/mempool.h
> +++ b/include/linux/mempool.h
> @@ -66,9 +66,16 @@ extern void mempool_destroy(mempool_t *pool);
> extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
> #define mempool_alloc(...) \
> alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
> +int mempool_alloc_bulk_noprof(mempool_t *pool, void **elem,
> + unsigned int count, gfp_t gfp_mask, unsigned long caller_ip);
> +#define mempool_alloc_bulk(pool, elem, count, gfp_mask) \
> + alloc_hooks(mempool_alloc_bulk_noprof(pool, elem, count, gfp_mask, \
> + _RET_IP_))
>
> extern void *mempool_alloc_preallocated(mempool_t *pool) __malloc;
> extern void mempool_free(void *element, mempool_t *pool);
> +unsigned int mempool_free_bulk(mempool_t *pool, void **elem,
> + unsigned int count);
>
> /*
> * A mempool_alloc_t and mempool_free_t that get the memory from
> diff --git a/mm/mempool.c b/mm/mempool.c
> index 15581179c8b9..c980a0396986 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -381,23 +381,29 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
> EXPORT_SYMBOL(mempool_resize);
>
> /**
> - * mempool_alloc - allocate an element from a memory pool
> + * mempool_alloc_bulk - allocate multiple elements from a memory pool
> * @pool: pointer to the memory pool
> + * @elem: partially or fully populated elements array
> + * @count: size (in entries) of @elem
> * @gfp_mask: GFP_* flags.
> *
> + * Allocate elements for each slot in @elem that is non-%NULL.
> + *
> * Note: This function only sleeps if the alloc_fn callback sleeps or returns
> * %NULL. Using __GFP_ZERO is not supported.
> *
> - * Return: pointer to the allocated element or %NULL on error. This function
> - * never returns %NULL when @gfp_mask allows sleeping.
> + * Return: 0 if successful, else -ENOMEM. This function never returns -ENOMEM
> + * when @gfp_mask allows sleeping.
> */
> -void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> +int mempool_alloc_bulk_noprof(struct mempool *pool, void **elem,
> + unsigned int count, gfp_t gfp_mask, unsigned long caller_ip)
> {
> - void *element;
> unsigned long flags;
> wait_queue_entry_t wait;
> gfp_t gfp_temp;
> + unsigned int i;
>
> + VM_WARN_ON_ONCE(count > pool->min_nr);
> VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
> might_alloc(gfp_mask);
>
> @@ -407,20 +413,31 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
>
> gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
>
> + i = 0;
> repeat_alloc:
> - if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
> - pr_info("forcing mempool usage for pool %pS\n",
> - (void *)_RET_IP_);
> - element = NULL;
> - } else {
> - element = pool->alloc(gfp_temp, pool->pool_data);
> - if (likely(element != NULL))
> - return element;
> + for (; i < count; i++) {
> + if (!elem[i]) {
> + if (should_fail_ex(&fail_mempool_alloc, 1,
> + FAULT_NOWARN)) {
> + pr_info("forcing pool usage for pool %pS\n",
> + (void *)caller_ip);
> + goto use_pool;
> + }
Would it be enough to do this failure injection attempt once and not in
every iteration?
> + elem[i] = pool->alloc(gfp_temp, pool->pool_data);
> + if (unlikely(!elem[i]))
> + goto use_pool;
> + }
> }
>
> + return 0;
> +
> +use_pool:
> spin_lock_irqsave(&pool->lock, flags);
> - if (likely(pool->curr_nr)) {
> - element = remove_element(pool);
> + if (likely(pool->curr_nr >= count - i)) {
> + for (; i < count; i++) {
> + if (!elem[i])
> + elem[i] = remove_element(pool);
> + }
> spin_unlock_irqrestore(&pool->lock, flags);
> /* paired with rmb in mempool_free(), read comment there */
> smp_wmb();
> @@ -428,8 +445,9 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> * Update the allocation stack trace as this is more useful
> * for debugging.
> */
> - kmemleak_update_trace(element);
> - return element;
> + for (i = 0; i < count; i++)
> + kmemleak_update_trace(elem[i]);
> + return 0;
> }
>
> /*
> @@ -445,10 +463,12 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> /* We must not sleep if !__GFP_DIRECT_RECLAIM */
> if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
> spin_unlock_irqrestore(&pool->lock, flags);
> - return NULL;
> + if (i > 0)
> + mempool_free_bulk(pool, elem + i, count - i);
I don't understand why we are trying to free from i to count and not from 0
to i? Seems buggy, there will likely be NULLs which might go through
add_element() which assumes they are not NULL.
Assuming this is fixed we might still have a confusing API. We might be
freeing away elements that were already in the array when
mempool_alloc_bulk() was called. OTOH the pool might be missing less than i
elements and mempool_free_bulk() will not do anything with the rest.
Anything beyond i is untouched. The caller has no idea what's in the array
after getting this -ENOMEM. (alloc_pages_bulk() returns the number of pages
there).
Maybe it's acceptable (your use case I think doesn't even add a caller that
can't block), but needs documenting clearly.
> + return -ENOMEM;
> }
>
> - /* Let's wait for someone else to return an element to @pool */
> + /* Let's wait for someone else to return elements to @pool */
> init_wait(&wait);
> prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
So in theory callers waiting for many objects might wait indefinitely to
find enough objects in the pool, while smaller callers succeed their
allocations and deplete the pool. Mempools never provided any fair ordering
of waiters, but this might make it worse deterministically instead of
randomly. Guess it's not such a problem if all callers are comparable in
number of objects.
> @@ -463,6 +483,27 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> finish_wait(&pool->wait, &wait);
> goto repeat_alloc;
> }
> +EXPORT_SYMBOL_GPL(mempool_alloc_bulk_noprof);
> +
> +/**
> + * mempool_alloc - allocate an element from a memory pool
> + * @pool: pointer to the memory pool
> + * @gfp_mask: GFP_* flags.
> + *
> + * Note: This function only sleeps if the alloc_fn callback sleeps or returns
> + * %NULL. Using __GFP_ZERO is not supported.
> + *
> + * Return: pointer to the allocated element or %NULL on error. This function
> + * never returns %NULL when @gfp_mask allows sleeping.
> + */
> +void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
> +{
> + void *elem[1] = { };
> +
> + if (mempool_alloc_bulk_noprof(pool, elem, 1, gfp_mask, _RET_IP_) < 0)
> + return NULL;
> + return elem[0];
> +}
> EXPORT_SYMBOL(mempool_alloc_noprof);
>
> /**
> @@ -502,21 +543,26 @@ void *mempool_alloc_preallocated(mempool_t *pool)
> EXPORT_SYMBOL(mempool_alloc_preallocated);
>
> /**
> - * mempool_free - return an element to a mempool
> - * @element: pointer to element
> + * mempool_free_bulk - return elements to a mempool
> * @pool: pointer to the memory pool
> + * @elem: elements to return
> + * @count: number of elements to return
> *
> - * Returns @elem to @pool if its needs replenishing, else free it using
> - * the free_fn callback in @pool.
> + * Returns elements from @elem to @pool if its needs replenishing and sets
> + * their slot in @elem to NULL. Other elements are left in @elem.
> + *
> + * Return: number of elements transferred to @pool. Elements are always
> + * transferred from the beginning of @elem, so the return value can be used as
> + * an offset into @elem for the freeing the remaining elements in the caller.
> *
> * This function only sleeps if the free_fn callback sleeps.
This part now only applies to mempool_free() ?
> */
> -void mempool_free(void *element, mempool_t *pool)
> +unsigned int mempool_free_bulk(struct mempool *pool, void **elem,
> + unsigned int count)
> {
> unsigned long flags;
> -
> - if (unlikely(element == NULL))
> - return;
> + bool added = false;
> + unsigned int freed = 0;
>
> /*
> * Paired with the wmb in mempool_alloc(). The preceding read is
> @@ -553,15 +599,11 @@ void mempool_free(void *element, mempool_t *pool)
> */
> if (unlikely(READ_ONCE(pool->curr_nr) < pool->min_nr)) {
> spin_lock_irqsave(&pool->lock, flags);
> - if (likely(pool->curr_nr < pool->min_nr)) {
> - add_element(pool, element);
> - spin_unlock_irqrestore(&pool->lock, flags);
> - if (wq_has_sleeper(&pool->wait))
> - wake_up(&pool->wait);
> - return;
> + while (pool->curr_nr < pool->min_nr && freed < count) {
> + add_element(pool, elem[freed++]);
> + added = true;
> }
> spin_unlock_irqrestore(&pool->lock, flags);
> - }
>
> /*
> * Handle the min_nr = 0 edge case:
> @@ -572,20 +614,41 @@ void mempool_free(void *element, mempool_t *pool)
> * allocation of element when both min_nr and curr_nr are 0, and
> * any active waiters are properly awakened.
> */
> - if (unlikely(pool->min_nr == 0 &&
> + } else if (unlikely(pool->min_nr == 0 &&
> READ_ONCE(pool->curr_nr) == 0)) {
> spin_lock_irqsave(&pool->lock, flags);
> if (likely(pool->curr_nr == 0)) {
> - add_element(pool, element);
> - spin_unlock_irqrestore(&pool->lock, flags);
> - if (wq_has_sleeper(&pool->wait))
> - wake_up(&pool->wait);
> - return;
> + add_element(pool, elem[freed++]);
> + added = true;
> }
> spin_unlock_irqrestore(&pool->lock, flags);
> }
>
> - pool->free(element, pool->pool_data);
> + if (unlikely(added) && wq_has_sleeper(&pool->wait))
> + wake_up(&pool->wait);
> +
> + return freed;
> +}
> +EXPORT_SYMBOL_GPL(mempool_free_bulk);
> +
> +/**
> + * mempool_free - return an element to the pool.
> + * @element: element to return
> + * @pool: pointer to the memory pool
> + *
> + * Returns @elem to @pool if its needs replenishing, else free it using
> + * the free_fn callback in @pool.
> + *
> + * This function only sleeps if the free_fn callback sleeps.
> + */
> +void mempool_free(void *element, struct mempool *pool)
> +{
> + if (likely(element)) {
> + void *elem[1] = { element };
> +
> + if (!mempool_free_bulk(pool, elem, 1))
> + pool->free(element, pool->pool_data);
> + }
> }
> EXPORT_SYMBOL(mempool_free);
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 9/9] blk-crypto: use mempool_alloc_bulk for encrypted bio page allocation
2025-10-31 9:34 ` [PATCH 9/9] blk-crypto: use mempool_alloc_bulk for encrypted bio page allocation Christoph Hellwig
@ 2025-11-05 15:12 ` Vlastimil Babka
2025-11-06 14:01 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Vlastimil Babka @ 2025-11-05 15:12 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Eric Biggers, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On 10/31/25 10:34, Christoph Hellwig wrote:
> @@ -192,6 +205,29 @@ static struct bio *blk_crypto_alloc_enc_bio(struct bio *bio_src,
> bio->bi_write_stream = bio_src->bi_write_stream;
> bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
> bio_clone_blkg_association(bio, bio_src);
> +
> + /*
> + * Move page array up in the allocated memory for the bio vecs as far as
> + * possible so that we can start filling biovecs from the beginning
> + * without overwriting the temporary page array.
> + */
> + static_assert(PAGE_PTRS_PER_BVEC > 1);
> + pages = (struct page **)bio->bi_io_vec;
> + pages += nr_segs * (PAGE_PTRS_PER_BVEC - 1);
> +
> + /*
> + * Try a bulk allocation first. This could leave random pages in the
> + * array unallocated, but we'll fix that up later in mempool_alloc_bulk.
> + *
> + * Note: alloc_pages_bulk needs the array to be zeroed, as it assumes
> + * any non-zero slot already contains a valid allocation.
> + */
> + memset(pages, 0, sizeof(struct page *) * nr_segs);
> + if (alloc_pages_bulk(GFP_NOFS, nr_segs, pages) < nr_segs) {
> + mempool_alloc_bulk(blk_crypto_bounce_page_pool, (void **)pages,
> + nr_segs, GFP_NOIO);
Why do the GFP flags differ?
> + }
> + *pages_ret = pages;
> return bio;
> }
>
> @@ -234,6 +270,7 @@ static blk_status_t __blk_crypto_fallback_encrypt_bio(struct bio *src_bio,
> struct scatterlist src, dst;
> union blk_crypto_iv iv;
> struct bio *enc_bio = NULL;
> + struct page **enc_pages;
> unsigned int nr_segs;
> unsigned int enc_idx = 0;
> unsigned int j;
> @@ -259,11 +296,10 @@ static blk_status_t __blk_crypto_fallback_encrypt_bio(struct bio *src_bio,
>
> if (!enc_bio) {
> enc_bio = blk_crypto_alloc_enc_bio(src_bio,
> - min(nr_segs, BIO_MAX_VECS));
> + min(nr_segs, BIO_MAX_VECS), &enc_pages);
> }
>
> - enc_page = mempool_alloc(blk_crypto_bounce_page_pool,
> - GFP_NOIO);
> + enc_page = enc_pages[enc_idx];
> __bio_add_page(enc_bio, enc_page, src_bv.bv_len,
> src_bv.bv_offset);
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 9/9] blk-crypto: use mempool_alloc_bulk for encrypted bio page allocation
2025-11-05 15:12 ` Vlastimil Babka
@ 2025-11-06 14:01 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-06 14:01 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Hellwig, Jens Axboe, Eric Biggers, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Wed, Nov 05, 2025 at 04:12:29PM +0100, Vlastimil Babka wrote:
> > + memset(pages, 0, sizeof(struct page *) * nr_segs);
> > + if (alloc_pages_bulk(GFP_NOFS, nr_segs, pages) < nr_segs) {
> > + mempool_alloc_bulk(blk_crypto_bounce_page_pool, (void **)pages,
> > + nr_segs, GFP_NOIO);
>
> Why do the GFP flags differ?
Me messing up. So far all blk-crypto stuff was GFP_NOIO. I think it
generally should be GFP_NOFS now, but f2fs is doing some weird things
with GFP_NOIO. (Could everyone please document their restrictions,
preferably using the task flags, thanks..).
I'll restore GFP_NOIO for the next version.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-11-05 15:04 ` Vlastimil Babka
@ 2025-11-06 14:13 ` Christoph Hellwig
2025-11-06 14:27 ` Vlastimil Babka
0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-06 14:13 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Hellwig, Jens Axboe, Eric Biggers, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Wed, Nov 05, 2025 at 04:04:53PM +0100, Vlastimil Babka wrote:
> > + for (; i < count; i++) {
> > + if (!elem[i]) {
> > + if (should_fail_ex(&fail_mempool_alloc, 1,
> > + FAULT_NOWARN)) {
> > + pr_info("forcing pool usage for pool %pS\n",
> > + (void *)caller_ip);
> > + goto use_pool;
> > + }
>
> Would it be enough to do this failure injection attempt once and not in
> every iteration?
Well, that would only test failure handling for the first element. Or
you mean don't call it again if called once?
> > /*
> > @@ -445,10 +463,12 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> > /* We must not sleep if !__GFP_DIRECT_RECLAIM */
> > if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
> > spin_unlock_irqrestore(&pool->lock, flags);
> > - return NULL;
> > + if (i > 0)
> > + mempool_free_bulk(pool, elem + i, count - i);
>
> I don't understand why we are trying to free from i to count and not from 0
> to i? Seems buggy, there will likely be NULLs which might go through
> add_element() which assumes they are not NULL.
Yes, this looks like broken copy and paste. Then again I'm not even
sure who calls into mempool without __GFP_DIRECT_RECLAIM set, as
that's kinda pointless.
> Assuming this is fixed we might still have a confusing API. We might be
> freeing away elements that were already in the array when
> mempool_alloc_bulk() was called. OTOH the pool might be missing less than i
> elements and mempool_free_bulk() will not do anything with the rest.
> Anything beyond i is untouched. The caller has no idea what's in the array
> after getting this -ENOMEM. (alloc_pages_bulk() returns the number of pages
> there).
> Maybe it's acceptable (your use case I think doesn't even add a caller that
> can't block), but needs documenting clearly.
I'm tempted to just disallow !__GFP_DIRECT_RECLAIM bulk allocations.
That feature seems to be a lot of trouble for no real gain, as
we can't use mempool as a guaranteed allocator there, so it's kinda
pointless.
> So in theory callers waiting for many objects might wait indefinitely to
> find enough objects in the pool, while smaller callers succeed their
> allocations and deplete the pool. Mempools never provided any fair ordering
> of waiters, but this might make it worse deterministically instead of
> randomly. Guess it's not such a problem if all callers are comparable in
> number of objects.
Yeah, which is the use case.
> > * This function only sleeps if the free_fn callback sleeps.
>
> This part now only applies to mempool_free() ?
Both mempool_free and mempool_free_bulk.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-11-06 14:13 ` Christoph Hellwig
@ 2025-11-06 14:27 ` Vlastimil Babka
2025-11-06 14:48 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Vlastimil Babka @ 2025-11-06 14:27 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Eric Biggers, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On 11/6/25 15:13, Christoph Hellwig wrote:
> On Wed, Nov 05, 2025 at 04:04:53PM +0100, Vlastimil Babka wrote:
>> > + for (; i < count; i++) {
>> > + if (!elem[i]) {
>> > + if (should_fail_ex(&fail_mempool_alloc, 1,
>> > + FAULT_NOWARN)) {
>> > + pr_info("forcing pool usage for pool %pS\n",
>> > + (void *)caller_ip);
>> > + goto use_pool;
>> > + }
>>
>> Would it be enough to do this failure injection attempt once and not in
>> every iteration?
>
> Well, that would only test failure handling for the first element. Or
> you mean don't call it again if called once?
I mean since this is (due to the semantics of mempools) not really causing a
failure to the caller (unlike the typical failure injection usage), but
forcing preallocated objects use, I'm not sure we get much benefit (in terms
of testing caller's error paths) from the fine-grained selection of the
first element where we inject the failure, and failing immediately or never should
be sufficient.
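Roughly something like this (untested sketch), i.e. roll the dice once per
call instead of once per element:

	bool force_pool = should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN);

	for (; i < count; i++) {
		if (!elem[i]) {
			if (force_pool) {
				pr_info("forcing pool usage for pool %pS\n",
					(void *)caller_ip);
				goto use_pool;
			}
			elem[i] = pool->alloc(gfp_temp, pool->pool_data);
			if (unlikely(!elem[i]))
				goto use_pool;
		}
	}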
>> > /*
>> > @@ -445,10 +463,12 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
>> > /* We must not sleep if !__GFP_DIRECT_RECLAIM */
>> > if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
>> > spin_unlock_irqrestore(&pool->lock, flags);
>> > - return NULL;
>> > + if (i > 0)
>> > + mempool_free_bulk(pool, elem + i, count - i);
>>
>> I don't understand why we are trying to free from i to count and not from 0
>> to i? Seems buggy, there will likely be NULLs which might go through
>> add_element() which assumes they are not NULL.
>
> Yes, this looks like broken copy and paste. Then again I'm not even
> sure who calls into mempool without __GFP_DIRECT_RECLAIM set, as
> that's kinda pointless.
Hm yeah would have to be some special case where something limits how many
such outstanding allocations there can be, otherwise it's just a cache to
make success more likely but not guaranteed.
>> Assuming this is fixed we might still have a confusing API. We might be
>> freeing away elements that were already in the array when
>> mempool_alloc_bulk() was called. OTOH the pool might be missing less than i
>> elements and mempool_free_bulk() will not do anything with the rest.
>> Anything beyond i is untouched. The caller has no idea what's in the array
>> after getting this -ENOMEM. (alloc_pages_bulk() returns the number of pages
>> there).
>> Maybe it's acceptable (your use case I think doesn't even add a caller that
>> can't block), but needs documenting clearly.
>
> I'm tempted to just disallow !__GFP_DIRECT_RECLAIM bulk allocations.
> That feature seems to be a lot of trouble for no real gain, as
> we can't use mempool as a guaranteed allocator there, so it's kinda
> pointless.
Agree. If anyone comes up with a use case, they can extend and actually test
these rollback paths.
>> So in theory callers waiting for many objects might wait indefinitely to
>> find enough objects in the pool, while smaller callers succeed their
>> allocations and deplete the pool. Mempools never provided any fair ordering
>> of waiters, but this might make it worse deterministically instead of
>> randomly. Guess it's not such a problem if all callers are comparable in
>> number of objects.
>
> Yeah, which is the use case.
Good.
>> > * This function only sleeps if the free_fn callback sleeps.
>>
>> This part now only applies to mempool_free() ?
>
> Both mempool_free and mempool_free_bulk.
But mempool_free_bulk() doesn't use the callback, it's up to the caller to
free anything the mempool didn't use for its refill.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-11-06 14:27 ` Vlastimil Babka
@ 2025-11-06 14:48 ` Christoph Hellwig
2025-11-06 14:57 ` Vlastimil Babka
0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-06 14:48 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Hellwig, Jens Axboe, Eric Biggers, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 06, 2025 at 03:27:35PM +0100, Vlastimil Babka wrote:
> >> Would it be enough to do this failure injection attempt once and not in
> >> every iteration?
> >
> > Well, that would only test failure handling for the first element. Or
> > you mean don't call it again if called once?
>
> I mean since this is (due to the semantics of mempools) not really causing a
> failure to the caller (unlike the typical failure injection usage), but
> forcing preallocated objects use, I'm not sure we get much benefit (in terms
> of testing caller's error paths) from the fine-grained selection of the
> first element where we inject the failure, and failing immediately or never should
> be sufficient.
I guess. OTOH testing multiple failures could be useful?
> > Yes, this looks like broken copy and paste. Then again I'm not even
> > sure who calls into mempool without __GFP_DIRECT_RECLAIM set, as
> > that's kinda pointless.
>
> Hm yeah would have to be some special case where something limits how many
> such outstanding allocations there can be, otherwise it's just a cache to
> make success more likely but not guaranteed.
I think the only reason mempool_alloc even allows !__GFP_DIRECT_RECLAIM
is to avoid special casing that in callers that have a non-constant
gfp mask. So maybe the best thing would be to never actually go to
the pool for them and just give up if alloc_fn fails?
> >> > * This function only sleeps if the free_fn callback sleeps.
> >>
> >> This part now only applies to mempool_free() ?
> >
> > Both mempool_free and mempool_free_bulk.
>
> But mempool_free_bulk() doesn't use the callback, it's up to the caller to
> free anything the mempool didn't use for its refill.
You're right. So mempool_free_bulk itself will indeed never sleep and
I'll fix that up.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-11-06 14:48 ` Christoph Hellwig
@ 2025-11-06 14:57 ` Vlastimil Babka
2025-11-06 15:00 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Vlastimil Babka @ 2025-11-06 14:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Eric Biggers, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On 11/6/25 15:48, Christoph Hellwig wrote:
> On Thu, Nov 06, 2025 at 03:27:35PM +0100, Vlastimil Babka wrote:
>> >> Would it be enough to do this failure injection attempt once and not in
>> >> every iteration?
>> >
>> > Well, that would only test failure handling for the first element. Or
>> > you mean don't call it again if called once?
>>
>> I mean since this is (due to the semantics of mempools) not really causing a
>> failure to the caller (unlike the typical failure injection usage), but
>> forcing preallocated objects use, I'm not sure we get much benefit (in terms
>> of testing caller's error paths) from the fine-grained selection of the
>> first element where we inject the failure, and failing immediately or never should
>> be sufficient.
>
> I guess. OTOH testing multiple failures could be useful?
Maybe, guess I don't care that much, as long as it's not causing overhead on
every iteration when disabled, which it shouldn't.
>> > Yes, this looks like broken copy and paste. Then again I'm not even
>> > sure who calls into mempool without __GFP_DIRECT_RECLAIM set, as
>> > that's kinda pointless.
>>
>> Hm yeah would have to be some special case where something limits how many
>> such outstanding allocations there can be, otherwise it's just a cache to
>> make success more likely but not guaranteed.
>
> I think the only reason mempool_alloc even allows !__GFP_DIRECT_RECLAIM
> is to avoid special casing that in callers that have a non-constant
> gfp mask. So maybe the best thing would be to never actually go to
> the pool for them and just give up if alloc_fn fails?
Yeah, but I guess we could keep trying the pool for the single allocation
case as that's simple enough, just not for the bulk.
>> >> > * This function only sleeps if the free_fn callback sleeps.
>> >>
>> >> This part now only applies to mempool_free() ?
>> >
>> > Both mempool_free and mempool_free_bulk.
>>
>> But mempool_free_bulk() doesn't use the callback, it's up to the caller to
>> free anything the mempool didn't use for its refill.
>
> You're right. So mempool_free_bulk itself will indeed never sleep and
> I'll fix that up.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-11-06 14:57 ` Vlastimil Babka
@ 2025-11-06 15:00 ` Christoph Hellwig
2025-11-06 15:09 ` Vlastimil Babka
0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-06 15:00 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Hellwig, Jens Axboe, Eric Biggers, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 06, 2025 at 03:57:13PM +0100, Vlastimil Babka wrote:
> > I think the only reason mempool_alloc even allows !__GFP_DIRECT_RECLAIM
> > is to avoid special casing that in callers that have a non-constant
> > gfp mask. So maybe the best thing would be to never actually go to
> > the pool for them and just give up if alloc_fn fails?
>
> Yeah, but I guess we could keep trying the pool for the single allocation
> case as that's simple enough, just not for the bulk.
Doing that will be quite a bit more complicated I think. And given
that the !__GFP_DIRECT_RECLAIM handlers must be able to handle failure
I'm also not sure that using the pool is all that useful.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-11-06 15:00 ` Christoph Hellwig
@ 2025-11-06 15:09 ` Vlastimil Babka
0 siblings, 0 replies; 41+ messages in thread
From: Vlastimil Babka @ 2025-11-06 15:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Eric Biggers, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On 11/6/25 16:00, Christoph Hellwig wrote:
> On Thu, Nov 06, 2025 at 03:57:13PM +0100, Vlastimil Babka wrote:
>> > I think the only reason mempool_alloc even allows !__GFP_DIRECT_RECLAIM
>> > is to avoid special casing that in callers that have a non-constant
>> > gfp mask. So maybe the best thing would be to never actually go to
>> > the pool for them and just give up if alloc_fn fails?
>>
>> Yeah, but I guess we could keep trying the pool for the single allocation
>> case as that's simple enough, just not for the bulk.
>
> Doing that will be quite a bit more complicated I think. And given
> that the !__GFP_DIRECT_RECLAIM handlers must be able to handle failure
> I'm also not sure that using the pool is all that useful.
Fine, as long as we don't break some existing users that became dependent on
it in some way.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/9] mempool: update kerneldoc comments
2025-10-31 9:34 ` [PATCH 1/9] mempool: update kerneldoc comments Christoph Hellwig
2025-11-05 14:02 ` Vlastimil Babka
@ 2025-11-07 3:26 ` Eric Biggers
2025-11-07 12:02 ` Christoph Hellwig
1 sibling, 1 reply; 41+ messages in thread
From: Eric Biggers @ 2025-11-07 3:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:31AM +0100, Christoph Hellwig wrote:
> diff --git a/mm/mempool.c b/mm/mempool.c
> index 1c38e873e546..d7c55a98c2be 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -372,18 +372,15 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
> EXPORT_SYMBOL(mempool_resize);
>
> /**
> - * mempool_alloc - allocate an element from a specific memory pool
> - * @pool: pointer to the memory pool which was allocated via
> - * mempool_create().
> - * @gfp_mask: the usual allocation bitmask.
> + * mempool_alloc - allocate an element from a memory pool
> + * @pool: pointer to the memory pool
> + * @gfp_mask: GFP_* flags.
> *
> - * this function only sleeps if the alloc_fn() function sleeps or
> - * returns NULL. Note that due to preallocation, this function
> - * *never* fails when called from process contexts. (it might
> - * fail if called from an IRQ context.)
> - * Note: using __GFP_ZERO is not supported.
> + * Note: This function only sleeps if the alloc_fn callback sleeps or returns
> + * %NULL. Using __GFP_ZERO is not supported.
Maybe put the note about __GFP_ZERO being unsupported directly in the
description of @gfp_mask.
> *
> - * Return: pointer to the allocated element or %NULL on error.
> + * Return: pointer to the allocated element or %NULL on error. This function
> + * never returns %NULL when @gfp_mask allows sleeping.
Is "allows sleeping" exactly the same as "__GFP_DIRECT_RECLAIM is set"?
The latter is what the code actually checks for.
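Folding both points in, the kerneldoc could read e.g.:

/**
 * mempool_alloc - allocate an element from a memory pool
 * @pool: pointer to the memory pool
 * @gfp_mask: GFP_* flags. __GFP_ZERO is not supported.
 *
 * Note: This function only sleeps if the alloc_fn callback sleeps or
 * returns %NULL.
 *
 * Return: pointer to the allocated element or %NULL on error. This function
 * never returns %NULL when @gfp_mask contains __GFP_DIRECT_RECLAIM.
 */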
> /**
> - * mempool_free - return an element to the pool.
> - * @element: pool element pointer.
> - * @pool: pointer to the memory pool which was allocated via
> - * mempool_create().
> + * mempool_free - return an element to a mempool
> + * @element: pointer to element
> + * @pool: pointer to the memory pool
> + *
> + * Returns @elem to @pool if its needs replenishing, else free it using
> + * the free_fn callback in @pool.
> *
> - * this function only sleeps if the free_fn() function sleeps.
> + * This function only sleeps if the free_fn callback sleeps.
> */
> void mempool_free(void *element, mempool_t *pool)
"if its needs" => "if it needs" and "@elem" => "@element"
- Eric
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 2/9] mempool: add error injection support
2025-10-31 9:34 ` [PATCH 2/9] mempool: add error injection support Christoph Hellwig
2025-11-05 14:04 ` Vlastimil Babka
@ 2025-11-07 3:29 ` Eric Biggers
2025-11-07 12:04 ` Christoph Hellwig
1 sibling, 1 reply; 41+ messages in thread
From: Eric Biggers @ 2025-11-07 3:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:32AM +0100, Christoph Hellwig wrote:
> Add a call to should_fail_ex that forces mempool to actually allocate
> from the pool to stress the mempool implementation when enabled through
> debugfs. By default should_fail{,_ex} prints a very verbose stack trace
> that clutters the kernel log, slows down execution and triggers the
> kernel bug detection in xfstests. Pass FAULT_NOWARN and print a
> single-line message notating the caller instead so that full tests
> can be run with fault injection.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> mm/mempool.c | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mempool.c b/mm/mempool.c
> index d7c55a98c2be..15581179c8b9 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -9,7 +9,7 @@
> * started by Ingo Molnar, Copyright (C) 2001
> * debugging by David Rientjes, Copyright (C) 2015
> */
> -
> +#include <linux/fault-inject.h>
> #include <linux/mm.h>
> #include <linux/slab.h>
> #include <linux/highmem.h>
> @@ -20,6 +20,15 @@
> #include <linux/writeback.h>
> #include "slab.h"
>
> +static DECLARE_FAULT_ATTR(fail_mempool_alloc);
> +
> +static int __init mempool_faul_inject_init(void)
> +{
> + return PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
> + NULL, &fail_mempool_alloc));
> +}
> +late_initcall(mempool_faul_inject_init);
Initcalls usually go at the bottom of the file.
> + if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
This doesn't build when CONFIG_FAULT_INJECTION=n.
- Eric
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-10-31 9:34 ` [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
2025-11-05 15:04 ` Vlastimil Babka
@ 2025-11-07 3:52 ` Eric Biggers
2025-11-07 12:06 ` Christoph Hellwig
1 sibling, 1 reply; 41+ messages in thread
From: Eric Biggers @ 2025-11-07 3:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:33AM +0100, Christoph Hellwig wrote:
> /**
> - * mempool_alloc - allocate an element from a memory pool
> + * mempool_alloc_bulk - allocate multiple elements from a memory pool
> * @pool: pointer to the memory pool
> + * @elem: partially or fully populated elements array
> + * @count: size (in entries) of @elem
> * @gfp_mask: GFP_* flags.
> *
> + * Allocate elements for each slot in @elem that is non-%NULL.
elem => elems (and likewise for mempool_free_bulk())
> * Note: This function only sleeps if the alloc_fn callback sleeps or returns
> * %NULL. Using __GFP_ZERO is not supported.
> *
> - * Return: pointer to the allocated element or %NULL on error. This function
> - * never returns %NULL when @gfp_mask allows sleeping.
> + * Return: 0 if successful, else -ENOMEM. This function never returns -ENOMEM
> + * when @gfp_mask allows sleeping.
> */
> -void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> +int mempool_alloc_bulk_noprof(struct mempool *pool, void **elem,
> + unsigned int count, gfp_t gfp_mask, unsigned long caller_ip)
What exactly is the behavior on partial failures? Is the return value 0
or is it -ENOMEM, and is the array restored to its original state or
might some elements have been allocated?
> +/**
> + * mempool_alloc - allocate an element from a memory pool
> + * @pool: pointer to the memory pool
> + * @gfp_mask: GFP_* flags.
> + *
> + * Note: This function only sleeps if the alloc_fn callback sleeps or returns
> + * %NULL. Using __GFP_ZERO is not supported.
> + *
> + * Return: pointer to the allocated element or %NULL on error. This function
> + * never returns %NULL when @gfp_mask allows sleeping.
> + */
> +void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
> +{
> + void *elem[1] = { };
> +
> + if (mempool_alloc_bulk_noprof(pool, elem, 1, gfp_mask, _RET_IP_) < 0)
> + return NULL;
> + return elem[0];
> +}
> EXPORT_SYMBOL(mempool_alloc_noprof);
How much overhead does this add to mempool_alloc(), which will continue
to be the common case? I wonder if it would be worthwhile to
force-inline the bulk allocation function into it, so that it will
generate about the same code as before.
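Something like this (rough sketch), so that the count == 1 case can be
constant folded:

	static __always_inline int __mempool_alloc_bulk(struct mempool *pool,
			void **elem, unsigned int count, gfp_t gfp_mask,
			unsigned long caller_ip)
	{
		/* current mempool_alloc_bulk_noprof() body */
	}

	int mempool_alloc_bulk_noprof(struct mempool *pool, void **elem,
			unsigned int count, gfp_t gfp_mask, unsigned long caller_ip)
	{
		return __mempool_alloc_bulk(pool, elem, count, gfp_mask, caller_ip);
	}

	void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
	{
		void *elem[1] = { };

		if (__mempool_alloc_bulk(pool, elem, 1, gfp_mask, _RET_IP_) < 0)
			return NULL;
		return elem[0];
	}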
> if (unlikely(READ_ONCE(pool->curr_nr) < pool->min_nr)) {
> spin_lock_irqsave(&pool->lock, flags);
> - if (likely(pool->curr_nr < pool->min_nr)) {
> - add_element(pool, element);
> - spin_unlock_irqrestore(&pool->lock, flags);
> - if (wq_has_sleeper(&pool->wait))
> - wake_up(&pool->wait);
> - return;
> + while (pool->curr_nr < pool->min_nr && freed < count) {
> + add_element(pool, elem[freed++]);
> + added = true;
> }
> spin_unlock_irqrestore(&pool->lock, flags);
> - }
>
> /*
> * Handle the min_nr = 0 edge case:
> @@ -572,20 +614,41 @@ void mempool_free(void *element, mempool_t *pool)
> * allocation of element when both min_nr and curr_nr are 0, and
> * any active waiters are properly awakened.
> */
The above comment has a weird position now. Maybe move it into the
'else if' block below.
> - if (unlikely(pool->min_nr == 0 &&
> + } else if (unlikely(pool->min_nr == 0 &&
> READ_ONCE(pool->curr_nr) == 0)) {
> spin_lock_irqsave(&pool->lock, flags);
> if (likely(pool->curr_nr == 0)) {
> - add_element(pool, element);
> - spin_unlock_irqrestore(&pool->lock, flags);
> - if (wq_has_sleeper(&pool->wait))
> - wake_up(&pool->wait);
> - return;
> + add_element(pool, elem[freed++]);
> + added = true;
> }
> spin_unlock_irqrestore(&pool->lock, flags);
> }
>
> - pool->free(element, pool->pool_data);
> + if (unlikely(added) && wq_has_sleeper(&pool->wait))
> + wake_up(&pool->wait);
> +
> + return freed;
> +}
> +EXPORT_SYMBOL_GPL(mempool_free_bulk);
- Eric
* Re: [PATCH 4/9] fscrypt: pass a real sector_t to fscrypt_zeroout_range_inline_crypt
2025-10-31 9:34 ` [PATCH 4/9] fscrypt: pass a real sector_t to fscrypt_zeroout_range_inline_crypt Christoph Hellwig
@ 2025-11-07 3:55 ` Eric Biggers
2025-11-07 12:07 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Eric Biggers @ 2025-11-07 3:55 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:34AM +0100, Christoph Hellwig wrote:
> While the pblk argument to fscrypt_zeroout_range_inline_crypt is
> declared as a sector_t it actually is interpreted as a logical block
> size unit, which is highly unusual. Switch to passing the 512 byte
> units that sector_t is defined for.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/crypto/bio.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
Looks fine, though of course fscrypt_zeroout_range() itself still takes
a 'sector_t pblk' argument.
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
- Eric
* Re: [PATCH 5/9] fscrypt: keep multiple bios in flight in fscrypt_zeroout_range_inline_crypt
2025-10-31 9:34 ` [PATCH 5/9] fscrypt: keep multiple bios in flight in fscrypt_zeroout_range_inline_crypt Christoph Hellwig
@ 2025-11-07 4:06 ` Eric Biggers
0 siblings, 0 replies; 41+ messages in thread
From: Eric Biggers @ 2025-11-07 4:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:35AM +0100, Christoph Hellwig wrote:
> This should slightly improve performance for large zeroing operations,
> but more importantly prepares for blk-crypto refactoring that requires
> all fscrypt users to call submit_bio directly.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/crypto/bio.c | 84 +++++++++++++++++++++++++++++++------------------
> 1 file changed, 53 insertions(+), 31 deletions(-)
>
> diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
> index 68b0424d879a..e59d342b4240 100644
> --- a/fs/crypto/bio.c
> +++ b/fs/crypto/bio.c
> @@ -47,49 +47,71 @@ bool fscrypt_decrypt_bio(struct bio *bio)
> }
> EXPORT_SYMBOL(fscrypt_decrypt_bio);
>
> +struct fscrypt_zero_done {
> + atomic_t pending;
> + blk_status_t status;
> + struct completion done;
> +};
> +
> +static void fscrypt_zeroout_range_done(struct fscrypt_zero_done *done)
> +{
> + if (atomic_dec_and_test(&done->pending))
> + complete(&done->done);
> +}
> +
> +static void fscrypt_zeroout_range_end_io(struct bio *bio)
> +{
> + struct fscrypt_zero_done *done = bio->bi_private;
> +
> + if (bio->bi_status)
> + cmpxchg(&done->status, 0, bio->bi_status);
> + fscrypt_zeroout_range_done(done);
> + bio_put(bio);
> +}
> +
> static int fscrypt_zeroout_range_inline_crypt(const struct inode *inode,
> pgoff_t lblk, sector_t sector,
> unsigned int len)
> {
> const unsigned int blockbits = inode->i_blkbits;
> const unsigned int blocks_per_page = 1 << (PAGE_SHIFT - blockbits);
> - struct bio *bio;
> - int ret, err = 0;
> - int num_pages = 0;
> + struct fscrypt_zero_done done = {};
>
> - /* This always succeeds since __GFP_DIRECT_RECLAIM is set. */
> - bio = bio_alloc(inode->i_sb->s_bdev, BIO_MAX_VECS, REQ_OP_WRITE,
> - GFP_NOFS);
> + atomic_set(&done.pending, 1);
> + init_completion(&done.done);
This could use:
struct fscrypt_zero_done done = {
.pending = ATOMIC_INIT(1),
.done = COMPLETION_INITIALIZER_ONSTACK(done.done),
};
Either way:
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
- Eric
* Re: [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption
2025-10-31 9:34 ` [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption Christoph Hellwig
@ 2025-11-07 4:18 ` Eric Biggers
2025-11-07 12:10 ` Christoph Hellwig
2025-11-14 0:32 ` Eric Biggers
1 sibling, 1 reply; 41+ messages in thread
From: Eric Biggers @ 2025-11-07 4:18 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:38AM +0100, Christoph Hellwig wrote:
> Allocating a skcipher dynamically can deadlock or cause unexpected
> I/O failures when called from writeback context. Sidestep the
> allocation by using on-stack skciphers, similar to what the non
> blk-crypto fscrypt already does.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
It might be worth leaving a note in the commit message that this also
drops the incomplete support for asynchronous algorithms. ("Incomplete"
in the sense that they could be used, but only synchronously.)
Also note that with asynchronous algorithms no longer being supported,
the code can actually be simplified further because the
DECLARE_CRYPTO_WAIT(wait) objects are no longer necessary. The sequence
of crypto_sync_skcipher calls that is used should be similar to what
fscrypt_crypt_data_unit() does.
That could be done in a separate patch if you want, though.
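For reference, the sequence with a sync tfm would look roughly like the
following (just a sketch -- tfm, src_sg, dst_sg, len and iv stand in for
whatever the function actually has at that point):

	SYNC_SKCIPHER_REQUEST_ON_STACK(req, tfm);
	int err;

	skcipher_request_set_sync_tfm(req, tfm);
	/* No completion callback needed: a sync tfm never returns -EINPROGRESS. */
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
				      NULL, NULL);
	skcipher_request_set_crypt(req, src_sg, dst_sg, len, iv);
	err = crypto_skcipher_encrypt(req);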
> +void blk_crypto_fallback_encrypt_bio(struct bio *src_bio)
> +{
> + struct bio_crypt_ctx *bc = src_bio->bi_crypt_context;
> + struct blk_crypto_keyslot *slot;
> + blk_status_t status;
> +
> + if (!blk_crypto_fallback_bio_valid(src_bio))
> + return;
> +
> + status = blk_crypto_get_keyslot(blk_crypto_fallback_profile,
> + bc->bc_key, &slot);
> + if (status == BLK_STS_OK) {
> + status = __blk_crypto_fallback_encrypt_bio(src_bio,
> + blk_crypto_fallback_tfm(slot));
> + blk_crypto_put_keyslot(slot);
> + }
> + if (status != BLK_STS_OK) {
> + src_bio->bi_status = status;
> + bio_endio(src_bio);
> + return;
> + }
Unnecessary return statement above.
> /*
> * The crypto API fallback's main decryption routine.
> * Decrypts input bio in place, and calls bio_endio on the bio.
> */
> -static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
> +static blk_status_t __blk_crypto_fallback_decrypt_bio(struct bio *bio,
> + struct bio_crypt_ctx *bc, struct bvec_iter iter,
> + struct crypto_sync_skcipher *tfm)
Comment above needs to be updated, since this function no longer calls
bio_endio().
- Eric
* Re: [PATCH 7/9] blk-crypto: handle the fallback above the block layer
2025-10-31 9:34 ` [PATCH 7/9] blk-crypto: handle the fallback above the block layer Christoph Hellwig
@ 2025-11-07 4:42 ` Eric Biggers
2025-11-07 12:10 ` Christoph Hellwig
2025-11-14 0:37 ` Eric Biggers
1 sibling, 1 reply; 41+ messages in thread
From: Eric Biggers @ 2025-11-07 4:42 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:37AM +0100, Christoph Hellwig wrote:
> diff --git a/include/linux/blk-crypto.h b/include/linux/blk-crypto.h
> index 58b0c5254a67..ffe815c09696 100644
> --- a/include/linux/blk-crypto.h
> +++ b/include/linux/blk-crypto.h
> @@ -171,6 +171,22 @@ static inline bool bio_has_crypt_ctx(struct bio *bio)
>
> #endif /* CONFIG_BLK_INLINE_ENCRYPTION */
>
> +bool __blk_crypto_submit_bio(struct bio *bio);
> +
> +/**
> + * blk_crypto_submit_bio - Submit a bio using inline encryption
> + * @bio: bio to submit
> + *
> + * If the bio crypt context attached to @bio is supported by the underlying
> + * device's inline encryption hardware, just submit @bio. Otherwise, try to
> + * perform en/decryption for this bio by falling back to the kernel crypto API.
> + */
> +static inline void blk_crypto_submit_bio(struct bio *bio)
> +{
> + if (!bio_has_crypt_ctx(bio) || __blk_crypto_submit_bio(bio))
> + submit_bio(bio);
> +}
So, the new model is that if you have a bio that might have a
bio_crypt_ctx, you always have to use blk_crypto_submit_bio() instead of
submit_bio()?
It looks like usually yes, but not always, because submit_bio() still
works with hardware inline encryption. However, it also skips the
bio_crypt_check_alignment() check that was done before; now it happens
only in __blk_crypto_submit_bio(). So that creates an ambiguity about
whether that usage is allowed (if, hypothetically, a caller doesn't need
blk-crypto-fallback support).
Maybe the alignment check should be done both in submit_bio_noacct()
after verifying blk_crypto_config_supported_natively(), and in
__blk_crypto_submit_bio() after deciding to use the fallback? Those
cases are exclusive, so the check would still happen just once per bio.
Either way, the kerneldoc needs to be improved to accurately document
what blk_crypto_submit_bio() does, when it should be called, and how it
differs from submit_bio(). This also deserves a mention in the "API
presented to users of the block layer" section of
Documentation/block/inline-encryption.rst.
(I'll take a closer look at this patch later. It will take a bit more
time to go through the blk-crypto-fallback implementation.)
- Eric
* Re: [PATCH 1/9] mempool: update kerneldoc comments
2025-11-07 3:26 ` Eric Biggers
@ 2025-11-07 12:02 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-07 12:02 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Jens Axboe, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 06, 2025 at 07:26:48PM -0800, Eric Biggers wrote:
> > - * *never* fails when called from process contexts. (it might
> > - * fail if called from an IRQ context.)
> > - * Note: using __GFP_ZERO is not supported.
> > + * Note: This function only sleeps if the alloc_fn callback sleeps or returns
> > + * %NULL. Using __GFP_ZERO is not supported.
>
> Maybe put the note about __GFP_ZERO being unsupported directly in the
> description of @gfp_mask.
I'll give it a try.
> > *
> > - * Return: pointer to the allocated element or %NULL on error.
> > + * Return: pointer to the allocated element or %NULL on error. This function
> > + * never returns %NULL when @gfp_mask allows sleeping.
>
> Is "allows sleeping" exactly the same as "__GFP_DIRECT_RECLAIM is set"?
Yes.
> The latter is what the code actually checks for.
I'll see if I can make that more clear.
* Re: [PATCH 2/9] mempool: add error injection support
2025-11-07 3:29 ` Eric Biggers
@ 2025-11-07 12:04 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-07 12:04 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Jens Axboe, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 06, 2025 at 07:29:00PM -0800, Eric Biggers wrote:
> > +static DECLARE_FAULT_ATTR(fail_mempool_alloc);
> > +
> > +static int __init mempool_fault_inject_init(void)
> > +{
> > + return PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
> > + NULL, &fail_mempool_alloc));
> > +}
> > +late_initcall(mempool_fault_inject_init);
>
> Initcalls usually go at the bottom of the file.
For a generic init function yes. But given that this is only for error
injection, I'd rather keep the error injection handling in one spot.
>
> > + if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
>
> This doesn't build when CONFIG_FAULT_INJECTION=n.
Yeah, the buildbot already told me. For some reason there is a
!CONFIG_FAULT_INJECTION stub for should_fail_ex, but the flags are
not defined in that case. I'll add another patch to the series to
fix it.
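The follow-up will basically just make the flags visible unconditionally,
along these lines (untested sketch; the flag value and exact header layout
are illustrative, not the real ones):

/*
 * Sketch: move the should_fail() flags out of the CONFIG_FAULT_INJECTION
 * section of <linux/fault-inject.h> so that callers of the existing
 * should_fail_ex() stub keep compiling with CONFIG_FAULT_INJECTION=n.
 */
#define FAULT_NOWARN	(1 << 0)

#ifdef CONFIG_FAULT_INJECTION
bool should_fail_ex(struct fault_attr *attr, ssize_t size, int flags);
#else
static inline bool should_fail_ex(struct fault_attr *attr, ssize_t size,
				  int flags)
{
	return false;
}
#endif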
* Re: [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk
2025-11-07 3:52 ` Eric Biggers
@ 2025-11-07 12:06 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-07 12:06 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Jens Axboe, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 06, 2025 at 07:52:07PM -0800, Eric Biggers wrote:
> > +int mempool_alloc_bulk_noprof(struct mempool *pool, void **elem,
> > + unsigned int count, gfp_t gfp_mask, unsigned long caller_ip)
>
> What exactly is the behavior on partial failures? Is the return value 0
> or is it -ENOMEM, and is the array restored to its original state or
> might some elements have been allocated?
Right now it frees everything. But as per the discussion with Vlastimil
I'll move to not allowing non-blocking allocations for multiple elements,
at which point the failure case just can't happen and that is sorted out.
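I.e. something like this guard at the top of mempool_alloc_bulk_noprof()
(sketch only):

	/*
	 * Multi-element allocations must be able to block, so a partial
	 * failure can't happen and no unwind logic is needed.
	 */
	if (WARN_ON_ONCE(count > 1 && !(gfp_mask & __GFP_DIRECT_RECLAIM)))
		return -ENOMEM;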
> > +}
> > EXPORT_SYMBOL(mempool_alloc_noprof);
>
> How much overhead does this add to mempool_alloc(), which will continue
> to be the common case? I wonder if it would be worthwhile to
> force-inline the bulk allocation function into it, so that it will get
> generate about the same code as before.
It's about 10 extra instructions looking at my profiles. So I don't
think it matters, but if the maintainers prefer force inlining I
can do that.
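If so, the shape would be something like the following (sketch only;
__mempool_alloc_bulk is a made-up name for a shared helper holding the
current body of mempool_alloc_bulk_noprof):

static __always_inline int __mempool_alloc_bulk(struct mempool *pool,
		void **elem, unsigned int count, gfp_t gfp_mask,
		unsigned long caller_ip)
{
	/* ... the existing bulk allocation logic goes here ... */
	return 0;
}

int mempool_alloc_bulk_noprof(struct mempool *pool, void **elem,
		unsigned int count, gfp_t gfp_mask, unsigned long caller_ip)
{
	return __mempool_alloc_bulk(pool, elem, count, gfp_mask, caller_ip);
}

void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
{
	void *elem[1] = { };

	/* count is a compile-time constant 1 here, so the loop folds away */
	if (__mempool_alloc_bulk(pool, elem, 1, gfp_mask, _RET_IP_) < 0)
		return NULL;
	return elem[0];
}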
* Re: [PATCH 4/9] fscrypt: pass a real sector_t to fscrypt_zeroout_range_inline_crypt
2025-11-07 3:55 ` Eric Biggers
@ 2025-11-07 12:07 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-07 12:07 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Jens Axboe, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 06, 2025 at 07:55:33PM -0800, Eric Biggers wrote:
> On Fri, Oct 31, 2025 at 10:34:34AM +0100, Christoph Hellwig wrote:
> > While the pblk argument to fscrypt_zeroout_range_inline_crypt is
> > declared as a sector_t it actually is interpreted as a logical block
> > size unit, which is highly unusual. Switch to passing the 512 byte
> > units that sector_t is defined for.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> > fs/crypto/bio.c | 9 ++++-----
> > 1 file changed, 4 insertions(+), 5 deletions(-)
>
> Looks fine, though of course fscrypt_zeroout_range() itself still takes
> a 'sector_t pblk' argument.
Yes. The argument conventions for fscrypt are a bit odd in general. I
actually cleaned all this up for an earlier version, but decided to keep
it minimal for this version:
https://git.infradead.org/?p=users/hch/misc.git;a=shortlog;h=refs/heads/fscrypt-experiments
I plan to bring the ext4/fscrypt cleanups in that branch back later.
* Re: [PATCH 7/9] blk-crypto: handle the fallback above the block layer
2025-11-07 4:42 ` Eric Biggers
@ 2025-11-07 12:10 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-07 12:10 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Jens Axboe, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 06, 2025 at 08:42:13PM -0800, Eric Biggers wrote:
> So, the new model is that if you have a bio that might have a
> bio_crypt_ctx, you always have to use blk_crypto_submit_bio() instead of
> submit_bio()?
In general yes.
>
> It looks like usually yes, but not always, because submit_bio() still
> works with hardware inline encryption.
It has to, as that is the interface into the block layer that is used
by blk_crypto_submit_bio. But the intent is that all submissions go
through blk_crypto_submit_bio first.
> However, it also skips the
> bio_crypt_check_alignment() check that was done before; now it happens
> only in __blk_crypto_submit_bio(). So that creates an ambiguity about
> whether that usage is allowed (if, hypothetically, a caller doesn't need
> blk-crypto-fallback support).
>
> Maybe the alignment check should be done both in submit_bio_noacct()
> after verifying blk_crypto_config_supported_natively(), and in
> __blk_crypto_submit_bio() after deciding to use the fallback? Those
> cases are exclusive, so the check would still happen just once per bio.
We could do that. Or I lift the checking into the core bio splitting
code at least for the hardware case to reduce the overhead. I'll see
what works out better.
> Either way, the kerneldoc needs to be improved to accurately document
> what blk_crypto_submit_bio() does, when it should be called, and how it
> differs from submit_bio(). This also deserves a mention in the "API
> presented to users of the block layer" section of
> Documentation/block/inline-encryption.rst.
Ok.
* Re: [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption
2025-11-07 4:18 ` Eric Biggers
@ 2025-11-07 12:10 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-07 12:10 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Jens Axboe, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 06, 2025 at 08:18:59PM -0800, Eric Biggers wrote:
> On Fri, Oct 31, 2025 at 10:34:38AM +0100, Christoph Hellwig wrote:
> > Allocating a skcipher dynamically can deadlock or cause unexpected
> > I/O failures when called from writeback context. Sidestep the
> > allocation by using on-stack skciphers, similar to what the non
> > blk-crypto fscrypt already does.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> It might be worth leaving a note in the commit message that this also
> drops the incomplete support for asynchronous algorithms. ("Incomplete"
> in the sense that they could be used, but only synchronously.)
sure.
> Also note that with asynchronous algorithms no longer being supported,
> the code can actually be simplified further because the
> DECLARE_CRYPTO_WAIT(wait) objects are no longer necessary. The sequence
> of crypto_sync_skcipher calls that is used should be similar to what
> fscrypt_crypt_data_unit() does.
Ok.
* Re: [PATCH 6/9] blk-crypto: optimize bio splitting in blk_crypto_fallback_encrypt_bio
2025-10-31 9:34 ` [PATCH 6/9] blk-crypto: optimize bio splitting in blk_crypto_fallback_encrypt_bio Christoph Hellwig
@ 2025-11-14 0:22 ` Eric Biggers
2025-11-14 5:56 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Eric Biggers @ 2025-11-14 0:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:36AM +0100, Christoph Hellwig wrote:
> The current code in blk_crypto_fallback_encrypt_bio is inefficient and
> prone to deadlocks under memory pressure: It first walks the passed-in
> plaintext bio to see how much of it can fit into a single encrypted
> bio using up to BIO_MAX_VECS PAGE_SIZE segments, and then allocates a
> plaintext clone that fits the size, only to allocate another bio for
> the ciphertext later. While the plaintext clone uses a bioset to avoid
> deadlocks when allocations could fail, the ciphertext one uses bio_kmalloc
> which is a no-go in the file system I/O path.
>
> Switch blk_crypto_fallback_encrypt_bio to walk the source plaintext bio
> while consuming bi_iter without cloning it, and instead allocate a
> ciphertext bio at the beginning and whenever we fill up the previous
> one. The existing bio_set for the plaintext clones is reused for the
> ciphertext bios to remove the deadlock risk.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-crypto-fallback.c | 162 ++++++++++++++----------------------
> 1 file changed, 63 insertions(+), 99 deletions(-)
>
> diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
> index 86b27f96051a..1f58010fb437 100644
> --- a/block/blk-crypto-fallback.c
> +++ b/block/blk-crypto-fallback.c
> @@ -152,35 +152,26 @@ static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
>
> src_bio->bi_status = enc_bio->bi_status;
There can now be multiple enc_bios completing for the same src_bio, so
this needs something like:
if (enc_bio->bi_status)
cmpxchg(&src_bio->bi_status, 0, enc_bio->bi_status);
> -static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
> +static struct bio *blk_crypto_alloc_enc_bio(struct bio *bio_src,
> + unsigned int nr_segs)
> {
> - unsigned int nr_segs = bio_segments(bio_src);
> - struct bvec_iter iter;
> - struct bio_vec bv;
> struct bio *bio;
>
> - bio = bio_kmalloc(nr_segs, GFP_NOIO);
> - if (!bio)
> - return NULL;
> - bio_init_inline(bio, bio_src->bi_bdev, nr_segs, bio_src->bi_opf);
> + bio = bio_alloc_bioset(bio_src->bi_bdev, nr_segs, bio_src->bi_opf,
> + GFP_NOIO, &crypto_bio_split);
Rename crypto_bio_split => enc_bio_set?
> @@ -257,34 +222,22 @@ static void blk_crypto_dun_to_iv(const u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
> */
> static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
> {
I don't think this patch makes sense by itself, since it leaves the
bio_ptr argument that is used to return a single enc_bio. It does get
updated later in the series, but it seems that the additional change to how
this function is called should go earlier in the series.
> + /* Encrypt each page in the origin bio */
Maybe origin => source, so that consistent terminology is used.
> + if (++enc_idx == enc_bio->bi_max_vecs) {
> + /*
> + * Each encrypted bio will call bio_endio in the
> + * completion handler, so ensure the remaining count
> + * matches the number of submitted bios.
> + */
> + bio_inc_remaining(src_bio);
> + submit_bio(enc_bio);
The above comment is a bit confusing and could be made clearer. When we
get here for the first time for example, we increment remaining from 1
to 2. It doesn't match the number of bios submitted so far, but rather
is one more than it. The extra one pairs with the submit_bio() outside
the loop. Maybe consider the following:
/*
* For each additional encrypted bio submitted,
* increment the source bio's remaining count. Each
* encrypted bio's completion handler calls bio_endio on
* the source bio, so this keeps the source bio from
* completing until the last encrypted bio does.
*/
> +out_ioerror:
> + while (enc_idx > 0)
> + mempool_free(enc_bio->bi_io_vec[enc_idx--].bv_page,
> + blk_crypto_bounce_page_pool);
> + bio_put(enc_bio);
> + src_bio->bi_status = BLK_STS_IOERR;
This error path doesn't seem correct at all. It would need to free the
full set of pages in enc_bio, not just the ones initialized so far. It
would also need to use cmpxchg() to correctly set an error on the
src_bio considering that blk_crypto_fallback_encrypt_endio() may be trying
to do it concurrently too, and then call bio_endio() on it.
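I.e., whatever the page cleanup ends up looking like, the tail of the error
path needs to be roughly (untested):

	/*
	 * Already-submitted enc_bios may be setting an error concurrently
	 * from blk_crypto_fallback_encrypt_endio(), so don't blindly
	 * overwrite bi_status.
	 */
	cmpxchg(&src_bio->bi_status, 0, BLK_STS_IOERR);
	bio_endio(src_bio);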
(It's annoying that encryption errors need to be handled at all. When I
eventually convert this to use lib/crypto/, the encryption functions are
just going to return void. But for now this is using the traditional
API, which can fail, so technically errors need to be handled...)
- Eric
* Re: [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption
2025-10-31 9:34 ` [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption Christoph Hellwig
2025-11-07 4:18 ` Eric Biggers
@ 2025-11-14 0:32 ` Eric Biggers
2025-11-14 5:57 ` Christoph Hellwig
1 sibling, 1 reply; 41+ messages in thread
From: Eric Biggers @ 2025-11-14 0:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:38AM +0100, Christoph Hellwig wrote:
> Allocating a skcipher dynamically can deadlock or cause unexpected
> I/O failures when called from writeback context. Sidestep the
> allocation by using on-stack skciphers, similar to what the non
> blk-crypto fscrypt already does.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
skcipher => skcipher request
> @@ -238,12 +223,12 @@ static void blk_crypto_dun_to_iv(const u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
> * encryption, encrypts the input bio using crypto API and submits the bounce
> * bio.
> */
> -void blk_crypto_fallback_encrypt_bio(struct bio *src_bio)
The above comment needs to be updated too. Maybe leave the static
function __blk_crypto_fallback_encrypt_bio() uncommented, and write an
updated comment for the global function
blk_crypto_fallback_encrypt_bio(). Maybe something like this:
/*
* blk-crypto-fallback's encryption routine. Encrypts the source bio's data
* into a sequence of bounce bios (usually 1, but there can be multiple if there
* are more than BIO_MAX_VECS pages). Submits the bounce bios, then completes
* the source bio once all the bounce bios are done. This takes ownership of
* the source bio, i.e. the caller mustn't continue with submission.
*/
But really that ought to go in the previous commit, not this one.
> +static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
> +{
> + struct bio_fallback_crypt_ctx *f_ctx =
> + container_of(work, struct bio_fallback_crypt_ctx, work);
> + struct bio *bio = f_ctx->bio;
> + struct bio_crypt_ctx *bc = &f_ctx->crypt_ctx;
> + struct blk_crypto_keyslot *slot;
> + blk_status_t status;
> +
> + status = blk_crypto_get_keyslot(blk_crypto_fallback_profile,
> + bc->bc_key, &slot);
> + if (status == BLK_STS_OK) {
> + status = __blk_crypto_fallback_decrypt_bio(f_ctx->bio,
> + &f_ctx->crypt_ctx, f_ctx->crypt_iter,
> + blk_crypto_fallback_tfm(slot));
> + blk_crypto_put_keyslot(slot);
> + }
This is referencing f_ctx->bio and f_ctx->crypt_ctx when they were
already loaded into local variables. Either the local variables should
be used, or they should be removed and the fields always used.
- Eric
* Re: [PATCH 7/9] blk-crypto: handle the fallback above the block layer
2025-10-31 9:34 ` [PATCH 7/9] blk-crypto: handle the fallback above the block layer Christoph Hellwig
2025-11-07 4:42 ` Eric Biggers
@ 2025-11-14 0:37 ` Eric Biggers
2025-11-14 5:56 ` Christoph Hellwig
1 sibling, 1 reply; 41+ messages in thread
From: Eric Biggers @ 2025-11-14 0:37 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Vlastimil Babka, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, linux-block,
linux-fsdevel, linux-fscrypt, linux-mm
On Fri, Oct 31, 2025 at 10:34:37AM +0100, Christoph Hellwig wrote:
> -/**
> - * blk_crypto_fallback_bio_prep - Prepare a bio to use fallback en/decryption
> - *
> - * @bio_ptr: pointer to the bio to prepare
> - *
> - * If bio is doing a WRITE operation, this splits the bio into two parts if it's
> - * too big (see blk_crypto_fallback_split_bio_if_needed()). It then allocates a
> - * bounce bio for the first part, encrypts it, and updates bio_ptr to point to
> - * the bounce bio.
> - *
> - * For a READ operation, we mark the bio for decryption by using bi_private and
> - * bi_end_io.
> - *
> - * In either case, this function will make the bio look like a regular bio (i.e.
> - * as if no encryption context was ever specified) for the purposes of the rest
> - * of the stack except for blk-integrity (blk-integrity and blk-crypto are not
> - * currently supported together).
> - *
> - * Return: true on success. Sets bio->bi_status and returns false on error.
> +/*
> + * bio READ case: Set up a f_ctx in the bio's bi_private and set the bi_end_io
> + * appropriately to trigger decryption when the bio is ended.
> */
> -bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr)
> +bool blk_crypto_fallback_prep_decrypt_bio(struct bio *bio)
This omits some important details. Maybe:
/*
* bio READ case: Set up a fallback crypt context in the bio's bi_private, clear
* the bio's original crypt context, and set bi_end_io appropriately to trigger
* decryption when the bio is ended. Returns true if bio submission should
* continue, or false if aborted with bio_endio already called.
*/
> -/**
> - * __blk_crypto_bio_prep - Prepare bio for inline encryption
> - *
> - * @bio_ptr: pointer to original bio pointer
> - *
> - * If the bio crypt context provided for the bio is supported by the underlying
> - * device's inline encryption hardware, do nothing.
> - *
> - * Otherwise, try to perform en/decryption for this bio by falling back to the
> - * kernel crypto API. When the crypto API fallback is used for encryption,
> - * blk-crypto may choose to split the bio into 2 - the first one that will
> - * continue to be processed and the second one that will be resubmitted via
> - * submit_bio_noacct. A bounce bio will be allocated to encrypt the contents
> - * of the aforementioned "first one", and *bio_ptr will be updated to this
> - * bounce bio.
> - *
> - * Caller must ensure bio has bio_crypt_ctx.
> - *
> - * Return: true on success; false on error (and bio->bi_status will be set
> - * appropriately, and bio_endio() will have been called so bio
> - * submission should abort).
> - */
> -bool __blk_crypto_bio_prep(struct bio **bio_ptr)
> +bool __blk_crypto_submit_bio(struct bio *bio)
Similarly, this could at least use a comment about what the return value
means. It's true if the caller should continue with submission, or
false if blk-crypto took ownership of the bio (either by completing the
bio right away or by arranging for it to be completed later).
- Eric
* Re: [PATCH 6/9] blk-crypto: optimize bio splitting in blk_crypto_fallback_encrypt_bio
2025-11-14 0:22 ` Eric Biggers
@ 2025-11-14 5:56 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-14 5:56 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Jens Axboe, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 13, 2025 at 04:22:10PM -0800, Eric Biggers wrote:
> > --- a/block/blk-crypto-fallback.c
> > +++ b/block/blk-crypto-fallback.c
> > @@ -152,35 +152,26 @@ static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
> >
> > src_bio->bi_status = enc_bio->bi_status;
>
> There can now be multiple enc_bios completing for the same src_bio, so
> this needs something like:
>
> if (enc_bio->bi_status)
> cmpxchg(&src_bio->bi_status, 0, enc_bio->bi_status);
Yes.
> > -static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
> > +static struct bio *blk_crypto_alloc_enc_bio(struct bio *bio_src,
> > + unsigned int nr_segs)
> > {
> > - unsigned int nr_segs = bio_segments(bio_src);
> > - struct bvec_iter iter;
> > - struct bio_vec bv;
> > struct bio *bio;
> >
> > - bio = bio_kmalloc(nr_segs, GFP_NOIO);
> > - if (!bio)
> > - return NULL;
> > - bio_init_inline(bio, bio_src->bi_bdev, nr_segs, bio_src->bi_opf);
> > + bio = bio_alloc_bioset(bio_src->bi_bdev, nr_segs, bio_src->bi_opf,
> > + GFP_NOIO, &crypto_bio_split);
>
> Rename crypto_bio_split => enc_bio_set?
Sure.
> > static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
> > {
>
> I don't think this patch makes sense by itself, since it leaves the
> bio_ptr argument that is used to return a single enc_bio. It does get
> updated later in the series, but it seems that additional change to how
> this function is called should go earlier in the series.
I'll look into it.
>
> > + /* Encrypt each page in the origin bio */
>
> Maybe origin => source, so that consistent terminology is used.
Ok.
>
> > + if (++enc_idx == enc_bio->bi_max_vecs) {
> > + /*
> > + * Each encrypted bio will call bio_endio in the
> > + * completion handler, so ensure the remaining count
> > + * matches the number of submitted bios.
> > + */
> > + bio_inc_remaining(src_bio);
> > + submit_bio(enc_bio);
>
> The above comment is a bit confusing and could be made clearer. When we
> get here for the first time for example, we increment remaining from 1
> to 2. It doesn't match the number of bios submitted so far, but rather
> is one more than it. The extra one pairs with the submit_bio() outside
> the loop. Maybe consider the following:
>
> /*
> * For each additional encrypted bio submitted,
> * increment the source bio's remaining count. Each
> * encrypted bio's completion handler calls bio_endio on
> * the source bio, so this keeps the source bio from
> * completing until the last encrypted bio does.
> */
Yeah. The comment is a leftover from a previous version that worked a
little differently.
> > +out_ioerror:
> > + while (enc_idx > 0)
> > + mempool_free(enc_bio->bi_io_vec[enc_idx--].bv_page,
> > + blk_crypto_bounce_page_pool);
> > + bio_put(enc_bio);
> > + src_bio->bi_status = BLK_STS_IOERR;
>
> This error path doesn't seem correct at all. It would need to free the
> full set of pages in enc_bio, not just the ones initialized so far.
As of this patch the pages are allocated as we go, so I think this is
correct.
> It
> would also need to use cmpxchg() to correctly set an error on the
> src_bio considering that blk_crypto_fallback_encrypt_endio() may be trying
> to do it concurrently too, and then call bio_endio() on it.
Yeah.
> (It's annoying that encryption errors need to be handled at all. When I
> eventually convert this to use lib/crypto/, the encryption functions are
> just going to return void. But for now this is using the traditional
> API, which can fail, so technically errors need to be handled...)
I can't wait for the library conversion to happen :)
* Re: [PATCH 7/9] blk-crypto: handle the fallback above the block layer
2025-11-14 0:37 ` Eric Biggers
@ 2025-11-14 5:56 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-14 5:56 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Jens Axboe, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 13, 2025 at 04:37:38PM -0800, Eric Biggers wrote:
> This omits some important details. Maybe:
>
> /*
> * bio READ case: Set up a fallback crypt context in the bio's bi_private, clear
> * the bio's original crypt context, and set bi_end_io appropriately to trigger
> * decryption when the bio is ended. Returns true if bio submission should
> * continue, or false if aborted with bio_endio already called.
> */
I'll incorporate it.
> > - */
> > -bool __blk_crypto_bio_prep(struct bio **bio_ptr)
> > +bool __blk_crypto_submit_bio(struct bio *bio)
>
> Similarly, this could at least use a comment about what the return value
> means. It's true if the caller should continue with submission, or
> false if blk-crypto took ownership of the bio (either by completing the
> bio right away or by arranging for it to be completed later).
Ok.
* Re: [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption
2025-11-14 0:32 ` Eric Biggers
@ 2025-11-14 5:57 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-11-14 5:57 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Jens Axboe, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
linux-block, linux-fsdevel, linux-fscrypt, linux-mm
On Thu, Nov 13, 2025 at 04:32:00PM -0800, Eric Biggers wrote:
> > */
> > -void blk_crypto_fallback_encrypt_bio(struct bio *src_bio)
>
> The above comment needs to be updated too. Maybe leave the static
> function __blk_crypto_fallback_encrypt_bio() uncommented, and write an
> updated comment for the global function
> blk_crypto_fallback_encrypt_bio(). Maybe something like this:
Makes sense.
> > + struct bio *bio = f_ctx->bio;
> > + struct bio_crypt_ctx *bc = &f_ctx->crypt_ctx;
> > + struct blk_crypto_keyslot *slot;
> > + blk_status_t status;
> > +
> > + status = blk_crypto_get_keyslot(blk_crypto_fallback_profile,
> > + bc->bc_key, &slot);
> > + if (status == BLK_STS_OK) {
> > + status = __blk_crypto_fallback_decrypt_bio(f_ctx->bio,
> > + &f_ctx->crypt_ctx, f_ctx->crypt_iter,
> > + blk_crypto_fallback_tfm(slot));
> > + blk_crypto_put_keyslot(slot);
> > + }
>
> This is referencing f_ctx->bio and f_ctx->crypt_ctx when they were
> already loaded into local variables. Either the local variables should
> be used, or they should be removed and the fields always used.
Yeah, I actually noticed that myself after sending out the series.
Thread overview: 41+ messages (newest: 2025-11-14 5:57 UTC)
2025-10-31 9:34 move blk-crypto-fallback to sit above the block layer Christoph Hellwig
2025-10-31 9:34 ` [PATCH 1/9] mempool: update kerneldoc comments Christoph Hellwig
2025-11-05 14:02 ` Vlastimil Babka
2025-11-05 14:14 ` Vlastimil Babka
2025-11-07 3:26 ` Eric Biggers
2025-11-07 12:02 ` Christoph Hellwig
2025-10-31 9:34 ` [PATCH 2/9] mempool: add error injection support Christoph Hellwig
2025-11-05 14:04 ` Vlastimil Babka
2025-11-07 3:29 ` Eric Biggers
2025-11-07 12:04 ` Christoph Hellwig
2025-10-31 9:34 ` [PATCH 3/9] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
2025-11-05 15:04 ` Vlastimil Babka
2025-11-06 14:13 ` Christoph Hellwig
2025-11-06 14:27 ` Vlastimil Babka
2025-11-06 14:48 ` Christoph Hellwig
2025-11-06 14:57 ` Vlastimil Babka
2025-11-06 15:00 ` Christoph Hellwig
2025-11-06 15:09 ` Vlastimil Babka
2025-11-07 3:52 ` Eric Biggers
2025-11-07 12:06 ` Christoph Hellwig
2025-10-31 9:34 ` [PATCH 4/9] fscrypt: pass a real sector_t to fscrypt_zeroout_range_inline_crypt Christoph Hellwig
2025-11-07 3:55 ` Eric Biggers
2025-11-07 12:07 ` Christoph Hellwig
2025-10-31 9:34 ` [PATCH 5/9] fscrypt: keep multiple bios in flight in fscrypt_zeroout_range_inline_crypt Christoph Hellwig
2025-11-07 4:06 ` Eric Biggers
2025-10-31 9:34 ` [PATCH 6/9] blk-crypto: optimize bio splitting in blk_crypto_fallback_encrypt_bio Christoph Hellwig
2025-11-14 0:22 ` Eric Biggers
2025-11-14 5:56 ` Christoph Hellwig
2025-10-31 9:34 ` [PATCH 7/9] blk-crypto: handle the fallback above the block layer Christoph Hellwig
2025-11-07 4:42 ` Eric Biggers
2025-11-07 12:10 ` Christoph Hellwig
2025-11-14 0:37 ` Eric Biggers
2025-11-14 5:56 ` Christoph Hellwig
2025-10-31 9:34 ` [PATCH 8/9] blk-crypto: use on-stack skciphers for fallback en/decryption Christoph Hellwig
2025-11-07 4:18 ` Eric Biggers
2025-11-07 12:10 ` Christoph Hellwig
2025-11-14 0:32 ` Eric Biggers
2025-11-14 5:57 ` Christoph Hellwig
2025-10-31 9:34 ` [PATCH 9/9] blk-crypto: use mempool_alloc_bulk for encrypted bio page allocation Christoph Hellwig
2025-11-05 15:12 ` Vlastimil Babka
2025-11-06 14:01 ` Christoph Hellwig