From: Eric Biggers <ebiggers@kernel.org>
To: linux-crypto@vger.kernel.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 29/29] crypto: scatterwalk - don't split at page boundaries when !HIGHMEM
Date: Sun, 29 Dec 2024 16:14:18 -0800
Message-ID: <20241230001418.74739-30-ebiggers@kernel.org>
In-Reply-To: <20241230001418.74739-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@google.com>

When !HIGHMEM, the kmap_local_page() in the scatterlist walker does not
actually map anything, and the address it returns is just the address
from the kernel's direct map, where each sg entry's data is virtually
contiguous.  To improve performance, stop unnecessarily clamping data
segments to page boundaries in this case.

For now, still limit segments to PAGE_SIZE.  This is needed to prevent
preemption from being disabled for too long when SIMD is used, and to
support the alignmask case which still uses a page-sized bounce buffer.
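
As a hedged illustration of the SIMD concern (process_with_simd() is a
made-up placeholder; kernel_fpu_begin()/kernel_fpu_end() are the usual
x86 helpers, which typically disable preemption), capping each walk
step at PAGE_SIZE keeps every non-preemptible section bounded:

    #include <asm/fpu/api.h>
    #include <crypto/internal/skcipher.h>

    /* Hypothetical SIMD routine, for illustration only. */
    void process_with_simd(const void *src, void *dst, unsigned int len);

    /* Illustrative only: one walk step done under SIMD. */
    static void example_simd_step(struct skcipher_walk *walk)
    {
            kernel_fpu_begin();
            /* walk->nbytes is clamped to at most PAGE_SIZE by the walker */
            process_with_simd(walk->src.virt.addr, walk->dst.virt.addr,
                              walk->nbytes);
            kernel_fpu_end();
    }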

Even so, this change still helps a lot in cases where messages cross a
page boundary.  For example, when testing IPsec with AES-GCM on x86_64,
the messages are 1424 bytes, which is less than PAGE_SIZE, but on the
Rx side over a third of them cross a page boundary.  Those messages
ended up being processed in three parts, with the middle part going
through skcipher_next_slow() which uses a 16-byte bounce buffer.  That
caused a significant amount of overhead which unnecessarily reduced the
performance benefit of the new x86_64 AES-GCM assembly code.  This
change solves the problem: all these messages now get passed to the
assembly code in one part.
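
For illustration only (the starting offset is assumed, not measured):
if one of these 1424-byte Rx messages starts 3000 bytes into a
4096-byte page, then 1096 bytes lie in the first page, of which 1088
form whole 16-byte AES blocks; the one block straddling the page
boundary went through the 16-byte bounce buffer in
skcipher_next_slow(); and the remaining 1424 - 1088 - 16 = 320 bytes
lie in the next page.  With this change, all 1424 bytes are returned to
the caller in a single step.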

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/skcipher.c            |  4 +-
 include/crypto/scatterwalk.h | 79 ++++++++++++++++++++++++++----------
 2 files changed, 59 insertions(+), 24 deletions(-)

diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 8f6b09377368..16db19663c3d 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -203,12 +203,12 @@ static int skcipher_next_fast(struct skcipher_walk *walk)
 {
 	unsigned long diff;
 
 	diff = offset_in_page(walk->in.offset) -
 	       offset_in_page(walk->out.offset);
-	diff |= (u8 *)scatterwalk_page(&walk->in) -
-		(u8 *)scatterwalk_page(&walk->out);
+	diff |= (u8 *)(sg_page(walk->in.sg) + (walk->in.offset >> PAGE_SHIFT)) -
+		(u8 *)(sg_page(walk->out.sg) + (walk->out.offset >> PAGE_SHIFT));
 
 	skcipher_map_src(walk);
 	walk->dst.virt.addr = walk->src.virt.addr;
 
 	if (diff) {
diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h
index ac03fdf88b2a..3024adbdd443 100644
--- a/include/crypto/scatterwalk.h
+++ b/include/crypto/scatterwalk.h
@@ -47,28 +47,39 @@ static inline void scatterwalk_start_at_pos(struct scatter_walk *walk,
 	}
 	walk->sg = sg;
 	walk->offset = sg->offset + pos;
 }
 
-static inline unsigned int scatterwalk_pagelen(struct scatter_walk *walk)
-{
-	unsigned int len = walk->sg->offset + walk->sg->length - walk->offset;
-	unsigned int len_this_page = offset_in_page(~walk->offset) + 1;
-	return len_this_page > len ? len : len_this_page;
-}
-
 static inline unsigned int scatterwalk_clamp(struct scatter_walk *walk,
 					     unsigned int nbytes)
 {
+	unsigned int len_this_sg;
+	unsigned int limit;
+
 	if (walk->offset >= walk->sg->offset + walk->sg->length)
 		scatterwalk_start(walk, sg_next(walk->sg));
-	return min(nbytes, scatterwalk_pagelen(walk));
-}
+	len_this_sg = walk->sg->offset + walk->sg->length - walk->offset;
 
-static inline struct page *scatterwalk_page(struct scatter_walk *walk)
-{
-	return sg_page(walk->sg) + (walk->offset >> PAGE_SHIFT);
+	/*
+	 * HIGHMEM case: the page may have to be mapped into memory.  To avoid
+	 * the complexity of having to map multiple pages at once per sg entry,
+	 * clamp the returned length to not cross a page boundary.
+	 *
+	 * !HIGHMEM case: no mapping is needed; all pages of the sg entry are
+	 * already mapped contiguously in the kernel's direct map.  For improved
+	 * performance, allow the walker to return data segments that cross a
+	 * page boundary.  Do still cap the length to PAGE_SIZE, since some
+	 * users rely on that to avoid disabling preemption for too long when
+	 * using SIMD.  It's also needed for when skcipher_walk uses a bounce
+	 * page due to the data not being aligned to the algorithm's alignmask.
+	 */
+	if (IS_ENABLED(CONFIG_HIGHMEM))
+		limit = PAGE_SIZE - offset_in_page(walk->offset);
+	else
+		limit = PAGE_SIZE;
+
+	return min3(nbytes, len_this_sg, limit);
 }
 
 /*
  * Create a scatterlist that represents the remaining data in a walk.  Uses
  * chaining to reference the original scatterlist, so this uses at most two
@@ -84,19 +95,27 @@ static inline void scatterwalk_get_sglist(struct scatter_walk *walk,
 		    walk->sg->offset + walk->sg->length - walk->offset,
 		    walk->offset);
 	scatterwalk_crypto_chain(sg_out, sg_next(walk->sg), 2);
 }
 
-static inline void scatterwalk_unmap(void *vaddr)
-{
-	kunmap_local(vaddr);
-}
-
 static inline void *scatterwalk_map(struct scatter_walk *walk)
 {
-	return kmap_local_page(scatterwalk_page(walk)) +
-	       offset_in_page(walk->offset);
+	struct page *base_page = sg_page(walk->sg);
+
+	if (IS_ENABLED(CONFIG_HIGHMEM))
+		return kmap_local_page(base_page + (walk->offset >> PAGE_SHIFT)) +
+		       offset_in_page(walk->offset);
+	/*
+	 * When !HIGHMEM we allow the walker to return segments that span a page
+	 * boundary; see scatterwalk_clamp().  To make it clear that in this
+	 * case we're working in the linear buffer of the whole sg entry in the
+	 * kernel's direct map rather than within the mapped buffer of a single
+	 * page, compute the address as an offset from the page_address() of the
+	 * first page of the sg entry.  Either way the result is the address in
+	 * the direct map, but this makes it clearer what is really going on.
+	 */
+	return page_address(base_page) + walk->offset;
 }
 
 /**
  * scatterwalk_next() - Get the next data buffer in a scatterlist walk
  * @walk: the scatter_walk
@@ -113,10 +132,16 @@ static inline void *scatterwalk_next(struct scatter_walk *walk,
 {
 	*nbytes_ret = scatterwalk_clamp(walk, total);
 	return scatterwalk_map(walk);
 }
 
+static inline void scatterwalk_unmap(const void *vaddr)
+{
+	if (IS_ENABLED(CONFIG_HIGHMEM))
+		kunmap_local(vaddr);
+}
+
 static inline void scatterwalk_advance(struct scatter_walk *walk,
 				       unsigned int nbytes)
 {
 	walk->offset += nbytes;
 }
@@ -131,11 +156,11 @@ static inline void scatterwalk_advance(struct scatter_walk *walk,
  * Use this if the @vaddr was not written to, i.e. it is source data.
  */
 static inline void scatterwalk_done_src(struct scatter_walk *walk,
 					const void *vaddr, unsigned int nbytes)
 {
-	scatterwalk_unmap((void *)vaddr);
+	scatterwalk_unmap(vaddr);
 	scatterwalk_advance(walk, nbytes);
 }
 
 /**
  * scatterwalk_done_dst() - Finish one step of a walk of destination scatterlist
@@ -152,13 +177,23 @@ static inline void scatterwalk_done_dst(struct scatter_walk *walk,
 	scatterwalk_unmap(vaddr);
 	/*
 	 * Explicitly check ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE instead of just
 	 * relying on flush_dcache_page() being a no-op when not implemented,
 	 * since otherwise the BUG_ON in sg_page() does not get optimized out.
+	 * This also avoids having to consider whether the loop would get
+	 * reliably optimized out or not.
 	 */
-	if (ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE)
-		flush_dcache_page(scatterwalk_page(walk));
+	if (ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE) {
+		struct page *base_page, *start_page, *end_page, *page;
+
+		base_page = sg_page(walk->sg);
+		start_page = base_page + (walk->offset >> PAGE_SHIFT);
+		end_page = base_page + ((walk->offset + nbytes +
+					 PAGE_SIZE - 1) >> PAGE_SHIFT);
+		for (page = start_page; page < end_page; page++)
+			flush_dcache_page(page);
+	}
 	scatterwalk_advance(walk, nbytes);
 }
 
 void scatterwalk_skip(struct scatter_walk *walk, unsigned int nbytes);
 
-- 
2.47.1


Thread overview: 34+ messages
2024-12-30  0:13 [PATCH v2 00/29] crypto: scatterlist handling improvements Eric Biggers
2024-12-30  0:13 ` [PATCH v2 01/29] crypto: skcipher - document skcipher_walk_done() and rename some vars Eric Biggers
2024-12-30  0:13 ` [PATCH v2 02/29] crypto: skcipher - remove unnecessary page alignment of bounce buffer Eric Biggers
2024-12-30  0:13 ` [PATCH v2 03/29] crypto: skcipher - remove redundant clamping to page size Eric Biggers
2024-12-30  0:13 ` [PATCH v2 04/29] crypto: skcipher - remove redundant check for SKCIPHER_WALK_SLOW Eric Biggers
2024-12-30  0:13 ` [PATCH v2 05/29] crypto: skcipher - fold skcipher_walk_skcipher() into skcipher_walk_virt() Eric Biggers
2024-12-30  0:13 ` [PATCH v2 06/29] crypto: skcipher - clean up initialization of skcipher_walk::flags Eric Biggers
2024-12-30  0:13 ` [PATCH v2 07/29] crypto: skcipher - optimize initializing skcipher_walk fields Eric Biggers
2024-12-30  0:13 ` [PATCH v2 08/29] crypto: skcipher - call cond_resched() directly Eric Biggers
2024-12-30  0:13 ` [PATCH v2 09/29] crypto: omap - switch from scatter_walk to plain offset Eric Biggers
2024-12-30  0:13 ` [PATCH v2 10/29] crypto: powerpc/p10-aes-gcm - simplify handling of linear associated data Eric Biggers
2025-01-02 11:50   ` Christophe Leroy
2025-01-02 17:24     ` Eric Biggers
2024-12-30  0:14 ` [PATCH v2 11/29] crypto: scatterwalk - move to next sg entry just in time Eric Biggers
2024-12-30  0:14 ` [PATCH v2 12/29] crypto: scatterwalk - add new functions for skipping data Eric Biggers
2024-12-30  0:14 ` [PATCH v2 13/29] crypto: scatterwalk - add new functions for iterating through data Eric Biggers
2024-12-30  0:14 ` [PATCH v2 14/29] crypto: scatterwalk - add new functions for copying data Eric Biggers
2024-12-30  0:14 ` [PATCH v2 15/29] crypto: scatterwalk - add scatterwalk_get_sglist() Eric Biggers
2024-12-30  0:14 ` [PATCH v2 16/29] crypto: skcipher - use scatterwalk_start_at_pos() Eric Biggers
2024-12-30  0:14 ` [PATCH v2 17/29] crypto: aegis - use the new scatterwalk functions Eric Biggers
2024-12-30  0:14 ` [PATCH v2 18/29] crypto: arm/ghash " Eric Biggers
2024-12-30  0:14 ` [PATCH v2 19/29] crypto: arm64 " Eric Biggers
2024-12-30  0:14 ` [PATCH v2 20/29] crypto: nx " Eric Biggers
2024-12-30  0:14 ` [PATCH v2 21/29] crypto: s390/aes-gcm " Eric Biggers
2025-01-08 15:06   ` Harald Freudenberger
2024-12-30  0:14 ` [PATCH v2 22/29] crypto: s5p-sss " Eric Biggers
2024-12-30  0:14 ` [PATCH v2 23/29] crypto: stm32 " Eric Biggers
2024-12-30  0:14 ` [PATCH v2 24/29] crypto: x86/aes-gcm " Eric Biggers
2024-12-30  0:14 ` [PATCH v2 25/29] crypto: x86/aegis " Eric Biggers
2024-12-30  0:14 ` [PATCH v2 26/29] net/tls: " Eric Biggers
2024-12-30  0:14 ` [PATCH v2 27/29] crypto: skcipher - " Eric Biggers
2024-12-30  0:14 ` [PATCH v2 28/29] crypto: scatterwalk - remove obsolete functions Eric Biggers
2024-12-30  0:14 ` Eric Biggers [this message]
2024-12-30  1:31 ` [PATCH v2 00/29] crypto: scatterlist handling improvements Eric Biggers
