linux-kernel.vger.kernel.org archive mirror
* [PATCH net-next 00/10] net: faster and simpler CRC32C computation
@ 2025-05-11  0:41 Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 01/10] net: introduce CONFIG_NET_CRC32C Eric Biggers
                   ` (11 more replies)
  0 siblings, 12 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

Update networking code that computes the CRC32C of packets to just call
crc32c() without unnecessary abstraction layers.  The result is faster
and simpler code.

Patches 1-7 add skb_crc32c() and remove the overly abstracted and
inefficient __skb_checksum().

Patches 8-10 replace skb_copy_and_hash_datagram_iter() with
skb_copy_and_crc32c_datagram_iter(), eliminating the unnecessary use of
the crypto layer.  This unblocks the conversion of nvme-tcp to call
crc32c() directly instead of using the crypto layer, which patch 9 does.

I'm proposing that this series be taken through net-next for 6.16, but
patch 9 could use an ack from the NVMe maintainers.

Eric Biggers (10):
  net: introduce CONFIG_NET_CRC32C
  net: add skb_crc32c()
  net: use skb_crc32c() in skb_crc32c_csum_help()
  RDMA/siw: use skb_crc32c() instead of __skb_checksum()
  sctp: use skb_crc32c() instead of __skb_checksum()
  net: fold __skb_checksum() into skb_checksum()
  lib/crc32: remove unused support for CRC32C combination
  net: add skb_copy_and_crc32c_datagram_iter()
  nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  net: remove skb_copy_and_hash_datagram_iter()

 drivers/infiniband/sw/siw/Kconfig |   1 +
 drivers/infiniband/sw/siw/siw.h   |  22 +----
 drivers/nvme/host/Kconfig         |   4 +-
 drivers/nvme/host/tcp.c           | 118 +++++++++-----------------
 include/linux/crc32.h             |  23 ------
 include/linux/skbuff.h            |  16 +---
 include/net/checksum.h            |  12 ---
 include/net/sctp/checksum.h       |  29 +------
 lib/crc32.c                       |   6 --
 lib/tests/crc_kunit.c             |   6 --
 net/Kconfig                       |   4 +
 net/core/datagram.c               |  34 ++++----
 net/core/dev.c                    |  10 +--
 net/core/skbuff.c                 | 132 ++++++++++++++++++------------
 net/netfilter/Kconfig             |   4 +-
 net/netfilter/ipvs/Kconfig        |   2 +-
 net/openvswitch/Kconfig           |   2 +-
 net/sched/Kconfig                 |   2 +-
 net/sctp/Kconfig                  |   2 +-
 net/sctp/offload.c                |   1 -
 20 files changed, 156 insertions(+), 274 deletions(-)


base-commit: 0b28182c73a3d013bcabbb890dc1070a8388f55a
-- 
2.49.0



* [PATCH net-next 01/10] net: introduce CONFIG_NET_CRC32C
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 02/10] net: add skb_crc32c() Eric Biggers
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Add a hidden kconfig symbol NET_CRC32C that will group together the
functions that calculate CRC32C checksums of packets, so that these
don't have to be built into NET-enabled kernels that don't need them.

Make skb_crc32c_csum_help() (which is called only when IP_SCTP is
enabled) conditional on this symbol, and make IP_SCTP select it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 net/Kconfig      | 4 ++++
 net/core/dev.c   | 2 ++
 net/sctp/Kconfig | 1 +
 3 files changed, 7 insertions(+)

diff --git a/net/Kconfig b/net/Kconfig
index 202cc595e5e6..5b71a52987d3 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -73,10 +73,14 @@ config NET_DEVMEM
 	depends on PAGE_POOL
 
 config NET_SHAPER
 	bool
 
+config NET_CRC32C
+	bool
+	select CRC32
+
 menu "Networking options"
 
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/tls/Kconfig"
diff --git a/net/core/dev.c b/net/core/dev.c
index c9013632296f..51606b2d17bb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3592,10 +3592,11 @@ int skb_checksum_help(struct sk_buff *skb)
 out:
 	return ret;
 }
 EXPORT_SYMBOL(skb_checksum_help);
 
+#ifdef CONFIG_NET_CRC32C
 int skb_crc32c_csum_help(struct sk_buff *skb)
 {
 	__le32 crc32c_csum;
 	int ret = 0, offset, start;
 
@@ -3631,10 +3632,11 @@ int skb_crc32c_csum_help(struct sk_buff *skb)
 	skb_reset_csum_not_inet(skb);
 out:
 	return ret;
 }
 EXPORT_SYMBOL(skb_crc32c_csum_help);
+#endif /* CONFIG_NET_CRC32C */
 
 __be16 skb_network_protocol(struct sk_buff *skb, int *depth)
 {
 	__be16 type = skb->protocol;
 
diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig
index d18a72df3654..3669ba351856 100644
--- a/net/sctp/Kconfig
+++ b/net/sctp/Kconfig
@@ -9,10 +9,11 @@ menuconfig IP_SCTP
 	depends on IPV6 || IPV6=n
 	select CRC32
 	select CRYPTO
 	select CRYPTO_HMAC
 	select CRYPTO_SHA1
+	select NET_CRC32C
 	select NET_UDP_TUNNEL
 	help
 	  Stream Control Transmission Protocol
 
 	  From RFC 2960 <http://www.ietf.org/rfc/rfc2960.txt>.

base-commit: 0b28182c73a3d013bcabbb890dc1070a8388f55a
-- 
2.49.0



* [PATCH net-next 02/10] net: add skb_crc32c()
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 01/10] net: introduce CONFIG_NET_CRC32C Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 03/10] net: use skb_crc32c() in skb_crc32c_csum_help() Eric Biggers
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Add skb_crc32c(), which calculates the CRC32C of a sk_buff.  It will
replace __skb_checksum(), which unnecessarily supports arbitrary
checksums.  Compared to __skb_checksum(), skb_crc32c():

   - Uses the correct type for CRC32C values (u32, not __wsum).

   - Does not require the caller to provide a skb_checksum_ops struct.

   - Is faster because it does not use indirect calls and does not use
     the very slow crc32c_combine().

According to commit 2817a336d4d5 ("net: skb_checksum: allow custom
update/combine for walking skb") which added __skb_checksum(), the
original motivation for the abstraction layer was to avoid code
duplication for CRC32C and other checksums in the future.  However:

   - No additional checksums showed up after CRC32C.  __skb_checksum()
     is only used with the "regular" net checksum and CRC32C.

   - Indirect calls are expensive.  Commit 2544af0344ba ("net: avoid
     indirect calls in L4 checksum calculation") worked around this
     using the INDIRECT_CALL_1 macro. But that only avoided the indirect
     call for the net checksum, and at the cost of an extra branch.

   - The checksums use different types (__wsum and u32), causing casts
     to be needed.

   - It made the checksums of fragments be combined (rather than
     chained) for both checksums, despite this being highly
     counterproductive for CRC32C due to how slow crc32c_combine() is.
     This can clearly be seen in commit 4c2f24549644 ("sctp: linearize
     early if it's not GSO") which tried to work around this performance
     bug.  With a dedicated function for each checksum, we can instead
     just use the proper strategy for each checksum.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c      | 73 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f3e72be6f634..33b33bb18aa6 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4192,10 +4192,11 @@ extern const struct skb_checksum_ops *crc32c_csum_stub __read_mostly;
 
 __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
 		      __wsum csum, const struct skb_checksum_ops *ops);
 __wsum skb_checksum(const struct sk_buff *skb, int offset, int len,
 		    __wsum csum);
+u32 skb_crc32c(const struct sk_buff *skb, int offset, int len, u32 crc);
 
 static inline void * __must_check
 __skb_header_pointer(const struct sk_buff *skb, int offset, int len,
 		     const void *data, int hlen, void *buffer)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d73ad79fe739..b9900cc16a24 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -62,10 +62,11 @@
 #include <linux/bitfield.h>
 #include <linux/if_vlan.h>
 #include <linux/mpls.h>
 #include <linux/kcov.h>
 #include <linux/iov_iter.h>
+#include <linux/crc32.h>
 
 #include <net/protocol.h>
 #include <net/dst.h>
 #include <net/sock.h>
 #include <net/checksum.h>
@@ -3626,10 +3627,82 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset,
 	BUG_ON(len);
 	return csum;
 }
 EXPORT_SYMBOL(skb_copy_and_csum_bits);
 
+#ifdef CONFIG_NET_CRC32C
+u32 skb_crc32c(const struct sk_buff *skb, int offset, int len, u32 crc)
+{
+	int start = skb_headlen(skb);
+	int i, copy = start - offset;
+	struct sk_buff *frag_iter;
+
+	if (copy > 0) {
+		copy = min(copy, len);
+		crc = crc32c(crc, skb->data + offset, copy);
+		len -= copy;
+		if (len == 0)
+			return crc;
+		offset += copy;
+	}
+
+	if (WARN_ON_ONCE(!skb_frags_readable(skb)))
+		return 0;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		int end;
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		WARN_ON(start > offset + len);
+
+		end = start + skb_frag_size(frag);
+		copy = end - offset;
+		if (copy > 0) {
+			u32 p_off, p_len, copied;
+			struct page *p;
+			u8 *vaddr;
+
+			copy = min(copy, len);
+			skb_frag_foreach_page(frag,
+					      skb_frag_off(frag) + offset - start,
+					      copy, p, p_off, p_len, copied) {
+				vaddr = kmap_atomic(p);
+				crc = crc32c(crc, vaddr + p_off, p_len);
+				kunmap_atomic(vaddr);
+			}
+			len -= copy;
+			if (len == 0)
+				return crc;
+			offset += copy;
+		}
+		start = end;
+	}
+
+	skb_walk_frags(skb, frag_iter) {
+		int end;
+
+		WARN_ON(start > offset + len);
+
+		end = start + frag_iter->len;
+		copy = end - offset;
+		if (copy > 0) {
+			copy = min(copy, len);
+			crc = skb_crc32c(frag_iter, offset - start, copy, crc);
+			len -= copy;
+			if (len == 0)
+				return crc;
+			offset += copy;
+		}
+		start = end;
+	}
+	BUG_ON(len);
+
+	return crc;
+}
+EXPORT_SYMBOL(skb_crc32c);
+#endif /* CONFIG_NET_CRC32C */
+
 __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
 {
 	__sum16 sum;
 
 	sum = csum_fold(skb_checksum(skb, 0, len, skb->csum));
-- 
2.49.0



* [PATCH net-next 03/10] net: use skb_crc32c() in skb_crc32c_csum_help()
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 01/10] net: introduce CONFIG_NET_CRC32C Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 02/10] net: add skb_crc32c() Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum() Eric Biggers
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Instead of calling __skb_checksum() with a skb_checksum_ops struct that
does CRC32C, just call the new function skb_crc32c().  This is faster
and simpler.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 net/core/dev.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 51606b2d17bb..1eda2501f07b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3595,11 +3595,11 @@ int skb_checksum_help(struct sk_buff *skb)
 EXPORT_SYMBOL(skb_checksum_help);
 
 #ifdef CONFIG_NET_CRC32C
 int skb_crc32c_csum_help(struct sk_buff *skb)
 {
-	__le32 crc32c_csum;
+	u32 crc;
 	int ret = 0, offset, start;
 
 	if (skb->ip_summed != CHECKSUM_PARTIAL)
 		goto out;
 
@@ -3623,14 +3623,12 @@ int skb_crc32c_csum_help(struct sk_buff *skb)
 
 	ret = skb_ensure_writable(skb, offset + sizeof(__le32));
 	if (ret)
 		goto out;
 
-	crc32c_csum = cpu_to_le32(~__skb_checksum(skb, start,
-						  skb->len - start, ~(__u32)0,
-						  crc32c_csum_stub));
-	*(__le32 *)(skb->data + offset) = crc32c_csum;
+	crc = ~skb_crc32c(skb, start, skb->len - start, ~0);
+	*(__le32 *)(skb->data + offset) = cpu_to_le32(crc);
 	skb_reset_csum_not_inet(skb);
 out:
 	return ret;
 }
 EXPORT_SYMBOL(skb_crc32c_csum_help);
-- 
2.49.0



* [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum()
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
                   ` (2 preceding siblings ...)
  2025-05-11  0:41 ` [PATCH net-next 03/10] net: use skb_crc32c() in skb_crc32c_csum_help() Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-15 20:02   ` Bart Van Assche
  2025-05-19  9:04   ` Bernard Metzler
  2025-05-11  0:41 ` [PATCH net-next 05/10] sctp: " Eric Biggers
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Instead of calling __skb_checksum() with a skb_checksum_ops struct that
does CRC32C, just call the new function skb_crc32c().  This is faster
and simpler.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 drivers/infiniband/sw/siw/Kconfig |  1 +
 drivers/infiniband/sw/siw/siw.h   | 22 +---------------------
 2 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index ae4a953e2a03..186f182b80e7 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -1,10 +1,11 @@
 config RDMA_SIW
 	tristate "Software RDMA over TCP/IP (iWARP) driver"
 	depends on INET && INFINIBAND
 	depends on INFINIBAND_VIRT_DMA
 	select CRC32
+	select NET_CRC32C
 	help
 	This driver implements the iWARP RDMA transport over
 	the Linux TCP/IP network stack. It enables a system with a
 	standard Ethernet adapter to interoperate with a iWARP
 	adapter or with another system running the SIW driver.
diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index 385067e07faf..d9e5a2e4c471 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -691,33 +691,13 @@ static inline void siw_crc_oneshot(const void *data, size_t len, u8 out[4])
 	siw_crc_init(&crc);
 	siw_crc_update(&crc, data, len);
 	return siw_crc_final(&crc, out);
 }
 
-static inline __wsum siw_csum_update(const void *buff, int len, __wsum sum)
-{
-	return (__force __wsum)crc32c((__force __u32)sum, buff, len);
-}
-
-static inline __wsum siw_csum_combine(__wsum csum, __wsum csum2, int offset,
-				      int len)
-{
-	return (__force __wsum)crc32c_combine((__force __u32)csum,
-					      (__force __u32)csum2, len);
-}
-
 static inline void siw_crc_skb(struct siw_rx_stream *srx, unsigned int len)
 {
-	const struct skb_checksum_ops siw_cs_ops = {
-		.update = siw_csum_update,
-		.combine = siw_csum_combine,
-	};
-	__wsum crc = (__force __wsum)srx->mpa_crc;
-
-	crc = __skb_checksum(srx->skb, srx->skb_offset, len, crc,
-			     &siw_cs_ops);
-	srx->mpa_crc = (__force u32)crc;
+	srx->mpa_crc = skb_crc32c(srx->skb, srx->skb_offset, len, srx->mpa_crc);
 }
 
 #define siw_dbg(ibdev, fmt, ...)                                               \
 	ibdev_dbg(ibdev, "%s: " fmt, __func__, ##__VA_ARGS__)
 
-- 
2.49.0



* [PATCH net-next 05/10] sctp: use skb_crc32c() instead of __skb_checksum()
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
                   ` (3 preceding siblings ...)
  2025-05-11  0:41 ` [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum() Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 06/10] net: fold __skb_checksum() into skb_checksum() Eric Biggers
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Make sctp_compute_cksum() just use the new function skb_crc32c(),
instead of calling __skb_checksum() with a skb_checksum_ops struct that
does CRC32C.  This is faster and simpler.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 include/net/sctp/checksum.h | 29 +++--------------------------
 net/netfilter/Kconfig       |  4 ++--
 net/netfilter/ipvs/Kconfig  |  2 +-
 net/openvswitch/Kconfig     |  2 +-
 net/sched/Kconfig           |  2 +-
 net/sctp/Kconfig            |  1 -
 net/sctp/offload.c          |  1 -
 7 files changed, 8 insertions(+), 33 deletions(-)

diff --git a/include/net/sctp/checksum.h b/include/net/sctp/checksum.h
index 291465c25810..654d37ec0402 100644
--- a/include/net/sctp/checksum.h
+++ b/include/net/sctp/checksum.h
@@ -13,51 +13,28 @@
  *
  * Written or modified by:
  *    Dinakaran Joseph
  *    Jon Grimm <jgrimm@us.ibm.com>
  *    Sridhar Samudrala <sri@us.ibm.com>
- *
- * Rewritten to use libcrc32c by:
  *    Vlad Yasevich <vladislav.yasevich@hp.com>
  */
 
 #ifndef __sctp_checksum_h__
 #define __sctp_checksum_h__
 
 #include <linux/types.h>
 #include <linux/sctp.h>
-#include <linux/crc32c.h>
-#include <linux/crc32.h>
-
-static inline __wsum sctp_csum_update(const void *buff, int len, __wsum sum)
-{
-	return (__force __wsum)crc32c((__force __u32)sum, buff, len);
-}
-
-static inline __wsum sctp_csum_combine(__wsum csum, __wsum csum2,
-				       int offset, int len)
-{
-	return (__force __wsum)crc32c_combine((__force __u32)csum,
-					      (__force __u32)csum2, len);
-}
-
-static const struct skb_checksum_ops sctp_csum_ops = {
-	.update  = sctp_csum_update,
-	.combine = sctp_csum_combine,
-};
 
 static inline __le32 sctp_compute_cksum(const struct sk_buff *skb,
 					unsigned int offset)
 {
 	struct sctphdr *sh = (struct sctphdr *)(skb->data + offset);
 	__le32 old = sh->checksum;
-	__wsum new;
+	u32 new;
 
 	sh->checksum = 0;
-	new = ~__skb_checksum(skb, offset, skb->len - offset, ~(__wsum)0,
-			      &sctp_csum_ops);
+	new = ~skb_crc32c(skb, offset, skb->len - offset, ~0);
 	sh->checksum = old;
-
-	return cpu_to_le32((__force __u32)new);
+	return cpu_to_le32(new);
 }
 
 #endif /* __sctp_checksum_h__ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 3b2183fc7e56..2560416218d0 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -210,11 +210,11 @@ config NF_CT_PROTO_GRE
 
 config NF_CT_PROTO_SCTP
 	bool 'SCTP protocol connection tracking support'
 	depends on NETFILTER_ADVANCED
 	default y
-	select CRC32
+	select NET_CRC32C
 	help
 	  With this option enabled, the layer 3 independent connection
 	  tracking code will be able to do state tracking on SCTP connections.
 
 	  If unsure, say Y.
@@ -473,11 +473,11 @@ config NETFILTER_SYNPROXY
 
 endif # NF_CONNTRACK
 
 config NF_TABLES
 	select NETFILTER_NETLINK
-	select CRC32
+	select NET_CRC32C
 	tristate "Netfilter nf_tables support"
 	help
 	  nftables is the new packet classification framework that intends to
 	  replace the existing {ip,ip6,arp,eb}_tables infrastructure. It
 	  provides a pseudo-state machine with an extensible instruction-set
diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index 8c5b1fe12d07..c203252e856d 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -103,11 +103,11 @@ config	IP_VS_PROTO_AH
 	  This option enables support for load balancing AH (Authentication
 	  Header) transport protocol. Say Y if unsure.
 
 config  IP_VS_PROTO_SCTP
 	bool "SCTP load balancing support"
-	select CRC32
+	select NET_CRC32C
 	help
 	  This option enables support for load balancing SCTP transport
 	  protocol. Say Y if unsure.
 
 comment "IPVS scheduler"
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 5481bd561eb4..e6aaee92dba4 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -9,12 +9,12 @@ config OPENVSWITCH
 	depends on !NF_CONNTRACK || \
 		   (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
 				     (!NF_NAT || NF_NAT) && \
 				     (!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT)))
 	depends on PSAMPLE || !PSAMPLE
-	select CRC32
 	select MPLS
+	select NET_CRC32C
 	select NET_MPLS_GSO
 	select DST_CACHE
 	select NET_NSH
 	select NF_CONNTRACK_OVS if NF_CONNTRACK
 	select NF_NAT_OVS if NF_NAT
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 9f0b3f943fca..ad914d2b2e22 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -794,11 +794,11 @@ config NET_ACT_SKBEDIT
 	  module will be called act_skbedit.
 
 config NET_ACT_CSUM
 	tristate "Checksum Updating"
 	depends on NET_CLS_ACT && INET
-	select CRC32
+	select NET_CRC32C
 	help
 	  Say Y here to update some common checksum after some direct
 	  packet alterations.
 
 	  To compile this code as a module, choose M here: the
diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig
index 3669ba351856..24d5a35ce894 100644
--- a/net/sctp/Kconfig
+++ b/net/sctp/Kconfig
@@ -5,11 +5,10 @@
 
 menuconfig IP_SCTP
 	tristate "The SCTP Protocol"
 	depends on INET
 	depends on IPV6 || IPV6=n
-	select CRC32
 	select CRYPTO
 	select CRYPTO_HMAC
 	select CRYPTO_SHA1
 	select NET_CRC32C
 	select NET_UDP_TUNNEL
diff --git a/net/sctp/offload.c b/net/sctp/offload.c
index 502095173d88..e6f863c031b4 100644
--- a/net/sctp/offload.c
+++ b/net/sctp/offload.c
@@ -109,11 +109,10 @@ int __init sctp_offload_init(void)
 
 	ret = inet6_add_offload(&sctp6_offload, IPPROTO_SCTP);
 	if (ret)
 		goto ipv4;
 
-	crc32c_csum_stub = &sctp_csum_ops;
 	return ret;
 
 ipv4:
 	inet_del_offload(&sctp_offload, IPPROTO_SCTP);
 out:
-- 
2.49.0



* [PATCH net-next 06/10] net: fold __skb_checksum() into skb_checksum()
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
                   ` (4 preceding siblings ...)
  2025-05-11  0:41 ` [PATCH net-next 05/10] sctp: " Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 07/10] lib/crc32: remove unused support for CRC32C combination Eric Biggers
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Now that the only remaining caller of __skb_checksum() is
skb_checksum(), fold __skb_checksum() into skb_checksum().  This makes
struct skb_checksum_ops unnecessary, so remove that too and simply do
the "regular" net checksum.  It also makes the wrapper functions
csum_partial_ext() and csum_block_add_ext() unnecessary, so remove those
too and just use the underlying functions.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 include/linux/skbuff.h |  9 -------
 include/net/checksum.h | 12 ---------
 net/core/skbuff.c      | 59 +++++-------------------------------------
 3 files changed, 7 insertions(+), 73 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 33b33bb18aa6..eac5c8dde4a5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4181,19 +4181,10 @@ static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
 static inline int memcpy_to_msg(struct msghdr *msg, void *data, int len)
 {
 	return copy_to_iter(data, len, &msg->msg_iter) == len ? 0 : -EFAULT;
 }
 
-struct skb_checksum_ops {
-	__wsum (*update)(const void *mem, int len, __wsum wsum);
-	__wsum (*combine)(__wsum csum, __wsum csum2, int offset, int len);
-};
-
-extern const struct skb_checksum_ops *crc32c_csum_stub __read_mostly;
-
-__wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
-		      __wsum csum, const struct skb_checksum_ops *ops);
 __wsum skb_checksum(const struct sk_buff *skb, int offset, int len,
 		    __wsum csum);
 u32 skb_crc32c(const struct sk_buff *skb, int offset, int len, u32 crc);
 
 static inline void * __must_check
diff --git a/include/net/checksum.h b/include/net/checksum.h
index 243f972267b8..e57986b173f8 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -96,16 +96,10 @@ static __always_inline __wsum
 csum_block_add(__wsum csum, __wsum csum2, int offset)
 {
 	return csum_add(csum, csum_shift(csum2, offset));
 }
 
-static __always_inline __wsum
-csum_block_add_ext(__wsum csum, __wsum csum2, int offset, int len)
-{
-	return csum_block_add(csum, csum2, offset);
-}
-
 static __always_inline __wsum
 csum_block_sub(__wsum csum, __wsum csum2, int offset)
 {
 	return csum_block_add(csum, ~csum2, offset);
 }
@@ -113,16 +107,10 @@ csum_block_sub(__wsum csum, __wsum csum2, int offset)
 static __always_inline __wsum csum_unfold(__sum16 n)
 {
 	return (__force __wsum)n;
 }
 
-static __always_inline
-__wsum csum_partial_ext(const void *buff, int len, __wsum sum)
-{
-	return csum_partial(buff, len, sum);
-}
-
 #define CSUM_MANGLED_0 ((__force __sum16)0xffff)
 
 static __always_inline void csum_replace_by_diff(__sum16 *sum, __wsum diff)
 {
 	*sum = csum_fold(csum_add(diff, ~csum_unfold(*sum)));
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b9900cc16a24..bfa892ccc7f7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3438,24 +3438,22 @@ int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int len)
 	return -EFAULT;
 }
 EXPORT_SYMBOL(skb_store_bits);
 
 /* Checksum skb data. */
-__wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
-		      __wsum csum, const struct skb_checksum_ops *ops)
+__wsum skb_checksum(const struct sk_buff *skb, int offset, int len, __wsum csum)
 {
 	int start = skb_headlen(skb);
 	int i, copy = start - offset;
 	struct sk_buff *frag_iter;
 	int pos = 0;
 
 	/* Checksum header. */
 	if (copy > 0) {
 		if (copy > len)
 			copy = len;
-		csum = INDIRECT_CALL_1(ops->update, csum_partial_ext,
-				       skb->data + offset, copy, csum);
+		csum = csum_partial(skb->data + offset, copy, csum);
 		if ((len -= copy) == 0)
 			return csum;
 		offset += copy;
 		pos	= copy;
 	}
@@ -3481,17 +3479,13 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
 
 			skb_frag_foreach_page(frag,
 					      skb_frag_off(frag) + offset - start,
 					      copy, p, p_off, p_len, copied) {
 				vaddr = kmap_atomic(p);
-				csum2 = INDIRECT_CALL_1(ops->update,
-							csum_partial_ext,
-							vaddr + p_off, p_len, 0);
+				csum2 = csum_partial(vaddr + p_off, p_len, 0);
 				kunmap_atomic(vaddr);
-				csum = INDIRECT_CALL_1(ops->combine,
-						       csum_block_add_ext, csum,
-						       csum2, pos, p_len);
+				csum = csum_block_add(csum, csum2, pos);
 				pos += p_len;
 			}
 
 			if (!(len -= copy))
 				return csum;
@@ -3508,14 +3502,13 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
 		end = start + frag_iter->len;
 		if ((copy = end - offset) > 0) {
 			__wsum csum2;
 			if (copy > len)
 				copy = len;
-			csum2 = __skb_checksum(frag_iter, offset - start,
-					       copy, 0, ops);
-			csum = INDIRECT_CALL_1(ops->combine, csum_block_add_ext,
-					       csum, csum2, pos, copy);
+			csum2 = skb_checksum(frag_iter, offset - start, copy,
+					     0);
+			csum = csum_block_add(csum, csum2, pos);
 			if ((len -= copy) == 0)
 				return csum;
 			offset += copy;
 			pos    += copy;
 		}
@@ -3523,22 +3516,10 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
 	}
 	BUG_ON(len);
 
 	return csum;
 }
-EXPORT_SYMBOL(__skb_checksum);
-
-__wsum skb_checksum(const struct sk_buff *skb, int offset,
-		    int len, __wsum csum)
-{
-	const struct skb_checksum_ops ops = {
-		.update  = csum_partial_ext,
-		.combine = csum_block_add_ext,
-	};
-
-	return __skb_checksum(skb, offset, len, csum, &ops);
-}
 EXPORT_SYMBOL(skb_checksum);
 
 /* Both of above in one bottle. */
 
 __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset,
@@ -3758,36 +3739,10 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
 
 	return sum;
 }
 EXPORT_SYMBOL(__skb_checksum_complete);
 
-static __wsum warn_crc32c_csum_update(const void *buff, int len, __wsum sum)
-{
-	net_warn_ratelimited(
-		"%s: attempt to compute crc32c without libcrc32c.ko\n",
-		__func__);
-	return 0;
-}
-
-static __wsum warn_crc32c_csum_combine(__wsum csum, __wsum csum2,
-				       int offset, int len)
-{
-	net_warn_ratelimited(
-		"%s: attempt to compute crc32c without libcrc32c.ko\n",
-		__func__);
-	return 0;
-}
-
-static const struct skb_checksum_ops default_crc32c_ops = {
-	.update  = warn_crc32c_csum_update,
-	.combine = warn_crc32c_csum_combine,
-};
-
-const struct skb_checksum_ops *crc32c_csum_stub __read_mostly =
-	&default_crc32c_ops;
-EXPORT_SYMBOL(crc32c_csum_stub);
-
  /**
  *	skb_zerocopy_headlen - Calculate headroom needed for skb_zerocopy()
  *	@from: source buffer
  *
  *	Calculates the amount of linear headroom needed in the 'to' skb passed
-- 
2.49.0



* [PATCH net-next 07/10] lib/crc32: remove unused support for CRC32C combination
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
                   ` (5 preceding siblings ...)
  2025-05-11  0:41 ` [PATCH net-next 06/10] net: fold __skb_checksum() into skb_checksum() Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-11  0:41 ` [PATCH net-next 08/10] net: add skb_copy_and_crc32c_datagram_iter() Eric Biggers
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

crc32c_combine() and crc32c_shift() are no longer used (except by the
KUnit test that tests them), and their current implementation is very
slow.  Remove them.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 include/linux/crc32.h | 23 -----------------------
 lib/crc32.c           |  6 ------
 lib/tests/crc_kunit.c |  6 ------
 3 files changed, 35 deletions(-)

diff --git a/include/linux/crc32.h b/include/linux/crc32.h
index 69c2e8bb3782..7f7d0be8a0ac 100644
--- a/include/linux/crc32.h
+++ b/include/linux/crc32.h
@@ -74,33 +74,10 @@ u32 crc32_le_shift(u32 crc, size_t len);
 static inline u32 crc32_le_combine(u32 crc1, u32 crc2, size_t len2)
 {
 	return crc32_le_shift(crc1, len2) ^ crc2;
 }
 
-u32 crc32c_shift(u32 crc, size_t len);
-
-/**
- * crc32c_combine - Combine two crc32c check values into one. For two sequences
- *		    of bytes, seq1 and seq2 with lengths len1 and len2, crc32c()
- *		    check values were calculated for each, crc1 and crc2.
- *
- * @crc1: crc32c of the first block
- * @crc2: crc32c of the second block
- * @len2: length of the second block
- *
- * Return: The crc32c() check value of seq1 and seq2 concatenated, requiring
- *	   only crc1, crc2, and len2. Note: If seq_full denotes the concatenated
- *	   memory area of seq1 with seq2, and crc_full the crc32c() value of
- *	   seq_full, then crc_full == crc32c_combine(crc1, crc2, len2) when
- *	   crc_full was seeded with the same initializer as crc1, and crc2 seed
- *	   was 0. See also crc_combine_test().
- */
-static inline u32 crc32c_combine(u32 crc1, u32 crc2, size_t len2)
-{
-	return crc32c_shift(crc1, len2) ^ crc2;
-}
-
 #define crc32(seed, data, length)  crc32_le(seed, (unsigned char const *)(data), length)
 
 /*
  * Helpers for hash table generation of ethernet nics:
  *
diff --git a/lib/crc32.c b/lib/crc32.c
index fddd424ff224..ade48bbb0083 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -117,16 +117,10 @@ u32 crc32_le_shift(u32 crc, size_t len)
 {
 	return crc32_generic_shift(crc, len, CRC32_POLY_LE);
 }
 EXPORT_SYMBOL(crc32_le_shift);
 
-u32 crc32c_shift(u32 crc, size_t len)
-{
-	return crc32_generic_shift(crc, len, CRC32C_POLY_LE);
-}
-EXPORT_SYMBOL(crc32c_shift);
-
 u32 crc32_be_base(u32 crc, const u8 *p, size_t len)
 {
 	while (len--)
 		crc = (crc << 8) ^ crc32table_be[(crc >> 24) ^ *p++];
 	return crc;
diff --git a/lib/tests/crc_kunit.c b/lib/tests/crc_kunit.c
index 585c48b65cef..064c2d581557 100644
--- a/lib/tests/crc_kunit.c
+++ b/lib/tests/crc_kunit.c
@@ -389,21 +389,15 @@ static void crc32_be_benchmark(struct kunit *test)
 static u64 crc32c_wrapper(u64 crc, const u8 *p, size_t len)
 {
 	return crc32c(crc, p, len);
 }
 
-static u64 crc32c_combine_wrapper(u64 crc1, u64 crc2, size_t len2)
-{
-	return crc32c_combine(crc1, crc2, len2);
-}
-
 static const struct crc_variant crc_variant_crc32c = {
 	.bits = 32,
 	.le = true,
 	.poly = 0x82f63b78,
 	.func = crc32c_wrapper,
-	.combine_func = crc32c_combine_wrapper,
 };
 
 static void crc32c_test(struct kunit *test)
 {
 	crc_test(test, &crc_variant_crc32c);
-- 
2.49.0


* [PATCH net-next 08/10] net: add skb_copy_and_crc32c_datagram_iter()
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
                   ` (6 preceding siblings ...)
  2025-05-11  0:41 ` [PATCH net-next 07/10] lib/crc32: remove unused support for CRC32C combination Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-13 21:41   ` Jakub Kicinski
  2025-05-11  0:41 ` [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter() Eric Biggers
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Since skb_copy_and_hash_datagram_iter() is used only with CRC32C, the
crypto_ahash abstraction provides no value.  Add
skb_copy_and_crc32c_datagram_iter() which just calls crc32c() directly.

This is faster and simpler.  It also doesn't have the weird dependency
issue where skb_copy_and_hash_datagram_iter() depends on
CONFIG_CRYPTO_HASH=y without that being expressed explicitly in the
kconfig (presumably because it was too heavyweight for NET to select).
The new function is conditional on the hidden boolean symbol NET_CRC32C,
which selects CRC32.  So it gets compiled only when something that
actually needs CRC32C packet checksums is enabled, it has no implicit
dependency, and it doesn't depend on the heavyweight crypto layer.
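
The callback pattern is simple: copy, then fold only the bytes actually
copied into the running CRC (using the return value of copy_to_iter(), so a
short copy never checksums bytes the destination did not receive).  A
user-space sketch of the same pattern — crc32c_sw() is a stand-in bitwise
CRC32C, not the kernel function, and crc32c_and_copy() is a hypothetical
bounded-copy analogue of crc32c_and_copy_to_iter():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in bitwise CRC32C, kernel crc32c() convention (no internal
 * inversion); callers seed with ~0 and invert the result themselves. */
static uint32_t crc32c_sw(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc & 1) ? (crc >> 1) ^ 0x82f63b78u : crc >> 1;
	}
	return crc;
}

/* Sketch of the crc32c_and_copy_to_iter() pattern: copy as much as fits
 * into the destination, then fold only the bytes actually copied into the
 * running CRC and report that count back to the caller. */
static size_t crc32c_and_copy(const void *src, size_t bytes, uint32_t *crcp,
			      uint8_t *dst, size_t room)
{
	size_t copied = bytes < room ? bytes : room;

	memcpy(dst, src, copied);
	*crcp = crc32c_sw(*crcp, src, copied);
	return copied;
}
```

Accumulating across several calls yields the same CRC as one pass over the
whole reassembled buffer, which is what lets __skb_datagram_iter() walk the
skb fragment by fragment.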

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 include/linux/skbuff.h |  2 ++
 net/core/datagram.c    | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index eac5c8dde4a5..0027f2977c02 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4126,10 +4126,12 @@ static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
 int skb_copy_and_csum_datagram_msg(struct sk_buff *skb, int hlen,
 				   struct msghdr *msg);
 int skb_copy_and_hash_datagram_iter(const struct sk_buff *skb, int offset,
 			   struct iov_iter *to, int len,
 			   struct ahash_request *hash);
+int skb_copy_and_crc32c_datagram_iter(const struct sk_buff *skb, int offset,
+				      struct iov_iter *to, int len, u32 *crcp);
 int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 				 struct iov_iter *from, int len);
 int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *frm);
 void skb_free_datagram(struct sock *sk, struct sk_buff *skb);
 int skb_kill_datagram(struct sock *sk, struct sk_buff *skb, unsigned int flags);
diff --git a/net/core/datagram.c b/net/core/datagram.c
index f0634f0cb834..47a6eef83162 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -50,10 +50,11 @@
 #include <linux/spinlock.h>
 #include <linux/slab.h>
 #include <linux/pagemap.h>
 #include <linux/iov_iter.h>
 #include <linux/indirect_call_wrapper.h>
+#include <linux/crc32.h>
 
 #include <net/protocol.h>
 #include <linux/skbuff.h>
 
 #include <net/checksum.h>
@@ -515,10 +516,40 @@ int skb_copy_and_hash_datagram_iter(const struct sk_buff *skb, int offset,
 	return __skb_datagram_iter(skb, offset, to, len, true,
 			hash_and_copy_to_iter, hash);
 }
 EXPORT_SYMBOL(skb_copy_and_hash_datagram_iter);
 
+#ifdef CONFIG_NET_CRC32C
+static size_t crc32c_and_copy_to_iter(const void *addr, size_t bytes,
+				      void *_crcp, struct iov_iter *i)
+{
+	u32 *crcp = _crcp;
+	size_t copied;
+
+	copied = copy_to_iter(addr, bytes, i);
+	*crcp = crc32c(*crcp, addr, copied);
+	return copied;
+}
+
+/**
+ *	skb_copy_and_crc32c_datagram_iter - Copy datagram to an iovec iterator
+ *		and update a CRC32C value.
+ *	@skb: buffer to copy
+ *	@offset: offset in the buffer to start copying from
+ *	@to: iovec iterator to copy to
+ *	@len: amount of data to copy from buffer to iovec
+ *	@crcp: pointer to CRC32C value to update
+ */
+int skb_copy_and_crc32c_datagram_iter(const struct sk_buff *skb, int offset,
+				      struct iov_iter *to, int len, u32 *crcp)
+{
+	return __skb_datagram_iter(skb, offset, to, len, true,
+				   crc32c_and_copy_to_iter, crcp);
+}
+EXPORT_SYMBOL(skb_copy_and_crc32c_datagram_iter);
+#endif /* CONFIG_NET_CRC32C */
+
 static size_t simple_copy_to_iter(const void *addr, size_t bytes,
 		void *data __always_unused, struct iov_iter *i)
 {
 	return copy_to_iter(addr, bytes, i);
 }
-- 
2.49.0


* [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
                   ` (7 preceding siblings ...)
  2025-05-11  0:41 ` [PATCH net-next 08/10] net: add skb_copy_and_crc32c_datagram_iter() Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-16  4:36   ` Christoph Hellwig
  2025-05-17  9:58   ` Sagi Grimberg
  2025-05-11  0:41 ` [PATCH net-next 10/10] net: remove skb_copy_and_hash_datagram_iter() Eric Biggers
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Now that the crc32c() library function directly takes advantage of
architecture-specific optimizations and there also now exists a function
skb_copy_and_crc32c_datagram_iter(), it is unnecessary to go through the
crypto_ahash API.  Just use those functions.  This is much simpler, and
it also improves performance due to eliminating the crypto API overhead.
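
The digest convention the new helpers implement is plain CRC32C with the
conditioning done by the callers: seed with ~0, accumulate with crc32c(),
invert on output.  A user-space sketch of that convention — crc32c_sw() is a
stand-in bitwise CRC32C rather than the kernel function, and unlike the
patch's nvme_tcp_ddgst_final() this sketch returns the digest in host order
instead of storing an __le32:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bitwise CRC32C, kernel crc32c() convention (no internal
 * inversion). */
static uint32_t crc32c_sw(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc & 1) ? (crc >> 1) ^ 0x82f63b78u : crc >> 1;
	}
	return crc;
}

/* Caller-side digest convention: seed with all-ones... */
static void ddgst_init(uint32_t *crcp)
{
	*crcp = ~0u;
}

/* ...fold each data chunk into the running CRC... */
static void ddgst_update(uint32_t *crcp, const void *p, size_t len)
{
	*crcp = crc32c_sw(*crcp, p, len);
}

/* ...and invert on output to produce the final digest value. */
static uint32_t ddgst_final(const uint32_t *crcp)
{
	return ~*crcp;
}
```

With this convention, incremental updates over PDU fragments produce the same
digest as one CRC over the whole payload.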

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 drivers/nvme/host/Kconfig |   4 +-
 drivers/nvme/host/tcp.c   | 118 ++++++++++++--------------------------
 2 files changed, 39 insertions(+), 83 deletions(-)

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 4d64b6935bb9..7dca58f0a237 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -82,13 +82,13 @@ config NVME_FC
 
 config NVME_TCP
 	tristate "NVM Express over Fabrics TCP host driver"
 	depends on INET
 	depends on BLOCK
+	select CRC32
+	select NET_CRC32C
 	select NVME_FABRICS
-	select CRYPTO
-	select CRYPTO_CRC32C
 	help
 	  This provides support for the NVMe over Fabrics protocol using
 	  the TCP transport.  This allows you to use remote block devices
 	  exported using the NVMe protocol set.
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index aba365f97cf6..d32e168036ec 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -6,19 +6,19 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <linux/crc32.h>
 #include <linux/nvme-tcp.h>
 #include <linux/nvme-keyring.h>
 #include <net/sock.h>
 #include <net/tcp.h>
 #include <net/tls.h>
 #include <net/tls_prot.h>
 #include <net/handshake.h>
 #include <linux/blk-mq.h>
-#include <crypto/hash.h>
 #include <net/busy_poll.h>
 #include <trace/events/sock.h>
 
 #include "nvme.h"
 #include "fabrics.h"
@@ -166,12 +166,12 @@ struct nvme_tcp_queue {
 	bool			rd_enabled;
 
 	bool			hdr_digest;
 	bool			data_digest;
 	bool			tls_enabled;
-	struct ahash_request	*rcv_hash;
-	struct ahash_request	*snd_hash;
+	u32			rcv_crc;
+	u32			snd_crc;
 	__le32			exp_ddgst;
 	__le32			recv_ddgst;
 	struct completion       tls_complete;
 	int                     tls_err;
 	struct page_frag_cache	pf_cache;
@@ -454,36 +454,40 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue)
 
 	list_del(&req->entry);
 	return req;
 }
 
-static inline void nvme_tcp_ddgst_final(struct ahash_request *hash,
-		__le32 *dgst)
+static inline void nvme_tcp_ddgst_init(u32 *crcp)
 {
-	ahash_request_set_crypt(hash, NULL, (u8 *)dgst, 0);
-	crypto_ahash_final(hash);
+	*crcp = ~0;
 }
 
-static inline void nvme_tcp_ddgst_update(struct ahash_request *hash,
-		struct page *page, off_t off, size_t len)
+static inline void nvme_tcp_ddgst_update(u32 *crcp,
+		struct page *page, size_t off, size_t len)
 {
-	struct scatterlist sg;
+	page += off / PAGE_SIZE;
+	off %= PAGE_SIZE;
+	while (len) {
+		const void *vaddr = kmap_local_page(page);
+		size_t n = min(len, (size_t)PAGE_SIZE - off);
 
-	sg_init_table(&sg, 1);
-	sg_set_page(&sg, page, len, off);
-	ahash_request_set_crypt(hash, &sg, NULL, len);
-	crypto_ahash_update(hash);
+		*crcp = crc32c(*crcp, vaddr + off, n);
+		kunmap_local(vaddr);
+		page++;
+		off = 0;
+		len -= n;
+	}
 }
 
-static inline void nvme_tcp_hdgst(struct ahash_request *hash,
-		void *pdu, size_t len)
+static inline void nvme_tcp_ddgst_final(u32 *crcp, __le32 *ddgst)
 {
-	struct scatterlist sg;
+	*ddgst = cpu_to_le32(~*crcp);
+}
 
-	sg_init_one(&sg, pdu, len);
-	ahash_request_set_crypt(hash, &sg, pdu + len, len);
-	crypto_ahash_digest(hash);
+static inline void nvme_tcp_hdgst(void *pdu, size_t len)
+{
+	put_unaligned_le32(~crc32c(~0, pdu, len), pdu + len);
 }
 
 static int nvme_tcp_verify_hdgst(struct nvme_tcp_queue *queue,
 		void *pdu, size_t pdu_len)
 {
@@ -497,11 +501,11 @@ static int nvme_tcp_verify_hdgst(struct nvme_tcp_queue *queue,
 			nvme_tcp_queue_id(queue));
 		return -EPROTO;
 	}
 
 	recv_digest = *(__le32 *)(pdu + hdr->hlen);
-	nvme_tcp_hdgst(queue->rcv_hash, pdu, pdu_len);
+	nvme_tcp_hdgst(pdu, pdu_len);
 	exp_digest = *(__le32 *)(pdu + hdr->hlen);
 	if (recv_digest != exp_digest) {
 		dev_err(queue->ctrl->ctrl.device,
 			"header digest error: recv %#x expected %#x\n",
 			le32_to_cpu(recv_digest), le32_to_cpu(exp_digest));
@@ -524,11 +528,11 @@ static int nvme_tcp_check_ddgst(struct nvme_tcp_queue *queue, void *pdu)
 		dev_err(queue->ctrl->ctrl.device,
 			"queue %d: data digest flag is cleared\n",
 		nvme_tcp_queue_id(queue));
 		return -EPROTO;
 	}
-	crypto_ahash_init(queue->rcv_hash);
+	nvme_tcp_ddgst_init(&queue->rcv_crc);
 
 	return 0;
 }
 
 static void nvme_tcp_exit_request(struct blk_mq_tag_set *set,
@@ -924,12 +928,12 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 		/* we can read only from what is left in this bio */
 		recv_len = min_t(size_t, recv_len,
 				iov_iter_count(&req->iter));
 
 		if (queue->data_digest)
-			ret = skb_copy_and_hash_datagram_iter(skb, *offset,
-				&req->iter, recv_len, queue->rcv_hash);
+			ret = skb_copy_and_crc32c_datagram_iter(skb, *offset,
+				&req->iter, recv_len, &queue->rcv_crc);
 		else
 			ret = skb_copy_datagram_iter(skb, *offset,
 					&req->iter, recv_len);
 		if (ret) {
 			dev_err(queue->ctrl->ctrl.device,
@@ -943,11 +947,12 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 		queue->data_remaining -= recv_len;
 	}
 
 	if (!queue->data_remaining) {
 		if (queue->data_digest) {
-			nvme_tcp_ddgst_final(queue->rcv_hash, &queue->exp_ddgst);
+			nvme_tcp_ddgst_final(&queue->rcv_crc,
+					     &queue->exp_ddgst);
 			queue->ddgst_remaining = NVME_TCP_DIGEST_LENGTH;
 		} else {
 			if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
 				nvme_tcp_end_request(rq,
 						le16_to_cpu(req->status));
@@ -1145,11 +1150,11 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 		ret = sock_sendmsg(queue->sock, &msg);
 		if (ret <= 0)
 			return ret;
 
 		if (queue->data_digest)
-			nvme_tcp_ddgst_update(queue->snd_hash, page,
+			nvme_tcp_ddgst_update(&queue->snd_crc, page,
 					offset, ret);
 
 		/*
 		 * update the request iterator except for the last payload send
 		 * in the request where we don't want to modify it as we may
@@ -1159,11 +1164,11 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 			nvme_tcp_advance_req(req, ret);
 
 		/* fully successful last send in current PDU */
 		if (last && ret == len) {
 			if (queue->data_digest) {
-				nvme_tcp_ddgst_final(queue->snd_hash,
+				nvme_tcp_ddgst_final(&queue->snd_crc,
 					&req->ddgst);
 				req->state = NVME_TCP_SEND_DDGST;
 				req->offset = 0;
 			} else {
 				if (h2cdata_left)
@@ -1192,11 +1197,11 @@ static int nvme_tcp_try_send_cmd_pdu(struct nvme_tcp_request *req)
 		msg.msg_flags |= MSG_MORE;
 	else
 		msg.msg_flags |= MSG_EOR;
 
 	if (queue->hdr_digest && !req->offset)
-		nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu));
+		nvme_tcp_hdgst(pdu, sizeof(*pdu));
 
 	bvec_set_virt(&bvec, (void *)pdu + req->offset, len);
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
 	ret = sock_sendmsg(queue->sock, &msg);
 	if (unlikely(ret <= 0))
@@ -1205,11 +1210,11 @@ static int nvme_tcp_try_send_cmd_pdu(struct nvme_tcp_request *req)
 	len -= ret;
 	if (!len) {
 		if (inline_data) {
 			req->state = NVME_TCP_SEND_DATA;
 			if (queue->data_digest)
-				crypto_ahash_init(queue->snd_hash);
+				nvme_tcp_ddgst_init(&queue->snd_crc);
 		} else {
 			nvme_tcp_done_send_req(queue);
 		}
 		return 1;
 	}
@@ -1227,11 +1232,11 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req)
 	u8 hdgst = nvme_tcp_hdgst_len(queue);
 	int len = sizeof(*pdu) - req->offset + hdgst;
 	int ret;
 
 	if (queue->hdr_digest && !req->offset)
-		nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu));
+		nvme_tcp_hdgst(pdu, sizeof(*pdu));
 
 	if (!req->h2cdata_left)
 		msg.msg_flags |= MSG_SPLICE_PAGES;
 
 	bvec_set_virt(&bvec, (void *)pdu + req->offset, len);
@@ -1242,11 +1247,11 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req)
 
 	len -= ret;
 	if (!len) {
 		req->state = NVME_TCP_SEND_DATA;
 		if (queue->data_digest)
-			crypto_ahash_init(queue->snd_hash);
+			nvme_tcp_ddgst_init(&queue->snd_crc);
 		return 1;
 	}
 	req->offset += ret;
 
 	return -EAGAIN;
@@ -1382,45 +1387,10 @@ static void nvme_tcp_io_work(struct work_struct *w)
 	} while (!time_after(jiffies, deadline)); /* quota is exhausted */
 
 	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
 }
 
-static void nvme_tcp_free_crypto(struct nvme_tcp_queue *queue)
-{
-	struct crypto_ahash *tfm = crypto_ahash_reqtfm(queue->rcv_hash);
-
-	ahash_request_free(queue->rcv_hash);
-	ahash_request_free(queue->snd_hash);
-	crypto_free_ahash(tfm);
-}
-
-static int nvme_tcp_alloc_crypto(struct nvme_tcp_queue *queue)
-{
-	struct crypto_ahash *tfm;
-
-	tfm = crypto_alloc_ahash("crc32c", 0, CRYPTO_ALG_ASYNC);
-	if (IS_ERR(tfm))
-		return PTR_ERR(tfm);
-
-	queue->snd_hash = ahash_request_alloc(tfm, GFP_KERNEL);
-	if (!queue->snd_hash)
-		goto free_tfm;
-	ahash_request_set_callback(queue->snd_hash, 0, NULL, NULL);
-
-	queue->rcv_hash = ahash_request_alloc(tfm, GFP_KERNEL);
-	if (!queue->rcv_hash)
-		goto free_snd_hash;
-	ahash_request_set_callback(queue->rcv_hash, 0, NULL, NULL);
-
-	return 0;
-free_snd_hash:
-	ahash_request_free(queue->snd_hash);
-free_tfm:
-	crypto_free_ahash(tfm);
-	return -ENOMEM;
-}
-
 static void nvme_tcp_free_async_req(struct nvme_tcp_ctrl *ctrl)
 {
 	struct nvme_tcp_request *async = &ctrl->async_req;
 
 	page_frag_free(async->pdu);
@@ -1449,13 +1419,10 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 	unsigned int noreclaim_flag;
 
 	if (!test_and_clear_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
 		return;
 
-	if (queue->hdr_digest || queue->data_digest)
-		nvme_tcp_free_crypto(queue);
-
 	page_frag_cache_drain(&queue->pf_cache);
 
 	noreclaim_flag = memalloc_noreclaim_save();
 	/* ->sock will be released by fput() */
 	fput(queue->sock->file);
@@ -1865,25 +1832,17 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 		}
 	}
 
 	queue->hdr_digest = nctrl->opts->hdr_digest;
 	queue->data_digest = nctrl->opts->data_digest;
-	if (queue->hdr_digest || queue->data_digest) {
-		ret = nvme_tcp_alloc_crypto(queue);
-		if (ret) {
-			dev_err(nctrl->device,
-				"failed to allocate queue %d crypto\n", qid);
-			goto err_sock;
-		}
-	}
 
 	rcv_pdu_size = sizeof(struct nvme_tcp_rsp_pdu) +
 			nvme_tcp_hdgst_len(queue);
 	queue->pdu = kmalloc(rcv_pdu_size, GFP_KERNEL);
 	if (!queue->pdu) {
 		ret = -ENOMEM;
-		goto err_crypto;
+		goto err_sock;
 	}
 
 	dev_dbg(nctrl->device, "connecting queue %d\n",
 			nvme_tcp_queue_id(queue));
 
@@ -1912,13 +1871,10 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 
 err_init_connect:
 	kernel_sock_shutdown(queue->sock, SHUT_RDWR);
 err_rcv_pdu:
 	kfree(queue->pdu);
-err_crypto:
-	if (queue->hdr_digest || queue->data_digest)
-		nvme_tcp_free_crypto(queue);
 err_sock:
 	/* ->sock will be released by fput() */
 	fput(queue->sock->file);
 	queue->sock = NULL;
 err_destroy_mutex:
-- 
2.49.0


* [PATCH net-next 10/10] net: remove skb_copy_and_hash_datagram_iter()
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
                   ` (8 preceding siblings ...)
  2025-05-11  0:41 ` [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter() Eric Biggers
@ 2025-05-11  0:41 ` Eric Biggers
  2025-05-11 16:30 ` [PATCH net-next 00/10] net: faster and simpler CRC32C computation Andrew Lunn
  2025-05-13 21:40 ` Jakub Kicinski
  11 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-11  0:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Now that skb_copy_and_hash_datagram_iter() is no longer used, remove it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 include/linux/skbuff.h |  4 ----
 net/core/datagram.c    | 37 -------------------------------------
 2 files changed, 41 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0027f2977c02..98cecd3a2fb8 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -272,11 +272,10 @@
 /* return minimum truesize of one skb containing X bytes of data */
 #define SKB_TRUESIZE(X) ((X) +						\
 			 SKB_DATA_ALIGN(sizeof(struct sk_buff)) +	\
 			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
-struct ahash_request;
 struct net_device;
 struct scatterlist;
 struct pipe_inode_info;
 struct iov_iter;
 struct napi_struct;
@@ -4123,13 +4122,10 @@ static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
 {
 	return skb_copy_datagram_iter(from, offset, &msg->msg_iter, size);
 }
 int skb_copy_and_csum_datagram_msg(struct sk_buff *skb, int hlen,
 				   struct msghdr *msg);
-int skb_copy_and_hash_datagram_iter(const struct sk_buff *skb, int offset,
-			   struct iov_iter *to, int len,
-			   struct ahash_request *hash);
 int skb_copy_and_crc32c_datagram_iter(const struct sk_buff *skb, int offset,
 				      struct iov_iter *to, int len, u32 *crcp);
 int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 				 struct iov_iter *from, int len);
 int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *frm);
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 47a6eef83162..b83731f11fad 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -60,11 +60,10 @@
 #include <net/checksum.h>
 #include <net/sock.h>
 #include <net/tcp_states.h>
 #include <trace/events/skb.h>
 #include <net/busy_poll.h>
-#include <crypto/hash.h>
 
 /*
  *	Is a socket 'connection oriented' ?
  */
 static inline int connection_based(struct sock *sk)
@@ -480,46 +479,10 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset,
 		goto fault;
 
 	return 0;
 }
 
-static size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
-				    struct iov_iter *i)
-{
-#ifdef CONFIG_CRYPTO_HASH
-	struct ahash_request *hash = hashp;
-	struct scatterlist sg;
-	size_t copied;
-
-	copied = copy_to_iter(addr, bytes, i);
-	sg_init_one(&sg, addr, copied);
-	ahash_request_set_crypt(hash, &sg, NULL, copied);
-	crypto_ahash_update(hash);
-	return copied;
-#else
-	return 0;
-#endif
-}
-
-/**
- *	skb_copy_and_hash_datagram_iter - Copy datagram to an iovec iterator
- *          and update a hash.
- *	@skb: buffer to copy
- *	@offset: offset in the buffer to start copying from
- *	@to: iovec iterator to copy to
- *	@len: amount of data to copy from buffer to iovec
- *      @hash: hash request to update
- */
-int skb_copy_and_hash_datagram_iter(const struct sk_buff *skb, int offset,
-			   struct iov_iter *to, int len,
-			   struct ahash_request *hash)
-{
-	return __skb_datagram_iter(skb, offset, to, len, true,
-			hash_and_copy_to_iter, hash);
-}
-EXPORT_SYMBOL(skb_copy_and_hash_datagram_iter);
-
 #ifdef CONFIG_NET_CRC32C
 static size_t crc32c_and_copy_to_iter(const void *addr, size_t bytes,
 				      void *_crcp, struct iov_iter *i)
 {
 	u32 *crcp = _crcp;
-- 
2.49.0


* Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
                   ` (9 preceding siblings ...)
  2025-05-11  0:41 ` [PATCH net-next 10/10] net: remove skb_copy_and_hash_datagram_iter() Eric Biggers
@ 2025-05-11 16:30 ` Andrew Lunn
  2025-05-11 17:29   ` Eric Biggers
  2025-05-13 21:40 ` Jakub Kicinski
  11 siblings, 1 reply; 37+ messages in thread
From: Andrew Lunn @ 2025-05-11 16:30 UTC (permalink / raw)
  To: Eric Biggers
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Sat, May 10, 2025 at 05:41:00PM -0700, Eric Biggers wrote:
> Update networking code that computes the CRC32C of packets to just call
> crc32c() without unnecessary abstraction layers.  The result is faster
> and simpler code.

Hi Eric

Do you have some benchmarks for these changes?

	Andrew


* Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
  2025-05-11 16:30 ` [PATCH net-next 00/10] net: faster and simpler CRC32C computation Andrew Lunn
@ 2025-05-11 17:29   ` Eric Biggers
  2025-05-11 21:22     ` Andrew Lunn
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2025-05-11 17:29 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Sun, May 11, 2025 at 06:30:25PM +0200, Andrew Lunn wrote:
> On Sat, May 10, 2025 at 05:41:00PM -0700, Eric Biggers wrote:
> > Update networking code that computes the CRC32C of packets to just call
> > crc32c() without unnecessary abstraction layers.  The result is faster
> > and simpler code.
> 
> Hi Eric
> 
> Do you have some benchmarks for these changes?
> 
> 	Andrew

Do you want benchmarks that show that removing the indirect calls makes things
faster?  I think that should be fairly self-evident by now after dealing with
retpoline for years, but I can provide more details if you need them.

Removing the inefficient use of crc32c_combine() makes a massive difference on
fragmented sk_buffs, since crc32c_combine() is so slow (much slower than the CRC
calculation itself).  However, reverting the workaround commit 4c2f24549644
("sctp: linearize early if it's not GSO") is beyond the scope of this patchset,
so for now the sctp stack doesn't actually call skb_crc32c() on fragmented
sk_buffs.  I can provide microbenchmarks of skb_crc32c() on a fragmented sk_buff
directly though, if you don't think it's clear already.

Of course, please also keep in mind the -118 line diffstat.  Even if it weren't
faster, we should just do it this way anyway.

- Eric


* Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
  2025-05-11 17:29   ` Eric Biggers
@ 2025-05-11 21:22     ` Andrew Lunn
  2025-05-11 21:45       ` Ard Biesheuvel
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Lunn @ 2025-05-11 21:22 UTC (permalink / raw)
  To: Eric Biggers
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Sun, May 11, 2025 at 10:29:29AM -0700, Eric Biggers wrote:
> On Sun, May 11, 2025 at 06:30:25PM +0200, Andrew Lunn wrote:
> > On Sat, May 10, 2025 at 05:41:00PM -0700, Eric Biggers wrote:
> > > Update networking code that computes the CRC32C of packets to just call
> > > crc32c() without unnecessary abstraction layers.  The result is faster
> > > and simpler code.
> > 
> > Hi Eric
> > 
> > Do you have some benchmarks for these changes?
> > 
> > 	Andrew
> 
> Do you want benchmarks that show that removing the indirect calls makes things
> faster?  I think that should be fairly self-evident by now after dealing with
> retpoline for years, but I can provide more details if you need them.

I was thinking more like iperf before/after? Show the CPU load has gone
down without the bandwidth also going down.

Eric Dumazet has a T-Shirt with a commit message on the back which
increased network performance by X%. At the moment, there is nothing
T-Shirt quotable here.

	Andrew


* Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
  2025-05-11 21:22     ` Andrew Lunn
@ 2025-05-11 21:45       ` Ard Biesheuvel
  2025-05-11 23:07         ` Eric Biggers
  0 siblings, 1 reply; 37+ messages in thread
From: Ard Biesheuvel @ 2025-05-11 21:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Eric Biggers, netdev, linux-nvme, linux-sctp, linux-rdma,
	linux-kernel, Daniel Borkmann, Marcelo Ricardo Leitner,
	Sagi Grimberg

On Sun, 11 May 2025 at 23:22, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sun, May 11, 2025 at 10:29:29AM -0700, Eric Biggers wrote:
> > On Sun, May 11, 2025 at 06:30:25PM +0200, Andrew Lunn wrote:
> > > On Sat, May 10, 2025 at 05:41:00PM -0700, Eric Biggers wrote:
> > > > Update networking code that computes the CRC32C of packets to just call
> > > > crc32c() without unnecessary abstraction layers.  The result is faster
> > > > and simpler code.
> > >
> > > Hi Eric
> > >
> > > Do you have some benchmarks for these changes?
> > >
> > >     Andrew
> >
> > Do you want benchmarks that show that removing the indirect calls makes things
> > faster?  I think that should be fairly self-evident by now after dealing with
> > retpoline for years, but I can provide more details if you need them.
>
> I was thinking more like iperf before/after? Show the CPU load has gone
> down without the bandwidth also going down.
>
> Eric Dumazet has a T-Shirt with a commit message on the back which
> increased network performance by X%. At the moment, there is nothing
> T-Shirt quotable here.
>

I think that removing layers of redundant code to ultimately call the
same core CRC-32 implementation is a rather obvious win, especially
when indirect calls are involved. The diffstat speaks for itself, so
maybe you can print that on a T-shirt.


* Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
  2025-05-11 21:45       ` Ard Biesheuvel
@ 2025-05-11 23:07         ` Eric Biggers
  2025-05-15 19:21           ` David Laight
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2025-05-11 23:07 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andrew Lunn, netdev, linux-nvme, linux-sctp, linux-rdma,
	linux-kernel, Daniel Borkmann, Marcelo Ricardo Leitner,
	Sagi Grimberg

On Sun, May 11, 2025 at 11:45:14PM +0200, Ard Biesheuvel wrote:
> On Sun, 11 May 2025 at 23:22, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Sun, May 11, 2025 at 10:29:29AM -0700, Eric Biggers wrote:
> > > On Sun, May 11, 2025 at 06:30:25PM +0200, Andrew Lunn wrote:
> > > > On Sat, May 10, 2025 at 05:41:00PM -0700, Eric Biggers wrote:
> > > > > Update networking code that computes the CRC32C of packets to just call
> > > > > crc32c() without unnecessary abstraction layers.  The result is faster
> > > > > and simpler code.
> > > >
> > > > Hi Eric
> > > >
> > > > Do you have some benchmarks for these changes?
> > > >
> > > >     Andrew
> > >
> > > Do you want benchmarks that show that removing the indirect calls makes things
> > > faster?  I think that should be fairly self-evident by now after dealing with
> > > retpoline for years, but I can provide more details if you need them.
> >
> > I was thinking more like iperf before/after? Show the CPU load has gone
> > down without the bandwidth also going down.
> >
> > Eric Dumazet has a T-Shirt with a commit message on the back which
> > increased network performance by X%. At the moment, there is nothing
> > T-Shirt quotable here.
> >
> 
> I think that removing layers of redundant code to ultimately call the
> same core CRC-32 implementation is a rather obvious win, especially
> when indirect calls are involved. The diffstat speaks for itself, so
> maybe you can print that on a T-shirt.

Agreed with Ard.  I did try doing some SCTP benchmarks with iperf3 earlier, but
they were very noisy and the CRC32C checksumming seemed to be lost in the noise.
There probably are some tricks to running reliable networking benchmarks; I'm
not a networking developer.  Regardless, this series is a clear win for the
CRC32C code, both from a simplicity and performance perspective.  It also fixes
the kconfig dependency issues.  That should be good enough, IMO.

In case it's helpful, here are some microbenchmarks of __skb_checksum (old) vs
skb_crc32c (new):

    Linear sk_buffs

        Length in bytes    __skb_checksum cycles    skb_crc32c cycles
        ===============    =====================    =================
                     64                       43                   18
                   1420                      204                  161
                  16384                     1735                 1642

    Nonlinear sk_buffs (even split between head and one fragment)

        Length in bytes    __skb_checksum cycles    skb_crc32c cycles
        ===============    =====================    =================
                     64                      579                   22
                   1420                     1506                  194
                  16384                     4365                 1682

So the 1420-byte linear buffer case (roughly the most common one) is 21%
faster, while the other cases range from 5% to 2500% faster.  This was on an
AMD Zen 5 processor, where the kernel defaults to using IBRS instead of
retpoline; I understand that an even larger improvement may be seen when
retpoline is enabled.

But again this is just the CRC32C checksumming performance.  I'm not claiming
measurable improvements to overall SCTP (or NVMe-TCP) latency or throughput,
though it's possible that there are.

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
  2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
                   ` (10 preceding siblings ...)
  2025-05-11 16:30 ` [PATCH net-next 00/10] net: faster and simpler CRC32C computation Andrew Lunn
@ 2025-05-13 21:40 ` Jakub Kicinski
  2025-05-15 18:10   ` Eric Biggers
  11 siblings, 1 reply; 37+ messages in thread
From: Jakub Kicinski @ 2025-05-13 21:40 UTC (permalink / raw)
  To: Eric Biggers
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Sat, 10 May 2025 17:41:00 -0700 Eric Biggers wrote:
> I'm proposing that this series be taken through net-next for 6.16, but
> patch 9 could use an ack from the NVME maintainers.

And patch 4 from RDMA. Please repost once those have been collected, and
include the benchmarks you shared at Andrew's request. From a networking PoV
this looks good.
-- 
pw-bot: defer

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 08/10] net: add skb_copy_and_crc32c_datagram_iter()
  2025-05-11  0:41 ` [PATCH net-next 08/10] net: add skb_copy_and_crc32c_datagram_iter() Eric Biggers
@ 2025-05-13 21:41   ` Jakub Kicinski
  2025-05-15 18:09     ` Eric Biggers
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Kicinski @ 2025-05-13 21:41 UTC (permalink / raw)
  To: Eric Biggers
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Sat, 10 May 2025 17:41:08 -0700 Eric Biggers wrote:
> +/**
> + *	skb_copy_and_crc32c_datagram_iter - Copy datagram to an iovec iterator
> + *		and update a CRC32C value.
> + *	@skb: buffer to copy
> + *	@offset: offset in the buffer to start copying from
> + *	@to: iovec iterator to copy to
> + *	@len: amount of data to copy from buffer to iovec
> + *	@crcp: pointer to CRC32C value to update
> + */

When you repost please toss a Return: statement here.
kernel-doc -Wall is complaining

> +int skb_copy_and_crc32c_datagram_iter(const struct sk_buff *skb, int offset,
> +				      struct iov_iter *to, int len, u32 *crcp)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 08/10] net: add skb_copy_and_crc32c_datagram_iter()
  2025-05-13 21:41   ` Jakub Kicinski
@ 2025-05-15 18:09     ` Eric Biggers
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-15 18:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Tue, May 13, 2025 at 02:41:54PM -0700, Jakub Kicinski wrote:
> On Sat, 10 May 2025 17:41:08 -0700 Eric Biggers wrote:
> > +/**
> > + *	skb_copy_and_crc32c_datagram_iter - Copy datagram to an iovec iterator
> > + *		and update a CRC32C value.
> > + *	@skb: buffer to copy
> > + *	@offset: offset in the buffer to start copying from
> > + *	@to: iovec iterator to copy to
> > + *	@len: amount of data to copy from buffer to iovec
> > + *	@crcp: pointer to CRC32C value to update
> > + */
> 
> When you repost please toss a Return: statement here.
> kernel-doc -Wall is complaining

I'll plan to fold in the following:

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 3599eda959917..fa87abb666324 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -538,10 +538,12 @@ static size_t crc32c_and_copy_to_iter(const void *addr, size_t bytes,
  *	@skb: buffer to copy
  *	@offset: offset in the buffer to start copying from
  *	@to: iovec iterator to copy to
  *	@len: amount of data to copy from buffer to iovec
  *	@crcp: pointer to CRC32C value to update
+ *
+ *	Return: 0 on success, -EFAULT if there was a fault during copy.
  */
 int skb_copy_and_crc32c_datagram_iter(const struct sk_buff *skb, int offset,
 				      struct iov_iter *to, int len, u32 *crcp)
 {
 	return __skb_datagram_iter(skb, offset, to, len, true,

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
  2025-05-13 21:40 ` Jakub Kicinski
@ 2025-05-15 18:10   ` Eric Biggers
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-15 18:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Tue, May 13, 2025 at 02:40:18PM -0700, Jakub Kicinski wrote:
> On Sat, 10 May 2025 17:41:00 -0700 Eric Biggers wrote:
> > I'm proposing that this series be taken through net-next for 6.16, but
> > patch 9 could use an ack from the NVME maintainers.
> 
> And patch 4 from RDMA. Please repost once those have been collected, and
> include the benchmarks you shared at Andrew's request. From a networking PoV
> this looks good.
> -- 
> pw-bot: defer

Thanks.  linux-nvme and linux-rdma are both Cc'ed on this patchset.

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
  2025-05-11 23:07         ` Eric Biggers
@ 2025-05-15 19:21           ` David Laight
  2025-05-15 19:50             ` Eric Biggers
  0 siblings, 1 reply; 37+ messages in thread
From: David Laight @ 2025-05-15 19:21 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, Andrew Lunn, netdev, linux-nvme, linux-sctp,
	linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg

On Sun, 11 May 2025 16:07:50 -0700
Eric Biggers <ebiggers@kernel.org> wrote:

> On Sun, May 11, 2025 at 11:45:14PM +0200, Ard Biesheuvel wrote:
> > On Sun, 11 May 2025 at 23:22, Andrew Lunn <andrew@lunn.ch> wrote:  
> > >
> > > On Sun, May 11, 2025 at 10:29:29AM -0700, Eric Biggers wrote:  
> > > > On Sun, May 11, 2025 at 06:30:25PM +0200, Andrew Lunn wrote:  
> > > > > On Sat, May 10, 2025 at 05:41:00PM -0700, Eric Biggers wrote:  
> > > > > > Update networking code that computes the CRC32C of packets to just call
> > > > > > crc32c() without unnecessary abstraction layers.  The result is faster
> > > > > > and simpler code.  
> > > > >
> > > > > Hi Eric
> > > > >
> > > > > Do you have some benchmarks for these changes?
> > > > >
> > > > >     Andrew  
> > > >
> > > > Do you want benchmarks that show that removing the indirect calls makes things
> > > > faster?  I think that should be fairly self-evident by now after dealing with
> > > > retpoline for years, but I can provide more details if you need them.  
> > >
> > > I was thinking more like iperf before/after? Show the CPU load has gone
> > > down without the bandwidth also going down.
> > >
> > > Eric Dumazet has a T-Shirt with a commit message on the back which
> > > increased network performance by X%. At the moment, there is nothing
> > > T-Shirt quotable here.
> > >  
> > 
> > I think that removing layers of redundant code to ultimately call the
> > same core CRC-32 implementation is a rather obvious win, especially
> > when indirect calls are involved. The diffstat speaks for itself, so
> > maybe you can print that on a T-shirt.  
> 
> Agreed with Ard.  I did try doing some SCTP benchmarks with iperf3 earlier, but
> they were very noisy and the CRC32C checksumming seemed to be lost in the noise.
> There probably are some tricks to running reliable networking benchmarks; I'm
> not a networking developer.  Regardless, this series is a clear win for the
> CRC32C code, both from a simplicity and performance perspective.  It also fixes
> the kconfig dependency issues.  That should be good enough, IMO.
> 
> In case it's helpful, here are some microbenchmarks of __skb_checksum (old) vs
> skb_crc32c (new):
> 
>     Linear sk_buffs
> 
>         Length in bytes    __skb_checksum cycles    skb_crc32c cycles
>         ===============    =====================    =================
>                      64                       43                   18
>                    1420                      204                  161
>                   16384                     1735                 1642
> 
>     Nonlinear sk_buffs (even split between head and one fragment)
> 
>         Length in bytes    __skb_checksum cycles    skb_crc32c cycles
>         ===============    =====================    =================
>                      64                      579                   22
>                    1420                     1506                  194
>                   16384                     4365                 1682
> 
> So the 1420-byte linear buffer case (roughly the most common one) is 21% faster,

1420 bytes is unlikely to be the most common case - at least for some users.
SCTP is message oriented, so the checksum is computed over a 'user message'.
A not-uncommon use is carrying mobile network messages (e.g. SMS) over the IP
network (instead of TDM links).
In that case the maximum data chunk size (what is being checksummed) is limited
to not much over 256 bytes - and a lot of data chunks will be smaller.
The actual difficulty is getting multiple data chunks into a single Ethernet
packet without adding significant delays.

But the changes definitely improve things.

	David


> but other cases range from 5% to 2500% faster.  This was on an AMD Zen 5 processor,
> where the kernel defaults to using IBRS instead of retpoline; I understand that
> an even larger improvement may be seen when retpoline is enabled.
> 
> But again this is just the CRC32C checksumming performance.  I'm not claiming
> measurable improvements to overall SCTP (or NVMe-TCP) latency or throughput,
> though it's possible that there are.
> 
> - Eric
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
  2025-05-15 19:21           ` David Laight
@ 2025-05-15 19:50             ` Eric Biggers
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-15 19:50 UTC (permalink / raw)
  To: David Laight
  Cc: Ard Biesheuvel, Andrew Lunn, netdev, linux-nvme, linux-sctp,
	linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg

On Thu, May 15, 2025 at 08:21:36PM +0100, David Laight wrote:
> On Sun, 11 May 2025 16:07:50 -0700
> Eric Biggers <ebiggers@kernel.org> wrote:
> 
> > On Sun, May 11, 2025 at 11:45:14PM +0200, Ard Biesheuvel wrote:
> > > On Sun, 11 May 2025 at 23:22, Andrew Lunn <andrew@lunn.ch> wrote:  
> > > >
> > > > On Sun, May 11, 2025 at 10:29:29AM -0700, Eric Biggers wrote:  
> > > > > On Sun, May 11, 2025 at 06:30:25PM +0200, Andrew Lunn wrote:  
> > > > > > On Sat, May 10, 2025 at 05:41:00PM -0700, Eric Biggers wrote:  
> > > > > > > Update networking code that computes the CRC32C of packets to just call
> > > > > > > crc32c() without unnecessary abstraction layers.  The result is faster
> > > > > > > and simpler code.  
> > > > > >
> > > > > > Hi Eric
> > > > > >
> > > > > > Do you have some benchmarks for these changes?
> > > > > >
> > > > > >     Andrew  
> > > > >
> > > > > Do you want benchmarks that show that removing the indirect calls makes things
> > > > > faster?  I think that should be fairly self-evident by now after dealing with
> > > > > retpoline for years, but I can provide more details if you need them.  
> > > >
> > > > I was thinking more like iperf before/after? Show the CPU load has gone
> > > > down without the bandwidth also going down.
> > > >
> > > > Eric Dumazet has a T-Shirt with a commit message on the back which
> > > > increased network performance by X%. At the moment, there is nothing
> > > > T-Shirt quotable here.
> > > >  
> > > 
> > > I think that removing layers of redundant code to ultimately call the
> > > same core CRC-32 implementation is a rather obvious win, especially
> > > when indirect calls are involved. The diffstat speaks for itself, so
> > > maybe you can print that on a T-shirt.  
> > 
> > Agreed with Ard.  I did try doing some SCTP benchmarks with iperf3 earlier, but
> > they were very noisy and the CRC32C checksumming seemed to be lost in the noise.
> > There probably are some tricks to running reliable networking benchmarks; I'm
> > not a networking developer.  Regardless, this series is a clear win for the
> > CRC32C code, both from a simplicity and performance perspective.  It also fixes
> > the kconfig dependency issues.  That should be good enough, IMO.
> > 
> > In case it's helpful, here are some microbenchmarks of __skb_checksum (old) vs
> > skb_crc32c (new):
> > 
> >     Linear sk_buffs
> > 
> >         Length in bytes    __skb_checksum cycles    skb_crc32c cycles
> >         ===============    =====================    =================
> >                      64                       43                   18
> >                    1420                      204                  161
> >                   16384                     1735                 1642
> > 
> >     Nonlinear sk_buffs (even split between head and one fragment)
> > 
> >         Length in bytes    __skb_checksum cycles    skb_crc32c cycles
> >         ===============    =====================    =================
> >                      64                      579                   22
> >                    1420                     1506                  194
> >                   16384                     4365                 1682
> > 
> > So the 1420-byte linear buffer case (roughly the most common one) is 21% faster,
> 
> 1420 bytes is unlikely to be the most common case - at least for some users.
> SCTP is message oriented, so the checksum is computed over a 'user message'.
> A not-uncommon use is carrying mobile network messages (e.g. SMS) over the IP
> network (instead of TDM links).
> In that case the maximum data chunk size (what is being checksummed) is limited
> to not much over 256 bytes - and a lot of data chunks will be smaller.
> The actual difficulty is getting multiple data chunks into a single Ethernet
> packet without adding significant delays.
> 
> But the changes definitely improve things.

Interesting.  Of course, the data I gave shows that the proportional performance
increase is even greater on short packets than long ones.  I'll include those
tables when I resend the patchset and add a row for 256 bytes too.

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum()
  2025-05-11  0:41 ` [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum() Eric Biggers
@ 2025-05-15 20:02   ` Bart Van Assche
  2025-05-15 20:12     ` Eric Biggers
  2025-05-19  9:04   ` Bernard Metzler
  1 sibling, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2025-05-15 20:02 UTC (permalink / raw)
  To: Eric Biggers, netdev, Bernard Metzler
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel

On 5/10/25 5:41 PM, Eric Biggers wrote:
> Instead of calling __skb_checksum() with a skb_checksum_ops struct that
> does CRC32C, just call the new function skb_crc32c().  This is faster
> and simpler.
Bernard, since you are the owner and author of the siw driver, please 
help with reviewing this patch.

Eric, do you already have a test case for the siw driver? If not,
multiple tests in the blktests framework use this driver intensively,
including the SRP tests that I wrote myself. See also
https://github.com/osandov/blktests.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum()
  2025-05-15 20:02   ` Bart Van Assche
@ 2025-05-15 20:12     ` Eric Biggers
  2025-05-16 10:42       ` Bernard Metzler
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2025-05-15 20:12 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: netdev, Bernard Metzler, linux-nvme, linux-sctp, linux-rdma,
	linux-kernel, Daniel Borkmann, Marcelo Ricardo Leitner,
	Sagi Grimberg, Ard Biesheuvel

On Thu, May 15, 2025 at 01:02:20PM -0700, Bart Van Assche wrote:
> On 5/10/25 5:41 PM, Eric Biggers wrote:
> > Instead of calling __skb_checksum() with a skb_checksum_ops struct that
> > does CRC32C, just call the new function skb_crc32c().  This is faster
> > and simpler.
> Bernard, since you are the owner and author of the siw driver, please help
> with reviewing this patch.
> 
> Eric, do you already have a test case for the siw driver? If not,
> multiple tests in the blktests framework use this driver intensively,
> including the SRP tests that I wrote myself. See also
> https://github.com/osandov/blktests.

No.  I'll try that out when I have a chance.

If the developers/maintainers of the driver could help test it, that would be a
lot easier though.  I've been cleaning up the CRC calls across the whole kernel,
and it gets time-consuming when individual subsystems insist on me running their
custom test suite(s) and providing subsystem-specific benchmarks.

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  2025-05-11  0:41 ` [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter() Eric Biggers
@ 2025-05-16  4:36   ` Christoph Hellwig
  2025-05-16  5:31     ` Eric Biggers
  2025-05-17  9:58   ` Sagi Grimberg
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2025-05-16  4:36 UTC (permalink / raw)
  To: Eric Biggers
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Sat, May 10, 2025 at 05:41:09PM -0700, Eric Biggers wrote:
> +static inline void nvme_tcp_ddgst_init(u32 *crcp)
>  {
> +	*crcp = ~0;
>  }

This helper looks really odd.  It could just return the value, or
we could just assign it there using a seed define, e.g. this is what
XFS does:

#define XFS_CRC_SEED    (~(uint32_t)0)

And that might in fact be worth lifting to common code with a good
comment on why all-ones is used as the typical crc32 seed.

> +static inline void nvme_tcp_ddgst_final(u32 *crcp, __le32 *ddgst)
>  {
> +	*ddgst = cpu_to_le32(~*crcp);
> +}

Just return the little-endian version calculated here?

> +static inline void nvme_tcp_hdgst(void *pdu, size_t len)
> +{
> +	put_unaligned_le32(~crc32c(~0, pdu, len), pdu + len);
>  }

This could also use the seed define mentioned above.

>  	recv_digest = *(__le32 *)(pdu + hdr->hlen);
> -	nvme_tcp_hdgst(queue->rcv_hash, pdu, pdu_len);
> +	nvme_tcp_hdgst(pdu, pdu_len);
>  	exp_digest = *(__le32 *)(pdu + hdr->hlen);
>  	if (recv_digest != exp_digest) {

This code looks really weird, as it samples the on-the-wire digest
first and then overwrites it.  I'd expect to just check the on-the-wire
one against one calculated in a local variable.

Sagi, any idea what is going on here?

Otherwise this looks great.  It's always good to get rid of the
horrendous crypto API calls.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  2025-05-16  4:36   ` Christoph Hellwig
@ 2025-05-16  5:31     ` Eric Biggers
  2025-05-16  6:06       ` Christoph Hellwig
  2025-05-17 20:32       ` Sagi Grimberg
  0 siblings, 2 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-16  5:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Thu, May 15, 2025 at 09:36:00PM -0700, Christoph Hellwig wrote:
> On Sat, May 10, 2025 at 05:41:09PM -0700, Eric Biggers wrote:
> > +static inline void nvme_tcp_ddgst_init(u32 *crcp)
> >  {
> > +	*crcp = ~0;
> >  }
> 
> This helper looks really odd.  It could just return the value, or
> we could just assign it there using a seed define, e.g. this is what
> XFS does:
> 
> #define XFS_CRC_SEED    (~(uint32_t)0)
> 
> And that might in fact be worth lifting to common code with a good
> comment on why all-ones is used as the typical crc32 seed.

Yes, there is CRC8_INIT_VALUE for crc8 but nothing for crc32.  It might be worth
adding a CRC32_INIT_VALUE to <linux/crc32.h>.  Though typically there's an
inversion at the end too, which can't be abstracted out the same way...  And
many users initialize with 0 instead of ~0, even though ~0 is the "right" way.

> > +static inline void nvme_tcp_ddgst_final(u32 *crcp, __le32 *ddgst)
> >  {
> > +	*ddgst = cpu_to_le32(~*crcp);
> > +}
> 
> Just return the little-endian version calculated here?
> 
> > +static inline void nvme_tcp_hdgst(void *pdu, size_t len)
> > +{
> > +	put_unaligned_le32(~crc32c(~0, pdu, len), pdu + len);
> >  }
> 
> This could also use the seed define mentioned above.
> 
> >  	recv_digest = *(__le32 *)(pdu + hdr->hlen);
> > -	nvme_tcp_hdgst(queue->rcv_hash, pdu, pdu_len);
> > +	nvme_tcp_hdgst(pdu, pdu_len);
> >  	exp_digest = *(__le32 *)(pdu + hdr->hlen);
> >  	if (recv_digest != exp_digest) {
> 
> This code looks really weird, as it samples the on-the-wire digest
> first and then overwrites it.  I'd expect to just check the on-the-wire
> one against one calculated in a local variable.
> 
> Sagi, any idea what is going on here?

It was probably just written that way to reuse nvme_tcp_hdgst() which writes to
the hdgst field.

Anyway, I was trying to keep the translation to the new API as straightforward
as possible, but I'll take your suggestions.  The following is what I ended up
with, but I'll resend the series formally after giving it a bit more time.

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 4d64b6935bb91..7dca58f0a237b 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -82,13 +82,13 @@ config NVME_FC
 
 config NVME_TCP
 	tristate "NVM Express over Fabrics TCP host driver"
 	depends on INET
 	depends on BLOCK
+	select CRC32
+	select NET_CRC32C
 	select NVME_FABRICS
-	select CRYPTO
-	select CRYPTO_CRC32C
 	help
 	  This provides support for the NVMe over Fabrics protocol using
 	  the TCP transport.  This allows you to use remote block devices
 	  exported using the NVMe protocol set.
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index aba365f97cf6b..0c139d844422b 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -6,19 +6,19 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <linux/crc32.h>
 #include <linux/nvme-tcp.h>
 #include <linux/nvme-keyring.h>
 #include <net/sock.h>
 #include <net/tcp.h>
 #include <net/tls.h>
 #include <net/tls_prot.h>
 #include <net/handshake.h>
 #include <linux/blk-mq.h>
-#include <crypto/hash.h>
 #include <net/busy_poll.h>
 #include <trace/events/sock.h>
 
 #include "nvme.h"
 #include "fabrics.h"
@@ -166,12 +166,12 @@ struct nvme_tcp_queue {
 	bool			rd_enabled;
 
 	bool			hdr_digest;
 	bool			data_digest;
 	bool			tls_enabled;
-	struct ahash_request	*rcv_hash;
-	struct ahash_request	*snd_hash;
+	u32			rcv_crc;
+	u32			snd_crc;
 	__le32			exp_ddgst;
 	__le32			recv_ddgst;
 	struct completion       tls_complete;
 	int                     tls_err;
 	struct page_frag_cache	pf_cache;
@@ -454,36 +454,37 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue)
 
 	list_del(&req->entry);
 	return req;
 }
 
-static inline void nvme_tcp_ddgst_final(struct ahash_request *hash,
-		__le32 *dgst)
-{
-	ahash_request_set_crypt(hash, NULL, (u8 *)dgst, 0);
-	crypto_ahash_final(hash);
-}
+#define NVME_TCP_CRC_SEED (~0)
 
-static inline void nvme_tcp_ddgst_update(struct ahash_request *hash,
-		struct page *page, off_t off, size_t len)
+static inline void nvme_tcp_ddgst_update(u32 *crcp,
+		struct page *page, size_t off, size_t len)
 {
-	struct scatterlist sg;
+	page += off / PAGE_SIZE;
+	off %= PAGE_SIZE;
+	while (len) {
+		const void *vaddr = kmap_local_page(page);
+		size_t n = min(len, (size_t)PAGE_SIZE - off);
 
-	sg_init_table(&sg, 1);
-	sg_set_page(&sg, page, len, off);
-	ahash_request_set_crypt(hash, &sg, NULL, len);
-	crypto_ahash_update(hash);
+		*crcp = crc32c(*crcp, vaddr + off, n);
+		kunmap_local(vaddr);
+		page++;
+		off = 0;
+		len -= n;
+	}
 }
 
-static inline void nvme_tcp_hdgst(struct ahash_request *hash,
-		void *pdu, size_t len)
+static inline __le32 nvme_tcp_ddgst_final(u32 crc)
 {
-	struct scatterlist sg;
+	return cpu_to_le32(~crc);
+}
 
-	sg_init_one(&sg, pdu, len);
-	ahash_request_set_crypt(hash, &sg, pdu + len, len);
-	crypto_ahash_digest(hash);
+static inline __le32 nvme_tcp_hdgst(const void *pdu, size_t len)
+{
+	return cpu_to_le32(~crc32c(NVME_TCP_CRC_SEED, pdu, len));
 }
 
 static int nvme_tcp_verify_hdgst(struct nvme_tcp_queue *queue,
 		void *pdu, size_t pdu_len)
 {
@@ -497,12 +498,11 @@ static int nvme_tcp_verify_hdgst(struct nvme_tcp_queue *queue,
 			nvme_tcp_queue_id(queue));
 		return -EPROTO;
 	}
 
 	recv_digest = *(__le32 *)(pdu + hdr->hlen);
-	nvme_tcp_hdgst(queue->rcv_hash, pdu, pdu_len);
-	exp_digest = *(__le32 *)(pdu + hdr->hlen);
+	exp_digest = nvme_tcp_hdgst(pdu, pdu_len);
 	if (recv_digest != exp_digest) {
 		dev_err(queue->ctrl->ctrl.device,
 			"header digest error: recv %#x expected %#x\n",
 			le32_to_cpu(recv_digest), le32_to_cpu(exp_digest));
 		return -EIO;
@@ -524,11 +524,11 @@ static int nvme_tcp_check_ddgst(struct nvme_tcp_queue *queue, void *pdu)
 		dev_err(queue->ctrl->ctrl.device,
 			"queue %d: data digest flag is cleared\n",
 		nvme_tcp_queue_id(queue));
 		return -EPROTO;
 	}
-	crypto_ahash_init(queue->rcv_hash);
+	queue->rcv_crc = NVME_TCP_CRC_SEED;
 
 	return 0;
 }
 
 static void nvme_tcp_exit_request(struct blk_mq_tag_set *set,
@@ -924,12 +924,12 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 		/* we can read only from what is left in this bio */
 		recv_len = min_t(size_t, recv_len,
 				iov_iter_count(&req->iter));
 
 		if (queue->data_digest)
-			ret = skb_copy_and_hash_datagram_iter(skb, *offset,
-				&req->iter, recv_len, queue->rcv_hash);
+			ret = skb_copy_and_crc32c_datagram_iter(skb, *offset,
+				&req->iter, recv_len, &queue->rcv_crc);
 		else
 			ret = skb_copy_datagram_iter(skb, *offset,
 					&req->iter, recv_len);
 		if (ret) {
 			dev_err(queue->ctrl->ctrl.device,
@@ -943,11 +943,11 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 		queue->data_remaining -= recv_len;
 	}
 
 	if (!queue->data_remaining) {
 		if (queue->data_digest) {
-			nvme_tcp_ddgst_final(queue->rcv_hash, &queue->exp_ddgst);
+			queue->exp_ddgst = nvme_tcp_ddgst_final(queue->rcv_crc);
 			queue->ddgst_remaining = NVME_TCP_DIGEST_LENGTH;
 		} else {
 			if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
 				nvme_tcp_end_request(rq,
 						le16_to_cpu(req->status));
@@ -1145,11 +1145,11 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 		ret = sock_sendmsg(queue->sock, &msg);
 		if (ret <= 0)
 			return ret;
 
 		if (queue->data_digest)
-			nvme_tcp_ddgst_update(queue->snd_hash, page,
+			nvme_tcp_ddgst_update(&queue->snd_crc, page,
 					offset, ret);
 
 		/*
 		 * update the request iterator except for the last payload send
 		 * in the request where we don't want to modify it as we may
@@ -1159,12 +1159,12 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 			nvme_tcp_advance_req(req, ret);
 
 		/* fully successful last send in current PDU */
 		if (last && ret == len) {
 			if (queue->data_digest) {
-				nvme_tcp_ddgst_final(queue->snd_hash,
-					&req->ddgst);
+				req->ddgst =
+					nvme_tcp_ddgst_final(queue->snd_crc);
 				req->state = NVME_TCP_SEND_DDGST;
 				req->offset = 0;
 			} else {
 				if (h2cdata_left)
 					nvme_tcp_setup_h2c_data_pdu(req);
@@ -1192,11 +1192,11 @@ static int nvme_tcp_try_send_cmd_pdu(struct nvme_tcp_request *req)
 		msg.msg_flags |= MSG_MORE;
 	else
 		msg.msg_flags |= MSG_EOR;
 
 	if (queue->hdr_digest && !req->offset)
-		nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu));
+		*(__le32 *)(pdu + 1) = nvme_tcp_hdgst(pdu, sizeof(*pdu));
 
 	bvec_set_virt(&bvec, (void *)pdu + req->offset, len);
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
 	ret = sock_sendmsg(queue->sock, &msg);
 	if (unlikely(ret <= 0))
@@ -1205,11 +1205,11 @@ static int nvme_tcp_try_send_cmd_pdu(struct nvme_tcp_request *req)
 	len -= ret;
 	if (!len) {
 		if (inline_data) {
 			req->state = NVME_TCP_SEND_DATA;
 			if (queue->data_digest)
-				crypto_ahash_init(queue->snd_hash);
+				queue->snd_crc = NVME_TCP_CRC_SEED;
 		} else {
 			nvme_tcp_done_send_req(queue);
 		}
 		return 1;
 	}
@@ -1227,11 +1227,11 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req)
 	u8 hdgst = nvme_tcp_hdgst_len(queue);
 	int len = sizeof(*pdu) - req->offset + hdgst;
 	int ret;
 
 	if (queue->hdr_digest && !req->offset)
-		nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu));
+		*(__le32 *)(pdu + 1) = nvme_tcp_hdgst(pdu, sizeof(*pdu));
 
 	if (!req->h2cdata_left)
 		msg.msg_flags |= MSG_SPLICE_PAGES;
 
 	bvec_set_virt(&bvec, (void *)pdu + req->offset, len);
@@ -1242,11 +1242,11 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req)
 
 	len -= ret;
 	if (!len) {
 		req->state = NVME_TCP_SEND_DATA;
 		if (queue->data_digest)
-			crypto_ahash_init(queue->snd_hash);
+			queue->snd_crc = NVME_TCP_CRC_SEED;
 		return 1;
 	}
 	req->offset += ret;
 
 	return -EAGAIN;
@@ -1382,45 +1382,10 @@ static void nvme_tcp_io_work(struct work_struct *w)
 	} while (!time_after(jiffies, deadline)); /* quota is exhausted */
 
 	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
 }
 
-static void nvme_tcp_free_crypto(struct nvme_tcp_queue *queue)
-{
-	struct crypto_ahash *tfm = crypto_ahash_reqtfm(queue->rcv_hash);
-
-	ahash_request_free(queue->rcv_hash);
-	ahash_request_free(queue->snd_hash);
-	crypto_free_ahash(tfm);
-}
-
-static int nvme_tcp_alloc_crypto(struct nvme_tcp_queue *queue)
-{
-	struct crypto_ahash *tfm;
-
-	tfm = crypto_alloc_ahash("crc32c", 0, CRYPTO_ALG_ASYNC);
-	if (IS_ERR(tfm))
-		return PTR_ERR(tfm);
-
-	queue->snd_hash = ahash_request_alloc(tfm, GFP_KERNEL);
-	if (!queue->snd_hash)
-		goto free_tfm;
-	ahash_request_set_callback(queue->snd_hash, 0, NULL, NULL);
-
-	queue->rcv_hash = ahash_request_alloc(tfm, GFP_KERNEL);
-	if (!queue->rcv_hash)
-		goto free_snd_hash;
-	ahash_request_set_callback(queue->rcv_hash, 0, NULL, NULL);
-
-	return 0;
-free_snd_hash:
-	ahash_request_free(queue->snd_hash);
-free_tfm:
-	crypto_free_ahash(tfm);
-	return -ENOMEM;
-}
-
 static void nvme_tcp_free_async_req(struct nvme_tcp_ctrl *ctrl)
 {
 	struct nvme_tcp_request *async = &ctrl->async_req;
 
 	page_frag_free(async->pdu);
@@ -1449,13 +1414,10 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 	unsigned int noreclaim_flag;
 
 	if (!test_and_clear_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
 		return;
 
-	if (queue->hdr_digest || queue->data_digest)
-		nvme_tcp_free_crypto(queue);
-
 	page_frag_cache_drain(&queue->pf_cache);
 
 	noreclaim_flag = memalloc_noreclaim_save();
 	/* ->sock will be released by fput() */
 	fput(queue->sock->file);
@@ -1865,25 +1827,17 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 		}
 	}
 
 	queue->hdr_digest = nctrl->opts->hdr_digest;
 	queue->data_digest = nctrl->opts->data_digest;
-	if (queue->hdr_digest || queue->data_digest) {
-		ret = nvme_tcp_alloc_crypto(queue);
-		if (ret) {
-			dev_err(nctrl->device,
-				"failed to allocate queue %d crypto\n", qid);
-			goto err_sock;
-		}
-	}
 
 	rcv_pdu_size = sizeof(struct nvme_tcp_rsp_pdu) +
 			nvme_tcp_hdgst_len(queue);
 	queue->pdu = kmalloc(rcv_pdu_size, GFP_KERNEL);
 	if (!queue->pdu) {
 		ret = -ENOMEM;
-		goto err_crypto;
+		goto err_sock;
 	}
 
 	dev_dbg(nctrl->device, "connecting queue %d\n",
 			nvme_tcp_queue_id(queue));
 
@@ -1912,13 +1866,10 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 
 err_init_connect:
 	kernel_sock_shutdown(queue->sock, SHUT_RDWR);
 err_rcv_pdu:
 	kfree(queue->pdu);
-err_crypto:
-	if (queue->hdr_digest || queue->data_digest)
-		nvme_tcp_free_crypto(queue);
 err_sock:
 	/* ->sock will be released by fput() */
 	fput(queue->sock->file);
 	queue->sock = NULL;
 err_destroy_mutex:

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  2025-05-16  5:31     ` Eric Biggers
@ 2025-05-16  6:06       ` Christoph Hellwig
  2025-05-17 17:45         ` Eric Biggers
  2025-05-17 20:32       ` Sagi Grimberg
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2025-05-16  6:06 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Christoph Hellwig, netdev, linux-nvme, linux-sctp, linux-rdma,
	linux-kernel, Daniel Borkmann, Marcelo Ricardo Leitner,
	Sagi Grimberg, Ard Biesheuvel

On Thu, May 15, 2025 at 10:31:00PM -0700, Eric Biggers wrote:
> +static inline __le32 nvme_tcp_hdgst(const void *pdu, size_t len)
> +{
> +	return cpu_to_le32(~crc32c(NVME_TCP_CRC_SEED, pdu, len));
>  }

This drops the unaligned handling.  Now in the NVMe protocol it will
always be properly aligned, but my TCP-foo is not good enough to
remember if the networking code will also guarantee 32-bit alignment
for the start of the packet?

Otherwise this looks great:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum()
  2025-05-15 20:12     ` Eric Biggers
@ 2025-05-16 10:42       ` Bernard Metzler
  0 siblings, 0 replies; 37+ messages in thread
From: Bernard Metzler @ 2025-05-16 10:42 UTC (permalink / raw)
  To: Eric Biggers, Bart Van Assche
  Cc: netdev@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-sctp@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, Daniel Borkmann,
	Marcelo Ricardo Leitner, Sagi Grimberg, Ard Biesheuvel



> -----Original Message-----
> From: Eric Biggers <ebiggers@kernel.org>
> Sent: Thursday, May 15, 2025 10:13 PM
> To: Bart Van Assche <bvanassche@acm.org>
> Cc: netdev@vger.kernel.org; Bernard Metzler <BMT@zurich.ibm.com>; linux-
> nvme@lists.infradead.org; linux-sctp@vger.kernel.org; linux-
> rdma@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Borkmann
> <daniel@iogearbox.net>; Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com>; Sagi Grimberg <sagi@grimberg.me>; Ard
> Biesheuvel <ardb@kernel.org>
> Subject: [EXTERNAL] Re: [PATCH net-next 04/10] RDMA/siw: use skb_crc32c()
> instead of __skb_checksum()
> 
> On Thu, May 15, 2025 at 01:02:20PM -0700, Bart Van Assche wrote:
> > On 5/10/25 5:41 PM, Eric Biggers wrote:
> > > Instead of calling __skb_checksum() with a skb_checksum_ops struct that
> > > does CRC32C, just call the new function skb_crc32c().  This is faster
> > > and simpler.
> > Bernard, since you are the owner and author of the siw driver, please
> help
> > with reviewing this patch.
> >
> > Eric, do you already have a test case for the siw driver? If not,
> > multiple tests in the blktests framework use this driver intensively,
> > including the SRP tests that I wrote myself. See also
> > https://github.com/osandov/blktests .
> 
> No.  I'll try that out when I have a chance.
> 
> If the developers/maintainers of the driver could help test it, that would
> be a
> lot easier though.  I've been cleaning up the CRC calls across the whole
> kernel,
> and it gets time-consuming when individual subsystems insist on me running
> their
> custom test suite(s) and providing subsystem-specific benchmarks.
> 
> - Eric

I'll definitely test it. Sorry for the delay. You will hear from me in
the next few days. When it comes to things like checksums, I always try
to get a setup where I can test against RDMA hardware (Chelsio). Hope to
find time over the weekend.

Thanks,
Bernard.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  2025-05-11  0:41 ` [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter() Eric Biggers
  2025-05-16  4:36   ` Christoph Hellwig
@ 2025-05-17  9:58   ` Sagi Grimberg
  2025-05-17 17:29     ` Eric Biggers
  1 sibling, 1 reply; 37+ messages in thread
From: Sagi Grimberg @ 2025-05-17  9:58 UTC (permalink / raw)
  To: Eric Biggers, netdev
  Cc: linux-nvme, linux-sctp, linux-rdma, linux-kernel, Daniel Borkmann,
	Marcelo Ricardo Leitner, Ard Biesheuvel

>   #include "nvme.h"
>   #include "fabrics.h"
> @@ -166,12 +166,12 @@ struct nvme_tcp_queue {
>   	bool			rd_enabled;
>   
>   	bool			hdr_digest;
>   	bool			data_digest;
>   	bool			tls_enabled;
> -	struct ahash_request	*rcv_hash;
> -	struct ahash_request	*snd_hash;
> +	u32			rcv_crc;
> +	u32			snd_crc;

Let's call it rcv_dgst (recv digest) and snd_dgst (send digest).
Other than that, looks good to me.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  2025-05-17  9:58   ` Sagi Grimberg
@ 2025-05-17 17:29     ` Eric Biggers
  2025-05-17 20:30       ` Sagi Grimberg
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2025-05-17 17:29 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Ard Biesheuvel

On Sat, May 17, 2025 at 12:58:35PM +0300, Sagi Grimberg wrote:
> >   #include "nvme.h"
> >   #include "fabrics.h"
> > @@ -166,12 +166,12 @@ struct nvme_tcp_queue {
> >   	bool			rd_enabled;
> >   	bool			hdr_digest;
> >   	bool			data_digest;
> >   	bool			tls_enabled;
> > -	struct ahash_request	*rcv_hash;
> > -	struct ahash_request	*snd_hash;
> > +	u32			rcv_crc;
> > +	u32			snd_crc;
> >   	__le32			exp_ddgst;
> >   	__le32			recv_ddgst;
> 
> Let's call it rcv_dgst (recv digest) and snd_dgst (send digest).
> Other than that, looks good to me.
> 
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

rcv_dgst would be awfully close to recv_ddgst, which holds the on-wire digest
that is received.  I think I slightly prefer *_crc, since that helps
differentiate the in-progress values from the finalized values.

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  2025-05-16  6:06       ` Christoph Hellwig
@ 2025-05-17 17:45         ` Eric Biggers
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Biggers @ 2025-05-17 17:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Thu, May 15, 2025 at 11:06:56PM -0700, Christoph Hellwig wrote:
> On Thu, May 15, 2025 at 10:31:00PM -0700, Eric Biggers wrote:
> > +static inline __le32 nvme_tcp_hdgst(const void *pdu, size_t len)
> > +{
> > +	return cpu_to_le32(~crc32c(NVME_TCP_CRC_SEED, pdu, len));
> >  }
> 
> This drops the unaligned handling.  Now in the NVMe protocol it will
> always be properly aligned, but my TCP-foo is not good enough to
> remember if the networking code will also guarantee 32-bit alignment
> for the start of the packet?
> 
> Otherwise this looks great:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

The nvme-tcp driver already assumes that the header is at least 4-byte aligned,
considering that it accesses hdr->plen and the struct doesn't use __packed:

    struct nvme_tcp_hdr {
            __u8	type;
            __u8	flags;
            __u8	hlen;
            __u8	pdo;
            __le32	plen;
    };

On the send side, the header size is always sizeof(struct nvme_tcp_cmd_pdu) or
sizeof(struct nvme_tcp_data_pdu) which are multiples of 4.  On the receive side,
nvme-tcp validates hdr->hlen == sizeof(struct nvme_tcp_rsp_pdu) and then does:

    recv_digest = *(__le32 *)(pdu + hdr->hlen);

So, using put_unaligned_le32() is unnecessary.  I just had it there in v1
because I had directly translated crypto_ahash_digest(), which does ultimately
do a put_unaligned_le32() once you unravel all the API layers.

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  2025-05-17 17:29     ` Eric Biggers
@ 2025-05-17 20:30       ` Sagi Grimberg
  0 siblings, 0 replies; 37+ messages in thread
From: Sagi Grimberg @ 2025-05-17 20:30 UTC (permalink / raw)
  To: Eric Biggers
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Ard Biesheuvel


>> Let's call it rcv_dgst (recv digest) and snd_dgst (send digest).
>> Other than that, looks good to me.
>>
>> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> rcv_dgst would be awfully close to recv_ddgst which holds the on-wire digest
> that is received.  I think I slightly prefer *_crc, since that helps
> differentiate the in-progress values from the finalized values.

yea, sounds good.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter()
  2025-05-16  5:31     ` Eric Biggers
  2025-05-16  6:06       ` Christoph Hellwig
@ 2025-05-17 20:32       ` Sagi Grimberg
  1 sibling, 0 replies; 37+ messages in thread
From: Sagi Grimberg @ 2025-05-17 20:32 UTC (permalink / raw)
  To: Eric Biggers, Christoph Hellwig
  Cc: netdev, linux-nvme, linux-sctp, linux-rdma, linux-kernel,
	Daniel Borkmann, Marcelo Ricardo Leitner, Ard Biesheuvel


>   	if (queue->hdr_digest && !req->offset)
> -		nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu));
> +		*(__le32 *)(pdu + 1) = nvme_tcp_hdgst(pdu, sizeof(*pdu));

I'd move this assignment to a helper nvme_tcp_set_hdgst(), especially as it
has two call-sites.

Other than that,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE:  [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum()
  2025-05-11  0:41 ` [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum() Eric Biggers
  2025-05-15 20:02   ` Bart Van Assche
@ 2025-05-19  9:04   ` Bernard Metzler
  2025-05-20 13:18     ` Leon Romanovsky
  1 sibling, 1 reply; 37+ messages in thread
From: Bernard Metzler @ 2025-05-19  9:04 UTC (permalink / raw)
  To: Eric Biggers, netdev@vger.kernel.org
  Cc: linux-nvme@lists.infradead.org, linux-sctp@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel



> -----Original Message-----
> From: Eric Biggers <ebiggers@kernel.org>
> Sent: Sunday, May 11, 2025 2:41 AM
> To: netdev@vger.kernel.org
> Cc: linux-nvme@lists.infradead.org; linux-sctp@vger.kernel.org; linux-
> rdma@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Borkmann
> <daniel@iogearbox.net>; Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com>; Sagi Grimberg <sagi@grimberg.me>; Ard
> Biesheuvel <ardb@kernel.org>
> Subject: [EXTERNAL] [PATCH net-next 04/10] RDMA/siw: use skb_crc32c()
> instead of __skb_checksum()
> 
> From: Eric Biggers <ebiggers@google.com>
> 
> Instead of calling __skb_checksum() with a skb_checksum_ops struct that
> does CRC32C, just call the new function skb_crc32c().  This is faster
> and simpler.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  drivers/infiniband/sw/siw/Kconfig |  1 +
>  drivers/infiniband/sw/siw/siw.h   | 22 +---------------------
>  2 files changed, 2 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
> index ae4a953e2a03..186f182b80e7 100644
> --- a/drivers/infiniband/sw/siw/Kconfig
> +++ b/drivers/infiniband/sw/siw/Kconfig
> @@ -1,10 +1,11 @@
>  config RDMA_SIW
>  	tristate "Software RDMA over TCP/IP (iWARP) driver"
>  	depends on INET && INFINIBAND
>  	depends on INFINIBAND_VIRT_DMA
>  	select CRC32
> +	select NET_CRC32C
>  	help
>  	This driver implements the iWARP RDMA transport over
>  	the Linux TCP/IP network stack. It enables a system with a
>  	standard Ethernet adapter to interoperate with a iWARP
>  	adapter or with another system running the SIW driver.
> diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
> index 385067e07faf..d9e5a2e4c471 100644
> --- a/drivers/infiniband/sw/siw/siw.h
> +++ b/drivers/infiniband/sw/siw/siw.h
> @@ -691,33 +691,13 @@ static inline void siw_crc_oneshot(const void *data, size_t len, u8 out[4])
>  	siw_crc_init(&crc);
>  	siw_crc_update(&crc, data, len);
>  	return siw_crc_final(&crc, out);
>  }
> 
> -static inline __wsum siw_csum_update(const void *buff, int len, __wsum sum)
> -{
> -	return (__force __wsum)crc32c((__force __u32)sum, buff, len);
> -}
> -
> -static inline __wsum siw_csum_combine(__wsum csum, __wsum csum2, int offset,
> -				      int len)
> -{
> -	return (__force __wsum)crc32c_combine((__force __u32)csum,
> -					      (__force __u32)csum2, len);
> -}
> -
>  static inline void siw_crc_skb(struct siw_rx_stream *srx, unsigned int len)
>  {
> -	const struct skb_checksum_ops siw_cs_ops = {
> -		.update = siw_csum_update,
> -		.combine = siw_csum_combine,
> -	};
> -	__wsum crc = (__force __wsum)srx->mpa_crc;
> -
> -	crc = __skb_checksum(srx->skb, srx->skb_offset, len, crc,
> -			     &siw_cs_ops);
> -	srx->mpa_crc = (__force u32)crc;
> +	srx->mpa_crc = skb_crc32c(srx->skb, srx->skb_offset, len, srx->mpa_crc);
>  }
> 
>  #define siw_dbg(ibdev, fmt, ...) \
>  	ibdev_dbg(ibdev, "%s: " fmt, __func__, ##__VA_ARGS__)
> 
> --
> 2.49.0
> 

Thanks Eric!
Works fine. Correct checksum tested against siw and cxgb4 peers.

Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum()
  2025-05-19  9:04   ` Bernard Metzler
@ 2025-05-20 13:18     ` Leon Romanovsky
  2025-05-20 15:18       ` Eric Biggers
  0 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2025-05-20 13:18 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Eric Biggers, netdev@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-sctp@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Mon, May 19, 2025 at 09:04:04AM +0000, Bernard Metzler wrote:
> 

<...>

> > 
> 
> Thanks Eric!
> Works fine. Correct checksum tested against siw and cxgb4 peers.
> 
> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>

This patch should go through RDMA repository, Please resend it.

Thanks

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum()
  2025-05-20 13:18     ` Leon Romanovsky
@ 2025-05-20 15:18       ` Eric Biggers
  2025-05-21 10:38         ` Leon Romanovsky
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Biggers @ 2025-05-20 15:18 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Bernard Metzler, netdev@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-sctp@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Tue, May 20, 2025 at 04:18:41PM +0300, Leon Romanovsky wrote:
> On Mon, May 19, 2025 at 09:04:04AM +0000, Bernard Metzler wrote:
> > 
> 
> <...>
> 
> > > 
> > 
> > Thanks Eric!
> > Works fine. Correct checksum tested against siw and cxgb4 peers.
> > 
> > Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
> 
> This patch should go through RDMA repository, Please resend it.
> 
> Thanks

It depends on patches 1-2, and patches 6-7 depend on this one.  So your proposal
would require that we drag this out over 3 cycles (patches 1-3,5,8-10 in net in
6.16, patch 4 in RDMA in 6.17, patches 6-7 in net in 6.18).  Can we please just
take the whole series through net in 6.16?  There aren't any conflicts.

- Eric

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum()
  2025-05-20 15:18       ` Eric Biggers
@ 2025-05-21 10:38         ` Leon Romanovsky
  0 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-05-21 10:38 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Bernard Metzler, netdev@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-sctp@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Borkmann, Marcelo Ricardo Leitner, Sagi Grimberg,
	Ard Biesheuvel

On Tue, May 20, 2025 at 08:18:17AM -0700, Eric Biggers wrote:
> On Tue, May 20, 2025 at 04:18:41PM +0300, Leon Romanovsky wrote:
> > On Mon, May 19, 2025 at 09:04:04AM +0000, Bernard Metzler wrote:
> > > 
> > 
> > <...>
> > 
> > > > 
> > > 
> > > Thanks Eric!
> > > Works fine. Correct checksum tested against siw and cxgb4 peers.
> > > 
> > > Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
> > 
> > This patch should go through RDMA repository, Please resend it.
> > 
> > Thanks
> 
> It depends on patches 1-2, and patches 6-7 depend on this one.  So your proposal
> would require that we drag this out over 3 cycles (patches 1-3,5,8-10 in net in
> 6.16, patch 4 in RDMA in 6.17, patches 6-7 in net in 6.18).  Can we please just
> take the whole series through net in 6.16?  There aren't any conflicts.

No problem, if it helps.

Acked-by: Leon Romanovsky <leon@kernel.org>

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2025-05-21 10:38 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-11  0:41 [PATCH net-next 00/10] net: faster and simpler CRC32C computation Eric Biggers
2025-05-11  0:41 ` [PATCH net-next 01/10] net: introduce CONFIG_NET_CRC32C Eric Biggers
2025-05-11  0:41 ` [PATCH net-next 02/10] net: add skb_crc32c() Eric Biggers
2025-05-11  0:41 ` [PATCH net-next 03/10] net: use skb_crc32c() in skb_crc32c_csum_help() Eric Biggers
2025-05-11  0:41 ` [PATCH net-next 04/10] RDMA/siw: use skb_crc32c() instead of __skb_checksum() Eric Biggers
2025-05-15 20:02   ` Bart Van Assche
2025-05-15 20:12     ` Eric Biggers
2025-05-16 10:42       ` Bernard Metzler
2025-05-19  9:04   ` Bernard Metzler
2025-05-20 13:18     ` Leon Romanovsky
2025-05-20 15:18       ` Eric Biggers
2025-05-21 10:38         ` Leon Romanovsky
2025-05-11  0:41 ` [PATCH net-next 05/10] sctp: " Eric Biggers
2025-05-11  0:41 ` [PATCH net-next 06/10] net: fold __skb_checksum() into skb_checksum() Eric Biggers
2025-05-11  0:41 ` [PATCH net-next 07/10] lib/crc32: remove unused support for CRC32C combination Eric Biggers
2025-05-11  0:41 ` [PATCH net-next 08/10] net: add skb_copy_and_crc32c_datagram_iter() Eric Biggers
2025-05-13 21:41   ` Jakub Kicinski
2025-05-15 18:09     ` Eric Biggers
2025-05-11  0:41 ` [PATCH net-next 09/10] nvme-tcp: use crc32c() and skb_copy_and_crc32c_datagram_iter() Eric Biggers
2025-05-16  4:36   ` Christoph Hellwig
2025-05-16  5:31     ` Eric Biggers
2025-05-16  6:06       ` Christoph Hellwig
2025-05-17 17:45         ` Eric Biggers
2025-05-17 20:32       ` Sagi Grimberg
2025-05-17  9:58   ` Sagi Grimberg
2025-05-17 17:29     ` Eric Biggers
2025-05-17 20:30       ` Sagi Grimberg
2025-05-11  0:41 ` [PATCH net-next 10/10] net: remove skb_copy_and_hash_datagram_iter() Eric Biggers
2025-05-11 16:30 ` [PATCH net-next 00/10] net: faster and simpler CRC32C computation Andrew Lunn
2025-05-11 17:29   ` Eric Biggers
2025-05-11 21:22     ` Andrew Lunn
2025-05-11 21:45       ` Ard Biesheuvel
2025-05-11 23:07         ` Eric Biggers
2025-05-15 19:21           ` David Laight
2025-05-15 19:50             ` Eric Biggers
2025-05-13 21:40 ` Jakub Kicinski
2025-05-15 18:10   ` Eric Biggers
