From: Eric Biggers <ebiggers@kernel.org>
To: David Laight <david.laight.linux@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>, Andrew Lunn <andrew@lunn.ch>,
netdev@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-sctp@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org,
Daniel Borkmann <daniel@iogearbox.net>,
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH net-next 00/10] net: faster and simpler CRC32C computation
Date: Thu, 15 May 2025 12:50:51 -0700
Message-ID: <20250515195051.GK1411@quark>
In-Reply-To: <20250515202136.32b4f456@pumpkin>

On Thu, May 15, 2025 at 08:21:36PM +0100, David Laight wrote:
> On Sun, 11 May 2025 16:07:50 -0700
> Eric Biggers <ebiggers@kernel.org> wrote:
>
> > On Sun, May 11, 2025 at 11:45:14PM +0200, Ard Biesheuvel wrote:
> > > On Sun, 11 May 2025 at 23:22, Andrew Lunn <andrew@lunn.ch> wrote:
> > > >
> > > > On Sun, May 11, 2025 at 10:29:29AM -0700, Eric Biggers wrote:
> > > > > On Sun, May 11, 2025 at 06:30:25PM +0200, Andrew Lunn wrote:
> > > > > > On Sat, May 10, 2025 at 05:41:00PM -0700, Eric Biggers wrote:
> > > > > > > Update networking code that computes the CRC32C of packets to just call
> > > > > > > crc32c() without unnecessary abstraction layers. The result is faster
> > > > > > > and simpler code.
> > > > > >
> > > > > > Hi Eric
> > > > > >
> > > > > > Do you have some benchmarks for these changes?
> > > > > >
> > > > > > Andrew
> > > > >
> > > > > Do you want benchmarks that show that removing the indirect calls makes things
> > > > > faster? I think that should be fairly self-evident by now after dealing with
> > > > > retpoline for years, but I can provide more details if you need them.
> > > >
> > > > I was thinking more like iperf before/after? Show the CPU load has gone
> > > > down without the bandwidth also going down.
> > > >
> > > > Eric Dumazet has a T-Shirt with a commit message on the back which
> > > > increased network performance by X%. At the moment, there is nothing
> > > > T-Shirt quotable here.
> > > >
> > >
> > > I think that removing layers of redundant code to ultimately call the
> > > same core CRC-32 implementation is a rather obvious win, especially
> > > when indirect calls are involved. The diffstat speaks for itself, so
> > > maybe you can print that on a T-shirt.
> >
> > Agreed with Ard. I did try doing some SCTP benchmarks with iperf3 earlier, but
> > they were very noisy and the CRC32C checksumming seemed to be lost in the noise.
> > There probably are some tricks to running reliable networking benchmarks; I'm
> > not a networking developer. Regardless, this series is a clear win for the
> > CRC32C code, both from a simplicity and performance perspective. It also fixes
> > the kconfig dependency issues. That should be good enough, IMO.
> >
> > In case it's helpful, here are some microbenchmarks of __skb_checksum (old) vs
> > skb_crc32c (new):
> >
> > Linear sk_buffs
> >
> >     Length in bytes    __skb_checksum cycles    skb_crc32c cycles
> >     ===============    =====================    =================
> >                  64                       43                   18
> >                1420                      204                  161
> >               16384                     1735                 1642
> >
> > Nonlinear sk_buffs (even split between head and one fragment)
> >
> >     Length in bytes    __skb_checksum cycles    skb_crc32c cycles
> >     ===============    =====================    =================
> >                  64                      579                   22
> >                1420                     1506                  194
> >               16384                     4365                 1682
> >
> > So 1420-byte linear buffers (roughly the most common case) is 21% faster,
>
> 1420 bytes is unlikely to be the most common case - at least for some users.
> SCTP is message oriented so the checksum is over a 'user message'.
> A non-uncommon use is carrying mobile network messages (eg SMS) over the IP
> network (instead of TDM links).
> In that case the maximum data chunk size (what is being checksummed) is limited
> to not much over 256 bytes - and a lot of data chunks will be smaller.
> The actual difficulty is getting multiple data chunks into a single ethernet
> packet without adding significant delays.
>
> But the changes definitely improve things.

Interesting. Of course, the data I gave shows that the proportional performance
increase is even greater on short packets than long ones. I'll include those
tables when I resend the patchset and add a row for 256 bytes too.
- Eric