RE: [PATCH v1] x86/csum: rewrite csum_partial()

From: David Laight <David.Laight@ACULAB.COM>
To: 'Eric Dumazet' <edumazet@google.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	netdev <netdev@vger.kernel.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: RE: [PATCH v1] x86/csum: rewrite csum_partial()
Date: Sun, 14 Nov 2021 19:09:54 +0000
Message-ID: <31bd81df79c4488c92c6a149eeceee3c@AcuMS.aculab.com>
In-Reply-To: <CANn89iJtqTGuJL6JgfOAuHxbkej9faURhj3yf2a9Y43Uh_4+Kg@mail.gmail.com>

From: Eric Dumazet
> Sent: 14 November 2021 15:04
> 
> On Sun, Nov 14, 2021 at 6:44 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Eric Dumazet
> > > Sent: 11 November 2021 22:31
> > ..
> > > That requires an extra add32_with_carry(), which unfortunately made
> > > the thing slower for me.
> > >
> > > I even hardcoded an inline fast_csum_40bytes() and got best results
> > > with the 10+1 addl,
> > > instead of
> > >  (5 + 1) adcq +  mov (needing one extra  register) + shift + addl + adcl
> >
> > Did you try something like:
> >         sum = buf[0];
> >         val = buf[1];
> >         asm(
> >                 add64 sum, val
> >                 adc64 sum, buf[2]
> >                 adc64 sum, buf[3]
> >                 adc64 sum, buf[4]
> >                 adc64 sum, 0
> >         )
> >         sum_hi = sum >> 32;
> >         asm(
> >                 add32 sum, sum_hi
> >                 adc32 sum, 0
> >         )
> 
> This is what I tried, but the last part was using add32_with_carry(),
> and clang was adding stupid mov to temp variable on the stack,
> killing the perf.

Persuading the compiler to generate the required assembler is an art!
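
For reference, the quoted sequence can be written as plain C rather than
inline asm. This is only a sketch: csum_40bytes() and add64_with_carry()
are names made up here, and nothing guarantees that a given compiler turns
it into the add/adc chain shown in the quote.

```c
#include <stdint.h>

/* 64-bit add that feeds the carry-out back into bit 0 (end-around carry). */
static uint64_t add64_with_carry(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);	/* (s < a) is 1 exactly when the add wrapped */
}

/* Hypothetical helper: sum five 64-bit words (40 bytes) to a 32-bit partial sum. */
static uint32_t csum_40bytes(const uint64_t buf[5])
{
	uint64_t sum = buf[0];
	uint32_t lo, hi, res;
	int i;

	for (i = 1; i < 5; i++)
		sum = add64_with_carry(sum, buf[i]);

	/* Fold 64 -> 32: the "add32 sum, sum_hi; adc32 sum, 0" step. */
	lo = (uint32_t)sum;
	hi = (uint32_t)(sum >> 32);
	res = lo + hi;
	res += res < hi;
	return res;
}
```

Comparing the generated code for this against the hand-written asm is
probably the quickest way to see where the compiler adds extra moves.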

I also ended up using __builtin_bswap32(sum) when the alignment
was 'odd' - the shift expression didn't always get converted
to a rotate. Byteswap32 DTRT.
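
A small sketch of why the byteswap is a valid substitute for the rotate.
The helper names are made up for illustration; __builtin_bswap32() is the
GCC/Clang builtin. Rotating the 32-bit sum by 8 bits and byte-swapping it
give different 32-bit values, but the two 16-bit halves add up to the same
number, so the folded checksum is identical.

```c
#include <stdint.h>

/* Fold a 32-bit partial sum to 16 bits with end-around carry. */
static uint16_t fold32(uint32_t s)
{
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Odd-alignment fixup written as a shift expression (rotate left by 8). */
static uint16_t fixup_rotate(uint32_t sum)
{
	return fold32((sum << 8) | (sum >> 24));
}

/* The same fixup using the byteswap builtin. */
static uint16_t fixup_bswap(uint32_t sum)
{
	return fold32(__builtin_bswap32(sum));
}
```

With bytes A B C D in the sum, the rotate folds BC + DA and the byteswap
folds DC + BA, and both are (B + D) * 256 + (A + C).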

I also noticed that any initial checksum was being added in at the end.
The 64bit code can almost always handle a 32 bit (or maybe 56bit!)
input value and add it in 'for free' into the code that does the
initial alignment.
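
To illustrate the point about the initial checksum, here is a sketch with
made-up function names that ignores alignment entirely. It only shows that
starting the 64-bit accumulator from the 32-bit input value folds to the
same result as adding the input value in at the end, which is what lets it
ride along with whatever add is done first.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit add with end-around carry. */
static uint64_t add64_wrap(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);
}

/* Fold a 64-bit partial sum down to 16 bits. */
static uint16_t fold64_to_16(uint64_t s)
{
	s = (s & 0xffffffff) + (s >> 32);
	s = (s & 0xffffffff) + (s >> 32);
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Initial value added in at the end: one extra add after the loop. */
static uint16_t csum_seed_last(const uint64_t *w, size_t n, uint32_t init)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum = add64_wrap(sum, w[i]);
	return fold64_to_16(add64_wrap(sum, init));
}

/* Initial value used as the starting accumulator: no extra add. */
static uint16_t csum_seed_first(const uint64_t *w, size_t n, uint32_t init)
{
	uint64_t sum = init;
	size_t i;

	for (i = 0; i < n; i++)
		sum = add64_wrap(sum, w[i]);
	return fold64_to_16(sum);
}
```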

I don't remember testing misaligned buffers.
But I think it doesn't matter (on any cpu anyone cares about!).
Even Sandy Bridge can do two memory reads in one clock,
so it should be able to do a single misaligned read every clock.
Which almost certainly means that aligning the addresses is pointless.
(Given you're not trying to do the adcx/adox loop.)
(Page spanning shouldn't matter.)

For buffers that aren't a multiple of 8 bytes it might be best to
read the last 8 bytes first and shift left to discard the ones that
would get added in twice.
This value can be added to the 32bit 'input' checksum.
Something like:
	sum_in += buf[length - 8] << (64 - (length & 7) * 8);
Annoyingly a special case is needed for buffers shorter than 8 bytes
to avoid falling off the start of a page.
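
A portable sketch of the tail-first idea, with made-up names throughout.
It assumes each 64-bit word is assembled most-significant-byte first, so
the bytes that would be counted twice sit at the top of the last word and
the left shift above discards them. With native little-endian loads on x86
those bytes sit at the bottom of the word instead, so the discard would be
a right shift. The sketch requires len >= 8 and leaves out the short-buffer
special case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble 8 bytes, most significant byte first. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* 64-bit add with end-around carry. */
static uint64_t add64_eac(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);
}

/* Checksum of buf[0..len), len >= 8, seeded with a 32-bit input sum. */
static uint16_t csum_tail_first(const unsigned char *buf, size_t len,
				uint32_t sum_in)
{
	uint64_t sum = sum_in;
	size_t tail = len & 7;
	size_t i;

	if (tail) {
		/* Last 8 bytes; shift out the ones the main loop also covers. */
		uint64_t last = load_be64(buf + len - 8);

		sum = add64_eac(sum, last << (64 - tail * 8));
	}
	for (i = 0; i + 8 <= len; i += 8)
		sum = add64_eac(sum, load_be64(buf + i));

	/* Fold 64 -> 32 -> 16 bits. */
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

The tail == 0 test is not just an optimisation: a shift by 64 is undefined
in C, so the multiple-of-8 case has to skip the tail word anyway.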

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

