From: David Laight <David.Laight@ACULAB.COM>
To: David Laight <David.Laight@ACULAB.COM>,
	'Noah Goldstein' <goldstein.w.n@gmail.com>,
	'Eric Dumazet' <edumazet@google.com>
Cc: "'tglx@linutronix.de'" <tglx@linutronix.de>,
	"'mingo@redhat.com'" <mingo@redhat.com>,
	'Borislav Petkov' <bp@alien8.de>,
	"'dave.hansen@linux.intel.com'" <dave.hansen@linux.intel.com>,
	'X86 ML' <x86@kernel.org>, "'hpa@zytor.com'" <hpa@zytor.com>,
	"'peterz@infradead.org'" <peterz@infradead.org>,
	"'alexanderduyck@fb.com'" <alexanderduyck@fb.com>,
	'open list' <linux-kernel@vger.kernel.org>,
	'netdev' <netdev@vger.kernel.org>
Subject: RE: [PATCH] lib/x86: Optimise csum_partial of buffers that are not multiples of 8 bytes.
Date: Tue, 14 Dec 2021 12:36:07 +0000
Message-ID: <3107b1e365f34df080feefb68be8a422@AcuMS.aculab.com>
In-Reply-To: <f1cd1a19878248f09e2e7cffe88c8191@AcuMS.aculab.com>

From: David Laight <David.Laight@ACULAB.COM>
> Sent: 13 December 2021 18:01
> 
> Add in the trailing bytes first so that there is no need to worry
> about the sum exceeding 64 bits.

This is an alternate version that (mostly) compiles to reasonable code.
I've also booted a kernel with it - networking still works!

https://godbolt.org/z/K6vY31Gqs

I changed the while (len >= 64) loop into an
if (len >= 64) do { ... } while (len >= 64) one.
But gcc makes a pig's breakfast of compiling it - it optimises
it so that it is while (ptr < lim) but adds a lot of code.
So I've done that by hand.
Then it still makes a meal of it because it refuses to take
'buff' from the final loop iteration.
An assignment to the limit helps.
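
For reference, a rough sketch of the two loop shapes being discussed.
This is not the code from the godbolt link: add_block64() is a
hypothetical stand-in for the real 64-byte add-with-carry block, and
the 'buff = lim' line is only a guess at what "an assignment to the
limit" means.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the unrolled adc block: adds eight
 * 64-bit words into 'sum' with end-around carry. */
static uint64_t add_block64(uint64_t sum, const unsigned char *p)
{
	for (int i = 0; i < 8; i++) {
		uint64_t w;

		memcpy(&w, p + 8 * i, sizeof(w));
		sum += w;
		sum += (sum < w);	/* fold the carry back in */
	}
	return sum;
}

/* Original shape: test the length at the top of every iteration. */
uint64_t sum_while(const unsigned char *buff, size_t len,
		   const unsigned char **tail)
{
	uint64_t sum = 0;

	while (len >= 64) {
		sum = add_block64(sum, buff);
		buff += 64;
		len -= 64;
	}
	*tail = buff;
	return sum;
}

/* Hand-converted shape: one length test, then a pointer-vs-limit loop. */
uint64_t sum_ptr_limit(const unsigned char *buff, size_t len,
		       const unsigned char **tail)
{
	uint64_t sum = 0;

	if (len >= 64) {
		const unsigned char *lim = buff + (len & ~(size_t)63);

		do {
			sum = add_block64(sum, buff);
			buff += 64;
		} while (buff < lim);
		/* Assumed form of the "assignment to the limit":
		 * buff already equals lim here. */
		buff = lim;
	}
	*tail = buff;
	return sum;
}
```

Both functions return the same sum and leave *tail pointing at the
first byte that is not part of a whole 64-byte block.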

Then there is the calculation of (8 - (len & 7)) * 8.
gcc prior to 9.2 just negates (len & 7) then uses leal 56(,%rsi,8),%rcx.
But later versions fail to notice.
Even given (64 + 8 * -(len & 7)) clang fails to use leal.
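
To show what that shift count is for, here is a minimal sketch of
picking up the trailing (len & 7) bytes by reading the last 8 bytes
of the buffer and shifting the unwanted ones out. The names
load_le64() and trailing_bytes() are made up for the example, and it
assumes len >= 8 so that the 8-byte read stays inside the buffer.
load_le64() is written portably; on x86-64 it would normally be a
single 8-byte load.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble 8 bytes as a little-endian 64-bit value. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Return the last (len & 7) bytes of the buffer as a zero-extended
 * little-endian value. Requires len >= 8. */
uint64_t trailing_bytes(const unsigned char *buff, size_t len)
{
	unsigned int n = len & 7;

	if (n == 0)
		return 0;	/* avoids an undefined shift by 64 in C */

	/* The wanted bytes are the top n bytes of this word; shifting
	 * right by (8 - n) * 8 discards the bytes already summed. */
	return load_le64(buff + len - 8) >> ((8 - n) * 8);
}
```

The shift count (8 - n) * 8 is the same value as 64 + 8 * -n, which
is the form that can be done with a negate followed by a single lea.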

I'm not even sure the code clang generates is right:
(%rsi is (len & 7))
        movq    -8(%rsi,%rax), %rdx
        leal    (,%rsi,8), %ecx
        andb    $56, %cl
        negb    %cl
        shrq    %cl, %rdx

The 'negb' is on the wrong side of the 'andb'.
It might be ok if it is assuming the cpu ignores the high 2 bits of %cl.
But that is a horrid assumption to be making.
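
A small sketch that models the byte arithmetic of the clang sequence
above, to see what ends up in %cl. The function names are invented
for the example. effective_shift() models the assumption in question,
that the shift only looks at the low 6 bits of %cl, which is how the
x86-64 instruction reference describes 64-bit shifts.

```c
#include <stdint.h>

/* Model of: leal (,%rsi,8),%ecx ; andb $56,%cl ; negb %cl
 * where n is (len & 7). */
uint8_t clang_count(unsigned int n)
{
	uint8_t cl = (uint8_t)(n * 8);

	cl &= 56;		/* andb $56, %cl */
	cl = (uint8_t)-cl;	/* negb %cl: high 2 bits set when n != 0 */
	return cl;
}

/* Shift count actually used if only the low 6 bits of %cl count. */
unsigned int effective_shift(uint8_t cl)
{
	return cl & 63;
}
```

For n = 1 this leaves 248 in %cl, and 248 & 63 is 56, which matches
(8 - 1) * 8. So the result is only right because of the masking.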

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
