public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: David Laight <David.Laight@ACULAB.COM>
To: 'Linus Torvalds' <torvalds@linux-foundation.org>,
	Nikolay Borisov <nborisov@suse.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>
Subject: RE: [PATCH] lib/string: Bring optimized memcmp from glibc
Date: Fri, 23 Jul 2021 14:02:18 +0000	[thread overview]
Message-ID: <c31eb59df5f8426aaf0009ab15587cee@AcuMS.aculab.com> (raw)
In-Reply-To: <CAHk-=wgMyXh3gGuSzj_Dgw=Gn_XPxGSTPq6Pz7dEyx6JNuAh9g@mail.gmail.com>

From: Linus Torvalds
> Sent: 21 July 2021 19:46
> 
> On Wed, Jul 21, 2021 at 11:17 AM Nikolay Borisov <nborisov@suse.com> wrote:
> >
> > I find it somewhat arbitrary that we choose to align the 2nd pointer and
> > not the first.
> 
> Yeah, that's a bit odd, but I don't think it matters.
> 
> The hope is obviously that they are mutually aligned, and in that case
> it doesn't matter which one you aim to align.
> 
> > So you are saying that the current memcmp could indeed use improvement
> > but you don't want it to be based on the glibc's code due to the ugly
> > misalignment handling?
> 
> Yeah. I suspect that this (very simple) patch gives you the same
> performance improvement that the glibc code does.
> 
> NOTE! I'm not saying this patch is perfect. This one doesn't even
> _try_ to do the mutual alignment, because it's really silly. But I'm
> throwing this out here for discussion, because
> 
>  - it's really simple
> 
>  - I suspect it gets you 99% of the way there
> 
>  - the code generation is actually quite good with both gcc and clang.
> This is gcc:
> 
>         memcmp:
>                 jmp     .L60
>         .L52:
>                 movq    (%rsi), %rax
>                 cmpq    %rax, (%rdi)
>                 jne     .L53
>                 addq    $8, %rdi
>                 addq    $8, %rsi
>                 subq    $8, %rdx
>         .L60:
>                 cmpq    $7, %rdx
>                 ja      .L52

I wonder how fast that can be made to run.
I think the two conditional branches have to run in separate clocks.
So you may get all 5 arithmetic operations to run in the same 2 clocks.
But that may be pushing things on everything except the very latest cpu.
The memory reads aren't limiting at all, the cpu can do two per clock.
So even though (IIRC) misaligned ones cost an extra clock it doesn't matter.

That looks like a +dst++ = *src++ loop.
The array copy dst[i] = src[i]; i++ requires one less 'addq'
provided the cpu has 'register + register' addressing.
Not decrementing the length also saves an 'addq'.
So the loop:
	for (i = 0; i < length - 7; i += 8)
		dst[i] = src[i];  /* Hacked to be right in C */
probably only has one addq and one cmpq per iteration.
That is much more likely to run in the 2 clocks.
(If you can persuade gcc not to transform it!)

It may also be possible to remove the cmpq by arranging
that the flags from the addq contain the right condition.
That needs something like:
	dst += len; src += len; len = -len
	do
		dst[len] = src[len];
	while ((len += 8) < 0);
That probably isn't necessary for x86, but is likely to help sparc.

For mips-like cpu (with 'compare and jump', only 'reg + constant'
addressing) you really want a loop like:
	dst_end = dst + length;
	do
		*dst++ = *src++;
	while (dst < dst_end);
This has two adds and a compare per iteration.
That might be a good compromise for aligned copies.

I'm not at all sure is it ever worth aligning either pointer
if misaligned reads don't fault.
Most compares (of any size) will be aligned.
So you get the 'hit' of the test when it cannot help.
That almost certainly exceeds any benefit.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

  parent reply	other threads:[~2021-07-23 14:03 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-21 13:59 [PATCH] lib/string: Bring optimized memcmp from glibc Nikolay Borisov
2021-07-21 14:32 ` Christoph Hellwig
2021-07-21 14:35   ` Nikolay Borisov
2021-07-21 14:42     ` Christoph Hellwig
2021-07-21 14:51       ` Matthew Wilcox
2021-07-21 15:17         ` Peter.Enderborg
2021-07-21 15:34           ` Matthew Wilcox
2021-07-21 15:39             ` Peter.Enderborg
2021-07-21 18:00 ` Linus Torvalds
2021-07-21 18:17   ` Nikolay Borisov
2021-07-21 18:45     ` Linus Torvalds
2021-07-21 18:55       ` Linus Torvalds
2021-07-21 19:26       ` Linus Torvalds
2021-07-22 11:28         ` Nikolay Borisov
2021-07-22 16:40           ` Linus Torvalds
2021-07-22 17:03             ` Nikolay Borisov
2021-08-26  9:03             ` Nikolay Borisov
2021-07-22  8:28       ` Nikolay Borisov
2021-07-23 14:02       ` David Laight [this message]
2021-07-21 20:10   ` David Sterba
2021-07-21 20:27     ` Linus Torvalds
2021-07-22  5:54       ` Nikolay Borisov
2021-07-28 20:12 ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c31eb59df5f8426aaf0009ab15587cee@AcuMS.aculab.com \
    --to=david.laight@aculab.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nborisov@suse.com \
    --cc=ndesaulniers@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox