Re: Big git diff speedup by avoiding x86 "fast string" memcmp

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Boaz Harrosh <bharrosh@panasas.com>
To: Nick Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	hooanon05@yahoo.co.jp, npiggin@kernel.dk,
	torvalds@linux-foundation.org, linux-arch@vger.kernel.org,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: Big git diff speedup by avoiding x86 "fast string" memcmp
Date: Thu, 16 Dec 2010 16:03:44 +0200	[thread overview]
Message-ID: <4D0A1C40.7060502@panasas.com> (raw)
In-Reply-To: <AANLkTin3v0BLYAuuqTgEzKgSOpVg4GJaJCEds=2ePrBH@mail.gmail.com>

On 12/16/2010 03:13 PM, Nick Piggin wrote:
> On Thu, Dec 16, 2010 at 8:53 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
>> On 12/15/2010 08:00 PM, David Miller wrote:
>>> From: Boaz Harrosh <bharrosh@panasas.com>
>>> Date: Wed, 15 Dec 2010 15:15:09 +0200
>>>
>>>> I agree that the byte-compare or long-compare should give you very close
>>>> results in modern pipeline CPUs. But surly 12 increments-and-test should
>>>> show up against 3 (or even 2). I would say it must be a better plan.
>>>
>>> For strings of these lengths the setup code necessary to initialize
>>> the inner loop and the tail code to handle the sub-word ending cases
>>> eliminate whatever gains there are.
>>>
>>
>> You miss understood me. I'm saying that we know the beggining of the
>> string is aligned and Nick offered to pad the last long, so surly
>> a shift by 2 (or 3) + the reduction of the 12 dec-and-test to 3
>> should give you an optimization?
> 
> Masking is still going to take a bit more code, but perhaps. We're talking
> a handful of cycles here, so if you add a branch mispredict, or icache
> miss, you'll kill your savings.
> 
> This is what I've got at the moment, which adds only 8 bytes over the
> rep cmp variant, in the __d_lookup_rcu function.
> 
> static inline int dentry_memcmp(const unsigned char *cs,
>                                const unsigned char *ct, size_t count)
> {
>        int ret;
>        do {
>                ret = (*cs != *ct);
>                if (ret)
>                        break;
>                cs++;
>                ct++;
>                count--;
>        } while (count);
>        return ret;
> }
> 
> Now if we pad and zero fill the dentry name, then we can compare with
> the path string, but we still have to mask that guy (unfortunately, I
> didn't consider that earlier) so it's not trivial and adds quite a bit of code
> size and branches:
> 
> static inline int dentry_memcmp_long(const unsigned char *cs,
>                                const unsigned char *ct, ssize_t count)
> {
>         const unsigned long *ls = (const unsigned long *)cs;
>         const unsigned long *lt = (const unsigned long *)ct;
> 
>        int ret;
>        do {
>                 unsigned long t = *lt;
>                 unsigned long s = *ls;
>                 int shift = 0;
> 
>                 if (count < 8)
>                         shift = (8 - count) * 8;
>                 t = t & (0xffffffffffffffff >> shift);
>                ret = (s != t);
>                if (ret)
>                        break;
>                ls++;
>                lt++;
>                count-=8;
>        } while (count > 0);
>        return ret;
> }
> 
> Something like this should work on little endian. You'd have to coax gcc to
> generate a cmov to get rid of that branch I think, because it won't be
> predictable for small string lengths. But then you have to think about
> icache...

static inline int dentry_memcmp_long(const unsigned char *cs,
                                const unsigned char *ct, ssize_t count)
{
        int ret;
	const unsigned long *ls = (const unsigned long *)cs;
	const unsigned long *lt = (const unsigned long *)ct;

        while (count > 8) {
                ret = (*cs != *ct);
                if (ret)
                        break;
                cs++;
                ct++;
                count-=8;
        }
	if (count) {
		unsigned long t = *ct & ((0xffffffffffffffff >> ((8 - count) * 8))
		ret = (*cs != t)
	}

        return ret;
}

Same as yours but just to avoid the branch inside the loop
and slightly smaller code.

BTW: On some ARCHs ++foo is faster than foo++
     and Also, is there a reason not to write the above loop as:

     while(c-- && (ret = (*cs++ != *ct++)))
		;

Thanks
Boaz

next prev parent reply	other threads:[~2010-12-16 14:03 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-09  7:09 Big git diff speedup by avoiding x86 "fast string" memcmp Nick Piggin
2010-12-09 13:37 ` Borislav Petkov
2010-12-10  2:38   ` Nick Piggin
2010-12-10  2:38   ` Nick Piggin
2010-12-10  2:38     ` Nick Piggin
2010-12-10  4:27 ` Nick Piggin
2010-12-10 14:23 ` J. R. Okajima
2010-12-13  1:45   ` Nick Piggin
2010-12-13  7:29     ` J. R. Okajima
2010-12-13  8:25       ` Nick Piggin
2010-12-14 19:01         ` J. R. Okajima
2010-12-15  4:06           ` Nick Piggin
2010-12-15  5:57             ` J. R. Okajima
2010-12-15 13:15             ` Boaz Harrosh
2010-12-15 18:00               ` David Miller
2010-12-16  9:53                 ` Boaz Harrosh
2010-12-16 13:13                   ` Nick Piggin
2010-12-16 14:03                     ` Boaz Harrosh [this message]
2010-12-16 14:15                       ` Nick Piggin
2010-12-16 16:51                   ` Linus Torvalds
2010-12-16 17:57                   ` David Miller
2010-12-15  4:38         ` Américo Wang
2010-12-15  5:54           ` Nick Piggin
2010-12-15  7:12             ` Linus Torvalds
2010-12-15 23:09 ` Tony Luck
2010-12-16  2:34   ` Nick Piggin
  -- strict thread matches above, loose matches on Subject: below --
2010-12-18 22:54 George Spelvin
2010-12-19 14:28 ` Boaz Harrosh
2010-12-19 15:46 ` Nick Piggin
2010-12-19 17:06   ` George Spelvin
2010-12-21  9:26     ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D0A1C40.7060502@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=davem@davemloft.net \
    --cc=hooanon05@yahoo.co.jp \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@gmail.com \
    --cc=npiggin@kernel.dk \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.