linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: Nick Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	hooanon05@yahoo.co.jp, npiggin@kernel.dk,
	torvalds@linux-foundation.org, linux-arch@vger.kernel.org,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: Big git diff speedup by avoiding x86 "fast string" memcmp
Date: Thu, 16 Dec 2010 16:03:44 +0200	[thread overview]
Message-ID: <4D0A1C40.7060502@panasas.com> (raw)
In-Reply-To: <AANLkTin3v0BLYAuuqTgEzKgSOpVg4GJaJCEds=2ePrBH@mail.gmail.com>

On 12/16/2010 03:13 PM, Nick Piggin wrote:
> On Thu, Dec 16, 2010 at 8:53 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
>> On 12/15/2010 08:00 PM, David Miller wrote:
>>> From: Boaz Harrosh <bharrosh@panasas.com>
>>> Date: Wed, 15 Dec 2010 15:15:09 +0200
>>>
>>>> I agree that the byte-compare or long-compare should give you very close
>>>> results in modern pipeline CPUs. But surly 12 increments-and-test should
>>>> show up against 3 (or even 2). I would say it must be a better plan.
>>>
>>> For strings of these lengths the setup code necessary to initialize
>>> the inner loop and the tail code to handle the sub-word ending cases
>>> eliminate whatever gains there are.
>>>
>>
>> You miss understood me. I'm saying that we know the beggining of the
>> string is aligned and Nick offered to pad the last long, so surly
>> a shift by 2 (or 3) + the reduction of the 12 dec-and-test to 3
>> should give you an optimization?
> 
> Masking is still going to take a bit more code, but perhaps. We're talking
> a handful of cycles here, so if you add a branch mispredict, or icache
> miss, you'll kill your savings.
> 
> This is what I've got at the moment, which adds only 8 bytes over the
> rep cmp variant, in the __d_lookup_rcu function.
> 
> static inline int dentry_memcmp(const unsigned char *cs,
>                                const unsigned char *ct, size_t count)
> {
>        int ret;
>        do {
>                ret = (*cs != *ct);
>                if (ret)
>                        break;
>                cs++;
>                ct++;
>                count--;
>        } while (count);
>        return ret;
> }
> 
> Now if we pad and zero fill the dentry name, then we can compare with
> the path string, but we still have to mask that guy (unfortunately, I
> didn't consider that earlier) so it's not trivial and adds quite a bit of code
> size and branches:
> 
> static inline int dentry_memcmp_long(const unsigned char *cs,
>                                const unsigned char *ct, ssize_t count)
> {
>         const unsigned long *ls = (const unsigned long *)cs;
>         const unsigned long *lt = (const unsigned long *)ct;
> 
>        int ret;
>        do {
>                 unsigned long t = *lt;
>                 unsigned long s = *ls;
>                 int shift = 0;
> 
>                 if (count < 8)
>                         shift = (8 - count) * 8;
>                 t = t & (0xffffffffffffffff >> shift);
>                ret = (s != t);
>                if (ret)
>                        break;
>                ls++;
>                lt++;
>                count-=8;
>        } while (count > 0);
>        return ret;
> }
> 
> Something like this should work on little endian. You'd have to coax gcc to
> generate a cmov to get rid of that branch I think, because it won't be
> predictable for small string lengths. But then you have to think about
> icache...

static inline int dentry_memcmp_long(const unsigned char *cs,
                                const unsigned char *ct, ssize_t count)
{
        int ret;
	const unsigned long *ls = (const unsigned long *)cs;
	const unsigned long *lt = (const unsigned long *)ct;

        while (count > 8) {
                ret = (*cs != *ct);
                if (ret)
                        break;
                cs++;
                ct++;
                count-=8;
        }
	if (count) {
		unsigned long t = *ct & ((0xffffffffffffffff >> ((8 - count) * 8))
		ret = (*cs != t)
	}

        return ret;
}

Same as yours but just to avoid the branch inside the loop
and slightly smaller code.

BTW: On some ARCHs ++foo is faster than foo++
     and Also, is there a reason not to write the above loop as:

     while(c-- && (ret = (*cs++ != *ct++)))
		;

Thanks
Boaz

  reply	other threads:[~2010-12-16 14:03 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-09  7:09 Big git diff speedup by avoiding x86 "fast string" memcmp Nick Piggin
2010-12-09 13:37 ` Borislav Petkov
2010-12-10  2:38   ` Nick Piggin
2010-12-10  4:27 ` Nick Piggin
2010-12-10 14:23 ` J. R. Okajima
2010-12-13  1:45   ` Nick Piggin
2010-12-13  7:29     ` J. R. Okajima
2010-12-13  8:25       ` Nick Piggin
2010-12-14 19:01         ` J. R. Okajima
2010-12-15  4:06           ` Nick Piggin
2010-12-15  5:57             ` J. R. Okajima
2010-12-15 13:15             ` Boaz Harrosh
2010-12-15 18:00               ` David Miller
2010-12-16  9:53                 ` Boaz Harrosh
2010-12-16 13:13                   ` Nick Piggin
2010-12-16 14:03                     ` Boaz Harrosh [this message]
2010-12-16 14:15                       ` Nick Piggin
2010-12-16 16:51                   ` Linus Torvalds
2010-12-16 17:57                   ` David Miller
2010-12-15  4:38         ` Américo Wang
2010-12-15  5:54           ` Nick Piggin
2010-12-15  7:12             ` Linus Torvalds
2010-12-15 23:09 ` Tony Luck
2010-12-16  2:34   ` Nick Piggin
  -- strict thread matches above, loose matches on Subject: below --
2010-12-18 22:54 George Spelvin
2010-12-19 14:28 ` Boaz Harrosh
2010-12-19 15:46 ` Nick Piggin
2010-12-19 17:06   ` George Spelvin
2010-12-21  9:26     ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D0A1C40.7060502@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=davem@davemloft.net \
    --cc=hooanon05@yahoo.co.jp \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@gmail.com \
    --cc=npiggin@kernel.dk \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).