Re: Big git diff speedup by avoiding x86 "fast string" memcmp

From: Nick Piggin <npiggin@gmail.com>
To: George Spelvin <linux@horizon.com>
Cc: bharrosh@panasas.com, linux-arch@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Big git diff speedup by avoiding x86 "fast string" memcmp
Date: Mon, 20 Dec 2010 02:46:07 +1100
Message-ID: <AANLkTikwUEBd1Ez75pQ7gw7ShZitwQOQcH6CvCdJV2np@mail.gmail.com>
In-Reply-To: <20101218225436.28264.qmail@science.horizon.com>

On Sun, Dec 19, 2010 at 9:54 AM, George Spelvin <linux@horizon.com> wrote:
>> static inline int dentry_memcmp_long(const unsigned char *cs,
>>                               const unsigned char *ct, ssize_t count)
>> {
>>       int ret;
>>       const unsigned long *ls = (const unsigned long *)cs;
>>       const unsigned long *lt = (const unsigned long *)ct;
>>
>>       while (count > 8) {
>>               ret = (*cs != *ct);
>>               if (ret)
>>                       break;
>>               cs++;
>>               ct++;
>>               count-=8;
>>       }
>>       if (count) {
>>               unsigned long t = *ct & ((0xffffffffffffffff >> ((8 - count) * 8))
>>               ret = (*cs != t)
>>       }
>>
>>       return ret;
>> }
>
> First, let's get the code right, and use correct types, but also, there

You still used the wrong vars in the loop.

> are some tricks to reduce the masking cost.
>
> As long as you have to mask one string, *and* don't have to worry about
> running off the end of mapped memory, there's no additional cost to
> masking both in the loop.  Just test (a ^ b) & mask.

I considered using a lookup table, but maybe not carefully enough.
It costs another cacheline, but one that is common to all lookups.
So it could well be worth it; let's keep your code around...
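
For reference, a minimal sketch of the word-at-a-time compare with
the loop variables fixed and the tail handled by (a ^ b) & mask
taken from a lookup table. It is not the code from either message.
The names name_cmp_words, load_word and tail_mask are hypothetical,
and the sketch assumes a little-endian CPU and that both buffers
are readable up to the next multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * tail_mask[n] keeps the first n bytes of a 64-bit word as loaded on a
 * little-endian CPU (an assumption of this sketch). Entry 0 is unused.
 */
static const uint64_t tail_mask[9] = {
	0x0000000000000000ULL, 0x00000000000000ffULL,
	0x000000000000ffffULL, 0x0000000000ffffffULL,
	0x00000000ffffffffULL, 0x000000ffffffffffULL,
	0x0000ffffffffffffULL, 0x00ffffffffffffffULL,
	0xffffffffffffffffULL,
};

/* Load one word without alignment or aliasing problems. */
static uint64_t load_word(const unsigned char *p)
{
	uint64_t w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/*
 * Returns 0 if the first count bytes match, nonzero otherwise.
 * Both buffers must be readable up to the next multiple of 8 bytes.
 */
static int name_cmp_words(const unsigned char *cs, const unsigned char *ct,
			  size_t count)
{
	assert(cs && ct);

	/* Full words: compare the loaded words, not the byte pointers. */
	while (count > 8) {
		if (load_word(cs) != load_word(ct))
			return 1;
		cs += 8;
		ct += 8;
		count -= 8;
	}
	if (count == 0)
		return 0;

	/* Tail: mask the difference once instead of masking each operand. */
	return ((load_word(cs) ^ load_word(ct)) & tail_mask[count]) != 0;
}
```

The table is the extra cacheline mentioned above. Replacing it with
a shift of ~0ULL by (8 - count) * 8 bits trades that cacheline for
a little arithmetic.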

The big problem for CPUs that don't do well on this type of code is
what the string goes through during the entire syscall.

First, a byte-by-byte strcpy_from_user of the whole name string to
kernel space. Then byte-by-byte chunking and hashing of the path
components, split at '/'. Then a byte-by-byte memcmp against the
dentry name.

I'd love to do everything with 8 byte loads, do the component
separation and hashing at the same time as copy from user, and
have the padded and aligned component strings and their hash
available... but complexity.
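
To make the fused pass concrete, here is a rough userspace sketch
that copies a path, splits it at '/' and hashes each component in
one loop. It stays byte-at-a-time for clarity, so it shows only the
"do it all in one pass" half of the idea, not the 8 byte loads.
The names copy_split_hash and struct component are hypothetical,
and FNV-1a is only a stand-in for the kernel's real name hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct component {
	size_t start;	/* offset of the component in the copied buffer */
	size_t len;	/* length in bytes, excluding any '/' */
	uint32_t hash;	/* FNV-1a here, only as a stand-in hash */
};

/*
 * Copy src into dst and, in the same pass, split it at '/' and hash
 * each component. Returns the number of components found, or -1 if
 * dst or comps is too small.
 */
static int copy_split_hash(const char *src, char *dst, size_t dst_size,
			   struct component *comps, size_t max_comps)
{
	size_t i, start = 0, n = 0;
	uint32_t hash = 2166136261u;	/* FNV-1a offset basis */

	assert(dst_size > 0);
	for (i = 0; ; i++) {
		unsigned char c = (unsigned char)src[i];

		if (i >= dst_size)
			return -1;
		dst[i] = (char)c;

		if (c == '/' || c == '\0') {
			if (i > start) {	/* skip empty components */
				if (n == max_comps)
					return -1;
				comps[n].start = start;
				comps[n].len = i - start;
				comps[n].hash = hash;
				n++;
			}
			if (c == '\0')
				return (int)n;
			start = i + 1;
			hash = 2166136261u;	/* restart for next component */
		} else {
			hash = (hash ^ c) * 16777619u;	/* FNV-1a step */
		}
	}
}
```

For "fs/namei.c" this yields two components, at offsets 0 and 3,
with lengths 2 and 7.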

On my Westmere system, the time to do a stat is 640 cycles plus 10
cycles for every byte in the string (this cost holds perfectly
for names from 1 byte up to 32 bytes, my test range).
The average path name in `git diff` is 31 bytes, although that
workload is much less cache friendly and its paths span several
components (my test uses just a single component).

But even if the base cost were doubled, the kernel may still
spend 20% or so of its cycles in name string handling.

This 8 byte memcmp takes my microbenchmark down to 8 cycles per
byte, so it may gain several more % on git diff.
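
The arithmetic behind those estimates can be checked with a small
linear cost model. This is only a back-of-envelope sketch: the
helper names are hypothetical, and it treats a 31 byte path as a
single component, which the git diff case is not.

```c
#include <assert.h>

/* Cycles for one stat: a fixed base cost plus a per-byte name cost. */
static unsigned stat_cycles(unsigned base, unsigned per_byte,
			    unsigned name_len)
{
	return base + per_byte * name_len;
}

/* Percentage of those cycles spent on the name string, rounded down. */
static unsigned name_share_pct(unsigned base, unsigned per_byte,
			       unsigned name_len)
{
	unsigned total = stat_cycles(base, per_byte, name_len);

	assert(total > 0);
	return 100 * per_byte * name_len / total;
}
```

With the measured numbers, a 31 byte name costs 640 + 31 * 10 = 950
cycles, of which about 32% is string handling. Doubling the base
cost gives 1280 + 310 = 1590 cycles and a share of about 19.5%,
which is the "20% or so" figure. At 8 cycles per byte the doubled
case drops to 1280 + 248 = 1528 cycles, roughly 4% fewer.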

Some careful thought about the initial strcpy_from_user and the
hashing code could shave another few cycles off it. Well
worth investigating, I think.
