From: David Kastrup <dak@gnu.org>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
Siddharth Goel <siddharth98391@gmail.com>,
Git Mailing List <git@vger.kernel.org>,
Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v3] skip_prefix: rewrite so that prefix is scanned once
Date: Tue, 04 Mar 2014 10:18:26 +0100 [thread overview]
Message-ID: <87txbeiaot.fsf@fencepost.gnu.org> (raw)
In-Reply-To: <20140304015819.GA10643@duynguyen-vnpc.dek-tpc.internal> (Duy Nguyen's message of "Tue, 4 Mar 2014 08:58:19 +0700")
Duy Nguyen <pclouds@gmail.com> writes:
> On Tue, Mar 04, 2014 at 01:09:39AM +0100, David Kastrup wrote:
>> Duy Nguyen <pclouds@gmail.com> writes:
>>
>> > On Tue, Mar 4, 2014 at 5:43 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> >> diff --git a/git-compat-util.h b/git-compat-util.h
>> >> index cbd86c3..68ffaef 100644
>> >> --- a/git-compat-util.h
>> >> +++ b/git-compat-util.h
>> >> @@ -357,8 +357,14 @@ extern int suffixcmp(const char *str, const char *suffix);
>> >>
>> >> static inline const char *skip_prefix(const char *str, const char *prefix)
>> >> {
>> >> - size_t len = strlen(prefix);
>> >> - return strncmp(str, prefix, len) ? NULL : str + len;
>> >
>> > Just a note. gcc does optimize strlen("abcdef") to 6, and with that
>> > information at compile time built-in strncmp might do better.
>>
>> Indeed, most (but not all) of the calls have a constant string as
>> prefix. However, strncmp in each iteration checks for both *str as well
>> as *prefix to be different from '\0' independently (and it appears
>> unlikely to me that the optimizer will figure out that it's unnecessary
>> for either) _and_ compares them for equality so it's not likely to be
>> faster than the open-coded loop.
>>
>> One could, however, use memcmp instead of strncmp. I'm just not sure
>> whether memcmp is guaranteed not to peek beyond the first mismatching
>> byte even if the count would allow for more. It could lead to undefined
>> behavior if the first mismatching byte would be the ending NUL byte of
>> str.
>
> It turns out gcc does not generate a call to strncmp either. It
> inlines repz cmpsb instead.
Oh wow. So it _does_ know that it's not necessary to check for a NUL
byte when the length of one argument is already known. I am seriously
impressed.
> I recall we had a discussion long ago about the inefficiency of repz
> and and open-coded loop is preferred,
I think that this mostly applies for Pentium I, possibly also the
dead-ended Pentium Pro architecture (that sort-of translated the x86
opcodes into RISC instructions). I think that later processor variants
(and AMD anyway) got those instructions back to usable shape.
One thing where there was a _lot_ of performance difference between
open-coding and builtin was using memcpy (repz movb) for copying
well-aligned data bytewise rather than copying, say, integer arrays
element-wise.
But since that was egg-on-face material, the hardware got better at it.
And anyway, GCC should know what to pick here. So with GCC being as
smart as that (using the equivalent of memcmp on its own initiative
instead of the full strncmp), I don't think we have a reasonable chance
to beat its performance with an open-coded variant.
> produces this assembly with gcc -O2 (on x86, apparently)
>
> -- 8< --
> 00000000 <main>:
> 0: 55 push %ebp
> 1: b9 03 00 00 00 mov $0x3,%ecx
> 6: 89 e5 mov %esp,%ebp
> 8: 57 push %edi
> 9: bf 00 00 00 00 mov $0x0,%edi
> e: 56 push %esi
> f: 53 push %ebx
> 10: 83 e4 f0 and $0xfffffff0,%esp
> 13: 83 ec 10 sub $0x10,%esp
> 16: 8b 45 0c mov 0xc(%ebp),%eax
> 19: 8b 40 04 mov 0x4(%eax),%eax
> 1c: 89 c6 mov %eax,%esi
> 1e: f3 a6 repz cmpsb %es:(%edi),%ds:(%esi)
> 20: 0f 97 c3 seta %bl
> 23: 0f 92 c1 setb %cl
> 26: 83 c0 03 add $0x3,%eax
> 29: 31 d2 xor %edx,%edx
> 2b: 38 cb cmp %cl,%bl
> 2d: 0f 44 d0 cmove %eax,%edx
More like i686 than x86 here.
> 30: 89 14 24 mov %edx,(%esp)
> 33: e8 fc ff ff ff call 34 <main+0x34>
> 38: 8d 65 f4 lea -0xc(%ebp),%esp
> 3b: 31 c0 xor %eax,%eax
> 3d: 5b pop %ebx
> 3e: 5e pop %esi
> 3f: 5f pop %edi
> 40: 5d pop %ebp
> 41: c3 ret
Well, we won't beat this here.
--
David Kastrup
prev parent reply other threads:[~2014-03-04 9:18 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-03 3:13 [PATCH v3] skip_prefix: rewrite so that prefix is scanned once Siddharth Goel
2014-03-03 19:05 ` Junio C Hamano
2014-03-03 22:43 ` Junio C Hamano
2014-03-03 23:22 ` David Kastrup
2014-03-03 23:35 ` Junio C Hamano
2014-03-03 23:37 ` Duy Nguyen
2014-03-04 0:09 ` David Kastrup
2014-03-04 1:58 ` Duy Nguyen
2014-03-04 9:18 ` David Kastrup [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87txbeiaot.fsf@fencepost.gnu.org \
--to=dak@gnu.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=pclouds@gmail.com \
--cc=siddharth98391@gmail.com \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.