* Re: [PATCH] Optimize prefixcmp()
2007-12-29 19:22 ` [PATCH] Optimize prefixcmp() Johannes Schindelin
@ 2007-12-29 20:39 ` Marco Costalba
2007-12-29 22:15 ` Johannes Schindelin
2007-12-29 21:54 ` Andy Parkins
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Marco Costalba @ 2007-12-29 20:39 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Git Mailing List
On Dec 29, 2007 8:22 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> Not only does it avoid the strlen() call also for longer prefixes;
> it also avoids a C++ comment.
>
Avoiding a C++ comment is good ;-) sorry, it slipped to me.
What your patch does not seem to avoid is a segfault if prefix or str
are NULL pointers.
Marco
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-29 20:39 ` Marco Costalba
@ 2007-12-29 22:15 ` Johannes Schindelin
2007-12-29 22:44 ` Marco Costalba
2007-12-30 13:02 ` Marco Costalba
0 siblings, 2 replies; 20+ messages in thread
From: Johannes Schindelin @ 2007-12-29 22:15 UTC (permalink / raw)
To: Marco Costalba; +Cc: Git Mailing List
Hi,
On Sat, 29 Dec 2007, Marco Costalba wrote:
> What your patch does not seem to avoid is a segfault if prefix or str
> are NULL pointers.
I am quite certain that it is not allowed to pass NULL pointers to strcmp,
and even if it was, I maintain that it is bad style.
FWIW the test suite seems to agree with me, as it passes with my patch.
However, since you already seem to have a profiling setup ready, I would
be interested in some numbers, i.e. if this patch is faster for you or
slower, or shows no effect at all.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-29 22:15 ` Johannes Schindelin
@ 2007-12-29 22:44 ` Marco Costalba
2007-12-30 13:02 ` Marco Costalba
1 sibling, 0 replies; 20+ messages in thread
From: Marco Costalba @ 2007-12-29 22:44 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Git Mailing List
On Dec 29, 2007 11:15 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> However, since you already seem to have a profiling setup ready, I would
> be interested in some numbers, i.e. if this patch is faster for you or
> slower, or shows no effect at all.
>
Ok. I will do some tests with your patch and I'll let you know.
Marco
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-29 22:15 ` Johannes Schindelin
2007-12-29 22:44 ` Marco Costalba
@ 2007-12-30 13:02 ` Marco Costalba
2007-12-30 13:55 ` Pierre Habouzit
2007-12-30 15:54 ` Johannes Schindelin
1 sibling, 2 replies; 20+ messages in thread
From: Marco Costalba @ 2007-12-30 13:02 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Git Mailing List
On Dec 29, 2007 11:15 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> However, since you already seem to have a profiling setup ready, I would
> be interested in some numbers, i.e. if this patch is faster for you or
> slower, or shows no effect at all.
>
Yes Johannes, your patch is faster then mine ;-)
These are the results tested on Linux tree:
Vanilla
[marco@localhost linux-2.6]$ time git log --topo-order --no-color
--parents -z --log-size --boundary
--pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" HEAD > /dev/null
3.61user 0.09system 0:03.70elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27155minor)pagefaults 0swaps
Marco's path
[marco@localhost linux-2.6]$ time git log --topo-order --no-color
--parents -z --log-size --boundary
--pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" HEAD > /dev/null
3.21user 0.08system 0:03.30elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27154minor)pagefaults 0swaps
Johannes's patch
[marco@localhost linux-2.6]$ time git log --topo-order --no-color
--parents -z --log-size --boundary
--pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" HEAD > /dev/null
2.92user 0.08system 0:03.01elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27155minor)pagefaults 0swaps
But that's not the end of the story....
After profiling I have found a better yet patch :-)
-------------------- CUT ABOVE --------------------
Subject: [PATCH] Certain codepaths (notably "git log --pretty=format...") use
prefixcmp() extensively, with very short prefixes. In those cases,
calling strlen() is a wasteful operation, so avoid it.
Initial patch by Johannes Schindelin.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
---
git-compat-util.h | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/git-compat-util.h b/git-compat-util.h
index 79eb10e..843a8f5 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -398,7 +398,16 @@ static inline int sane_case(int x, int high)
static inline int prefixcmp(const char *str, const char *prefix)
{
- return strncmp(str, prefix, strlen(prefix));
+ do {
+ if (*str != *prefix)
+ return *(unsigned const char *)prefix - *(unsigned const char *)str;
+
+ if (!*(++prefix))
+ return 0;
+
+ str++;
+
+ } while (1);
}
static inline int strtoul_ui(char const *s, int base, unsigned int *result)
--
1.5.4.rc2-dirty
BTW the results with this profiled patch are the followings:
Marco's patch TAKE 2 (profiled one)
[marco@localhost linux-2.6]$ time git log --topo-order --no-color
--parents -z --log-size --boundary
--pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" HEAD > /dev/null
2.89user 0.07system 0:02.96elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27154minor)pagefaults 0swaps
Not a big improvement, but an improvement in any case because the
check for (*prefix==0) and for (*str != *prefix) are swapped regarding
your patch, this means that in the common case of a failing match (as
happens where you are looking for a specific prefix in a string
vector) with this patch you avoid the (*prefix==0) comparison because
prefixcmp() exsits just after the (*str != *prefix).
Of course we need that the *prefix is not "", but we have already
ruled out prefix == NULL, so It does not seem a biggie...
Thanks...it was very fun!
Marco
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-30 13:02 ` Marco Costalba
@ 2007-12-30 13:55 ` Pierre Habouzit
2007-12-30 13:58 ` Pierre Habouzit
2007-12-30 15:54 ` Johannes Schindelin
1 sibling, 1 reply; 20+ messages in thread
From: Pierre Habouzit @ 2007-12-30 13:55 UTC (permalink / raw)
To: Marco Costalba; +Cc: Johannes Schindelin, Git Mailing List
[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]
On Sun, Dec 30, 2007 at 01:02:28PM +0000, Marco Costalba wrote:
> Subject: [PATCH] Certain codepaths (notably "git log --pretty=format...") use
>
> prefixcmp() extensively, with very short prefixes. In those cases,
> calling strlen() is a wasteful operation, so avoid it.
>
> Initial patch by Johannes Schindelin.
>
> Signed-off-by: Marco Costalba <mcostalba@gmail.com>
> ---
> git-compat-util.h | 11 ++++++++++-
> 1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 79eb10e..843a8f5 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -398,7 +398,16 @@ static inline int sane_case(int x, int high)
>
> static inline int prefixcmp(const char *str, const char *prefix)
> {
> - return strncmp(str, prefix, strlen(prefix));
> + do {
> + if (*str != *prefix)
> + return *(unsigned const char *)prefix - *(unsigned const char *)str;
> +
> + if (!*(++prefix))
> + return 0;
> +
> + str++;
> +
> + } while (1);
This code doesn't work if prefix is "". You want something like:
for (; *prefix; prefix++, str++) {
if (*str != *prefix)
return *(unsigned const char *)prefix - *(unsigned const char *)str;
}
return 0;
--
·O· Pierre Habouzit
··O madcoder@debian.org
OOO http://www.madism.org
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-30 13:55 ` Pierre Habouzit
@ 2007-12-30 13:58 ` Pierre Habouzit
2007-12-30 14:50 ` Marco Costalba
0 siblings, 1 reply; 20+ messages in thread
From: Pierre Habouzit @ 2007-12-30 13:58 UTC (permalink / raw)
To: Marco Costalba, Johannes Schindelin, Git Mailing List
[-- Attachment #1: Type: text/plain, Size: 1748 bytes --]
On Sun, Dec 30, 2007 at 01:55:57PM +0000, Pierre Habouzit wrote:
> On Sun, Dec 30, 2007 at 01:02:28PM +0000, Marco Costalba wrote:
> > Subject: [PATCH] Certain codepaths (notably "git log --pretty=format...") use
> >
> > prefixcmp() extensively, with very short prefixes. In those cases,
> > calling strlen() is a wasteful operation, so avoid it.
> >
> > Initial patch by Johannes Schindelin.
> >
> > Signed-off-by: Marco Costalba <mcostalba@gmail.com>
> > ---
> > git-compat-util.h | 11 ++++++++++-
> > 1 files changed, 10 insertions(+), 1 deletions(-)
> >
> > diff --git a/git-compat-util.h b/git-compat-util.h
> > index 79eb10e..843a8f5 100644
> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -398,7 +398,16 @@ static inline int sane_case(int x, int high)
> >
> > static inline int prefixcmp(const char *str, const char *prefix)
> > {
> > - return strncmp(str, prefix, strlen(prefix));
> > + do {
> > + if (*str != *prefix)
> > + return *(unsigned const char *)prefix - *(unsigned const char *)str;
> > +
> > + if (!*(++prefix))
> > + return 0;
> > +
> > + str++;
> > +
> > + } while (1);
>
> This code doesn't work if prefix is "". You want something like:
>
> for (; *prefix; prefix++, str++) {
> if (*str != *prefix)
> return *(unsigned const char *)prefix - *(unsigned const char *)str;
> }
> return 0;
Which happens to be basically the same than what Dscho wrote, though I
suppose the compiler can compile that more efficiently than his code.
--
·O· Pierre Habouzit
··O madcoder@debian.org
OOO http://www.madism.org
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-30 13:58 ` Pierre Habouzit
@ 2007-12-30 14:50 ` Marco Costalba
2007-12-30 15:17 ` Marco Costalba
0 siblings, 1 reply; 20+ messages in thread
From: Marco Costalba @ 2007-12-30 14:50 UTC (permalink / raw)
To: Pierre Habouzit, Marco Costalba, Johannes Schindelin,
Git Mailing List
On Dec 30, 2007 2:58 PM, Pierre Habouzit <madcoder@debian.org> wrote:
> >
> > This code doesn't work if prefix is "". You want something like:
> >
> > for (; *prefix; prefix++, str++) {
> > if (*str != *prefix)
> > return *(unsigned const char *)prefix - *(unsigned const char *)str;
> > }
> > return 0;
>
> Which happens to be basically the same than what Dscho wrote, though I
> suppose the compiler can compile that more efficiently than his code.
>
Yes, your version covers the *prefix == "" case too. If this case is
important for us we could use something as
static inline int prefixcmp(const char *str, const char *prefix)
{
do {
if (*str != *prefix)
return (!*prefix ? 0 : *(unsigned const char *)prefix - *(unsigned
const char *)str);
if (!*(++prefix))
return 0;
str++;
} while (1);
}
But your code is *surely* nicer then this one. But, for unknown
reasons, this code happens to be faster, probably as you say the
compiler optimizes away the second check in the return statement so
that this version is slightly faster then the 'for' loop one, but
admitelly we are going to much in the academic now.
If *prefix == "" case is to be considered I vote for your/Johannes
version because it's "better code" (tm).
Marco
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-30 14:50 ` Marco Costalba
@ 2007-12-30 15:17 ` Marco Costalba
0 siblings, 0 replies; 20+ messages in thread
From: Marco Costalba @ 2007-12-30 15:17 UTC (permalink / raw)
To: Pierre Habouzit, Marco Costalba, Johannes Schindelin,
Git Mailing List
On Dec 30, 2007 3:50 PM, Marco Costalba <mcostalba@gmail.com> wrote:
>
> If *prefix == "" case is to be considered I vote for your/Johannes
> version because it's "better code" (tm).
>
Ok this is fast and correct
static inline int prefixcmp(const char *str, const char *prefix)
{
while (*str == *prefix && *prefix)
str++, prefix++;
return (*prefix ? *(unsigned const char *)prefix - *(unsigned const
char *)str : 0);
}
This is the last one, I promise ;-)
Marco
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-30 13:02 ` Marco Costalba
2007-12-30 13:55 ` Pierre Habouzit
@ 2007-12-30 15:54 ` Johannes Schindelin
1 sibling, 0 replies; 20+ messages in thread
From: Johannes Schindelin @ 2007-12-30 15:54 UTC (permalink / raw)
To: Marco Costalba; +Cc: Git Mailing List
Hi,
On Sun, 30 Dec 2007, Marco Costalba wrote:
> Initial patch by Johannes Schindelin.
Not true ;-)
Ciao,
Dscho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-29 19:22 ` [PATCH] Optimize prefixcmp() Johannes Schindelin
2007-12-29 20:39 ` Marco Costalba
@ 2007-12-29 21:54 ` Andy Parkins
2007-12-30 0:44 ` Junio C Hamano
2008-01-02 16:59 ` René Scharfe
3 siblings, 0 replies; 20+ messages in thread
From: Andy Parkins @ 2007-12-29 21:54 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Marco Costalba, Git Mailing List
On Saturday 2007, December 29, Johannes Schindelin wrote:
> Not only does it avoid the strlen() call also for longer prefixes;
> it also avoids a C++ comment.
I'm sure it doesn't matter; but they're allowed in C99. So it's not a C++
comment any more :-)
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-29 19:22 ` [PATCH] Optimize prefixcmp() Johannes Schindelin
2007-12-29 20:39 ` Marco Costalba
2007-12-29 21:54 ` Andy Parkins
@ 2007-12-30 0:44 ` Junio C Hamano
2008-01-02 16:59 ` René Scharfe
3 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2007-12-30 0:44 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Marco Costalba, Git Mailing List
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Certain codepaths (notably "git log --pretty=format...") use
> prefixcmp() extensively, with very short prefixes. In those cases,
> calling strlen() is a wasteful operation, so avoid it.
>
> Initial patch by Marco Costalba.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>
> On Sat, 29 Dec 2007, Marco Costalba wrote:
>
> > In case the prefix string is a single char avoid a costly call
> > to strlen() + strncmp()
>
> Could you test this patch, please?
>
> Not only does it avoid the strlen() call also for longer prefixes;
> it also avoids a C++ comment.
>
> git-compat-util.h | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 79eb10e..7059cbd 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -398,7 +398,11 @@ static inline int sane_case(int x, int high)
>
> static inline int prefixcmp(const char *str, const char *prefix)
> {
> - return strncmp(str, prefix, strlen(prefix));
> + for (; ; str++, prefix++)
> + if (!*prefix)
> + return 0;
> + else if (*str != *prefix)
> + return (unsigned char)*prefix - (unsigned char)*str;
> }
Losing the unnecessary check for !str || !prefix is a good
change.
While I think, for the readability's sake, Marco's original
without the unnecessary check would be the way to go, a profile
from your totally inlined version would also be interesting, as
it may or may not beat the underlying strncmp(), which could be
highly optimized.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2007-12-29 19:22 ` [PATCH] Optimize prefixcmp() Johannes Schindelin
` (2 preceding siblings ...)
2007-12-30 0:44 ` Junio C Hamano
@ 2008-01-02 16:59 ` René Scharfe
2008-01-02 18:52 ` Junio C Hamano
3 siblings, 1 reply; 20+ messages in thread
From: René Scharfe @ 2008-01-02 16:59 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Marco Costalba, Git Mailing List, Junio C Hamano
Johannes Schindelin schrieb:
> Certain codepaths (notably "git log --pretty=format...") use
> prefixcmp() extensively, with very short prefixes. In those cases,
> calling strlen() is a wasteful operation, so avoid it.
> static inline int prefixcmp(const char *str, const char *prefix)
> {
> - return strncmp(str, prefix, strlen(prefix));
> + for (; ; str++, prefix++)
> + if (!*prefix)
> + return 0;
> + else if (*str != *prefix)
> + return (unsigned char)*prefix - (unsigned char)*str;
> }
>
> static inline int strtoul_ui(char const *s, int base, unsigned int *result)
prefixcmp() was already optimized before -- only for a different use
case. At a number of callsites the prefix is a string literal, which
allowed the compiler to perform the strlen() call at compile time.
The patch increases the text size considerably: the file "git" is
2,620,938 without and 2,640,450 with the patch in my build (there are
136 callsites in builtin*.c). The new version of prefixcmp() shouldn't
be inlined any more, as the benefit of doing so is gone.
Is there a portable way to let the preprocessor decide if
prefixcmp_literal() or prefixcmp_generic() is to be used, depending on
the prefix being a string literal or not?
René
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2008-01-02 16:59 ` René Scharfe
@ 2008-01-02 18:52 ` Junio C Hamano
2008-01-03 0:45 ` René Scharfe
0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2008-01-02 18:52 UTC (permalink / raw)
To: René Scharfe; +Cc: Johannes Schindelin, Marco Costalba, Git Mailing List
René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> prefixcmp() was already optimized before -- only for a different use
> case. At a number of callsites the prefix is a string literal, which
> allowed the compiler to perform the strlen() call at compile time.
>
> The patch increases the text size considerably: the file "git" is
> 2,620,938 without and 2,640,450 with the patch in my build (there are
> 136 callsites in builtin*.c). The new version of prefixcmp() shouldn't
> be inlined any more, as the benefit of doing so is gone.
Yuck, you are absolutely right. The late thread may have been
well intentioned but resulted in this regression. Sorry about
that.
I presume that all callers with constant prefix are outside
performance critical parts? Can we simply uninline the function
in that case?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Optimize prefixcmp()
2008-01-02 18:52 ` Junio C Hamano
@ 2008-01-03 0:45 ` René Scharfe
0 siblings, 0 replies; 20+ messages in thread
From: René Scharfe @ 2008-01-03 0:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, Marco Costalba, Git Mailing List
Junio C Hamano schrieb:
> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
>
>> prefixcmp() was already optimized before -- only for a different use
>> case. At a number of callsites the prefix is a string literal, which
>> allowed the compiler to perform the strlen() call at compile time.
>>
>> The patch increases the text size considerably: the file "git" is
>> 2,620,938 without and 2,640,450 with the patch in my build (there are
>> 136 callsites in builtin*.c). The new version of prefixcmp() shouldn't
>> be inlined any more, as the benefit of doing so is gone.
>
> Yuck, you are absolutely right. The late thread may have been
> well intentioned but resulted in this regression. Sorry about
> that.
>
> I presume that all callers with constant prefix are outside
> performance critical parts? Can we simply uninline the function
> in that case?
Most of them seem to be non-critical performance-wise. They are part of
code to parse parameters or config files. Exceptions are the commit
message parsing code used for --pretty=format (which can't be an issue
given that prefixcmp() was made the way it's now to speed up this code
path) and half of the callsites in fast-import.c.
René
^ permalink raw reply [flat|nested] 20+ messages in thread