From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Jeff King <peff@peff.net>
Cc: Mike Hommey <mh@glandium.org>, Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org
Subject: Re: [PATCH] import memmem() with linear complexity from Gnulib
Date: Sun, 01 Mar 2009 12:15:01 +0100 [thread overview]
Message-ID: <49AA6E35.40205@lsrfire.ath.cx> (raw)
In-Reply-To: <20090301034123.GC30384@coredump.intra.peff.net>
Jeff King schrieb:
> On Sat, Feb 28, 2009 at 11:44:01PM +0100, Mike Hommey wrote:
>
>>> ---
>>> Makefile | 1 +
>>> compat/memmem.c | 103 +++++++++----
>>> compat/str-two-way.h | 429 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 3 files changed, 504 insertions(+), 29 deletions(-)
>> Seeing how much memmem is being used in the codebase, is it really worth?
>
> See earlier in the thread, where "git log -Stoken" is substantially
> faster on Linux versus Windows (but some exact numbers before and after
> on Windows would be nice to have in the commit message).
Yes, and also please see the other numbers I just posted. It's more of
an update -- we took the current memmem() fall-back from glibc, too, and
they switched to this shiny new implementation to avoid the quadratic
complexity of the old one in the meantime.
I was going to say that with a fast memmem() we could convert some
strstr() calls, especially those where we know the lengths of the
strings anyway -- intuitively, memmem() should be faster than strstr()
in that case. However, the following patch on top of the Gnulib import
makes "git grep grep v1.6.1" take 10% *more* time for me (on Windows).
I wonder why.
diff --git a/grep.c b/grep.c
index 062b2b6..66ef171 100644
--- a/grep.c
+++ b/grep.c
@@ -39,6 +39,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
{
int err;
+ p->pattern_len = strlen(p->pattern);
+
if (opt->fixed || is_fixed(p->pattern))
p->fixed = 1;
if (opt->regflags & REG_ICASE)
@@ -268,16 +270,17 @@ static void show_name(struct grep_opt *opt, const char *name)
printf("%s%c", name, opt->null_following_name ? '\0' : '\n');
}
-static int fixmatch(const char *pattern, char *line, regmatch_t *match)
+static int fixmatch(const char *pattern, size_t pattern_len, const char *bol,
+ const char *eol, regmatch_t *match)
{
- char *hit = strstr(line, pattern);
+ char *hit = memmem(bol, eol - bol, pattern, pattern_len);
if (!hit) {
match->rm_so = match->rm_eo = -1;
return REG_NOMATCH;
}
else {
- match->rm_so = hit - line;
- match->rm_eo = match->rm_so + strlen(pattern);
+ match->rm_so = hit - bol;
+ match->rm_eo = match->rm_so + pattern_len;
return 0;
}
}
@@ -335,7 +338,7 @@ static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol
pmatch, 0);
}
else {
- hit = !fixmatch(p->pattern, bol, pmatch);
+ hit = !fixmatch(p->pattern, p->pattern_len, bol, eol, pmatch);
}
if (hit && opt->word_regexp) {
diff --git a/grep.h b/grep.h
index 5102ce3..2e22ab2 100644
--- a/grep.h
+++ b/grep.h
@@ -28,6 +28,7 @@ struct grep_pat {
int no;
enum grep_pat_token token;
const char *pattern;
+ size_t pattern_len;
enum grep_header_field field;
regex_t regexp;
unsigned fixed:1;
next prev parent reply other threads:[~2009-03-01 11:16 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-02-26 6:52 [PATCH 0/4] Pickaxe search clean-up and optimization Junio C Hamano
2009-02-26 6:52 ` [PATCH 1/4] diffcore-pickaxe: refactor diffcore_pickaxe() Junio C Hamano
2009-02-27 23:58 ` René Scharfe
2009-02-26 6:52 ` [PATCH 2/4] diffcore-pickaxe: micro-optimize has_match() function Junio C Hamano
2009-02-26 6:52 ` [PATCH 3/4] diffcore-pickaxe: further refactor count_match() Junio C Hamano
2009-02-26 7:23 ` Kjetil Barvik
2009-02-28 1:13 ` René Scharfe
2009-02-28 1:25 ` Junio C Hamano
2009-02-28 6:08 ` Junio C Hamano
2009-02-28 13:10 ` René Scharfe
2009-02-28 17:40 ` Junio C Hamano
2009-02-28 18:15 ` René Scharfe
2009-02-28 19:16 ` [PATCH] import memmem() with linear complexity from Gnulib René Scharfe
2009-02-28 22:44 ` Mike Hommey
2009-03-01 3:41 ` Jeff King
2009-03-01 11:15 ` René Scharfe [this message]
2009-03-01 18:55 ` René Scharfe
2009-03-01 7:31 ` [PATCH 3/4] diffcore-pickaxe: further refactor count_match() Junio C Hamano
2009-03-01 10:53 ` René Scharfe
2009-02-26 6:52 ` [PATCH 4/4] diffcore-pickaxe: optimize by trimming common initial and trailing parts Junio C Hamano
2009-02-26 9:05 ` Junio C Hamano
2009-03-02 23:00 ` [PATCH 1/2] diffcore-pickaxe: use memmem() René Scharfe
2009-03-02 23:19 ` [PATCH 2/2] optimize compat/ memmem() René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49AA6E35.40205@lsrfire.ath.cx \
--to=rene.scharfe@lsrfire.ath.cx \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mh@glandium.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.