All of lore.kernel.org
 help / color / mirror / Atom feed
From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Git Mailing List <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>
Subject: [PATCH 1/2] grep -w: forward to next possible position after rejected match
Date: Sat, 10 Jan 2009 00:08:40 +0100	[thread overview]
Message-ID: <4967D8F8.9070508@lsrfire.ath.cx> (raw)

grep -w accepts matches between non-word characters, only.  If a match
from regexec() doesn't meet this criteria, grep continues its search
after the first character of that match.

We can be a bit smarter here and skip all positions that follow a word
character first, as they can't match our criteria.  This way we can
consume characters quite cheaply and don't need to special-case the
handling of the beginning of a line.

Here's a contrived example command on msysgit (best of five runs):

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.611s
	user    0m0.000s
	sys     0m0.015s

With the patch it's quite a bit faster:

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.179s
	user    0m0.000s
	sys     0m0.015s

More common search patterns will gain a lot less, but it's a nice clean
up anyway.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 grep.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 49e9319..394703b 100644
--- a/grep.c
+++ b/grep.c
@@ -294,7 +294,6 @@ static struct {
 static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol, char *eol, enum grep_context ctx)
 {
 	int hit = 0;
-	int at_true_bol = 1;
 	int saved_ch = 0;
 	regmatch_t pmatch[10];
 
@@ -337,7 +336,7 @@ static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol
 		 * either end of the line, or at word boundary
 		 * (i.e. the next char must not be a word char).
 		 */
-		if ( ((pmatch[0].rm_so == 0 && at_true_bol) ||
+		if ( ((pmatch[0].rm_so == 0) ||
 		      !word_char(bol[pmatch[0].rm_so-1])) &&
 		     ((pmatch[0].rm_eo == (eol-bol)) ||
 		      !word_char(bol[pmatch[0].rm_eo])) )
@@ -349,10 +348,14 @@ static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol
 			/* There could be more than one match on the
 			 * line, and the first match might not be
 			 * strict word match.  But later ones could be!
+			 * Forward to the next possible start, i.e. the
+			 * next position following a non-word char.
 			 */
 			bol = pmatch[0].rm_so + bol + 1;
-			at_true_bol = 0;
-			goto again;
+			while (word_char(bol[-1]) && bol < eol)
+				bol++;
+			if (bol < eol)
+				goto again;
 		}
 	}
 	if (p->token == GREP_PATTERN_HEAD && saved_ch)
-- 
1.6.1

             reply	other threads:[~2009-01-09 23:10 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-09 23:08 René Scharfe [this message]
2009-01-09 23:18 ` [PATCH 2/2] grep: don't call regexec() for fixed strings René Scharfe
2009-01-10 20:37   ` Junio C Hamano
2009-01-12 12:25   ` Mikael Magnusson
2009-01-12 13:33     ` Johannes Schindelin
2009-01-12 15:32   ` Alex Riesen
2009-01-12 19:18     ` René Scharfe
2009-01-13  8:13       ` Junio C Hamano
2009-01-17 15:50         ` [PATCH 1/4] Add ctype test René Scharfe
2009-01-17 15:50         ` [PATCH 2/4] Reformat ctype.c René Scharfe
2009-01-17 15:50         ` [PATCH 3/4] Change NUL char handling of isspecial() René Scharfe
2009-01-17 15:50         ` [PATCH 4/4] Add is_regex_special() René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4967D8F8.9070508@lsrfire.ath.cx \
    --to=rene.scharfe@lsrfire.ath.cx \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.