All of lore.kernel.org
 help / color / mirror / Atom feed
From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 3/4] diffcore-pickaxe: further refactor count_match()
Date: Sat, 28 Feb 2009 02:13:11 +0100	[thread overview]
Message-ID: <49A88FA7.1020402@lsrfire.ath.cx> (raw)
In-Reply-To: <cd73512d11e63554396983ed4e9556b2d18b3e4a.1235629933.git.gitster@pobox.com>

Junio C Hamano schrieb:
> -static unsigned int count_match(int one_or_more,
> -				struct diff_filespec *one,
> -				const char *needle, unsigned long len,
> -				regex_t *regexp)
> +static unsigned int count_match_internal(int one_or_more,
> +					 const char *data, unsigned long sz,
> +					 const char *needle, unsigned long len,
> +					 regex_t *regexp)
>  {

I don't especially like flags like one_or_more.  Having two functions
(which possibly call a common function that does most of the work) is
nicer.  And it's a bit sad here because there already are two functions
with nice names: count_match() and has_match().  But, well, since the
functions are not exported anyway it doesn't matter much.

> @@ -29,16 +20,14 @@ static unsigned int count_match(int one_or_more,
>  		while (*data && !regexec(regexp, data, 1, &regmatch, flags)) {
>  			flags |= REG_NOTBOL;
>  			data += regmatch.rm_so;
> -			if (*data) data++;
> +			if (*data)
> +				data++;
>  			cnt++;
>  			if (one_or_more)
>  				break;
>  		}
> -
> -	} else { /* Classic exact string match */
> -		/* Yes, I've heard of strstr(), but the thing is *data may
> -		 * not be NUL terminated.  Sue me.
> -		 */
> +	} else {
> +		/* data many not be NUL terminated; we cannot use strstr() */

That looks fishy to me.  regexec() expects data to be a NUL-terminated
string, so either the comment is wrong or the regexp case needs to take
better care to add a NUL at the end of the buffer.

In any case, there is also memmem(), which uses the same fast algorithm
as strstr() in recent glibc versions.  Like this?


diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index f4870b4..4c19967 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -11,7 +11,6 @@ static unsigned int count_match_internal(int one_or_more,
 					 regex_t *regexp)
 {
 	unsigned int cnt = 0;
-	unsigned long offset;
 
 	if (regexp) {
 		regmatch_t regmatch;
@@ -27,15 +26,15 @@ static unsigned int count_match_internal(int one_or_more,
 				break;
 		}
 	} else {
-		/* data many not be NUL terminated; we cannot use strstr() */
-		for (offset = 0; offset + len <= sz; offset++) {
-			/* we count non-overlapping occurrences of needle */
-			if (!memcmp(needle, data + offset, len)) {
-				offset += len - 1;
-				cnt++;
-				if (one_or_more)
-					break;
-			}
+		while (sz) {
+			const char *found = memmem(data, sz, needle, len);
+			if (!found)
+				break;
+			sz -= found - data + len;
+			data = found + len;
+			cnt++;
+			if (one_or_more)
+				break;
 		}
 	}
 	return cnt;

  parent reply	other threads:[~2009-02-28  1:14 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-26  6:52 [PATCH 0/4] Pickaxe search clean-up and optimization Junio C Hamano
2009-02-26  6:52 ` [PATCH 1/4] diffcore-pickaxe: refactor diffcore_pickaxe() Junio C Hamano
2009-02-27 23:58   ` René Scharfe
2009-02-26  6:52 ` [PATCH 2/4] diffcore-pickaxe: micro-optimize has_match() function Junio C Hamano
2009-02-26  6:52 ` [PATCH 3/4] diffcore-pickaxe: further refactor count_match() Junio C Hamano
2009-02-26  7:23   ` Kjetil Barvik
2009-02-28  1:13   ` René Scharfe [this message]
2009-02-28  1:25     ` Junio C Hamano
2009-02-28  6:08       ` Junio C Hamano
2009-02-28 13:10         ` René Scharfe
2009-02-28 17:40           ` Junio C Hamano
2009-02-28 18:15             ` René Scharfe
2009-02-28 19:16             ` [PATCH] import memmem() with linear complexity from Gnulib René Scharfe
2009-02-28 22:44               ` Mike Hommey
2009-03-01  3:41                 ` Jeff King
2009-03-01 11:15                   ` René Scharfe
2009-03-01 18:55                     ` René Scharfe
2009-03-01  7:31           ` [PATCH 3/4] diffcore-pickaxe: further refactor count_match() Junio C Hamano
2009-03-01 10:53             ` René Scharfe
2009-02-26  6:52 ` [PATCH 4/4] diffcore-pickaxe: optimize by trimming common initial and trailing parts Junio C Hamano
2009-02-26  9:05   ` Junio C Hamano
2009-03-02 23:00 ` [PATCH 1/2] diffcore-pickaxe: use memmem() René Scharfe
2009-03-02 23:19   ` [PATCH 2/2] optimize compat/ memmem() René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49A88FA7.1020402@lsrfire.ath.cx \
    --to=rene.scharfe@lsrfire.ath.cx \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.