git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Jeff King <peff@peff.net>
Cc: Mike Hommey <mh@glandium.org>, Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH] import memmem() with linear complexity from Gnulib
Date: Sun, 01 Mar 2009 19:55:05 +0100	[thread overview]
Message-ID: <49AADA09.6020305@lsrfire.ath.cx> (raw)
In-Reply-To: <49AA6E35.40205@lsrfire.ath.cx>

René Scharfe schrieb:
> I was going to say that with a fast memmem() we could convert some
> strstr() calls, especially those where we know the lengths of the
> strings anyway -- intuitively, memmem() should be faster than strstr()
> in that case.  However, the following patch on top of the Gnulib import
> makes "git grep grep v1.6.1" take 10% *more* time for me (on Windows).
> I wonder why.

More numbers.  "memmem" is the patch I'm replying to which converts
grep's fixed string search from strstr() to memmem().  "quadratic" means
the current version of compat/memmem.c was used (NO_MEMMEM=yes),
"linear" is the same, but with the newer memmem() from Gnulib imported.
 The numbers are elapsed seconds for the following command, run in a
Linux kernel repo:

   git grep grep v2.6.28 >/dev/null

                                   Ubuntu Fedora
   [1] v1.6.2-rc2                    2.99   3.52
   [2] v1.6.2-rc2+memmem             3.09   3.28
   [3] v1.6.2-rc2+memmem+quadratic   8.42   8.50
   [4] v1.6.2-rc2+memmem+linear      3.17   3.25

So, we should be careful when using memmem() as long as we have the
current implementation in compat/, as the quadratic complexity can
result in a significant slowdown ([3]).

The new memmem() indeed is faster than the new strstr(), as expected
(Fedora [2] and [4] vs. Fedora [1]).  Remember, Fedora 10 already
includes the glibc version with the linear implementations while Ubuntu
8.10 doesn't.

I'm not sure if the difference between the old and new memmem() is
significant or in the noise (Ubuntu [2] and [4]).

In any case, and this surprised me, the fastest of them all is the old
strstr() (Ubuntu [1]).  This is consistent with the slowdown observed on
Windows.

What's even more surprising is the difference between Ubuntu [2] and
[3], which use basically the same memmem().  Or so I thought.  Wrong.
The old memmem() in glibc has a small but effective optimization, that
our version lacks.  I don't remember if I cut it out during the initial
import for increased simplicity or if the version the import was based
on didn't have it.  Anyway, with the following patch, case [3] runs as
fast as case [2] on both Fedora and Ubuntu.

Next step would be to check if pickaxe simply needs this short patch or
really the full-blown linear reimplementation.  I'm off to an
appointment, though, so I'll look into this again tomorrow.

René


diff --git a/compat/memmem.c b/compat/memmem.c
index cd0d877..fed5a77 100644
--- a/compat/memmem.c
+++ b/compat/memmem.c
@@ -5,6 +5,7 @@ void *gitmemmem(const void *haystack, size_t haystack_len,
 {
 	const char *begin = haystack;
 	const char *last_possible = begin + haystack_len - needle_len;
+	char tip;
 
 	/*
 	 * The first occurrence of the empty string is deemed to occur at
@@ -20,8 +21,10 @@ void *gitmemmem(const void *haystack, size_t haystack_len,
 	if (haystack_len < needle_len)
 		return NULL;
 
+	tip = *((const char *)needle);
+
 	for (; begin <= last_possible; begin++) {
-		if (!memcmp(begin, needle, needle_len))
+		if (*begin == tip && !memcmp(begin, needle, needle_len))
 			return (void *)begin;
 	}
 

  reply	other threads:[~2009-03-01 18:56 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-26  6:52 [PATCH 0/4] Pickaxe search clean-up and optimization Junio C Hamano
2009-02-26  6:52 ` [PATCH 1/4] diffcore-pickaxe: refactor diffcore_pickaxe() Junio C Hamano
2009-02-27 23:58   ` René Scharfe
2009-02-26  6:52 ` [PATCH 2/4] diffcore-pickaxe: micro-optimize has_match() function Junio C Hamano
2009-02-26  6:52 ` [PATCH 3/4] diffcore-pickaxe: further refactor count_match() Junio C Hamano
2009-02-26  7:23   ` Kjetil Barvik
2009-02-28  1:13   ` René Scharfe
2009-02-28  1:25     ` Junio C Hamano
2009-02-28  6:08       ` Junio C Hamano
2009-02-28 13:10         ` René Scharfe
2009-02-28 17:40           ` Junio C Hamano
2009-02-28 18:15             ` René Scharfe
2009-02-28 19:16             ` [PATCH] import memmem() with linear complexity from Gnulib René Scharfe
2009-02-28 22:44               ` Mike Hommey
2009-03-01  3:41                 ` Jeff King
2009-03-01 11:15                   ` René Scharfe
2009-03-01 18:55                     ` René Scharfe [this message]
2009-03-01  7:31           ` [PATCH 3/4] diffcore-pickaxe: further refactor count_match() Junio C Hamano
2009-03-01 10:53             ` René Scharfe
2009-02-26  6:52 ` [PATCH 4/4] diffcore-pickaxe: optimize by trimming common initial and trailing parts Junio C Hamano
2009-02-26  9:05   ` Junio C Hamano
2009-03-02 23:00 ` [PATCH 1/2] diffcore-pickaxe: use memmem() René Scharfe
2009-03-02 23:19   ` [PATCH 2/2] optimize compat/ memmem() René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49AADA09.6020305@lsrfire.ath.cx \
    --to=rene.scharfe@lsrfire.ath.cx \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mh@glandium.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).