git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Miles Bader <miles@gnu.org>, Jeff King <peff@peff.net>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH] grep: do not do external grep on skip-worktree entries
Date: Mon, 11 Jan 2010 07:59:18 -0800 (PST)	[thread overview]
Message-ID: <alpine.LFD.2.00.1001110748560.13040@localhost.localdomain> (raw)
In-Reply-To: <alpine.LFD.2.00.1001110739280.13040@localhost.localdomain>



On Mon, 11 Jan 2010, Linus Torvalds wrote:
>
> Without testing it, I can already ACK it. It looks like the 
> ObviouslyRightThing(tm) to do. But I'll run some numbers too.

Ok, some good news, some meh news, and some bad news.

The good news: the trivial numbers look good. It's noticeably faster than 
external grep for me when it does the 'fixmatch()' case, quite probably 
because fixmatch() on at least Linux/x86-64 (which is the only case I 
really care about) uses SSE to do the string ops.

So on my Nehalem:

	[torvalds@nehalem linux]$ time git grep qwerty > /dev/null 

	real	0m0.418s
	user	0m0.204s
	sys	0m0.136s

	[torvalds@nehalem linux]$ time git grep --no-ext-grep qwerty > /dev/null 

	real	0m0.309s
	user	0m0.168s
	sys	0m0.136s

and since that simple fixmatch case is the common one for me, I'm happy.

The meh news: this shows how grep is faster than regexec() due to being a 
smarter algorithm. For the non-fixed case (I used "qwerty.*as"), the 
numbers are

 - built-in:
	real	0m0.548s
	user	0m0.384s
	sys	0m0.152s

 - external:
	real	0m0.415s
	user	0m0.176s
	sys	0m0.160s

so it really is just 'strstr()' that is faster. But This is a 'meh', 
because I don't really care, and the new code is still way faster than the 
old one. And I'd be personally willing to just drop the external grep if 
this is the worst problem.

[ I worry a bit that some libc implementations of 'strstr' may suck, but I 
  wouldn't lose sleep over it. ]

The bad news is that you broke multi-line greps:

	git grep --no-ext-grep -2 qwerty.*as

results in:

	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */

when the _correct_ result is 

	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */
	drivers/char/keyboard.c-        "dfghjkl;'`\000\\zxcv"                          /* 0x20 - 0x2f */
	drivers/char/keyboard.c-        "bnm,./\000*\000 \000\201\202\203\204\205"      /* 0x30 - 0x3f */

ie it didn't do the "two lines after" thing.

That said, the external grep also gets this wrong (a different way), 
because it gets all the extra noise due to unnecessary separation lines, 
so for the external grep I actually get

	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */
	drivers/char/keyboard.c-        "dfghjkl;'`\000\\zxcv"                          /* 0x20 - 0x2f */
	drivers/char/keyboard.c-        "bnm,./\000*\000 \000\201\202\203\204\205"      /* 0x30 - 0x3f */
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--

but that's a long-standing problem, and is more "ugly" than "wrong grep 
results".

			Linus

  reply	other threads:[~2010-01-11 15:59 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-30 14:11 [PATCH] grep: do not do external grep on skip-worktree entries Nguyễn Thái Ngọc Duy
2009-12-31  7:01 ` Junio C Hamano
2009-12-31  7:09   ` Junio C Hamano
2010-01-02 11:50     ` Nguyen Thai Ngoc Duy
2010-01-02 18:44       ` Junio C Hamano
2010-01-02 19:15         ` Nguyen Thai Ngoc Duy
2010-01-02 19:45           ` Junio C Hamano
2010-01-03  2:35             ` Miles Bader
2010-01-03  2:47               ` Miles Bader
2010-01-03  3:08                 ` Miles Bader
2010-01-03 19:32                   ` Linus Torvalds
2010-01-03 20:49                     ` Junio C Hamano
2010-01-04  5:31                       ` Jeff King
2010-01-04  5:52                         ` Junio C Hamano
2010-01-04  6:44                           ` Jeff King
2010-01-04  7:08                             ` Junio C Hamano
2010-01-04  7:14                               ` Junio C Hamano
2010-01-04  7:29                                 ` Jeff King
2010-01-04  7:26                               ` Jeff King
2010-01-04  8:09                                 ` Jeff King
2010-01-04 16:01                                   ` Linus Torvalds
2010-01-04 15:54                             ` Linus Torvalds
2010-01-04 15:57                               ` Miles Bader
2010-01-04 16:03                                 ` Linus Torvalds
2010-01-11  6:39                                   ` Junio C Hamano
2010-01-11 15:43                                     ` Linus Torvalds
2010-01-11 15:59                                       ` Linus Torvalds [this message]
2010-01-11 16:22                                         ` Junio C Hamano
2010-01-11 16:24                                           ` Junio C Hamano
2010-01-11 16:33                                           ` Linus Torvalds
2010-01-12  8:29                                             ` Junio C Hamano
2010-01-12  8:31                                               ` [PATCH] grep: lookahead optimization can be used with -L option Junio C Hamano
2010-01-12  8:32                                               ` [PATCH] grep: -L should show empty files Junio C Hamano
2010-01-12 21:27                                                 ` Sverre Rabbelier
2010-01-13  6:56                                                   ` Junio C Hamano
2010-01-13 16:04                                                     ` Sverre Rabbelier
2010-01-13 19:48                                                       ` Junio C Hamano
2010-01-13  6:48                                               ` [PATCH 1/2] grep: rip out support for external grep Junio C Hamano
2010-01-13  8:29                                                 ` Jay Soffian
2010-01-13  8:59                                                   ` Junio C Hamano
2010-01-13 15:20                                                 ` Linus Torvalds
2010-01-13  6:51                                               ` [PATCH 2/2] grep: rip out pessimization to use fixmatch() Junio C Hamano
2010-01-12 16:21                                         ` [PATCH] grep: do not do external grep on skip-worktree entries Jeff King
2010-01-11 19:26                                     ` Fredrik Kuivinen
     [not found]                                     ` <4c8ef71001111119p253170f8q37bcd3708d894a62@mail.gmail.com>
2010-01-11 19:29                                       ` Linus Torvalds
2010-01-11 19:40                                         ` Fredrik Kuivinen
2010-01-11 20:07                                           ` Linus Torvalds
2010-01-11 21:07                                             ` Fredrik Kuivinen
2010-01-11 21:24                                               ` Linus Torvalds
2010-01-04 16:24                               ` Linus Torvalds
2010-01-04 10:14                           ` Nguyen Thai Ngoc Duy
2010-01-04  6:06                     ` Mike Hommey
2010-01-04  7:04                       ` Jeff King
2010-01-04 12:34             ` [PATCH 1/2] t7002: set test prerequisite "external-grep" if supported Nguyễn Thái Ngọc Duy
2010-01-07  2:37               ` Junio C Hamano
2010-01-07  4:29                 ` Junio C Hamano
2010-01-07 13:27                   ` Nguyen Thai Ngoc Duy
2010-01-07 14:04                     ` Johannes Sixt
2010-01-07 14:26                       ` Nguyen Thai Ngoc Duy
2010-01-04 12:34             ` [PATCH 2/2] t7002: add tests for skip-worktree fixes in commit a67e281 Nguyễn Thái Ngọc Duy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.00.1001110748560.13040@localhost.localdomain \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=miles@gnu.org \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).