* Fix silly typo in new builtin grep
@ 2006-05-16 0:54 Linus Torvalds
2006-05-16 1:07 ` Linus Torvalds
2006-05-16 3:18 ` Junio C Hamano
0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2006-05-16 0:54 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
The "-F" flag apparently got mis-translated due to some over-eager
copy-paste work into a duplicate "-H" when using the external grep.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
Me likee the new built-in grep. The ability to say
git grep __make_request v2.6.13 -- '*.c'
to grep for it in a specific version is well worth the fact that it
obviously ends up being slower than grepping in the currently checked-out
tree. It's doing a hell of a lot more, but despite that it's not at all
that slow.
(In fact, I would say that doing the above command in just 4 seconds is
damn impressive - it's a large code-base, and v2.6.13 is several months,
and over 20 _thousand_ revisions ago).
And now it doesn't have any performance downsides, so it's all upside.
Good job. Pls merge into mainline, I think the "grep per revision" is a
killer feature.
diff --git a/builtin-grep.c b/builtin-grep.c
index 3d6e515..66111de 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -455,7 +455,7 @@ static int external_grep(struct grep_opt
push_arg("grep");
push_arg("-H");
if (opt->fixed)
- push_arg("-H");
+ push_arg("-F");
if (opt->linenum)
push_arg("-n");
if (opt->regflags & REG_EXTENDED)
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: Fix silly typo in new builtin grep
2006-05-16 0:54 Fix silly typo in new builtin grep Linus Torvalds
@ 2006-05-16 1:07 ` Linus Torvalds
2006-05-16 2:10 ` Morten Welinder
2006-05-16 3:18 ` Junio C Hamano
1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2006-05-16 1:07 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
On Mon, 15 May 2006, Linus Torvalds wrote:
>
> Me likee the new built-in grep. The ability to say
>
> git grep __make_request v2.6.13 -- '*.c'
>
> to grep for it in a specific version is well worth the fact that it
> obviously ends up being slower than grepping in the currently checked-out
> tree. It's doing a hell of a lot more, but despite that it's not at all
> that slow.
Side note: it looks like the version generation really isn't much of a
cost. Grepping in v2.6.13 isn't really noticeably slower than the
"pre-external grep" was for grepping in the checked-out file tree.
So it looks like we _could_ improve the grepping of specific versions
noticeably if we were to have a better regex library that was as optimized
as what the external GNU grep seems to do. The actual revision data
generation doesn't seem to be the biggest cost, and at least in _theory_
we could probably speed things up by a factor of two with a faster regex
library.
That's good to keep in mind. It may be that the glibc regexp is just not
very good, but quite frankly, I would personally not be surprised if it's
better than most (ie windows, for example).
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Fix silly typo in new builtin grep
2006-05-16 1:07 ` Linus Torvalds
@ 2006-05-16 2:10 ` Morten Welinder
2006-05-16 3:27 ` Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: Morten Welinder @ 2006-05-16 2:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
If I read the code right, it calls regexec for every single character
on every single line. No wonder that takes a while! Just call it
once and it'll search for its match quite nicely.
If that's not enough, the two obvious optimizations are...
1. If the pattern contains no regexp characters (and that is very
common), do a strstr.
2. If the pattern must start with a specific character, search for that
by itself.
M.
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: Fix silly typo in new builtin grep
2006-05-16 2:10 ` Morten Welinder
@ 2006-05-16 3:27 ` Linus Torvalds
0 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2006-05-16 3:27 UTC (permalink / raw)
To: Morten Welinder; +Cc: Junio C Hamano, Git Mailing List
On Mon, 15 May 2006, Morten Welinder wrote:
>
> If I read the code right, it calls regexec for every single character
> on every single line. No wonder that takes a while! Just call it
> once and it'll search for its match quite nicely.
No, it calls it once per pattern per line.
But yes, it calls it once per line, instead of calling it on some bigger
boundary. Partly because of the line-based output, partly probably because
regexec() is not actually amenable to a "<buffer,size>" kind of usage, but
is based on NUL-terminated strings.
> 1. If the pattern contains no regexp characters (and that is very
> common), do a strstr.
>
> 2. If the pattern must start with a specific character, search for that
> by itself.
Yeah, we could do some simple stuff, and see if it helps..
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Fix silly typo in new builtin grep
2006-05-16 0:54 Fix silly typo in new builtin grep Linus Torvalds
2006-05-16 1:07 ` Linus Torvalds
@ 2006-05-16 3:18 ` Junio C Hamano
2006-05-16 3:32 ` Linus Torvalds
1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2006-05-16 3:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> The "-F" flag apparently got mis-translated due to some over-eager
> copy-paste work into a duplicate "-H" when using the external grep.
Thanks. I've pushed it out to "master", along with some other
stuff.
> Me likee the new built-in grep. The ability to say
>
> git grep __make_request v2.6.13 -- '*.c'
>
> to grep for it in a specific version is well worth the fact that it
> obviously ends up being slower than grepping in the currently checked-out
> tree. It's doing a hell of a lot more, but despite that it's not at all
> that slow.
>
> (In fact, I would say that doing the above command in just 4 seconds is
> damn impressive - it's a large code-base, and v2.6.13 is several months,
> and over 20 _thousand_ revisions ago).
That is a BS praise and you know it ;-). You do not have delta
chains that are 20k long, so grepping from the tree 10 revs ago
and from the tree 20k revs ago would not make a difference.
It _would_ be impressive to CVS folks, but even there each path
would not have 20k revisions. The kernel patches tend to touch
3 paths per patch on average, so 60k changes over 18k files
distributed unevenly -- my guess (I could count but haven't) is
probably 200 revisions at most for most frequently touched file.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Fix silly typo in new builtin grep
2006-05-16 3:18 ` Junio C Hamano
@ 2006-05-16 3:32 ` Linus Torvalds
0 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2006-05-16 3:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Mon, 15 May 2006, Junio C Hamano wrote:
> >
> > (In fact, I would say that doing the above command in just 4 seconds is
> > damn impressive - it's a large code-base, and v2.6.13 is several months,
> > and over 20 _thousand_ revisions ago).
>
> That is a BS praise and you know it ;-). You do not have delta
> chains that are 20k long, so grepping from the tree 10 revs ago
> and from the tree 20k revs ago would not make a difference.
Oh, I agree. I meant in a "general version-control sense". I doubt a lot
of other version-control systems could do it. Git can, exactly because
it's whole-file based, and our deltas are limited.
So it's not that "builtin-grep" is wonderful. It's that _git_ is
wonderful, and the builtin-grep just shows one of the end results.
That's why we have killer features. To show off.
(That said, git will slow down a tad too - the pack-file access won't be
as optimized for an old version tree, and so you'll seek around some more
for the cold-cache case).
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2006-05-16 3:32 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-16 0:54 Fix silly typo in new builtin grep Linus Torvalds
2006-05-16 1:07 ` Linus Torvalds
2006-05-16 2:10 ` Morten Welinder
2006-05-16 3:27 ` Linus Torvalds
2006-05-16 3:18 ` Junio C Hamano
2006-05-16 3:32 ` Linus Torvalds
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox