* How to use path limiting (using a glob)?
From: Peter Baumann @ 2009-02-11 19:14 UTC
  To: git

Hello,

after reading Junio's nice blog today where he explained how to use git grep
efficiently, I saw him using a glob to match for the interesting files:

	 $ git grep -e ';;' -- '*.c'

Is it possible to have the same feature in git diff and the revision
machinery? Because I tried

	$ cd $path_to_your_git_src_dir
	$ git log master -p -- '*.h'
	.... No commit shown 

	$ git diff --name-only v1.5.0  v1.6.0 -- '*.c'

and both don't return anything.

Greetings,
Peter Baumann


* Re: How to use path limiting (using a glob)?
From: Linus Torvalds @ 2009-02-11 19:40 UTC
  To: Peter Baumann; +Cc: git



On Wed, 11 Feb 2009, Peter Baumann wrote:

> after reading Junio's nice blog today where he explained how to use git grep
> efficiently, I saw him using a glob to match for the interesting files:
> 
> 	 $ git grep -e ';;' -- '*.c'
> 
> Is it possible to have the same feature in git diff and the revision
> machinery?

Not really. Git has two different kinds of path limiters, and they are 
really really different.

 - the "walk current index/directory recursively" kind that "git ls-files" 
   uses, which takes a 'fnmatch()' type path regexp (not a real regexp, 
   but the kind you're used to with shell)

   NOTE! On purpose, we don't set the FNM_PATHNAME, so "*.c" here is 
   different from *.c in shell (it's more like "**.c" in tcsh). IOW, * 
   matches '/' too, and will walk subdirectories.

 - the "revision limiter" pathspec. This is *not* a regexp, it's a pure 
   prefix matcher, for a very simple reason: performance.
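A minimal Python sketch of the two matching styles described above. It is
illustrative only: git itself is written in C, the helper names
`glob_match` and `prefix_match` and the sample paths are hypothetical, and
Python's `fnmatch` stands in for fnmatch() without FNM_PATHNAME because it
likewise does not treat '/' specially.

```python
import fnmatch


def glob_match(pattern, path):
    # fnmatch-style matching where '*' also matches '/', so a pattern
    # such as '*.c' reaches into subdirectories.
    return fnmatch.fnmatchcase(path, pattern)


def prefix_match(prefix, path):
    # Leading-path matching: the pathspec must name the path itself or a
    # directory that contains it. No wildcard is interpreted.
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


paths = ["builtin-log.c", "xdiff/xdiffi.c", "Documentation/git-log.txt"]

# The glob matcher finds .c files at any depth.
glob_hits = [p for p in paths if glob_match("*.c", p)]

# The prefix matcher selects everything under a leading directory.
prefix_hits = [p for p in paths if prefix_match("xdiff", p)]

# Given '*.c', the prefix matcher looks for a path literally named '*.c'.
literal_hits = [p for p in paths if prefix_match("*.c", p)]

print(glob_hits, prefix_hits, literal_hits)
```

Under these assumptions, a quoted '*.c' handed to the prefix matcher
selects nothing, which is consistent with the empty output reported in
the original question.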

> 	$ cd $path_to_your_git_src_dir
> 	$ git log master -p -- '*.h'
> 	.... No commit shown 
> 
> 	$ git diff --name-only v1.5.0  v1.6.0 -- '*.c'
> 
> and both don't return anything.

Yeah, in the revision matcher you can still depend on the shell 
expansion, and it will do _almost_ the right thing. So if you do

	git log master -p *.c

without the quotes, the shell expansion will work, and that in turn will 
give a set of filenames that "git log" will restrict the log to. HOWEVER, 
it's not a real wildcard - it's literally looking at what you have now in 
your current working directory, and saying "give me the logs of those 
pathnames", not "give me the logs of everything ending with .c".
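The difference can be sketched in Python as follows. This is a simplified
model, not git's code: `shell_expand` is a hypothetical stand-in for shell
globbing, the file names and history are made up, and pathspecs are
compared as exact names rather than leading paths.

```python
import fnmatch


def shell_expand(pattern, working_tree):
    # Stand-in for the shell: replace the pattern with the top-level names
    # that exist right now ('*' does not cross '/' in a shell glob).
    return sorted(
        name
        for name in working_tree
        if "/" not in name and fnmatch.fnmatchcase(name, pattern)
    )


# Files present in the working directory today.
working_tree = {"log.c", "diff.c", "lib/util.c"}

# Toy history, newest first: (commit, paths it touched).
history = [
    ("commit 3", ["diff.c"]),
    ("commit 2", ["old.c"]),  # old.c was deleted afterwards
    ("commit 1", ["log.c", "lib/util.c"]),
]

# What the log command receives after the shell has expanded *.c.
pathspecs = shell_expand("*.c", working_tree)

# Only commits touching one of those literal names are shown.
shown = [
    commit
    for commit, touched in history
    if any(path in pathspecs for path in touched)
]

print(pathspecs, shown)
```

In this model "commit 2" is never shown, because old.c no longer exists
for the shell to expand, and lib/util.c is missed because the shell glob
does not descend into subdirectories.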

We _could_ make the revision limiter understand fnmatch-style patterns, 
but quite frankly, it's very very expensive - too expensive to be useful 
for big repositories. The point about only matching prefixes is that it 
allows the revision limiter to not even walk into subdirectories that 
don't match, but if you do the "*.c" kind of pattern, now the revision 
code has to look up every tree recursively. That code is also _extremely_ 
performance-critical, so we really don't want to use fnmatch() when we can 
currently use just "memcmp()".
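A rough Python sketch of why the prefix form is cheaper, assuming a toy
tree made of nested dicts (a dict is a subdirectory, None is a file). The
function names and the tree layout are hypothetical, and the sketch only
counts how many trees would have to be opened for a single commit.

```python
# Toy tree: dicts are subdirectories, None marks a file.
tree = {
    "Documentation": {"git-log.txt": None, "howto": {"setup.txt": None}},
    "xdiff": {"xdiffi.c": None, "xutils.c": None},
    "builtin-log.c": None,
}


def trees_opened_for_prefix(node, parts):
    # Follow only the components of the leading path; sibling
    # subdirectories that do not match are never opened.
    opened = 1  # the tree we are looking at
    if parts:
        child = node.get(parts[0])
        if isinstance(child, dict):
            opened += trees_opened_for_prefix(child, parts[1:])
    else:
        # Prefix fully consumed: everything below here is selected.
        for child in node.values():
            if isinstance(child, dict):
                opened += trees_opened_for_prefix(child, [])
    return opened


def trees_opened_for_glob(node):
    # With a pattern like '*.c' where '*' matches '/', any subdirectory
    # might hold a match, so every tree has to be opened.
    opened = 1
    for child in node.values():
        if isinstance(child, dict):
            opened += trees_opened_for_glob(child)
    return opened


prefix_cost = trees_opened_for_prefix(tree, ["xdiff"])
glob_cost = trees_opened_for_glob(tree)
print(prefix_cost, glob_cost)
```

The gap grows with the number of directories, and the revision walk pays
that cost again for every commit it examines.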

So yes, it's kind of odd how we have two totally different concepts of 
pathname patterns, but it's probably easiest to remember that "'git grep' 
is just special". 

		Linus


* Re: How to use path limiting (using a glob)?
From: Junio C Hamano @ 2009-02-11 19:48 UTC
  To: Peter Baumann; +Cc: git

Peter Baumann <waste.manager@gmx.de> writes:

> after reading Junio's nice blog today where he explained how to use git grep
> efficiently, I saw him using a glob to match for the interesting files:
>
> 	 $ git grep -e ';;' -- '*.c'
>
> Is it possible to have the same feature in git diff and the revision
> machinery? Because I tried
>
> 	$ cd $path_to_your_git_src_dir
> 	$ git log master -p -- '*.h'
> 	.... No commit shown 
>
> 	$ git diff --name-only v1.5.0  v1.6.0 -- '*.c'
>
> and both don't return anything.

There was a recent discussion on this.  The index family uses glob, the
tree family uses leading-path only.  The one implemented for grep can do
both, and attempts to unify them by providing a possibly reusable interface
that the other two families can be ported to, but we haven't managed to
trick anybody into taking up the task ;-).


* Re: How to use path limiting (using a glob)?
From: Nanako Shiraishi @ 2009-02-11 21:09 UTC
  To: Peter Baumann; +Cc: git, Junio C Hamano

Quoting Junio C Hamano <gitster@pobox.com>:
> Peter Baumann <waste.manager@gmx.de> writes:
>> Hello,
>>
>> after reading Junio's nice blog today where he explained how to use git grep
>> efficiently, I saw him using a glob to match for the interesting files:
>>
>> 	 $ git grep -e ';;' -- '*.c'
>>
>> Is it possible to have the same feature in git diff and the revision
>> machinery? Because I tried
>>
>> 	$ cd $path_to_your_git_src_dir
>> 	$ git log master -p -- '*.h'
>> 	.... No commit shown 
>>
>> 	$ git diff --name-only v1.5.0  v1.6.0 -- '*.c'
>>
>> and both don't return anything.
> There was a recent discussion on this.  The index family uses glob, the
> tree family uses leading-path only.  The one implemented for grep can do
> both, and attempts to unify them by providing a possibly reusable interface
> that the other two families can be ported to, but we haven't managed to
> trick anybody into taking up the task ;-).

The list archive has nicely written summaries of the issues, with
suggestions on how to make this possible:

    http://article.gmane.org/gmane.comp.version-control.git/94628
    http://thread.gmane.org/gmane.comp.version-control.git/105638/focus=105679

-- 
Nanako Shiraishi, the unofficial project secretary of the git project
http://ivory.ap.teacup.com/nanako3/


* Re: How to use path limiting (using a glob)?
From: Peter Baumann @ 2009-02-12 10:27 UTC
  To: Linus Torvalds; +Cc: git

On Wed, Feb 11, 2009 at 11:40:44AM -0800, Linus Torvalds wrote:
> 
> 
> On Wed, 11 Feb 2009, Peter Baumann wrote:
> 
> > after reading Junio's nice blog today where he explained how to use git grep
> > efficiently, I saw him using a glob to match for the interesting files:
> > 
> > 	 $ git grep -e ';;' -- '*.c'
> > 
> > Is it possible to have the same feature in git diff and the revision
> > machinery?
> 
> Not really. Git has two different kinds of path limiters, and they are 
> really really different.
> 
>  - the "walk current index/directory recursively" kind that "git ls-files" 
>    uses, which takes a 'fnmatch()' type path regexp (not a real regexp, 
>    but the kind you're used to with shell)
> 
>    NOTE! On purpose, we don't set the FNM_PATHNAME, so "*.c" here is 
>    different from *.c in shell (it's more like "**.c" in tcsh). IOW, * 
>    matches '/' too, and will walk subdirectories.
> 

Hm. But if git anchored the * only at the current directory, wouldn't
that solve (or at least reduce) the performance problem you describe in the
later paragraph? A "**.c" pattern could then be used to explicitly request
a recursive search for every .c file.

>  - the "revision limiter" pathspec. This is *not* a regexp, it's a pure 
>    prefix matcher, for a very simple reason: performance.
> 
> > 	$ cd $path_to_your_git_src_dir
> > 	$ git log master -p -- '*.h'
> > 	.... No commit shown 
> > 
> > 	$ git diff --name-only v1.5.0  v1.6.0 -- '*.c'
> > 
> > and both don't return anything.
> 
> Yeah, in the revision matcher you can still depend on the shell 
> expansion, and it will do _almost_ the right thing. So if you do
> 
> 	git log master -p *.c
> 
> without the quotes, the shell expansion will work, and that in turn will 
> give a set of filenames that "git log" will restrict the log to. HOWEVER, 
> it's not a real wildcard - it's literally looking at what you have now in 
> your current working directory, and saying "give me the logs of those 
> pathnames", not "give me the logs of everything ending with .c".
> 

OK. That's actually why I asked for this: a file that has since been
removed would not be found this way.

> We _could_ make the revision limiter understand fnmatch-style patterns, 
> but quite frankly, it's very very expensive - too expensive to be useful 
> for big repositories. The point about only matching prefixes is that it 
> allows the revision limiter to not even walk into subdirectories that 
> don't match, but if you do the "*.c" kind of pattern, now the revision 
> code has to look up every tree recursively. That code is also _extremely_ 
> performance-critical, so we really don't want to use fnmatch() when we can 
> currently use just "memcmp()".
> 
> So yes, it's kind of odd how we have two totally different concepts of 
> pathname patterns, but it's probably easiest to remember that "'git grep' 
> is just special". 
> 
> 		Linus


* Re: How to use path limiting (using a glob)?
From: Sitaram Chamarty @ 2009-02-12 11:09 UTC
  To: git

On 2009-02-12, Peter Baumann <waste.manager@gmx.de> wrote:
> On Wed, Feb 11, 2009 at 11:40:44AM -0800, Linus Torvalds wrote:

>> On Wed, 11 Feb 2009, Peter Baumann wrote:
>> 
>> > after reading Junio's nice blog today where he
>> > explained how to use git grep efficiently, I saw him
>> > using a glob to match for the interesting files:
>> > 
>> > 	 $ git grep -e ';;' -- '*.c'
>> > 
>> > Is it possible to have the same feature in git diff and the revision
>> > machinery?
>> 
>> Not really. Git has two different kinds of path limiters, and they are 
>> really really different.
>> 
>>  - the "walk current index/directory recursively" kind that "git ls-files" 
>>    uses, which takes a 'fnmatch()' type path regexp (not a real regexp, 
>>    but the kind you're used to with shell)
>> 
>>    NOTE! On purpose, we don't set the FNM_PATHNAME, so "*.c" here is 
>>    different from *.c in shell (it's more like "**.c" in tcsh). IOW, * 
>>    matches '/' too, and will walk subdirectories.
>> 
>
> Hm. But if git anchored the * only at the current directory, wouldn't
> that solve (or at least reduce) the performance problem you describe in the
> later paragraph? A "**.c" pattern could then be used to explicitly request
> a recursive search for every .c file.

I think Linus meant that it's expensive to look for all *.c
files at any depth in the tree, for every commit in the
repository.

You can have either a prefix matcher to limit the search
*within* a tree so you can afford to walk all revs in the
repo, or you stick to just one tree (or a few explicitly
named ones).

You seem to be saying 'fine, I know, and I'm willing to
indicate that I'm accepting this cost by using a different
syntax'.

But the syntax is not the point.  You can certainly do that
right now, if you really wish to.  Just don't try it on a
large repo :-)

    git grep -e pattern $(git rev-list --all) -- '*.c'

Make suitable modifications to the '--all' in the git
rev-list to limit the revs you want to search.

Regardless of whether there is a simple syntax to support it
or not, this is probably not what you want, most of the time
:-)

Sitaram
