* How to use path limiting (using a glob)?
From: Peter Baumann @ 2009-02-11 19:14 UTC
To: git

Hello,

after reading Junio's nice blog today, where he explained how to use git grep
efficiently, I saw him using a glob to match the interesting files:

    $ git grep -e ';;' -- '*.c'

Is it possible to have the same feature in git diff and the revision
machinery? I ask because I tried

    $ cd $path_to_your_git_src_dir
    $ git log master -p -- '*.h'
    .... No commit shown

    $ git diff --name-only v1.5.0 v1.6.0 -- '*.c'

and neither returns anything.

Greetings,
Peter Baumann

* Re: How to use path limiting (using a glob)?
From: Linus Torvalds @ 2009-02-11 19:40 UTC
To: Peter Baumann; +Cc: git

On Wed, 11 Feb 2009, Peter Baumann wrote:

> after reading Junio's nice blog today where he explained how to use git grep
> efficiently, I saw him using a glob to match for the interesting files:
>
>     $ git grep -e ';;' -- '*.c'
>
> Is it possible to have the same feature in git diff and the revision
> machinery?

Not really. Git has two different kinds of path limiters, and they are
really really different.

 - the "walk current index/directory recursively" kind that "git ls-files"
   uses, which takes a 'fnmatch()' type path regexp (not a real regexp,
   but the kind you're used to with shell)

   NOTE! On purpose, we don't set the FNM_PATHNAME, so "*.c" here is
   different from *.c in shell (it's more like "**.c" in tcsh). IOW, *
   matches '/' too, and will walk subdirectories.

 - the "revision limiter" pathspec. This is *not* a regexp, it's a pure
   prefix matcher, for a very simple reason: performance.

>     $ cd $path_to_your_git_src_dir
>     $ git log master -p -- '*.h'
>     .... No commit shown
>
>     $ git diff --name-only v1.5.0 v1.6.0 -- '*.c'
>
> and both don't return anything.

Yeah, in the revision matcher you can still depend on the shell
expansion, and it will do _almost_ the right thing. So if you do

    git log master -p *.c

without the quotes, the shell expansion will work, and that in turn will
give a set of filenames that "git log" will restrict the log to. HOWEVER,
it's not a real wildcard - it's literally looking at what you have now in
your current working directory, and saying "give me the logs of those
pathnames", not "give me the logs of everything ending with .c".

We _could_ make the revision limiter understand fnmatch-style patterns,
but quite frankly, it's very very expensive - too expensive to be useful
for big repositories. The point about only matching prefixes is that it
allows the revision limiter to not even walk into subdirectories that
don't match, but if you do the "*.c" kind of pattern, now the revision
code has to look up every tree recursively. That code is also _extremely_
performance-critical, so we really don't want to use fnmatch() when we can
currently use just "memcmp()".

So yes, it's kind of odd how we have two totally different concepts of
pathname patterns, but it's probably easiest to remember that "'git grep'
is just special".

		Linus

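[Editorial sketch] The two kinds of path limiter described in the message
above can be modelled in a few lines of Python. This is only an illustration
of the semantics, not git's code: prefix_match() and glob_match() are
hypothetical helper names, the path list is made up, and Python's fnmatch
module is used because it gives '/' no special treatment, which resembles
fnmatch() called without FNM_PATHNAME.

```python
from fnmatch import fnmatchcase


def prefix_match(pathspec: str, path: str) -> bool:
    """Leading-path match: the pathspec names a file or a directory prefix."""
    spec = pathspec.rstrip("/")
    return spec == "" or path == spec or path.startswith(spec + "/")


def glob_match(pattern: str, path: str) -> bool:
    """fnmatch-style match in which '*' is allowed to cross '/'."""
    return fnmatchcase(path, pattern)


# Hypothetical repository paths, for illustration only.
paths = ["cache.h", "builtin-grep.c", "xdiff/xdiffi.c", "t/helper/test.h"]

print([p for p in paths if glob_match("*.h", p)])    # ['cache.h', 't/helper/test.h']
print([p for p in paths if prefix_match("*.h", p)])  # []
print([p for p in paths if prefix_match("xdiff", p)])  # ['xdiff/xdiffi.c']
```

Under this model the quoted '*.h' given to the prefix matcher is just a
literal path that does not exist, which is consistent with "No commit shown"
in the original report.
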
* Re: How to use path limiting (using a glob)?
From: Peter Baumann @ 2009-02-12 10:27 UTC
To: Linus Torvalds; +Cc: git

On Wed, Feb 11, 2009 at 11:40:44AM -0800, Linus Torvalds wrote:
> On Wed, 11 Feb 2009, Peter Baumann wrote:
>
> > after reading Junio's nice blog today where he explained how to use git grep
> > efficiently, I saw him using a glob to match for the interesting files:
> >
> >     $ git grep -e ';;' -- '*.c'
> >
> > Is it possible to have the same feature in git diff and the revision
> > machinery?
>
> Not really. Git has two different kinds of path limiters, and they are
> really really different.
>
>  - the "walk current index/directory recursively" kind that "git ls-files"
>    uses, which takes a 'fnmatch()' type path regexp (not a real regexp,
>    but the kind you're used to with shell)
>
>    NOTE! On purpose, we don't set the FNM_PATHNAME, so "*.c" here is
>    different from *.c in shell (it's more like "**.c" in tcsh). IOW, *
>    matches '/' too, and will walk subdirectories.
>

Hm. But if git anchored the * at the current directory only, wouldn't
this solve (or at least reduce) the performance problems you describe in the
later paragraph? A separate "**.c" pattern could then be used to request a
recursive search for every .c file.

>  - the "revision limiter" pathspec. This is *not* a regexp, it's a pure
>    prefix matcher, for a very simple reason: performance.
>
> >     $ cd $path_to_your_git_src_dir
> >     $ git log master -p -- '*.h'
> >     .... No commit shown
> >
> >     $ git diff --name-only v1.5.0 v1.6.0 -- '*.c'
> >
> > and both don't return anything.
>
> Yeah, in the revision matcher you can still depend on the shell
> expansion, and it will do _almost_ the right thing. So if you do
>
>     git log master -p *.c
>
> without the quotes, the shell expansion will work, and that in turn will
> give a set of filenames that "git log" will restrict the log to. HOWEVER,
> it's not a real wildcard - it's literally looking at what you have now in
> your current working directory, and saying "give me the logs of those
> pathnames", not "give me the logs of everything ending with .c".
>

OK. That's actually the reason why I asked for this: a file that has been
removed would not be found this way.

> We _could_ make the revision limiter understand fnmatch-style patterns,
> but quite frankly, it's very very expensive - too expensive to be useful
> for big repositories. The point about only matching prefixes is that it
> allows the revision limiter to not even walk into subdirectories that
> don't match, but if you do the "*.c" kind of pattern, now the revision
> code has to look up every tree recursively. That code is also _extremely_
> performance-critical, so we really don't want to use fnmatch() when we can
> currently use just "memcmp()".
>
> So yes, it's kind of odd how we have two totally different concepts of
> pathname patterns, but it's probably easiest to remember that "'git grep'
> is just special".
>
> 		Linus

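[Editorial sketch] The removed-file problem raised above can be shown with a
toy model in Python. Everything here is an assumption made for illustration:
the history is a hypothetical list of "paths touched per commit", and
shell_expand(), commits_touching() and commits_matching() are made-up helper
names, not git functions. All files sit at the top level, so the difference
between shell globbing and fnmatch over '/' does not come into play.

```python
from fnmatch import fnmatchcase

# Hypothetical history, oldest first: the set of paths each commit touched.
history = [
    {"main.c", "old.c"},  # both files created
    {"old.c"},            # old.c modified
    {"old.c"},            # old.c deleted
    {"main.c"},           # main.c modified
]
working_tree = {"main.c"}  # old.c no longer exists on disk


def shell_expand(pattern, files):
    """What an unquoted glob turns into: names that exist right now."""
    return sorted(f for f in files if fnmatchcase(f, pattern))


def commits_touching(names, commits):
    """Indices of commits that touch any of the exact path names."""
    wanted = set(names)
    return [i for i, touched in enumerate(commits) if touched & wanted]


def commits_matching(pattern, commits):
    """Indices of commits that touch any path matching the pattern."""
    return [i for i, touched in enumerate(commits)
            if any(fnmatchcase(p, pattern) for p in touched)]


expanded = shell_expand("*.c", working_tree)
print(expanded)                             # ['main.c']
print(commits_touching(expanded, history))  # [0, 3]
print(commits_matching("*.c", history))     # [0, 1, 2, 3]
```

The unquoted form only ever sees main.c, so the two commits that touch
nothing but old.c drop out of the result.
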
* Re: How to use path limiting (using a glob)?
From: Sitaram Chamarty @ 2009-02-12 11:09 UTC
To: git

On 2009-02-12, Peter Baumann <waste.manager@gmx.de> wrote:
> On Wed, Feb 11, 2009 at 11:40:44AM -0800, Linus Torvalds wrote:
>> On Wed, 11 Feb 2009, Peter Baumann wrote:
>>
>> > after reading Junio's nice blog today where he
>> > explained how to use git grep efficiently, I saw him
>> > using a glob to match for the interesting files:
>> >
>> >     $ git grep -e ';;' -- '*.c'
>> >
>> > Is it possible to have the same feature in git diff and the revision
>> > machinery?
>>
>> Not really. Git has two different kinds of path limiters, and they are
>> really really different.
>>
>>  - the "walk current index/directory recursively" kind that "git ls-files"
>>    uses, which takes a 'fnmatch()' type path regexp (not a real regexp,
>>    but the kind you're used to with shell)
>>
>>    NOTE! On purpose, we don't set the FNM_PATHNAME, so "*.c" here is
>>    different from *.c in shell (it's more like "**.c" in tcsh). IOW, *
>>    matches '/' too, and will walk subdirectories.
>>
>
> Hm. But if git anchored the * at the current directory only, wouldn't
> this solve (or at least reduce) the performance problems you describe in the
> later paragraph? A separate "**.c" pattern could then be used to request a
> recursive search for every .c file.

I think Linus meant that it's expensive to look for all *.c
files at any depth in the tree, for every commit in the
repository. You can either have a prefix matcher that limits
the search *within* a tree, so that you can afford to walk
all revs in the repo, or you stick to just one tree (or a
few explicitly named ones).

You seem to be saying 'fine, I know, and I'm willing to
indicate that I'm accepting this cost by using a different
syntax'. But the syntax is not the point.

You can certainly do that right now, if you really wish to.
Just don't try it on a large repo :-)

    git grep -e pattern $(git rev-list --all) -- '*.c'

(The '*.c' is quoted so that git grep, not the shell,
expands it.) Make suitable modifications to the '--all' in
the git rev-list to limit the revs you want to search.

Regardless of whether there is a simple syntax to support it
or not, this is probably not what you want, most of the time
:-)

Sitaram

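[Editorial sketch] The cost argument in this part of the thread can be made
concrete by counting how many directories a walk has to open for one tree.
The nested dict below is a hypothetical tree (dicts are directories, None
marks a file), and walk_prefix() / walk_glob() are illustrative helpers
written for this note, not git's implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical tree: dicts are directories, None marks a file.
tree = {
    "Documentation": {"git.txt": None, "howto": {"notes.txt": None}},
    "xdiff": {"xdiffi.c": None, "xutils.c": None},
    "t": {"lib": {"helper.c": None}},
    "cache.h": None,
}


def walk_prefix(node, prefix_parts, base=""):
    """Return (matches, directories opened), descending only along the prefix."""
    opened, matches = 1, []
    for name, child in node.items():
        if prefix_parts and name != prefix_parts[0]:
            continue  # one name comparison rules out the whole subtree
        rest = prefix_parts[1:]
        path = base + name
        if child is None:
            if not rest:
                matches.append(path)
        else:
            sub, n = walk_prefix(child, rest, path + "/")
            matches += sub
            opened += n
    return matches, opened


def walk_glob(node, pattern, base=""):
    """Return (matches, directories opened); every subtree must be opened."""
    opened, matches = 1, []
    for name, child in node.items():
        path = base + name
        if child is None:
            if fnmatchcase(path, pattern):
                matches.append(path)
        else:
            sub, n = walk_glob(child, pattern, path + "/")
            matches += sub
            opened += n
    return matches, opened


print(walk_prefix(tree, ["xdiff"]))  # (['xdiff/xdiffi.c', 'xdiff/xutils.c'], 2)
print(walk_glob(tree, "*.c"))  # (['xdiff/xdiffi.c', 'xdiff/xutils.c', 't/lib/helper.c'], 6)
```

In this toy tree the prefix walk opens 2 directories and the glob walk opens
all 6; a history walk would pay that difference again for each commit it
examines.
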
* Re: How to use path limiting (using a glob)?
From: Junio C Hamano @ 2009-02-11 19:48 UTC
To: Peter Baumann; +Cc: git

Peter Baumann <waste.manager@gmx.de> writes:

> after reading Junio's nice blog today where he explained how to use git grep
> efficiently, I saw him using a glob to match for the interesting files:
>
>     $ git grep -e ';;' -- '*.c'
>
> Is it possible to have the same feature in git diff and the revision
> machinery? Because I tried
>
>     $ cd $path_to_your_git_src_dir
>     $ git log master -p -- '*.h'
>     .... No commit shown
>
>     $ git diff --name-only v1.5.0 v1.6.0 -- '*.c'
>
> and both don't return anything.

There was a recent discussion on this. The index family uses glob, the
tree family uses leading-path only. The one implemented for grep can do
both, and attempts to unify the two by providing a possibly reusable
interface that the other two families could be ported to, but we haven't
managed to trick anybody into taking up the task ;-).

* Re: How to use path limiting (using a glob)?
From: Nanako Shiraishi @ 2009-02-11 21:09 UTC
To: Peter Baumann; +Cc: git, Junio C Hamano

Quoting Junio C Hamano <gitster@pobox.com>:

> Peter Baumann <waste.manager@gmx.de> writes:
>> Hello,
>>
>> after reading Junio's nice blog today where he explained how to use git grep
>> efficiently, I saw him using a glob to match for the interesting files:
>>
>>     $ git grep -e ';;' -- '*.c'
>>
>> Is it possible to have the same feature in git diff and the revision
>> machinery? Because I tried
>>
>>     $ cd $path_to_your_git_src_dir
>>     $ git log master -p -- '*.h'
>>     .... No commit shown
>>
>>     $ git diff --name-only v1.5.0 v1.6.0 -- '*.c'
>>
>> and both don't return anything.
> There was a recent discussion on this. The index family uses glob, the
> tree family uses leading-path only. The one implemented for grep can do
> both, and attempts to unify both by providing possibly reusable interface
> so that the other two families can be ported to, but we haven't managed to
> trick anybody to take up the task ;-).

The list archive has nicely written summaries of the issues and suggestions
on how to make this possible:

    http://article.gmane.org/gmane.comp.version-control.git/94628
    http://thread.gmane.org/gmane.comp.version-control.git/105638/focus=105679

-- 
Nanako Shiraishi, the unofficial project secretary of the git project
http://ivory.ap.teacup.com/nanako3/