From: Peter Baumann <waste.manager@gmx.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: git@vger.kernel.org
Subject: Re: How to use path limiting (using a glob)?
Date: Thu, 12 Feb 2009 11:27:20 +0100
Message-ID: <20090212102719.GD27232@m62s10.vlinux.de>
In-Reply-To: <alpine.LFD.2.00.0902111129190.3590@localhost.localdomain>

On Wed, Feb 11, 2009 at 11:40:44AM -0800, Linus Torvalds wrote:
>
>
> On Wed, 11 Feb 2009, Peter Baumann wrote:
>
> > after reading Junio's nice blog today where he explained how to use git grep
> > efficiently, I saw him using a glob to match for the interesting files:
> >
> > $ git grep -e ';;' -- '*.c'
> >
> > Is it possible to have the same feature in git diff and the revision
> > machinery?
>
> Not really. Git has two different kinds of path limiters, and they are
> really really different.
>
> - the "walk current index/directory recursively" kind that "git ls-files"
> uses, which takes a 'fnmatch()' type path regexp (not a real regexp,
> but the kind you're used to with shell)
>
> NOTE! On purpose, we don't set the FNM_PATHNAME, so "*.c" here is
> different from *.c in shell (it's more like "**.c" in tcsh). IOW, *
> matches '/' too, and will walk subdirectories.
>
Hm. But if git anchored the * at the current directory only, wouldn't
that solve (or at least reduce) the performance problems you describe in
the later paragraph? Anyone who really wants a recursive search for every
.c file could then ask for it explicitly with "**.c".
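To check that I read the FNM_PATHNAME note correctly, here is a minimal
sketch in Python rather than C. Python's fnmatch module does not treat '/'
specially, so it behaves like the un-anchored matching described above. The
sample paths and the anchored() helper are made up for illustration; they
are not anything git provides.

```python
from fnmatch import fnmatchcase

# Hypothetical paths, relative to the top of a work tree.
paths = ["main.c", "builtin/log.c", "t/helper/test.c", "cache.h"]

# Like fnmatch() without FNM_PATHNAME: '*' is allowed to match '/'.
recursive = [p for p in paths if fnmatchcase(p, "*.c")]


def anchored(path, pattern):
    """Shell-like matching where '*' never crosses a '/'.

    Hypothetical helper: it compares path and pattern one component
    at a time, which approximates what FNM_PATHNAME would do.
    """
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


top_level = [p for p in paths if anchored(p, "*.c")]

print(recursive)  # ['main.c', 'builtin/log.c', 't/helper/test.c']
print(top_level)  # ['main.c']
```

With the anchored variant only main.c matches, so the matcher would have
no reason to descend into builtin/ or t/ at all.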
> - the "revision limiter" pathspec. This is *not* a regexp, it's a pure
> prefix matcher, for a very simple reason: performance.
>
> > $ cd $path_to_your_git_src_dir
> > $ git log master -p -- '*.h'
> > .... No commit shown
> >
> > $ git diff --name-only v1.5.0 v1.6.0 -- '*.c'
> >
> > and both don't return anything.
>
> Yeah, in the revision matcher you can still depend on the shell
> expansion, and it will do _almost_ the right thing. So if you do
>
> git log master -p *.c
>
> without the quotes, the shell expansion will work, and that in turn will
> give a set of filenames that "git log" will restrict the log to. HOWEVER,
> it's not a real wildcard - it's literally looking at what you have now in
> your current working directory, and saying "give me the logs of those
> pathnames", not "give me the logs of everything ending with .c".
>
Ok. That's actually the reason why I asked for this: a file that got
removed wouldn't be found this way.
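A toy model of the difference, as I understand it. The commit ids, file
names and both helper functions are invented for this sketch; it only
imitates the behaviour described above and is not how git is implemented.

```python
from fnmatch import fnmatchcase

# Hypothetical history, newest first: (commit id, paths touched).
history = [
    ("c3", ["new.c"]),
    ("c2", ["old.c"]),            # old.c is deleted later on
    ("c1", ["old.c", "README"]),
]

# Files that exist in the working directory right now.
working_dir = ["new.c", "README"]

# An unquoted *.c is expanded by the shell to names that exist today.
expanded = [p for p in working_dir if fnmatchcase(p, "*.c")]


def commits_touching_names(names):
    """Commits that touch one of a fixed list of path names."""
    return [cid for cid, touched in history
            if any(t in names for t in touched)]


def commits_matching_pattern(pattern):
    """Commits that touch any path matching the pattern, deleted or not."""
    return [cid for cid, touched in history
            if any(fnmatchcase(t, pattern) for t in touched)]


print(commits_touching_names(expanded))   # ['c3']
print(commits_matching_pattern("*.c"))    # ['c3', 'c2', 'c1']
```

The expanded names miss c2 and c1 because old.c no longer exists for the
shell to find.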
> We _could_ make the revision limiter understand fnmatch-style patterns,
> but quite frankly, it's very very expensive - too expensive to be useful
> for big repositories. The point about only matching prefixes is that it
> allows the revision limiter to not even walk into subdirectories that
> don't match, but if you do the "*.c" kind of pattern, now the revision
> code has to look up every tree recursively. That code is also _extremely_
> performance-critical, so we really don't want to use fnmatch() when we can
> currently use just "memcmp()".
>
> So yes, it's kind of odd how we have two totally different concepts of
> pathname patterns, but it's probably easiest to remember that "'git grep'
> is just special".
>
> Linus
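To convince myself of the cost argument, I sketched the two walks over a
made-up tree. This is only an illustration under my own assumptions: the
tree layout and both functions are hypothetical, and str.startswith()
stands in for the memcmp() comparison mentioned above.

```python
from fnmatch import fnmatchcase

# Hypothetical tree: a dict is a directory, None marks a file.
tree = {
    "Documentation": {"git.txt": None, "howto": {"a.txt": None}},
    "builtin": {"log.c": None, "diff.c": None},
    "t": {"helper": {"test.c": None}},
    "main.c": None,
}


def walk_prefix(node, prefix, base=""):
    """Prefix limiter: returns (matching files, directories opened)."""
    matches, opened = [], 1
    for name, child in sorted(node.items()):
        path = base + name
        if child is None:
            if path.startswith(prefix):
                matches.append(path)
            continue
        dirpath = path + "/"
        # Descend only if this directory is inside the prefix or leads to it.
        if dirpath.startswith(prefix) or prefix.startswith(dirpath):
            sub_matches, sub_opened = walk_prefix(child, prefix, dirpath)
            matches += sub_matches
            opened += sub_opened
    return matches, opened


def walk_glob(node, pattern, base=""):
    """Glob limiter: no pruning possible, every directory is opened."""
    matches, opened = [], 1
    for name, child in sorted(node.items()):
        path = base + name
        if child is None:
            if fnmatchcase(path, pattern):
                matches.append(path)
            continue
        sub_matches, sub_opened = walk_glob(child, pattern, path + "/")
        matches += sub_matches
        opened += sub_opened
    return matches, opened


print(walk_prefix(tree, "builtin/"))
# (['builtin/diff.c', 'builtin/log.c'], 2)
print(walk_glob(tree, "*.c"))
# (['builtin/diff.c', 'builtin/log.c', 'main.c', 't/helper/test.c'], 6)
```

In this toy tree the prefix walk opens 2 directories and the glob walk
opens all 6, and that gap would be paid again for every commit the
revision walker looks at.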