From: Peter Baumann <waste.manager@gmx.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: git@vger.kernel.org
Subject: Re: How to use path limiting (using a glob)?
Date: Thu, 12 Feb 2009 11:27:20 +0100
Message-ID: <20090212102719.GD27232@m62s10.vlinux.de>
In-Reply-To: <alpine.LFD.2.00.0902111129190.3590@localhost.localdomain>

On Wed, Feb 11, 2009 at 11:40:44AM -0800, Linus Torvalds wrote:
> 
> 
> On Wed, 11 Feb 2009, Peter Baumann wrote:
> 
> > after reading Junio's nice blog today where he explained how to use git grep
> > efficiently, I saw him using a glob to match for the interesting files:
> > 
> > 	 $ git grep -e ';;' -- '*.c'
> > 
> > Is it possible to have the same feature in git diff and the revision
> > machinery?
> 
> Not really. Git has two different kinds of path limiters, and they are 
> really really different.
> 
>  - the "walk current index/directory recursively" kind that "git ls-files" 
>    uses, which takes a 'fnmatch()' type path regexp (not a real regexp, 
>    but the kind you're used to with shell)
> 
>    NOTE! On purpose, we don't set the FNM_PATHNAME, so "*.c" here is 
>    different from *.c in shell (it's more like "**.c" in tcsh). IOW, * 
>    matches '/' too, and will walk subdirectories.
> 

Hm. But if git anchored the * at the current directory only, wouldn't
that solve (or at least reduce) the performance problems you describe in the
later paragraph? A pattern like "**.c" could then be used explicitly whenever
a recursive search for every .c file is wanted.

>  - the "revision limiter" pathspec. This is *not* a regexp, it's a pure 
>    prefix matcher, for a very simple reason: performance.
> 
> > 	$ cd $path_to_your_git_src_dir
> > 	$ git log master -p -- '*.h'
> > 	.... No commit shown 
> > 
> > 	$ git diff --name-only v1.5.0  v1.6.0 -- '*.c'
> > 
> > and both don't return anything.
> 
> Yeah, in the revision matcher you can still depend on the shell 
> expansion, and it will do _almost_ the right thing. So if you do
> 
> 	git log master -p *.c
> 
> without the quotes, the shell expansion will work, and that in turn will 
> give a set of filenames that "git log" will restrict the log to. HOWEVER, 
> it's not a real wildcard - it's literally looking at what you have now in 
> your current working directory, and saying "give me the logs of those 
> pathnames", not "give me the logs of everything ending with .c".
> 

OK. That's actually why I asked: a file that has since been removed
wouldn't be found this way.

> We _could_ make the revision limiter understand fnmatch-style patterns, 
> but quite frankly, it's very very expensive - too expensive to be useful 
> for big repositories. The point about only matching prefixes is that it 
> allows the revision limiter to not even walk into subdirectories that 
> don't match, but if you do the "*.c" kind of pattern, now the revision 
> code has to look up every tree recursively. That code is also _extremely_ 
> performance-critical, so we really don't want to use fnmatch() when we can 
> currently use just "memcmp()".
> 
> So yes, it's kind of odd how we have two totally different concepts of 
> pathname patterns, but it's probably easiest to remember that "'git grep' 
> is just special". 
> 
> 		Linus

Thread overview: 6+ messages
2009-02-11 19:14 How to use path limiting (using a glob)? Peter Baumann
2009-02-11 19:40 ` Linus Torvalds
2009-02-12 10:27   ` Peter Baumann [this message]
2009-02-12 11:09     ` Sitaram Chamarty
2009-02-11 19:48 ` Junio C Hamano
2009-02-11 21:09   ` Nanako Shiraishi
