git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Making ce_path_match() more useful by accepting globs
@ 2007-11-25 18:03 Junio C Hamano
  2007-11-26  0:11 ` Alex Riesen
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2007-11-25 18:03 UTC (permalink / raw)
  To: Linus, "Torvalds <torvalds"; +Cc: git

Currently ce_path_match() only uses "the leading directory" match, and
does not understand file globs.  These do not work:

   git diff-files 't/*.sh'
   git diff-index HEAD 'xdiff/*.c'
   git update-index -g 'Documentation/howto/*.txt'

This teaches the ce_path_match(), the underlying function that are used
for checking if a given cache entry matches the given set of pathspecs,
to use the match_pathspec() from git-ls-files, which knows about glob
patterns.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Having two different behaviours of pathspec matching has been
   bothering me for quite some time.  The changes here look trivially
   correct and the result passes all the tests, but this is quite close
   to the core part of the system, and would benefit greatly from extra
   set of eyes.

   This patch does not touch the tree walker, and does not affect
   diff-tree nor ancestry pruning done in the revision traversal.  That
   however is even closer to the core and is performance critical.  It
   needs to be done carefully not to descend into trees that would never
   match needlessly.  IOW, not today.

 dir.c        |    5 +++--
 read-cache.c |   17 ++---------------
 2 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/dir.c b/dir.c
index 225fdfb..be640c9 100644
--- a/dir.c
+++ b/dir.c
@@ -98,20 +98,21 @@ int match_pathspec(const char **pathspec, const char *name, int namelen, int pre
 {
 	int retval;
 	const char *match;
+	int want_seen = !!seen;
 
 	name += prefix;
 	namelen -= prefix;
 
 	for (retval = 0; (match = *pathspec++) != NULL; seen++) {
 		int how;
-		if (retval && *seen == MATCHED_EXACTLY)
+		if (retval && seen && want_seen && *seen == MATCHED_EXACTLY)
 			continue;
 		match += prefix;
 		how = match_one(match, name, namelen);
 		if (how) {
 			if (retval < how)
 				retval = how;
-			if (*seen < how)
+			if (want_seen && *seen < how)
 				*seen = how;
 		}
 	}
diff --git a/read-cache.c b/read-cache.c
index 7db5588..767464e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -472,7 +472,7 @@ int ce_same_name(struct cache_entry *a, struct cache_entry *b)
 
 int ce_path_match(const struct cache_entry *ce, const char **pathspec)
 {
-	const char *match, *name;
+	const char *name;
 	int len;
 
 	if (!pathspec)
@@ -480,20 +480,7 @@ int ce_path_match(const struct cache_entry *ce, const char **pathspec)
 
 	len = ce_namelen(ce);
 	name = ce->name;
-	while ((match = *pathspec++) != NULL) {
-		int matchlen = strlen(match);
-		if (matchlen > len)
-			continue;
-		if (memcmp(name, match, matchlen))
-			continue;
-		if (matchlen && name[matchlen-1] == '/')
-			return 1;
-		if (name[matchlen] == '/' || !name[matchlen])
-			return 1;
-		if (!matchlen)
-			return 1;
-	}
-	return 0;
+	return !!match_pathspec(pathspec, name, len, 0, NULL);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC] Making ce_path_match() more useful by accepting globs
  2007-11-25 18:03 [RFC] Making ce_path_match() more useful by accepting globs Junio C Hamano
@ 2007-11-26  0:11 ` Alex Riesen
  2007-11-26  0:30   ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Alex Riesen @ 2007-11-26  0:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano, Sun, Nov 25, 2007 19:03:12 +0100:
> Currently ce_path_match() only uses "the leading directory" match, and
> does not understand file globs.  These do not work:
> 
>    git diff-files 't/*.sh'
>    git diff-index HEAD 'xdiff/*.c'
>    git update-index -g 'Documentation/howto/*.txt'

How should my scripts handle files with "*" in names?
Maybe before the proposed change we should make all plumbing accept
"--stdin" (which does not expand globs)?

P.S. you have something broken in your mailer:

    To: Linus@sceptre.sasl.smtp.pobox.com,
	    "Torvalds <torvalds"@linux-fundation.org

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] Making ce_path_match() more useful by accepting globs
  2007-11-26  0:11 ` Alex Riesen
@ 2007-11-26  0:30   ` Junio C Hamano
  2007-11-26  0:48     ` Junio C Hamano
  2007-11-26 19:38     ` Alex Riesen
  0 siblings, 2 replies; 5+ messages in thread
From: Junio C Hamano @ 2007-11-26  0:30 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Linus Torvalds, git

Alex Riesen <raa.lkml@gmail.com> writes:

> Junio C Hamano, Sun, Nov 25, 2007 19:03:12 +0100:
>> Currently ce_path_match() only uses "the leading directory" match, and
>> does not understand file globs.  These do not work:
>> 
>>    git diff-files 't/*.sh'
>>    git diff-index HEAD 'xdiff/*.c'
>>    git update-index -g 'Documentation/howto/*.txt'
>
> How should my scripts handle files with "*" in names?

We DO NOT CARE.

Why?

How would you handle such files from the command line session without
git?  "ls such-*-a-file" will also show such-silly-a-file as well.

IOW, the user is shooting in the foot --- and at that point I am not all
that interested in helping him.

Having said that, I would think that quoting the meta from fnmatch(3)
like this:

	git ls-files 'such-\*-a-file'

would work fine, just like

	ls such-\*-a-file

would.

If "ls such-*-a-file" reports only one file,

	git ls-files 'such-*-a-file'

would also report that file as well.

So in practice I do not see a problem.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] Making ce_path_match() more useful by accepting globs
  2007-11-26  0:30   ` Junio C Hamano
@ 2007-11-26  0:48     ` Junio C Hamano
  2007-11-26 19:38     ` Alex Riesen
  1 sibling, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2007-11-26  0:48 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Linus Torvalds, git

Junio C Hamano <gitster@pobox.com> writes:

> Alex Riesen <raa.lkml@gmail.com> writes:
>
>> Junio C Hamano, Sun, Nov 25, 2007 19:03:12 +0100:
>>> Currently ce_path_match() only uses "the leading directory" match, and
>>> does not understand file globs.  These do not work:
>>> 
>>>    git diff-files 't/*.sh'
>>>    git diff-index HEAD 'xdiff/*.c'
>>>    git update-index -g 'Documentation/howto/*.txt'
>>
>> How should my scripts handle files with "*" in names?
> ...
> Having said that, I would think that quoting the meta from fnmatch(3)
> like this:
>
> 	git ls-files 'such-\*-a-file'
>
> would work fine, just like
>
> 	ls such-\*-a-file
>
> would.

Yup.  I just did:

	$ >M\*kefile
        $ git add 'M\*kefile'
        $ git ls-files 'M\*kefile'
        M*kefile
        $ git ls-files 'M*kefile'
        M*kefile
        Makefile

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] Making ce_path_match() more useful by accepting globs
  2007-11-26  0:30   ` Junio C Hamano
  2007-11-26  0:48     ` Junio C Hamano
@ 2007-11-26 19:38     ` Alex Riesen
  1 sibling, 0 replies; 5+ messages in thread
From: Alex Riesen @ 2007-11-26 19:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano, Mon, Nov 26, 2007 01:30:20 +0100:
> Alex Riesen <raa.lkml@gmail.com> writes:
> 
> > Junio C Hamano, Sun, Nov 25, 2007 19:03:12 +0100:
> >> Currently ce_path_match() only uses "the leading directory" match, and
> >> does not understand file globs.  These do not work:
> >> 
> >>    git diff-files 't/*.sh'
> >>    git diff-index HEAD 'xdiff/*.c'
> >>    git update-index -g 'Documentation/howto/*.txt'
> >
> > How should my scripts handle files with "*" in names?
> 
> We DO NOT CARE.
> 
> Why?
> 
> How would you handle such files from the command line session without
> git?  "ls such-*-a-file" will also show such-silly-a-file as well.

It will break existing setups. Something like

    #!/bin/bash

    git diff-files --quiet -- "$@" || do_something

will behave differently

> IOW, the user is shooting in the foot --- and at that point I am not all
> that interested in helping him.

What with? What's wrong with a name like "M*A*S*H" in your personal IMDB?

> Having said that, I would think that quoting the meta from fnmatch(3)
> like this:
> 
> 	git ls-files 'such-\*-a-file'
> 
> would work fine, just like
> 
> 	ls such-\*-a-file
> 
> would.

An existing system will still be broken.

> If "ls such-*-a-file" reports only one file,
> 
> 	git ls-files 'such-*-a-file'
> 
> would also report that file as well.
> 
> So in practice I do not see a problem.

I think I do. There is NO way to pass the filenames to git plumbing
(only the diff-, or is git update-index --stdin -z also subject to
glob expansion?) without being sure they are exactly they were on
filesystem. Quoting is just PITA for scripting.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-11-26 19:38 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-25 18:03 [RFC] Making ce_path_match() more useful by accepting globs Junio C Hamano
2007-11-26  0:11 ` Alex Riesen
2007-11-26  0:30   ` Junio C Hamano
2007-11-26  0:48     ` Junio C Hamano
2007-11-26 19:38     ` Alex Riesen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).