git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Why does git-grep appear to treat exclude pathspecs differently?
@ 2025-07-26 13:44 D. Ben Knoble
  2025-07-27  0:16 ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: D. Ben Knoble @ 2025-07-26 13:44 UTC (permalink / raw)
  To: Git

With Git 2.48.1, I observe the following behavior:

- "git ls-files :^:Documentation/RelNotes | grep Rel" yields
"RelNotes", as expected
- "git grep squash :^:Documentation/RelNotes" yields the error

fatal: ambiguous argument ':^:Documentation/RelNotes': unknown
revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

- "git grep squash :^Documentation/RelNotes", "git grep squash
:^:Documentation/RelNotes/\*", and "git grep squash --
:^:Documentation/RelNotes" all work as expected

So I suppose, upon re-reading, that my question is really: what about
":^:Documentation/RelNotes" is ambiguous when handed to git-grep here?
There are a few revision syntaxes that start with colon, but none that
are followed directly by a caret, I think—and :^… can't be "revision
:'s parent…"

-- 
D. Ben Knoble

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-07-26 13:44 Why does git-grep appear to treat exclude pathspecs differently? D. Ben Knoble
@ 2025-07-27  0:16 ` Junio C Hamano
  2025-07-30 21:49   ` D. Ben Knoble
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2025-07-27  0:16 UTC (permalink / raw)
  To: D. Ben Knoble; +Cc: Git

"D. Ben Knoble" <ben.knoble@gmail.com> writes:

> With Git 2.48.1, I observe the following behavior:
>
> - "git ls-files :^:Documentation/RelNotes | grep Rel" yields
> "RelNotes", as expected

It is deliberately confusing to spell ":(exclude)" as ":^:".

> - "git grep squash :^:Documentation/RelNotes" yields the error
>
> fatal: ambiguous argument ':^:Documentation/RelNotes': unknown
> revision or path not in the working tree.

I think if you write it in longhand,

    $ git grep squash ':(exclude)Documentation/RelNotes'

you would not see such an error.

The error message comes from setup.c:die_verify_filename(), I think,
and setup.c:looks_like_pathspec() allows the control flow to avoid
calling that filename verification code path.  It knows to let the
longhand magic pathspec go, and it may be trivial to teach it a
shorthand magic too, but I offhand do not know the implications of
such a change---there might be unintended consequences.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-07-27  0:16 ` Junio C Hamano
@ 2025-07-30 21:49   ` D. Ben Knoble
  2025-08-02  9:46     ` Jeff King
  2025-08-02 16:14     ` D. Ben Knoble
  0 siblings, 2 replies; 12+ messages in thread
From: D. Ben Knoble @ 2025-07-30 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git

On Sat, Jul 26, 2025 at 8:16 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "D. Ben Knoble" <ben.knoble@gmail.com> writes:
>
> > With Git 2.48.1, I observe the following behavior:
> >
> > - "git ls-files :^:Documentation/RelNotes | grep Rel" yields
> > "RelNotes", as expected
>
> It is deliberately confusing to spell ":(exclude)" as ":^:".

What makes you say that? It's documented in "git help revisions":

           A pathspec that begins with a colon : has special meaning. In the
           short form, the leading colon : is followed by zero or more "magic
           signature" letters (which optionally is terminated by another colon
           :), and the remainder is the pattern to match against the path.

and

           exclude
               After a path matches any non-exclude pathspec, it will be run
               through all exclude pathspecs (magic signature: ! or its synonym
               ^).

>
> > - "git grep squash :^:Documentation/RelNotes" yields the error
> >
> > fatal: ambiguous argument ':^:Documentation/RelNotes': unknown
> > revision or path not in the working tree.
>
> I think if you write it in longhand,
>
>     $ git grep squash ':(exclude)Documentation/RelNotes'
>
> you would not see such an error.

Indeed, I left this syntax out of my original, but it works.

> The error message comes from setup.c:die_verify_filename(), I think,
> and setup.c:looks_like_pathspec() allows the control flow to avoid
> calling that filename verification code path.  It knows to let the
> longhand magic pathspec go, and it may be trivial to teach it a
> shorthand magic too, but I offhand do not know the implications of
> such a change---there might be unintended consequences.

Hm. Running a debugger, this looks accurate. We are in the
"!seen_dashdash" case of builtin/grep.c, with the call

    verify_filename(prefix=0x0000000000000000,
arg=":^:Documentation/RelNotes", diagnose_misspelt_rev=1)

which eventually dies as noted. However:

- looks_like_pathspec() only checks for long magic, as you noted
- setup.c:check_filename() looks for short-magic, too, but only
considers ":^" to work like a pathname if we're excluding everything?

I think what I find confusing is that, while this is definitely a DWIM
case for git-grep, it doesn't seem to do DWIM :) We are verifying that
the remaining arguments are filenames, but couldn't they really be
full pathspecs, as long as they aren't revisions?

The difference for "^:<path>" from ":^:<path>", just to complete the
story, is that in setup.c:check_filename() we try to stat whatever
comes after ":^":
- for the former, that's <path> and we are ok
- for the latter, that's :<path>, fail.

So it seems like the places that check for short-magic should also
consider the optional trailing colon?

-- 
D. Ben Knoble

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-07-30 21:49   ` D. Ben Knoble
@ 2025-08-02  9:46     ` Jeff King
  2025-08-02 16:13       ` D. Ben Knoble
  2025-08-02 17:12       ` Junio C Hamano
  2025-08-02 16:14     ` D. Ben Knoble
  1 sibling, 2 replies; 12+ messages in thread
From: Jeff King @ 2025-08-02  9:46 UTC (permalink / raw)
  To: D. Ben Knoble; +Cc: Junio C Hamano, Git

On Wed, Jul 30, 2025 at 05:49:29PM -0400, D. Ben Knoble wrote:

> which eventually dies as noted. However:
> 
> - looks_like_pathspec() only checks for long magic, as you noted
> - setup.c:check_filename() looks for short-magic, too, but only
> considers ":^" to work like a pathname if we're excluding everything?
> 
> I think what I find confusing is that, while this is definitely a DWIM
> case for git-grep, it doesn't seem to do DWIM :) We are verifying that
> the remaining arguments are filenames, but couldn't they really be
> full pathspecs, as long as they aren't revisions?
> 
> The difference for "^:<path>" from ":^:<path>", just to complete the
> story, is that in setup.c:check_filename() we try to stat whatever
> comes after ":^":
> - for the former, that's <path> and we are ok
> - for the latter, that's :<path>, fail.
> 
> So it seems like the places that check for short-magic should also
> consider the optional trailing colon?

Yeah, I'd think so. But it's worse than that, even. According to the
glossary definition you showed:

  In the short form, the leading colon `:` is followed by zero or more
  "magic signature" letters (which optionally is terminated by another
  colon `:`)

we allow multiple bits of magic. So the code in check_filename() that
looks for ":/", ":^", etc would be fooled when seeing more than one
character, like: ":/^exclude-from-root". AFAICT there are only two short
magic types, so I guess nobody has really run into this before.

Also, I guess this function ought to be respecting the literal_pathspecs
global? The actual pathspec parser does.

If we can, we probably ought to be feeding the paths to a function like
pathspec.c:parse_element_magic() and then checking the resulting flags
(and skipping past the prefix as it indicates).

-Peff

PS I didn't even know that we allowed multiple short items or a trailing
   colon until your email! Hidden corners of Git.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-08-02  9:46     ` Jeff King
@ 2025-08-02 16:13       ` D. Ben Knoble
  2025-08-02 18:52         ` Jeff King
  2025-08-02 17:12       ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: D. Ben Knoble @ 2025-08-02 16:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git

On Sat, Aug 2, 2025 at 5:46 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jul 30, 2025 at 05:49:29PM -0400, D. Ben Knoble wrote:
>
> > which eventually dies as noted. However:
> >
> > - looks_like_pathspec() only checks for long magic, as you noted
> > - setup.c:check_filename() looks for short-magic, too, but only
> > considers ":^" to work like a pathname if we're excluding everything?
> >
> > I think what I find confusing is that, while this is definitely a DWIM
> > case for git-grep, it doesn't seem to do DWIM :) We are verifying that
> > the remaining arguments are filenames, but couldn't they really be
> > full pathspecs, as long as they aren't revisions?
> >
> > The difference for "^:<path>" from ":^:<path>", just to complete the
> > story, is that in setup.c:check_filename() we try to stat whatever
> > comes after ":^":
> > - for the former, that's <path> and we are ok
> > - for the latter, that's :<path>, fail.
> >
> > So it seems like the places that check for short-magic should also
> > consider the optional trailing colon?
>
> Yeah, I'd think so. But it's worse than that, even. According to the
> glossary definition you showed:
>
>   In the short form, the leading colon `:` is followed by zero or more
>   "magic signature" letters (which optionally is terminated by another
>   colon `:`)
>
> we allow multiple bits of magic. So the code in check_filename() that
> looks for ":/", ":^", etc would be fooled when seeing more than one
> character, like: ":/^exclude-from-root". AFAICT there are only two short
> magic types, so I guess nobody has really run into this before.

Good call out! I also see only 2 magic items (one with a synonym) at the moment.

>
> Also, I guess this function ought to be respecting the literal_pathspecs
> global? The actual pathspec parser does.
>
> If we can, we probably ought to be feeding the paths to a function like
> pathspec.c:parse_element_magic() and then checking the resulting flags
> (and skipping past the prefix as it indicates).

Thanks for pointing me at this; maybe I'll find some time for patches
unless someone beats me to it.

>
> -Peff
>
> PS I didn't even know that we allowed multiple short items or a trailing
>    colon until your email! Hidden corners of Git.

;)

-- 
D. Ben Knoble

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-07-30 21:49   ` D. Ben Knoble
  2025-08-02  9:46     ` Jeff King
@ 2025-08-02 16:14     ` D. Ben Knoble
  2025-08-02 17:23       ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: D. Ben Knoble @ 2025-08-02 16:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git

On Wed, Jul 30, 2025 at 5:49 PM D. Ben Knoble <ben.knoble@gmail.com> wrote:
>
> On Sat, Jul 26, 2025 at 8:16 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "D. Ben Knoble" <ben.knoble@gmail.com> writes:
> >
> > > With Git 2.48.1, I observe the following behavior:
> > >
> > > - "git ls-files :^:Documentation/RelNotes | grep Rel" yields
> > > "RelNotes", as expected
> >
> > It is deliberately confusing to spell ":(exclude)" as ":^:".
>
> What makes you say that? It's documented in "git help revisions":

Er, that's "git help glossary" for the quotes below—woops!

>
>            A pathspec that begins with a colon : has special meaning. In the
>            short form, the leading colon : is followed by zero or more "magic
>            signature" letters (which optionally is terminated by another colon
>            :), and the remainder is the pattern to match against the path.
>
> and
>
>            exclude
>                After a path matches any non-exclude pathspec, it will be run
>                through all exclude pathspecs (magic signature: ! or its synonym
>                ^).
>

-- 
D. Ben Knoble

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-08-02  9:46     ` Jeff King
  2025-08-02 16:13       ` D. Ben Knoble
@ 2025-08-02 17:12       ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2025-08-02 17:12 UTC (permalink / raw)
  To: Jeff King; +Cc: D. Ben Knoble, Git

Jeff King <peff@peff.net> writes:

> PS I didn't even know that we allowed multiple short items or a trailing
>    colon until your email! Hidden corners of Git.

But once you think about them, it all makes sense.  The trailing
colon can help strange folks who have paths that begin with one of
these strange byte values ;-).


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-08-02 16:14     ` D. Ben Knoble
@ 2025-08-02 17:23       ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2025-08-02 17:23 UTC (permalink / raw)
  To: D. Ben Knoble; +Cc: Git

"D. Ben Knoble" <ben.knoble@gmail.com> writes:

>> > > - "git ls-files :^:Documentation/RelNotes | grep Rel" yields
>> > > "RelNotes", as expected
>> >
>> > It is deliberately confusing to spell ":(exclude)" as ":^:".
>>
>> What makes you say that? It's documented in "git help revisions":

Because the most natural way to spell it is ":!Documentation/RelNotes"
not ":^:Documentation/RelNotes"?  

And that is why I did not say "invalid way to spell".  It just was
unnecessarily unfamiliar form to say the same thing.

But as you need more than one "prefix" to the real pathname to
trigger the bug in the disambiguation code, :! alone would not have
exhibited the symptom, and you would have needed something
multi-letter like :!: or :!/ in front of that path.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-08-02 16:13       ` D. Ben Knoble
@ 2025-08-02 18:52         ` Jeff King
  2025-08-03  5:56           ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2025-08-02 18:52 UTC (permalink / raw)
  To: D. Ben Knoble; +Cc: Junio C Hamano, Git

On Sat, Aug 02, 2025 at 12:13:25PM -0400, D. Ben Knoble wrote:

> > Also, I guess this function ought to be respecting the literal_pathspecs
> > global? The actual pathspec parser does.
> >
> > If we can, we probably ought to be feeding the paths to a function like
> > pathspec.c:parse_element_magic() and then checking the resulting flags
> > (and skipping past the prefix as it indicates).
> 
> Thanks for pointing me at this; maybe I'll find some time for patches
> unless someone beats me to it.

So here's a simple-ish way to use the full pathspec parsing code:

diff --git a/setup.c b/setup.c
index 6f52dab64c..ad27a65d6b 100644
--- a/setup.c
+++ b/setup.c
@@ -176,28 +176,27 @@ int path_inside_repo(const char *prefix, const char *path)
 
 int check_filename(const char *prefix, const char *arg)
 {
-	char *to_free = NULL;
+	const char *args[] = { arg, NULL };
+	struct pathspec ps;
 	struct stat st;
 
-	if (skip_prefix(arg, ":/", &arg)) {
-		if (!*arg) /* ":/" is root dir, always exists */
-			return 1;
-		prefix = NULL;
-	} else if (skip_prefix(arg, ":!", &arg) ||
-		   skip_prefix(arg, ":^", &arg)) {
-		if (!*arg) /* excluding everything is silly, but allowed */
-			return 1;
-	}
+	parse_pathspec(&ps, 0, 0, prefix, args);
+	if (ps.nr < 1)
+		BUG("pathspec ended up with no items?");
+	arg = ps.items[0].match;
 
-	if (prefix)
-		arg = to_free = prefix_filename(prefix, arg);
+	/* empty paths (after parsing magic) are OK */
+	if (!*arg) {
+		clear_pathspec(&ps);
+		return 1;
+	}
 
 	if (!lstat(arg, &st)) {
-		free(to_free);
+		clear_pathspec(&ps);
 		return 1; /* file exists */
 	}
 	if (is_missing_file_error(errno)) {
-		free(to_free);
+		clear_pathspec(&ps);
 		return 0; /* file does not exist */
 	}
 	die_errno(_("failed to stat '%s'"), arg);


It does make :^:Documentation work as you'd expect. Curiously, testing
:/Documentation is hard because that is _also_ a valid rev (the ":/"
searches for a commit with that string in its message). So it will
almost always fail anyway as "hey, this is both a rev and a file". But
you can do ":^/Documentation" to see how it behaves. ;)

But here's the interesting part: it breaks a bunch of tests. They all
seem to be doing things like ":file.txt". In check_filename() right now
we treat that literally. But as a pathspec, it is technically "colon
followed by zero or more magic signature letters", and it is eaten.

So I wonder if we have painted ourselves into a compatibility corner a
bit, if we have two conflicting expectations. We might be better off
just teaching check_filename() to parse multiple of [^/!] and the
trailing colon. It's horrible and not great for maintainability, but
this syntax is not something that changes often.

-Peff

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-08-02 18:52         ` Jeff King
@ 2025-08-03  5:56           ` Junio C Hamano
  2025-08-05 18:57             ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2025-08-03  5:56 UTC (permalink / raw)
  To: Jeff King; +Cc: D. Ben Knoble, Git

Jeff King <peff@peff.net> writes:

> But here's the interesting part: it breaks a bunch of tests. They all
> seem to be doing things like ":file.txt". In check_filename() right now
> we treat that literally. But as a pathspec, it is technically "colon
> followed by zero or more magic signature letters", and it is eaten.

Hmph.  Shouldn't the definition be "colon and then one or more magic
signature letters", then?  ":file.txt" to name the blob object at
path file.txt in the index is fairly common "rev" and it is a shame
that it has to become ambiguous with a pathspec element.

> So I wonder if we have painted ourselves into a compatibility corner a
> bit, if we have two conflicting expectations. We might be better off
> just teaching check_filename() to parse multiple of [^/!] and the
> trailing colon. It's horrible and not great for maintainability, but
> this syntax is not something that changes often.

Ah, OK.

So the idea is that when given _as_ a pathspec element (e.g., after
an explicit "--" separator), we do want to interpret ":file.txt" as
the same as "file.txt", but when dwimming to sift revs and pathspec
elements apart, prefer to take it as a blob object name in the
index?

I guess that would work better than the current code (or straight
"use the full pathspec parser" approach) from the compatibility
viewpoint.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-08-03  5:56           ` Junio C Hamano
@ 2025-08-05 18:57             ` Jeff King
  2025-08-05 20:06               ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2025-08-05 18:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: D. Ben Knoble, Git

On Sat, Aug 02, 2025 at 10:56:28PM -0700, Junio C Hamano wrote:

> > So I wonder if we have painted ourselves into a compatibility corner a
> > bit, if we have two conflicting expectations. We might be better off
> > just teaching check_filename() to parse multiple of [^/!] and the
> > trailing colon. It's horrible and not great for maintainability, but
> > this syntax is not something that changes often.
> 
> Ah, OK.
> 
> So the idea is that when given _as_ a pathspec element (e.g., after
> an explicit "--" separator), we do want to interpret ":file.txt" as
> the same as "file.txt", but when dwimming to sift revs and pathspec
> elements apart, prefer to take it as a blob object name in the
> index?

Yeah, I think that is a good way of framing / justifying it; DWIM can be
looser because it's inherently about heuristics.

I don't plan to do anything with this topic anytime soon, but maybe Ben
wants to produce a patch in that direction.

-Peff

PS I'll be offline for a week or so starting tomorrow, so apologies for
   any discussions I leave hanging.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Why does git-grep appear to treat exclude pathspecs differently?
  2025-08-05 18:57             ` Jeff King
@ 2025-08-05 20:06               ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2025-08-05 20:06 UTC (permalink / raw)
  To: Jeff King; +Cc: D. Ben Knoble, Git

Jeff King <peff@peff.net> writes:

> On Sat, Aug 02, 2025 at 10:56:28PM -0700, Junio C Hamano wrote:
>
>> > So I wonder if we have painted ourselves into a compatibility corner a
>> > bit, if we have two conflicting expectations. We might be better off
>> > just teaching check_filename() to parse multiple of [^/!] and the
>> > trailing colon. It's horrible and not great for maintainability, but
>> > this syntax is not something that changes often.
>> 
>> Ah, OK.
>> 
>> So the idea is that when given _as_ a pathspec element (e.g., after
>> an explicit "--" separator), we do want to interpret ":file.txt" as
>> the same as "file.txt", but when dwimming to sift revs and pathspec
>> elements apart, prefer to take it as a blob object name in the
>> index?
>
> Yeah, I think that is a good way of framing / justifying it; DWIM can be
> looser because it's inherently about heuristics.
>
> I don't plan to do anything with this topic anytime soon, but maybe Ben
> wants to produce a patch in that direction.
>
> -Peff
>
> PS I'll be offline for a week or so starting tomorrow, so apologies for
>    any discussions I leave hanging.

Thanks for a heads-up, and enjoy your time off away from the list
;-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-08-05 20:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-26 13:44 Why does git-grep appear to treat exclude pathspecs differently? D. Ben Knoble
2025-07-27  0:16 ` Junio C Hamano
2025-07-30 21:49   ` D. Ben Knoble
2025-08-02  9:46     ` Jeff King
2025-08-02 16:13       ` D. Ben Knoble
2025-08-02 18:52         ` Jeff King
2025-08-03  5:56           ` Junio C Hamano
2025-08-05 18:57             ` Jeff King
2025-08-05 20:06               ` Junio C Hamano
2025-08-02 17:12       ` Junio C Hamano
2025-08-02 16:14     ` D. Ben Knoble
2025-08-02 17:23       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).