* Why does git-grep appear to treat exclude pathspecs differently? @ 2025-07-26 13:44 D. Ben Knoble 2025-07-27 0:16 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: D. Ben Knoble @ 2025-07-26 13:44 UTC (permalink / raw) To: Git With Git 2.48.1, I observe the following behavior: - "git ls-files :^:Documentation/RelNotes | grep Rel" yields "RelNotes", as expected - "git grep squash :^:Documentation/RelNotes" yields the error fatal: ambiguous argument ':^:Documentation/RelNotes': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git <command> [<revision>...] -- [<file>...]' - "git grep squash :^Documentation/RelNotes", "git grep squash :^:Documentation/RelNotes/\*", and "git grep squash -- :^:Documentation/RelNotes" all work as expected So I suppose, upon re-reading, that my question is really: what about ":^:Documentation/RelNotes" is ambiguous when handed to git-grep here? There are a few revision syntaxes that start with colon, but none that are followed directly by a caret, I think—and :^… can't be "revision :'s parent…" -- D. Ben Knoble ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-07-26 13:44 Why does git-grep appear to treat exclude pathspecs differently? D. Ben Knoble @ 2025-07-27 0:16 ` Junio C Hamano 2025-07-30 21:49 ` D. Ben Knoble 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2025-07-27 0:16 UTC (permalink / raw) To: D. Ben Knoble; +Cc: Git "D. Ben Knoble" <ben.knoble@gmail.com> writes: > With Git 2.48.1, I observe the following behavior: > > - "git ls-files :^:Documentation/RelNotes | grep Rel" yields > "RelNotes", as expected It is deliberately confusing to spell ":(exclude)" as ":^:". > - "git grep squash :^:Documentation/RelNotes" yields the error > > fatal: ambiguous argument ':^:Documentation/RelNotes': unknown > revision or path not in the working tree. I think if you write it in longhand, $ git grep squash ':(exclude)Documentation/RelNotes' you would not see such an error. The error message comes from setup.c:die_verify_filename(), I think, and setup.c:looks_like_pathspec() allows the control flow to avoid calling that filename verification code path. It knows to let the longhand magic pathspec go, and it may be trivial to teach it a shorthand magic too, but I offhand do not know the implications of such a change---there might be unintended consequences. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-07-27 0:16 ` Junio C Hamano @ 2025-07-30 21:49 ` D. Ben Knoble 2025-08-02 9:46 ` Jeff King 2025-08-02 16:14 ` D. Ben Knoble 0 siblings, 2 replies; 12+ messages in thread From: D. Ben Knoble @ 2025-07-30 21:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git On Sat, Jul 26, 2025 at 8:16 PM Junio C Hamano <gitster@pobox.com> wrote: > > "D. Ben Knoble" <ben.knoble@gmail.com> writes: > > > With Git 2.48.1, I observe the following behavior: > > > > - "git ls-files :^:Documentation/RelNotes | grep Rel" yields > > "RelNotes", as expected > > It is deliberately confusing to spell ":(exclude)" as ":^:". What makes you say that? It's documented in "git help revisions": A pathspec that begins with a colon : has special meaning. In the short form, the leading colon : is followed by zero or more "magic signature" letters (which optionally is terminated by another colon :), and the remainder is the pattern to match against the path. and exclude After a path matches any non-exclude pathspec, it will be run through all exclude pathspecs (magic signature: ! or its synonym ^). > > > - "git grep squash :^:Documentation/RelNotes" yields the error > > > > fatal: ambiguous argument ':^:Documentation/RelNotes': unknown > > revision or path not in the working tree. > > I think if you write it in longhand, > > $ git grep squash ':(exclude)Documentation/RelNotes' > > you would not see such an error. Indeed, I left this syntax out of my original, but it works. > The error message comes from setup.c:die_verify_filename(), I think, > and setup.c:looks_like_pathspec() allows the control flow to avoid > calling that filename verification code path. It knows to let the > longhand magic pathspec go, and it may be trivial to teach it a > shorthand magic too, but I offhand do not know the implications of > such a change---there might be unintended consequences. Hm. Running a debugger, this looks accurate. We are in the "!seen_dashdash" case of builtin/grep.c, with the call verify_filename(prefix=0x0000000000000000, arg=":^:Documentation/RelNotes", diagnose_misspelt_rev=1) which eventually dies as noted. However: - looks_like_pathspec() only checks for long magic, as you noted - setup.c:check_filename() looks for short-magic, too, but only considers ":^" to work like a pathname if we're excluding everything? I think what I find confusing is that, while this is definitely a DWIM case for git-grep, it doesn't seem to do DWIM :) We are verifying that the remaining arguments are filenames, but couldn't they really be full pathspecs, as long as they aren't revisions? The difference for "^:<path>" from ":^:<path>", just to complete the story, is that in setup.c:check_filename() we try to stat whatever comes after ":^": - for the former, that's <path> and we are ok - for the latter, that's :<path>, fail. So it seems like the places that check for short-magic should also consider the optional trailing colon? -- D. Ben Knoble ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-07-30 21:49 ` D. Ben Knoble @ 2025-08-02 9:46 ` Jeff King 2025-08-02 16:13 ` D. Ben Knoble 2025-08-02 17:12 ` Junio C Hamano 2025-08-02 16:14 ` D. Ben Knoble 1 sibling, 2 replies; 12+ messages in thread From: Jeff King @ 2025-08-02 9:46 UTC (permalink / raw) To: D. Ben Knoble; +Cc: Junio C Hamano, Git On Wed, Jul 30, 2025 at 05:49:29PM -0400, D. Ben Knoble wrote: > which eventually dies as noted. However: > > - looks_like_pathspec() only checks for long magic, as you noted > - setup.c:check_filename() looks for short-magic, too, but only > considers ":^" to work like a pathname if we're excluding everything? > > I think what I find confusing is that, while this is definitely a DWIM > case for git-grep, it doesn't seem to do DWIM :) We are verifying that > the remaining arguments are filenames, but couldn't they really be > full pathspecs, as long as they aren't revisions? > > The difference for "^:<path>" from ":^:<path>", just to complete the > story, is that in setup.c:check_filename() we try to stat whatever > comes after ":^": > - for the former, that's <path> and we are ok > - for the latter, that's :<path>, fail. > > So it seems like the places that check for short-magic should also > consider the optional trailing colon? Yeah, I'd think so. But it's worse than that, even. According to the glossary definition you showed: In the short form, the leading colon `:` is followed by zero or more "magic signature" letters (which optionally is terminated by another colon `:`) we allow multiple bits of magic. So the code in check_filename() that looks for ":/", ":^", etc would be fooled when seeing more than one character, like: ":/^exclude-from-root". AFAICT there are only two short magic types, so I guess nobody has really run into this before. Also, I guess this function ought to be respecting the literal_pathspecs global? The actual pathspec parser does. If we can, we probably ought to be feeding the paths to a function like pathspec.c:parse_element_magic() and then checking the resulting flags (and skipping past the prefix as it indicates). -Peff PS I didn't even know that we allowed multiple short items or a trailing colon until your email! Hidden corners of Git. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-08-02 9:46 ` Jeff King @ 2025-08-02 16:13 ` D. Ben Knoble 2025-08-02 18:52 ` Jeff King 2025-08-02 17:12 ` Junio C Hamano 1 sibling, 1 reply; 12+ messages in thread From: D. Ben Knoble @ 2025-08-02 16:13 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Git On Sat, Aug 2, 2025 at 5:46 AM Jeff King <peff@peff.net> wrote: > > On Wed, Jul 30, 2025 at 05:49:29PM -0400, D. Ben Knoble wrote: > > > which eventually dies as noted. However: > > > > - looks_like_pathspec() only checks for long magic, as you noted > > - setup.c:check_filename() looks for short-magic, too, but only > > considers ":^" to work like a pathname if we're excluding everything? > > > > I think what I find confusing is that, while this is definitely a DWIM > > case for git-grep, it doesn't seem to do DWIM :) We are verifying that > > the remaining arguments are filenames, but couldn't they really be > > full pathspecs, as long as they aren't revisions? > > > > The difference for "^:<path>" from ":^:<path>", just to complete the > > story, is that in setup.c:check_filename() we try to stat whatever > > comes after ":^": > > - for the former, that's <path> and we are ok > > - for the latter, that's :<path>, fail. > > > > So it seems like the places that check for short-magic should also > > consider the optional trailing colon? > > Yeah, I'd think so. But it's worse than that, even. According to the > glossary definition you showed: > > In the short form, the leading colon `:` is followed by zero or more > "magic signature" letters (which optionally is terminated by another > colon `:`) > > we allow multiple bits of magic. So the code in check_filename() that > looks for ":/", ":^", etc would be fooled when seeing more than one > character, like: ":/^exclude-from-root". AFAICT there are only two short > magic types, so I guess nobody has really run into this before. Good call out! I also see only 2 magic items (one with a synonym) at the moment. > > Also, I guess this function ought to be respecting the literal_pathspecs > global? The actual pathspec parser does. > > If we can, we probably ought to be feeding the paths to a function like > pathspec.c:parse_element_magic() and then checking the resulting flags > (and skipping past the prefix as it indicates). Thanks for pointing me at this; maybe I'll find some time for patches unless someone beats me to it. > > -Peff > > PS I didn't even know that we allowed multiple short items or a trailing > colon until your email! Hidden corners of Git. ;) -- D. Ben Knoble ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-08-02 16:13 ` D. Ben Knoble @ 2025-08-02 18:52 ` Jeff King 2025-08-03 5:56 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Jeff King @ 2025-08-02 18:52 UTC (permalink / raw) To: D. Ben Knoble; +Cc: Junio C Hamano, Git On Sat, Aug 02, 2025 at 12:13:25PM -0400, D. Ben Knoble wrote: > > Also, I guess this function ought to be respecting the literal_pathspecs > > global? The actual pathspec parser does. > > > > If we can, we probably ought to be feeding the paths to a function like > > pathspec.c:parse_element_magic() and then checking the resulting flags > > (and skipping past the prefix as it indicates). > > Thanks for pointing me at this; maybe I'll find some time for patches > unless someone beats me to it. So here's a simple-ish way to use the full pathspec parsing code: diff --git a/setup.c b/setup.c index 6f52dab64c..ad27a65d6b 100644 --- a/setup.c +++ b/setup.c @@ -176,28 +176,27 @@ int path_inside_repo(const char *prefix, const char *path) int check_filename(const char *prefix, const char *arg) { - char *to_free = NULL; + const char *args[] = { arg, NULL }; + struct pathspec ps; struct stat st; - if (skip_prefix(arg, ":/", &arg)) { - if (!*arg) /* ":/" is root dir, always exists */ - return 1; - prefix = NULL; - } else if (skip_prefix(arg, ":!", &arg) || - skip_prefix(arg, ":^", &arg)) { - if (!*arg) /* excluding everything is silly, but allowed */ - return 1; - } + parse_pathspec(&ps, 0, 0, prefix, args); + if (ps.nr < 1) + BUG("pathspec ended up with no items?"); + arg = ps.items[0].match; - if (prefix) - arg = to_free = prefix_filename(prefix, arg); + /* empty paths (after parsing magic) are OK */ + if (!*arg) { + clear_pathspec(&ps); + return 1; + } if (!lstat(arg, &st)) { - free(to_free); + clear_pathspec(&ps); return 1; /* file exists */ } if (is_missing_file_error(errno)) { - free(to_free); + clear_pathspec(&ps); return 0; /* file does not exist */ } die_errno(_("failed to stat '%s'"), arg); It does make :^:Documentation work as you'd expect. Curiously, testing :/Documentation is hard because that is _also_ a valid rev (the ":/" searches for a commit with that string in its message). So it will almost always fail anyway as "hey, this is both a rev and a file". But you can do ":^/Documentation" to see how it behaves. ;) But here's the interesting part: it breaks a bunch of tests. They all seem to be doing things like ":file.txt". In check_filename() right now we treat that literally. But as a pathspec, it is technically "colon followed by zero or more magic signature letters", and it is eaten. So I wonder if we have painted ourselves into a compatibility corner a bit, if we have two conflicting expectations. We might be better off just teaching check_filename() to parse multiple of [^/!] and the trailing colon. It's horrible and not great for maintainability, but this syntax is not something that changes often. -Peff ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-08-02 18:52 ` Jeff King @ 2025-08-03 5:56 ` Junio C Hamano 2025-08-05 18:57 ` Jeff King 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2025-08-03 5:56 UTC (permalink / raw) To: Jeff King; +Cc: D. Ben Knoble, Git Jeff King <peff@peff.net> writes: > But here's the interesting part: it breaks a bunch of tests. They all > seem to be doing things like ":file.txt". In check_filename() right now > we treat that literally. But as a pathspec, it is technically "colon > followed by zero or more magic signature letters", and it is eaten. Hmph. Shouldn't the definition be "colon and then one or more magic signature letters", then? ":file.txt" to name the blob object at path file.txt in the index is fairly common "rev" and it is a shame that it has to become ambiguous with a pathspec element. > So I wonder if we have painted ourselves into a compatibility corner a > bit, if we have two conflicting expectations. We might be better off > just teaching check_filename() to parse multiple of [^/!] and the > trailing colon. It's horrible and not great for maintainability, but > this syntax is not something that changes often. Ah, OK. So the idea is that when given _as_ a pathspec element (e.g., after an explicit "--" separator), we do want to interpret ":file.txt" as the same as "file.txt", but when dwimming to sift revs and pathspec elements apart, prefer to take it as a blob object name in the index? I guess that would work better than the current code (or straight "use the full pathspec parser" approach) from the compatibility viewpoint. Thanks. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-08-03 5:56 ` Junio C Hamano @ 2025-08-05 18:57 ` Jeff King 2025-08-05 20:06 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Jeff King @ 2025-08-05 18:57 UTC (permalink / raw) To: Junio C Hamano; +Cc: D. Ben Knoble, Git On Sat, Aug 02, 2025 at 10:56:28PM -0700, Junio C Hamano wrote: > > So I wonder if we have painted ourselves into a compatibility corner a > > bit, if we have two conflicting expectations. We might be better off > > just teaching check_filename() to parse multiple of [^/!] and the > > trailing colon. It's horrible and not great for maintainability, but > > this syntax is not something that changes often. > > Ah, OK. > > So the idea is that when given _as_ a pathspec element (e.g., after > an explicit "--" separator), we do want to interpret ":file.txt" as > the same as "file.txt", but when dwimming to sift revs and pathspec > elements apart, prefer to take it as a blob object name in the > index? Yeah, I think that is a good way of framing / justifying it; DWIM can be looser because it's inherently about heuristics. I don't plan to do anything with this topic anytime soon, but maybe Ben wants to produce a patch in that direction. -Peff PS I'll be offline for a week or so starting tomorrow, so apologies for any discussions I leave hanging. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-08-05 18:57 ` Jeff King @ 2025-08-05 20:06 ` Junio C Hamano 0 siblings, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2025-08-05 20:06 UTC (permalink / raw) To: Jeff King; +Cc: D. Ben Knoble, Git Jeff King <peff@peff.net> writes: > On Sat, Aug 02, 2025 at 10:56:28PM -0700, Junio C Hamano wrote: > >> > So I wonder if we have painted ourselves into a compatibility corner a >> > bit, if we have two conflicting expectations. We might be better off >> > just teaching check_filename() to parse multiple of [^/!] and the >> > trailing colon. It's horrible and not great for maintainability, but >> > this syntax is not something that changes often. >> >> Ah, OK. >> >> So the idea is that when given _as_ a pathspec element (e.g., after >> an explicit "--" separator), we do want to interpret ":file.txt" as >> the same as "file.txt", but when dwimming to sift revs and pathspec >> elements apart, prefer to take it as a blob object name in the >> index? > > Yeah, I think that is a good way of framing / justifying it; DWIM can be > looser because it's inherently about heuristics. > > I don't plan to do anything with this topic anytime soon, but maybe Ben > wants to produce a patch in that direction. > > -Peff > > PS I'll be offline for a week or so starting tomorrow, so apologies for > any discussions I leave hanging. Thanks for a heads-up, and enjoy your time off away from the list ;-) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-08-02 9:46 ` Jeff King 2025-08-02 16:13 ` D. Ben Knoble @ 2025-08-02 17:12 ` Junio C Hamano 1 sibling, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2025-08-02 17:12 UTC (permalink / raw) To: Jeff King; +Cc: D. Ben Knoble, Git Jeff King <peff@peff.net> writes: > PS I didn't even know that we allowed multiple short items or a trailing > colon until your email! Hidden corners of Git. But once you think about them, it all makes sense. The trailing colon can help strange folks who have paths that begin with one of these strange byte values ;-). ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-07-30 21:49 ` D. Ben Knoble 2025-08-02 9:46 ` Jeff King @ 2025-08-02 16:14 ` D. Ben Knoble 2025-08-02 17:23 ` Junio C Hamano 1 sibling, 1 reply; 12+ messages in thread From: D. Ben Knoble @ 2025-08-02 16:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git On Wed, Jul 30, 2025 at 5:49 PM D. Ben Knoble <ben.knoble@gmail.com> wrote: > > On Sat, Jul 26, 2025 at 8:16 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > "D. Ben Knoble" <ben.knoble@gmail.com> writes: > > > > > With Git 2.48.1, I observe the following behavior: > > > > > > - "git ls-files :^:Documentation/RelNotes | grep Rel" yields > > > "RelNotes", as expected > > > > It is deliberately confusing to spell ":(exclude)" as ":^:". > > What makes you say that? It's documented in "git help revisions": Er, that's "git help glossary" for the quotes below—woops! > > A pathspec that begins with a colon : has special meaning. In the > short form, the leading colon : is followed by zero or more "magic > signature" letters (which optionally is terminated by another colon > :), and the remainder is the pattern to match against the path. > > and > > exclude > After a path matches any non-exclude pathspec, it will be run > through all exclude pathspecs (magic signature: ! or its synonym > ^). > -- D. Ben Knoble ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Why does git-grep appear to treat exclude pathspecs differently? 2025-08-02 16:14 ` D. Ben Knoble @ 2025-08-02 17:23 ` Junio C Hamano 0 siblings, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2025-08-02 17:23 UTC (permalink / raw) To: D. Ben Knoble; +Cc: Git "D. Ben Knoble" <ben.knoble@gmail.com> writes: >> > > - "git ls-files :^:Documentation/RelNotes | grep Rel" yields >> > > "RelNotes", as expected >> > >> > It is deliberately confusing to spell ":(exclude)" as ":^:". >> >> What makes you say that? It's documented in "git help revisions": Because the most natural way to spell it is ":!Documentation/RelNotes" not ":^:Documentation/RelNotes"? And that is why I did not say "invalid way to spell". It just was unnecessarily unfamiliar form to say the same thing. But as you need more than one "prefix" to the real pathname to trigger the bug in the disambiguation code, :! alone would not have exhibited the symptom, and you would have needed something multi-letter like :!: or :!/ in front of that path. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-08-05 20:06 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-07-26 13:44 Why does git-grep appear to treat exclude pathspecs differently? D. Ben Knoble 2025-07-27 0:16 ` Junio C Hamano 2025-07-30 21:49 ` D. Ben Knoble 2025-08-02 9:46 ` Jeff King 2025-08-02 16:13 ` D. Ben Knoble 2025-08-02 18:52 ` Jeff King 2025-08-03 5:56 ` Junio C Hamano 2025-08-05 18:57 ` Jeff King 2025-08-05 20:06 ` Junio C Hamano 2025-08-02 17:12 ` Junio C Hamano 2025-08-02 16:14 ` D. Ben Knoble 2025-08-02 17:23 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).