* Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly
@ 2025-04-11 19:08 Piotr Siupa
2025-04-12 1:57 ` K Jayatheerth
0 siblings, 1 reply; 5+ messages in thread
From: Piotr Siupa @ 2025-04-11 19:08 UTC (permalink / raw)
To: git
Hi! I think I've found a bug in the command "git add".
It can be reproduced in a fresh repository by running:
git init
touch 'foo' 'f*'
git add 'f*'
The last command should add both files "f*" and "foo" to the index but
it adds only "f*".
Running it the second time works as expected. (It adds "foo" on the
second attempt.)
I'm using Git 2.43.2. The current "next" (2.49.0.805.g082f7c87e0)
seems to have the same behavior if I'm testing it correctly.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly
2025-04-11 19:08 Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly Piotr Siupa
@ 2025-04-12 1:57 ` K Jayatheerth
2025-04-12 3:00 ` JAYATHEERTH K
0 siblings, 1 reply; 5+ messages in thread
From: K Jayatheerth @ 2025-04-12 1:57 UTC (permalink / raw)
To: piotrsiupa; +Cc: git
Hi Piotr, Hello everyone,
Thanks for the clear bug report, Piotr. I can reproduce the behavior
you described in 2.49:
On Fri, Apr 11, 2025 at 9:08 PM Piotr Siupa <piotrsiupa@gmail.com> wrote:
>
> Hi! I think I've found a bug in the command "git add".
> It can be reproduced in a fresh repository by running:
>
> git init
> touch 'foo' 'f*'
> git add 'f*'
>
> The last command should add both files "f*" and "foo" to the index but
> it adds only "f*".
> Running it the second time works as expected. (It adds "foo" on the
> second attempt.)
Following the code path down from 'cmd_add' (in 'builtin/add.c'),
the issue appears to stem from how pathspecs are matched against
directory entries. This happens specifically within the 'prune_directory'
function which uses 'do_match_pathspec' internally (likely called via
'dir_path_match' -> 'match_pathspec' -> 'match_pathspec_with_flags').
Here's a breakdown of what seems to be happening during that first
'git add ''f*''' call:
First, 'cmd_add' sees it needs to add new files. Then, 'fill_directory'
finds both untracked files: 'foo' and the literal 'f*'.
Next, 'prune_directory' is called to filter these using the pathspec ''f*''.
Inside 'prune_directory', the 'do_match_pathspec' function is called for
each file ('foo', then 'f*', or vice-versa) against the pathspec list
(which just contains ''f*''). These calls share a common marker array
(often called 'seen') to track which pathspecs have found a match so far.
When 'do_match_pathspec' processes the literal file 'f*' against the
pathspec item ''f*'', it calls 'match_pathspec_item'. This helper function
likely returns a code like 'MATCHED_EXACTLY' because the pattern ''f*''
happens to exactly match the filename '"f*"'. Consequently,
'do_match_pathspec' updates the 'seen' array for the ''f*'' pathspec to
mark it as exactly matched. Since a match was found, 'prune_directory'
decides to keep the 'f*' entry.
The problem arises when 'do_match_pathspec' processes the other file, 'foo',
against the same pathspec item ''f*''. Before doing the actual comparison,
it checks the 'seen' array and finds that the ''f*'' pathspec was already
marked 'MATCHED_EXACTLY' (from processing the literal 'f*' file).
An optimization check like 'if (seen && seen[i] == MATCHED_EXACTLY)'
then evaluates to true. This causes the loop to 'continue', skipping the
call to 'match_pathspec_item' entirely for the 'foo' file against the ''f*''
pattern. Because no match was found *in this specific call*, 'do_match_pathspec'
returns 0, and 'prune_directory' discards the 'foo' entry.
Finally, 'prune_directory' returns the filtered list, now containing only 'f*',
and 'add_files' adds only that file to the index.
On the *second* 'git add ''f*''' call, 'fill_directory' only finds the
untracked 'foo'. 'do_match_pathspec' runs with a fresh 'seen' array,
so the 'MATCHED_EXACTLY' check is initially false. 'match_pathspec_item'
is called for 'foo', returns 'MATCHED_FNMATCH' (a glob match), and 'foo'
is correctly added.
> I'm using Git 2.43.2. The current "next" (2.49.0.805.g082f7c87e0)
> seems to have the same behavior if I'm testing it correctly.
Yes, the relevant code structures in 'do_match_pathspec' appear similar
in recent versions, suggesting the behavior is likely consistent.
Conclusion:
The core issue seems to be that optimization check within 'do_match_pathspec':
// inside do_match_pathspec loop:
if (seen && seen[i] == MATCHED_EXACTLY)
continue;
This optimization assumes that once a pathspec item has achieved an
"exact" match against *some* file, it doesn't need to be checked
against *any other* files during the same directory scan operation.
However, when a pathspec contains glob characters (like ''f*'') but
happens to *also* exactly match a literal filename ('f*'),
'match_pathspec_item' appears to return 'MATCHED_EXACTLY'. This triggers
the optimization, incorrectly preventing the *same* pathspec pattern ''f*''
from matching *other* files (like 'foo') via its intended glob behavior
during that initial scan.
A potential fix might involve adjusting the logic in 'match_pathspec_item'
to perhaps not return 'MATCHED_EXACTLY' if the match involved globbing,
or modifying the 'seen' check in 'do_match_pathspec' to account for
this ambiguity.
Thanks again for spotting this subtle behavior!
-Jayatheerth
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly
2025-04-12 1:57 ` K Jayatheerth
@ 2025-04-12 3:00 ` JAYATHEERTH K
2025-04-12 6:27 ` Jeff King
0 siblings, 1 reply; 5+ messages in thread
From: JAYATHEERTH K @ 2025-04-12 3:00 UTC (permalink / raw)
To: piotrsiupa; +Cc: git
Upon a much closer look into the Git source code, specifically
init_pathspec_item within pathspec.c, my previous hypothesis regarding
the optimization check in do_match_pathspec appears to be incorrect,
or at least not the root cause.
The debugging output confirmed that the core issue seems to originate
earlier, during the pathspec parsing phase. It revealed that when
processing the input string 'f*' (without any explicit magic like
:(glob)), the resulting pathspec_item.magic field does not have the ,
even though the string clearly contains the * glob metacharacter. (To
clarify I did multiple fprintf checks in the do_match_pathspec fun)
Looking at the logic in init_pathspec_item, it seems the PATHSPEC_GLOB
flag is generally only added to item->magic if:
a) The user explicitly uses a magic prefix like :(glob)f*.
b) A global setting like GIT_GLOB_PATHSPECS_ENVIRONMENT is active.
The code does not appear to automatically set PATHSPEC_GLOB on
item->magic simply based on scanning the content of the path string
for unescaped *, ?, or [ characters if no explicit magic prefix is
present.
This explains why the fix attempted in do_match_pathspec (which relied
on checking the PATHSPEC_GLOB flag) was ineffective – the flag wasn't
set in the first place for the 'f*' input. It also explains the
original inconsistent behavior: because the pathspec lacks the
PATHSPEC_GLOB magic flag, parts of Git relying on these flags treat it
as literal by default, even if lower-level matching functions might
still perform some wildcard expansion based on the string content
itself.
So, this seems less like a bug in a specific conditional check and
more like a consequence of the current design where the magic flags
indicating glob behavior are opt-in (via prefixes or environment
variables) rather than implicitly inferred from the pathspec string's
content.
-Jayatheerth
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly
2025-04-12 3:00 ` JAYATHEERTH K
@ 2025-04-12 6:27 ` Jeff King
2025-04-12 9:29 ` JAYATHEERTH K
0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2025-04-12 6:27 UTC (permalink / raw)
To: JAYATHEERTH K; +Cc: piotrsiupa, git
On Sat, Apr 12, 2025 at 08:30:00AM +0530, JAYATHEERTH K wrote:
> Upon a much closer look into the Git source code, specifically
> init_pathspec_item within pathspec.c, my previous hypothesis regarding
> the optimization check in do_match_pathspec appears to be incorrect,
> or at least not the root cause.
I think you're still on the right track, but are just mis-interpreting
the item->magic field. It's about user-specified "magic" flags, one of
which is "treat this pathspec like a glob, even if the default (or an
earlier "literal" magic flag) would tell you not to do so".
I don't think there is a bit flag in the pathspec item for "this item
has glob meta characters". But we compute and record the offset of the
first such character in the "nowildcard_len" field (which is for most
cases much better, since we can optimize prefix matches for stuff like
"foo/bar/*").
And you can compare that to "len" to see if it does any globbing at all
(which should also naturally handle stuff like ":(literal)" because we
then mark the whole thing as "nowildcard").
So something like this probably works:
diff --git a/dir.c b/dir.c
index cbd82be6c9..85cc08f4fc 100644
--- a/dir.c
+++ b/dir.c
@@ -519,7 +519,8 @@ static int do_match_pathspec(struct index_state *istate,
( exclude && !(ps->items[i].magic & PATHSPEC_EXCLUDE)))
continue;
- if (seen && seen[i] == MATCHED_EXACTLY)
+ if (seen && seen[i] == MATCHED_EXACTLY &&
+ ps->items[i].nowildcard_len == ps->items[i].len)
continue;
/*
* Make exclude patterns optional and never report
I think if you grep around for 'nowildcard_len ==' you'll find one or
two similar spots.
-Peff
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly
2025-04-12 6:27 ` Jeff King
@ 2025-04-12 9:29 ` JAYATHEERTH K
0 siblings, 0 replies; 5+ messages in thread
From: JAYATHEERTH K @ 2025-04-12 9:29 UTC (permalink / raw)
To: Jeff King; +Cc: piotrsiupa, git
On Sat, Apr 12, 2025 at 11:57 AM Jeff King <peff@peff.net> wrote:
>
> On Sat, Apr 12, 2025 at 08:30:00AM +0530, JAYATHEERTH K wrote:
>
> > Upon a much closer look into the Git source code, specifically
> > init_pathspec_item within pathspec.c, my previous hypothesis regarding
> > the optimization check in do_match_pathspec appears to be incorrect,
> > or at least not the root cause.
>
> I think you're still on the right track, but are just mis-interpreting
> the item->magic field. It's about user-specified "magic" flags, one of
> which is "treat this pathspec like a glob, even if the default (or an
> earlier "literal" magic flag) would tell you not to do so".
>
Ok that makes a lot of sense I was actually tweaking around with
has_wildcard but was confused with the chronology in cmd_add
> I don't think there is a bit flag in the pathspec item for "this item
> has glob meta characters". But we compute and record the offset of the
> first such character in the "nowildcard_len" field (which is for most
> cases much better, since we can optimize prefix matches for stuff like
> "foo/bar/*").
>
> And you can compare that to "len" to see if it does any globbing at all
> (which should also naturally handle stuff like ":(literal)" because we
> then mark the whole thing as "nowildcard").
>
Understood!! makes sense again.
> So something like this probably works:
>
> diff --git a/dir.c b/dir.c
> index cbd82be6c9..85cc08f4fc 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -519,7 +519,8 @@ static int do_match_pathspec(struct index_state *istate,
> ( exclude && !(ps->items[i].magic & PATHSPEC_EXCLUDE)))
> continue;
>
> - if (seen && seen[i] == MATCHED_EXACTLY)
> + if (seen && seen[i] == MATCHED_EXACTLY &&
> + ps->items[i].nowildcard_len == ps->items[i].len)
> continue;
> /*
> * Make exclude patterns optional and never report
>
> I think if you grep around for 'nowildcard_len ==' you'll find one or
> two similar spots.
>
I did and found this (nowildcard_len) in attr files after tracing it up!
Will send a patch in a new thread as soon as possible.
> -Peff
Thank you,
Jayatheerth
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-04-12 9:29 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-11 19:08 Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly Piotr Siupa
2025-04-12 1:57 ` K Jayatheerth
2025-04-12 3:00 ` JAYATHEERTH K
2025-04-12 6:27 ` Jeff King
2025-04-12 9:29 ` JAYATHEERTH K
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.