* Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly @ 2025-04-11 19:08 Piotr Siupa 2025-04-12 1:57 ` K Jayatheerth 0 siblings, 1 reply; 5+ messages in thread From: Piotr Siupa @ 2025-04-11 19:08 UTC (permalink / raw) To: git Hi! I think I've found a bug in the command "git add". It can be reproduced in a fresh repository by running: git init touch 'foo' 'f*' git add 'f*' The last command should add both files "f*" and "foo" to the index but it adds only "f*". Running it the second time works as expected. (It adds "foo" on the second attempt.) I'm using Git 2.43.2. The current "next" (2.49.0.805.g082f7c87e0) seems to have the same behavior if I'm testing it correctly. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly 2025-04-11 19:08 Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly Piotr Siupa @ 2025-04-12 1:57 ` K Jayatheerth 2025-04-12 3:00 ` JAYATHEERTH K 0 siblings, 1 reply; 5+ messages in thread From: K Jayatheerth @ 2025-04-12 1:57 UTC (permalink / raw) To: piotrsiupa; +Cc: git Hi Piotr, Hello everyone, Thanks for the clear bug report, Piotr. I can reproduce the behavior you described in 2.49: On Fri, Apr 11, 2025 at 9:08 PM Piotr Siupa <piotrsiupa@gmail.com> wrote: > > Hi! I think I've found a bug in the command "git add". > It can be reproduced in a fresh repository by running: > > git init > touch 'foo' 'f*' > git add 'f*' > > The last command should add both files "f*" and "foo" to the index but > it adds only "f*". > Running it the second time works as expected. (It adds "foo" on the > second attempt.) Following the code path down from 'cmd_add' (in 'builtin/add.c'), the issue appears to stem from how pathspecs are matched against directory entries. This happens specifically within the 'prune_directory' function which uses 'do_match_pathspec' internally (likely called via 'dir_path_match' -> 'match_pathspec' -> 'match_pathspec_with_flags'). Here's a breakdown of what seems to be happening during that first 'git add ''f*''' call: First, 'cmd_add' sees it needs to add new files. Then, 'fill_directory' finds both untracked files: 'foo' and the literal 'f*'. Next, 'prune_directory' is called to filter these using the pathspec ''f*''. Inside 'prune_directory', the 'do_match_pathspec' function is called for each file ('foo', then 'f*', or vice-versa) against the pathspec list (which just contains ''f*''). These calls share a common marker array (often called 'seen') to track which pathspecs have found a match so far. When 'do_match_pathspec' processes the literal file 'f*' against the pathspec item ''f*'', it calls 'match_pathspec_item'. This helper function likely returns a code like 'MATCHED_EXACTLY' because the pattern ''f*'' happens to exactly match the filename '"f*"'. Consequently, 'do_match_pathspec' updates the 'seen' array for the ''f*'' pathspec to mark it as exactly matched. Since a match was found, 'prune_directory' decides to keep the 'f*' entry. The problem arises when 'do_match_pathspec' processes the other file, 'foo', against the same pathspec item ''f*''. Before doing the actual comparison, it checks the 'seen' array and finds that the ''f*'' pathspec was already marked 'MATCHED_EXACTLY' (from processing the literal 'f*' file). An optimization check like 'if (seen && seen[i] == MATCHED_EXACTLY)' then evaluates to true. This causes the loop to 'continue', skipping the call to 'match_pathspec_item' entirely for the 'foo' file against the ''f*'' pattern. Because no match was found *in this specific call*, 'do_match_pathspec' returns 0, and 'prune_directory' discards the 'foo' entry. Finally, 'prune_directory' returns the filtered list, now containing only 'f*', and 'add_files' adds only that file to the index. On the *second* 'git add ''f*''' call, 'fill_directory' only finds the untracked 'foo'. 'do_match_pathspec' runs with a fresh 'seen' array, so the 'MATCHED_EXACTLY' check is initially false. 'match_pathspec_item' is called for 'foo', returns 'MATCHED_FNMATCH' (a glob match), and 'foo' is correctly added. > I'm using Git 2.43.2. The current "next" (2.49.0.805.g082f7c87e0) > seems to have the same behavior if I'm testing it correctly. Yes, the relevant code structures in 'do_match_pathspec' appear similar in recent versions, suggesting the behavior is likely consistent. Conclusion: The core issue seems to be that optimization check within 'do_match_pathspec': // inside do_match_pathspec loop: if (seen && seen[i] == MATCHED_EXACTLY) continue; This optimization assumes that once a pathspec item has achieved an "exact" match against *some* file, it doesn't need to be checked against *any other* files during the same directory scan operation. However, when a pathspec contains glob characters (like ''f*'') but happens to *also* exactly match a literal filename ('f*'), 'match_pathspec_item' appears to return 'MATCHED_EXACTLY'. This triggers the optimization, incorrectly preventing the *same* pathspec pattern ''f*'' from matching *other* files (like 'foo') via its intended glob behavior during that initial scan. A potential fix might involve adjusting the logic in 'match_pathspec_item' to perhaps not return 'MATCHED_EXACTLY' if the match involved globbing, or modifying the 'seen' check in 'do_match_pathspec' to account for this ambiguity. Thanks again for spotting this subtle behavior! -Jayatheerth ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly 2025-04-12 1:57 ` K Jayatheerth @ 2025-04-12 3:00 ` JAYATHEERTH K 2025-04-12 6:27 ` Jeff King 0 siblings, 1 reply; 5+ messages in thread From: JAYATHEERTH K @ 2025-04-12 3:00 UTC (permalink / raw) To: piotrsiupa; +Cc: git Upon a much closer look into the Git source code, specifically init_pathspec_item within pathspec.c, my previous hypothesis regarding the optimization check in do_match_pathspec appears to be incorrect, or at least not the root cause. The debugging output confirmed that the core issue seems to originate earlier, during the pathspec parsing phase. It revealed that when processing the input string 'f*' (without any explicit magic like :(glob)), the resulting pathspec_item.magic field does not have the , even though the string clearly contains the * glob metacharacter. (To clarify I did multiple fprintf checks in the do_match_pathspec fun) Looking at the logic in init_pathspec_item, it seems the PATHSPEC_GLOB flag is generally only added to item->magic if: a) The user explicitly uses a magic prefix like :(glob)f*. b) A global setting like GIT_GLOB_PATHSPECS_ENVIRONMENT is active. The code does not appear to automatically set PATHSPEC_GLOB on item->magic simply based on scanning the content of the path string for unescaped *, ?, or [ characters if no explicit magic prefix is present. This explains why the fix attempted in do_match_pathspec (which relied on checking the PATHSPEC_GLOB flag) was ineffective – the flag wasn't set in the first place for the 'f*' input. It also explains the original inconsistent behavior: because the pathspec lacks the PATHSPEC_GLOB magic flag, parts of Git relying on these flags treat it as literal by default, even if lower-level matching functions might still perform some wildcard expansion based on the string content itself. So, this seems less like a bug in a specific conditional check and more like a consequence of the current design where the magic flags indicating glob behavior are opt-in (via prefixes or environment variables) rather than implicitly inferred from the pathspec string's content. -Jayatheerth ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly 2025-04-12 3:00 ` JAYATHEERTH K @ 2025-04-12 6:27 ` Jeff King 2025-04-12 9:29 ` JAYATHEERTH K 0 siblings, 1 reply; 5+ messages in thread From: Jeff King @ 2025-04-12 6:27 UTC (permalink / raw) To: JAYATHEERTH K; +Cc: piotrsiupa, git On Sat, Apr 12, 2025 at 08:30:00AM +0530, JAYATHEERTH K wrote: > Upon a much closer look into the Git source code, specifically > init_pathspec_item within pathspec.c, my previous hypothesis regarding > the optimization check in do_match_pathspec appears to be incorrect, > or at least not the root cause. I think you're still on the right track, but are just mis-interpreting the item->magic field. It's about user-specified "magic" flags, one of which is "treat this pathspec like a glob, even if the default (or an earlier "literal" magic flag) would tell you not to do so". I don't think there is a bit flag in the pathspec item for "this item has glob meta characters". But we compute and record the offset of the first such character in the "nowildcard_len" field (which is for most cases much better, since we can optimize prefix matches for stuff like "foo/bar/*"). And you can compare that to "len" to see if it does any globbing at all (which should also naturally handle stuff like ":(literal)" because we then mark the whole thing as "nowildcard"). So something like this probably works: diff --git a/dir.c b/dir.c index cbd82be6c9..85cc08f4fc 100644 --- a/dir.c +++ b/dir.c @@ -519,7 +519,8 @@ static int do_match_pathspec(struct index_state *istate, ( exclude && !(ps->items[i].magic & PATHSPEC_EXCLUDE))) continue; - if (seen && seen[i] == MATCHED_EXACTLY) + if (seen && seen[i] == MATCHED_EXACTLY && + ps->items[i].nowildcard_len == ps->items[i].len) continue; /* * Make exclude patterns optional and never report I think if you grep around for 'nowildcard_len ==' you'll find one or two similar spots. -Peff ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly 2025-04-12 6:27 ` Jeff King @ 2025-04-12 9:29 ` JAYATHEERTH K 0 siblings, 0 replies; 5+ messages in thread From: JAYATHEERTH K @ 2025-04-12 9:29 UTC (permalink / raw) To: Jeff King; +Cc: piotrsiupa, git On Sat, Apr 12, 2025 at 11:57 AM Jeff King <peff@peff.net> wrote: > > On Sat, Apr 12, 2025 at 08:30:00AM +0530, JAYATHEERTH K wrote: > > > Upon a much closer look into the Git source code, specifically > > init_pathspec_item within pathspec.c, my previous hypothesis regarding > > the optimization check in do_match_pathspec appears to be incorrect, > > or at least not the root cause. > > I think you're still on the right track, but are just mis-interpreting > the item->magic field. It's about user-specified "magic" flags, one of > which is "treat this pathspec like a glob, even if the default (or an > earlier "literal" magic flag) would tell you not to do so". > Ok that makes a lot of sense I was actually tweaking around with has_wildcard but was confused with the chronology in cmd_add > I don't think there is a bit flag in the pathspec item for "this item > has glob meta characters". But we compute and record the offset of the > first such character in the "nowildcard_len" field (which is for most > cases much better, since we can optimize prefix matches for stuff like > "foo/bar/*"). > > And you can compare that to "len" to see if it does any globbing at all > (which should also naturally handle stuff like ":(literal)" because we > then mark the whole thing as "nowildcard"). > Understood!! makes sense again. > So something like this probably works: > > diff --git a/dir.c b/dir.c > index cbd82be6c9..85cc08f4fc 100644 > --- a/dir.c > +++ b/dir.c > @@ -519,7 +519,8 @@ static int do_match_pathspec(struct index_state *istate, > ( exclude && !(ps->items[i].magic & PATHSPEC_EXCLUDE))) > continue; > > - if (seen && seen[i] == MATCHED_EXACTLY) > + if (seen && seen[i] == MATCHED_EXACTLY && > + ps->items[i].nowildcard_len == ps->items[i].len) > continue; > /* > * Make exclude patterns optional and never report > > I think if you grep around for 'nowildcard_len ==' you'll find one or > two similar spots. > I did and found this (nowildcard_len) in attr files after tracing it up! Will send a patch in a new thread as soon as possible. > -Peff Thank you, Jayatheerth ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-04-12 9:29 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-11 19:08 Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly Piotr Siupa 2025-04-12 1:57 ` K Jayatheerth 2025-04-12 3:00 ` JAYATHEERTH K 2025-04-12 6:27 ` Jeff King 2025-04-12 9:29 ` JAYATHEERTH K
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.