From: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
To: piotrsiupa@gmail.com
Cc: git@vger.kernel.org
Subject: Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly
Date: Sat, 12 Apr 2025 07:27:48 +0530
Message-ID: <20250412015748.7177-1-jayatheerthkulkarni2005@gmail.com>
In-Reply-To: <CAPM0=yBnaXojeC9WkHg08deR-VpjaVQwyrqt8mk+54qLXqSaAQ@mail.gmail.com>

Hi Piotr, hello everyone,

Thanks for the clear bug report, Piotr. I can reproduce the behavior
you described in 2.49:

On Fri, Apr 11, 2025 at 9:08 PM Piotr Siupa <piotrsiupa@gmail.com> wrote:
>
> Hi! I think I've found a bug in the command "git add".
> It can be reproduced in a fresh repository by running:
>
> git init
> touch 'foo' 'f*'
> git add 'f*'
>
> The last command should add both files "f*" and "foo" to the index but
> it adds only "f*".
> Running it the second time works as expected. (It adds "foo" on the
> second attempt.)

Following the code path down from 'cmd_add' (in 'builtin/add.c'),
the issue appears to stem from how pathspecs are matched against
directory entries. This happens specifically within the 'prune_directory'
function, which uses 'do_match_pathspec' internally (likely called via
'dir_path_match' -> 'match_pathspec' -> 'match_pathspec_with_flags').

Here's a breakdown of what seems to be happening during that first
'git add ''f*''' call:

First, 'cmd_add' sees it needs to add new files. Then, 'fill_directory'
finds both untracked files: 'foo' and the literal 'f*'.

Next, 'prune_directory' is called to filter these using the pathspec ''f*''.
Inside 'prune_directory', 'do_match_pathspec' is called once per file
against the pathspec list (which contains only ''f*''). These calls share
a marker array (usually named 'seen') that records how well each pathspec
has matched so far. The order matters here: the problem described below
only appears if the literal 'f*' is processed before 'foo'.

When 'do_match_pathspec' processes the literal file 'f*' against the
pathspec item ''f*'', it calls 'match_pathspec_item'. This helper function
likely returns a code like 'MATCHED_EXACTLY' because the pattern ''f*''
is character-for-character identical to the filename 'f*'. Consequently,
'do_match_pathspec' updates the 'seen' array for the ''f*'' pathspec to
mark it as exactly matched. Since a match was found, 'prune_directory'
decides to keep the 'f*' entry.

The problem arises when 'do_match_pathspec' processes the other file, 'foo',
against the same pathspec item ''f*''. Before doing the actual comparison,
it checks the 'seen' array and finds that the ''f*'' pathspec was already
marked 'MATCHED_EXACTLY' (from processing the literal 'f*' file).
An optimization check like 'if (seen && seen[i] == MATCHED_EXACTLY)'
then evaluates to true. This causes the loop to 'continue', skipping the
call to 'match_pathspec_item' entirely for the 'foo' file against the ''f*''
pattern. Because no match was found *in this specific call*, 'do_match_pathspec'
returns 0, and 'prune_directory' discards the 'foo' entry.

Finally, 'prune_directory' returns the filtered list, now containing only 'f*',
and 'add_files' adds only that file to the index.

On the *second* 'git add ''f*''' call, 'fill_directory' only finds the
untracked 'foo'. 'do_match_pathspec' runs with a fresh 'seen' array,
so the 'MATCHED_EXACTLY' check is initially false. 'match_pathspec_item'
is called for 'foo', returns 'MATCHED_FNMATCH' (a glob match), and 'foo'
is correctly added.
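
To make the walkthrough above concrete, here is a minimal standalone
model of the two calls. It is only a sketch, not Git's code: the names
'model_match_item', 'model_match_pathspec' and 'model_prune' are
hypothetical, the enum values are simplified, a single 'seen' slot stands
in for the array (there is only one pathspec), and POSIX 'fnmatch' stands
in for Git's own glob matcher.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the match codes discussed above. */
enum match_kind { NO_MATCH = 0, MATCHED_FNMATCH = 1, MATCHED_EXACTLY = 2 };

/* Hypothetical model of the per-item matcher: literal comparison first. */
enum match_kind model_match_item(const char *pattern, const char *name)
{
    if (!strcmp(pattern, name))
        return MATCHED_EXACTLY;
    if (!fnmatch(pattern, name, 0))
        return MATCHED_FNMATCH;
    return NO_MATCH;
}

/* Hypothetical model of the pathspec loop for a single pathspec. */
enum match_kind model_match_pathspec(const char *pattern, const char *name,
                                     enum match_kind *seen)
{
    enum match_kind how;

    /* The early skip: an earlier exact match hides every later name. */
    if (seen && *seen == MATCHED_EXACTLY)
        return NO_MATCH;

    how = model_match_item(pattern, name);
    if (how && seen && *seen < how)
        *seen = how;
    return how;
}

/*
 * Hypothetical model of the pruning step: keep only the names that
 * match, compacting the array in place, and return how many were kept.
 * One 'seen' value is shared by all names within a single call.
 */
size_t model_prune(const char *pattern, const char **names, size_t nr)
{
    enum match_kind seen = NO_MATCH;
    size_t i, kept = 0;

    assert(names != NULL || nr == 0);
    for (i = 0; i < nr; i++)
        if (model_match_pathspec(pattern, names[i], &seen))
            names[kept++] = names[i];
    return kept;
}
```

Calling 'model_prune' with the pattern "f*" on the list { "f*", "foo" }
keeps only "f*". Calling it again on { "foo" } alone starts with a fresh
'seen' and keeps "foo", mirroring the first and second 'git add'. In this
model the skip only takes effect when "f*" is visited before "foo".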

> I'm using Git 2.43.2. The current "next" (2.49.0.805.g082f7c87e0)
> seems to have the same behavior if I'm testing it correctly.

Yes, the relevant code structures in 'do_match_pathspec' appear similar
in recent versions, suggesting the behavior is likely consistent.

Conclusion:

The core issue seems to be that optimization check within 'do_match_pathspec':

  // inside do_match_pathspec loop:
  if (seen && seen[i] == MATCHED_EXACTLY)
          continue;

This optimization assumes that once a pathspec item has achieved an
"exact" match against *some* file, it doesn't need to be checked
against *any other* files during the same directory scan operation.

However, when a pathspec contains glob characters (like ''f*'') but
happens to *also* exactly match a literal filename ('f*'),
'match_pathspec_item' appears to return 'MATCHED_EXACTLY'. This triggers
the optimization, incorrectly preventing the *same* pathspec pattern ''f*''
from matching *other* files (like 'foo') via its intended glob behavior
during that initial scan.

A potential fix might involve adjusting 'match_pathspec_item' so that it
does not return 'MATCHED_EXACTLY' when the pathspec contains glob
characters, or modifying the 'seen' check in 'do_match_pathspec' to
account for this ambiguity.
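
As an illustration of the second option only, here is a rough standalone
sketch in which the skip is taken just for purely literal patterns. This
is not a patch against Git and has not been tested there: the names
'model_has_wildcard', 'model_match_item', 'model_match_pathspec_fixed'
and 'model_prune_fixed' are hypothetical, the enum values are simplified,
and POSIX 'fnmatch' stands in for Git's own glob matcher.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the match codes discussed above. */
enum match_kind { NO_MATCH = 0, MATCHED_FNMATCH = 1, MATCHED_EXACTLY = 2 };

/* Hypothetical helper: does the pattern contain a glob metacharacter? */
int model_has_wildcard(const char *pattern)
{
    return strpbrk(pattern, "*?[\\") != NULL;
}

/* Hypothetical model of the per-item matcher: literal comparison first. */
enum match_kind model_match_item(const char *pattern, const char *name)
{
    if (!strcmp(pattern, name))
        return MATCHED_EXACTLY;
    if (!fnmatch(pattern, name, 0))
        return MATCHED_FNMATCH;
    return NO_MATCH;
}

/*
 * Variant of the loop body where the early skip is limited to
 * patterns without wildcards, so a glob that also happens to equal a
 * literal file name is still tried against the remaining names.
 */
enum match_kind model_match_pathspec_fixed(const char *pattern,
                                           const char *name,
                                           enum match_kind *seen)
{
    enum match_kind how;

    if (seen && *seen == MATCHED_EXACTLY && !model_has_wildcard(pattern))
        return NO_MATCH;

    how = model_match_item(pattern, name);
    if (how && seen && *seen < how)
        *seen = how;
    return how;
}

/* Keep only matching names, compacting in place; returns the count. */
size_t model_prune_fixed(const char *pattern, const char **names, size_t nr)
{
    enum match_kind seen = NO_MATCH;
    size_t i, kept = 0;

    assert(names != NULL || nr == 0);
    for (i = 0; i < nr; i++)
        if (model_match_pathspec_fixed(pattern, names[i], &seen))
            names[kept++] = names[i];
    return kept;
}
```

With this variant, the pattern "f*" applied to { "f*", "foo" } keeps both
names, while a literal pattern such as "foo" still takes the early skip
once it has matched exactly. Whether the real fix belongs in the 'seen'
check or in 'match_pathspec_item' is a separate question that this model
does not settle.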

Thanks again for spotting this subtle behavior!

-Jayatheerth

