From: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
To: piotrsiupa@gmail.com
Cc: git@vger.kernel.org
Subject: Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly
Date: Sat, 12 Apr 2025 07:27:48 +0530
Message-ID: <20250412015748.7177-1-jayatheerthkulkarni2005@gmail.com>
In-Reply-To: <CAPM0=yBnaXojeC9WkHg08deR-VpjaVQwyrqt8mk+54qLXqSaAQ@mail.gmail.com>

Hi Piotr, Hello everyone,

Thanks for the clear bug report, Piotr. I can reproduce the behavior
you described in 2.49:

On Fri, Apr 11, 2025 at 9:08 PM Piotr Siupa <piotrsiupa@gmail.com> wrote:
>
> Hi! I think I've found a bug in the command "git add".
> It can be reproduced in a fresh repository by running:
>
> git init
> touch 'foo' 'f*'
> git add 'f*'
>
> The last command should add both files "f*" and "foo" to the index but
> it adds only "f*".
> Running it the second time works as expected. (It adds "foo" on the
> second attempt.)

Following the code path down from 'cmd_add' (in 'builtin/add.c'),
the issue appears to stem from how pathspecs are matched against
directory entries. This happens specifically within the 'prune_directory'
function which uses 'do_match_pathspec' internally (likely called via
'dir_path_match' -> 'match_pathspec' -> 'match_pathspec_with_flags').

Here's a breakdown of what seems to be happening during that first
'git add ''f*''' call:

First, 'cmd_add' sees it needs to add new files. Then, 'fill_directory'
finds both untracked files: 'foo' and the literal 'f*'.

Next, 'prune_directory' is called to filter these using the pathspec ''f*''.
Inside 'prune_directory', the 'do_match_pathspec' function is called for
each file ('foo', then 'f*', or vice-versa) against the pathspec list
(which just contains ''f*''). These calls share a common marker array
(often called 'seen') to track which pathspecs have found a match so far.

When 'do_match_pathspec' processes the literal file 'f*' against the
pathspec item ''f*'', it calls 'match_pathspec_item'. This helper function
likely returns a code like 'MATCHED_EXACTLY' because the pattern ''f*''
is character-for-character identical to the filename 'f*'. Consequently,
'do_match_pathspec' updates the 'seen' array for the ''f*'' pathspec to
mark it as exactly matched. Since a match was found, 'prune_directory'
decides to keep the 'f*' entry.

The problem arises when 'do_match_pathspec' processes the other file, 'foo',
against the same pathspec item ''f*''. Before doing the actual comparison,
it checks the 'seen' array and finds that the ''f*'' pathspec was already
marked 'MATCHED_EXACTLY' (from processing the literal 'f*' file).
An optimization check like 'if (seen && seen[i] == MATCHED_EXACTLY)'
then evaluates to true. This causes the loop to 'continue', skipping the
call to 'match_pathspec_item' entirely for the 'foo' file against the ''f*''
pattern. Because no match was found *in this specific call*, 'do_match_pathspec'
returns 0, and 'prune_directory' discards the 'foo' entry.

Finally, 'prune_directory' returns the filtered list, now containing only 'f*',
and 'add_files' adds only that file to the index.

On the *second* 'git add ''f*''' call, 'fill_directory' only finds the
untracked 'foo'. 'do_match_pathspec' runs with a fresh 'seen' array,
so the 'MATCHED_EXACTLY' check is initially false. 'match_pathspec_item'
is called for 'foo', returns 'MATCHED_FNMATCH' (a glob match), and 'foo'
is correctly added.

> I'm using Git 2.43.2. The current "next" (2.49.0.805.g082f7c87e0)
> seems to have the same behavior if I'm testing it correctly.

Yes, the relevant code in 'do_match_pathspec' appears similar in recent
versions, which is consistent with my reproducing it on 2.49.

Conclusion:

The core issue seems to be that optimization check within 'do_match_pathspec':

  // inside do_match_pathspec loop:
  if (seen && seen[i] == MATCHED_EXACTLY)
          continue;

This optimization assumes that once a pathspec item has achieved an
"exact" match against *some* file, it doesn't need to be checked
against *any other* files during the same directory scan operation.

However, when a pathspec contains glob characters (like ''f*'') but
happens to *also* exactly match a literal filename ('f*'),
'match_pathspec_item' appears to return 'MATCHED_EXACTLY'. This triggers
the optimization, incorrectly preventing the *same* pathspec pattern ''f*''
from matching *other* files (like 'foo') via its intended glob behavior
during that initial scan.

A potential fix might be to adjust 'match_pathspec_item' so that it does
not return 'MATCHED_EXACTLY' when the pathspec contains glob characters,
or to modify the 'seen' check in 'do_match_pathspec' to account for
this ambiguity.

Thanks again for spotting this subtle behavior!

-Jayatheerth
