From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Sruteesh Kumar <sruteesh.oss@protonmail.com>,
"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
Date: Thu, 23 Oct 2025 13:28:11 -0700 [thread overview]
Message-ID: <xmqq7bwltlb8.fsf@gitster.g> (raw)
In-Reply-To: <20251014003404.GC1507@coredump.intra.peff.net> (Jeff King's message of "Mon, 13 Oct 2025 20:34:04 -0400")
Jeff King <peff@peff.net> writes:
> Subject: match_pathname(): give fnmatch one char of prefix context
>
> In match_pathname(), which we use for matching .gitignore and
> .gitattribute patterns, we are comparing paths with with fnmatch
"with with" -> "with".
> patterns (actually our extended wildmatch, which will be important).
> There's an extra optimization there: we pre-compute the number of
> non-wildcard characters at the beginning of the pattern and do an
> fspathncmp() on that prefix.
>
> That lets us avoid fnmatch entirely on patterns without wildcards, and
> shrinks the amount of work we hand off to fnmatch. For a pattern like
> "foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
> prefix and just pass "*.txt" and "bar.txt" to fnmatch().
>
> But this misses a subtle corner case. In fnmatch(), we'll think
> "bar.txt" is the start of the path, but it's not. This doesn't matter
> for the pattern above, but consider the wildmatch pattern "foo**/bar"
> and the path "foobar". These two should not match, because there is no
> file named "bar", and the "**" applies only to the containing directory
> name. But after removing the "foo" prefix, fnmatch will get "**/bar" and
> "bar", which it does consider a match, because "**/" can match zero
> directories.
Ouch.
> We can solve this by giving fnmatch a bit more context. As long as it
> has one byte of the matched prefix, then it will know that "bar" is not
> the start of the path. In this example it would get "o**/bar" and
> "obar", and realize that they cannot match.
>
> In the case that there are no wildcards at all (i.e., the whole prefix
> matches), we'll continue to return without running fnmatch at all. We
> just need to account for the extra byte in our adjusted lengths.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I wonder how much this prefix-matching buys us in practice. There are
> two cases that are helped:
>
> 1. When there is no wildcard in the pattern at all, we skip fnmatch
> entirely.
>
> 2. We do a raw match of the prefix chars, shrinking the size of what
> is passed to fnmatch.
>
> My suspicion is that most of the improvement comes from (1), and it
> would be very easy to retain that case and get rid of (2). But I haven't
> done any measuring.
The above matches my intuition as well.
> diff --git a/dir.c b/dir.c
> index 0a67a99cb3..764400d9c5 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1360,6 +1360,13 @@ int match_pathname(const char *pathname, int pathlen,
>
> if (fspathncmp(pattern, name, prefix))
> return 0;
> +
> + /*
> + * Retain one character of the prefix to
> + * pass to fnmatch, which lets it distinguish
> + * the start of a directory component correctly.
> + */
> + prefix--;
> pattern += prefix;
> patternlen -= prefix;
> name += prefix;
So, checking pattern "fo*o/bar" against "foo/bar", we'd use
"o*o/bar" to match "oo/bar", which is not necessary but our
conjecture is that feeding shorter fnmatch() is not buying
us much, which justifies this change.
If not, we could do a more targetted pessimization, perhaps like
this, ...
/* the non-wildcard prefix does not match? */
if (fspathncmp(pattern, name, prefix))
return 0;
/* the non-wildcard prefix is the whole thing? */
if (namelen == prefix && patternlen == prefix)
return 1;
/* avoid making foo**/bar match foobar */
if (3 <= prefix && memcmp(pattern, "**/", 3)
prefix--;
pattern += prefix;
patternlen -= prefix;
name += prefix;
namelen -= prefix;
... but that is even more specific hack than yours.
> @@ -1370,7 +1377,7 @@ int match_pathname(const char *pathname, int pathlen,
> * then our prefix match is all we need; we
> * do not need to call fnmatch at all.
> */
> - if (!patternlen && !namelen)
> + if (patternlen == 1 && namelen == 1)
> return 1;
> }
In any case, I do prefer doing this "our non-wildcard part matched
the whole thing, so let's return true" before stripping matching
prefix from the pattern and the name (like I showed earlier).
> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index 273d71411f..db8bde280e 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -847,6 +847,17 @@ test_expect_success 'directories and ** matches' '
> test_cmp expect actual
> '
>
> +test_expect_success '** not confused by matching leading prefix' '
> + cat >.gitignore <<-\EOF &&
> + foo**/bar
> + EOF
> + git check-ignore foobar foo/bar >actual &&
> + cat >expect <<-\EOF &&
> + foo/bar
> + EOF
> + test_cmp expect actual
> +'
> +
> ############################################################################
> #
> # test whitespace handling
next prev parent reply other threads:[~2025-10-23 20:28 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-10 14:57 Probable issue with code/documentation Sruteesh Kumar
2025-10-14 0:34 ` [PATCH] match_pathname(): give fnmatch one char of prefix context Jeff King
2025-10-14 3:09 ` Jeff King
2025-10-22 16:19 ` Sruteesh Kumar
2025-10-23 20:28 ` Junio C Hamano [this message]
2025-10-24 18:28 ` Sruteesh Kumar
2025-10-26 15:18 ` Jeff King
2025-10-26 15:26 ` Jeff King
2025-10-26 15:40 ` [PATCH v2 0/2] fix "foo**/bar" matching "foobar" Jeff King
2025-10-26 15:41 ` [PATCH v2 1/2] match_pathname(): reorder prefix-match check Jeff King
2025-10-26 15:42 ` [PATCH v2 2/2] match_pathname(): give fnmatch one char of prefix context Jeff King
2025-10-26 23:29 ` [PATCH] " Junio C Hamano
2025-10-27 14:29 ` Jeff King
2025-10-27 15:35 ` Junio C Hamano
2025-10-28 23:19 ` Jeff King
2025-10-29 15:32 ` [PATCH] doc: document backslash in gitignore patterns Jeff King
2025-10-29 15:55 ` Jeff King
2025-10-30 13:40 ` D. Ben Knoble
2025-10-30 15:08 ` Jeff King
2025-10-30 16:05 ` Ben Knoble
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq7bwltlb8.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=sruteesh.oss@protonmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.