* Probable issue with code/documentation
@ 2025-10-10 14:57 Sruteesh Kumar
2025-10-14 0:34 ` [PATCH] match_pathname(): give fnmatch one char of prefix context Jeff King
0 siblings, 1 reply; 20+ messages in thread
From: Sruteesh Kumar @ 2025-10-10 14:57 UTC (permalink / raw)
To: git@vger.kernel.org
Hi Team
Someone rightly raised a concern on incompatibility between git and eclipse-jgit related to pattern matching. Please look at https://github.com/eclipse-jgit/jgit/issues/217
Look at the first scenario in the above link. Git is matching the path foobar with the pattern foo**/bar which is against the git's official documentation (Look at the last point in the double asterisk section at the URL https://git-scm.com/docs/gitignore#_pattern_format).
Is this an issue with the code or the documentation?
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-10 14:57 Probable issue with code/documentation Sruteesh Kumar
@ 2025-10-14 0:34 ` Jeff King
2025-10-14 3:09 ` Jeff King
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Jeff King @ 2025-10-14 0:34 UTC (permalink / raw)
To: Sruteesh Kumar; +Cc: git@vger.kernel.org
On Fri, Oct 10, 2025 at 02:57:07PM +0000, Sruteesh Kumar wrote:
> Look at the first scenario in the above link. Git is matching the path
> foobar with the pattern foo**/bar which is against the git's official
> documentation (Look at the last point in the double asterisk section
> at the URL https://git-scm.com/docs/gitignore#_pattern_format).
>
> Is this an issue with the code or the documentation?
I think the code is buggy. Here's a patch which would fix it, but I've
marked it as RFC because:
1. I haven't entirely convinced myself that there aren't more
complicated variants of the same problem.
2. It's kind of a disgusting hack.
-- >8 --
Subject: match_pathname(): give fnmatch one char of prefix context
In match_pathname(), which we use for matching .gitignore and
.gitattribute patterns, we are comparing paths with with fnmatch
patterns (actually our extended wildmatch, which will be important).
There's an extra optimization there: we pre-compute the number of
non-wildcard characters at the beginning of the pattern and do an
fspathncmp() on that prefix.
That lets us avoid fnmatch entirely on patterns without wildcards, and
shrinks the amount of work we hand off to fnmatch. For a pattern like
"foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
prefix and just pass "*.txt" and "bar.txt" to fnmatch().
But this misses a subtle corner case. In fnmatch(), we'll think
"bar.txt" is the start of the path, but it's not. This doesn't matter
for the pattern above, but consider the wildmatch pattern "foo**/bar"
and the path "foobar". These two should not match, because there is no
file named "bar", and the "**" applies only to the containing directory
name. But after removing the "foo" prefix, fnmatch will get "**/bar" and
"bar", which it does consider a match, because "**/" can match zero
directories.
We can solve this by giving fnmatch a bit more context. As long as it
has one byte of the matched prefix, then it will know that "bar" is not
the start of the path. In this example it would get "o**/bar" and
"obar", and realize that they cannot match.
In the case that there are no wildcards at all (i.e., the whole prefix
matches), we'll continue to return without running fnmatch at all. We
just need to account for the extra byte in our adjusted lengths.
Signed-off-by: Jeff King <peff@peff.net>
---
I wonder how much this prefix-matching buys us in practice. There are
two cases that are helped:
1. When there is no wildcard in the pattern at all, we skip fnmatch
entirely.
2. We do a raw match of the prefix chars, shrinking the size of what
is passed to fnmatch.
My suspicion is that most of the improvement comes from (1), and it
would be very easy to retain that case and get rid of (2). But I haven't
done any measuring.
dir.c | 9 ++++++++-
t/t0008-ignores.sh | 11 +++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/dir.c b/dir.c
index 0a67a99cb3..764400d9c5 100644
--- a/dir.c
+++ b/dir.c
@@ -1360,6 +1360,13 @@ int match_pathname(const char *pathname, int pathlen,
if (fspathncmp(pattern, name, prefix))
return 0;
+
+ /*
+ * Retain one character of the prefix to
+ * pass to fnmatch, which lets it distinguish
+ * the start of a directory component correctly.
+ */
+ prefix--;
pattern += prefix;
patternlen -= prefix;
name += prefix;
@@ -1370,7 +1377,7 @@ int match_pathname(const char *pathname, int pathlen,
* then our prefix match is all we need; we
* do not need to call fnmatch at all.
*/
- if (!patternlen && !namelen)
+ if (patternlen == 1 && namelen == 1)
return 1;
}
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 273d71411f..db8bde280e 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -847,6 +847,17 @@ test_expect_success 'directories and ** matches' '
test_cmp expect actual
'
+test_expect_success '** not confused by matching leading prefix' '
+ cat >.gitignore <<-\EOF &&
+ foo**/bar
+ EOF
+ git check-ignore foobar foo/bar >actual &&
+ cat >expect <<-\EOF &&
+ foo/bar
+ EOF
+ test_cmp expect actual
+'
+
############################################################################
#
# test whitespace handling
--
2.51.0.754.gd4f5ded95f
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-14 0:34 ` [PATCH] match_pathname(): give fnmatch one char of prefix context Jeff King
@ 2025-10-14 3:09 ` Jeff King
2025-10-22 16:19 ` Sruteesh Kumar
2025-10-23 20:28 ` Junio C Hamano
2 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2025-10-14 3:09 UTC (permalink / raw)
To: Sruteesh Kumar; +Cc: git@vger.kernel.org
On Mon, Oct 13, 2025 at 08:34:04PM -0400, Jeff King wrote:
> On Fri, Oct 10, 2025 at 02:57:07PM +0000, Sruteesh Kumar wrote:
>
> > Look at the first scenario in the above link. Git is matching the path
> > foobar with the pattern foo**/bar which is against the git's official
> > documentation (Look at the last point in the double asterisk section
> > at the URL https://git-scm.com/docs/gitignore#_pattern_format).
> >
> > Is this an issue with the code or the documentation?
>
> I think the code is buggy. Here's a patch which would fix it, but I've
> marked it as RFC because:
Er, sorry, I forgot to actually mark it as "RFC" in the subject. But
pretend that I did. ;)
-Peff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-14 0:34 ` [PATCH] match_pathname(): give fnmatch one char of prefix context Jeff King
2025-10-14 3:09 ` Jeff King
@ 2025-10-22 16:19 ` Sruteesh Kumar
2025-10-23 20:28 ` Junio C Hamano
2 siblings, 0 replies; 20+ messages in thread
From: Sruteesh Kumar @ 2025-10-22 16:19 UTC (permalink / raw)
To: Jeff King; +Cc: git@vger.kernel.org
Although this feels like a hack, it fixes the issue without impact much of the code. Otherwise, we would get regression issues. Someone from git community, please check this once.
Sent with Proton Mail secure email.
On Tuesday, October 14th, 2025 at 6:04 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Oct 10, 2025 at 02:57:07PM +0000, Sruteesh Kumar wrote:
>
> > Look at the first scenario in the above link. Git is matching the path
> > foobar with the pattern foo**/bar which is against the git's official
> > documentation (Look at the last point in the double asterisk section
> > at the URL https://git-scm.com/docs/gitignore#_pattern_format).
> >
> > Is this an issue with the code or the documentation?
>
>
> I think the code is buggy. Here's a patch which would fix it, but I've
> marked it as RFC because:
>
> 1. I haven't entirely convinced myself that there aren't more
> complicated variants of the same problem.
>
> 2. It's kind of a disgusting hack.
>
> -- >8 --
>
> Subject: match_pathname(): give fnmatch one char of prefix context
>
> In match_pathname(), which we use for matching .gitignore and
> .gitattribute patterns, we are comparing paths with with fnmatch
> patterns (actually our extended wildmatch, which will be important).
> There's an extra optimization there: we pre-compute the number of
> non-wildcard characters at the beginning of the pattern and do an
> fspathncmp() on that prefix.
>
> That lets us avoid fnmatch entirely on patterns without wildcards, and
> shrinks the amount of work we hand off to fnmatch. For a pattern like
> "foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
> prefix and just pass "*.txt" and "bar.txt" to fnmatch().
>
> But this misses a subtle corner case. In fnmatch(), we'll think
> "bar.txt" is the start of the path, but it's not. This doesn't matter
> for the pattern above, but consider the wildmatch pattern "foo**/bar"
> and the path "foobar". These two should not match, because there is no
> file named "bar", and the "" applies only to the containing directory
> name. But after removing the "foo" prefix, fnmatch will get "/bar" and
> "bar", which it does consider a match, because "/" can match zero
> directories.
>
> We can solve this by giving fnmatch a bit more context. As long as it
> has one byte of the matched prefix, then it will know that "bar" is not
> the start of the path. In this example it would get "o/bar" and
> "obar", and realize that they cannot match.
>
> In the case that there are no wildcards at all (i.e., the whole prefix
> matches), we'll continue to return without running fnmatch at all. We
> just need to account for the extra byte in our adjusted lengths.
>
> Signed-off-by: Jeff King peff@peff.net
>
> ---
> I wonder how much this prefix-matching buys us in practice. There are
> two cases that are helped:
>
> 1. When there is no wildcard in the pattern at all, we skip fnmatch
> entirely.
>
> 2. We do a raw match of the prefix chars, shrinking the size of what
> is passed to fnmatch.
>
> My suspicion is that most of the improvement comes from (1), and it
> would be very easy to retain that case and get rid of (2). But I haven't
> done any measuring.
>
> dir.c | 9 ++++++++-
> t/t0008-ignores.sh | 11 +++++++++++
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index 0a67a99cb3..764400d9c5 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1360,6 +1360,13 @@ int match_pathname(const char pathname, int pathlen,
>
> if (fspathncmp(pattern, name, prefix))
> return 0;
> +
> + /
> + * Retain one character of the prefix to
> + * pass to fnmatch, which lets it distinguish
> + * the start of a directory component correctly.
> + */
> + prefix--;
> pattern += prefix;
> patternlen -= prefix;
> name += prefix;
> @@ -1370,7 +1377,7 @@ int match_pathname(const char pathname, int pathlen,
> * then our prefix match is all we need; we
> * do not need to call fnmatch at all.
> /
> - if (!patternlen && !namelen)
> + if (patternlen == 1 && namelen == 1)
> return 1;
> }
>
> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index 273d71411f..db8bde280e 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -847,6 +847,17 @@ test_expect_success 'directories and ** matches' '
> test_cmp expect actual
> '
>
> +test_expect_success ' not confused by matching leading prefix' '
> + cat >.gitignore <<-\EOF &&
>
> + foo**/bar
> + EOF
> + git check-ignore foobar foo/bar >actual &&
>
> + cat >expect <<-\EOF &&
>
> + foo/bar
> + EOF
> + test_cmp expect actual
> +'
> +
> ############################################################################
> #
> # test whitespace handling
> --
> 2.51.0.754.gd4f5ded95f
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-14 0:34 ` [PATCH] match_pathname(): give fnmatch one char of prefix context Jeff King
2025-10-14 3:09 ` Jeff King
2025-10-22 16:19 ` Sruteesh Kumar
@ 2025-10-23 20:28 ` Junio C Hamano
2025-10-24 18:28 ` Sruteesh Kumar
` (2 more replies)
2 siblings, 3 replies; 20+ messages in thread
From: Junio C Hamano @ 2025-10-23 20:28 UTC (permalink / raw)
To: Jeff King; +Cc: Sruteesh Kumar, git@vger.kernel.org
Jeff King <peff@peff.net> writes:
> Subject: match_pathname(): give fnmatch one char of prefix context
>
> In match_pathname(), which we use for matching .gitignore and
> .gitattribute patterns, we are comparing paths with with fnmatch
"with with" -> "with".
> patterns (actually our extended wildmatch, which will be important).
> There's an extra optimization there: we pre-compute the number of
> non-wildcard characters at the beginning of the pattern and do an
> fspathncmp() on that prefix.
>
> That lets us avoid fnmatch entirely on patterns without wildcards, and
> shrinks the amount of work we hand off to fnmatch. For a pattern like
> "foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
> prefix and just pass "*.txt" and "bar.txt" to fnmatch().
>
> But this misses a subtle corner case. In fnmatch(), we'll think
> "bar.txt" is the start of the path, but it's not. This doesn't matter
> for the pattern above, but consider the wildmatch pattern "foo**/bar"
> and the path "foobar". These two should not match, because there is no
> file named "bar", and the "**" applies only to the containing directory
> name. But after removing the "foo" prefix, fnmatch will get "**/bar" and
> "bar", which it does consider a match, because "**/" can match zero
> directories.
Ouch.
> We can solve this by giving fnmatch a bit more context. As long as it
> has one byte of the matched prefix, then it will know that "bar" is not
> the start of the path. In this example it would get "o**/bar" and
> "obar", and realize that they cannot match.
>
> In the case that there are no wildcards at all (i.e., the whole prefix
> matches), we'll continue to return without running fnmatch at all. We
> just need to account for the extra byte in our adjusted lengths.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I wonder how much this prefix-matching buys us in practice. There are
> two cases that are helped:
>
> 1. When there is no wildcard in the pattern at all, we skip fnmatch
> entirely.
>
> 2. We do a raw match of the prefix chars, shrinking the size of what
> is passed to fnmatch.
>
> My suspicion is that most of the improvement comes from (1), and it
> would be very easy to retain that case and get rid of (2). But I haven't
> done any measuring.
The above matches my intuition as well.
> diff --git a/dir.c b/dir.c
> index 0a67a99cb3..764400d9c5 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1360,6 +1360,13 @@ int match_pathname(const char *pathname, int pathlen,
>
> if (fspathncmp(pattern, name, prefix))
> return 0;
> +
> + /*
> + * Retain one character of the prefix to
> + * pass to fnmatch, which lets it distinguish
> + * the start of a directory component correctly.
> + */
> + prefix--;
> pattern += prefix;
> patternlen -= prefix;
> name += prefix;
So, checking pattern "fo*o/bar" against "foo/bar", we'd use
"o*o/bar" to match "oo/bar", which is not necessary but our
conjecture is that feeding shorter fnmatch() is not buying
us much, which justifies this change.
If not, we could do a more targetted pessimization, perhaps like
this, ...
/* the non-wildcard prefix does not match? */
if (fspathncmp(pattern, name, prefix))
return 0;
/* the non-wildcard prefix is the whole thing? */
if (namelen == prefix && patternlen == prefix)
return 1;
/* avoid making foo**/bar match foobar */
if (3 <= prefix && memcmp(pattern, "**/", 3)
prefix--;
pattern += prefix;
patternlen -= prefix;
name += prefix;
namelen -= prefix;
... but that is even more specific hack than yours.
> @@ -1370,7 +1377,7 @@ int match_pathname(const char *pathname, int pathlen,
> * then our prefix match is all we need; we
> * do not need to call fnmatch at all.
> */
> - if (!patternlen && !namelen)
> + if (patternlen == 1 && namelen == 1)
> return 1;
> }
In any case, I do prefer doing this "our non-wildcard part matched
the whole thing, so let's return true" before stripping matching
prefix from the pattern and the name (like I showed earlier).
> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index 273d71411f..db8bde280e 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -847,6 +847,17 @@ test_expect_success 'directories and ** matches' '
> test_cmp expect actual
> '
>
> +test_expect_success '** not confused by matching leading prefix' '
> + cat >.gitignore <<-\EOF &&
> + foo**/bar
> + EOF
> + git check-ignore foobar foo/bar >actual &&
> + cat >expect <<-\EOF &&
> + foo/bar
> + EOF
> + test_cmp expect actual
> +'
> +
> ############################################################################
> #
> # test whitespace handling
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-23 20:28 ` Junio C Hamano
@ 2025-10-24 18:28 ` Sruteesh Kumar
2025-10-26 15:18 ` Jeff King
2025-10-26 15:26 ` Jeff King
2 siblings, 0 replies; 20+ messages in thread
From: Sruteesh Kumar @ 2025-10-24 18:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, git@vger.kernel.org
On Friday, October 24th, 2025 at 1:58 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King peff@peff.net writes:
>
> > Subject: match_pathname(): give fnmatch one char of prefix context
> >
> > In match_pathname(), which we use for matching .gitignore and
> > .gitattribute patterns, we are comparing paths with with fnmatch
>
>
> "with with" -> "with".
>
> > patterns (actually our extended wildmatch, which will be important).
> > There's an extra optimization there: we pre-compute the number of
> > non-wildcard characters at the beginning of the pattern and do an
> > fspathncmp() on that prefix.
> >
> > That lets us avoid fnmatch entirely on patterns without wildcards, and
> > shrinks the amount of work we hand off to fnmatch. For a pattern like
> > "foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
> > prefix and just pass "*.txt" and "bar.txt" to fnmatch().
> >
> > But this misses a subtle corner case. In fnmatch(), we'll think
> > "bar.txt" is the start of the path, but it's not. This doesn't matter
> > for the pattern above, but consider the wildmatch pattern "foo**/bar"
> > and the path "foobar". These two should not match, because there is no
> > file named "bar", and the "" applies only to the containing directory
> > name. But after removing the "foo" prefix, fnmatch will get "/bar" and
> > "bar", which it does consider a match, because "**/" can match zero
> > directories.
>
>
> Ouch.
>
> > We can solve this by giving fnmatch a bit more context. As long as it
> > has one byte of the matched prefix, then it will know that "bar" is not
> > the start of the path. In this example it would get "o**/bar" and
> > "obar", and realize that they cannot match.
> >
> > In the case that there are no wildcards at all (i.e., the whole prefix
> > matches), we'll continue to return without running fnmatch at all. We
> > just need to account for the extra byte in our adjusted lengths.
> >
> > Signed-off-by: Jeff King peff@peff.net
> > ---
> > I wonder how much this prefix-matching buys us in practice. There are
> > two cases that are helped:
> >
> > 1. When there is no wildcard in the pattern at all, we skip fnmatch
> > entirely.
> >
> > 2. We do a raw match of the prefix chars, shrinking the size of what
> > is passed to fnmatch.
> >
> > My suspicion is that most of the improvement comes from (1), and it
> > would be very easy to retain that case and get rid of (2). But I haven't
> > done any measuring.
>
>
> The above matches my intuition as well.
>
> > diff --git a/dir.c b/dir.c
> > index 0a67a99cb3..764400d9c5 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -1360,6 +1360,13 @@ int match_pathname(const char *pathname, int pathlen,
> >
> > if (fspathncmp(pattern, name, prefix))
> > return 0;
> > +
> > + /*
> > + * Retain one character of the prefix to
> > + * pass to fnmatch, which lets it distinguish
> > + * the start of a directory component correctly.
> > + */
> > + prefix--;
> > pattern += prefix;
> > patternlen -= prefix;
> > name += prefix;
>
>
> So, checking pattern "foo/bar" against "foo/bar", we'd use
> "oo/bar" to match "oo/bar", which is not necessary but our
> conjecture is that feeding shorter fnmatch() is not buying
> us much, which justifies this change.
Actually, I too just realized that skipping the matching non-wildcard prefix doesn't provide any optimization (Correct me, if I am wrong). We any need to compare both pattern and path until the end of the prefix be it match_pathname() or fnmatch(). Any idea why this is explicitly done? May be a small demonstration through an example?
>
> If not, we could do a more targetted pessimization, perhaps like
> this, ...
>
> /* the non-wildcard prefix does not match? /
> if (fspathncmp(pattern, name, prefix))
> return 0;
>
> / the non-wildcard prefix is the whole thing? /
> if (namelen == prefix && patternlen == prefix)
> return 1;
>
> / avoid making foo**/bar match foobar */
> if (3 <= prefix && memcmp(pattern, "**/", 3)
> prefix--;
> pattern += prefix;
> patternlen -= prefix;
> name += prefix;
> namelen -= prefix;
>
> ... but that is even more specific hack than yours.
Yes. There is no way we are doing this. We are literally bypassing this single scenario in the code which is not a good practice
>
> > @@ -1370,7 +1377,7 @@ int match_pathname(const char *pathname, int pathlen,
> > * then our prefix match is all we need; we
> > * do not need to call fnmatch at all.
> > */
> > - if (!patternlen && !namelen)
> > + if (patternlen == 1 && namelen == 1)
> > return 1;
> > }
>
>
> In any case, I do prefer doing this "our non-wildcard part matched
> the whole thing, so let's return true" before stripping matching
> prefix from the pattern and the name (like I showed earlier).
>
> > diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> > index 273d71411f..db8bde280e 100755
> > --- a/t/t0008-ignores.sh
> > +++ b/t/t0008-ignores.sh
> > @@ -847,6 +847,17 @@ test_expect_success 'directories and ** matches' '
> > test_cmp expect actual
> > '
> >
> > +test_expect_success '** not confused by matching leading prefix' '
> > + cat >.gitignore <<-\EOF &&
> > + foo**/bar
> > + EOF
> > + git check-ignore foobar foo/bar >actual &&
> > + cat >expect <<-\EOF &&
> > + foo/bar
> > + EOF
> > + test_cmp expect actual
> > +'
> > +
> > ############################################################################
> > #
> > # test whitespace handling
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-23 20:28 ` Junio C Hamano
2025-10-24 18:28 ` Sruteesh Kumar
@ 2025-10-26 15:18 ` Jeff King
2025-10-26 15:26 ` Jeff King
2 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2025-10-26 15:18 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sruteesh Kumar, git@vger.kernel.org
On Thu, Oct 23, 2025 at 01:28:11PM -0700, Junio C Hamano wrote:
> > I wonder how much this prefix-matching buys us in practice. There are
> > two cases that are helped:
> >
> > 1. When there is no wildcard in the pattern at all, we skip fnmatch
> > entirely.
> >
> > 2. We do a raw match of the prefix chars, shrinking the size of what
> > is passed to fnmatch.
> >
> > My suspicion is that most of the improvement comes from (1), and it
> > would be very easy to retain that case and get rid of (2). But I haven't
> > done any measuring.
>
> The above matches my intuition as well.
I wanted some actual numbers, so here's more than anybody probably cares
to read on the subject. This is the test script I came up with:
-- >8 --
#!/bin/sh
pattern=$1; shift
path=$1; shift
versions=
for i in "$@"; do
versions="${versions:+$versions,}$i"
done
rm -rf repo
git init repo
cd repo
echo "/$pattern" >.gitignore
for i in $(seq 1 1000000); do
eval "echo \"$path\""
done >input
hyperfine -L git "$versions" -n "{git}" '../{git} check-ignore -q --no-index --stdin <input || true'
-- 8< --
The idea is to just match $path a million times against $path, with as
little other junk as possible. And we test it against a series of
builds, which are:
- git.orig; the current tip of 'master'
- git.none; ripping out the prefix-match entirely, leaving it to
wildmatch to match those characters.
- git.fullonly; leaving the early return when the prefix consumes the
whole name, but not advancing name/pattern by the prefix amount
before handing it to fnmatch
- git.minusone; shrinking the prefix by one to give fnmatch one char
of context (i.e., the patch under discussion)
Here are results for a long name that matches exactly:
$ long='this is an extremely long pattern that has no globs in it'
$ ./test "$long" "$long" git.*
Benchmark 1: git.fullonly
Time (mean ± σ): 311.0 ms ± 5.9 ms [User: 302.5 ms, System: 8.4 ms]
Range (min … max): 303.4 ms … 319.3 ms 10 runs
Benchmark 2: git.minusone
Time (mean ± σ): 316.4 ms ± 4.8 ms [User: 303.5 ms, System: 12.9 ms]
Range (min … max): 306.3 ms … 323.5 ms 10 runs
Benchmark 3: git.none
Time (mean ± σ): 455.3 ms ± 12.2 ms [User: 444.4 ms, System: 10.8 ms]
Range (min … max): 442.1 ms … 475.1 ms 10 runs
Benchmark 4: git.orig
Time (mean ± σ): 320.8 ms ± 6.8 ms [User: 307.8 ms, System: 12.9 ms]
Range (min … max): 311.7 ms … 331.0 ms 10 runs
Summary
git.fullonly ran
1.02 ± 0.02 times faster than git.minusone
1.03 ± 0.03 times faster than git.orig
1.46 ± 0.05 times faster than git.none
So we can see that "git.none" performs way worse than the rest of them.
I.e., this optimization really is doing something. That wasn't
immediately obvious to me, since it's all O(n) in the end. But
presumably wildmatch's slow char-by-char walk is much less fast than
what amounts to a strcmp().
Interestingly fullonly is a little faster, even though we should never
be running the removed code in this case. I guess the function fits in
cache a little better? :)
Now let's try the same thing with paths that only prefix match:
$ ./test "$long" "$long-\$i" git.*
Benchmark 1: git.fullonly
Time (mean ± σ): 472.1 ms ± 6.1 ms [User: 458.0 ms, System: 14.0 ms]
Range (min … max): 463.1 ms … 484.9 ms 10 runs
Benchmark 2: git.minusone
Time (mean ± σ): 334.4 ms ± 5.5 ms [User: 324.3 ms, System: 10.0 ms]
Range (min … max): 325.7 ms … 340.6 ms 10 runs
Benchmark 3: git.none
Time (mean ± σ): 461.7 ms ± 6.0 ms [User: 448.8 ms, System: 12.8 ms]
Range (min … max): 449.2 ms … 473.6 ms 10 runs
Benchmark 4: git.orig
Time (mean ± σ): 334.4 ms ± 3.4 ms [User: 325.5 ms, System: 8.8 ms]
Range (min … max): 326.0 ms … 338.1 ms 10 runs
Summary
git.minusone ran
1.00 ± 0.02 times faster than git.orig
1.38 ± 0.03 times faster than git.none
1.41 ± 0.03 times faster than git.fullonly
This matches what I'd expect. git.fullonly performs even worse than
git.none now (because we're matching that long prefix only to hand the
entire thing off to wildmatch to do it again). There is a possible
optimization missing in the code for this case, where we know we've
eaten all of "patternlen" but not all "namelen" (so we know we cannot
match), but it's probably rare enough not to worry about (and certainly
fnmatch should be able to figure that out quickly).
The other interesting thing here is that git.minusone performs about the
same as git.orig. So adding one character back for fnmatch to process is
not a big deal. It's a small constant amount of extra work.
Note that neither pattern has a wildcard here! We could test that with:
$ ./test "$long-*" "$long-\$i" git.*
but the timings are roughly the same as above. In either case, fnmatch
will accept or reject pretty quickly once it hits the part after
"$long". So all we are really measuring is how long fnmatch takes to
walk over that first prefix part without wildcards.
So the obvious takeaway is that yes, fnmatch is much slower than strmcp.
That doesn't tell us how _often_ these cases kick in. That is, the
intuition we were talking about above is that matching "foo" with "foo"
is a lot more common than "foo*" matching "foobar". And no amount of
playing with made-up inputs will tell us that. ;)
But my inclination is to leave the optimization in place, assuming we do
not find any more corner cases that foil the one-char-of-context hack.
It's not much code, and it definitely _can_ help.
-Peff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-23 20:28 ` Junio C Hamano
2025-10-24 18:28 ` Sruteesh Kumar
2025-10-26 15:18 ` Jeff King
@ 2025-10-26 15:26 ` Jeff King
2025-10-26 15:40 ` [PATCH v2 0/2] fix "foo**/bar" matching "foobar" Jeff King
2025-10-26 23:29 ` [PATCH] " Junio C Hamano
2 siblings, 2 replies; 20+ messages in thread
From: Jeff King @ 2025-10-26 15:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sruteesh Kumar, git@vger.kernel.org
On Thu, Oct 23, 2025 at 01:28:11PM -0700, Junio C Hamano wrote:
> > Subject: match_pathname(): give fnmatch one char of prefix context
> >
> > In match_pathname(), which we use for matching .gitignore and
> > .gitattribute patterns, we are comparing paths with with fnmatch
>
> "with with" -> "with".
Thanks, will fix.
> > My suspicion is that most of the improvement comes from (1), and it
> > would be very easy to retain that case and get rid of (2). But I haven't
> > done any measuring.
>
> The above matches my intuition as well.
I poked at performance a bit more, but it was long so I put it into a
separate email. My findings are that yes, this optimization is very
measurable, and that my proposed patch does not hurt anything.
> > @@ -1360,6 +1360,13 @@ int match_pathname(const char *pathname, int pathlen,
> >
> > if (fspathncmp(pattern, name, prefix))
> > return 0;
> > +
> > + /*
> > + * Retain one character of the prefix to
> > + * pass to fnmatch, which lets it distinguish
> > + * the start of a directory component correctly.
> > + */
> > + prefix--;
> > pattern += prefix;
> > patternlen -= prefix;
> > name += prefix;
>
> So, checking pattern "fo*o/bar" against "foo/bar", we'd use
> "o*o/bar" to match "oo/bar", which is not necessary but our
> conjecture is that feeding shorter fnmatch() is not buying
> us much, which justifies this change.
Yeah. It turns out it _does_ buy us something (at least in some corner
cases), but pushing one extra char onto fnmatch is not a big deal.
> If not, we could do a more targetted pessimization, perhaps like
> this, ...
>
> /* the non-wildcard prefix does not match? */
> if (fspathncmp(pattern, name, prefix))
> return 0;
>
> /* the non-wildcard prefix is the whole thing? */
> if (namelen == prefix && patternlen == prefix)
> return 1;
>
> /* avoid making foo**/bar match foobar */
> if (3 <= prefix && memcmp(pattern, "**/", 3)
> prefix--;
> pattern += prefix;
> patternlen -= prefix;
> name += prefix;
> namelen -= prefix;
>
> ... but that is even more specific hack than yours.
Yeah, I think that would also work. Mostly I would worry that there are
other cases besides a raw "**/" which causes similar problems, but I
could not think of any. Passing in a single char of context seemed like
an easy but general fix.
I also wonder how expensive that memcmp() is. ;) Obviously not very, but
if the point is that we are trying to save fnmatch from looking at that
one extra character, we already pinching pennies in a mostly
un-measurable way.
> > @@ -1370,7 +1377,7 @@ int match_pathname(const char *pathname, int pathlen,
> > * then our prefix match is all we need; we
> > * do not need to call fnmatch at all.
> > */
> > - if (!patternlen && !namelen)
> > + if (patternlen == 1 && namelen == 1)
> > return 1;
> > }
>
> In any case, I do prefer doing this "our non-wildcard part matched
> the whole thing, so let's return true" before stripping matching
> prefix from the pattern and the name (like I showed earlier).
Me too. I wrote it a few different ways before ending up with the "==
1", just because it made the diff smaller. But let me do it as two
steps, which I think will make it all more clear.
-Peff
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 0/2] fix "foo**/bar" matching "foobar"
2025-10-26 15:26 ` Jeff King
@ 2025-10-26 15:40 ` Jeff King
2025-10-26 15:41 ` [PATCH v2 1/2] match_pathname(): reorder prefix-match check Jeff King
2025-10-26 15:42 ` [PATCH v2 2/2] match_pathname(): give fnmatch one char of prefix context Jeff King
2025-10-26 23:29 ` [PATCH] " Junio C Hamano
1 sibling, 2 replies; 20+ messages in thread
From: Jeff King @ 2025-10-26 15:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sruteesh Kumar, git@vger.kernel.org
On Sun, Oct 26, 2025 at 11:26:14AM -0400, Jeff King wrote:
> Me too. I wrote it a few different ways before ending up with the "==
> 1", just because it made the diff smaller. But let me do it as two
> steps, which I think will make it all more clear.
Here it is. Maybe excessive to split it up, but each patch is quite nice
to read now. :)
[1/2]: match_pathname(): reorder prefix-match check
[2/2]: match_pathname(): give fnmatch one char of prefix context
dir.c | 18 ++++++++++++------
t/t0008-ignores.sh | 11 +++++++++++
2 files changed, 23 insertions(+), 6 deletions(-)
-Peff
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 1/2] match_pathname(): reorder prefix-match check
2025-10-26 15:40 ` [PATCH v2 0/2] fix "foo**/bar" matching "foobar" Jeff King
@ 2025-10-26 15:41 ` Jeff King
2025-10-26 15:42 ` [PATCH v2 2/2] match_pathname(): give fnmatch one char of prefix context Jeff King
1 sibling, 0 replies; 20+ messages in thread
From: Jeff King @ 2025-10-26 15:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sruteesh Kumar, git@vger.kernel.org
As an optimization, we use fspathncmp() to match a prefix of the pattern
that does not contain any wildcards, and then pass the remainder to
fnmatch(). If it has matched the whole thing, we can return early.
Let's shift this early-return check to before we tweak the pattern and
name strings. That will gives us more flexibility with that tweaking.
It might also save a few instructions, but I couldn't measure any
improvement in doing so (and I wouldn't be surprised if an optimizing
compiler could figure that out itself).
Signed-off-by: Jeff King <peff@peff.net>
---
dir.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/dir.c b/dir.c
index 0a67a99cb3..bbc2bf289d 100644
--- a/dir.c
+++ b/dir.c
@@ -1360,18 +1360,19 @@ int match_pathname(const char *pathname, int pathlen,
if (fspathncmp(pattern, name, prefix))
return 0;
- pattern += prefix;
- patternlen -= prefix;
- name += prefix;
- namelen -= prefix;
/*
* If the whole pattern did not have a wildcard,
* then our prefix match is all we need; we
* do not need to call fnmatch at all.
*/
- if (!patternlen && !namelen)
+ if (patternlen == prefix && namelen == prefix)
return 1;
+
+ pattern += prefix;
+ patternlen -= prefix;
+ name += prefix;
+ namelen -= prefix;
}
return fnmatch_icase_mem(pattern, patternlen,
--
2.51.1.840.g23b87c0a58
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 2/2] match_pathname(): give fnmatch one char of prefix context
2025-10-26 15:40 ` [PATCH v2 0/2] fix "foo**/bar" matching "foobar" Jeff King
2025-10-26 15:41 ` [PATCH v2 1/2] match_pathname(): reorder prefix-match check Jeff King
@ 2025-10-26 15:42 ` Jeff King
1 sibling, 0 replies; 20+ messages in thread
From: Jeff King @ 2025-10-26 15:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sruteesh Kumar, git@vger.kernel.org
In match_pathname(), which we use for matching .gitignore and
.gitattribute patterns, we are comparing paths with fnmatch patterns
(actually our extended wildmatch, which will be important). There's an
extra optimization there: we pre-compute the number of non-wildcard
characters at the beginning of the pattern and do an fspathncmp() on
that prefix.
That lets us avoid fnmatch entirely on patterns without wildcards, and
shrinks the amount of work we hand off to fnmatch. For a pattern like
"foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
prefix and just pass "*.txt" and "bar.txt" to fnmatch().
But this misses a subtle corner case. In fnmatch(), we'll think
"bar.txt" is the start of the path, but it's not. This doesn't matter
for the pattern above, but consider the wildmatch pattern "foo**/bar"
and the path "foobar". These two should not match, because there is no
file named "bar", and the "**" applies only to the containing directory
name. But after removing the "foo" prefix, fnmatch will get "**/bar" and
"bar", which it does consider a match, because "**/" can match zero
directories.
We can solve this by giving fnmatch a bit more context. As long as it
has one byte of the matched prefix, then it will know that "bar" is not
the start of the path. In this example it would get "o**/bar" and
"obar", and realize that they cannot match.
Signed-off-by: Jeff King <peff@peff.net>
---
dir.c | 7 ++++++-
t/t0008-ignores.sh | 11 +++++++++++
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/dir.c b/dir.c
index bbc2bf289d..22fe595f3f 100644
--- a/dir.c
+++ b/dir.c
@@ -1360,7 +1360,6 @@ int match_pathname(const char *pathname, int pathlen,
if (fspathncmp(pattern, name, prefix))
return 0;
-
/*
* If the whole pattern did not have a wildcard,
* then our prefix match is all we need; we
@@ -1369,6 +1368,12 @@ int match_pathname(const char *pathname, int pathlen,
if (patternlen == prefix && namelen == prefix)
return 1;
+ /*
+ * Retain one character of the prefix to
+ * pass to fnmatch, which lets it distinguish
+ * the start of a directory component correctly.
+ */
+ prefix--;
pattern += prefix;
patternlen -= prefix;
name += prefix;
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 273d71411f..db8bde280e 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -847,6 +847,17 @@ test_expect_success 'directories and ** matches' '
test_cmp expect actual
'
+test_expect_success '** not confused by matching leading prefix' '
+ cat >.gitignore <<-\EOF &&
+ foo**/bar
+ EOF
+ git check-ignore foobar foo/bar >actual &&
+ cat >expect <<-\EOF &&
+ foo/bar
+ EOF
+ test_cmp expect actual
+'
+
############################################################################
#
# test whitespace handling
--
2.51.1.840.g23b87c0a58
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-26 15:26 ` Jeff King
2025-10-26 15:40 ` [PATCH v2 0/2] fix "foo**/bar" matching "foobar" Jeff King
@ 2025-10-26 23:29 ` Junio C Hamano
2025-10-27 14:29 ` Jeff King
1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2025-10-26 23:29 UTC (permalink / raw)
To: Jeff King; +Cc: Sruteesh Kumar, git@vger.kernel.org
Jeff King <peff@peff.net> writes:
> I also wonder how expensive that memcmp() is. ;) Obviously not very, but
> if the point is that we are trying to save fnmatch from looking at that
> one extra character, we already pinching pennies in a mostly
> un-measurable way.
I added the "limit to known bad case" in the illustration not for
performance but for correctness. This was because just like we
weren't convinced that the "**/" may be the only case that breaks
the existing optimization, I was worried if stepping back by one
byte may somehow make a pattern that should not match mistakenly
match.
In any case, your numbers did make sense and tells us that optimized
strncmp() is much faster than comparing a byte at a time, which is
expected.
Thanks.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-26 23:29 ` [PATCH] " Junio C Hamano
@ 2025-10-27 14:29 ` Jeff King
2025-10-27 15:35 ` Junio C Hamano
0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2025-10-27 14:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sruteesh Kumar, git@vger.kernel.org
On Sun, Oct 26, 2025 at 04:29:40PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > I also wonder how expensive that memcmp() is. ;) Obviously not very, but
> > if the point is that we are trying to save fnmatch from looking at that
> > one extra character, we already pinching pennies in a mostly
> > un-measurable way.
>
> I added the "limit to known bad case" in the illustration not for
> performance but for correctness. This was because just like we
> weren't convinced that the "**/" may be the only case that breaks
> the existing optimization, I was worried if stepping back by one
> byte may somehow make a pattern that should not match mistakenly
> match.
Hmm, I hadn't really considered that. This is mostly hand-waving, but it
feels unlikely to cause issues because we're giving fnmatch() more
context, and never less. We know that the stepped-back character matches
in the name and the pattern. So we are asking "o*bar" to match "obar"
instead of "*bar" to match "bar", which seems like it should never be a
bad thing unless there is a bug in fnmatch.
BTW, there was another bug mentioned in that original issue around
backslash handling. I didn't investigate it at all, though. It didn't
look like it would be related to this optimization, so I think we can
just consider this fix independently.
-Peff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-27 14:29 ` Jeff King
@ 2025-10-27 15:35 ` Junio C Hamano
2025-10-28 23:19 ` Jeff King
0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2025-10-27 15:35 UTC (permalink / raw)
To: Jeff King; +Cc: Sruteesh Kumar, git@vger.kernel.org
Jeff King <peff@peff.net> writes:
>> I added the "limit to known bad case" in the illustration not for
>> performance but for correctness. This was because just like we
>> weren't convinced that the "**/" may be the only case that breaks
>> the existing optimization, I was worried if stepping back by one
>> byte may somehow make a pattern that should not match mistakenly
>> match.
>
> Hmm, I hadn't really considered that. This is mostly hand-waving, but it
> feels unlikely to cause issues because we're giving fnmatch() more
> context, and never less.
Yup. The above was an explanation of what I was thinking when I
wrote it, not a justification. I haven't thought of a single case
that it would hurt to step back by one byte.
> BTW, there was another bug mentioned in that original issue around
> backslash handling. I didn't investigate it at all, though. It didn't
> look like it would be related to this optimization, so I think we can
> just consider this fix independently.
Hmph, backslash is GIT_GLOB_SPECIAL so nowildcard prefix would stop
before it. Could it be that we mistake it as a directory separator?
I _think_ we are still cleanly distinguish paths from the filesystem
(which could use backslash as directory separator on some platforms)
and the pathspec (which defines the slash as the sole directory
separator), and have platform-specific fspathncmp() to absorb the
differences when matching one with the other. And nowildcard_len is
all about the pathspec, so it probably is something else.
\Thanks.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] match_pathname(): give fnmatch one char of prefix context
2025-10-27 15:35 ` Junio C Hamano
@ 2025-10-28 23:19 ` Jeff King
2025-10-29 15:32 ` [PATCH] doc: document backslash in gitignore patterns Jeff King
0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2025-10-28 23:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sruteesh Kumar, git@vger.kernel.org
On Mon, Oct 27, 2025 at 08:35:23AM -0700, Junio C Hamano wrote:
> > BTW, there was another bug mentioned in that original issue around
> > backslash handling. I didn't investigate it at all, though. It didn't
> > look like it would be related to this optimization, so I think we can
> > just consider this fix independently.
>
> Hmph, backslash is GIT_GLOB_SPECIAL so nowildcard prefix would stop
> before it. Could it be that we mistake it as a directory separator?
> I _think_ we are still cleanly distinguish paths from the filesystem
> (which could use backslash as directory separator on some platforms)
> and the pathspec (which defines the slash as the sole directory
> separator), and have platform-specific fspathncmp() to absorb the
> differences when matching one with the other. And nowildcard_len is
> all about the pathspec, so it probably is something else.
Yeah, if you do:
git init
echo 'foo\' >.gitignore
git check-ignore --no-index -n -v 'foo\'
then we do have a nowildcard prefix of only 3. But I don't think we get
to match_pathname() and the prefix-matching optimization at all. In
match_basename(), we directly call the equivalent of (those are literal
backslashes in the strings):
fnmatch("foo\", "foo\", 0);
and it claims there's no match. Now this is our own wildmatch-backed
implementation, but I wondered what POSIX has to say on a trailing
backslash like that. It's:
If FNM_NOESCAPE is not set in flags, a <backslash> character in
pattern followed by any other character shall match that second
character in string. In particular, "\\" shall match a
<backslash> in string. If pattern ends with an unescaped
<backslash>, fnmatch() shall return a non-zero value (indicating
either no match or an error). If FNM_NOESCAPE is set, a
<backslash> character shall be treated as an ordinary character.
So we are supposed to reject the match. Looking at the wildmatch
implementation, it does this:
switch (p_ch) {
case '\\':
/* Literal match with following character. Note that the test
* in "default" handles the p[1] == '\0' failure case. */
p_ch = *++p;
/* FALLTHROUGH */
default:
if (t_ch != p_ch)
return WM_NOMATCH;
continue;
which matches what POSIX says.
So I think the input is really nonsense, and we're following POSIX here
in rejecting it. I can't fault an alternative implementation too much
for treating the "\" as a literal char, since that's the only other
sensible behavior. It's probably what I'd do if I hadn't read that bit
of POSIX. ;)
But to a certain degree, I think this is a case of "if it hurts, don't
do it". If you are trying to match "foo\", the correct pattern is
"foo\\".
-Peff
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] doc: document backslash in gitignore patterns
2025-10-28 23:19 ` Jeff King
@ 2025-10-29 15:32 ` Jeff King
2025-10-29 15:55 ` Jeff King
0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2025-10-29 15:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sruteesh Kumar, git@vger.kernel.org
On Tue, Oct 28, 2025 at 07:19:45PM -0400, Jeff King wrote:
> So I think the input is really nonsense, and we're following POSIX here
> in rejecting it. I can't fault an alternative implementation too much
> for treating the "\" as a literal char, since that's the only other
> sensible behavior. It's probably what I'd do if I hadn't read that bit
> of POSIX. ;)
>
> But to a certain degree, I think this is a case of "if it hurts, don't
> do it". If you are trying to match "foo\", the correct pattern is
> "foo\\".
This was all non-obvious enough to me (and to the JGit folks) that I
think it might be worth beefing up the docs a little, like this:
-- >8 --
Subject: [PATCH] doc: document backslash in gitignore patterns
Because gitignore patterns are passed to fnmatch, the handling of
backslashes is the same as it is there: it can be used to escape
metacharacters. We do reference fnmatch(3) for more details, but it may
be friendlier to point out this implication explicitly (especially for
people who want to know about backslash handling and search the
documentation for that word). There are also two cases that I've seen
some other backslash-escaping systems handle differently, so let's
describe those:
1. A backslash before any character treats that character literally,
even if it's not otherwise a meta-character. As opposed to
including the backslash itself (like "foo\bar" in shell expands to
"foo\bar") or forbidding it ("foo\zar" is required to produce a
diagnostic in C).
2. A backslash at the end of the string is an invalid pattern (and not
a literal backslash).
This second one in particular was a point of confusion between our
implementation and the one in JGit. Our wildmatch behavior matches what
POSIX specifies for fnmatch, so the code and documentation are in line.
But let's add a test to cover this case. Note that the behavior here
differs between wildmatch itself (which is what gitignore will use) and
pathspec matching (which will only turn to wildmatch if a literal match
fails). So we match "foo\" to "foo\" in pathspecs, but not via
gitignore.
Signed-off-by: Jeff King <peff@peff.net>
---
I like the compact format of the tests in t3070, but it is not at all
obvious what the two lines of zeroes in ones means just from the diff
context. ;)
The first line is wildmatch directly, and the second is pathspec
matching. The surrounding tests, without a second set, default to the
same outcome for both.
Documentation/gitignore.adoc | 5 +++++
t/t3070-wildmatch.sh | 2 ++
2 files changed, 7 insertions(+)
diff --git a/Documentation/gitignore.adoc b/Documentation/gitignore.adoc
index 5e0964ef41..9fccab4ae8 100644
--- a/Documentation/gitignore.adoc
+++ b/Documentation/gitignore.adoc
@@ -111,6 +111,11 @@ PATTERN FORMAT
one of the characters in a range. See fnmatch(3) and the
FNM_PATHNAME flag for a more detailed description.
+ - A backslash ("`\`") can be used to escape any character. E.g., "`\*`"
+ matches a literal asterisk (and "`\a`" matches "`a`", even though
+ there is no need for escaping there). As with fnmatch(3), a backslash
+ at the end of a pattern is an invalid pattern that never matches.
+
Two consecutive asterisks ("`**`") in patterns matched against
full pathname may have special meaning:
diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 3da824117c..655bb1a0f2 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -235,6 +235,8 @@ match 1 1 1 1 aaaaaaabababab '*ab'
match 1 1 1 1 'foo*' 'foo\*'
match 0 0 0 0 foobar 'foo\*bar'
match 1 1 1 1 'f\oo' 'f\\oo'
+match 0 0 0 0 \
+ 1 1 1 1 'foo\' 'foo\'
match 1 1 1 1 ball '*[al]?'
match 0 0 0 0 ten '[ten]'
match 1 1 1 1 ten '**[!te]'
--
2.51.2.833.g5cd7b514cb
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] doc: document backslash in gitignore patterns
2025-10-29 15:32 ` [PATCH] doc: document backslash in gitignore patterns Jeff King
@ 2025-10-29 15:55 ` Jeff King
2025-10-30 13:40 ` D. Ben Knoble
0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2025-10-29 15:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sruteesh Kumar, git@vger.kernel.org
On Wed, Oct 29, 2025 at 11:32:37AM -0400, Jeff King wrote:
> Subject: [PATCH] doc: document backslash in gitignore patterns
Oh, I forgot to mention: obviously gitattributes inherits the same
behavior here. I looked at whether it would want a similar patch, but it
does not define the pattern format at all, and just punts to "see
gitignore(5) for details". I think that's OK. Unlike where we refer to
fnmatch(3) here, you cannot even begin to wonder how backslashes are
handled by gitattributes without reading gitignore(5). ;)
There's also the "pathspec" entry in gitglossary(7), which does mention
fnmatch(3). Though it is even more confusing because of the literal
matching that pathspecs do. I don't know if we'd want anything there
(and I kind of doubt people get as exotic about patterns there as they
would in gitignore).
-Peff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] doc: document backslash in gitignore patterns
2025-10-29 15:55 ` Jeff King
@ 2025-10-30 13:40 ` D. Ben Knoble
2025-10-30 15:08 ` Jeff King
0 siblings, 1 reply; 20+ messages in thread
From: D. Ben Knoble @ 2025-10-30 13:40 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Sruteesh Kumar, git@vger.kernel.org
On Wed, Oct 29, 2025 at 12:32 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Oct 29, 2025 at 11:32:37AM -0400, Jeff King wrote:
>
> > Subject: [PATCH] doc: document backslash in gitignore patterns
>
> Oh, I forgot to mention: obviously gitattributes inherits the same
> behavior here. I looked at whether it would want a similar patch, but it
> does not define the pattern format at all, and just punts to "see
> gitignore(5) for details". I think that's OK. Unlike where we refer to
> fnmatch(3) here, you cannot even begin to wonder how backslashes are
> handled by gitattributes without reading gitignore(5). ;)
>
> There's also the "pathspec" entry in gitglossary(7), which does mention
> fnmatch(3). Though it is even more confusing because of the literal
> matching that pathspecs do. I don't know if we'd want anything there
> (and I kind of doubt people get as exotic about patterns there as they
> would in gitignore).
>
> -Peff
I certainly wondered about the pathspec case, since the commit message
called out the difference in behavior. For example, at least in this
one corner, we can't reliably use Git commands with pathspecs to build
up example gitignore patterns to throw in .gitignore?
BTW, is the literal matching intended to be conveyed by
• any path matches itself
? If so, I'm not quite sure how to interpret a pathspec like a/b given
a repo with a/b and dir/a/b—do both match or only the former? I expect
in combination with the 2 subsequent bullets that only the former
matches. Conversely, with a pathspec "b" in that case, I think I could
read the docs as suggesting both match, when IIRC none do. Hm!
But this is a bit of a tangent, and the pathspec entry is already, uh,
complicated [1]. Without a good place to leave extra notes for cases
like this, I'm not sure what to do. Certainly unifying the behavior
would be incompatible (if obscure).
[1]: https://lore.kernel.org/git/20250802094657.GG3711639@coredump.intra.peff.net/
--
D. Ben Knoble
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] doc: document backslash in gitignore patterns
2025-10-30 13:40 ` D. Ben Knoble
@ 2025-10-30 15:08 ` Jeff King
2025-10-30 16:05 ` Ben Knoble
0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2025-10-30 15:08 UTC (permalink / raw)
To: D. Ben Knoble; +Cc: Junio C Hamano, Sruteesh Kumar, git@vger.kernel.org
On Thu, Oct 30, 2025 at 09:40:36AM -0400, D. Ben Knoble wrote:
> > There's also the "pathspec" entry in gitglossary(7), which does mention
> > fnmatch(3). Though it is even more confusing because of the literal
> > matching that pathspecs do. I don't know if we'd want anything there
> > (and I kind of doubt people get as exotic about patterns there as they
> > would in gitignore).
>
> I certainly wondered about the pathspec case, since the commit message
> called out the difference in behavior. For example, at least in this
> one corner, we can't reliably use Git commands with pathspecs to build
> up example gitignore patterns to throw in .gitignore?
They're close enough that I suspect people do use them interchangeably,
but there are definitely important corner cases. Like the anchoring
stuff below.
> BTW, is the literal matching intended to be conveyed by
>
> • any path matches itself
>
> ? If so, I'm not quite sure how to interpret a pathspec like a/b given
> a repo with a/b and dir/a/b—do both match or only the former? I expect
> in combination with the 2 subsequent bullets that only the former
> matches. Conversely, with a pathspec "b" in that case, I think I could
> read the docs as suggesting both match, when IIRC none do. Hm!
I may not be the right person to ask, as I wasn't aware of the literal
match behavior here until I tried to write that wildmatch test, and then
walked it through the debugger. ;) Finding the documentation
justification came later.
But yeah, I think that a pathspec "a/b" will not match "dir/a/b",
because pathspecs are implicitly anchored to the start of the path. So
"foo" in .gitignore will match "some/dir/foo". You'd need to "/foo" to
anchor it to the top-level. But pathspecs always start at the top-level,
and you'd need "**/foo" to be the equivalent of gitignore's "foo".
But even that's not entirely true. We allow "*" to match even directory
separators in pathspecs (which is what makes "*.c" match anywhere), so
"*/foo" is enough.
> But this is a bit of a tangent, and the pathspec entry is already, uh,
> complicated [1]. Without a good place to leave extra notes for cases
> like this, I'm not sure what to do. Certainly unifying the behavior
> would be incompatible (if obscure).
It's definitely complicated, but I'm not sure which of those features
are important for command-line ergonomics, and which are just pointless
inconsistencies. I'd be scared to start changing things and finding out. :)
I certainly don't have any objection to improving the pathspec
documentation if it's unclear, though I think that can be done
separately on top of the patch in this thread (and I'm not planning to
tackle it myself).
-Peff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] doc: document backslash in gitignore patterns
2025-10-30 15:08 ` Jeff King
@ 2025-10-30 16:05 ` Ben Knoble
0 siblings, 0 replies; 20+ messages in thread
From: Ben Knoble @ 2025-10-30 16:05 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Sruteesh Kumar, git
> Le 30 oct. 2025 à 11:08, Jeff King <peff@peff.net> a écrit :
>
> On Thu, Oct 30, 2025 at 09:40:36AM -0400, D. Ben Knoble wrote:
>
>>> There's also the "pathspec" entry in gitglossary(7), which does mention
>>> fnmatch(3). Though it is even more confusing because of the literal
>>> matching that pathspecs do. I don't know if we'd want anything there
>>> (and I kind of doubt people get as exotic about patterns there as they
>>> would in gitignore).
>>
>> I certainly wondered about the pathspec case, since the commit message
>> called out the difference in behavior. For example, at least in this
>> one corner, we can't reliably use Git commands with pathspecs to build
>> up example gitignore patterns to throw in .gitignore?
>
> They're close enough that I suspect people do use them interchangeably,
> but there are definitely important corner cases. Like the anchoring
> stuff below.
>
>> BTW, is the literal matching intended to be conveyed by
>>
>> • any path matches itself
>>
>> ? If so, I'm not quite sure how to interpret a pathspec like a/b given
>> a repo with a/b and dir/a/b—do both match or only the former? I expect
>> in combination with the 2 subsequent bullets that only the former
>> matches. Conversely, with a pathspec "b" in that case, I think I could
>> read the docs as suggesting both match, when IIRC none do. Hm!
>
> I may not be the right person to ask, as I wasn't aware of the literal
> match behavior here until I tried to write that wildmatch test, and then
> walked it through the debugger. ;) Finding the documentation
> justification came later.
>
> But yeah, I think that a pathspec "a/b" will not match "dir/a/b",
> because pathspecs are implicitly anchored to the start of the path. So
> "foo" in .gitignore will match "some/dir/foo". You'd need to "/foo" to
> anchor it to the top-level. But pathspecs always start at the top-level,
> and you'd need "**/foo" to be the equivalent of gitignore's "foo".
>
> But even that's not entirely true. We allow "*" to match even directory
> separators in pathspecs (which is what makes "*.c" match anywhere), so
> "*/foo" is enough.
>
>> But this is a bit of a tangent, and the pathspec entry is already, uh,
>> complicated [1]. Without a good place to leave extra notes for cases
>> like this, I'm not sure what to do. Certainly unifying the behavior
>> would be incompatible (if obscure).
>
> It's definitely complicated, but I'm not sure which of those features
> are important for command-line ergonomics, and which are just pointless
> inconsistencies. I'd be scared to start changing things and finding out. :)
>
> I certainly don't have any objection to improving the pathspec
> documentation if it's unclear, though I think that can be done
> separately on top of the patch in this thread (and I'm not planning to
> tackle it myself).
>
> -Peff
Agreed, thanks.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-10-30 16:05 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-10 14:57 Probable issue with code/documentation Sruteesh Kumar
2025-10-14 0:34 ` [PATCH] match_pathname(): give fnmatch one char of prefix context Jeff King
2025-10-14 3:09 ` Jeff King
2025-10-22 16:19 ` Sruteesh Kumar
2025-10-23 20:28 ` Junio C Hamano
2025-10-24 18:28 ` Sruteesh Kumar
2025-10-26 15:18 ` Jeff King
2025-10-26 15:26 ` Jeff King
2025-10-26 15:40 ` [PATCH v2 0/2] fix "foo**/bar" matching "foobar" Jeff King
2025-10-26 15:41 ` [PATCH v2 1/2] match_pathname(): reorder prefix-match check Jeff King
2025-10-26 15:42 ` [PATCH v2 2/2] match_pathname(): give fnmatch one char of prefix context Jeff King
2025-10-26 23:29 ` [PATCH] " Junio C Hamano
2025-10-27 14:29 ` Jeff King
2025-10-27 15:35 ` Junio C Hamano
2025-10-28 23:19 ` Jeff King
2025-10-29 15:32 ` [PATCH] doc: document backslash in gitignore patterns Jeff King
2025-10-29 15:55 ` Jeff King
2025-10-30 13:40 ` D. Ben Knoble
2025-10-30 15:08 ` Jeff King
2025-10-30 16:05 ` Ben Knoble
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).