* [PATCH 0/2] Fix attr magic combined with pathspec prefix
@ 2023-07-07 22:04 Junio C Hamano
2023-07-07 22:04 ` [PATCH 1/2] t6135: attr magic with path pattern Junio C Hamano
2023-07-07 22:04 ` [PATCH 2/2] dir: do not feed path suffix to pathspec match Junio C Hamano
0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2023-07-07 22:04 UTC (permalink / raw)
To: git; +Cc: Matthew Hughes
Matthew Hughes noticed and reported that a pathspec that uses the
attribute magic with pathspec pattern does not work correctly.
After digging around, I found that
$ git ls-files ":(attr:label)"
notices that the "label" attribute is set to path "sub/file" by
listing it, and combined with a pathspec pattern, i.e.
$ git ls-files ":(attr:label)sub"
it still correctly reports "sub/file" has the "label" attribute, be
it defined in ".gitignore" or "sub/.gitignore". The case that it
does not work is the command invocation is
$ git ls-files ":(attr:label)sub/"
and the attribute "label" is defined in "sub/.gitattributes" for
"sub/file".
It turns out that the problematic invocation triggers the common
prefix optimization, which is totally broken for this case.
The first patch enhances the test coverage, and the second patch
fixes the broken common prefix optimization.
Junio C Hamano (2):
t6135: attr magic with path pattern
dir: do not feed path suffix to pathspec match
dir.c | 31 ++++++-----------------
t/t6135-pathspec-with-attrs.sh | 46 ++++++++++++++++++++++++++++++++--
2 files changed, 52 insertions(+), 25 deletions(-)
--
2.41.0-318-g061c58647e
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] t6135: attr magic with path pattern
2023-07-07 22:04 [PATCH 0/2] Fix attr magic combined with pathspec prefix Junio C Hamano
@ 2023-07-07 22:04 ` Junio C Hamano
2023-07-07 22:04 ` [PATCH 2/2] dir: do not feed path suffix to pathspec match Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2023-07-07 22:04 UTC (permalink / raw)
To: git
The test coverage on attribute magic combined with path pattern
was a bit thin. Let's add a few and make sure "(attr:X)sub" and
"(attr:X)sub/" behave the same.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
t/t6135-pathspec-with-attrs.sh | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/t/t6135-pathspec-with-attrs.sh b/t/t6135-pathspec-with-attrs.sh
index 457cc167c7..f63774094f 100755
--- a/t/t6135-pathspec-with-attrs.sh
+++ b/t/t6135-pathspec-with-attrs.sh
@@ -78,7 +78,17 @@ test_expect_success 'check specific set attr' '
test_cmp expect actual
'
-test_expect_success 'check specific set attr (2)' '
+test_expect_success 'check set attr with pathspec pattern' '
+ echo sub/fileSetLabel >expect &&
+
+ git ls-files ":(attr:label)sub" >actual &&
+ test_cmp expect actual &&
+
+ git ls-files ":(attr:label)sub/" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'check specific set attr in tree-ish' '
cat <<-\EOF >expect &&
HEAD:fileSetLabel
HEAD:sub/fileSetLabel
@@ -87,6 +97,16 @@ test_expect_success 'check specific set attr (2)' '
test_cmp expect actual
'
+test_expect_success 'check specific set attr with pathspec pattern in tree-ish' '
+ echo HEAD:sub/fileSetLabel >expect &&
+
+ git grep -l content HEAD ":(attr:label)sub" >actual &&
+ test_cmp expect actual &&
+
+ git grep -l content HEAD ":(attr:label)sub/" >actual &&
+ test_cmp expect actual
+'
+
test_expect_success 'check specific unset attr' '
cat <<-\EOF >expect &&
fileUnsetLabel
--
2.41.0-318-g061c58647e
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] dir: do not feed path suffix to pathspec match
2023-07-07 22:04 [PATCH 0/2] Fix attr magic combined with pathspec prefix Junio C Hamano
2023-07-07 22:04 ` [PATCH 1/2] t6135: attr magic with path pattern Junio C Hamano
@ 2023-07-07 22:04 ` Junio C Hamano
2023-07-07 23:45 ` [PATCH 2/2alt] " Junio C Hamano
2023-07-08 7:16 ` [PATCH 2/2] dir: do not feed path suffix to pathspec match René Scharfe
1 sibling, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2023-07-07 22:04 UTC (permalink / raw)
To: git; +Cc: Matthew Hughes
match_pathspec_item() takes "prefix" value, assuming that it is OK
for a caller to chop off the common leading prefix of pathspec
strings from the path and only allow the remainder of the path to
match the pathspec pattern (after chopping the same leading prefix
of it, of course).
The "common leading prefix" optimization has two main features:
* discard the entries in the in-core index that are outside of the
common leading prefix; if you are doing "ls-files one/a one/b",
we know all matches must be from "one/", so first the code
discards all entries outside the "one/" directory from the
in-core index. This allows us to work on a smaller dataset.
* allow skipping the comparison of a few leading bytes when
matching pathspec with path. When "ls-files" finds the path
"one/a/1" in the in-core index given "one/a" and "one/b" as the
pathspec, knowing that common leading prefix "one/" was found
lets the pathspec matchiner not to bother comparing "one/" part,
and allows it to feed "a/1" down, as long as the pathspec element
"one/a" gets corresponding adjustment to "a".
However, losing the full path in the repository too early in the
callchain (at dir.c:do_match_pathspec()) and feeding only the suffix
to the low level code (i.e. dir.c:match_pathspec_item()) loses a
crucial piece of infomation that is needed by the pathspec matching
code, when the attribute magic is involved. The attributes, other
than the ones that are built-in and the ones that come from the
$GIT_DIR/info/attributes file and the top-level .gitattributes file,
are lazily read on-demand, as we encounter each path and ask if it
matches the pathspec.
For example, if you say "git ls-files "(attr:label)sub/" in a
repository with a file "sub/file" that is given the 'label'
attribute in sub/.gitattributes:
* The common prefix optimization finds that "sub/" is the common
prefix and prunes the in-core index so that it has only entries
inside that directory. This is desirable.
* The code then walks the in-core index, finds "sub/file", and
eventually asks do_match_pathspec() if it matches the given
pathspec.
* do_match_pathspec() calls match_pathspec_item() _after_ stripping
the common prefix "sub/" from the path, giving it "file", plus
the length of the common prefix (4-bytes), so that the pathspec
element "(attr:label)sub/" can be treated as if it were "(attr:label)".
The last one is what breaks the match, as the pathspec subsystem
ends up asking the attribute subsystem to find the attribute
attached to the path "file".
Unfortunately this was not discovered so far because the code works
well if we do not trigger the common prefix optimization, e.g.
git ls-files "(attr:label)sub"
git ls-files "(attr:label)sub/" "(attr:label)dir/"
would have reported "sub/file" as a path with the 'label' attribute
just fine.
Update do_match_pathspec() so that it does not strip the prefix from
the path, and always feeding the full pathname to match_pathspec_item().
Reported-by: Matthew Hughes <mhughes@uw.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
dir.c | 31 ++++++++-----------------------
t/t6135-pathspec-with-attrs.sh | 24 +++++++++++++++++++++++-
2 files changed, 31 insertions(+), 24 deletions(-)
diff --git a/dir.c b/dir.c
index 3acac7beb1..6116022ae6 100644
--- a/dir.c
+++ b/dir.c
@@ -337,12 +337,11 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
* [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
*/
static int match_pathspec_item(struct index_state *istate,
- const struct pathspec_item *item, int prefix,
+ const struct pathspec_item *item,
const char *name, int namelen, unsigned flags)
{
- /* name/namelen has prefix cut off by caller */
- const char *match = item->match + prefix;
- int matchlen = item->len - prefix;
+ const char *match = item->match;
+ int matchlen = item->len;
/*
* The normal call pattern is:
@@ -362,19 +361,9 @@ static int match_pathspec_item(struct index_state *istate,
* other words, we do not trust the caller on comparing the
* prefix part when :(icase) is involved. We do exact
* comparison ourselves.
- *
- * Normally the caller (common_prefix_len() in fact) does
- * _exact_ matching on name[-prefix+1..-1] and we do not need
- * to check that part. Be defensive and check it anyway, in
- * case common_prefix_len is changed, or a new caller is
- * introduced that does not use common_prefix_len.
- *
- * If the penalty turns out too high when prefix is really
- * long, maybe change it to
- * strncmp(match, name, item->prefix - prefix)
*/
if (item->prefix && (item->magic & PATHSPEC_ICASE) &&
- strncmp(item->match, name - prefix, item->prefix))
+ strncmp(item->match, name, item->prefix))
return 0;
if (item->attr_match_nr &&
@@ -399,7 +388,7 @@ static int match_pathspec_item(struct index_state *istate,
if (item->nowildcard_len < item->len &&
!git_fnmatch(item, match, name,
- item->nowildcard_len - prefix))
+ item->nowildcard_len))
return MATCHED_FNMATCH;
/* Perform checks to see if "name" is a leading string of the pathspec */
@@ -414,8 +403,7 @@ static int match_pathspec_item(struct index_state *istate,
/* name doesn't match up to the first wild character */
if (item->nowildcard_len < item->len &&
- ps_strncmp(item, match, name,
- item->nowildcard_len - prefix))
+ ps_strncmp(item, match, name, item->nowildcard_len))
return 0;
/*
@@ -488,9 +476,6 @@ static int do_match_pathspec(struct index_state *istate,
return 0;
}
- name += prefix;
- namelen -= prefix;
-
for (i = ps->nr - 1; i >= 0; i--) {
int how;
@@ -506,8 +491,8 @@ static int do_match_pathspec(struct index_state *istate,
*/
if (seen && ps->items[i].magic & PATHSPEC_EXCLUDE)
seen[i] = MATCHED_FNMATCH;
- how = match_pathspec_item(istate, ps->items+i, prefix, name,
- namelen, flags);
+ how = match_pathspec_item(istate, ps->items+i,
+ name, namelen, flags);
if (ps->recursive &&
(ps->magic & PATHSPEC_MAXDEPTH) &&
ps->max_depth != -1 &&
diff --git a/t/t6135-pathspec-with-attrs.sh b/t/t6135-pathspec-with-attrs.sh
index f63774094f..f70c395e75 100755
--- a/t/t6135-pathspec-with-attrs.sh
+++ b/t/t6135-pathspec-with-attrs.sh
@@ -65,7 +65,8 @@ test_expect_success 'setup .gitattributes' '
fileValue label=foo
fileWrongLabel label☺
EOF
- git add .gitattributes &&
+ echo fileSetLabel label1 >sub/.gitattributes &&
+ git add .gitattributes sub/.gitattributes &&
git commit -m "add attributes"
'
@@ -157,6 +158,7 @@ test_expect_success 'check unspecified attr' '
fileC
fileNoLabel
fileWrongLabel
+ sub/.gitattributes
sub/fileA
sub/fileAB
sub/fileAC
@@ -181,6 +183,7 @@ test_expect_success 'check unspecified attr (2)' '
HEAD:fileC
HEAD:fileNoLabel
HEAD:fileWrongLabel
+ HEAD:sub/.gitattributes
HEAD:sub/fileA
HEAD:sub/fileAB
HEAD:sub/fileAC
@@ -200,6 +203,7 @@ test_expect_success 'check multiple unspecified attr' '
fileC
fileNoLabel
fileWrongLabel
+ sub/.gitattributes
sub/fileC
sub/fileNoLabel
sub/fileWrongLabel
@@ -273,4 +277,22 @@ test_expect_success 'backslash cannot be used as a value' '
test_i18ngrep "for value matching" actual
'
+test_expect_success 'reading from .gitattributes in a subdirectory (1)' '
+ git ls-files ":(attr:label1)" >actual &&
+ test_write_lines "sub/fileSetLabel" >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'reading from .gitattributes in a subdirectory (2)' '
+ git ls-files ":(attr:label1)sub" >actual &&
+ test_write_lines "sub/fileSetLabel" >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'reading from .gitattributes in a subdirectory (3)' '
+ git ls-files ":(attr:label1)sub/" >actual &&
+ test_write_lines "sub/fileSetLabel" >expect &&
+ test_cmp expect actual
+'
+
test_done
--
2.41.0-318-g061c58647e
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2alt] dir: do not feed path suffix to pathspec match
2023-07-07 22:04 ` [PATCH 2/2] dir: do not feed path suffix to pathspec match Junio C Hamano
@ 2023-07-07 23:45 ` Junio C Hamano
2023-07-08 7:16 ` René Scharfe
2023-07-08 7:16 ` [PATCH 2/2] dir: do not feed path suffix to pathspec match René Scharfe
1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2023-07-07 23:45 UTC (permalink / raw)
To: git
Junio C Hamano <gitster@pobox.com> writes:
> ...
> * do_match_pathspec() calls match_pathspec_item() _after_ stripping
> the common prefix "sub/" from the path, giving it "file", plus
> the length of the common prefix (4-bytes), so that the pathspec
> element "(attr:label)sub/" can be treated as if it were "(attr:label)".
>
> The last one is what breaks the match, as the pathspec subsystem
> ends up asking the attribute subsystem to find the attribute
> attached to the path "file".
> ...
> Update do_match_pathspec() so that it does not strip the prefix from
> the path, and always feeding the full pathname to match_pathspec_item().
Here is an alternative approach with a lot smaller code footprint.
Instead of teaching do_match_pathspec() not to strip the common
prefix from the pathname, we teach match_pathspec_item() how to
recover the original pathname before stripping, and use that when
calling match_pathspec_attrs() function. The same trick is already
used in an earlier part of the same function, so even though it
looks somewhat dirty, it is unlikely that it would introduce
more breakage.
As the test part is the same, I'll just show the code change
relative to the 'master' branch.
I am undecided which one is better.
dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git c/dir.c w/dir.c
index a7469df3ac..635d1b058c 100644
--- c/dir.c
+++ w/dir.c
@@ -374,7 +374,7 @@ static int match_pathspec_item(struct index_state *istate,
return 0;
if (item->attr_match_nr &&
- !match_pathspec_attrs(istate, name, namelen, item))
+ !match_pathspec_attrs(istate, name - prefix, namelen + prefix, item))
return 0;
/* If the match was just the prefix, we matched */
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] dir: do not feed path suffix to pathspec match
2023-07-07 22:04 ` [PATCH 2/2] dir: do not feed path suffix to pathspec match Junio C Hamano
2023-07-07 23:45 ` [PATCH 2/2alt] " Junio C Hamano
@ 2023-07-08 7:16 ` René Scharfe
1 sibling, 0 replies; 9+ messages in thread
From: René Scharfe @ 2023-07-08 7:16 UTC (permalink / raw)
To: Junio C Hamano, git; +Cc: Matthew Hughes
Am 08.07.23 um 00:04 schrieb Junio C Hamano:
> diff --git a/dir.c b/dir.c
> index 3acac7beb1..6116022ae6 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -488,9 +476,6 @@ static int do_match_pathspec(struct index_state *istate,
> return 0;
> }
>
> - name += prefix;
> - namelen -= prefix;
> -
> for (i = ps->nr - 1; i >= 0; i--) {
> int how;
>
> @@ -506,8 +491,8 @@ static int do_match_pathspec(struct index_state *istate,
> */
> if (seen && ps->items[i].magic & PATHSPEC_EXCLUDE)
> seen[i] = MATCHED_FNMATCH;
> - how = match_pathspec_item(istate, ps->items+i, prefix, name,
> - namelen, flags);
> + how = match_pathspec_item(istate, ps->items+i,
> + name, namelen, flags);
With that, the parameter "prefix" of do_match_pathspec() becomes unused
and can be removed. This cascades to match_pathspec_with_flags(),
match_pathspec(), dir_path_match() and builtin/add.c::prune_directory(),
and fill_directory() can lose its return value.
The code continues here like this, though:
if (ps->recursive &&
(ps->magic & PATHSPEC_MAXDEPTH) &&
ps->max_depth != -1 &&
how && how != MATCHED_FNMATCH) {
int len = ps->items[i].len;
if (name[len] == '/')
len++;
if (within_depth(name+len, namelen-len, 0, ps->max_depth))
how = MATCHED_EXACTLY;
else
how = 0;
}
And "name" here would be affected by "prefix" no longer being added.
Does it fix or break --max-depth? I think neither: builtin/grep.c --
the only user of PATHSPEC_MAXDEPTH AFAICS -- passes a prefix of 0.
René
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2alt] dir: do not feed path suffix to pathspec match
2023-07-07 23:45 ` [PATCH 2/2alt] " Junio C Hamano
@ 2023-07-08 7:16 ` René Scharfe
2023-07-08 21:35 ` [PATCH 2alt/2] dir: match "attr" pathspec magic with correct paths Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: René Scharfe @ 2023-07-08 7:16 UTC (permalink / raw)
To: Junio C Hamano, git
Am 08.07.23 um 01:45 schrieb Junio C Hamano:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> ...
>> * do_match_pathspec() calls match_pathspec_item() _after_ stripping
>> the common prefix "sub/" from the path, giving it "file", plus
>> the length of the common prefix (4-bytes), so that the pathspec
>> element "(attr:label)sub/" can be treated as if it were "(attr:label)".
>>
>> The last one is what breaks the match, as the pathspec subsystem
>> ends up asking the attribute subsystem to find the attribute
>> attached to the path "file".
>> ...
>> Update do_match_pathspec() so that it does not strip the prefix from
>> the path, and always feeding the full pathname to match_pathspec_item().
>
> Here is an alternative approach with a lot smaller code footprint.
> Instead of teaching do_match_pathspec() not to strip the common
> prefix from the pathname, we teach match_pathspec_item() how to
> recover the original pathname before stripping, and use that when
> calling match_pathspec_attrs() function. The same trick is already
> used in an earlier part of the same function, so even though it
> looks somewhat dirty, it is unlikely that it would introduce
> more breakage.
>
> As the test part is the same, I'll just show the code change
> relative to the 'master' branch.
>
> I am undecided which one is better.
>
> dir.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git c/dir.c w/dir.c
> index a7469df3ac..635d1b058c 100644
> --- c/dir.c
> +++ w/dir.c
> @@ -374,7 +374,7 @@ static int match_pathspec_item(struct index_state *istate,
> return 0;
>
> if (item->attr_match_nr &&
> - !match_pathspec_attrs(istate, name, namelen, item))
> + !match_pathspec_attrs(istate, name - prefix, namelen + prefix, item))
match_pathspec_item() has only one caller, and it did the opposite, so
this is safe. And a minimal fix like that is less likely to have side
effects. Removing the trick will surely improve the code, though. If
match_pathspec_item() needs the full name then we should pass it on,
and if the "prefix" offset needs to be added then it can happen right
there in that function.
> return 0;
>
> /* If the match was just the prefix, we matched */
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2alt/2] dir: match "attr" pathspec magic with correct paths
2023-07-08 7:16 ` René Scharfe
@ 2023-07-08 21:35 ` Junio C Hamano
2023-07-09 5:35 ` René Scharfe
0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2023-07-08 21:35 UTC (permalink / raw)
To: git; +Cc: René Scharfe
René Scharfe <l.s.r@web.de> writes:
>> if (item->attr_match_nr &&
>> - !match_pathspec_attrs(istate, name, namelen, item))
>> + !match_pathspec_attrs(istate, name - prefix, namelen + prefix, item))
>
> match_pathspec_item() has only one caller, and it did the opposite, so
> this is safe. And a minimal fix like that is less likely to have side
> effects. Removing the trick will surely improve the code, though. If
> match_pathspec_item() needs the full name then we should pass it on,
> and if the "prefix" offset needs to be added then it can happen right
> there in that function.
Yup. I am inclined to take this version and then update the
proposed log message to put less blame on the "common prefix"
optimization in general.
Thanks.
Just for completeness, this is with an updated log message.
----- >8 --------- >8 --------- >8 --------- >8 --------- >8 -----
The match_pathspec_item() function takes "prefix" value, allowing a
caller to chop off the common leading prefix of pathspec pattern
strings from the path and only use the remainder of the path to
match the pathspec patterns (after chopping the same leading prefix
of them, of course).
This "common leading prefix" optimization has two main features:
* discard the entries in the in-core index that are outside of the
common leading prefix; if you are doing "ls-files one/a one/b",
we know all matches must be from "one/", so first the code
discards all entries outside the "one/" directory from the
in-core index. This allows us to work on a smaller dataset.
* allow skipping the comparison of the leading bytes when matching
pathspec with path. When "ls-files" finds the path "one/a/1" in
the in-core index given "one/a" and "one/b" as the pathspec,
knowing that common leading prefix "one/" was found lets the
pathspec matchinery not to bother comparing "one/" part, and
allows it to feed "a/1" down, as long as the pathspec element
"one/a" gets corresponding adjustment to "a".
When the "attr" pathspec magic is in effect, however, the current
code breaks down.
The attributes, other than the ones that are built-in and the ones
that come from the $GIT_DIR/info/attributes file and the top-level
.gitattributes file, are lazily read from the filesystem on-demand,
as we encounter each path and ask if it matches the pathspec. For
example, if you say "git ls-files "(attr:label)sub/" in a repository
with a file "sub/file" that is given the 'label' attribute in
"sub/.gitattributes":
* The common prefix optimization finds that "sub/" is the common
prefix and prunes the in-core index so that it has only entries
inside that directory. This is desirable.
* The code then walks the in-core index, finds "sub/file", and
eventually asks do_match_pathspec() if it matches the given
pathspec.
* do_match_pathspec() calls match_pathspec_item() _after_ stripping
the common prefix "sub/" from the path, giving it "file", plus
the length of the common prefix (4-bytes), so that the pathspec
element "(attr:label)sub/" can be treated as if it were "(attr:label)".
The last one is what breaks the match in the current code, as the
pathspec subsystem ends up asking the attribute subsystem to find
the attribute attached to the path "file". We need to ask about the
attributes on "sub/file" when calling match_pathspec_attrs(); this
can be done by looking at "prefix" bytes before the beginning of
"name", which is the same trick already used by another piece of the
code in the same match_pathspec_item() function.
Unfortunately this was not discovered so far because the code works
with slightly different arguments, e.g.
$ git ls-files "(attr:label)sub"
$ git ls-files "(attr:label)sub/" "no/such/dir/"
would have reported "sub/file" as a path with the 'label' attribute
just fine, because neither would trigger the common prefix
optimization.
Reported-by: Matthew Hughes <mhughes@uw.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
dir.c | 2 +-
t/t6135-pathspec-with-attrs.sh | 24 +++++++++++++++++++++++-
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/dir.c b/dir.c
index a7469df3ac..635d1b058c 100644
--- a/dir.c
+++ b/dir.c
@@ -374,7 +374,7 @@ static int match_pathspec_item(struct index_state *istate,
return 0;
if (item->attr_match_nr &&
- !match_pathspec_attrs(istate, name, namelen, item))
+ !match_pathspec_attrs(istate, name - prefix, namelen + prefix, item))
return 0;
/* If the match was just the prefix, we matched */
diff --git a/t/t6135-pathspec-with-attrs.sh b/t/t6135-pathspec-with-attrs.sh
index f63774094f..f70c395e75 100755
--- a/t/t6135-pathspec-with-attrs.sh
+++ b/t/t6135-pathspec-with-attrs.sh
@@ -65,7 +65,8 @@ test_expect_success 'setup .gitattributes' '
fileValue label=foo
fileWrongLabel label☺
EOF
- git add .gitattributes &&
+ echo fileSetLabel label1 >sub/.gitattributes &&
+ git add .gitattributes sub/.gitattributes &&
git commit -m "add attributes"
'
@@ -157,6 +158,7 @@ test_expect_success 'check unspecified attr' '
fileC
fileNoLabel
fileWrongLabel
+ sub/.gitattributes
sub/fileA
sub/fileAB
sub/fileAC
@@ -181,6 +183,7 @@ test_expect_success 'check unspecified attr (2)' '
HEAD:fileC
HEAD:fileNoLabel
HEAD:fileWrongLabel
+ HEAD:sub/.gitattributes
HEAD:sub/fileA
HEAD:sub/fileAB
HEAD:sub/fileAC
@@ -200,6 +203,7 @@ test_expect_success 'check multiple unspecified attr' '
fileC
fileNoLabel
fileWrongLabel
+ sub/.gitattributes
sub/fileC
sub/fileNoLabel
sub/fileWrongLabel
@@ -273,4 +277,22 @@ test_expect_success 'backslash cannot be used as a value' '
test_i18ngrep "for value matching" actual
'
+test_expect_success 'reading from .gitattributes in a subdirectory (1)' '
+ git ls-files ":(attr:label1)" >actual &&
+ test_write_lines "sub/fileSetLabel" >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'reading from .gitattributes in a subdirectory (2)' '
+ git ls-files ":(attr:label1)sub" >actual &&
+ test_write_lines "sub/fileSetLabel" >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'reading from .gitattributes in a subdirectory (3)' '
+ git ls-files ":(attr:label1)sub/" >actual &&
+ test_write_lines "sub/fileSetLabel" >expect &&
+ test_cmp expect actual
+'
+
test_done
--
2.41.0-327-gaa9166bcc0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2alt/2] dir: match "attr" pathspec magic with correct paths
2023-07-08 21:35 ` [PATCH 2alt/2] dir: match "attr" pathspec magic with correct paths Junio C Hamano
@ 2023-07-09 5:35 ` René Scharfe
2023-07-09 9:28 ` Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: René Scharfe @ 2023-07-09 5:35 UTC (permalink / raw)
To: Junio C Hamano, git
Am 08.07.23 um 23:35 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> if (item->attr_match_nr &&
>>> - !match_pathspec_attrs(istate, name, namelen, item))
>>> + !match_pathspec_attrs(istate, name - prefix, namelen + prefix, item))
>>
>> match_pathspec_item() has only one caller, and it did the opposite, so
>> this is safe. And a minimal fix like that is less likely to have side
>> effects. Removing the trick will surely improve the code, though. If
>> match_pathspec_item() needs the full name then we should pass it on,
>> and if the "prefix" offset needs to be added then it can happen right
>> there in that function.
>
> Yup. I am inclined to take this version and then update the
> proposed log message to put less blame on the "common prefix"
> optimization in general.
>
> Thanks.
>
> Just for completeness, this is with an updated log message.
>
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 -----
> The match_pathspec_item() function takes "prefix" value, allowing a
> caller to chop off the common leading prefix of pathspec pattern
> strings from the path and only use the remainder of the path to
> match the pathspec patterns (after chopping the same leading prefix
> of them, of course).
>
> This "common leading prefix" optimization has two main features:
>
> * discard the entries in the in-core index that are outside of the
> common leading prefix; if you are doing "ls-files one/a one/b",
> we know all matches must be from "one/", so first the code
> discards all entries outside the "one/" directory from the
> in-core index. This allows us to work on a smaller dataset.
>
> * allow skipping the comparison of the leading bytes when matching
> pathspec with path. When "ls-files" finds the path "one/a/1" in
> the in-core index given "one/a" and "one/b" as the pathspec,
> knowing that common leading prefix "one/" was found lets the
> pathspec matchinery not to bother comparing "one/" part, and
> allows it to feed "a/1" down, as long as the pathspec element
> "one/a" gets corresponding adjustment to "a".
>
> When the "attr" pathspec magic is in effect, however, the current
> code breaks down.
>
> The attributes, other than the ones that are built-in and the ones
> that come from the $GIT_DIR/info/attributes file and the top-level
> .gitattributes file, are lazily read from the filesystem on-demand,
> as we encounter each path and ask if it matches the pathspec. For
> example, if you say "git ls-files "(attr:label)sub/" in a repository
> with a file "sub/file" that is given the 'label' attribute in
> "sub/.gitattributes":
>
> * The common prefix optimization finds that "sub/" is the common
> prefix and prunes the in-core index so that it has only entries
> inside that directory. This is desirable.
>
> * The code then walks the in-core index, finds "sub/file", and
> eventually asks do_match_pathspec() if it matches the given
> pathspec.
>
> * do_match_pathspec() calls match_pathspec_item() _after_ stripping
> the common prefix "sub/" from the path, giving it "file", plus
> the length of the common prefix (4-bytes), so that the pathspec
> element "(attr:label)sub/" can be treated as if it were "(attr:label)".
>
> The last one is what breaks the match in the current code, as the
> pathspec subsystem ends up asking the attribute subsystem to find
> the attribute attached to the path "file". We need to ask about the
> attributes on "sub/file" when calling match_pathspec_attrs(); this
> can be done by looking at "prefix" bytes before the beginning of
> "name", which is the same trick already used by another piece of the
> code in the same match_pathspec_item() function.
>
> Unfortunately this was not discovered so far because the code works
> with slightly different arguments, e.g.
>
> $ git ls-files "(attr:label)sub"
> $ git ls-files "(attr:label)sub/" "no/such/dir/"
>
> would have reported "sub/file" as a path with the 'label' attribute
> just fine, because neither would trigger the common prefix
> optimization.
Makes me wonder how important this optimization is, when this flaw went
unnoticed for ten years.
Using the latest main against on an old Chromium repository, because
it has lots of files:
Benchmark 1: ./git -C ../chromium/src ls-files third_party/blink/web_tests/external third_party/blink/web_tests/platform
Time (mean ± σ): 37.8 ms ± 0.2 ms [User: 28.3 ms, System: 8.4 ms]
Range (min … max): 37.4 ms … 38.7 ms 74 runs
Benchmark 2: ./git -C ../chromium/src ls-files third_party/blink/web_tests/external third_party/blink/web_tests/platform missing
Time (mean ± σ): 48.4 ms ± 0.5 ms [User: 38.5 ms, System: 8.7 ms]
Range (min … max): 47.8 ms … 51.9 ms 58 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Summary
./git -C ../chromium/src ls-files third_party/blink/web_tests/external third_party/blink/web_tests/platform ran
1.28 ± 0.02 times faster than ./git -C ../chromium/src ls-files third_party/blink/web_tests/external third_party/blink/web_tests/platform missing
We can see that the shared prefix optimization helps noticeably, even
though the measurements are noisy.
With your big patch 2:
Benchmark 1: ./git -C ../chromium/src ls-files third_party/blink/web_tests/external third_party/blink/web_tests/platform
Time (mean ± σ): 38.0 ms ± 0.4 ms [User: 28.3 ms, System: 8.5 ms]
Range (min … max): 37.7 ms … 40.3 ms 72 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: ./git -C ../chromium/src ls-files third_party/blink/web_tests/external third_party/blink/web_tests/platform missing
Time (mean ± σ): 48.5 ms ± 0.4 ms [User: 38.5 ms, System: 8.8 ms]
Range (min … max): 47.9 ms … 50.6 ms 58 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Summary
./git -C ../chromium/src ls-files third_party/blink/web_tests/external third_party/blink/web_tests/platform ran
1.28 ± 0.02 times faster than ./git -C ../chromium/src ls-files third_party/blink/web_tests/external third_party/blink/web_tests/platform missing
The difference to main is small enough to get lost in the noise.
The one-line fix is nice and surgical, but I like the other one more.
Gets rid of crusty underutilized code that doesn't even seem to make
a measurable difference.
René
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2alt/2] dir: match "attr" pathspec magic with correct paths
2023-07-09 5:35 ` René Scharfe
@ 2023-07-09 9:28 ` Junio C Hamano
0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2023-07-09 9:28 UTC (permalink / raw)
To: René Scharfe; +Cc: git
René Scharfe <l.s.r@web.de> writes:
>> This "common leading prefix" optimization has two main features:
>>
>> * discard the entries in the in-core index that are outside of the
>> common leading prefix; if you are doing "ls-files one/a one/b",
>> we know all matches must be from "one/", so first the code
>> discards all entries outside the "one/" directory from the
>> in-core index. This allows us to work on a smaller dataset.
>>
>> * allow skipping the comparison of the leading bytes when matching
>> pathspec with path. When "ls-files" finds the path "one/a/1" in
>> the in-core index given "one/a" and "one/b" as the pathspec,
>> knowing that common leading prefix "one/" was found lets the
>> pathspec matchinery not to bother comparing "one/" part, and
>> allows it to feed "a/1" down, as long as the pathspec element
>> "one/a" gets corresponding adjustment to "a".
>> ...
> With your big patch 2:
> ...
> The difference to main is small enough to get lost in the noise.
>
> The one-line fix is nice and surgical, but I like the other one more.
> Gets rid of crusty underutilized code that doesn't even seem to make
> a measurable difference.
Your benchmark matches my intuition, in that the main benefit of the
optimization comes from discarding the in-core cache entries outside
the area covered by the common prefix, and not from being able to
skip comparing a leading bytes. The value in code simplification
the larger change has may want to be pursued later, but I'd rather
see us make a small "fix" that can be merged down to 'maint' first.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-07-09 9:30 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-07 22:04 [PATCH 0/2] Fix attr magic combined with pathspec prefix Junio C Hamano
2023-07-07 22:04 ` [PATCH 1/2] t6135: attr magic with path pattern Junio C Hamano
2023-07-07 22:04 ` [PATCH 2/2] dir: do not feed path suffix to pathspec match Junio C Hamano
2023-07-07 23:45 ` [PATCH 2/2alt] " Junio C Hamano
2023-07-08 7:16 ` René Scharfe
2023-07-08 21:35 ` [PATCH 2alt/2] dir: match "attr" pathspec magic with correct paths Junio C Hamano
2023-07-09 5:35 ` René Scharfe
2023-07-09 9:28 ` Junio C Hamano
2023-07-08 7:16 ` [PATCH 2/2] dir: do not feed path suffix to pathspec match René Scharfe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).