git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Cc: Matthew Hughes <mhughes@uw.co.uk>
Subject: [PATCH 2/2] dir: do not feed path suffix to pathspec match
Date: Fri,  7 Jul 2023 15:04:57 -0700	[thread overview]
Message-ID: <20230707220457.3655121-3-gitster@pobox.com> (raw)
In-Reply-To: <20230707220457.3655121-1-gitster@pobox.com>

match_pathspec_item() takes "prefix" value, assuming that it is OK
for a caller to chop off the common leading prefix of pathspec
strings from the path and only allow the remainder of the path to
match the pathspec pattern (after chopping the same leading prefix
of it, of course).

The "common leading prefix" optimization has two main features:

 * discard the entries in the in-core index that are outside of the
   common leading prefix; if you are doing "ls-files one/a one/b",
   we know all matches must be from "one/", so first the code
   discards all entries outside the "one/" directory from the
   in-core index.  This allows us to work on a smaller dataset.

 * allow skipping the comparison of a few leading bytes when
   matching pathspec with path.  When "ls-files" finds the path
   "one/a/1" in the in-core index given "one/a" and "one/b" as the
   pathspec, knowing that common leading prefix "one/" was found
   lets the pathspec matchiner not to bother comparing "one/" part,
   and allows it to feed "a/1" down, as long as the pathspec element
   "one/a" gets corresponding adjustment to "a".

However, losing the full path in the repository too early in the
callchain (at dir.c:do_match_pathspec()) and feeding only the suffix
to the low level code (i.e. dir.c:match_pathspec_item()) loses a
crucial piece of infomation that is needed by the pathspec matching
code, when the attribute magic is involved.  The attributes, other
than the ones that are built-in and the ones that come from the
$GIT_DIR/info/attributes file and the top-level .gitattributes file,
are lazily read on-demand, as we encounter each path and ask if it
matches the pathspec.

For example, if you say "git ls-files "(attr:label)sub/" in a
repository with a file "sub/file" that is given the 'label'
attribute in sub/.gitattributes:

 * The common prefix optimization finds that "sub/" is the common
   prefix and prunes the in-core index so that it has only entries
   inside that directory.  This is desirable.

 * The code then walks the in-core index, finds "sub/file", and
   eventually asks do_match_pathspec() if it matches the given
   pathspec.

 * do_match_pathspec() calls match_pathspec_item() _after_ stripping
   the common prefix "sub/" from the path, giving it "file", plus
   the length of the common prefix (4-bytes), so that the pathspec
   element "(attr:label)sub/" can be treated as if it were "(attr:label)".

The last one is what breaks the match, as the pathspec subsystem
ends up asking the attribute subsystem to find the attribute
attached to the path "file".

Unfortunately this was not discovered so far because the code works
well if we do not trigger the common prefix optimization, e.g.

	git ls-files "(attr:label)sub"
	git ls-files "(attr:label)sub/" "(attr:label)dir/"

would have reported "sub/file" as a path with the 'label' attribute
just fine.

Update do_match_pathspec() so that it does not strip the prefix from
the path, and always feeding the full pathname to match_pathspec_item().

Reported-by: Matthew Hughes <mhughes@uw.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 dir.c                          | 31 ++++++++-----------------------
 t/t6135-pathspec-with-attrs.sh | 24 +++++++++++++++++++++++-
 2 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/dir.c b/dir.c
index 3acac7beb1..6116022ae6 100644
--- a/dir.c
+++ b/dir.c
@@ -337,12 +337,11 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
  * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
  */
 static int match_pathspec_item(struct index_state *istate,
-			       const struct pathspec_item *item, int prefix,
+			       const struct pathspec_item *item,
 			       const char *name, int namelen, unsigned flags)
 {
-	/* name/namelen has prefix cut off by caller */
-	const char *match = item->match + prefix;
-	int matchlen = item->len - prefix;
+	const char *match = item->match;
+	int matchlen = item->len;
 
 	/*
 	 * The normal call pattern is:
@@ -362,19 +361,9 @@ static int match_pathspec_item(struct index_state *istate,
 	 * other words, we do not trust the caller on comparing the
 	 * prefix part when :(icase) is involved. We do exact
 	 * comparison ourselves.
-	 *
-	 * Normally the caller (common_prefix_len() in fact) does
-	 * _exact_ matching on name[-prefix+1..-1] and we do not need
-	 * to check that part. Be defensive and check it anyway, in
-	 * case common_prefix_len is changed, or a new caller is
-	 * introduced that does not use common_prefix_len.
-	 *
-	 * If the penalty turns out too high when prefix is really
-	 * long, maybe change it to
-	 * strncmp(match, name, item->prefix - prefix)
 	 */
 	if (item->prefix && (item->magic & PATHSPEC_ICASE) &&
-	    strncmp(item->match, name - prefix, item->prefix))
+	    strncmp(item->match, name, item->prefix))
 		return 0;
 
 	if (item->attr_match_nr &&
@@ -399,7 +388,7 @@ static int match_pathspec_item(struct index_state *istate,
 
 	if (item->nowildcard_len < item->len &&
 	    !git_fnmatch(item, match, name,
-			 item->nowildcard_len - prefix))
+			 item->nowildcard_len))
 		return MATCHED_FNMATCH;
 
 	/* Perform checks to see if "name" is a leading string of the pathspec */
@@ -414,8 +403,7 @@ static int match_pathspec_item(struct index_state *istate,
 
 		/* name doesn't match up to the first wild character */
 		if (item->nowildcard_len < item->len &&
-		    ps_strncmp(item, match, name,
-			       item->nowildcard_len - prefix))
+		    ps_strncmp(item, match, name, item->nowildcard_len))
 			return 0;
 
 		/*
@@ -488,9 +476,6 @@ static int do_match_pathspec(struct index_state *istate,
 			return 0;
 	}
 
-	name += prefix;
-	namelen -= prefix;
-
 	for (i = ps->nr - 1; i >= 0; i--) {
 		int how;
 
@@ -506,8 +491,8 @@ static int do_match_pathspec(struct index_state *istate,
 		 */
 		if (seen && ps->items[i].magic & PATHSPEC_EXCLUDE)
 			seen[i] = MATCHED_FNMATCH;
-		how = match_pathspec_item(istate, ps->items+i, prefix, name,
-					  namelen, flags);
+		how = match_pathspec_item(istate, ps->items+i,
+					  name, namelen, flags);
 		if (ps->recursive &&
 		    (ps->magic & PATHSPEC_MAXDEPTH) &&
 		    ps->max_depth != -1 &&
diff --git a/t/t6135-pathspec-with-attrs.sh b/t/t6135-pathspec-with-attrs.sh
index f63774094f..f70c395e75 100755
--- a/t/t6135-pathspec-with-attrs.sh
+++ b/t/t6135-pathspec-with-attrs.sh
@@ -65,7 +65,8 @@ test_expect_success 'setup .gitattributes' '
 	fileValue label=foo
 	fileWrongLabel label☺
 	EOF
-	git add .gitattributes &&
+	echo fileSetLabel label1 >sub/.gitattributes &&
+	git add .gitattributes sub/.gitattributes &&
 	git commit -m "add attributes"
 '
 
@@ -157,6 +158,7 @@ test_expect_success 'check unspecified attr' '
 	fileC
 	fileNoLabel
 	fileWrongLabel
+	sub/.gitattributes
 	sub/fileA
 	sub/fileAB
 	sub/fileAC
@@ -181,6 +183,7 @@ test_expect_success 'check unspecified attr (2)' '
 	HEAD:fileC
 	HEAD:fileNoLabel
 	HEAD:fileWrongLabel
+	HEAD:sub/.gitattributes
 	HEAD:sub/fileA
 	HEAD:sub/fileAB
 	HEAD:sub/fileAC
@@ -200,6 +203,7 @@ test_expect_success 'check multiple unspecified attr' '
 	fileC
 	fileNoLabel
 	fileWrongLabel
+	sub/.gitattributes
 	sub/fileC
 	sub/fileNoLabel
 	sub/fileWrongLabel
@@ -273,4 +277,22 @@ test_expect_success 'backslash cannot be used as a value' '
 	test_i18ngrep "for value matching" actual
 '
 
+test_expect_success 'reading from .gitattributes in a subdirectory (1)' '
+	git ls-files ":(attr:label1)" >actual &&
+	test_write_lines "sub/fileSetLabel" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'reading from .gitattributes in a subdirectory (2)' '
+	git ls-files ":(attr:label1)sub" >actual &&
+	test_write_lines "sub/fileSetLabel" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'reading from .gitattributes in a subdirectory (3)' '
+	git ls-files ":(attr:label1)sub/" >actual &&
+	test_write_lines "sub/fileSetLabel" >expect &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.41.0-318-g061c58647e


  parent reply	other threads:[~2023-07-07 22:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-07 22:04 [PATCH 0/2] Fix attr magic combined with pathspec prefix Junio C Hamano
2023-07-07 22:04 ` [PATCH 1/2] t6135: attr magic with path pattern Junio C Hamano
2023-07-07 22:04 ` Junio C Hamano [this message]
2023-07-07 23:45   ` [PATCH 2/2alt] dir: do not feed path suffix to pathspec match Junio C Hamano
2023-07-08  7:16     ` René Scharfe
2023-07-08 21:35       ` [PATCH 2alt/2] dir: match "attr" pathspec magic with correct paths Junio C Hamano
2023-07-09  5:35         ` René Scharfe
2023-07-09  9:28           ` Junio C Hamano
2023-07-08  7:16   ` [PATCH 2/2] dir: do not feed path suffix to pathspec match René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230707220457.3655121-3-gitster@pobox.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=mhughes@uw.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).