git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Victoria Dye <vdye@github.com>
To: Shuqi Liang <cheskaqiqi@gmail.com>, git@vger.kernel.org
Cc: gitster@pobox.com
Subject: Re: [PATCH v4 2/3] attr.c: read attributes in a sparse directory
Date: Thu, 20 Jul 2023 13:18:47 -0700	[thread overview]
Message-ID: <5e478d8b-9ef4-864b-41e4-e0a79877d278@github.com> (raw)
In-Reply-To: <20230718232916.31660-3-cheskaqiqi@gmail.com>

Shuqi Liang wrote:
> Before this patch, git check-attr was unable to read the attributes from
> a .gitattributes file within a sparse directory. The original comment
> was operating under the assumption that users are only interested in
> files or directories inside the cones. Therefore, in the original code,
> in the case of a cone-mode sparse-checkout, we didn't load the
> .gitattributes file.
> 
> However, this behavior can lead to missing attributes for files inside
> sparse directories, causing inconsistencies in file handling.
> 
> To resolve this, revise 'git check-attr' to allow attribute reading for
> files in sparse directories from the corresponding .gitattributes files:
> 
> 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
> to check if a path falls within a sparse directory.
> 
> 2.If path is inside a sparse directory, employ the value of
> index_name_pos_sparse() to find the sparse directory containing path and
> path relative to sparse directory. Proceed to read attributes from the
> tree OID of the sparse directory using read_attr_from_blob().
> 
> 3.If path is not inside a sparse directory,ensure that attributes are
> fetched from the index blob with read_blob_data_from_index().
> 
> Modify previous tests so such difference is not considered as an error.

I don't quite follow what you mean here "such difference". I see that you
changed the 'test_expect_failure' to 'test_expect_success', but that's
because the attributes inside a sparse directory are now being read rather
than ignored.

The rest of the commit message looks good to me, though.

> 
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  attr.c                                   | 60 +++++++++++++++++-------
>  t/t1092-sparse-checkout-compatibility.sh |  4 +-
>  2 files changed, 44 insertions(+), 20 deletions(-)
> 
> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..7650f5481a 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,59 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
>  static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  					       const char *path, unsigned flags)
>  {
> +	struct attr_stack *stack = NULL;
>  	char *buf;
>  	unsigned long size;
> +	int pos = -1;
> +	char normalize_path[PATH_MAX];
> +	const char *relative_path;
>  
>  	if (!istate)
>  		return NULL;
>  
>  	/*
> -	 * The .gitattributes file only applies to files within its
> -	 * parent directory. In the case of cone-mode sparse-checkout,
> -	 * the .gitattributes file is sparse if and only if all paths
> -	 * within that directory are also sparse. Thus, don't load the
> -	 * .gitattributes file since it will not matter.
> -	 *
> -	 * In the case of a sparse index, it is critical that we don't go
> -	 * looking for a .gitattributes file, as doing so would cause the
> -	 * index to expand.
> +	 * When handling sparse-checkouts, .gitattributes files
> +	 * may reside within a sparse directory. We distinguish
> +	 * whether a path exists directly in the index or not by
> +	 * evaluating if 'pos' is negative.
> +	 * If 'pos' is negative, the path is not directly present
> +	 * in the index and is likely within a sparse directory.
> +	 * For paths not in the index, The absolute value of 'pos'
> +	 * minus 1 gives us the position where the path would be
> +	 * inserted in lexicographic order within the index.
> +	 * We then subtract another 1 from this value
> +	 * (pos = -pos - 2) to find the position of the last
> +	 * index entry which is lexicographically smaller than
> +	 * the path. This would be the sparse directory containing
> +	 * the path. By identifying the sparse directory containing
> +	 * the path, we can correctly read the attributes specified
> +	 * in the .gitattributes file from the tree object of the
> +	 * sparse directory.
>  	 */
> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
> -		return NULL;
> +	if (!path_in_cone_mode_sparse_checkout(path, istate)) {
> +		pos = index_name_pos_sparse(istate, path, strlen(path));
>  
> -	buf = read_blob_data_from_index(istate, path, &size);
> -	if (!buf)
> -		return NULL;
> -	if (size >= ATTR_MAX_FILE_SIZE) {
> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
> -		return NULL;
> +		if (pos < 0)
> +			pos = -pos - 2;
>  	}

What 'pos' represents after this block is somewhat confusing/possibly
misleading. Consider the possible cases for 'path':

1. 'path' is inside the sparse-checkout cone.
2. 'path' is not inside the sparse-checkout cone, but it is in the index
   (i.e., the sparse directory has been expanded for some reason).
3. 'path' is inside a sparse directory.

In case #1, 'pos' will be '-1'. We never enter the
'!path_in_cone_mode_sparse_checkout()' if-statement, so the value is never
updated.

In case #2, 'pos' will be positive, since it exists in the index.

In case #3, the return value of 'index_name_pos_sparse()' will be negative,
then assigned the value of '-pos - 2' to (in many cases) create a positive
value pointing to the entry before the insertion point of 'path'.

Based on your explanation in the code comment above, I would assume that, if
'pos' was >= 0 coming out of this block, it represented the index position
of the potential sparse directory containing 'path'. But that's not true in
case #2.

To make the purpose of these variables clearer, instead of changing 'pos'
in-place, you could create a variable like 'sparse_dir_pos'. Like 'pos' is
now, it'd be initialized to '-1', but instead of unconditionally assigning
it the output of 'index_name_pos_sparse()', you'd only assign it if the
output of that function is negative:

	if (!path_in_cone_mode_sparse_checkout(path, istate)) {
		int pos = index_name_pos_sparse(istate, path, strlen(path));
		if (pos < 0)
			sparse_dir_pos = -pos - 2;
	}

Then, you'd use 'sparse_dir_pos' in the condition below, and it'd always be
-1 when 'path' is definitely not contained in a sparse directory.

>  
> -	return read_attr_from_buf(buf, path, flags);
> +	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&

nit: the check of '!path_in_cone_mode_sparse_checkout()' is redundant, since
'pos' will always be '-1' if 'path_in_cone_mode_sparse_checkout()' is true.

> +	    S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
> +	    !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) &&
> +	    !normalize_path_copy(normalize_path, path)) {

The normalization should come before the 'strncmp' so that you're comparing
'normalize_path' to the index entry.  

That said, I know I asked about path normalization earlier [1], but did you
confirm that 'path' _isn't_ normalized? Because if it's normalized by all of
the potential callers of this function, the check here would be unnecessary.

[1] https://lore.kernel.org/git/e4a77d0f-cf1d-ef76-fe26-ad5e58372a02@github.com/

> +		relative_path = normalize_path + ce_namelen(istate->cache[pos]);

'relative_path' should be declared here, since it's not used outside this
block. 

> +		stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> +	} else {
> +		buf = read_blob_data_from_index(istate, path, &size);
> +		if (!buf)
> +			return NULL;
> +		if (size >= ATTR_MAX_FILE_SIZE) {
> +			warning(_("ignoring overly large gitattributes blob '%s'"), path);
> +			return NULL;
> +		}
> +		stack = read_attr_from_buf(buf, path, flags);
> +	}
> +	return stack;
>  }
>  
>  static struct attr_stack *read_attr(struct index_state *istate,
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 90633f383a..3f32c1f972 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -2271,7 +2271,7 @@ test_expect_success 'check-attr with pathspec inside sparse definition' '
>  	test_all_match git check-attr -a --cached -- deep/a
>  '
>  
> -test_expect_failure 'check-attr with pathspec outside sparse definition' '
> +test_expect_success 'check-attr with pathspec outside sparse definition' '

Re: my suggested change to the test in patch 1 [2], when I applied _this_
patch, the test still failed for the 'sparse-index' case. It doesn't seem to
be a problem with your patch, but rather a bug in 'diff' that can be
reproduced with this test (using the infrastructure in t1092):

test_expect_failure 'diff --cached shows differences in sparse directory' '
	init_repos &&

	test_all_match git reset --soft update-folder1 &&
	test_all_match git diff --cached -- folder1/a
'

It's not immediately obvious to me what the problem is, but my guess is it's
some mis-handling of sparse directories in the internal diff machinery.
Given the likely complexity of the issue, I'd be content with you leaving
the 'diff --check' test as 'test_expect_failure' with a note about the bug
in 'diff' to fix later. Or, if you do want to investigate & fix it now, I
wouldn't be opposed to that either. :) 

[2] https://lore.kernel.org/git/c3ebe3b4-88b9-8ca2-2ee3-39a3e0d82201@github.com/

>  	init_repos &&
>  
>  	echo "a -crlf myAttr" >>.gitattributes &&
> @@ -2288,7 +2288,7 @@ test_expect_failure 'check-attr with pathspec outside sparse definition' '
>  	test_all_match git check-attr  -a --cached -- folder1/a
>  '
>  
> -test_expect_failure 'diff --check with pathspec outside sparse definition' '
> +test_expect_success 'diff --check with pathspec outside sparse definition' '
>  	init_repos &&
>  
>  	write_script edit-contents <<-\EOF &&


  reply	other threads:[~2023-07-20 20:18 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-01  6:48 [PATCH v1 0/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-01  6:48 ` [PATCH v1 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-07-03 17:59   ` Victoria Dye
2023-07-01  6:48 ` [PATCH v1 2/3] t1092: add tests for `git check-attr` Shuqi Liang
2023-07-03 18:11   ` Victoria Dye
2023-07-01  6:48 ` [PATCH v1 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-03 18:21   ` Victoria Dye
2023-07-07 15:18 ` [PATCH v2 0/3] " Shuqi Liang
2023-07-07 15:18   ` [PATCH v2 1/3] Enable gitattributes read from sparse directories Shuqi Liang
2023-07-07 23:15     ` Junio C Hamano
2023-07-07 15:18   ` [PATCH v2 2/3] t1092: add tests for `git check-attr` Shuqi Liang
2023-07-07 15:18   ` [PATCH v2 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-11 13:30   ` [PATCH v3 0/3] " Shuqi Liang
2023-07-11 13:30     ` [PATCH v3 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-07-11 21:15       ` Junio C Hamano
2023-07-11 22:08         ` Junio C Hamano
2023-07-13 20:22           ` Shuqi Liang
2023-07-13 20:13         ` Shuqi Liang
2023-07-11 21:24       ` Victoria Dye
2023-07-11 13:30     ` [PATCH v3 2/3] t1092: add tests for `git check-attr` Shuqi Liang
2023-07-11 18:52       ` Junio C Hamano
2023-07-11 20:47       ` Victoria Dye
2023-07-11 13:30     ` [PATCH v3 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-11 20:07       ` Junio C Hamano
2023-07-11 16:56     ` [PATCH v3 0/3] " Junio C Hamano
2023-07-18 23:29     ` [PATCH v4 " Shuqi Liang
2023-07-18 23:29       ` [PATCH v4 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
2023-07-20 18:43         ` Victoria Dye
2023-07-18 23:29       ` [PATCH v4 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-07-20 20:18         ` Victoria Dye [this message]
2023-08-03 16:22           ` Glen Choo
2023-08-15  8:05             ` Shuqi Liang
2023-07-18 23:29       ` [PATCH v4 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-08-11 14:22       ` [PATCH v5 0/3] " Shuqi Liang
2023-08-11 14:22         ` [PATCH v5 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
2023-08-11 14:22         ` [PATCH v5 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-08-11 14:22         ` [PATCH v5 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-08-14 16:24         ` [PATCH v5 0/3] " Victoria Dye
2023-08-14 17:10           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5e478d8b-9ef4-864b-41e4-e0a79877d278@github.com \
    --to=vdye@github.com \
    --cc=cheskaqiqi@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).