From: Victoria Dye <vdye@github.com>
To: Shuqi Liang <cheskaqiqi@gmail.com>, git@vger.kernel.org
Cc: gitster@pobox.com
Subject: Re: [PATCH v4 2/3] attr.c: read attributes in a sparse directory
Date: Thu, 20 Jul 2023 13:18:47 -0700 [thread overview]
Message-ID: <5e478d8b-9ef4-864b-41e4-e0a79877d278@github.com> (raw)
In-Reply-To: <20230718232916.31660-3-cheskaqiqi@gmail.com>
Shuqi Liang wrote:
> Before this patch, git check-attr was unable to read the attributes from
> a .gitattributes file within a sparse directory. The original comment
> was operating under the assumption that users are only interested in
> files or directories inside the cones. Therefore, in the original code,
> in the case of a cone-mode sparse-checkout, we didn't load the
> .gitattributes file.
>
> However, this behavior can lead to missing attributes for files inside
> sparse directories, causing inconsistencies in file handling.
>
> To resolve this, revise 'git check-attr' to allow attribute reading for
> files in sparse directories from the corresponding .gitattributes files:
>
> 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
> to check if a path falls within a sparse directory.
>
> 2.If path is inside a sparse directory, employ the value of
> index_name_pos_sparse() to find the sparse directory containing path and
> path relative to sparse directory. Proceed to read attributes from the
> tree OID of the sparse directory using read_attr_from_blob().
>
> 3.If path is not inside a sparse directory,ensure that attributes are
> fetched from the index blob with read_blob_data_from_index().
>
> Modify previous tests so such difference is not considered as an error.
I don't quite follow what you mean here "such difference". I see that you
changed the 'test_expect_failure' to 'test_expect_success', but that's
because the attributes inside a sparse directory are now being read rather
than ignored.
The rest of the commit message looks good to me, though.
>
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
> attr.c | 60 +++++++++++++++++-------
> t/t1092-sparse-checkout-compatibility.sh | 4 +-
> 2 files changed, 44 insertions(+), 20 deletions(-)
>
> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..7650f5481a 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,59 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
> static struct attr_stack *read_attr_from_index(struct index_state *istate,
> const char *path, unsigned flags)
> {
> + struct attr_stack *stack = NULL;
> char *buf;
> unsigned long size;
> + int pos = -1;
> + char normalize_path[PATH_MAX];
> + const char *relative_path;
>
> if (!istate)
> return NULL;
>
> /*
> - * The .gitattributes file only applies to files within its
> - * parent directory. In the case of cone-mode sparse-checkout,
> - * the .gitattributes file is sparse if and only if all paths
> - * within that directory are also sparse. Thus, don't load the
> - * .gitattributes file since it will not matter.
> - *
> - * In the case of a sparse index, it is critical that we don't go
> - * looking for a .gitattributes file, as doing so would cause the
> - * index to expand.
> + * When handling sparse-checkouts, .gitattributes files
> + * may reside within a sparse directory. We distinguish
> + * whether a path exists directly in the index or not by
> + * evaluating if 'pos' is negative.
> + * If 'pos' is negative, the path is not directly present
> + * in the index and is likely within a sparse directory.
> + * For paths not in the index, The absolute value of 'pos'
> + * minus 1 gives us the position where the path would be
> + * inserted in lexicographic order within the index.
> + * We then subtract another 1 from this value
> + * (pos = -pos - 2) to find the position of the last
> + * index entry which is lexicographically smaller than
> + * the path. This would be the sparse directory containing
> + * the path. By identifying the sparse directory containing
> + * the path, we can correctly read the attributes specified
> + * in the .gitattributes file from the tree object of the
> + * sparse directory.
> */
> - if (!path_in_cone_mode_sparse_checkout(path, istate))
> - return NULL;
> + if (!path_in_cone_mode_sparse_checkout(path, istate)) {
> + pos = index_name_pos_sparse(istate, path, strlen(path));
>
> - buf = read_blob_data_from_index(istate, path, &size);
> - if (!buf)
> - return NULL;
> - if (size >= ATTR_MAX_FILE_SIZE) {
> - warning(_("ignoring overly large gitattributes blob '%s'"), path);
> - return NULL;
> + if (pos < 0)
> + pos = -pos - 2;
> }
What 'pos' represents after this block is somewhat confusing/possibly
misleading. Consider the possible cases for 'path':
1. 'path' is inside the sparse-checkout cone.
2. 'path' is not inside the sparse-checkout cone, but it is in the index
(i.e., the sparse directory has been expanded for some reason).
3. 'path' is inside a sparse directory.
In case #1, 'pos' will be '-1'. We never enter the
'!path_in_cone_mode_sparse_checkout()' if-statement, so the value is never
updated.
In case #2, 'pos' will be positive, since it exists in the index.
In case #3, the return value of 'index_name_pos_sparse()' will be negative,
then assigned the value of '-pos - 2' to (in many cases) create a positive
value pointing to the entry before the insertion point of 'path'.
Based on your explanation in the code comment above, I would assume that, if
'pos' was >= 0 coming out of this block, it represented the index position
of the potential sparse directory containing 'path'. But that's not true in
case #2.
To make the purpose of these variables clearer, instead of changing 'pos'
in-place, you could create a variable like 'sparse_dir_pos'. Like 'pos' is
now, it'd be initialized to '-1', but instead of unconditionally assigning
it the output of 'index_name_pos_sparse()', you'd only assign it if the
output of that function is negative:
if (!path_in_cone_mode_sparse_checkout(path, istate)) {
int pos = index_name_pos_sparse(istate, path, strlen(path));
if (pos < 0)
sparse_dir_pos = -pos - 2;
}
Then, you'd use 'sparse_dir_pos' in the condition below, and it'd always be
-1 when 'path' is definitely not contained in a sparse directory.
>
> - return read_attr_from_buf(buf, path, flags);
> + if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&
nit: the check of '!path_in_cone_mode_sparse_checkout()' is redundant, since
'pos' will always be '-1' if 'path_in_cone_mode_sparse_checkout()' is true.
> + S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
> + !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) &&
> + !normalize_path_copy(normalize_path, path)) {
The normalization should come before the 'strncmp' so that you're comparing
'normalize_path' to the index entry.
That said, I know I asked about path normalization earlier [1], but did you
confirm that 'path' _isn't_ normalized? Because if it's normalized by all of
the potential callers of this function, the check here would be unnecessary.
[1] https://lore.kernel.org/git/e4a77d0f-cf1d-ef76-fe26-ad5e58372a02@github.com/
> + relative_path = normalize_path + ce_namelen(istate->cache[pos]);
'relative_path' should be declared here, since it's not used outside this
block.
> + stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> + } else {
> + buf = read_blob_data_from_index(istate, path, &size);
> + if (!buf)
> + return NULL;
> + if (size >= ATTR_MAX_FILE_SIZE) {
> + warning(_("ignoring overly large gitattributes blob '%s'"), path);
> + return NULL;
> + }
> + stack = read_attr_from_buf(buf, path, flags);
> + }
> + return stack;
> }
>
> static struct attr_stack *read_attr(struct index_state *istate,
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 90633f383a..3f32c1f972 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -2271,7 +2271,7 @@ test_expect_success 'check-attr with pathspec inside sparse definition' '
> test_all_match git check-attr -a --cached -- deep/a
> '
>
> -test_expect_failure 'check-attr with pathspec outside sparse definition' '
> +test_expect_success 'check-attr with pathspec outside sparse definition' '
Re: my suggested change to the test in patch 1 [2], when I applied _this_
patch, the test still failed for the 'sparse-index' case. It doesn't seem to
be a problem with your patch, but rather a bug in 'diff' that can be
reproduced with this test (using the infrastructure in t1092):
test_expect_failure 'diff --cached shows differences in sparse directory' '
init_repos &&
test_all_match git reset --soft update-folder1 &&
test_all_match git diff --cached -- folder1/a
'
It's not immediately obvious to me what the problem is, but my guess is it's
some mis-handling of sparse directories in the internal diff machinery.
Given the likely complexity of the issue, I'd be content with you leaving
the 'diff --check' test as 'test_expect_failure' with a note about the bug
in 'diff' to fix later. Or, if you do want to investigate & fix it now, I
wouldn't be opposed to that either. :)
[2] https://lore.kernel.org/git/c3ebe3b4-88b9-8ca2-2ee3-39a3e0d82201@github.com/
> init_repos &&
>
> echo "a -crlf myAttr" >>.gitattributes &&
> @@ -2288,7 +2288,7 @@ test_expect_failure 'check-attr with pathspec outside sparse definition' '
> test_all_match git check-attr -a --cached -- folder1/a
> '
>
> -test_expect_failure 'diff --check with pathspec outside sparse definition' '
> +test_expect_success 'diff --check with pathspec outside sparse definition' '
> init_repos &&
>
> write_script edit-contents <<-\EOF &&
next prev parent reply other threads:[~2023-07-20 20:18 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-01 6:48 [PATCH v1 0/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-01 6:48 ` [PATCH v1 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-07-03 17:59 ` Victoria Dye
2023-07-01 6:48 ` [PATCH v1 2/3] t1092: add tests for `git check-attr` Shuqi Liang
2023-07-03 18:11 ` Victoria Dye
2023-07-01 6:48 ` [PATCH v1 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-03 18:21 ` Victoria Dye
2023-07-07 15:18 ` [PATCH v2 0/3] " Shuqi Liang
2023-07-07 15:18 ` [PATCH v2 1/3] Enable gitattributes read from sparse directories Shuqi Liang
2023-07-07 23:15 ` Junio C Hamano
2023-07-07 15:18 ` [PATCH v2 2/3] t1092: add tests for `git check-attr` Shuqi Liang
2023-07-07 15:18 ` [PATCH v2 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-11 13:30 ` [PATCH v3 0/3] " Shuqi Liang
2023-07-11 13:30 ` [PATCH v3 1/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-07-11 21:15 ` Junio C Hamano
2023-07-11 22:08 ` Junio C Hamano
2023-07-13 20:22 ` Shuqi Liang
2023-07-13 20:13 ` Shuqi Liang
2023-07-11 21:24 ` Victoria Dye
2023-07-11 13:30 ` [PATCH v3 2/3] t1092: add tests for `git check-attr` Shuqi Liang
2023-07-11 18:52 ` Junio C Hamano
2023-07-11 20:47 ` Victoria Dye
2023-07-11 13:30 ` [PATCH v3 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-07-11 20:07 ` Junio C Hamano
2023-07-11 16:56 ` [PATCH v3 0/3] " Junio C Hamano
2023-07-18 23:29 ` [PATCH v4 " Shuqi Liang
2023-07-18 23:29 ` [PATCH v4 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
2023-07-20 18:43 ` Victoria Dye
2023-07-18 23:29 ` [PATCH v4 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-07-20 20:18 ` Victoria Dye [this message]
2023-08-03 16:22 ` Glen Choo
2023-08-15 8:05 ` Shuqi Liang
2023-07-18 23:29 ` [PATCH v4 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-08-11 14:22 ` [PATCH v5 0/3] " Shuqi Liang
2023-08-11 14:22 ` [PATCH v5 1/3] t1092: add tests for 'git check-attr' Shuqi Liang
2023-08-11 14:22 ` [PATCH v5 2/3] attr.c: read attributes in a sparse directory Shuqi Liang
2023-08-11 14:22 ` [PATCH v5 3/3] check-attr: integrate with sparse-index Shuqi Liang
2023-08-14 16:24 ` [PATCH v5 0/3] " Victoria Dye
2023-08-14 17:10 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5e478d8b-9ef4-864b-41e4-e0a79877d278@github.com \
--to=vdye@github.com \
--cc=cheskaqiqi@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.