git.vger.kernel.org archive mirror
From: Victoria Dye <vdye@github.com>
To: Shuqi Liang <cheskaqiqi@gmail.com>, git@vger.kernel.org
Cc: gitster@pobox.com
Subject: Re: [PATCH v3 1/3] attr.c: read attributes in a sparse directory
Date: Tue, 11 Jul 2023 14:24:37 -0700
Message-ID: <e4a77d0f-cf1d-ef76-fe26-ad5e58372a02@github.com>
In-Reply-To: <20230711133035.16916-2-cheskaqiqi@gmail.com>

Shuqi Liang wrote:
> 'git check-attr' cannot currently find attributes of a file within a
> sparse directory. This is due to .gitattributes files are irrelevant in
> sparse-checkout cone mode, as the file is considered sparse only if all
> paths within its parent directory are also sparse. 

If .gitattributes files are irrelevant in sparse-checkout cone mode, then
why are we changing the behavior? If you're challenging that assertion,
please state so clearly.

> In addition, searching for a .gitattributes file causes expansion of the
> sparse index, which is avoided to prevent potential performance degradation.

This isn't an unchangeable fact (as your implementation below shows).
Expanding the index is just the most straightforward approach, but the
performance cost of that is (AFAICT) a reason used to justify why we didn't
read sparse directory attributes in the past.

> 
> However, this behavior can lead to missing attributes for files inside
> sparse directories, causing inconsistencies in file handling.
> 
> To resolve this, revise 'git check-attr' to allow attribute reading for
> files in sparse directories from the corresponding .gitattributes files:
> 
> 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
> to check if a path falls within a sparse directory.
> 
> 2.If path is inside a sparse directory, employ the value of
> index_name_pos_sparse() to find the sparse directory containing path and
> path relative to sparse directory. Proceed to read attributes from the
> tree OID of the sparse directory using read_attr_from_blob().
> 
> 3.If path is not inside a sparse directory,ensure that attributes are
> fetched from the index blob with read_blob_data_from_index().

Makes sense to me.

> 
> Helped-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
> ---
>  attr.c | 47 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)
> 
> diff --git a/attr.c b/attr.c
> index 7d39ac4a29..be06747b0d 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -808,35 +808,44 @@ static struct attr_stack *read_attr_from_blob(struct index_state *istate,
>  static struct attr_stack *read_attr_from_index(struct index_state *istate,
>  					       const char *path, unsigned flags)
>  {
> +	struct attr_stack *stack = NULL;
>  	char *buf;
>  	unsigned long size;
> +	int pos;
>  
>  	if (!istate)
>  		return NULL;
>  
>  	/*
> -	 * The .gitattributes file only applies to files within its
> -	 * parent directory. In the case of cone-mode sparse-checkout,
> -	 * the .gitattributes file is sparse if and only if all paths
> -	 * within that directory are also sparse. Thus, don't load the
> -	 * .gitattributes file since it will not matter.
> -	 *
> -	 * In the case of a sparse index, it is critical that we don't go
> -	 * looking for a .gitattributes file, as doing so would cause the
> -	 * index to expand.
> +	 * If the pos value is negative, it means the path is not in the index. 
> +	 * However, the absolute value of pos minus 1 gives us the position where the path 
> +	 * would be inserted in lexicographic order. By subtracting another 1 from this 
> +	 * value (pos = -pos - 2), we find the position of the last index entry 
> +	 * which is lexicographically smaller than the provided path. This would be 
> +	 * the sparse directory containing the path.

This is a good explanation of what '-pos - 2' represents, but it doesn't
explain why we'd want that value. Could you add a bit of detail around
1) why we care whether 'pos' identifies a value that exists in the index or
not, and 2) why we're looking for the sparse directory containing the path?
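
For readers following along, here is a rough standalone sketch of the
'-pos - 2' arithmetic. It is not git's code: 'name_pos()' is a hypothetical
stand-in for 'index_name_pos_sparse()' that assumes entries are sorted with
plain strcmp(), and 'example_index' is made-up data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for index_name_pos_sparse(): binary-search a
 * sorted array of names. Returns the position on an exact match,
 * otherwise -(insertion position) - 1.
 */
static int name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	assert(names != NULL && name != NULL);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(names[mid], name);

		if (!cmp)
			return mid;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -lo - 1;
}

/*
 * Position of the last entry that sorts before 'name', or -1 if there
 * is no such entry or 'name' itself is present.
 */
static int preceding_entry(const char **names, int nr, const char *name)
{
	int pos = name_pos(names, nr, name);

	if (pos >= 0)
		return -1;	/* exact match: nothing to look up */
	return -pos - 2;	/* insertion position minus one */
}

/* Made-up index contents; names ending in '/' play the sparse directories. */
static const char *example_index[] = { "deep/a", "folder1/", "folder2/", "z" };
```

With this data, looking up "folder1/.gitattributes" yields the position of
"folder1/", which is the only entry that could be a sparse directory
containing that path. A path that sorts before every entry yields -1, which
is why the result has to be checked against 0 before indexing the array.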

>  	 */
> -	if (!path_in_cone_mode_sparse_checkout(path, istate))
> -		return NULL;
> +	pos = index_name_pos_sparse(istate, path, strlen(path));
> +	pos = - pos - 2;

nit: don't add the space between '-' and 'pos'. This should be:

	pos = -pos - 2;

>  
> -	buf = read_blob_data_from_index(istate, path, &size);
> -	if (!buf)
> -		return NULL;
> -	if (size >= ATTR_MAX_FILE_SIZE) {
> -		warning(_("ignoring overly large gitattributes blob '%s'"), path);
> -		return NULL;
> -	}
> +	if (!path_in_cone_mode_sparse_checkout(path, istate) && 0 <= pos) {

Typically, we try to put the less expensive operation first in a condition
like this (if the first part of the condition is 'false', the second part
won't be evaluated). 'path_in_cone_mode_sparse_checkout()' is more expensive
than a simple numerical check, so this should probably be:

	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate)) {

But on a more general note, why check 'path_in_cone_mode_sparse_checkout()'
at all? The goal is to determine whether 'path' is inside a sparse
directory, so first you search the index to find where that directory would
be, then - if 'path' isn't in the sparse-checkout cone - check whether the
index entry you found is a sparse directory. But sparse directories can't
exist within the sparse-checkout cone in the first place, so the
'path_in_cone_mode_sparse_checkout()' check is redundant.

Instead, 'path_in_cone_mode_sparse_checkout()' (and probably
'istate->sparse_index', since sparse directories can't exist if the index
isn't sparse) could be used to avoid calculating 'index_name_pos_sparse()'
in the first place; the index search operation is generally more expensive
than 'path_in_cone_mode_sparse_checkout()', especially when sparse-checkout
is disabled entirely.
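
A minimal sketch of that ordering, using invented names throughout
('struct demo_index' and the 'demo_*' helpers are hypothetical, and the cone
check is reduced to a single directory prefix, which is much simpler than
what git does). The 'searches' counter exists only to show when the costly
search is skipped.

```c
#include <stdbool.h>
#include <string.h>

struct demo_index {
	bool sparse_index;	/* stand-in for istate->sparse_index */
	const char *cone_dir;	/* the one in-cone directory, e.g. "deep/" */
	int searches;		/* how many times the costly search ran */
};

/* Simplified cone check: in cone if 'path' starts with 'cone_dir'. */
static bool demo_path_in_cone(const struct demo_index *idx, const char *path)
{
	return !strncmp(path, idx->cone_dir, strlen(idx->cone_dir));
}

/* Pretend index search: always reports "not found, would insert at 2". */
static int demo_index_search(struct demo_index *idx, const char *path)
{
	(void)path;
	idx->searches++;
	return -3;
}

/*
 * Run the cheap checks first; only search the index when a sparse
 * directory could actually contain 'path'.
 */
static bool demo_maybe_in_sparse_dir(struct demo_index *idx, const char *path)
{
	if (!idx->sparse_index || demo_path_in_cone(idx, path))
		return false;
	return -demo_index_search(idx, path) - 2 >= 0;
}
```

When the index is not sparse, or the path is inside the cone, the function
returns before 'demo_index_search()' is ever called.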

> +		if (!S_ISSPARSEDIR(istate->cache[pos]->ce_mode))
> +			return NULL;
>  
> -	return read_attr_from_buf(buf, path, flags);
> +		if (strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos])) == 0) {

All of these nested conditions could be simplified/collapsed into a single,
top-level condition:

	if (pos >= 0 && !path_in_cone_mode_sparse_checkout(path, istate) &&
	    S_ISSPARSEDIR(istate->cache[pos]->ce_mode) &&
	    !strncmp(istate->cache[pos]->name, path, ce_namelen(istate->cache[pos]))) {

IMO, this also more clearly reflects _why_ you'd want to enter this
condition and read from the index directly:

* If the path is not in the sparse-checkout cone
* AND the index entry preceding 'path' is a sparse directory
* AND the sparse directory is the prefix of 'path' (i.e., 'path' is in the
  directory) 
    -> Read from the sparse directory's tree

One other quick sanity check - for the sparse directory prefixing check to
work, 'path' needs to be a normalized path relative to the root of the repo.
Is that guaranteed to be the case here?

> +			const char *relative_path = path + ce_namelen(istate->cache[pos]);  

Here, you get the relative path within the sparse directory by skipping past
the sparse directory name in 'path'. If 'path' is normalized (see above),
this works. Nice!
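
To make the normalization concern concrete, here is a small standalone
sketch of the prefix check plus the pointer arithmetic. 'path_within_dir()'
is a hypothetical helper, not something in attr.c, and it assumes the
directory name ends in '/' the way sparse directory entries do.

```c
#include <stddef.h>
#include <string.h>

/*
 * If 'path' starts with the directory name 'dir' (which must end in
 * '/'), return the remainder of 'path' after that prefix; otherwise
 * return NULL.
 */
static const char *path_within_dir(const char *dir, const char *path)
{
	size_t len = strlen(dir);

	if (strncmp(dir, path, len))
		return NULL;
	return path + len;
}
```

Given "folder1/" and "folder1/a/.gitattributes" this returns
"a/.gitattributes". Given "./folder1/a/.gitattributes" it returns NULL even
though the file is inside the directory, which is the failure mode to rule
out if 'path' can arrive unnormalized.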

> +			stack = read_attr_from_blob(istate, &istate->cache[pos]->oid, relative_path, flags);
> +		}
> +	} else {
> +		buf = read_blob_data_from_index(istate, path, &size);
> +		if (!buf)
> +			return NULL;
> +		if (size >= ATTR_MAX_FILE_SIZE) {
> +			warning(_("ignoring overly large gitattributes blob '%s'"), path);
> +			return NULL;
> +		}
> +		stack = read_attr_from_buf(buf, path, flags);
> +	}
> +	return stack;
>  }
>  
>  static struct attr_stack *read_attr(struct index_state *istate,


