git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: derrickstolee@github.com, shaoxuan.yuan02@gmail.com,
	newren@gmail.com, gitster@pobox.com,
	Victoria Dye <vdye@github.com>
Subject: [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs
Date: Sun, 07 Aug 2022 02:57:05 +0000	[thread overview]
Message-ID: <pull.1312.v2.git.1659841030.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1312.git.1659645967.gitgitgadget@gmail.com>

While working on sparse index integration for 'git rm' [1], Shaoxuan found
that removed sparse directories, when reset, would no longer be sparse. This
was due to how 'unpack_trees()' determined whether a traversed directory was
a sparse directory or not; it would only unpack an entry as a sparse
directory if it existed in the index. However, if the sparse directory was
removed, it would be treated like a non-sparse directory and its contents
would be individually unpacked.

To avoid this unnecessary traversal and keep the results of 'reset' as
sparse as possible, the decision logic for whether a directory is sparse is
changed to:

 * If the directory is a sparse directory in the index, unpack it.
 * If not, is the directory inside the sparse cone? If so, do not unpack it.
 * If the directory is outside the sparse cone, does it have any child
   entries in the index? If so, do not unpack it.
 * Otherwise, unpack the entry as a sparse directory.

In the process of updating 'reset', a separate issue was found in 'checkout'
where collapsed sparse directories did not have modified contents reported
file-by-file. A similar bug was found with 'status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01), and
'checkout' was corrected the same way (setting the diff flag 'recursive' to
1).


Changes since V1
================

 * Reverted the removal of 'index_entry_exists()' to avoid breaking other
   in-flight series.
 * Renamed 'is_missing_sparse_dir()' to 'is_new_sparse_dir()'; revised
   comments and commit messages to clarify what that function is doing and
   why.
 * Handled "unexpected" inputs to 'is_new_sparse_dir()' more gently,
   returning 0 if 'p' is not a directory or the directory already exists in
   the index (rather than exiting with 'BUG()'). This is intended to make
   'is_new_sparse_dir()' less reliant on information about the index
   established by 'unpack_callback()' & 'unpack_single_entry()', resulting
   in easier-to-read and more reusable code.

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/20220803045118.1243087-1-shaoxuan.yuan02@gmail.com/

Victoria Dye (4):
  checkout: fix nested sparse directory diff in sparse index
  oneway_diff: handle removed sparse directories
  cache.h: create 'index_name_pos_sparse()'
  unpack-trees: unpack new trees as sparse directories

 builtin/checkout.c                       |   1 +
 cache.h                                  |   9 ++
 diff-lib.c                               |   5 ++
 read-cache.c                             |   5 ++
 t/t1092-sparse-checkout-compatibility.sh |  25 ++++++
 unpack-trees.c                           | 106 ++++++++++++++++++++---
 6 files changed, 141 insertions(+), 10 deletions(-)


base-commit: 4af7188bc97f70277d0f10d56d5373022b1fa385
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1312%2Fvdye%2Freset%2Fhandle-missing-dirs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1312/vdye/reset/handle-missing-dirs-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1312

Range-diff vs v1:

 1:  255318f4dc6 = 1:  255318f4dc6 checkout: fix nested sparse directory diff in sparse index
 2:  55c77ba4b29 = 2:  55c77ba4b29 oneway_diff: handle removed sparse directories
 3:  f7978d223fe ! 3:  d0bdec63286 cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
     +    cache.h: create 'index_name_pos_sparse()'
      
     -    Replace 'index_entry_exists()' (which returns a binary '1' or '0' depending
     -    on whether a specified entry exists in the index) with
     -    'index_name_pos_sparse()' (which behaves the same as 'index_name_pos()',
     +    Add 'index_name_pos_sparse()', which behaves the same as 'index_name_pos()',
          except that it does not expand a sparse index to search for an entry inside
     -    a sparse directory).
     +    a sparse directory.
      
     -    'index_entry_exists()' was original implemented in 20ec2d034c (reset: make
     -    sparse-aware (except --mixed), 2021-11-29) to allow callers to search for an
     -    index entry without expanding a sparse index. That particular case only
     -    required knowing whether the requested entry existed. This patch expands the
     -    amount of information returned by indicating both 1) whether the entry
     -    exists, and 2) its position (or potential position) in the index.
     +    'index_entry_exists()' was originally implemented in 20ec2d034c (reset: make
     +    sparse-aware (except --mixed), 2021-11-29) as an alternative to
     +    'index_name_pos()' to allow callers to search for an index entry without
     +    expanding a sparse index. However, that particular use case only required
     +    knowing whether the requested entry existed, so 'index_entry_exists()' does
     +    not return the index positioning information provided by 'index_name_pos()'.
      
     -    Signed-off-by: Victoria Dye <vdye@github.com>
     +    This patch implements 'index_name_pos_sparse()' to accommodate callers that
     +    need the positioning information of 'index_name_pos()', but do not want to
     +    expand the index.
      
     - ## cache-tree.c ##
     -@@ cache-tree.c: static void prime_cache_tree_rec(struct repository *r,
     - 			 * as normal.
     - 			 */
     - 			if (r->index->sparse_index &&
     --			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
     -+			    index_name_pos_sparse(r->index, tree_path->buf, tree_path->len) >= 0)
     - 				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
     - 			else
     - 				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
     +    Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## cache.h ##
      @@ cache.h: struct cache_entry *index_file_exists(struct index_state *istate, const char *na
     +  */
       int index_name_pos(struct index_state *, const char *name, int namelen);
       
     - /*
     -- * Determines whether an entry with the given name exists within the
     -- * given index. The return value is 1 if an exact match is found, otherwise
     -- * it is 0. Note that, unlike index_name_pos, this function does not expand
     -- * the index if it is sparse. If an item exists within the full index but it
     -- * is contained within a sparse directory (and not in the sparse index), 0 is
     -- * returned.
     -- */
     --int index_entry_exists(struct index_state *, const char *name, int namelen);
     ++/*
      + * Like index_name_pos, returns the position of an entry of the given name in
      + * the index if one exists, otherwise returns a negative value where the negated
      + * value minus 1 is the position where the index entry would be inserted. Unlike
     @@ cache.h: struct cache_entry *index_file_exists(struct index_state *istate, const
      + * inside a sparse directory.
      + */
      +int index_name_pos_sparse(struct index_state *, const char *name, int namelen);
     - 
     ++
       /*
     -  * Some functions return the negative complement of an insert position when a
     +  * Determines whether an entry with the given name exists within the
     +  * given index. The return value is 1 if an exact match is found, otherwise
      
       ## read-cache.c ##
      @@ read-cache.c: int index_name_pos(struct index_state *istate, const char *name, int namelen)
       	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
       }
       
     --int index_entry_exists(struct index_state *istate, const char *name, int namelen)
      +int index_name_pos_sparse(struct index_state *istate, const char *name, int namelen)
     - {
     --	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
     ++{
      +	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE);
     - }
     - 
     - int remove_index_entry_at(struct index_state *istate, int pos)
     ++}
     ++
     + int index_entry_exists(struct index_state *istate, const char *name, int namelen)
     + {
     + 	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
 4:  016971a6711 ! 4:  97ca668102c unpack-trees: handle missing sparse directories
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    unpack-trees: handle missing sparse directories
     +    unpack-trees: unpack new trees as sparse directories
      
     -    If a sparse directory does not exist in the index, unpack it at the
     -    directory level rather than recursing into it an unpacking its contents
     -    file-by-file. This helps keep the sparse index as collapsed as possible in
     -    cases such as 'git reset --hard' restoring a sparse directory.
     +    If 'unpack_single_entry()' is unpacking a new directory tree (that is, one
     +    not already present in the index) into a sparse index, unpack the tree as a
     +    sparse directory rather than traversing its contents and unpacking each file
     +    individually. This helps keep the sparse index as collapsed as possible in
     +    cases such as 'git reset --hard' restoring a outside-of-cone directory
     +    removed with 'git rm -r --sparse'.
      
     -    A directory is determined to be truly non-existent in the index (rather than
     -    the parent of existing index entries), if 1) its path is outside the sparse
     -    cone and 2) there are no children of the directory in the index. This check
     -    is performed by 'missing_dir_is_sparse()' in 'unpack_single_entry()'. If the
     -    directory is a missing sparse dir, 'unpack_single_entry()'  will proceed
     -    with unpacking it. This determination is also propagated back up to
     -    'unpack_callback()' via 'is_missing_sparse_dir' to prevent further tree
     -    traversal into the unpacked directory.
     +    Without this patch, 'unpack_single_entry()' will only unpack a directory
     +    into the index as a sparse directory (rather than traversing into it and
     +    unpacking its files one-by-one) if an entry with the same name already
     +    exists in the index. This patch allows sparse directory unpacking without a
     +    matching index entry when the following conditions are met:
     +
     +    1. the directory's path is outside the sparse cone, and
     +    2. there are no children of the directory in the index
     +
     +    If a directory meets these requirements (as determined by
     +    'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory
     +    index entry and propagates the decision back up to 'unpack_callback()' to
     +    prevent unnecessary tree traversal into the unpacked directory.
      
          Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
     @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
       }
       
      +/*
     -+ * Determine whether the path specified corresponds to a sparse directory
     -+ * completely missing from the index. This function is assumed to only be
     -+ * called when the named path isn't already in the index.
     ++ * Determine whether the path specified by 'p' should be unpacked as a new
     ++ * sparse directory in a sparse index. A new sparse directory 'A/':
     ++ * - must be outside the sparse cone.
     ++ * - must not already be in the index (i.e., no index entry with name 'A/'
     ++ *   exists).
     ++ * - must not have any child entries in the index (i.e., no index entry
     ++ *   'A/<something>' exists).
     ++ * If 'p' meets the above requirements, return 1; otherwise, return 0.
      + */
     -+static int missing_dir_is_sparse(const struct traverse_info *info,
     -+				 const struct name_entry *p)
     ++static int entry_is_new_sparse_dir(const struct traverse_info *info,
     ++				   const struct name_entry *p)
      +{
      +	int res, pos;
      +	struct strbuf dirpath = STRBUF_INIT;
      +	struct unpack_trees_options *o = info->data;
      +
     ++	if (!S_ISDIR(p->mode))
     ++		return 0;
     ++
      +	/*
     -+	 * First, check whether the path is in the sparse cone. If it is,
     -+	 * then this directory shouldn't be sparse.
     ++	 * If the path is inside the sparse cone, it can't be a sparse directory.
      +	 */
      +	strbuf_add(&dirpath, info->traverse_path, info->pathlen);
      +	strbuf_add(&dirpath, p->path, p->pathlen);
     @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
      +		goto cleanup;
      +	}
      +
     -+	/*
     -+	 * Given that the directory is not inside the sparse cone, it could be
     -+	 * (partially) expanded in the index. If child entries exist, the path
     -+	 * is not a missing sparse directory.
     -+	 */
      +	pos = index_name_pos_sparse(o->src_index, dirpath.buf, dirpath.len);
     -+	if (pos >= 0)
     -+		BUG("cache entry '%s%s' shouldn't exist in the index",
     -+		    info->traverse_path, p->path);
     ++	if (pos >= 0) {
     ++		/* Path is already in the index, not a new sparse dir */
     ++		res = 0;
     ++		goto cleanup;
     ++	}
      +
     ++	/* Where would this sparse dir be inserted into the index? */
      +	pos = -pos - 1;
      +	if (pos >= o->src_index->cache_nr) {
     ++		/*
     ++		 * Sparse dir would be inserted at the end of the index, so we
     ++		 * know it has no child entries.
     ++		 */
      +		res = 1;
      +		goto cleanup;
      +	}
      +
     ++	/*
     ++	 * If the dir has child entries in the index, the first would be at the
     ++	 * position the sparse directory would be inserted. If the entry at this
     ++	 * position is inside the dir, not a new sparse dir.
     ++	 */
      +	res = strncmp(o->src_index->cache[pos]->name, dirpath.buf, dirpath.len);
      +
      +cleanup:
     @@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
       			       const struct name_entry *names,
      -			       const struct traverse_info *info)
      +			       const struct traverse_info *info,
     -+			       int *is_missing_sparse_dir)
     ++			       int *is_new_sparse_dir)
       {
       	int i;
       	struct unpack_trees_options *o = info->data;
     @@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
       
      -	if (mask == dirmask && !src[0])
      -		return 0;
     -+	*is_missing_sparse_dir = 0;
     ++	*is_new_sparse_dir = 0;
      +	if (mask == dirmask && !src[0]) {
      +		/*
     -+		 * If the directory is completely missing from the index but
     -+		 * would otherwise be a sparse directory, we should unpack it.
     -+		 * If not, we'll return and continue recursively traversing the
     -+		 * tree.
     ++		 * If we're not in a sparse index, we can't unpack a directory
     ++		 * without recursing into it, so we return.
      +		 */
      +		if (!o->src_index->sparse_index)
      +			return 0;
     @@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
      +		while (!p->mode)
      +			p++;
      +
     -+		*is_missing_sparse_dir = missing_dir_is_sparse(info, p);
     -+		if (!*is_missing_sparse_dir)
     ++		/*
     ++		 * If the directory is completely missing from the index but
     ++		 * would otherwise be a sparse directory, we should unpack it.
     ++		 * If not, we'll return and continue recursively traversing the
     ++		 * tree.
     ++		 */
     ++		*is_new_sparse_dir = entry_is_new_sparse_dir(info, p);
     ++		if (!*is_new_sparse_dir)
      +			return 0;
      +	}
       
     @@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
      -	if (mask == dirmask && src[0] &&
      -	    S_ISSPARSEDIR(src[0]->ce_mode))
      +	if (mask == dirmask &&
     -+	    (*is_missing_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
     ++	    (*is_new_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
       		conflicts = 0;
       
       	/*
     @@ unpack-trees.c: static int unpack_sparse_callback(int n, unsigned long mask, uns
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
       	struct unpack_trees_options *o = info->data;
      -	int ret;
     -+	int ret, is_missing_sparse_dir;
     ++	int ret, is_new_sparse_dir;
       
       	assert(o->merge);
       
     @@ unpack-trees.c: static int unpack_sparse_callback(int n, unsigned long mask, uns
       	 * 'dirmask' accordingly.
       	 */
      -	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info);
     -+	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_missing_sparse_dir);
     ++	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_new_sparse_dir);
       
       	if (src[0])
       		discard_cache_entry(src[0]);
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
       	struct unpack_trees_options *o = info->data;
       	const struct name_entry *p = names;
     -+	int is_missing_sparse_dir;
     ++	int is_new_sparse_dir;
       
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       	}
       
      -	if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
     -+	if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_missing_sparse_dir))
     ++	if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_new_sparse_dir))
       		return -1;
       
       	if (o->merge && src[0]) {
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       		}
       
       		if (!is_sparse_directory_entry(src[0], names, info) &&
     -+		    !is_missing_sparse_dir &&
     ++		    !is_new_sparse_dir &&
       		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
       						    names, info) < 0) {
       			return -1;

-- 
gitgitgadget

  parent reply	other threads:[~2022-08-07  2:57 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-04 20:46 [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-04 20:46 ` [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-05 17:59   ` Derrick Stolee
2022-08-04 20:46 ` [PATCH 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-04 20:46 ` [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-04 22:16   ` Junio C Hamano
2022-08-06  0:09   ` Junio C Hamano
2022-08-04 20:46 ` [PATCH 4/4] unpack-trees: handle missing sparse directories Victoria Dye via GitGitGadget
2022-08-04 23:23   ` Junio C Hamano
2022-08-05 16:36     ` Victoria Dye
2022-08-05 19:24       ` Junio C Hamano
2022-08-07  2:57 ` Victoria Dye via GitGitGadget [this message]
2022-08-07  2:57   ` [PATCH v2 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
2022-08-08 21:17     ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Junio C Hamano
2022-08-09 13:20       ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1312.v2.git.1659841030.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).