From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: derrickstolee@github.com, shaoxuan.yuan02@gmail.com,
newren@gmail.com, gitster@pobox.com,
Victoria Dye <vdye@github.com>
Subject: [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs
Date: Sun, 07 Aug 2022 02:57:05 +0000
Message-ID: <pull.1312.v2.git.1659841030.gitgitgadget@gmail.com>
In-Reply-To: <pull.1312.git.1659645967.gitgitgadget@gmail.com>
While working on sparse index integration for 'git rm' [1], Shaoxuan found
that sparse directories removed from the index were no longer sparse after
being restored with 'reset'. This was due to how 'unpack_trees()' determined
whether a traversed directory was a sparse directory: it would only unpack an
entry as a sparse directory if that entry already existed in the index. If the
sparse directory had been removed, it was treated like a non-sparse directory
and its contents were unpacked individually.
To avoid this unnecessary traversal and keep the results of 'reset' as
sparse as possible, the decision logic for whether a directory is sparse is
changed to:
* If the directory is a sparse directory in the index, unpack it.
* If not, is the directory inside the sparse cone? If so, do not unpack it;
traverse into it instead.
* If the directory is outside the sparse cone, does it have any child
entries in the index? If so, do not unpack it; traverse into it instead.
* Otherwise, unpack the entry as a sparse directory.
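The four rules above can be sketched as a small C decision function. This is
only an illustrative model: 'struct dir_state' and 'unpack_as_sparse_dir()'
are hypothetical names, and the real logic in 'unpack_single_entry()' works on
'struct traverse_info' and 'struct name_entry' rather than on precomputed flags.

```c
#include <stdbool.h>

/*
 * Hypothetical summary of what is known about a traversed directory 'A/'.
 * The real code derives this from the index and the sparse-checkout cone.
 */
struct dir_state {
	bool is_sparse_dir_in_index; /* index already has a sparse dir entry 'A/' */
	bool in_sparse_cone;         /* 'A/' is inside the sparse-checkout cone */
	bool has_child_entries;      /* index has at least one 'A/<something>' entry */
};

/* Return true to unpack 'A/' as one sparse directory entry, false to traverse into it. */
static bool unpack_as_sparse_dir(const struct dir_state *d)
{
	if (d->is_sparse_dir_in_index)
		return true;  /* already collapsed in the index */
	if (d->in_sparse_cone)
		return false; /* in-cone directories are never sparse */
	if (d->has_child_entries)
		return false; /* out of cone, but partially expanded in the index */
	return true;          /* new out-of-cone directory: keep it collapsed */
}
```

The order of the checks matters: the cheap "already sparse" case short-circuits
first, and only a directory that fails every "traverse" condition is collapsed.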
In the process of updating 'reset', a separate issue was found in 'checkout'
where collapsed sparse directories did not have modified contents reported
file-by-file. A similar bug was found with 'status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01), and
'checkout' was corrected the same way (setting the diff flag 'recursive' to
1).
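As a rough illustration of why the 'recursive' flag matters, the toy model
below (not git's diff machinery; 'report_changes()' and 'MAX_PATH_LEN' are
hypothetical) reports a sorted list of modified paths either collapsed to
their top-level directory or file-by-file. Without recursion, every change
under 'folder1/' surfaces only as a single 'folder1/' line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 64

/*
 * Toy model: 'changed' is a sorted list of modified file paths and 'out' must
 * have room for 'n' lines. In non-recursive mode a change inside a directory
 * is reported once as "dir/"; in recursive mode each file is reported.
 * Returns the number of lines written to 'out'.
 */
static size_t report_changes(const char *const *changed, size_t n,
			     bool recursive, char out[][MAX_PATH_LEN])
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		char line[MAX_PATH_LEN];
		const char *slash = strchr(changed[i], '/');

		if (recursive || !slash)
			snprintf(line, sizeof(line), "%s", changed[i]);
		else /* keep only the top-level "dir/" component */
			snprintf(line, sizeof(line), "%.*s",
				 (int)(slash - changed[i] + 1), changed[i]);

		/* Sorted input keeps paths sharing a prefix adjacent, so dedupe is local. */
		if (nr && !strcmp(out[nr - 1], line))
			continue;
		snprintf(out[nr++], MAX_PATH_LEN, "%s", line);
	}
	return nr;
}
```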
Changes since V1
================
* Reverted the removal of 'index_entry_exists()' to avoid breaking other
in-flight series.
* Renamed 'is_missing_sparse_dir()' to 'is_new_sparse_dir()'; revised
comments and commit messages to clarify what that function is doing and
why.
* Handled "unexpected" inputs to 'is_new_sparse_dir()' more gently,
returning 0 if 'p' is not a directory or the directory already exists in
the index (rather than exiting with 'BUG()'). This is intended to make
'is_new_sparse_dir()' less reliant on information about the index
established by 'unpack_callback()' & 'unpack_single_entry()', resulting
in easier-to-read and more reusable code.
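The v2 behavior of 'is_new_sparse_dir()' can be sketched over a plain sorted
array of entry names. This assumes that directory paths end in '/', that the
caller already knows whether the path is in the sparse cone, and that plain
'strcmp()' order matches index order. 'name_pos()' and
'is_new_sparse_dir_sketch()' are hypothetical; 'name_pos()' stands in for
'index_name_pos_sparse()' and uses the same negative-return convention
(-(insert position) - 1). Each unexpected input simply returns 0 rather than
calling 'BUG()'.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for index_name_pos_sparse(): binary search over a
 * sorted array of names. Returns the index of an exact match, otherwise
 * -(insert_pos) - 1.
 */
static int name_pos(const char *const *names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/* Return 1 if 'dirpath' should be unpacked as a new sparse directory, else 0. */
static int is_new_sparse_dir_sketch(const char *const *names, int nr,
				    const char *dirpath, bool in_cone)
{
	size_t len = strlen(dirpath);
	int pos;

	if (!len || dirpath[len - 1] != '/')
		return 0;       /* not a directory path */
	if (in_cone)
		return 0;       /* inside the sparse cone: never sparse */

	pos = name_pos(names, nr, dirpath);
	if (pos >= 0)
		return 0;       /* already in the index: not new */

	pos = -pos - 1;         /* where 'dirpath' would be inserted */
	if (pos >= nr)
		return 1;       /* would go at the end: no children */

	/* The first child, if any, sorts exactly at the insert position. */
	return strncmp(names[pos], dirpath, len) != 0;
}
```

Because every child of 'dir/' shares that prefix, all children sort
contiguously right where 'dir/' itself would be inserted, so checking a single
entry is enough to rule them out.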
Thanks!
* Victoria
[1] https://lore.kernel.org/git/20220803045118.1243087-1-shaoxuan.yuan02@gmail.com/
Victoria Dye (4):
checkout: fix nested sparse directory diff in sparse index
oneway_diff: handle removed sparse directories
cache.h: create 'index_name_pos_sparse()'
unpack-trees: unpack new trees as sparse directories
builtin/checkout.c | 1 +
cache.h | 9 ++
diff-lib.c | 5 ++
read-cache.c | 5 ++
t/t1092-sparse-checkout-compatibility.sh | 25 ++++++
unpack-trees.c | 106 ++++++++++++++++++++---
6 files changed, 141 insertions(+), 10 deletions(-)
base-commit: 4af7188bc97f70277d0f10d56d5373022b1fa385
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1312%2Fvdye%2Freset%2Fhandle-missing-dirs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1312/vdye/reset/handle-missing-dirs-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1312
Range-diff vs v1:
1: 255318f4dc6 = 1: 255318f4dc6 checkout: fix nested sparse directory diff in sparse index
2: 55c77ba4b29 = 2: 55c77ba4b29 oneway_diff: handle removed sparse directories
3: f7978d223fe ! 3: d0bdec63286 cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
@@ Metadata
Author: Victoria Dye <vdye@github.com>
## Commit message ##
- cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
+ cache.h: create 'index_name_pos_sparse()'
- Replace 'index_entry_exists()' (which returns a binary '1' or '0' depending
- on whether a specified entry exists in the index) with
- 'index_name_pos_sparse()' (which behaves the same as 'index_name_pos()',
+ Add 'index_name_pos_sparse()', which behaves the same as 'index_name_pos()',
except that it does not expand a sparse index to search for an entry inside
- a sparse directory).
+ a sparse directory.
- 'index_entry_exists()' was original implemented in 20ec2d034c (reset: make
- sparse-aware (except --mixed), 2021-11-29) to allow callers to search for an
- index entry without expanding a sparse index. That particular case only
- required knowing whether the requested entry existed. This patch expands the
- amount of information returned by indicating both 1) whether the entry
- exists, and 2) its position (or potential position) in the index.
+ 'index_entry_exists()' was originally implemented in 20ec2d034c (reset: make
+ sparse-aware (except --mixed), 2021-11-29) as an alternative to
+ 'index_name_pos()' to allow callers to search for an index entry without
+ expanding a sparse index. However, that particular use case only required
+ knowing whether the requested entry existed, so 'index_entry_exists()' does
+ not return the index positioning information provided by 'index_name_pos()'.
- Signed-off-by: Victoria Dye <vdye@github.com>
+ This patch implements 'index_name_pos_sparse()' to accommodate callers that
+ need the positioning information of 'index_name_pos()', but do not want to
+ expand the index.
- ## cache-tree.c ##
-@@ cache-tree.c: static void prime_cache_tree_rec(struct repository *r,
- * as normal.
- */
- if (r->index->sparse_index &&
-- index_entry_exists(r->index, tree_path->buf, tree_path->len))
-+ index_name_pos_sparse(r->index, tree_path->buf, tree_path->len) >= 0)
- prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
- else
- prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
+ Signed-off-by: Victoria Dye <vdye@github.com>
## cache.h ##
@@ cache.h: struct cache_entry *index_file_exists(struct index_state *istate, const char *na
+ */
int index_name_pos(struct index_state *, const char *name, int namelen);
- /*
-- * Determines whether an entry with the given name exists within the
-- * given index. The return value is 1 if an exact match is found, otherwise
-- * it is 0. Note that, unlike index_name_pos, this function does not expand
-- * the index if it is sparse. If an item exists within the full index but it
-- * is contained within a sparse directory (and not in the sparse index), 0 is
-- * returned.
-- */
--int index_entry_exists(struct index_state *, const char *name, int namelen);
++/*
+ * Like index_name_pos, returns the position of an entry of the given name in
+ * the index if one exists, otherwise returns a negative value where the negated
+ * value minus 1 is the position where the index entry would be inserted. Unlike
@@ cache.h: struct cache_entry *index_file_exists(struct index_state *istate, const
+ * inside a sparse directory.
+ */
+int index_name_pos_sparse(struct index_state *, const char *name, int namelen);
-
++
/*
- * Some functions return the negative complement of an insert position when a
+ * Determines whether an entry with the given name exists within the
+ * given index. The return value is 1 if an exact match is found, otherwise
## read-cache.c ##
@@ read-cache.c: int index_name_pos(struct index_state *istate, const char *name, int namelen)
return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
}
--int index_entry_exists(struct index_state *istate, const char *name, int namelen)
+int index_name_pos_sparse(struct index_state *istate, const char *name, int namelen)
- {
-- return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
++{
+ return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE);
- }
-
- int remove_index_entry_at(struct index_state *istate, int pos)
++}
++
+ int index_entry_exists(struct index_state *istate, const char *name, int namelen)
+ {
+ return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
4: 016971a6711 ! 4: 97ca668102c unpack-trees: handle missing sparse directories
@@ Metadata
Author: Victoria Dye <vdye@github.com>
## Commit message ##
- unpack-trees: handle missing sparse directories
+ unpack-trees: unpack new trees as sparse directories
- If a sparse directory does not exist in the index, unpack it at the
- directory level rather than recursing into it an unpacking its contents
- file-by-file. This helps keep the sparse index as collapsed as possible in
- cases such as 'git reset --hard' restoring a sparse directory.
+ If 'unpack_single_entry()' is unpacking a new directory tree (that is, one
+ not already present in the index) into a sparse index, unpack the tree as a
+ sparse directory rather than traversing its contents and unpacking each file
+ individually. This helps keep the sparse index as collapsed as possible in
+ cases such as 'git reset --hard' restoring a outside-of-cone directory
+ removed with 'git rm -r --sparse'.
- A directory is determined to be truly non-existent in the index (rather than
- the parent of existing index entries), if 1) its path is outside the sparse
- cone and 2) there are no children of the directory in the index. This check
- is performed by 'missing_dir_is_sparse()' in 'unpack_single_entry()'. If the
- directory is a missing sparse dir, 'unpack_single_entry()' will proceed
- with unpacking it. This determination is also propagated back up to
- 'unpack_callback()' via 'is_missing_sparse_dir' to prevent further tree
- traversal into the unpacked directory.
+ Without this patch, 'unpack_single_entry()' will only unpack a directory
+ into the index as a sparse directory (rather than traversing into it and
+ unpacking its files one-by-one) if an entry with the same name already
+ exists in the index. This patch allows sparse directory unpacking without a
+ matching index entry when the following conditions are met:
+
+ 1. the directory's path is outside the sparse cone, and
+ 2. there are no children of the directory in the index
+
+ If a directory meets these requirements (as determined by
+ 'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory
+ index entry and propagates the decision back up to 'unpack_callback()' to
+ prevent unnecessary tree traversal into the unpacked directory.
Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
@@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
}
+/*
-+ * Determine whether the path specified corresponds to a sparse directory
-+ * completely missing from the index. This function is assumed to only be
-+ * called when the named path isn't already in the index.
++ * Determine whether the path specified by 'p' should be unpacked as a new
++ * sparse directory in a sparse index. A new sparse directory 'A/':
++ * - must be outside the sparse cone.
++ * - must not already be in the index (i.e., no index entry with name 'A/'
++ * exists).
++ * - must not have any child entries in the index (i.e., no index entry
++ * 'A/<something>' exists).
++ * If 'p' meets the above requirements, return 1; otherwise, return 0.
+ */
-+static int missing_dir_is_sparse(const struct traverse_info *info,
-+ const struct name_entry *p)
++static int entry_is_new_sparse_dir(const struct traverse_info *info,
++ const struct name_entry *p)
+{
+ int res, pos;
+ struct strbuf dirpath = STRBUF_INIT;
+ struct unpack_trees_options *o = info->data;
+
++ if (!S_ISDIR(p->mode))
++ return 0;
++
+ /*
-+ * First, check whether the path is in the sparse cone. If it is,
-+ * then this directory shouldn't be sparse.
++ * If the path is inside the sparse cone, it can't be a sparse directory.
+ */
+ strbuf_add(&dirpath, info->traverse_path, info->pathlen);
+ strbuf_add(&dirpath, p->path, p->pathlen);
@@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
+ goto cleanup;
+ }
+
-+ /*
-+ * Given that the directory is not inside the sparse cone, it could be
-+ * (partially) expanded in the index. If child entries exist, the path
-+ * is not a missing sparse directory.
-+ */
+ pos = index_name_pos_sparse(o->src_index, dirpath.buf, dirpath.len);
-+ if (pos >= 0)
-+ BUG("cache entry '%s%s' shouldn't exist in the index",
-+ info->traverse_path, p->path);
++ if (pos >= 0) {
++ /* Path is already in the index, not a new sparse dir */
++ res = 0;
++ goto cleanup;
++ }
+
++ /* Where would this sparse dir be inserted into the index? */
+ pos = -pos - 1;
+ if (pos >= o->src_index->cache_nr) {
++ /*
++ * Sparse dir would be inserted at the end of the index, so we
++ * know it has no child entries.
++ */
+ res = 1;
+ goto cleanup;
+ }
+
++ /*
++ * If the dir has child entries in the index, the first would be at the
++ * position the sparse directory would be inserted. If the entry at this
++ * position is inside the dir, not a new sparse dir.
++ */
+ res = strncmp(o->src_index->cache[pos]->name, dirpath.buf, dirpath.len);
+
+cleanup:
@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
const struct name_entry *names,
- const struct traverse_info *info)
+ const struct traverse_info *info,
-+ int *is_missing_sparse_dir)
++ int *is_new_sparse_dir)
{
int i;
struct unpack_trees_options *o = info->data;
@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
- if (mask == dirmask && !src[0])
- return 0;
-+ *is_missing_sparse_dir = 0;
++ *is_new_sparse_dir = 0;
+ if (mask == dirmask && !src[0]) {
+ /*
-+ * If the directory is completely missing from the index but
-+ * would otherwise be a sparse directory, we should unpack it.
-+ * If not, we'll return and continue recursively traversing the
-+ * tree.
++ * If we're not in a sparse index, we can't unpack a directory
++ * without recursing into it, so we return.
+ */
+ if (!o->src_index->sparse_index)
+ return 0;
@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
+ while (!p->mode)
+ p++;
+
-+ *is_missing_sparse_dir = missing_dir_is_sparse(info, p);
-+ if (!*is_missing_sparse_dir)
++ /*
++ * If the directory is completely missing from the index but
++ * would otherwise be a sparse directory, we should unpack it.
++ * If not, we'll return and continue recursively traversing the
++ * tree.
++ */
++ *is_new_sparse_dir = entry_is_new_sparse_dir(info, p);
++ if (!*is_new_sparse_dir)
+ return 0;
+ }
@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
- if (mask == dirmask && src[0] &&
- S_ISSPARSEDIR(src[0]->ce_mode))
+ if (mask == dirmask &&
-+ (*is_missing_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
++ (*is_new_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
conflicts = 0;
/*
@@ unpack-trees.c: static int unpack_sparse_callback(int n, unsigned long mask, uns
struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
struct unpack_trees_options *o = info->data;
- int ret;
-+ int ret, is_missing_sparse_dir;
++ int ret, is_new_sparse_dir;
assert(o->merge);
@@ unpack-trees.c: static int unpack_sparse_callback(int n, unsigned long mask, uns
* 'dirmask' accordingly.
*/
- ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info);
-+ ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_missing_sparse_dir);
++ ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_new_sparse_dir);
if (src[0])
discard_cache_entry(src[0]);
@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
struct unpack_trees_options *o = info->data;
const struct name_entry *p = names;
-+ int is_missing_sparse_dir;
++ int is_new_sparse_dir;
/* Find first entry with a real name (we could use "mask" too) */
while (!p->mode)
@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
}
- if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
-+ if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_missing_sparse_dir))
++ if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_new_sparse_dir))
return -1;
if (o->merge && src[0]) {
@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
}
if (!is_sparse_directory_entry(src[0], names, info) &&
-+ !is_missing_sparse_dir &&
++ !is_new_sparse_dir &&
traverse_trees_recursive(n, dirmask, mask & ~dirmask,
names, info) < 0) {
return -1;
--
gitgitgadget
Thread overview: 23+ messages
2022-08-04 20:46 [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-04 20:46 ` [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-05 17:59 ` Derrick Stolee
2022-08-04 20:46 ` [PATCH 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-04 20:46 ` [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-04 22:16 ` Junio C Hamano
2022-08-06 0:09 ` Junio C Hamano
2022-08-04 20:46 ` [PATCH 4/4] unpack-trees: handle missing sparse directories Victoria Dye via GitGitGadget
2022-08-04 23:23 ` Junio C Hamano
2022-08-05 16:36 ` Victoria Dye
2022-08-05 19:24 ` Junio C Hamano
2022-08-07 2:57 ` Victoria Dye via GitGitGadget [this message]
2022-08-07 2:57 ` [PATCH v2 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-07 2:57 ` [PATCH v2 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-07 2:57 ` [PATCH v2 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-07 2:57 ` [PATCH v2 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
2022-08-08 19:07 ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-08 19:07 ` [PATCH v3 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-08 19:07 ` [PATCH v3 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-08 19:07 ` [PATCH v3 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-08 19:07 ` [PATCH v3 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
2022-08-08 21:17 ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Junio C Hamano
2022-08-09 13:20 ` Derrick Stolee