git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: derrickstolee@github.com, gitster@pobox.com, newren@gmail.com,
	Victoria Dye <vdye@github.com>, Victoria Dye <vdye@github.com>
Subject: [PATCH v2 2/3] unpack-trees: increment cache_bottom for sparse directories
Date: Thu, 17 Mar 2022 15:55:35 +0000	[thread overview]
Message-ID: <cf8dc50c300a08f54d0cf69c82d4d979922a8143.1647532536.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1179.v2.git.1647532536.gitgitgadget@gmail.com>

From: Victoria Dye <vdye@github.com>

Correct tracking of the 'cache_bottom' for cases where sparse directories
are present in the index.

BACKGROUND
----------
The 'unpack_trees_options.cache_bottom' is a variable that tracks the
in-progress "bottom" of the cache as 'unpack_trees()' iterates through the
contents of the index. Most importantly, this value informs the sequential
return values of 'next_cache_entry()' which, in the "diff cache" usage of
'unpack_callback()', are either unpacked as-is or are passed into the diff
machinery.

The 'cache_bottom' is intended to track the position of the first entry in
the index that has not yet been diffed or unpacked. It is advanced in two
main ways: either it is incremented when an index entry is marked as "used"
(in 'mark_ce_used()'), indicating that it was unpacked or diffed, or when a
directory is unpacked, in which case it is increased by an amount equaling
the number of index entries inside that tree.

In 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14), it was
identified that sparse directories posed a problem to the above
'cache_bottom' advancement logic - because a sparse directory was both an
index entry that could be "used" and a directory that can be unpacked, the
'cache_bottom' would be incremented too many times. To solve this problem,
the 'mark_ce_used()' advancement of 'cache_bottom' was skipped for sparse
directories.

INCORRECT CACHE_BOTTOM TRACKING
-------------------------------
Skipping the 'cache_bottom' advancement for sparse directories in
'mark_ce_used()' breaks down in two cases:

1. When the 'unpack_trees()' operation is *not* a "cache diff" (because the
   directory contents-based incrementing of 'cache_bottom' does not happen).
2. When a cache diff is performed with a pathspec (because
   'unpack_index_entry()' will unpack a sparse directory not matched by the
   pathspec without performing the directory contents-based increment).

The former luckily does not appear to affect 'git' behavior, likely because
'cache_bottom' is largely unused (non-"cache diff" 'unpack_trees()' uses
'find_index_entry()' - rather than 'next_cache_entry()' - to find the index
entries to unpack).

The latter, however, causes 'cache_bottom' to "lag behind" its intended
position by an amount equal to the number of sparse directories unpacked so
far with 'unpack_index_entry()'. If a repository is structured such that any
sparse directories are ordered lexicographically *after* any
pathspec-matching directories, though, this issue won't present any adverse
behavior.

This was the case with the 't1092-sparse-checkout-compatibility.sh' tests
before the addition of the 'before/' sparse directory (ordered *before* the
in-cone 'deep/' directory), therefore sidestepping the issue. Once the
'before/' directory was added, though, 'cache_bottom' began to lag behind
its intended position, causing 'next_cache_entry()' to return index entries
it had already processed and, ultimately, an incorrect diff.

CORRECTING CACHE_BOTTOM
-----------------------
The problems observed in 't1092' come from 'cache_bottom' lagging behind in
cases where the cache tree-based advancement doesn't occur. To solve this,
then, the fix in 17a1bb570b is "reversed"; rather than skipping
'cache_bottom' advancement in 'mark_ce_used()', we skip the directory
contents-based advancement for sparse directories. Now, every index entry
can be accounted for in 'cache_bottom':

* if you're working with a single index entry, 'cache_bottom' is incremented
  in 'mark_ce_used()'
* if you're working with a directory that contains index entries (but is not
  one itself), 'cache_bottom' is incremented by the number of entries in
  that directory.

Finally, change the 'test_expect_failure' tests in 't1092' failing due to
this bug back to 'test_expect_success'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  6 +++---
 unpack-trees.c                           | 16 ++++++++--------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index dcd7061fb3b..236ab530284 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -340,7 +340,7 @@ test_expect_success 'deep changes during checkout' '
 	test_all_match git checkout base
 '
 
-test_expect_failure 'add outside sparse cone' '
+test_expect_success 'add outside sparse cone' '
 	init_repos &&
 
 	run_on_sparse mkdir folder1 &&
@@ -382,7 +382,7 @@ test_expect_success 'commit including unstaged changes' '
 	test_all_match git status --porcelain=v2
 '
 
-test_expect_failure 'status/add: outside sparse cone' '
+test_expect_success 'status/add: outside sparse cone' '
 	init_repos &&
 
 	# folder1 is at HEAD, but outside the sparse cone
@@ -593,7 +593,7 @@ test_expect_success 'checkout and reset (keep)' '
 	test_all_match test_must_fail git reset --keep deepest
 '
 
-test_expect_failure 'reset with pathspecs inside sparse definition' '
+test_expect_success 'reset with pathspecs inside sparse definition' '
 	init_repos &&
 
 	write_script edit-contents <<-\EOF &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 2763a029a17..b82c1a9705e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -595,13 +595,6 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
 {
 	ce->ce_flags |= CE_UNPACKED;
 
-	/*
-	 * If this is a sparse directory, don't advance cache_bottom.
-	 * That will be advanced later using the cache-tree data.
-	 */
-	if (S_ISSPARSEDIR(ce->ce_mode))
-		return;
-
 	if (o->cache_bottom < o->src_index->cache_nr &&
 	    o->src_index->cache[o->cache_bottom] == ce) {
 		int bottom = o->cache_bottom;
@@ -1478,7 +1471,14 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			 * it does not do any look-ahead, so this is safe.
 			 */
 			if (matches) {
-				o->cache_bottom += matches;
+				/*
+				 * Only increment the cache_bottom if the
+				 * directory isn't a sparse directory index
+				 * entry (if it is, it was already incremented)
+				 * in 'mark_ce_used()'
+				 */
+				if (!src[0] || !S_ISSPARSEDIR(src[0]->ce_mode))
+					o->cache_bottom += matches;
 				return mask;
 			}
 		}
-- 
gitgitgadget


  parent reply	other threads:[~2022-03-17 15:55 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-16 20:11 [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Victoria Dye via GitGitGadget
2022-03-16 20:12 ` [PATCH 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
2022-03-16 20:12 ` [PATCH 2/3] unpack-trees: increment cache_bottom for sparse directories Victoria Dye via GitGitGadget
2022-03-16 20:12 ` [PATCH 3/3] Revert "unpack-trees: improve performance of next_cache_entry" Victoria Dye via GitGitGadget
2022-03-16 20:21 ` [PATCH 0/3] Fix use of 'cache_bottom' in sparse index Junio C Hamano
2022-03-17 15:55 ` [PATCH v2 " Victoria Dye via GitGitGadget
2022-03-17 15:55   ` [PATCH v2 1/3] t1092: add sparse directory before cone in test repo Victoria Dye via GitGitGadget
2022-03-17 15:55   ` Victoria Dye via GitGitGadget [this message]
2022-03-21 19:03     ` [PATCH v2 2/3] unpack-trees: increment cache_bottom for sparse directories Derrick Stolee
2022-03-21 20:52       ` Junio C Hamano
2022-03-17 15:55   ` [PATCH v2 3/3] Revert "unpack-trees: improve performance of next_cache_entry" Victoria Dye via GitGitGadget
2022-03-21 19:12   ` [PATCH v2 0/3] Fix use of 'cache_bottom' in sparse index Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf8dc50c300a08f54d0cf69c82d4d979922a8143.1647532536.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).