From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, newren@gmail.com, anh@canva.com,
Derrick Stolee <stolee@gmail.com>,
Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v3 1/5] sparse-checkout: refactor skip worktree retry logic
Date: Fri, 28 Jun 2024 12:43:21 +0000 [thread overview]
Message-ID: <0844cda94cffd76ea9e86a0837731cfb4dc7bf88.1719578605.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1754.v3.git.1719578605.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
The clear_skip_worktree_from_present_files() method was introduced in
af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present
in worktree, 2022-01-14) to help cases where sparse-checkout is enabled
but some paths outside of the sparse-checkout also exist on disk. This
operation can be slow as it needs to check path existence in a way not
stored in the index, so caching was introduced in d79d299352 (Accelerate
clear_skip_worktree_from_present_files() by caching, 2022-01-14).
This check is particularly confusing in the presence of a sparse index,
as a sparse tree entry corresponding to an existing directory must first
be expanded to a full index before examining the paths within. This is
currently implemented using a 'goto' and a boolean variable to ensure we
restart only once.
Even with that caching, it was noticed that this could take a long time
to execute. 89aaab11a3 (index: add trace2 region for clear skip
worktree, 2022-11-03) introduced trace2 regions to measure this time.
Further, the way the loop repeats itself was slightly confusing and
prone to breakage, so a BUG() statement was added in 8c7abdc596 (index:
raise a bug if the index is materialised more than once, 2022-11-03) to
be sure that the second run of the loop does not hit any sparse trees.
One thing that can be confusing about the current setup is that the
trace2 regions nest and it is not clear that a second loop is running
after a sparse index is expanded. Here is an example of what the regions
look like in a typical case:
| region_enter | ... | label:clear_skip_worktree_from_present_files
| region_enter | ... | ..label:update
| region_leave | ... | ..label:update
| region_enter | ... | ..label:ensure_full_index
| region_enter | ... | ....label:update
| region_leave | ... | ....label:update
| region_leave | ... | ..label:ensure_full_index
| data | ... | ..sparse_path_count:1
| data | ... | ..sparse_path_count_full:269538
| region_leave | ... | label:clear_skip_worktree_from_present_files
One thing that is particularly difficult to understand about these
regions is that most of the time is spent between the close of the
ensure_full_index region and the reporting of the end data. This is
because of the restart of the loop being within the same region as the
first iteration of the loop.
This change refactors the method into two separate methods that are
traced separately. This will be more important later when we change
other features of the methods, but for now the only functional change is
the difference in the structure of the trace regions.
After this change, the same telemetry section is split into three
distinct chunks:
| region_enter | ... | label:clear_skip_worktree_from_present_files_sparse
| data | ... | ..sparse_path_count:1
| region_leave | ... | label:clear_skip_worktree_from_present_files_sparse
| region_enter | ... | label:update
| region_leave | ... | label:update
| region_enter | ... | label:ensure_full_index
| region_enter | ... | ..label:update
| region_leave | ... | ..label:update
| region_leave | ... | label:ensure_full_index
| region_enter | ... | label:clear_skip_worktree_from_present_files_full
| data | ... | ..full_path_count:269538
| region_leave | ... | label:clear_skip_worktree_from_present_files_full
Here, we see the sparse loop terminating early with its first sparse
path being a sparse directory containing a file. Then, that loop's
region terminates before ensure_full_index begins (in this case, the
cache-tree must also be computed). Then, _after_ the index is expanded,
the full loop begins with its own region.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
sparse-index.c | 77 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 53 insertions(+), 24 deletions(-)
diff --git a/sparse-index.c b/sparse-index.c
index e48e40cae71..e0457c87fff 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -486,49 +486,78 @@ static int path_found(const char *path, const char **dirname, size_t *dir_len,
return 0;
}
-void clear_skip_worktree_from_present_files(struct index_state *istate)
+static int clear_skip_worktree_from_present_files_sparse(struct index_state *istate)
{
const char *last_dirname = NULL;
size_t dir_len = 0;
int dir_found = 1;
- int i;
- int path_count[2] = {0, 0};
- int restarted = 0;
+ int path_count = 0;
+ int to_restart = 0;
- if (!core_apply_sparse_checkout ||
- sparse_expect_files_outside_of_patterns)
- return;
-
- trace2_region_enter("index", "clear_skip_worktree_from_present_files",
+ trace2_region_enter("index", "clear_skip_worktree_from_present_files_sparse",
istate->repo);
-restart:
- for (i = 0; i < istate->cache_nr; i++) {
+ for (int i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce = istate->cache[i];
if (ce_skip_worktree(ce)) {
- path_count[restarted]++;
+ path_count++;
if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
if (S_ISSPARSEDIR(ce->ce_mode)) {
- if (restarted)
- BUG("ensure-full-index did not fully flatten?");
- ensure_full_index(istate);
- restarted = 1;
- goto restart;
+ to_restart = 1;
+ break;
}
ce->ce_flags &= ~CE_SKIP_WORKTREE;
}
}
}
- if (path_count[0])
- trace2_data_intmax("index", istate->repo,
- "sparse_path_count", path_count[0]);
- if (restarted)
- trace2_data_intmax("index", istate->repo,
- "sparse_path_count_full", path_count[1]);
- trace2_region_leave("index", "clear_skip_worktree_from_present_files",
+ trace2_data_intmax("index", istate->repo,
+ "sparse_path_count", path_count);
+ trace2_region_leave("index", "clear_skip_worktree_from_present_files_sparse",
+ istate->repo);
+ return to_restart;
+}
+
+static void clear_skip_worktree_from_present_files_full(struct index_state *istate)
+{
+ const char *last_dirname = NULL;
+ size_t dir_len = 0;
+ int dir_found = 1;
+
+ int path_count = 0;
+
+ trace2_region_enter("index", "clear_skip_worktree_from_present_files_full",
istate->repo);
+ for (int i = 0; i < istate->cache_nr; i++) {
+ struct cache_entry *ce = istate->cache[i];
+
+ if (S_ISSPARSEDIR(ce->ce_mode))
+ BUG("ensure-full-index did not fully flatten?");
+
+ if (ce_skip_worktree(ce)) {
+ path_count++;
+ if (path_found(ce->name, &last_dirname, &dir_len, &dir_found))
+ ce->ce_flags &= ~CE_SKIP_WORKTREE;
+ }
+ }
+
+ trace2_data_intmax("index", istate->repo,
+ "full_path_count", path_count);
+ trace2_region_leave("index", "clear_skip_worktree_from_present_files_full",
+ istate->repo);
+}
+
+void clear_skip_worktree_from_present_files(struct index_state *istate)
+{
+ if (!core_apply_sparse_checkout ||
+ sparse_expect_files_outside_of_patterns)
+ return;
+
+ if (clear_skip_worktree_from_present_files_sparse(istate)) {
+ ensure_full_index(istate);
+ clear_skip_worktree_from_present_files_full(istate);
+ }
}
/*
--
gitgitgadget
next prev parent reply other threads:[~2024-06-28 12:43 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-20 16:11 [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Derrick Stolee via GitGitGadget
2024-06-20 16:11 ` [PATCH 1/5] sparse-index: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-24 22:12 ` Elijah Newren
2024-06-26 12:42 ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13 ` Elijah Newren
2024-06-26 12:43 ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13 ` Elijah Newren
2024-06-20 16:11 ` [PATCH 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-24 22:13 ` Elijah Newren
2024-06-20 16:11 ` [PATCH 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-24 22:14 ` Elijah Newren
2024-06-25 0:08 ` Junio C Hamano
2024-06-26 13:06 ` Derrick Stolee
2024-06-28 0:10 ` Elijah Newren
2024-06-20 19:16 ` [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-20 20:21 ` Derrick Stolee
2024-06-20 21:02 ` Junio C Hamano
2024-06-26 14:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 1/5] sparse-checkout: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-27 20:59 ` Junio C Hamano
2024-06-28 0:51 ` Elijah Newren
2024-06-28 1:49 ` Derrick Stolee
2024-06-28 5:50 ` Junio C Hamano
2024-06-28 0:31 ` Elijah Newren
2024-06-28 1:56 ` Derrick Stolee
2024-06-26 14:29 ` [PATCH v2 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-27 21:14 ` Junio C Hamano
2024-06-28 1:56 ` Derrick Stolee
2024-06-27 21:46 ` [PATCH v2 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-28 0:59 ` Elijah Newren
2024-06-28 1:57 ` Derrick Stolee
2024-06-28 12:43 ` [PATCH v3 " Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` Derrick Stolee via GitGitGadget [this message]
2024-06-28 12:43 ` [PATCH v3 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-28 15:07 ` [PATCH v3 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Elijah Newren
2024-06-28 19:34 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0844cda94cffd76ea9e86a0837731cfb4dc7bf88.1719578605.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=anh@canva.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).