From: Junio C Hamano <gitster@pobox.com>
To: "Anh Le via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Timothy Jones <timothy@canva.com>,
Jeff Hostetler <jeffhost@microsoft.com>, Anh Le <anh@canva.com>
Subject: Re: [PATCH] index: add trace2 region for clear skip worktree
Date: Tue, 25 Oct 2022 20:16:45 -0700 [thread overview]
Message-ID: <xmqq35bbmfz6.fsf@gitster.g> (raw)
In-Reply-To: pull.1368.git.git.1666742722502.gitgitgadget@gmail.com
"Anh Le via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Anh Le <anh@canva.com>
>
> In a large repository using sparse checkout, checking
> whether a file marked with skip worktree is present
> on disk and its skip worktree bit should be cleared
> can take a considerable amount of time. Add a trace2
> region to surface this information.
>
> Signed-off-by: Anh Le <anh@canva.com>
> ---
> index: add trace2 region for clear skip worktree
>
> In large repository using sparse checkout, checking whether a file
> marked with skip worktree is present on disk and its skip worktree bit
> should be cleared can take a considerable amount of time. Add a trace2
> region to surface this information.
It is easy to see that the change is no-op from functionality's
standpoint. The condition under which ce->ce_flags loses the
CE_SKIP_WORKTREE bit is the same as before, and the only change is
that in the iteration a couple of variables are incremented, which
may (or may not) have performance impact, but shouldn't break
correctness.
I am not sure about the value of these counters, honestly.
If we hit a sparse dir just once, we fully flatten the index and go
back to the restart state, and it would be a bug if we still see a
sparsified directory in the in-core index after that point, so
restart_count would be at most one, wouldn't it? Why do we even
need to count with intmax_t?
Similarly, path_count is bounded by istate->cache_nr. As you know
the approximate size of your project, I am not sure what extra
information you want to get by counting the paths with the skip bit
set before the first sparsified directory in the in-core index
twice, and the other paths with the skip bit set just once, and
adding these numbers together. Also again, I am not quite sure what
the point is to count the paths in intmax_t.
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1368%2FHaizzz%2Fmaster-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1368/Haizzz/master-v1
> Pull-Request: https://github.com/git/git/pull/1368
>
> sparse-index.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index e4a54ce1943..d11049c8aeb 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -493,24 +493,33 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
> int dir_found = 1;
>
> int i;
> + intmax_t path_count = 0;
> + intmax_t restart_count = 0;
>
> if (!core_apply_sparse_checkout ||
> sparse_expect_files_outside_of_patterns)
> return;
>
> + trace2_region_enter("index", "clear_skip_worktree_from_present_files", istate->repo);
> restart:
> for (i = 0; i < istate->cache_nr; i++) {
> struct cache_entry *ce = istate->cache[i];
>
> - if (ce_skip_worktree(ce) &&
> - path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> - if (S_ISSPARSEDIR(ce->ce_mode)) {
> - ensure_full_index(istate);
> - goto restart;
> + if (ce_skip_worktree(ce)) {
> + path_count++;
> + if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> + if (S_ISSPARSEDIR(ce->ce_mode)) {
> + ensure_full_index(istate);
> + restart_count++;
> + goto restart;
> + }
> + ce->ce_flags &= ~CE_SKIP_WORKTREE;
> }
> - ce->ce_flags &= ~CE_SKIP_WORKTREE;
> }
> }
> + trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_count);
> + trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/restart_count", restart_count);
> + trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
> }
>
> /*
>
> base-commit: 1fc3c0ad407008c2f71dd9ae1241d8b75f8ef886
next prev parent reply other threads:[~2022-10-26 3:16 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-26 0:05 [PATCH] index: add trace2 region for clear skip worktree Anh Le via GitGitGadget
2022-10-26 3:16 ` Junio C Hamano [this message]
2022-10-26 14:13 ` Jeff Hostetler
2022-10-26 16:01 ` Junio C Hamano
2022-10-26 18:29 ` Jeff Hostetler
2022-10-27 0:04 ` Anh Le
2022-10-28 0:46 ` [PATCH v2] " Anh Le via GitGitGadget
2022-10-28 15:49 ` Derrick Stolee
2022-10-28 17:17 ` Junio C Hamano
2022-10-30 23:28 ` Anh Le
2022-10-28 16:50 ` Jeff Hostetler
2022-10-31 0:56 ` [PATCH v3] " Anh Le via GitGitGadget
2022-10-31 22:34 ` Taylor Blau
2022-11-03 23:04 ` [PATCH v4 0/2] " Anh Le via GitGitGadget
2022-11-03 23:05 ` [PATCH v4 1/2] " Anh Le via GitGitGadget
2022-11-03 23:05 ` [PATCH v4 2/2] index: raise a bug if the index is materialised more than once Anh Le via GitGitGadget
2022-11-05 0:29 ` [PATCH v4 0/2] index: add trace2 region for clear skip worktree Taylor Blau
2022-11-07 20:50 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq35bbmfz6.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=anh@canva.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=jeffhost@microsoft.com \
--cc=timothy@canva.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.