From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com, anh@canva.com,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v3 0/5] sparse-index: improve clear_skip_worktree_from_present_files()
Date: Fri, 28 Jun 2024 08:07:44 -0700
Message-ID: <CABPp-BFd7Bk68Omdao5LS0sP5bK1WQ7V6dodB5x8EsncNARxNA@mail.gmail.com>
In-Reply-To: <pull.1754.v3.git.1719578605.gitgitgadget@gmail.com>
On Fri, Jun 28, 2024 at 5:43 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> While doing some investigation in a private monorepo with sparse-checkout
> and a sparse index, I accidentally left a modified file outside of my
> sparse-checkout cone. This caused my Git commands to slow to a crawl, so I
> reran with GIT_TRACE2_PERF=1.
>
> While I was able to identify clear_skip_worktree_from_present_files() as the
> culprit, it took longer than desired to figure out what was going on. This
> series intends to both fix the performance issue (as much as possible) and
> do some refactoring to make it easier to understand what is happening.
>
> In the end, I was able to reduce the number of lstat() calls in my case from
> over 1.1 million to about 4,400, improving the time from 13.4s to 81ms on a
> warm disk cache. (These numbers are from a test after v2, which somehow hit
> the old caching algorithm even worse than my test in v1.)
>
>
> Updates in v3
> =============
>
> * Removed the incorrect paragraph in the commit message of patch 1.
> * Replaced "largest" with "longest" in the final patch.
>
> Thanks, Stolee
>
> Derrick Stolee (5):
> sparse-checkout: refactor skip worktree retry logic
> sparse-index: refactor path_found()
> sparse-index: use strbuf in path_found()
> sparse-index: count lstat() calls
> sparse-index: improve lstat caching of sparse paths
>
> sparse-index.c | 216 +++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 164 insertions(+), 52 deletions(-)
>
>
> base-commit: 66ac6e4bcd111be3fa9c2a6b3fafea718d00678d
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1754%2Fderrickstolee%2Fclear-skip-speed-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1754/derrickstolee/clear-skip-speed-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/1754
>
> Range-diff vs v2:
>
> 1: 93d0baed0b0 ! 1: 0844cda94cf sparse-checkout: refactor skip worktree retry logic
> @@ Commit message
> stored in the index, so caching was introduced in d79d299352 (Accelerate
> clear_skip_worktree_from_present_files() by caching, 2022-01-14).
>
> - If users are having trouble with the performance of this operation and
> - don't care about paths outside of the sparse-checkout, they can disable
> - them using the sparse.expectFilesOutsideOfPatterns config option
> - introduced in ecc7c8841d (repo_read_index: add config to expect files
> - outside sparse patterns, 2022-02-25).
> -
> This check is particularly confusing in the presence of a sparse index,
> as a sparse tree entry corresponding to an existing directory must first
> be expanded to a full index before examining the paths within. This is
> 2: 69c3beaabf7 = 2: c242e2c9168 sparse-index: refactor path_found()
> 3: 0a82e6b4183 = 3: ad63bf746ca sparse-index: use strbuf in path_found()
> 4: 9549f5b8062 = 4: db6ded0df0d sparse-index: count lstat() calls
> 5: 0cb344ac14f ! 5: 1f58e19691f sparse-index: improve lstat caching of sparse paths
> @@ sparse-index.c: static void clear_path_found_data(struct path_found_data *data)
> }
>
> +/**
> -+ * Return the length of the largest common substring that ends in a
> -+ * slash ('/') to indicate the largest common parent directory. Returns
> ++ * Return the length of the longest common substring that ends in a
> ++ * slash ('/') to indicate the longest common parent directory. Returns
> + * zero if no common directory exists.
> + */
> +static size_t max_common_dir_prefix(const char *path1, const char *path2)
>
> --
> gitgitgadget
This version covers the last two outstanding items.
Reviewed-by: Elijah Newren <newren@gmail.com>
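
For readers following the range-diff above, the comment on max_common_dir_prefix() in patch 5 describes a small contract: given two paths, return the length of the longest shared prefix that ends in a slash, or zero if there is none. A minimal sketch of that contract is below. The signature is copied from the quoted diff, but the body is an assumption (a plain byte-by-byte scan) and is not the code from the patch.

```c
#include <stddef.h>

/*
 * Illustrative sketch only: the body is an assumption, not the
 * implementation from the series under review.
 *
 * Returns the length of the longest common prefix of path1 and path2
 * that ends in a slash ('/'), i.e. the longest common parent
 * directory. Returns zero if no common directory exists.
 */
static size_t max_common_dir_prefix(const char *path1, const char *path2)
{
	size_t i, last_slash = 0;

	/* Walk both strings while they agree; stop at the first NUL or mismatch. */
	for (i = 0; path1[i] && path1[i] == path2[i]; i++) {
		if (path1[i] == '/')
			last_slash = i + 1; /* count the slash itself */
	}
	return last_slash;
}
```

Under this sketch, "a/b/c.txt" and "a/b/d.txt" yield 4 (the shared directory "a/b/"), while "x/y" and "z/y" yield 0. A cache keyed on the last directory known to exist could presumably use such a length to decide how much of the previous lstat() result still applies to the next path, though how the series actually does that is best read from patch 5 itself.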
Thread overview: 44+ messages
2024-06-20 16:11 [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Derrick Stolee via GitGitGadget
2024-06-20 16:11 ` [PATCH 1/5] sparse-index: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-24 22:12 ` Elijah Newren
2024-06-26 12:42 ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13 ` Elijah Newren
2024-06-26 12:43 ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13 ` Elijah Newren
2024-06-20 16:11 ` [PATCH 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-24 22:13 ` Elijah Newren
2024-06-20 16:11 ` [PATCH 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-24 22:14 ` Elijah Newren
2024-06-25 0:08 ` Junio C Hamano
2024-06-26 13:06 ` Derrick Stolee
2024-06-28 0:10 ` Elijah Newren
2024-06-20 19:16 ` [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-20 20:21 ` Derrick Stolee
2024-06-20 21:02 ` Junio C Hamano
2024-06-26 14:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 1/5] sparse-checkout: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-27 20:59 ` Junio C Hamano
2024-06-28 0:51 ` Elijah Newren
2024-06-28 1:49 ` Derrick Stolee
2024-06-28 5:50 ` Junio C Hamano
2024-06-28 0:31 ` Elijah Newren
2024-06-28 1:56 ` Derrick Stolee
2024-06-26 14:29 ` [PATCH v2 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-27 21:14 ` Junio C Hamano
2024-06-28 1:56 ` Derrick Stolee
2024-06-27 21:46 ` [PATCH v2 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-28 0:59 ` Elijah Newren
2024-06-28 1:57 ` Derrick Stolee
2024-06-28 12:43 ` [PATCH v3 " Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 1/5] sparse-checkout: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-28 15:07 ` Elijah Newren [this message]
2024-06-28 19:34 ` [PATCH v3 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano