From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, newren@gmail.com, anh@canva.com,
Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v3 0/5] sparse-index: improve clear_skip_worktree_from_present_files()
Date: Fri, 28 Jun 2024 12:43:20 +0000 [thread overview]
Message-ID: <pull.1754.v3.git.1719578605.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1754.v2.git.1719412192.gitgitgadget@gmail.com>
While doing some investigation in a private monorepo with sparse-checkout
and a sparse index, I accidentally left a modified file outside of my
sparse-checkout cone. This caused my Git commands to slow to a crawl, so I
reran with GIT_TRACE2_PERF=1.
While I was able to identify clear_skip_worktree_from_present_files() as the
culprit, it took longer than desired to figure out what was going on. This
series intends to both fix the performance issue (as much as possible) and
do some refactoring to make it easier to understand what is happening.
In the end, I was able to reduce the number of lstat() calls in my case from
over 1.1 million to about 4,400, improving the time from 13.4s to 81ms on a
warm disk cache. (These numbers are from a test after v2, which somehow hit
the old caching algorithm even worse than my test in v1.)
Updates in v3
=============
* Removed the incorrect paragraph in the commit message of patch 1.
* Replaced "largest" with "longest" in the final patch.
Thanks, Stolee
Derrick Stolee (5):
sparse-checkout: refactor skip worktree retry logic
sparse-index: refactor path_found()
sparse-index: use strbuf in path_found()
sparse-index: count lstat() calls
sparse-index: improve lstat caching of sparse paths
sparse-index.c | 216 +++++++++++++++++++++++++++++++++++++------------
1 file changed, 164 insertions(+), 52 deletions(-)
base-commit: 66ac6e4bcd111be3fa9c2a6b3fafea718d00678d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1754%2Fderrickstolee%2Fclear-skip-speed-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1754/derrickstolee/clear-skip-speed-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1754
Range-diff vs v2:
1: 93d0baed0b0 ! 1: 0844cda94cf sparse-checkout: refactor skip worktree retry logic
@@ Commit message
stored in the index, so caching was introduced in d79d299352 (Accelerate
clear_skip_worktree_from_present_files() by caching, 2022-01-14).
- If users are having trouble with the performance of this operation and
- don't care about paths outside of the sparse-checkout, they can disable
- them using the sparse.expectFilesOutsideOfPatterns config option
- introduced in ecc7c8841d (repo_read_index: add config to expect files
- outside sparse patterns, 2022-02-25).
-
This check is particularly confusing in the presence of a sparse index,
as a sparse tree entry corresponding to an existing directory must first
be expanded to a full index before examining the paths within. This is
2: 69c3beaabf7 = 2: c242e2c9168 sparse-index: refactor path_found()
3: 0a82e6b4183 = 3: ad63bf746ca sparse-index: use strbuf in path_found()
4: 9549f5b8062 = 4: db6ded0df0d sparse-index: count lstat() calls
5: 0cb344ac14f ! 5: 1f58e19691f sparse-index: improve lstat caching of sparse paths
@@ sparse-index.c: static void clear_path_found_data(struct path_found_data *data)
}
+/**
-+ * Return the length of the largest common substring that ends in a
-+ * slash ('/') to indicate the largest common parent directory. Returns
++ * Return the length of the longest common substring that ends in a
++ * slash ('/') to indicate the longest common parent directory. Returns
+ * zero if no common directory exists.
+ */
+static size_t max_common_dir_prefix(const char *path1, const char *path2)
--
gitgitgadget
next prev parent reply other threads:[~2024-06-28 12:43 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-20 16:11 [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Derrick Stolee via GitGitGadget
2024-06-20 16:11 ` [PATCH 1/5] sparse-index: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-24 22:12 ` Elijah Newren
2024-06-26 12:42 ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13 ` Elijah Newren
2024-06-26 12:43 ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13 ` Elijah Newren
2024-06-20 16:11 ` [PATCH 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-24 22:13 ` Elijah Newren
2024-06-20 16:11 ` [PATCH 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-24 22:14 ` Elijah Newren
2024-06-25 0:08 ` Junio C Hamano
2024-06-26 13:06 ` Derrick Stolee
2024-06-28 0:10 ` Elijah Newren
2024-06-20 19:16 ` [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-20 20:21 ` Derrick Stolee
2024-06-20 21:02 ` Junio C Hamano
2024-06-26 14:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 1/5] sparse-checkout: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-27 20:59 ` Junio C Hamano
2024-06-28 0:51 ` Elijah Newren
2024-06-28 1:49 ` Derrick Stolee
2024-06-28 5:50 ` Junio C Hamano
2024-06-28 0:31 ` Elijah Newren
2024-06-28 1:56 ` Derrick Stolee
2024-06-26 14:29 ` [PATCH v2 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-26 14:29 ` [PATCH v2 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-27 21:14 ` Junio C Hamano
2024-06-28 1:56 ` Derrick Stolee
2024-06-27 21:46 ` [PATCH v2 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-28 0:59 ` Elijah Newren
2024-06-28 1:57 ` Derrick Stolee
2024-06-28 12:43 ` Derrick Stolee via GitGitGadget [this message]
2024-06-28 12:43 ` [PATCH v3 1/5] sparse-checkout: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-28 12:43 ` [PATCH v3 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-28 15:07 ` [PATCH v3 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Elijah Newren
2024-06-28 19:34 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1754.v3.git.1719578605.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=anh@canva.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.