From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, newren@gmail.com, anh@canva.com
Subject: Re: [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files()
Date: Thu, 20 Jun 2024 16:21:20 -0400
Message-ID: <77974c37-b82f-42fb-848f-e060495c4f49@gmail.com>
In-Reply-To: <xmqqiky3tofw.fsf@gitster.g>

On 6/20/24 3:16 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> While doing some investigation in a private monorepo with sparse-checkout
>> and a sparse index, I accidentally left a modified file outside of my
>> sparse-checkout cone. This caused my Git commands to slow to a crawl, so I
>> reran with GIT_TRACE2_PERF=1.
>>
>> While I was able to identify clear_skip_worktree_from_present_files() as the
>> culprit, it took longer than desired to figure out what was going on. This
>> series intends to both fix the performance issue (as much as possible) and
>> do some refactoring to make it easier to understand what is happening.
>>
>> In the end, I was able to reduce the number of lstat() calls in my case from
>> over 170,000 to about 6,500, improving the time from 2.5s to 71ms on a warm
>> disk cache. Thanks, Stolee
>
> That's impressive but I cannot offhand tell how big 170k (or 6.5k
> for that matter) is relative to the size of the tree. How many
> paths are there in the entire tree (i.e. "git ls-tree -r HEAD | wc
> -l") vs the number of the in-cone paths in the working tree?
>
> If 6.5k is in the same ballpark as the latter, it would be really
> good.

You're right, I didn't include the full context here. The repo has
about 2.1 million paths at HEAD, but most of them are sparse.

In Patch 5, I detail that there are 1,841,997 total sparse files in
the expanded index. Thus, the previous caching algorithm (over 170,000
lstat() calls) was already doing decent work, calling lstat() about 11x
fewer times than the naive implementation.

The new caching algorithm improves this to 6,521 calls, which is a 282x
improvement over the naive implementation and a 26x improvement over
the previous caching algorithm.

But what you are really asking is how close this is to optimal.
I didn't include that in the Patch 5 details, but I was able to look at
my notes and see that the sparse_path_count data point was 1,962,
meaning there are that many sparse trees in the sparse index before
expanding. Thus, the 6,521 lstat() calls are 3.3x more than the
absolute minimum required.
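
As a quick sanity check of the ratios above, here is a minimal Python sketch that recomputes them from the counts quoted in this thread. The variable names are illustrative only (they are not taken from Git's source), and 170,000 is a rounded lower bound ("over 170,000"), so the results are approximate.

```python
# Counts quoted in this thread; names are hypothetical labels, not Git identifiers.
total_sparse_files = 1_841_997  # sparse files in the expanded index (naive: one lstat() each)
old_lstat_calls = 170_000       # previous caching algorithm ("over 170,000", so approximate)
new_lstat_calls = 6_521         # new caching algorithm
sparse_trees = 1_962            # sparse_path_count: sparse trees before expanding (lower bound)


def ratio(numerator: int, denominator: int) -> float:
    """Return how many times larger numerator is than denominator."""
    return numerator / denominator


ratios = {
    "naive vs. previous caching": ratio(total_sparse_files, old_lstat_calls),
    "naive vs. new caching": ratio(total_sparse_files, new_lstat_calls),
    "previous vs. new caching": ratio(old_lstat_calls, new_lstat_calls),
    "new caching vs. minimum": ratio(new_lstat_calls, sparse_trees),
}

for label, value in ratios.items():
    print(f"{label}: {value:.1f}x")
```

The last ratio is the one that answers the "how close to optimal" question: it compares the new lstat() count against one call per sparse tree.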

Does that help answer the questions you had? I'm happy to provide
more information.

Thanks,
-Stolee