git.vger.kernel.org archive mirror
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com, anh@canva.com,
	 Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v3 0/5] sparse-index: improve clear_skip_worktree_from_present_files()
Date: Fri, 28 Jun 2024 08:07:44 -0700
Message-ID: <CABPp-BFd7Bk68Omdao5LS0sP5bK1WQ7V6dodB5x8EsncNARxNA@mail.gmail.com>
In-Reply-To: <pull.1754.v3.git.1719578605.gitgitgadget@gmail.com>

On Fri, Jun 28, 2024 at 5:43 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> While doing some investigation in a private monorepo with sparse-checkout
> and a sparse index, I accidentally left a modified file outside of my
> sparse-checkout cone. This caused my Git commands to slow to a crawl, so I
> reran with GIT_TRACE2_PERF=1.
>
> While I was able to identify clear_skip_worktree_from_present_files() as the
> culprit, it took longer than desired to figure out what was going on. This
> series intends to both fix the performance issue (as much as possible) and
> do some refactoring to make it easier to understand what is happening.
>
> In the end, I was able to reduce the number of lstat() calls in my case from
> over 1.1 million to about 4,400, improving the time from 13.4s to 81ms on a
> warm disk cache. (These numbers are from a test after v2, which somehow hit
> the old caching algorithm even worse than my test in v1.)
>
>
> Updates in v3
> =============
>
>  * Removed the incorrect paragraph in the commit message of patch 1.
>  * Replaced "largest" with "longest" in the final patch.
>
> Thanks, Stolee
>
> Derrick Stolee (5):
>   sparse-checkout: refactor skip worktree retry logic
>   sparse-index: refactor path_found()
>   sparse-index: use strbuf in path_found()
>   sparse-index: count lstat() calls
>   sparse-index: improve lstat caching of sparse paths
>
>  sparse-index.c | 216 +++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 164 insertions(+), 52 deletions(-)
>
>
> base-commit: 66ac6e4bcd111be3fa9c2a6b3fafea718d00678d
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1754%2Fderrickstolee%2Fclear-skip-speed-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1754/derrickstolee/clear-skip-speed-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/1754
>
> Range-diff vs v2:
>
>  1:  93d0baed0b0 ! 1:  0844cda94cf sparse-checkout: refactor skip worktree retry logic
>      @@ Commit message
>           stored in the index, so caching was introduced in d79d299352 (Accelerate
>           clear_skip_worktree_from_present_files() by caching, 2022-01-14).
>
>      -    If users are having trouble with the performance of this operation and
>      -    don't care about paths outside of the sparse-checkout, they can disable
>      -    them using the sparse.expectFilesOutsideOfPatterns config option
>      -    introduced in ecc7c8841d (repo_read_index: add config to expect files
>      -    outside sparse patterns, 2022-02-25).
>      -
>           This check is particularly confusing in the presence of a sparse index,
>           as a sparse tree entry corresponding to an existing directory must first
>           be expanded to a full index before examining the paths within. This is
>  2:  69c3beaabf7 = 2:  c242e2c9168 sparse-index: refactor path_found()
>  3:  0a82e6b4183 = 3:  ad63bf746ca sparse-index: use strbuf in path_found()
>  4:  9549f5b8062 = 4:  db6ded0df0d sparse-index: count lstat() calls
>  5:  0cb344ac14f ! 5:  1f58e19691f sparse-index: improve lstat caching of sparse paths
>      @@ sparse-index.c: static void clear_path_found_data(struct path_found_data *data)
>        }
>
>       +/**
>      -+ * Return the length of the largest common substring that ends in a
>      -+ * slash ('/') to indicate the largest common parent directory. Returns
>      ++ * Return the length of the longest common substring that ends in a
>      ++ * slash ('/') to indicate the longest common parent directory. Returns
>       + * zero if no common directory exists.
>       + */
>       +static size_t max_common_dir_prefix(const char *path1, const char *path2)
>
> --
> gitgitgadget

This version covers the last two outstanding items.

Reviewed-by: Elijah Newren <newren@gmail.com>
