From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, newren@gmail.com, anh@canva.com,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v3 0/5] sparse-index: improve clear_skip_worktree_from_present_files()
Date: Fri, 28 Jun 2024 12:43:20 +0000
Message-ID: <pull.1754.v3.git.1719578605.gitgitgadget@gmail.com>
In-Reply-To: <pull.1754.v2.git.1719412192.gitgitgadget@gmail.com>

While doing some investigation in a private monorepo with sparse-checkout
and a sparse index, I accidentally left a modified file outside of my
sparse-checkout cone. This caused my Git commands to slow to a crawl, so I
reran with GIT_TRACE2_PERF=1.

While I was able to identify clear_skip_worktree_from_present_files() as the
culprit, it took longer than desired to figure out what was going on. This
series intends to both fix the performance issue (as much as possible) and
do some refactoring to make it easier to understand what is happening.

In the end, I reduced the number of lstat() calls in my case from over
1.1 million to about 4,400, cutting the time from 13.4s to 81ms with a
warm disk cache. (These numbers come from a test run after v2 that hit
the old caching algorithm even harder than my v1 test did.)
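Most of that gain comes from skipping lstat() calls whose answer is already
implied by an earlier one. As an illustration only, here is a minimal C
sketch of that idea. It assumes a POSIX lstat() and a working tree that does
not change during the scan. The names presence_cache, counted_exists() and
path_present() are hypothetical and are not Git's API. The series itself
reworks path_found() in sparse-index.c and moves its state to a strbuf.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical sketch (not Git's code): remember the top-most directory
 * known to be missing so later paths beneath it can skip lstat().
 */
struct presence_cache {
	char missing_dir[4096]; /* e.g. "docs/old/"; always ends in '/' */
	size_t missing_len;     /* 0 means nothing is cached yet */
	size_t lstat_count;     /* number of lstat() calls issued */
};

/* Wrapper that counts every lstat() call, in the spirit of patch 4. */
static int counted_exists(struct presence_cache *cache, const char *path)
{
	struct stat st;

	cache->lstat_count++;
	return lstat(path, &st) == 0;
}

/* Return 1 if 'path' exists in the working tree, 0 otherwise. */
static int path_present(struct presence_cache *cache, const char *path)
{
	char prefix[sizeof(cache->missing_dir)];
	size_t len = strlen(path);

	/* Cache hit: 'path' lies under a directory known to be missing. */
	if (cache->missing_len &&
	    !strncmp(path, cache->missing_dir, cache->missing_len))
		return 0;

	if (counted_exists(cache, path))
		return 1;

	/* Too long to cache: report it missing and leave the cache alone. */
	if (len >= sizeof(prefix))
		return 0;

	/*
	 * 'path' is missing. Walk its leading directories from the top and
	 * cache the first one that does not exist.
	 */
	for (size_t i = 0; i < len; i++) {
		if (path[i] != '/')
			continue;
		memcpy(prefix, path, i + 1);
		prefix[i + 1] = '\0';
		if (!counted_exists(cache, prefix)) {
			memcpy(cache->missing_dir, prefix, i + 2);
			cache->missing_len = i + 1;
			return 0;
		}
	}

	/* All parent directories exist; only the leaf itself is missing. */
	return 0;
}
```

Once a directory such as "gr_sketch_missing_dir/" is found missing, every
later path beneath it returns 0 with no system call. This sketch still
re-walks existing parent directories on each cache miss; the final patch in
the series appears to go further by reusing the directory prefix shared
with the cached path (see max_common_dir_prefix() in the range-diff).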


Updates in v3
=============

 * Removed the incorrect paragraph about sparse.expectFilesOutsideOfPatterns
   from the commit message of patch 1.
 * Replaced "largest" with "longest" in the comment above
   max_common_dir_prefix() in the final patch.

Thanks, Stolee

Derrick Stolee (5):
  sparse-checkout: refactor skip worktree retry logic
  sparse-index: refactor path_found()
  sparse-index: use strbuf in path_found()
  sparse-index: count lstat() calls
  sparse-index: improve lstat caching of sparse paths

 sparse-index.c | 216 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 164 insertions(+), 52 deletions(-)


base-commit: 66ac6e4bcd111be3fa9c2a6b3fafea718d00678d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1754%2Fderrickstolee%2Fclear-skip-speed-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1754/derrickstolee/clear-skip-speed-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1754

Range-diff vs v2:

 1:  93d0baed0b0 ! 1:  0844cda94cf sparse-checkout: refactor skip worktree retry logic
     @@ Commit message
          stored in the index, so caching was introduced in d79d299352 (Accelerate
          clear_skip_worktree_from_present_files() by caching, 2022-01-14).
      
     -    If users are having trouble with the performance of this operation and
     -    don't care about paths outside of the sparse-checkout, they can disable
     -    them using the sparse.expectFilesOutsideOfPatterns config option
     -    introduced in ecc7c8841d (repo_read_index: add config to expect files
     -    outside sparse patterns, 2022-02-25).
     -
          This check is particularly confusing in the presence of a sparse index,
          as a sparse tree entry corresponding to an existing directory must first
          be expanded to a full index before examining the paths within. This is
 2:  69c3beaabf7 = 2:  c242e2c9168 sparse-index: refactor path_found()
 3:  0a82e6b4183 = 3:  ad63bf746ca sparse-index: use strbuf in path_found()
 4:  9549f5b8062 = 4:  db6ded0df0d sparse-index: count lstat() calls
 5:  0cb344ac14f ! 5:  1f58e19691f sparse-index: improve lstat caching of sparse paths
     @@ sparse-index.c: static void clear_path_found_data(struct path_found_data *data)
       }
       
      +/**
     -+ * Return the length of the largest common substring that ends in a
     -+ * slash ('/') to indicate the largest common parent directory. Returns
     ++ * Return the length of the longest common substring that ends in a
     ++ * slash ('/') to indicate the longest common parent directory. Returns
      + * zero if no common directory exists.
      + */
      +static size_t max_common_dir_prefix(const char *path1, const char *path2)
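
The comment above max_common_dir_prefix() describes a simple scan. The
following is a minimal C sketch written from that comment alone. It assumes
NUL-terminated paths with '/' as the only separator, and it is an
illustration rather than a copy of the code in the patch.

```c
#include <stddef.h>

/*
 * Sketch: return the length of the longest common prefix of path1 and
 * path2 that ends in a slash ('/'), i.e. the longest common parent
 * directory including its trailing slash. Returns 0 if the two paths
 * share no directory.
 */
static size_t max_common_dir_prefix(const char *path1, const char *path2)
{
	size_t common_prefix = 0;

	for (size_t i = 0; path1[i] && path2[i]; i++) {
		if (path1[i] != path2[i])
			break;
		/* Agreeing on a slash extends the shared directory prefix. */
		if (path1[i] == '/')
			common_prefix = i + 1;
	}
	return common_prefix;
}
```

For example, "a/b" and "a/bc/d" yield 2 (that is, "a/"): the characters
"a/b" match, but only a matching slash marks the end of a shared directory.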

-- 
gitgitgadget


Thread overview: 44+ messages
2024-06-20 16:11 [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Derrick Stolee via GitGitGadget
2024-06-20 16:11 ` [PATCH 1/5] sparse-index: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-24 22:12   ` Elijah Newren
2024-06-26 12:42     ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13   ` Elijah Newren
2024-06-26 12:43     ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13   ` Elijah Newren
2024-06-20 16:11 ` [PATCH 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-24 22:13   ` Elijah Newren
2024-06-20 16:11 ` [PATCH 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-24 22:14   ` Elijah Newren
2024-06-25  0:08     ` Junio C Hamano
2024-06-26 13:06     ` Derrick Stolee
2024-06-28  0:10       ` Elijah Newren
2024-06-20 19:16 ` [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-20 20:21   ` Derrick Stolee
2024-06-20 21:02     ` Junio C Hamano
2024-06-26 14:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2024-06-26 14:29   ` [PATCH v2 1/5] sparse-checkout: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-27 20:59     ` Junio C Hamano
2024-06-28  0:51       ` Elijah Newren
2024-06-28  1:49         ` Derrick Stolee
2024-06-28  5:50         ` Junio C Hamano
2024-06-28  0:31     ` Elijah Newren
2024-06-28  1:56       ` Derrick Stolee
2024-06-26 14:29   ` [PATCH v2 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29   ` [PATCH v2 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29   ` [PATCH v2 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-26 14:29   ` [PATCH v2 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-27 21:14     ` Junio C Hamano
2024-06-28  1:56       ` Derrick Stolee
2024-06-27 21:46   ` [PATCH v2 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-28  0:59     ` Elijah Newren
2024-06-28  1:57       ` Derrick Stolee
2024-06-28 12:43   ` Derrick Stolee via GitGitGadget [this message]
2024-06-28 12:43     ` [PATCH v3 1/5] sparse-checkout: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-28 12:43     ` [PATCH v3 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43     ` [PATCH v3 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43     ` [PATCH v3 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-28 12:43     ` [PATCH v3 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-28 15:07     ` [PATCH v3 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Elijah Newren
2024-06-28 19:34       ` Junio C Hamano
