From: Patrick Steinhardt <ps@pks.im>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com, newren@gmail.com, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 0/3] sparse-checkout: add 'clean' command
Date: Tue, 8 Jul 2025 14:15:04 +0200
Message-ID: <aG0LyDAUSM7F7OmH@pks.im>
In-Reply-To: <pull.1941.git.1751973594.gitgitgadget@gmail.com>

On Tue, Jul 08, 2025 at 11:19:50AM +0000, Derrick Stolee via GitGitGadget wrote:
> When using cone-mode sparse-checkout, users specify which tracked
> directories they want (recursively) and any directory not part of the parent
> paths for those directories is considered "out of scope". When changing
> sparse-checkouts, there are a variety of reasons why these "out of scope"
> directories could remain, including:
>
> * The user has .gitignore or .git/info/exclude files that tell Git to not
> remove files of a certain type.
> * Some filesystem blocker prevented the removal of a tracked file. This is
> usually more of an issue on Windows where a read handle will block file
> deletion.
>
> Typically, this would not mean too much for the user experience. A few extra
> filesystem checks might be required to satisfy git status commands, but the
> scope of the performance hit is relative to how many cruft files are left
> over in this situation.
>
> However, when using the sparse index, these tracked sparse directories cause
> significant performance issues. When noticing that the index contains a
> sparse directory but that directory exists on disk, Git needs to expand that
> sparse directory to determine which files are tracked or untracked. The
> current mechanism expands the entire index to a full one, an expensive
> operation that scales with the total number of paths at HEAD and not just
> the number of cruft files left over.
>
> Advice was added in 9479a31d603 (advice: warn when sparse index expands,
> 2024-07-08) to help users determine that they were in this state. However,
> the advice doesn't actually recommend helpful ways to get out of this state.
> Recommending "git clean" on its own is incomplete, as typically users
> actually need 'git clean -dfx' to clear out the ignored or excluded files.
> Even then, they may need 'git sparse-checkout reapply' afterwards to clear
> the sparse directories.
>
> The advice was successful in helping to alert users to the problem, which is
> how I got wind of many of these cases for how users get into this state.
> It's now time to give them a tool that helps them out of this state.

As usual for you, this is a nicely-written summary of how we got here
and why the current mechanisms are insufficient for mere mortals.
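
To make the status quo concrete, here is a minimal sketch of the manual
recovery described in the quoted text: 'git clean -dfx' followed by
'git sparse-checkout reapply'. The scratch repository, the directory names
"in" and "out", and the file "build.log" are made up for illustration, and
the leftover files are created by hand rather than by a real failed removal.

```shell
#!/bin/sh
set -e

# Scratch repository (hypothetical layout): two tracked directories and an
# ignore rule for log files.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
mkdir in out
echo a >in/a.txt
echo b >out/b.txt
echo '*.log' >.gitignore
git add .
git commit -q -m "initial"

# Narrow the working tree to "in" with cone-mode patterns.
git sparse-checkout init --cone
git sparse-checkout set in

# Simulate leftovers: an ignored file in the out-of-scope directory and
# another ignored file inside the cone.
mkdir -p out
echo noise >out/build.log
echo noise >in/build.log

# Manual cleanup as described in the quoted text. The -x flag is what makes
# this heavy-handed: it deletes in/build.log too, even though that file
# lives inside the sparse-checkout.
git clean -dfx
git sparse-checkout reapply
```

Losing in/build.log here is the collateral damage that a more targeted
command would avoid.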

> This series adds a new 'git sparse-checkout clean' command that currently
> only works for cone-mode sparse-checkouts. The only thing it does is
> collapse the index to a sparse index (as much as possible) and make sure
> that any sparse directories are removed. These directories are listed to
> stdout.
>
> A --dry-run option is available to list the directories that would be
> removed without actually deleting the directories.
>
> This option would be preferred to something like 'git clean -dfx' since it
> does not clear the excluded files that are still within the sparse-checkout.
> Instead, it performs the exact filesystem operations required to refresh the
> sparse index performance back to what is expected.
>
> I spent a few weeks debating with myself about whether or not this was the
> right interface, so please suggest alternatives if you have better ideas.
> My rejected ideas include:
>
> * 'git sparse-checkout reapply -f -x' or similar augmentations of
> 'reapply'.
> * 'git clean --sparse' to focus the clean operation on things outside of
> the sparse-checkout.
>
> The implementation is rather simple with the current CLI. Future
> augmentations could include a --quiet option to silence the output and a
> --verbose option to list the files that exist within each directory and
> would/will be removed.

One of the benefits of your new command is that we can extend it in the
future as necessary if we ever notice that there are other things that
we need to do to bring the sparse checkout up to date again. So without
yet having had a look at the implementation I think this direction is
quite sensible.
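
As a rough illustration of what a dry-run listing could report, the sketch
below prints top-level directories that exist on disk but fall outside the
cone. It is not the implementation from this series: the helper name
list_out_of_scope_dirs is hypothetical, it only looks at top-level
directories, and it does not handle paths containing whitespace. The scratch
repository and its file names are likewise made up.

```shell
#!/bin/sh
set -e

# Hypothetical helper, NOT the real implementation: print top-level
# directories that exist on disk but lie outside the cone-mode
# sparse-checkout. Must be run from the top of the working tree.
list_out_of_scope_dirs () {
	cone=$(git sparse-checkout list)
	for dir in */
	do
		test -d "$dir" || continue
		in_scope=no
		for entry in $cone
		do
			# "$dir" ends in "/", so this matches the cone directory
			# itself as well as any of its parent paths.
			case "$entry/" in
			"$dir"*) in_scope=yes ;;
			esac
		done
		test "$in_scope" = yes || printf '%s\n' "$dir"
	done
}

# Scratch repository (hypothetical layout) to try the helper on.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
mkdir in out
echo a >in/a.txt
echo b >out/b.txt
git add .
git commit -q -m "initial"
git sparse-checkout init --cone
git sparse-checkout set in

# Simulate a leftover file that keeps the out-of-scope directory alive.
mkdir -p out
echo noise >out/leftover.txt

result=$(list_out_of_scope_dirs)
printf '%s\n' "$result"
```

With the series applied, the equivalent check would presumably be
'git sparse-checkout clean --dry-run', going by the cover letter.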

Ideally it would of course be great if we could automatically fix the
issue for our users. But as we have to prune potentially-ignored data it
is very much a no-go to do that automatically.

Patrick