From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, newren@gmail.com,
	Patrick Steinhardt <ps@pks.im>, Derrick Stolee <stolee@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v2 2/8] sparse-checkout: add basics of 'clean' command
Date: Thu, 17 Jul 2025 01:34:08 +0000
Message-ID: <7e8f7c2d6c8c740d42bc6d157fa491b558b9ff6a.1752716054.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1941.v2.git.1752716054.gitgitgadget@gmail.com>

From: Derrick Stolee <stolee@gmail.com>

When users change their sparse-checkout definitions to add new
directories and remove old ones, directories that are no longer in
scope may remain on disk for a few reasons (ignored or excluded files
still exist, Windows handles are still open, etc.). When this happens,
the sparse index feature notices that a tracked, but sparse, directory
still exists on disk and thus the index expands. This causes a
performance hit _and_ the advice printed isn't very helpful. Running
'git clean' alone isn't enough: '-dfx' is generally needed, and even
that may not be sufficient.
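
As a rough illustration of the scenario above (not part of this patch),
the following shell sketch shows one way a directory can be left behind.
The repository layout and the 'build.log' file name are hypothetical, and
the sketch assumes a Git version new enough to support
'git sparse-checkout set --cone'.

```shell
# Hypothetical layout: two tracked directories and an ignore rule for *.log.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"
git config user.email "user@example.com"
mkdir -p deep/deeper1 folder1
echo content >deep/deeper1/a
echo content >folder1/a
echo "*.log" >.gitignore
git add .
git commit -q -m "initial"

# An ignored file appears inside a directory that is about to leave scope.
echo noise >folder1/build.log

# Narrow the sparse-checkout. The tracked file folder1/a is removed, but
# folder1/ itself stays on disk because the ignored file is still there.
git sparse-checkout set --cone deep/deeper1

# Without -x, 'git clean' skips ignored files, so folder1/ still remains.
git clean -df
ls folder1
```

The leftover folder1/ is exactly the kind of tracked-but-sparse directory
that causes the sparse index to expand.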

Add a new subcommand to 'git sparse-checkout' that removes these
tracked-but-sparse directories. This necessarily removes all files
contained within, including tracked and untracked files. Of particular
importance are ignored and excluded files which would normally be
ignored even by 'git clean -f' unless the '-x' or '-X' option is
provided. This is the most extreme way to remove these directories,
but it works when the sparse-checkout is in cone mode and is expected
to rescope based on directories, not files.

The current implementation always deletes these sparse directories
without warning. This is unacceptable for a released version, but the
missing safeguards will be added in changes coming immediately after
this one.

Note that untracked directories within the sparse-checkout remain.
Further, directories that contain staged changes or files in merge
conflict states are not deleted. This is a detail that is partly hidden
by the implementation which relies on collapsing the index to a sparse
index in-memory and only deleting directories that are listed as sparse
in the index.

If a staged change exists, then that entry is not stored as a sparse
tree entry and thus remains on-disk until committed or reset.

There are some interesting cases around merge conflict resolution, but
those will be carefully analyzed in the future.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
 Documentation/git-sparse-checkout.adoc | 11 ++++-
 builtin/sparse-checkout.c              | 64 +++++++++++++++++++++++++-
 t/t1091-sparse-checkout-builtin.sh     | 38 +++++++++++++++
 3 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-sparse-checkout.adoc b/Documentation/git-sparse-checkout.adoc
index 529a8edd9c1e..6db88f00781d 100644
--- a/Documentation/git-sparse-checkout.adoc
+++ b/Documentation/git-sparse-checkout.adoc
@@ -9,7 +9,7 @@ git-sparse-checkout - Reduce your working tree to a subset of tracked files
 SYNOPSIS
 --------
 [verse]
-'git sparse-checkout' (init | list | set | add | reapply | disable | check-rules) [<options>]
+'git sparse-checkout' (init | list | set | add | reapply | disable | check-rules | clean) [<options>]
 
 
 DESCRIPTION
@@ -111,6 +111,15 @@ flags, with the same meaning as the flags from the `set` command, in order
 to change which sparsity mode you are using without needing to also respecify
 all sparsity paths.
 
+'clean'::
+	Remove all files in tracked directories that are outside of the
+	sparse-checkout definition. This subcommand requires cone-mode
+	sparse-checkout so that Git can be sure which directories are
+	tracked and have no contained paths in the sparse-checkout.
+	This command can be used to be sure the sparse index works
+	efficiently, though it does not require enabling the sparse index
+	feature via the `index.sparse=true` configuration.
+
 'disable'::
 	Disable the `core.sparseCheckout` config setting, and restore the
 	working directory to include all files.
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 61714bf80be0..6fe6ec718fe3 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -2,6 +2,7 @@
 #define DISABLE_SIGN_COMPARE_WARNINGS
 
 #include "builtin.h"
+#include "abspath.h"
 #include "config.h"
 #include "dir.h"
 #include "environment.h"
@@ -23,7 +24,7 @@
 static const char *empty_base = "";
 
 static char const * const builtin_sparse_checkout_usage[] = {
-	N_("git sparse-checkout (init | list | set | add | reapply | disable | check-rules) [<options>]"),
+	N_("git sparse-checkout (init | list | set | add | reapply | disable | check-rules | clean) [<options>]"),
 	NULL
 };
 
@@ -924,6 +925,66 @@ static int sparse_checkout_reapply(int argc, const char **argv,
 	return update_working_directory(repo, NULL);
 }
 
+static char const * const builtin_sparse_checkout_clean_usage[] = {
+	"git sparse-checkout clean [-n|--dry-run]",
+	NULL
+};
+
+static const char *msg_remove = N_("Removing %s\n");
+
+static int sparse_checkout_clean(int argc, const char **argv,
+				   const char *prefix,
+				   struct repository *repo)
+{
+	struct strbuf full_path = STRBUF_INIT;
+	const char *msg = msg_remove;
+	size_t worktree_len;
+
+	struct option builtin_sparse_checkout_clean_options[] = {
+		OPT_END(),
+	};
+
+	setup_work_tree();
+	if (!repo->settings.sparse_checkout)
+		die(_("must be in a sparse-checkout to clean directories"));
+	if (!repo->settings.sparse_checkout_cone)
+		die(_("must be in a cone-mode sparse-checkout to clean directories"));
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_sparse_checkout_clean_options,
+			     builtin_sparse_checkout_clean_usage, 0);
+
+	if (repo_read_index(repo) < 0)
+		die(_("failed to read index"));
+
+	if (convert_to_sparse(repo->index, SPARSE_INDEX_MEMORY_ONLY) ||
+	    repo->index->sparse_index == INDEX_EXPANDED)
+		die(_("failed to convert index to a sparse index; resolve merge conflicts and try again"));
+
+	strbuf_addstr(&full_path, repo->worktree);
+	strbuf_addch(&full_path, '/');
+	worktree_len = full_path.len;
+
+	for (size_t i = 0; i < repo->index->cache_nr; i++) {
+		struct cache_entry *ce = repo->index->cache[i];
+		if (!S_ISSPARSEDIR(ce->ce_mode))
+			continue;
+		strbuf_setlen(&full_path, worktree_len);
+		strbuf_add(&full_path, ce->name, ce->ce_namelen);
+
+		if (!is_directory(full_path.buf))
+			continue;
+
+		printf(_(msg), ce->name);
+
+		if (remove_dir_recursively(&full_path, 0))
+			warning_errno(_("failed to remove '%s'"), ce->name);
+	}
+
+	strbuf_release(&full_path);
+	return 0;
+}
+
 static char const * const builtin_sparse_checkout_disable_usage[] = {
 	"git sparse-checkout disable",
 	NULL
@@ -1079,6 +1140,7 @@ int cmd_sparse_checkout(int argc,
 		OPT_SUBCOMMAND("set", &fn, sparse_checkout_set),
 		OPT_SUBCOMMAND("add", &fn, sparse_checkout_add),
 		OPT_SUBCOMMAND("reapply", &fn, sparse_checkout_reapply),
+		OPT_SUBCOMMAND("clean", &fn, sparse_checkout_clean),
 		OPT_SUBCOMMAND("disable", &fn, sparse_checkout_disable),
 		OPT_SUBCOMMAND("check-rules", &fn, sparse_checkout_check_rules),
 		OPT_END(),
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index ab3a105ffff2..a48eedf766d2 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -1050,5 +1050,43 @@ test_expect_success 'check-rules null termination' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clean' '
+	git -C repo sparse-checkout set --cone deep/deeper1 &&
+	mkdir repo/deep/deeper2 repo/folder1 &&
+	touch repo/deep/deeper2/file &&
+	touch repo/folder1/file &&
+
+	cat >expect <<-\EOF &&
+	Removing deep/deeper2/
+	Removing folder1/
+	EOF
+
+	git -C repo sparse-checkout clean >out &&
+	test_cmp expect out &&
+
+	test_path_is_missing repo/deep/deeper2 &&
+	test_path_is_missing repo/folder1
+'
+
+test_expect_success 'clean with staged sparse change' '
+	git -C repo sparse-checkout set --cone deep/deeper1 &&
+	mkdir repo/deep/deeper2 repo/folder1 repo/folder2 &&
+	touch repo/deep/deeper2/file &&
+	touch repo/folder1/file &&
+	echo dirty >repo/folder2/a &&
+
+	git -C repo add --sparse folder1/file &&
+
+	# deletes deep/deeper2/ but leaves folder1/ and folder2/
+	cat >expect <<-\EOF &&
+	Removing deep/deeper2/
+	EOF
+
+	git -C repo sparse-checkout clean >out &&
+	test_cmp expect out &&
+
+	test_path_is_missing repo/deep/deeper2 &&
+	test_path_exists repo/folder1
+'
 
 test_done
-- 
gitgitgadget

