From: "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, peff@peff.net,
	ZheNing Hu <adlternative@gmail.com>,
	ZheNing Hu <adlternative@gmail.com>
Subject: [PATCH v4] gc: add `--expire-to` option
Date: Fri, 24 Jan 2025 07:49:14 +0000
Message-ID: <pull.1843.v4.git.1737704954987.gitgitgadget@gmail.com>
In-Reply-To: <pull.1843.v3.git.1736994932003.gitgitgadget@gmail.com>

From: ZheNing Hu <adlternative@gmail.com>

This commit extends `git gc` with a new option, `--expire-to=<dir>`.
The feature was previously implemented for `git repack` in
91badeba32 (builtin/repack.c: implement `--expire-to` for storing
pruned objects, 2022-10-24), which allows users to specify a
directory where unreachable, expired objects are stored as a cruft
pack during garbage collection. However, `git gc` did not expose it,
so users had to run `git repack --cruft --expire-to=<dir>` followed
by `git prune` to get a result similar to `git gc`.

By introducing `--expire-to=<dir>` directly into `git gc`,
we simplify the process for users who wish to manage their
repository's cleanup more efficiently. This change involves
passing the `--expire-to=<dir>` parameter through to `git repack`,
making it easier for users to set up a backup location for cruft
packs that will be pruned.

Originally, `git gc --prune=now` deleted all unreachable objects by
passing `-a` to `git repack`. When `--cruft` and `--expire-to` are
both given, this default has to change: instead of being deleted,
the unreachable objects should be merged into a cruft pack and
collected in the specified directory. Therefore, in that case we do
not pass `-a` to repack but instead pass `--cruft`, `--expire-to`,
and `--cruft-expiration=now`.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    gc: add --expire-to option
    
    I want to perform a "safe" garbage collection for the Git repository on
    the server, which avoids data corruption issues caused by concurrent
    pushes during git GC. To achieve this, I currently need to use git
    repack --cruft --expire-to=<dir> and git prune in combination. However,
    it would be simpler if we could directly use --expire-to=<dir> with the
    git-gc command.
    
    v1: add --expire-to option to gc
    v1 -> v2: fix git gc --prune=now with --expire-to
    v2 -> v3: squash the two patches into one
    v3 -> v4: update docs and commit message, and add more tests
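
For reviewers, the option selection described in the commit message can be sketched as a small standalone C program fragment. This is only an illustration, not the code in builtin/gc.c: `struct gc_opts`, `push_arg()`, `repack_all_args()`, `MAX_ARGS` and `ARG_LEN` are hypothetical names, the fixed-size array stands in for git's strvec, and `--max-cruft-size` and keep-pack handling are left out. The `--cruft-expiration` and `--unpack-unreachable` branches are inferred from the test expectations in this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 8
#define ARG_LEN 256

/* Hypothetical stand-in for the gc_config fields relevant here. */
struct gc_opts {
	const char *prune_expire; /* e.g. "now", "2.weeks.ago", or NULL */
	int cruft_packs;          /* non-zero when --cruft is in effect */
	const char *expire_to;    /* NULL when --expire-to is not given */
};

/* Append "<prefix><value>" to args; value may be NULL for bare flags. */
static void push_arg(char args[][ARG_LEN], int *n,
		     const char *prefix, const char *value)
{
	assert(*n < MAX_ARGS);
	snprintf(args[*n], ARG_LEN, "%s%s", prefix, value ? value : "");
	(*n)++;
}

/*
 * Fill args with the "repack everything" options and return how many
 * were written. "-a" is used for --prune=now only when the
 * --cruft/--expire-to combination is not requested.
 */
static int repack_all_args(const struct gc_opts *o, char args[][ARG_LEN])
{
	int n = 0;
	int prune_now = o->prune_expire && !strcmp(o->prune_expire, "now");

	if (prune_now && !(o->cruft_packs && o->expire_to)) {
		push_arg(args, &n, "-a", NULL);
	} else if (o->cruft_packs) {
		push_arg(args, &n, "--cruft", NULL);
		if (o->prune_expire)
			push_arg(args, &n, "--cruft-expiration=", o->prune_expire);
		if (o->expire_to)
			push_arg(args, &n, "--expire-to=", o->expire_to);
	} else {
		push_arg(args, &n, "-A", NULL);
		if (o->prune_expire)
			push_arg(args, &n, "--unpack-unreachable=", o->prune_expire);
	}
	return n;
}
```

Calling `repack_all_args()` with `prune_expire = "now"`, `cruft_packs = 1` and a non-NULL `expire_to` yields `--cruft`, `--cruft-expiration=now` and `--expire-to=<dir>`, which is the case this patch changes; every other `--prune=now` combination still yields a lone `-a`.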

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1843%2Fadlternative%2Fzh%2Fgc-expire-to-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1843/adlternative/zh/gc-expire-to-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1843

Range-diff vs v3:

 1:  0842ec34948 ! 1:  6946ccde275 gc: add `--expire-to` option
     @@ Commit message
      
          This commit extends the functionality of `git gc`
          by adding a new option, `--expire-to=<dir>`. Previously,
     -    this feature was implemented in `git repack` (see 91badeb),
     -    allowing users to specify a directory where unreachable and
     -    expired cruft packs are stored during garbage collection.
     +    this feature was implemented in 91badeba32 (builtin/repack.c:
     +    implement `--expire-to` for storing pruned objects, 2022-10-24),
     +    which allowing users to specify a directory where unreachable
     +    and expired cruft packs are stored during garbage collection.
          However, users had to run `git repack --cruft --expire-to=<dir>`
          followed by `git prune` to achieve similar results within `git gc`.
      
     @@ Commit message
          making it easier for users to set up a backup location for cruft
          packs that will be pruned.
      
     -    Note: When git-gc is used with both `--cruft` and `--expire-to`,
     -    it does not pass `-a` to git-repack to delete all unreachable
     -    objects as `git gc --prune=now` originally did. Instead, it
     -    generates a cruft pack in the directory specified by expire-to.
     +    Due to the original `git gc --prune=now` deleting all unreachable
     +    objects by passing the `-a` parameter to git repack. With the
     +    addition of the `--cruft` and `--expire-to` options, it is necessary
     +    to modify this default behavior: instead of deleting these
     +    unreachable objects, they should be merged into a cruft pack and
     +    collected in a specified directory. Therefore, we do not pass `-a`
     +    to the repack command but instead pass `--cruft`, `--expire-to`,
     +    and `--cruft-expiration=now` to repack.
      
          Signed-off-by: ZheNing Hu <adlternative@gmail.com>
      
     @@ Documentation/git-gc.txt: be performed as well.
      +--expire-to=<dir>::
      +	When packing unreachable objects into a cruft pack, write a cruft
      +	pack containing pruned objects (if any) to the directory `<dir>`.
     ++	This option only has an effect when used together with `--cruft`.
      +	See the `--expire-to` option of linkgit:git-repack[1] for
     -+	more.
     ++	more information.
      +
       --prune=<date>::
       	Prune loose objects older than date (default is 2 weeks ago,
     @@ t/t6500-gc.sh: test_expect_success 'gc.maxCruftSize sets appropriate repack opti
       	test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
       '
       
     -+test_expect_success '--expire-to sets appropriate repack options' '
     ++test_expect_success '--expire-to sets repack --expire-to' '
     ++	rm -rf expired &&
      +	mkdir expired &&
     -+	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --expire-to=./expired/pack &&
     -+	test_subcommand $cruft_max_size_opts --expire-to=./expired/pack <trace2.txt
     ++	expire_to="$(pwd)/expired/pack" &&
     ++	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --expire-to="$expire_to" &&
     ++	test_subcommand $cruft_max_size_opts --expire-to="$expire_to" <trace2.txt
     ++'
     ++
     ++test_expect_success '--expire-to with --prune=now sets repack --expire-to' '
     ++	rm -rf expired &&
     ++	mkdir expired &&
     ++	expire_to="$(pwd)/expired/pack" &&
     ++	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --prune=now --expire-to="$expire_to" &&
     ++	test_subcommand git repack -d -l --cruft --cruft-expiration=now --expire-to="$expire_to" <trace2.txt
     ++'
     ++
     ++
     ++test_expect_success '--expire-to with --no-cruft sets repack -A' '
     ++	rm -rf expired &&
     ++	mkdir expired &&
     ++	expire_to="$(pwd)/expired/pack" &&
     ++	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --no-cruft --expire-to="$expire_to" &&
     ++	test_subcommand git repack -d -l -A --unpack-unreachable=2.weeks.ago <trace2.txt
     ++'
     ++
     ++test_expect_success '--expire-to with --no-cruft sets repack -a' '
     ++	rm -rf expired &&
     ++	mkdir expired &&
     ++	expire_to="$(pwd)/expired/pack" &&
     ++	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --no-cruft --prune=now --expire-to="$expire_to" &&
     ++	test_subcommand git repack -d -l -a <trace2.txt
      +'
      +
       run_and_wait_for_gc () {


 Documentation/git-gc.txt |  7 +++++++
 builtin/gc.c             |  9 +++++++--
 t/t6500-gc.sh            | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 370e22faaeb..0eac8e85f08 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -69,6 +69,13 @@ be performed as well.
 	the `--max-cruft-size` option of linkgit:git-repack[1] for
 	more.
 
+--expire-to=<dir>::
+	When packing unreachable objects into a cruft pack, write a cruft
+	pack containing pruned objects (if any) to the directory `<dir>`.
+	This option only has an effect when used together with `--cruft`.
+	See the `--expire-to` option of linkgit:git-repack[1] for
+	more information.
+
 --prune=<date>::
 	Prune loose objects older than date (default is 2 weeks ago,
 	overridable by the config variable `gc.pruneExpire`).
diff --git a/builtin/gc.c b/builtin/gc.c
index d52735354c9..8656e1caff0 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -136,6 +136,7 @@ struct gc_config {
 	char *prune_worktrees_expire;
 	char *repack_filter;
 	char *repack_filter_to;
+	char *repack_expire_to;
 	unsigned long big_pack_threshold;
 	unsigned long max_delta_cache_size;
 };
@@ -432,7 +433,8 @@ static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
 static void add_repack_all_option(struct gc_config *cfg,
 				  struct string_list *keep_pack)
 {
-	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now"))
+	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
+		&& !(cfg->cruft_packs && cfg->repack_expire_to))
 		strvec_push(&repack, "-a");
 	else if (cfg->cruft_packs) {
 		strvec_push(&repack, "--cruft");
@@ -441,6 +443,8 @@ static void add_repack_all_option(struct gc_config *cfg,
 		if (cfg->max_cruft_size)
 			strvec_pushf(&repack, "--max-cruft-size=%lu",
 				     cfg->max_cruft_size);
+		if (cfg->repack_expire_to)
+			strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);
 	} else {
 		strvec_push(&repack, "-A");
 		if (cfg->prune_expire)
@@ -675,7 +679,6 @@ struct repository *repo UNUSED)
 	const char *prune_expire_sentinel = "sentinel";
 	const char *prune_expire_arg = prune_expire_sentinel;
 	int ret;
-
 	struct option builtin_gc_options[] = {
 		OPT__QUIET(&quiet, N_("suppress progress reporting")),
 		{ OPTION_STRING, 0, "prune", &prune_expire_arg, N_("date"),
@@ -694,6 +697,8 @@ struct repository *repo UNUSED)
 			   PARSE_OPT_NOCOMPLETE),
 		OPT_BOOL(0, "keep-largest-pack", &keep_largest_pack,
 			 N_("repack all other packs except the largest pack")),
+		OPT_STRING(0, "expire-to", &cfg.repack_expire_to, N_("dir"),
+			   N_("pack prefix to store a pack containing pruned objects")),
 		OPT_END()
 	};
 
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index ee074b99b70..74f7bd09046 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -339,6 +339,39 @@ test_expect_success 'gc.maxCruftSize sets appropriate repack options' '
 	test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
 '
 
+test_expect_success '--expire-to sets repack --expire-to' '
+	rm -rf expired &&
+	mkdir expired &&
+	expire_to="$(pwd)/expired/pack" &&
+	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --expire-to="$expire_to" &&
+	test_subcommand $cruft_max_size_opts --expire-to="$expire_to" <trace2.txt
+'
+
+test_expect_success '--expire-to with --prune=now sets repack --expire-to' '
+	rm -rf expired &&
+	mkdir expired &&
+	expire_to="$(pwd)/expired/pack" &&
+	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --prune=now --expire-to="$expire_to" &&
+	test_subcommand git repack -d -l --cruft --cruft-expiration=now --expire-to="$expire_to" <trace2.txt
+'
+
+
+test_expect_success '--expire-to with --no-cruft sets repack -A' '
+	rm -rf expired &&
+	mkdir expired &&
+	expire_to="$(pwd)/expired/pack" &&
+	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --no-cruft --expire-to="$expire_to" &&
+	test_subcommand git repack -d -l -A --unpack-unreachable=2.weeks.ago <trace2.txt
+'
+
+test_expect_success '--expire-to with --no-cruft sets repack -a' '
+	rm -rf expired &&
+	mkdir expired &&
+	expire_to="$(pwd)/expired/pack" &&
+	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --no-cruft --prune=now --expire-to="$expire_to" &&
+	test_subcommand git repack -d -l -a <trace2.txt
+'
+
 run_and_wait_for_gc () {
 	# We read stdout from gc for the side effect of waiting until the
 	# background gc process exits, closing its fd 9.  Furthermore, the

base-commit: 92999a42db1c5f43f330e4f2bca4026b5b81576f
-- 
gitgitgadget
