git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] maintenance: add prune-remote-refs task
@ 2024-12-23  9:36 Shubham Kanodia via GitGitGadget
  2024-12-27  9:07 ` Junio C Hamano
  2024-12-28 10:07 ` [PATCH v2] " Shubham Kanodia via GitGitGadget
  0 siblings, 2 replies; 15+ messages in thread
From: Shubham Kanodia via GitGitGadget @ 2024-12-23  9:36 UTC (permalink / raw)
  To: git; +Cc: "mailto:gitster@pobox.com" <

From: Shubham Kanodia <shubham.kanodia10@gmail.com>

Remote-tracking refs can accumulate in local repositories even as branches
are deleted on remotes, impacting git performance negatively. Existing
alternatives to keep refs pruned have a few issues — 

1. The `fetch.prune` config automatically cleans up remote ref on fetch,
but also pulls in new ref from remote which is an undesirable side-effect.

2.`git remote prune` cleans up refs without adding to the existing list
but requires periodic user intervention.

This adds a new maintenance task 'prune-remote-refs' that runs
'git remote prune' for each configured remote daily. This provides an
automated way to clean up stale remote-tracking refs — especially when
users may not do a full fetch.

This task is disabled by default.

Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
---
    maintenance: add prune-remote-refs task
    
    As discussed previously on:
    https://lore.kernel.org/git/xmqqwmfr112w.fsf@gitster.g/T/#t
    
    Remote-tracking refs can accumulate in local repositories even as
    branches are deleted on remotes, impacting git performance negatively.
    Existing alternatives to keep refs pruned have a few issues — 
    
     1. The fetch.prune config automatically cleans up remote ref on fetch,
        but also pulls in new ref from remote which is an undesirable
        side-effect.
    
    2.git remote prune cleans up refs without adding to the existing list
    but requires periodic user intervention.
    
    This adds a new maintenance task 'prune-remote-refs' that runs 'git
    remote prune' for each configured remote daily. This provides an
    automated way to clean up stale remote-tracking refs — especially when
    users may not do a full fetch.
    
    This task is disabled by default.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1838%2Fpastelsky%2Fsk%2Fadd-remote-prune-maintenance-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1838/pastelsky/sk/add-remote-prune-maintenance-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1838

 Documentation/git-maintenance.txt | 20 ++++++++++++++
 builtin/gc.c                      | 42 +++++++++++++++++++++++++++++
 t/t7900-maintenance.sh            | 44 +++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 6e6651309d3..0c8f1e01ccd 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -158,6 +158,26 @@ pack-refs::
 	need to iterate across many references. See linkgit:git-pack-refs[1]
 	for more information.
 
+prune-remote-refs::
+	The `prune-remote-refs` task runs `git remote prune` on each remote
+	repository registered in the local repository. This task helps clean
+	up deleted remote branches, improving the performance of operations
+	that iterate through the refs. See linkgit:git-remote[1] for more
+	information. This task is disabled by default.
++
+NOTE: This task is opt-in to prevent unexpected removal of remote refs
+for users of git-maintenance. For most users, configuring `fetch.prune=true`
+is a acceptable solution, as it will automatically clean up stale remote-tracking
+branches during normal fetch operations. However, this task can be useful in
+specific scenarios:
++
+--
+* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
+  where `fetch.prune` would not affect refs outside the fetched hierarchy
+* When third-party tools might perform unexpected full fetches, and you want
+  periodic cleanup independently of fetch operations
+--
+
 OPTIONS
 -------
 --auto::
diff --git a/builtin/gc.c b/builtin/gc.c
index 4ae5196aedf..9acf1d29895 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "lockfile.h"
 #include "parse-options.h"
 #include "run-command.h"
+#include "remote.h"
 #include "sigchain.h"
 #include "strvec.h"
 #include "commit.h"
@@ -913,6 +914,40 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg,
 	return 0;
 }
 
+static int collect_remote(struct remote *remote, void *cb_data)
+{
+	struct string_list *list = cb_data;
+
+	if (!remote->url.nr)
+		return 0;
+
+	string_list_append(list, remote->name);
+	return 0;
+}
+
+static int maintenance_task_prune_remote(struct maintenance_run_opts *opts UNUSED,
+					 struct gc_config *cfg UNUSED)
+{
+	struct string_list_item *item;
+	struct string_list remotes_list = STRING_LIST_INIT_NODUP;
+	struct child_process child = CHILD_PROCESS_INIT;
+	int result = 0;
+
+	for_each_remote(collect_remote, &remotes_list);
+
+	for_each_string_list_item (item, &remotes_list) {
+		const char *remote_name = item->string;
+		child.git_cmd = 1;
+		strvec_pushl(&child.args, "remote", "prune", remote_name, NULL);
+
+		if (run_command(&child))
+			result = error(_("failed to prune '%s'"), remote_name);
+	}
+
+	string_list_clear(&remotes_list, 0);
+	return result;
+}
+
 /* Remember to update object flag allocation in object.h */
 #define SEEN		(1u<<0)
 
@@ -1375,6 +1410,7 @@ enum maintenance_task_label {
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
 	TASK_PACK_REFS,
+	TASK_PRUNE_REMOTE_REFS,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -1411,6 +1447,10 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_pack_refs,
 		pack_refs_condition,
 	},
+	[TASK_PRUNE_REMOTE_REFS] = {
+		"prune-remote-refs",
+		maintenance_task_prune_remote,
+	},
 };
 
 static int compare_tasks_by_selection(const void *a_, const void *b_)
@@ -1505,6 +1545,8 @@ static void initialize_maintenance_strategy(void)
 		tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY;
 		tasks[TASK_PACK_REFS].enabled = 1;
 		tasks[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY;
+		tasks[TASK_PRUNE_REMOTE_REFS].enabled = 0;
+		tasks[TASK_PRUNE_REMOTE_REFS].schedule = SCHEDULE_DAILY;
 	}
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 0ce4ba1cbef..60a0c3f8353 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -446,6 +446,50 @@ test_expect_success 'pack-refs task' '
 	test_subcommand git pack-refs --all --prune <pack-refs.txt
 '
 
+test_expect_success 'prune-remote-refs task not enabled by default' '
+	git clone . prune-test &&
+	(
+		cd prune-test &&
+		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run 2>err &&
+		test_subcommand ! git remote prune origin <prune.txt
+	)
+'
+
+test_expect_success 'prune-remote-refs task cleans stale remote refs' '
+	test_commit initial &&
+
+	# Create two separate remote repos
+	git clone . remote1 &&
+	git clone . remote2 &&
+
+	git clone . prune-test-clean &&
+	(
+		cd prune-test-clean &&
+		git config maintenance.prune-remote-refs.enabled true &&
+
+		# Add both remotes
+		git remote add remote1 "../remote1" &&
+		git remote add remote2 "../remote2" &&
+
+		# Create and push branches to both remotes
+		git branch -f side2 HEAD &&
+		git push remote1 side2 &&
+		git push remote2 side2 &&
+
+		# Rename branches in each remote to simulate a stale branch
+		git -C ../remote1 branch -m side2 side3 &&
+		git -C ../remote2 branch -m side2 side4 &&
+
+		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run --task=prune-remote-refs &&
+
+		# Verify pruning happened for both remotes
+		test_subcommand git remote prune remote1 <prune.txt &&
+		test_subcommand git remote prune remote2 <prune.txt &&
+		test_must_fail git rev-parse refs/remotes/remote1/side2 &&
+		test_must_fail git rev-parse refs/remotes/remote2/side2
+	)
+'
+
 test_expect_success '--auto and --schedule incompatible' '
 	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
 	test_grep "at most one" err

base-commit: 063bcebf0c917140ca0e705cbe0fdea127e90086
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] maintenance: add prune-remote-refs task
  2024-12-23  9:36 [PATCH] maintenance: add prune-remote-refs task Shubham Kanodia via GitGitGadget
@ 2024-12-27  9:07 ` Junio C Hamano
  2024-12-28  9:58   ` Shubham Kanodia
  2024-12-28 10:07 ` [PATCH v2] " Shubham Kanodia via GitGitGadget
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2024-12-27  9:07 UTC (permalink / raw)
  To: Shubham Kanodia via GitGitGadget; +Cc: git, ps, Shubham Kanodia

Thanks for a patch.


"Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes:

You'd want to check your procedure to tell GGG about addresses; I am
seeing these

    From: "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com>
    To: git@vger.kernel.org
    Cc: "mailto:gitster@pobox.com" <[gitster@pobox.com]>,
            "mailto:ps@pks.im" <[ps@pks.im]>,
            Shubham Kanodia <shubham.kanodia10@gmail.com>,
            Shubham Kanodia <shubham.kanodia10@gmail.com>

and Cc addresses in it would probably not work as-is (I've fixed
them up manually).

> From: Shubham Kanodia <shubham.kanodia10@gmail.com>
>
> Remote-tracking refs can accumulate in local repositories even as branches
> are deleted on remotes, impacting git performance negatively. Existing
> alternatives to keep refs pruned have a few issues:
>
> 1. The `fetch.prune` config automatically cleans up remote ref on fetch,
> but also pulls in new ref from remote which is an undesirable side-effect.

This makes it sound as if fetch.prune configuration makes new refs
pulled, but that is not what happens and that is not what you wanted
to hint.

	If you run "git fetch" with the "--prune" option (or with
	the fetch.prune configuration set to true) while having the
	default refspec "+refs/heads/*:refs/remotes/$name/*"
	configured in remote.$name.fetch, then ...

> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 6e6651309d3..0c8f1e01ccd 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -158,6 +158,26 @@ pack-refs::
>  	need to iterate across many references. See linkgit:git-pack-refs[1]
>  	for more information.
>  
> +prune-remote-refs::
> +	The `prune-remote-refs` task runs `git remote prune` on each remote
> +	repository registered in the local repository. This task helps clean
> +	up deleted remote branches, improving the performance of operations
> +	that iterate through the refs. See linkgit:git-remote[1] for more
> +	information. This task is disabled by default.
> ++
> +NOTE: This task is opt-in to prevent unexpected removal of remote refs
> +for users of git-maintenance. For most users, configuring `fetch.prune=true`
> +is a acceptable solution, as it will automatically clean up stale remote-tracking
> +branches during normal fetch operations. However, this task can be useful in
> +specific scenarios:
> ++
> +--
> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
> +  where `fetch.prune` would not affect refs outside the fetched hierarchy

The word "hierarchy" hints that things under refs/remotes/origin/
(which is the hierarchy 'foo' is fetched into) that went away would
be pruned, but that is not what happens (otherwise you would not be
adding this feature).

> +* When third-party tools might perform unexpected full fetches, and you want
> +  periodic cleanup independently of fetch operations

You'd want a full-stop after these two sentences, by the way.

> diff --git a/builtin/gc.c b/builtin/gc.c
> index 4ae5196aedf..9acf1d29895 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -20,6 +20,7 @@
>  #include "lockfile.h"
>  #include "parse-options.h"
>  #include "run-command.h"
> +#include "remote.h"
>  #include "sigchain.h"
>  #include "strvec.h"
>  #include "commit.h"
> @@ -913,6 +914,40 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg,
>  	return 0;
>  }
>  
> +static int collect_remote(struct remote *remote, void *cb_data)
> +{
> +	struct string_list *list = cb_data;
> +
> +	if (!remote->url.nr)
> +		return 0;
> +
> +	string_list_append(list, remote->name);
> +	return 0;
> +}
> +
> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts UNUSED,
> +					 struct gc_config *cfg UNUSED)
> +{
> +	struct string_list_item *item;
> +	struct string_list remotes_list = STRING_LIST_INIT_NODUP;
> +	struct child_process child = CHILD_PROCESS_INIT;
> +	int result = 0;
> +
> +	for_each_remote(collect_remote, &remotes_list);
> +
> +	for_each_string_list_item (item, &remotes_list) {
> +		const char *remote_name = item->string;
> +		child.git_cmd = 1;
> +		strvec_pushl(&child.args, "remote", "prune", remote_name, NULL);
> +
> +		if (run_command(&child))
> +			result = error(_("failed to prune '%s'"), remote_name);
> +	}

Hmph, is there a reason why you need two loops, instead of
for-each-remote calling a function that does the run_command()
thing?

"git grep for_each_string_list_item \*.c" tells me that we almost
never write SP between the macro name and the opening parenthesis.

This loop does not stop at the first error, but returns a non-zero
error after noticing even a single remote fail to run prune, which
sounds like a seneible design.  Would an error percolate up the same
way when two different tasks run and one of them fails in the
control folow in "git maintenance"?  Just want to see if we are
being consistent with the surrounding code.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] maintenance: add prune-remote-refs task
  2024-12-27  9:07 ` Junio C Hamano
@ 2024-12-28  9:58   ` Shubham Kanodia
  2024-12-28 16:05     ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Shubham Kanodia @ 2024-12-28  9:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shubham Kanodia via GitGitGadget, git, ps

On Fri, Dec 27, 2024 at 2:37 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Thanks for a patch.
>
>
> "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> You'd want to check your procedure to tell GGG about addresses; I am
> seeing these
>
>     From: "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com>
>     To: git@vger.kernel.org
>     Cc: "mailto:gitster@pobox.com" <[gitster@pobox.com]>,
>             "mailto:ps@pks.im" <[ps@pks.im]>,
>             Shubham Kanodia <shubham.kanodia10@gmail.com>,
>             Shubham Kanodia <shubham.kanodia10@gmail.com>
>
> and Cc addresses in it would probably not work as-is (I've fixed
> them up manually).

I think the GGG comment had a few formatting errors. Thanks for fixing the cc.

> Hmph, is there a reason why you need two loops, instead of
> for-each-remote calling a function that does the run_command()
> thing?

It can be collapsed into one.

> This loop does not stop at the first error, but returns a non-zero
> error after noticing even a single remote fail to run prune, which
> sounds like a seneible design.  Would an error percolate up the same
> way when two different tasks run and one of them fails in the
> control folow in "git maintenance"?  Just want to see if we are
> being consistent with the surrounding code.

Fair point. I'll make the process flow identical to the prefetch refs
task that works similarly across remotes.
It returns as soon as the first remote fails (without necessarily
affecting other tasks).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] maintenance: add prune-remote-refs task
  2024-12-23  9:36 [PATCH] maintenance: add prune-remote-refs task Shubham Kanodia via GitGitGadget
  2024-12-27  9:07 ` Junio C Hamano
@ 2024-12-28 10:07 ` Shubham Kanodia via GitGitGadget
  2024-12-28 16:25   ` Junio C Hamano
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Shubham Kanodia via GitGitGadget @ 2024-12-28 10:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Shubham Kanodia,
	Shubham Kanodia

From: Shubham Kanodia <shubham.kanodia10@gmail.com>

Remote-tracking refs can accumulate in local repositories even as branches
are deleted on remotes, impacting git performance negatively. Existing
alternatives to keep refs pruned have a few issues — 

1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will
prune stale refs, but requires a manual operation and also pulls in new
refs from remote which can be an undesirable side-effect.

2.`git remote prune` cleans up refs without adding to the existing list
but requires periodic user intervention.

This adds a new maintenance task 'prune-remote-refs' that runs
'git remote prune' for each configured remote daily. This provides an
automated way to clean up stale remote-tracking refs — especially when
users may not do a full fetch.

This task is disabled by default.

Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
---
    maintenance: add prune-remote-refs task
    
    As discussed previously on:
    https://lore.kernel.org/git/xmqqwmfr112w.fsf@gitster.g/T/#t
    
    Remote-tracking refs can accumulate in local repositories even as
    branches are deleted on remotes, impacting git performance negatively.
    Existing alternatives to keep refs pruned have a few issues — 
    
     1. The fetch.prune config automatically cleans up remote ref on fetch,
        but also pulls in new ref from remote which is an undesirable
        side-effect.
    
    2.git remote prune cleans up refs without adding to the existing list
    but requires periodic user intervention.
    
    This adds a new maintenance task 'prune-remote-refs' that runs 'git
    remote prune' for each configured remote daily. This provides an
    automated way to clean up stale remote-tracking refs — especially when
    users may not do a full fetch.
    
    This task is disabled by default.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1838%2Fpastelsky%2Fsk%2Fadd-remote-prune-maintenance-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1838/pastelsky/sk/add-remote-prune-maintenance-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1838

Range-diff vs v1:

 1:  72e27d3ebe6 ! 1:  4d6c143c970 maintenance: add prune-remote-refs task
     @@ Commit message
          are deleted on remotes, impacting git performance negatively. Existing
          alternatives to keep refs pruned have a few issues — 
      
     -    1. The `fetch.prune` config automatically cleans up remote ref on fetch,
     -    but also pulls in new ref from remote which is an undesirable side-effect.
     +    1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will
     +    prune stale refs, but requires a manual operation and also pulls in new
     +    refs from remote which can be an undesirable side-effect.
      
          2.`git remote prune` cleans up refs without adding to the existing list
          but requires periodic user intervention.
     @@ Documentation/git-maintenance.txt: pack-refs::
      ++
      +--
      +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
     -+  where `fetch.prune` would not affect refs outside the fetched hierarchy
     ++  where `fetch.prune` would only affect refs that are explicitly fetched.
      +* When third-party tools might perform unexpected full fetches, and you want
     -+  periodic cleanup independently of fetch operations
     ++  periodic cleanup independently of fetch operations.
      +--
      +
       OPTIONS
     @@ builtin/gc.c: static int maintenance_opt_schedule(const struct option *opt, cons
       	return 0;
       }
       
     -+static int collect_remote(struct remote *remote, void *cb_data)
     ++static int prune_remote(struct remote *remote, void *cb_data UNUSED)
      +{
     -+	struct string_list *list = cb_data;
     ++	struct child_process child = CHILD_PROCESS_INIT;
      +
      +	if (!remote->url.nr)
      +		return 0;
      +
     -+	string_list_append(list, remote->name);
     -+	return 0;
     ++	child.git_cmd = 1;
     ++	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
     ++
     ++	return !!run_command(&child);
      +}
      +
     -+static int maintenance_task_prune_remote(struct maintenance_run_opts *opts UNUSED,
     ++static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
      +					 struct gc_config *cfg UNUSED)
      +{
     -+	struct string_list_item *item;
     -+	struct string_list remotes_list = STRING_LIST_INIT_NODUP;
     -+	struct child_process child = CHILD_PROCESS_INIT;
     -+	int result = 0;
     -+
     -+	for_each_remote(collect_remote, &remotes_list);
     -+
     -+	for_each_string_list_item (item, &remotes_list) {
     -+		const char *remote_name = item->string;
     -+		child.git_cmd = 1;
     -+		strvec_pushl(&child.args, "remote", "prune", remote_name, NULL);
     -+
     -+		if (run_command(&child))
     -+			result = error(_("failed to prune '%s'"), remote_name);
     ++	if (for_each_remote(prune_remote, opts)) {
     ++		error(_("failed to prune remotes"));
     ++		return 1;
      +	}
      +
     -+	string_list_clear(&remotes_list, 0);
     -+	return result;
     ++	return 0;
      +}
      +
       /* Remember to update object flag allocation in object.h */


 Documentation/git-maintenance.txt | 20 ++++++++++++++
 builtin/gc.c                      | 32 ++++++++++++++++++++++
 t/t7900-maintenance.sh            | 44 +++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 6e6651309d3..8b3e496c8ef 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -158,6 +158,26 @@ pack-refs::
 	need to iterate across many references. See linkgit:git-pack-refs[1]
 	for more information.
 
+prune-remote-refs::
+	The `prune-remote-refs` task runs `git remote prune` on each remote
+	repository registered in the local repository. This task helps clean
+	up deleted remote branches, improving the performance of operations
+	that iterate through the refs. See linkgit:git-remote[1] for more
+	information. This task is disabled by default.
++
+NOTE: This task is opt-in to prevent unexpected removal of remote refs
+for users of git-maintenance. For most users, configuring `fetch.prune=true`
+is a acceptable solution, as it will automatically clean up stale remote-tracking
+branches during normal fetch operations. However, this task can be useful in
+specific scenarios:
++
+--
+* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
+  where `fetch.prune` would only affect refs that are explicitly fetched.
+* When third-party tools might perform unexpected full fetches, and you want
+  periodic cleanup independently of fetch operations.
+--
+
 OPTIONS
 -------
 --auto::
diff --git a/builtin/gc.c b/builtin/gc.c
index 4ae5196aedf..329c764f300 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "lockfile.h"
 #include "parse-options.h"
 #include "run-command.h"
+#include "remote.h"
 #include "sigchain.h"
 #include "strvec.h"
 #include "commit.h"
@@ -913,6 +914,30 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg,
 	return 0;
 }
 
+static int prune_remote(struct remote *remote, void *cb_data UNUSED)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	if (!remote->url.nr)
+		return 0;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
+
+	return !!run_command(&child);
+}
+
+static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
+					 struct gc_config *cfg UNUSED)
+{
+	if (for_each_remote(prune_remote, opts)) {
+		error(_("failed to prune remotes"));
+		return 1;
+	}
+
+	return 0;
+}
+
 /* Remember to update object flag allocation in object.h */
 #define SEEN		(1u<<0)
 
@@ -1375,6 +1400,7 @@ enum maintenance_task_label {
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
 	TASK_PACK_REFS,
+	TASK_PRUNE_REMOTE_REFS,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -1411,6 +1437,10 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_pack_refs,
 		pack_refs_condition,
 	},
+	[TASK_PRUNE_REMOTE_REFS] = {
+		"prune-remote-refs",
+		maintenance_task_prune_remote,
+	},
 };
 
 static int compare_tasks_by_selection(const void *a_, const void *b_)
@@ -1505,6 +1535,8 @@ static void initialize_maintenance_strategy(void)
 		tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY;
 		tasks[TASK_PACK_REFS].enabled = 1;
 		tasks[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY;
+		tasks[TASK_PRUNE_REMOTE_REFS].enabled = 0;
+		tasks[TASK_PRUNE_REMOTE_REFS].schedule = SCHEDULE_DAILY;
 	}
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 0ce4ba1cbef..60a0c3f8353 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -446,6 +446,50 @@ test_expect_success 'pack-refs task' '
 	test_subcommand git pack-refs --all --prune <pack-refs.txt
 '
 
+test_expect_success 'prune-remote-refs task not enabled by default' '
+	git clone . prune-test &&
+	(
+		cd prune-test &&
+		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run 2>err &&
+		test_subcommand ! git remote prune origin <prune.txt
+	)
+'
+
+test_expect_success 'prune-remote-refs task cleans stale remote refs' '
+	test_commit initial &&
+
+	# Create two separate remote repos
+	git clone . remote1 &&
+	git clone . remote2 &&
+
+	git clone . prune-test-clean &&
+	(
+		cd prune-test-clean &&
+		git config maintenance.prune-remote-refs.enabled true &&
+
+		# Add both remotes
+		git remote add remote1 "../remote1" &&
+		git remote add remote2 "../remote2" &&
+
+		# Create and push branches to both remotes
+		git branch -f side2 HEAD &&
+		git push remote1 side2 &&
+		git push remote2 side2 &&
+
+		# Rename branches in each remote to simulate a stale branch
+		git -C ../remote1 branch -m side2 side3 &&
+		git -C ../remote2 branch -m side2 side4 &&
+
+		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run --task=prune-remote-refs &&
+
+		# Verify pruning happened for both remotes
+		test_subcommand git remote prune remote1 <prune.txt &&
+		test_subcommand git remote prune remote2 <prune.txt &&
+		test_must_fail git rev-parse refs/remotes/remote1/side2 &&
+		test_must_fail git rev-parse refs/remotes/remote2/side2
+	)
+'
+
 test_expect_success '--auto and --schedule incompatible' '
 	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
 	test_grep "at most one" err

base-commit: 063bcebf0c917140ca0e705cbe0fdea127e90086
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] maintenance: add prune-remote-refs task
  2024-12-28  9:58   ` Shubham Kanodia
@ 2024-12-28 16:05     ` Junio C Hamano
  2024-12-28 16:24       ` Shubham Kanodia
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2024-12-28 16:05 UTC (permalink / raw)
  To: Shubham Kanodia; +Cc: Shubham Kanodia via GitGitGadget, git, ps

Shubham Kanodia <shubham.kanodia10@gmail.com> writes:

>> Hmph, is there a reason why you need two loops, instead of
>> for-each-remote calling a function that does the run_command()
>> thing?
>
> It can be collapsed into one.

Sorry, but that is not an answer, as my question was not a
suggestion to change anything.

It was a question asking you if there was a specific reason why the
code was structured the way it was written.  If there is another way
to write it, you need to answer why the alternative wasn't picked.

>> This loop does not stop at the first error, but returns a non-zero
>> error after noticing even a single remote fail to run prune, which
>> sounds like a seneible design.  Would an error percolate up the same
>> way when two different tasks run and one of them fails in the
>> control folow in "git maintenance"?  Just want to see if we are
>> being consistent with the surrounding code.
>
> Fair point. I'll make the process flow identical to the prefetch refs
> task that works similarly across remotes.
> It returns as soon as the first remote fails (without necessarily
> affecting other tasks).

... and the first failure signals the caller a failure?  That would
match what you did in your new feature, which is perfect.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] maintenance: add prune-remote-refs task
  2024-12-28 16:05     ` Junio C Hamano
@ 2024-12-28 16:24       ` Shubham Kanodia
  0 siblings, 0 replies; 15+ messages in thread
From: Shubham Kanodia @ 2024-12-28 16:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shubham Kanodia via GitGitGadget, git, ps

On Sat, Dec 28, 2024 at 9:35 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Shubham Kanodia <shubham.kanodia10@gmail.com> writes:
>
> >> Hmph, is there a reason why you need two loops, instead of
> >> for-each-remote calling a function that does the run_command()
> >> thing?
> >
> > It can be collapsed into one.
>
> Sorry, but that is not an answer, as my question was not a
> suggestion to change anything.
>
> It was a question asking you if there was a specific reason why the
> code was structured the way it was written.  If there is another way
> to write it, you need to answer why the alternative wasn't picked.

There wasn't a good reason for doing it that way. I guess I was trying
to understand the second argument for `for_each_remote` would be best
used if the command was called directly (while avoiding a compilation
warning), but looking at a few other usages of `for_each_remote` I
realised that it could just be marked unused in this case (since we
aren't doing anything with it).

I should've probably looked deeper and learnt from existing patterns
(e.g. `maintenance_task_prefetch`) — which I have in my last patch.

> >> This loop does not stop at the first error, but returns a non-zero
> >> error after noticing even a single remote fail to run prune, which
> >> sounds like a seneible design.  Would an error percolate up the same
> >> way when two different tasks run and one of them fails in the
> >> control folow in "git maintenance"?  Just want to see if we are
> >> being consistent with the surrounding code.
> >
> > Fair point. I'll make the process flow identical to the prefetch refs
> > task that works similarly across remotes.
> > It returns as soon as the first remote fails (without necessarily
> > affecting other tasks).
>
> ... and the first failure signals the caller a failure?  That would
> match what you did in your new feature, which is perfect.

Exactly — the first failing remote will signal that the
`prune-remote-refs` task has failed via an immediate `return 1`.
The maintenance command uses this to register the exit code of the top
level command to 1, while continuing to execute all other tasks
anyway.

Thanks,
Shubham K

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] maintenance: add prune-remote-refs task
  2024-12-28 10:07 ` [PATCH v2] " Shubham Kanodia via GitGitGadget
@ 2024-12-28 16:25   ` Junio C Hamano
  2024-12-30  7:15   ` Patrick Steinhardt
  2025-01-03 18:13   ` [PATCH v3] " Shubham Kanodia via GitGitGadget
  2 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2024-12-28 16:25 UTC (permalink / raw)
  To: Shubham Kanodia via GitGitGadget; +Cc: git, Patrick Steinhardt, Shubham Kanodia

"Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Shubham Kanodia <shubham.kanodia10@gmail.com>
>
> Remote-tracking refs can accumulate in local repositories even as branches
> are deleted on remotes, impacting git performance negatively. Existing
> alternatives to keep refs pruned have a few issues — 
>
> 1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will
> prune stale refs, but requires a manual operation and also pulls in new
> refs from remote which can be an undesirable side-effect.

It is only true if you cloned without any tweaks.  For example, if
you cloned with the single-branch option, you would not pull in new
refs, wouldn't you?  Also "requires a manual operation" is not quite
a good rationale, as you could have placed such a "git fetch"
instead of "git remote prune", in the maintenance schedule.

For this to become an issue, the condition we saw in earlier
discussion, i.e.

    while having the *default* refspec
    "+refs/heads/*:refs/remotes/$name/*" configured in
    remote.$name.fetch

is crucial.  Since that is the default refspec "git clone" gives
you, your "git fetch --prune" would give you full set of refs while
pruning, and the end result is that you have up-to-date set of
remote-tracking branches (which you may not want).

> 2.`git remote prune` cleans up refs without adding to the existing list
> but requires periodic user intervention.

You have a SP after "1." but not after "2.".

> This adds a new maintenance task 'prune-remote-refs' that runs
> 'git remote prune' for each configured remote daily. This provides an
> automated way to clean up stale remote-tracking refs — especially when
> users may not do a full fetch.

"This adds" -> "Add".

I'd strike the latter sentence.  Regardless of what users do or do
not do, the automated clean-up is performed.

> This task is disabled by default.
>
> Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
> ---

> +NOTE: This task is opt-in to prevent unexpected removal of remote refs
> +for users of git-maintenance. For most users, configuring `fetch.prune=true`
> +is a acceptable solution, as it will automatically clean up stale remote-tracking

"a acceptable" -> "an acceptable".

> +branches during normal fetch operations. However, this task can be useful in
> +specific scenarios:
> ++
> +--
> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
> +  where `fetch.prune` would only affect refs that are explicitly fetched.
> +* When third-party tools might perform unexpected full fetches, and you want
> +  periodic cleanup independently of fetch operations.
> +--

Very well written.

> @@ -913,6 +914,30 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg,
>  	return 0;
>  }
>  
> +static int prune_remote(struct remote *remote, void *cb_data UNUSED)
> +{
> +	struct child_process child = CHILD_PROCESS_INIT;
> +
> +	if (!remote->url.nr)
> +		return 0;
> +
> +	child.git_cmd = 1;
> +	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
> +
> +	return !!run_command(&child);
> +}
> +
> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
> +					 struct gc_config *cfg UNUSED)
> +{
> +	if (for_each_remote(prune_remote, opts)) {
> +		error(_("failed to prune remotes"));
> +		return 1;
> +	}
> +
> +	return 0;
> +}

Nice reuse of the program structure, which is very clean and easy to read.

Overall very well written.  Will queue, with attached range-diff.
Thanks.


--- >8 ---
1:  0ae9668b5c ! 1:  8a40f8b319 maintenance: add prune-remote-refs task
    @@ Commit message
     
         Remote-tracking refs can accumulate in local repositories even as branches
         are deleted on remotes, impacting git performance negatively. Existing
    -    alternatives to keep refs pruned have a few issues — 
    +    alternatives to keep refs pruned have a few issues:
     
    -    1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will
    -    prune stale refs, but requires a manual operation and also pulls in new
    -    refs from remote which can be an undesirable side-effect.
    +      1. Running `git fetch` with either `--prune` or `fetch.prune=true`
    +         set, with the default refspec to copy all their branches into
    +         our remote-tracking branches, will prune stale refs, but also
    +         pulls in new branches from remote.  That is undesirable if the
    +         user wants to only work with a selected few remote branches.
     
    -    2.`git remote prune` cleans up refs without adding to the existing list
    -    but requires periodic user intervention.
    +      2. `git remote prune` cleans up refs without adding to the
    +         existing list but requires periodic user intervention.
     
    -    This adds a new maintenance task 'prune-remote-refs' that runs
    -    'git remote prune' for each configured remote daily. This provides an
    -    automated way to clean up stale remote-tracking refs — especially when
    -    users may not do a full fetch.
    -
    -    This task is disabled by default.
    +    Add a new maintenance task 'prune-remote-refs' that runs 'git remote
    +    prune' for each configured remote daily.  Leave the task disabled by
    +    default, as it may be unexpected to see their remote-tracking
    +    branches to disappear while they are not watching for unsuspecting
    +    users.
     
         Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
    @@ Documentation/git-maintenance.txt: pack-refs::
     ++
     +NOTE: This task is opt-in to prevent unexpected removal of remote refs
     +for users of git-maintenance. For most users, configuring `fetch.prune=true`
    -+is a acceptable solution, as it will automatically clean up stale remote-tracking
    ++is an acceptable solution, as it will automatically clean up stale remote-tracking
     +branches during normal fetch operations. However, this task can be useful in
     +specific scenarios:
     ++


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] maintenance: add prune-remote-refs task
  2024-12-28 10:07 ` [PATCH v2] " Shubham Kanodia via GitGitGadget
  2024-12-28 16:25   ` Junio C Hamano
@ 2024-12-30  7:15   ` Patrick Steinhardt
  2024-12-30 14:05     ` Junio C Hamano
  2025-01-03 18:13   ` [PATCH v3] " Shubham Kanodia via GitGitGadget
  2 siblings, 1 reply; 15+ messages in thread
From: Patrick Steinhardt @ 2024-12-30  7:15 UTC (permalink / raw)
  To: Shubham Kanodia via GitGitGadget; +Cc: git, Junio C Hamano, Shubham Kanodia

On Sat, Dec 28, 2024 at 10:07:41AM +0000, Shubham Kanodia via GitGitGadget wrote:
> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 6e6651309d3..8b3e496c8ef 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -158,6 +158,26 @@ pack-refs::
>  	need to iterate across many references. See linkgit:git-pack-refs[1]
>  	for more information.
>  
> +prune-remote-refs::
> +	The `prune-remote-refs` task runs `git remote prune` on each remote
> +	repository registered in the local repository. This task helps clean
> +	up deleted remote branches, improving the performance of operations
> +	that iterate through the refs. See linkgit:git-remote[1] for more
> +	information. This task is disabled by default.
> ++
> +NOTE: This task is opt-in to prevent unexpected removal of remote refs
> +for users of git-maintenance. For most users, configuring `fetch.prune=true`

Do we want to make this linkgit:git-maintenance[1] even though this is
self-referential?

> +is a acceptable solution, as it will automatically clean up stale remote-tracking
> +branches during normal fetch operations. However, this task can be useful in
> +specific scenarios:
> ++
> +--
> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
> +  where `fetch.prune` would only affect refs that are explicitly fetched.
> +* When third-party tools might perform unexpected full fetches, and you want
> +  periodic cleanup independently of fetch operations.
> +--

Nicely explained. I wish we had more such documentation that is taking
the user by their hand and explains why they may or may not want to have
a specific thing.

> diff --git a/builtin/gc.c b/builtin/gc.c
> index 4ae5196aedf..329c764f300 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -20,6 +20,7 @@
>  #include "lockfile.h"
>  #include "parse-options.h"
>  #include "run-command.h"
> +#include "remote.h"
>  #include "sigchain.h"
>  #include "strvec.h"
>  #include "commit.h"
> @@ -913,6 +914,30 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg,
>  	return 0;
>  }
>  
> +static int prune_remote(struct remote *remote, void *cb_data UNUSED)
> +{
> +	struct child_process child = CHILD_PROCESS_INIT;
> +
> +	if (!remote->url.nr)
> +		return 0;
> +
> +	child.git_cmd = 1;
> +	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
> +
> +	return !!run_command(&child);
> +}
> +
> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
> +					 struct gc_config *cfg UNUSED)
> +{
> +	if (for_each_remote(prune_remote, opts)) {
> +		error(_("failed to prune remotes"));
> +		return 1;

I wonder whether we should adapt the loop to be eager. Erroring out on
the first failed remote would potentially mean that none of the other
remotes may get pruned. So if you had a now-unreachable remote as first
remote then none of your remotes would be pruned.

If so, we may want to collect the names of failed remotes and print
them, as well.

Patrick

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] maintenance: add prune-remote-refs task
  2024-12-30  7:15   ` Patrick Steinhardt
@ 2024-12-30 14:05     ` Junio C Hamano
  2025-01-03  6:50       ` Shubham Kanodia
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2024-12-30 14:05 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Shubham Kanodia via GitGitGadget, git, Shubham Kanodia

Patrick Steinhardt <ps@pks.im> writes:

> On Sat, Dec 28, 2024 at 10:07:41AM +0000, Shubham Kanodia via GitGitGadget wrote:
>> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
>> index 6e6651309d3..8b3e496c8ef 100644
>> --- a/Documentation/git-maintenance.txt
>> +++ b/Documentation/git-maintenance.txt
>> @@ -158,6 +158,26 @@ pack-refs::
>>  	need to iterate across many references. See linkgit:git-pack-refs[1]
>>  	for more information.
>>  
>> +prune-remote-refs::
>> +	The `prune-remote-refs` task runs `git remote prune` on each remote
>> +	repository registered in the local repository. This task helps clean
>> +	up deleted remote branches, improving the performance of operations
>> +	that iterate through the refs. See linkgit:git-remote[1] for more
>> +	information. This task is disabled by default.
>> ++
>> +NOTE: This task is opt-in to prevent unexpected removal of remote refs
>> +for users of git-maintenance. For most users, configuring `fetch.prune=true`
>
> Do we want to make this linkgit:git-maintenance[1] even though this is
> self-referential?

That certainly is a thought---the rule could be "whenever we refer
to a Git command, we refer to it in a uniform way".  An alternative
would be "of git-maintenance" -> "of this command" to weaken it.

This refers to those users who want to use the command for other
reasons (you use the scheduled tasks driven by 'git maintenance'
only because you wanted the 'gc' and 'pack-refs' tasks to run, you
do not necessarily want to run a new kind of task the new version of
Git started supporting, especially when the task is destructive,
like this one).  We might want to stress that point, perhaps?  If a
reader reads this part of the documentation, finds this task useful
and decides to use 'git maintenance', the note would sound somewhat
nonsensical to them---"I thought about the ramifications, I decided
I wanted to use the command, why would it be opt-in?" is a plausible
confusion.

>> +is a acceptable solution, as it will automatically clean up stale remote-tracking
>> +branches during normal fetch operations. However, this task can be useful in
>> +specific scenarios:
>> ++
>> +--
>> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
>> +  where `fetch.prune` would only affect refs that are explicitly fetched.
>> +* When third-party tools might perform unexpected full fetches, and you want
>> +  periodic cleanup independently of fetch operations.
>> +--
>
> Nicely explained. I wish we had more such documentation that is taking
> the user by their hand and explains why they may or may not want to have
> a specific thing.

Yes, a configuration or an option that are not for everybody and for
every situation need such a guidance, and this one is done nicely.

>> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
>> +					 struct gc_config *cfg UNUSED)
>> +{
>> +	if (for_each_remote(prune_remote, opts)) {
>> +		error(_("failed to prune remotes"));
>> +		return 1;
>
> I wonder whether we should adapt the loop to be eager. Erroring out on
> the first failed remote would potentially mean that none of the other
> remotes may get pruned. So if you had a now-unreachable remote as first
> remote then none of your remotes would be pruned.

I think the structure, hence the behaviour, is shared with an
existing prefetch task.  I think the current way is OK-ish, but
given that we are not in a hurry, we may want to correct the
semantics for both of them before unleashing this new task to the
world.

For that, we need the callback functions given to for_each_remote
(i.e., fetch_remote and prune_remote) to always return "success" in
the sense to tell "I am done with this remote" to allow the loop to
continue to the next remote, and convey the failure from the
subcommand via some other means (like flipping a bit in the cbdata).

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] maintenance: add prune-remote-refs task
  2024-12-30 14:05     ` Junio C Hamano
@ 2025-01-03  6:50       ` Shubham Kanodia
  2025-01-03  7:38         ` Patrick Steinhardt
  0 siblings, 1 reply; 15+ messages in thread
From: Shubham Kanodia @ 2025-01-03  6:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, Shubham Kanodia via GitGitGadget, git

On Mon, Dec 30, 2024 at 7:35 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Patrick Steinhardt <ps@pks.im> writes:
>
> > On Sat, Dec 28, 2024 at 10:07:41AM +0000, Shubham Kanodia via GitGitGadget wrote:
> >> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> >> index 6e6651309d3..8b3e496c8ef 100644
> >> --- a/Documentation/git-maintenance.txt
> >> +++ b/Documentation/git-maintenance.txt
> >> @@ -158,6 +158,26 @@ pack-refs::
> >>      need to iterate across many references. See linkgit:git-pack-refs[1]
> >>      for more information.
> >>
> >> +prune-remote-refs::
> >> +    The `prune-remote-refs` task runs `git remote prune` on each remote
> >> +    repository registered in the local repository. This task helps clean
> >> +    up deleted remote branches, improving the performance of operations
> >> +    that iterate through the refs. See linkgit:git-remote[1] for more
> >> +    information. This task is disabled by default.
> >> ++
> >> +NOTE: This task is opt-in to prevent unexpected removal of remote refs
> >> +for users of git-maintenance. For most users, configuring `fetch.prune=true`
> >
> > Do we want to make this linkgit:git-maintenance[1] even though this is
> > self-referential?
>
> That certainly is a thought---the rule could be "whenever we refer
> to a Git command, we refer to it in a uniform way".  An alternative
> would be "of git-maintenance" -> "of this command" to weaken it.
>
> This refers to those users who want to use the command for other
> reasons (you use the scheduled tasks driven by 'git maintenance'
> only because you wanted the 'gc' and 'pack-refs' tasks to run, you
> do not necessarily want to run a new kind of task the new version of
> Git started supporting, especially when the task is destructive,
> like this one).  We might want to stress that point, perhaps?  If a
> reader reads this part of the documentation, finds this task useful
> and decides to use 'git maintenance', the note would sound somewhat
> nonsensical to them---"I thought about the ramifications, I decided
> I wanted to use the command, why would it be opt-in?" is a plausible
> confusion.
>
> >> +is a acceptable solution, as it will automatically clean up stale remote-tracking
> >> +branches during normal fetch operations. However, this task can be useful in
> >> +specific scenarios:
> >> ++
> >> +--
> >> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
> >> +  where `fetch.prune` would only affect refs that are explicitly fetched.
> >> +* When third-party tools might perform unexpected full fetches, and you want
> >> +  periodic cleanup independently of fetch operations.
> >> +--
> >
> > Nicely explained. I wish we had more such documentation that is taking
> > the user by their hand and explains why they may or may not want to have
> > a specific thing.
>
> Yes, a configuration or an option that are not for everybody and for
> every situation need such a guidance, and this one is done nicely.
>
> >> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
> >> +                                     struct gc_config *cfg UNUSED)
> >> +{
> >> +    if (for_each_remote(prune_remote, opts)) {
> >> +            error(_("failed to prune remotes"));
> >> +            return 1;
> >
> > I wonder whether we should adapt the loop to be eager. Erroring out on
> > the first failed remote would potentially mean that none of the other
> > remotes may get pruned. So if you had a now-unreachable remote as first
> > remote then none of your remotes would be pruned.
>
> I think the structure, hence the behaviour, is shared with an
> existing prefetch task.  I think the current way is OK-ish, but
> given that we are not in a hurry, we may want to correct the
> semantics for both of them before unleashing this new task to the
> world.
>
> For that, we need the callback functions given to for_each_remote
> (i.e., fetch_remote and prune_remote) to always return "success" in
> the sense to tell "I am done with this remote" to allow the loop to
> continue to the next remote, and convey the failure from the
> subcommand via some other means (like flipping a bit in the cbdata).
>
> Thanks.

Curious — I submitted my patches through GGG, but Junio was kind
enough to apply a few other fixes to it.
Is there a place I can now get the whole diff (with the range diff
patched in) so I can pull that into GGG?

Thanks,
Shubham

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] maintenance: add prune-remote-refs task
  2025-01-03  6:50       ` Shubham Kanodia
@ 2025-01-03  7:38         ` Patrick Steinhardt
  0 siblings, 0 replies; 15+ messages in thread
From: Patrick Steinhardt @ 2025-01-03  7:38 UTC (permalink / raw)
  To: Shubham Kanodia; +Cc: Junio C Hamano, Shubham Kanodia via GitGitGadget, git

On Fri, Jan 03, 2025 at 12:20:07PM +0530, Shubham Kanodia wrote:
> Curious — I submitted my patches through GGG, but Junio was kind
> enough to apply a few other fixes to it.
> Is there a place I can now get the whole diff (with the range diff
> patched in) so I can pull that into GGG?

Junio publishes his branches at [1], and yours specifically is called
"sk/maintenance-remote-prune". I'd typically just update my local branch
to match what he has in there.

Patrick

[1]: https://github.com/gitster/git.git

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] maintenance: add prune-remote-refs task
  2024-12-28 10:07 ` [PATCH v2] " Shubham Kanodia via GitGitGadget
  2024-12-28 16:25   ` Junio C Hamano
  2024-12-30  7:15   ` Patrick Steinhardt
@ 2025-01-03 18:13   ` Shubham Kanodia via GitGitGadget
  2025-01-03 19:02     ` Junio C Hamano
  2 siblings, 1 reply; 15+ messages in thread
From: Shubham Kanodia via GitGitGadget @ 2025-01-03 18:13 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Shubham Kanodia,
	Shubham Kanodia

From: Shubham Kanodia <shubham.kanodia10@gmail.com>

Remote-tracking refs can accumulate in local repositories even as branches
are deleted on remotes, impacting git performance negatively. Existing
alternatives to keep refs pruned have a few issues:

  1. Running `git fetch` with either `--prune` or `fetch.prune=true`
     set, with the default refspec to copy all their branches into
     our remote-tracking branches, will prune stale refs, but also
     pulls in new branches from remote.  That is undesirable if the
     user wants to only work with a selected few remote branches.

  2. `git remote prune` cleans up refs without adding to the
     existing list but requires periodic user intervention.

Add a new maintenance task 'prune-remote-refs' that runs 'git remote
prune' for each configured remote daily.  Leave the task disabled by
default, as it may be unexpected to see their remote-tracking
branches to disappear while they are not watching for unsuspecting
users.

Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
    maintenance: add prune-remote-refs task
    
    As discussed previously on:
    https://lore.kernel.org/git/xmqqwmfr112w.fsf@gitster.g/T/#t
    
    Remote-tracking refs can accumulate in local repositories even as
    branches are deleted on remotes, impacting git performance negatively.
    Existing alternatives to keep refs pruned have a few issues — 
    
     1. The fetch.prune config automatically cleans up remote ref on fetch,
        but also pulls in new ref from remote which is an undesirable
        side-effect.
    
    2.git remote prune cleans up refs without adding to the existing list
    but requires periodic user intervention.
    
    This adds a new maintenance task 'prune-remote-refs' that runs 'git
    remote prune' for each configured remote daily. This provides an
    automated way to clean up stale remote-tracking refs — especially when
    users may not do a full fetch.
    
    This task is disabled by default.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1838%2Fpastelsky%2Fsk%2Fadd-remote-prune-maintenance-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1838/pastelsky/sk/add-remote-prune-maintenance-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1838

Range-diff vs v2:

 1:  4d6c143c970 ! 1:  7954df1009a maintenance: add prune-remote-refs task
     @@ Commit message
      
          Remote-tracking refs can accumulate in local repositories even as branches
          are deleted on remotes, impacting git performance negatively. Existing
     -    alternatives to keep refs pruned have a few issues — 
     +    alternatives to keep refs pruned have a few issues:
      
     -    1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will
     -    prune stale refs, but requires a manual operation and also pulls in new
     -    refs from remote which can be an undesirable side-effect.
     +      1. Running `git fetch` with either `--prune` or `fetch.prune=true`
     +         set, with the default refspec to copy all their branches into
     +         our remote-tracking branches, will prune stale refs, but also
     +         pulls in new branches from remote.  That is undesirable if the
     +         user wants to only work with a selected few remote branches.
      
     -    2.`git remote prune` cleans up refs without adding to the existing list
     -    but requires periodic user intervention.
     +      2. `git remote prune` cleans up refs without adding to the
     +         existing list but requires periodic user intervention.
      
     -    This adds a new maintenance task 'prune-remote-refs' that runs
     -    'git remote prune' for each configured remote daily. This provides an
     -    automated way to clean up stale remote-tracking refs — especially when
     -    users may not do a full fetch.
     -
     -    This task is disabled by default.
     +    Add a new maintenance task 'prune-remote-refs' that runs 'git remote
     +    prune' for each configured remote daily.  Leave the task disabled by
     +    default, as it may be unexpected to see their remote-tracking
     +    branches to disappear while they are not watching for unsuspecting
     +    users.
      
          Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/git-maintenance.txt ##
      @@ Documentation/git-maintenance.txt: pack-refs::
     @@ Documentation/git-maintenance.txt: pack-refs::
      +	information. This task is disabled by default.
      ++
      +NOTE: This task is opt-in to prevent unexpected removal of remote refs
     -+for users of git-maintenance. For most users, configuring `fetch.prune=true`
     -+is a acceptable solution, as it will automatically clean up stale remote-tracking
     ++for users of linkgit:git-maintenance[1]. For most users, configuring `fetch.prune=true`
     ++is an acceptable solution, as it will automatically clean up stale remote-tracking
      +branches during normal fetch operations. However, this task can be useful in
      +specific scenarios:
      ++
     @@ builtin/gc.c: static int maintenance_opt_schedule(const struct option *opt, cons
       	return 0;
       }
       
     -+static int prune_remote(struct remote *remote, void *cb_data UNUSED)
     ++struct remote_cb_data {
     ++	struct maintenance_run_opts *maintenance_opts;
     ++	struct string_list failed_remotes;
     ++};
     ++
     ++static void report_failed_remotes(struct string_list *failed_remotes,
     ++				  const char *action_name)
     ++{
     ++	if (failed_remotes->nr) {
     ++		int i;
     ++		struct strbuf msg = STRBUF_INIT;
     ++		strbuf_addf(&msg, _("failed to %s the following remotes: "),
     ++			    action_name);
     ++		for (i = 0; i < failed_remotes->nr; i++) {
     ++			if (i)
     ++				strbuf_addstr(&msg, ", ");
     ++			strbuf_addstr(&msg, failed_remotes->items[i].string);
     ++		}
     ++		error("%s", msg.buf);
     ++		strbuf_release(&msg);
     ++	}
     ++}
     ++
     ++static int prune_remote(struct remote *remote, void *cb_data)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
     ++	struct remote_cb_data *data = cb_data;
      +
      +	if (!remote->url.nr)
      +		return 0;
     @@ builtin/gc.c: static int maintenance_opt_schedule(const struct option *opt, cons
      +	child.git_cmd = 1;
      +	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
      +
     -+	return !!run_command(&child);
     ++	if (run_command(&child))
     ++		string_list_append(&data->failed_remotes, remote->name);
     ++
     ++	return 0;
      +}
      +
      +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
      +					 struct gc_config *cfg UNUSED)
      +{
     -+	if (for_each_remote(prune_remote, opts)) {
     -+		error(_("failed to prune remotes"));
     -+		return 1;
     -+	}
     ++	struct remote_cb_data cbdata = { .maintenance_opts = opts,
     ++					 .failed_remotes = STRING_LIST_INIT_DUP };
      +
     -+	return 0;
     ++	int result;
     ++	result = for_each_remote(prune_remote, &cbdata);
     ++
     ++	report_failed_remotes(&cbdata.failed_remotes, "prune");
     ++	if (cbdata.failed_remotes.nr)
     ++		result = 1;
     ++
     ++	string_list_clear(&cbdata.failed_remotes, 0);
     ++	return result;
      +}
      +
       /* Remember to update object flag allocation in object.h */
       #define SEEN		(1u<<0)
       
     +@@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_run_opts *opts,
     + 
     + static int fetch_remote(struct remote *remote, void *cbdata)
     + {
     +-	struct maintenance_run_opts *opts = cbdata;
     + 	struct child_process child = CHILD_PROCESS_INIT;
     ++	struct remote_cb_data *data = cbdata;
     + 
     + 	if (remote->skip_default_update)
     + 		return 0;
     +@@ builtin/gc.c: static int fetch_remote(struct remote *remote, void *cbdata)
     + 		     "--no-write-fetch-head", "--recurse-submodules=no",
     + 		     NULL);
     + 
     +-	if (opts->quiet)
     ++	if (data->maintenance_opts->quiet)
     + 		strvec_push(&child.args, "--quiet");
     + 
     +-	return !!run_command(&child);
     ++	if (run_command(&child))
     ++		string_list_append(&data->failed_remotes, remote->name);
     ++
     ++	return 0;
     + }
     + 
     + static int maintenance_task_prefetch(struct maintenance_run_opts *opts,
     + 				     struct gc_config *cfg UNUSED)
     + {
     +-	if (for_each_remote(fetch_remote, opts)) {
     +-		error(_("failed to prefetch remotes"));
     +-		return 1;
     ++	struct remote_cb_data cbdata = { .maintenance_opts = opts,
     ++					 .failed_remotes = STRING_LIST_INIT_DUP };
     ++
     ++	int result = 0;
     ++
     ++	if (for_each_remote(fetch_remote, &cbdata)) {
     ++		error(_("failed to prefetch some remotes"));
     ++		result = 1;
     + 	}
     + 
     +-	return 0;
     ++	report_failed_remotes(&cbdata.failed_remotes, "prefetch");
     ++	if (cbdata.failed_remotes.nr)
     ++		result = 1;
     ++
     ++	string_list_clear(&cbdata.failed_remotes, 0);
     ++	return result;
     + }
     + 
     + static int maintenance_task_gc(struct maintenance_run_opts *opts,
      @@ builtin/gc.c: enum maintenance_task_label {
       	TASK_GC,
       	TASK_COMMIT_GRAPH,
     @@ t/t7900-maintenance.sh: test_expect_success 'pack-refs task' '
      +		test_must_fail git rev-parse refs/remotes/remote2/side2
      +	)
      +'
     ++
     ++test_expect_success 'prune-remote-refs task continues to prune remotes even if some fail' '
     ++	test_commit initial-prune-remote-refs &&
     ++
     ++	git clone . remote-bad1 &&
     ++	git clone . remote-bad2 &&
     ++	git clone . remote-good &&
     ++
     ++	git clone . prune-test-partial &&
     ++	(
     ++		cd prune-test-partial &&
     ++		git config maintenance.prune-remote-refs.enabled true &&
     ++
     ++		# Add remotes in alphabetical order to ensure processing order
     ++		git remote add aaa-bad1 "../remote-bad1" &&
     ++		git remote add bbb-bad2 "../remote-bad2" &&
     ++		git remote add ccc-good "../remote-good" &&
     ++
     ++		# Create and push branches to all remotes
     ++		git branch -f side2 HEAD &&
     ++		git push aaa-bad1 side2 &&
     ++		git push bbb-bad2 side2 &&
     ++		git push ccc-good side2 &&
     ++
     ++		# Rename branch in good remote to simulate a stale branch
     ++		git -C ../remote-good branch -m side2 side3 &&
     ++
     ++		# Break the bad remotes by removing their directories
     ++		rm -rf ../remote-bad1 ../remote-bad2 &&
     ++
     ++		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run --task=prune-remote-refs 2>err || true &&
     ++
     ++		# Verify pruning happened for good remote despite bad remote failures
     ++		test_subcommand git remote prune ccc-good <prune.txt &&
     ++		test_must_fail git rev-parse refs/remotes/ccc-good/side2 &&
     ++		test_grep "error: failed to prune the following remotes: aaa-bad1, bbb-bad2" err
     ++	)
     ++'
      +
       test_expect_success '--auto and --schedule incompatible' '
       	test_must_fail git maintenance run --auto --schedule=daily 2>err &&


 Documentation/git-maintenance.txt | 20 +++++++
 builtin/gc.c                      | 92 ++++++++++++++++++++++++++++---
 t/t7900-maintenance.sh            | 82 +++++++++++++++++++++++++++
 3 files changed, 187 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 6e6651309d3..df59d43ec88 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -158,6 +158,26 @@ pack-refs::
 	need to iterate across many references. See linkgit:git-pack-refs[1]
 	for more information.
 
+prune-remote-refs::
+	The `prune-remote-refs` task runs `git remote prune` on each remote
+	repository registered in the local repository. This task helps clean
+	up deleted remote branches, improving the performance of operations
+	that iterate through the refs. See linkgit:git-remote[1] for more
+	information. This task is disabled by default.
++
+NOTE: This task is opt-in to prevent unexpected removal of remote refs
+for users of linkgit:git-maintenance[1]. For most users, configuring `fetch.prune=true`
+is an acceptable solution, as it will automatically clean up stale remote-tracking
+branches during normal fetch operations. However, this task can be useful in
+specific scenarios:
++
+--
+* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
+  where `fetch.prune` would only affect refs that are explicitly fetched.
+* When third-party tools might perform unexpected full fetches, and you want
+  periodic cleanup independently of fetch operations.
+--
+
 OPTIONS
 -------
 --auto::
diff --git a/builtin/gc.c b/builtin/gc.c
index a9b1c36de27..ae2a6762a92 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -23,6 +23,7 @@
 #include "lockfile.h"
 #include "parse-options.h"
 #include "run-command.h"
+#include "remote.h"
 #include "sigchain.h"
 #include "strvec.h"
 #include "commit.h"
@@ -916,6 +917,63 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg,
 	return 0;
 }
 
+struct remote_cb_data {
+	struct maintenance_run_opts *maintenance_opts;
+	struct string_list failed_remotes;
+};
+
+static void report_failed_remotes(struct string_list *failed_remotes,
+				  const char *action_name)
+{
+	if (failed_remotes->nr) {
+		int i;
+		struct strbuf msg = STRBUF_INIT;
+		strbuf_addf(&msg, _("failed to %s the following remotes: "),
+			    action_name);
+		for (i = 0; i < failed_remotes->nr; i++) {
+			if (i)
+				strbuf_addstr(&msg, ", ");
+			strbuf_addstr(&msg, failed_remotes->items[i].string);
+		}
+		error("%s", msg.buf);
+		strbuf_release(&msg);
+	}
+}
+
+static int prune_remote(struct remote *remote, void *cb_data)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+	struct remote_cb_data *data = cb_data;
+
+	if (!remote->url.nr)
+		return 0;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
+
+	if (run_command(&child))
+		string_list_append(&data->failed_remotes, remote->name);
+
+	return 0;
+}
+
+static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
+					 struct gc_config *cfg UNUSED)
+{
+	struct remote_cb_data cbdata = { .maintenance_opts = opts,
+					 .failed_remotes = STRING_LIST_INIT_DUP };
+
+	int result;
+	result = for_each_remote(prune_remote, &cbdata);
+
+	report_failed_remotes(&cbdata.failed_remotes, "prune");
+	if (cbdata.failed_remotes.nr)
+		result = 1;
+
+	string_list_clear(&cbdata.failed_remotes, 0);
+	return result;
+}
+
 /* Remember to update object flag allocation in object.h */
 #define SEEN		(1u<<0)
 
@@ -1036,8 +1094,8 @@ static int maintenance_task_commit_graph(struct maintenance_run_opts *opts,
 
 static int fetch_remote(struct remote *remote, void *cbdata)
 {
-	struct maintenance_run_opts *opts = cbdata;
 	struct child_process child = CHILD_PROCESS_INIT;
+	struct remote_cb_data *data = cbdata;
 
 	if (remote->skip_default_update)
 		return 0;
@@ -1048,21 +1106,34 @@ static int fetch_remote(struct remote *remote, void *cbdata)
 		     "--no-write-fetch-head", "--recurse-submodules=no",
 		     NULL);
 
-	if (opts->quiet)
+	if (data->maintenance_opts->quiet)
 		strvec_push(&child.args, "--quiet");
 
-	return !!run_command(&child);
+	if (run_command(&child))
+		string_list_append(&data->failed_remotes, remote->name);
+
+	return 0;
 }
 
 static int maintenance_task_prefetch(struct maintenance_run_opts *opts,
 				     struct gc_config *cfg UNUSED)
 {
-	if (for_each_remote(fetch_remote, opts)) {
-		error(_("failed to prefetch remotes"));
-		return 1;
+	struct remote_cb_data cbdata = { .maintenance_opts = opts,
+					 .failed_remotes = STRING_LIST_INIT_DUP };
+
+	int result = 0;
+
+	if (for_each_remote(fetch_remote, &cbdata)) {
+		error(_("failed to prefetch some remotes"));
+		result = 1;
 	}
 
-	return 0;
+	report_failed_remotes(&cbdata.failed_remotes, "prefetch");
+	if (cbdata.failed_remotes.nr)
+		result = 1;
+
+	string_list_clear(&cbdata.failed_remotes, 0);
+	return result;
 }
 
 static int maintenance_task_gc(struct maintenance_run_opts *opts,
@@ -1378,6 +1449,7 @@ enum maintenance_task_label {
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
 	TASK_PACK_REFS,
+	TASK_PRUNE_REMOTE_REFS,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -1414,6 +1486,10 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_pack_refs,
 		pack_refs_condition,
 	},
+	[TASK_PRUNE_REMOTE_REFS] = {
+		"prune-remote-refs",
+		maintenance_task_prune_remote,
+	},
 };
 
 static int compare_tasks_by_selection(const void *a_, const void *b_)
@@ -1508,6 +1584,8 @@ static void initialize_maintenance_strategy(void)
 		tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY;
 		tasks[TASK_PACK_REFS].enabled = 1;
 		tasks[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY;
+		tasks[TASK_PRUNE_REMOTE_REFS].enabled = 0;
+		tasks[TASK_PRUNE_REMOTE_REFS].schedule = SCHEDULE_DAILY;
 	}
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 1909aed95e0..34e8fa6b5fb 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -447,6 +447,88 @@ test_expect_success 'pack-refs task' '
 	test_subcommand git pack-refs --all --prune <pack-refs.txt
 '
 
+test_expect_success 'prune-remote-refs task not enabled by default' '
+	git clone . prune-test &&
+	(
+		cd prune-test &&
+		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run 2>err &&
+		test_subcommand ! git remote prune origin <prune.txt
+	)
+'
+
+test_expect_success 'prune-remote-refs task cleans stale remote refs' '
+	test_commit initial &&
+
+	# Create two separate remote repos
+	git clone . remote1 &&
+	git clone . remote2 &&
+
+	git clone . prune-test-clean &&
+	(
+		cd prune-test-clean &&
+		git config maintenance.prune-remote-refs.enabled true &&
+
+		# Add both remotes
+		git remote add remote1 "../remote1" &&
+		git remote add remote2 "../remote2" &&
+
+		# Create and push branches to both remotes
+		git branch -f side2 HEAD &&
+		git push remote1 side2 &&
+		git push remote2 side2 &&
+
+		# Rename branches in each remote to simulate a stale branch
+		git -C ../remote1 branch -m side2 side3 &&
+		git -C ../remote2 branch -m side2 side4 &&
+
+		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run --task=prune-remote-refs &&
+
+		# Verify pruning happened for both remotes
+		test_subcommand git remote prune remote1 <prune.txt &&
+		test_subcommand git remote prune remote2 <prune.txt &&
+		test_must_fail git rev-parse refs/remotes/remote1/side2 &&
+		test_must_fail git rev-parse refs/remotes/remote2/side2
+	)
+'
+
+test_expect_success 'prune-remote-refs task continues to prune remotes even if some fail' '
+	test_commit initial-prune-remote-refs &&
+
+	git clone . remote-bad1 &&
+	git clone . remote-bad2 &&
+	git clone . remote-good &&
+
+	git clone . prune-test-partial &&
+	(
+		cd prune-test-partial &&
+		git config maintenance.prune-remote-refs.enabled true &&
+
+		# Add remotes in alphabetical order to ensure processing order
+		git remote add aaa-bad1 "../remote-bad1" &&
+		git remote add bbb-bad2 "../remote-bad2" &&
+		git remote add ccc-good "../remote-good" &&
+
+		# Create and push branches to all remotes
+		git branch -f side2 HEAD &&
+		git push aaa-bad1 side2 &&
+		git push bbb-bad2 side2 &&
+		git push ccc-good side2 &&
+
+		# Rename branch in good remote to simulate a stale branch
+		git -C ../remote-good branch -m side2 side3 &&
+
+		# Break the bad remotes by removing their directories
+		rm -rf ../remote-bad1 ../remote-bad2 &&
+
+		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run --task=prune-remote-refs 2>err || true &&
+
+		# Verify pruning happened for good remote despite bad remote failures
+		test_subcommand git remote prune ccc-good <prune.txt &&
+		test_must_fail git rev-parse refs/remotes/ccc-good/side2 &&
+		test_grep "error: failed to prune the following remotes: aaa-bad1, bbb-bad2" err
+	)
+'
+
 test_expect_success '--auto and --schedule incompatible' '
 	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
 	test_grep "at most one" err

base-commit: 76cf4f61c87855ebf0784b88aaf737d6b09f504b
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] maintenance: add prune-remote-refs task
  2025-01-03 18:13   ` [PATCH v3] " Shubham Kanodia via GitGitGadget
@ 2025-01-03 19:02     ` Junio C Hamano
       [not found]       ` <CAG=Um+1ch1sKC0H8MJoFv=6iSK3pvA=03AKXmvhm5DG=H8T1rw@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2025-01-03 19:02 UTC (permalink / raw)
  To: Shubham Kanodia via GitGitGadget
  Cc: git, Patrick Steinhardt, Shubham Kanodia, Derrick Stolee

"Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Shubham Kanodia <shubham.kanodia10@gmail.com>
>
> Remote-tracking refs can accumulate in local repositories even as branches
> are deleted on remotes, impacting git performance negatively. Existing
> alternatives to keep refs pruned have a few issues:
>
>   1. Running `git fetch` with either `--prune` or `fetch.prune=true`
>      set, with the default refspec to copy all their branches into
>      our remote-tracking branches, will prune stale refs, but also
>      pulls in new branches from remote.  That is undesirable if the
>      user wants to only work with a selected few remote branches.
>
>   2. `git remote prune` cleans up refs without adding to the
>      existing list but requires periodic user intervention.
>
> Add a new maintenance task 'prune-remote-refs' that runs 'git remote
> prune' for each configured remote daily.  Leave the task disabled by
> default, as it may be unexpected to see their remote-tracking
> branches to disappear while they are not watching for unsuspecting
> users.

There is no description on how and why the prefetch job has been
modified here.

I haven't formed a strong opinion on the "should we keep going after
the first failure?" question yet, and if this topic is modifying the
way how the prefetch operates, the patch(es) should be CC'ed to the
author of that feature (The author of 28cb5e66 (maintenance: add
prefetch task, 2020-09-25) CC'ed).

If it turns out to be a good idea to do so, I would expect the topic
to consist of at least two patches:

 - [PATCH 1/2] to argue that it is a bug that the prefetch job stops
   at the first failed remote, and change its behaviour to prefetch
   from all remotes and then report a failure if the prefetch failed
   for any remote.  With some additional tests to check the updated
   behaviour.

 - [PATCH 2/2] to argue the need for periodic `remote prune`, and do
   the part of this patch that relates to that new feature.

> +struct remote_cb_data {
> +	struct maintenance_run_opts *maintenance_opts;
> +	struct string_list failed_remotes;
> +};
> +
> +static void report_failed_remotes(struct string_list *failed_remotes,
> +				  const char *action_name)
> +{
> +	if (failed_remotes->nr) {
> +		int i;
> +		struct strbuf msg = STRBUF_INIT;
> +		strbuf_addf(&msg, _("failed to %s the following remotes: "),
> +			    action_name);
> +		for (i = 0; i < failed_remotes->nr; i++) {
> +			if (i)
> +				strbuf_addstr(&msg, ", ");
> +			strbuf_addstr(&msg, failed_remotes->items[i].string);
> +		}
> +		error("%s", msg.buf);
> +		strbuf_release(&msg);
> +	}
> +}

A few comments:

 - The message pretends to be _("localizable"), but the sentence
   logo inserts action_name that is not translated.  If the
   operation failed only for a single remote, "following remotes" is
   grammatically incorrect.

 - Would it be useful to force this message to a single line, with
   multiple remote names concatenated with ","?  Computer output of
   a listing often is more useful without "," if it is meant to be
   consumable for cut-and-paste users.

Overall, I am fairly negative on the report this helper tries to
give the users.  Because we are going to do the operation on all
remotes anyway, wouldn't we have let the underlying operations (like
"git fetch" or "git remore prune") already issue error messages to
the user?  Do we need this extra reporting on top at all?

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] maintenance: add prune-remote-refs task
       [not found]       ` <CAG=Um+1ch1sKC0H8MJoFv=6iSK3pvA=03AKXmvhm5DG=H8T1rw@mail.gmail.com>
@ 2025-01-07 17:29         ` Shubham Kanodia
  2025-01-07 18:48           ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Shubham Kanodia @ 2025-01-07 17:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shubham Kanodia via GitGitGadget, git, Patrick Steinhardt,
	Derrick Stolee

On Sat, Jan 4, 2025 at 1:23 AM Shubham Kanodia
<shubham.kanodia10@gmail.com> wrote:
>
>
>
> On Sat, 4 Jan 2025 at 12:32 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>> > From: Shubham Kanodia <shubham.kanodia10@gmail.com>
>> >
>> > Remote-tracking refs can accumulate in local repositories even as branches
>> > are deleted on remotes, impacting git performance negatively. Existing
>> > alternatives to keep refs pruned have a few issues:
>> >
>> >   1. Running `git fetch` with either `--prune` or `fetch.prune=true`
>> >      set, with the default refspec to copy all their branches into
>> >      our remote-tracking branches, will prune stale refs, but also
>> >      pulls in new branches from remote.  That is undesirable if the
>> >      user wants to only work with a selected few remote branches.
>> >
>> >   2. `git remote prune` cleans up refs without adding to the
>> >      existing list but requires periodic user intervention.
>> >
>> > Add a new maintenance task 'prune-remote-refs' that runs 'git remote
>> > prune' for each configured remote daily.  Leave the task disabled by
>> > default, as it may be unexpected to see their remote-tracking
>> > branches to disappear while they are not watching for unsuspecting
>> > users.
>>
>> There is no description on how and why the prefetch job has been
>> modified here.
>>
>> I haven't formed a strong opinion on the "should we keep going after
>> the first failure?" question yet, and if this topic is modifying the
>> way how the prefetch operates, the patch(es) should be CC'ed to the
>> author of that feature (The author of 28cb5e66 (maintenance: add
>> prefetch task, 2020-09-25) CC'ed).
>>
>> If it turns out to be a good idea to do so, I would expect the topic
>> to consist of at least two patches:
>>
>>  - [PATCH 1/2] to argue that it is a bug that the prefetch job stops
>>    at the first failed remote, and change its behaviour to prefetch
>>    from all remotes and then report a failure if the prefetch failed
>>    for any remote.  With some additional tests to check the updated
>>    behaviour.
>>
>>  - [PATCH 2/2] to argue the need for periodic `remote prune`, and do
>>    the part of this patch that relates to that new feature.
>>
>> > +struct remote_cb_data {
>> > +     struct maintenance_run_opts *maintenance_opts;
>> > +     struct string_list failed_remotes;
>> > +};
>> > +
>> > +static void report_failed_remotes(struct string_list *failed_remotes,
>> > +                               const char *action_name)
>> > +{
>> > +     if (failed_remotes->nr) {
>> > +             int i;
>> > +             struct strbuf msg = STRBUF_INIT;
>> > +             strbuf_addf(&msg, _("failed to %s the following remotes: "),
>> > +                         action_name);
>> > +             for (i = 0; i < failed_remotes->nr; i++) {
>> > +                     if (i)
>> > +                             strbuf_addstr(&msg, ", ");
>> > +                     strbuf_addstr(&msg, failed_remotes->items[i].string);
>> > +             }
>> > +             error("%s", msg.buf);
>> > +             strbuf_release(&msg);
>> > +     }
>> > +}
>>
>> A few comments:
>>
>>  - The message pretends to be _("localizable"), but the sentence
>>    logo inserts action_name that is not translated.  If the
>>    operation failed only for a single remote, "following remotes" is
>>    grammatically incorrect.
>>
>>  - Would it be useful to force this message to a single line, with
>>    multiple remote names concatenated with ","?  Computer output of
>>    a listing often is more useful without "," if it is meant to be
>>    consumable for cut-and-paste users.
>>
>> Overall, I am fairly negative on the report this helper tries to
>> give the users.  Because we are going to do the operation on all
>> remotes anyway, wouldn't we have let the underlying operations (like
>> "git fetch" or "git remore prune") already issue error messages to
>> the user?  Do we need this extra reporting on top at all?
>>
>> Thanks.
>
>
>
> If it’s fine, I’d like to discuss the change to process all remotes separately from the initial change I submitted.
>
> I took an attempt at it in my last patch, but I don’t know if I’m really going to have the time to iterate on it as it looks more involved.
>
> In any case the implementation isn’t any worse off than the current maintenance command.
>
> Also, as a new contributor I’m unsure of recalling / patching a change that’s already in seen at the moment unless it is an unworkable solution.
>
> Thanks again.
> Shubham K

Noticed an update on What's Cooking —

> * sk/maintenance-remote-prune (2025-01-03) 1 commit
> - maintenance: add prune-remote-refs task
>  A new periodic maintenance task to run "git remote prune" has been
 introduced.
> Expecting a reroll.

Junio, what do you think about my previous suggestion — do you think
that changing the remote behaviours is a blocker for this change to
make its way to master?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] maintenance: add prune-remote-refs task
  2025-01-07 17:29         ` Shubham Kanodia
@ 2025-01-07 18:48           ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2025-01-07 18:48 UTC (permalink / raw)
  To: Shubham Kanodia
  Cc: Shubham Kanodia via GitGitGadget, git, Patrick Steinhardt,
	Derrick Stolee

Shubham Kanodia <shubham.kanodia10@gmail.com> writes:

> Junio, what do you think about my previous suggestion — do you think
> that changing the remote behaviours is a blocker for this change to
> make its way to master?

Yes, changing the behaviour of an existing command in the same patch
that adds a new feature, and doing so without clearly explaining why
such a change is a good idea in the proposed log message, are both
huge blockers.  Not just to 'master', but to anywhere near 'next'.

Thanks.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-01-07 18:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-23  9:36 [PATCH] maintenance: add prune-remote-refs task Shubham Kanodia via GitGitGadget
2024-12-27  9:07 ` Junio C Hamano
2024-12-28  9:58   ` Shubham Kanodia
2024-12-28 16:05     ` Junio C Hamano
2024-12-28 16:24       ` Shubham Kanodia
2024-12-28 10:07 ` [PATCH v2] " Shubham Kanodia via GitGitGadget
2024-12-28 16:25   ` Junio C Hamano
2024-12-30  7:15   ` Patrick Steinhardt
2024-12-30 14:05     ` Junio C Hamano
2025-01-03  6:50       ` Shubham Kanodia
2025-01-03  7:38         ` Patrick Steinhardt
2025-01-03 18:13   ` [PATCH v3] " Shubham Kanodia via GitGitGadget
2025-01-03 19:02     ` Junio C Hamano
     [not found]       ` <CAG=Um+1ch1sKC0H8MJoFv=6iSK3pvA=03AKXmvhm5DG=H8T1rw@mail.gmail.com>
2025-01-07 17:29         ` Shubham Kanodia
2025-01-07 18:48           ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).