git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, ps@pks.im,
	Shubham Kanodia <shubham.kanodia10@gmail.com>
Subject: Re: [PATCH] maintenance: add prune-remote-refs task
Date: Fri, 27 Dec 2024 01:07:41 -0800	[thread overview]
Message-ID: <xmqqed1tv6hu.fsf@gitster.g> (raw)
In-Reply-To: <pull.1838.git.1734946566885.gitgitgadget@gmail.com> (Shubham Kanodia via GitGitGadget's message of "Mon, 23 Dec 2024 09:36:06 +0000")

Thanks for a patch.


"Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes:

You'd want to check your procedure to tell GGG about addresses; I am
seeing these

    From: "Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com>
    To: git@vger.kernel.org
    Cc: "mailto:gitster@pobox.com" <[gitster@pobox.com]>,
            "mailto:ps@pks.im" <[ps@pks.im]>,
            Shubham Kanodia <shubham.kanodia10@gmail.com>,
            Shubham Kanodia <shubham.kanodia10@gmail.com>

and Cc addresses in it would probably not work as-is (I've fixed
them up manually).

> From: Shubham Kanodia <shubham.kanodia10@gmail.com>
>
> Remote-tracking refs can accumulate in local repositories even as branches
> are deleted on remotes, impacting git performance negatively. Existing
> alternatives to keep refs pruned have a few issues:
>
> 1. The `fetch.prune` config automatically cleans up remote ref on fetch,
> but also pulls in new ref from remote which is an undesirable side-effect.

This makes it sound as if fetch.prune configuration makes new refs
pulled, but that is not what happens and that is not what you wanted
to hint.

	If you run "git fetch" with the "--prune" option (or with
	the fetch.prune configuration set to true) while having the
	default refspec "+refs/heads/*:refs/remotes/$name/*"
	configured in remote.$name.fetch, then ...

> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 6e6651309d3..0c8f1e01ccd 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -158,6 +158,26 @@ pack-refs::
>  	need to iterate across many references. See linkgit:git-pack-refs[1]
>  	for more information.
>  
> +prune-remote-refs::
> +	The `prune-remote-refs` task runs `git remote prune` on each remote
> +	repository registered in the local repository. This task helps clean
> +	up deleted remote branches, improving the performance of operations
> +	that iterate through the refs. See linkgit:git-remote[1] for more
> +	information. This task is disabled by default.
> ++
> +NOTE: This task is opt-in to prevent unexpected removal of remote refs
> +for users of git-maintenance. For most users, configuring `fetch.prune=true`
> +is a acceptable solution, as it will automatically clean up stale remote-tracking
> +branches during normal fetch operations. However, this task can be useful in
> +specific scenarios:
> ++
> +--
> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
> +  where `fetch.prune` would not affect refs outside the fetched hierarchy

The word "hierarchy" hints that things under refs/remotes/origin/
(which is the hierarchy 'foo' is fetched into) that went away would
be pruned, but that is not what happens (otherwise you would not be
adding this feature).

> +* When third-party tools might perform unexpected full fetches, and you want
> +  periodic cleanup independently of fetch operations

You'd want a full-stop after these two sentences, by the way.

> diff --git a/builtin/gc.c b/builtin/gc.c
> index 4ae5196aedf..9acf1d29895 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -20,6 +20,7 @@
>  #include "lockfile.h"
>  #include "parse-options.h"
>  #include "run-command.h"
> +#include "remote.h"
>  #include "sigchain.h"
>  #include "strvec.h"
>  #include "commit.h"
> @@ -913,6 +914,40 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg,
>  	return 0;
>  }
>  
> +static int collect_remote(struct remote *remote, void *cb_data)
> +{
> +	struct string_list *list = cb_data;
> +
> +	if (!remote->url.nr)
> +		return 0;
> +
> +	string_list_append(list, remote->name);
> +	return 0;
> +}
> +
> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts UNUSED,
> +					 struct gc_config *cfg UNUSED)
> +{
> +	struct string_list_item *item;
> +	struct string_list remotes_list = STRING_LIST_INIT_NODUP;
> +	struct child_process child = CHILD_PROCESS_INIT;
> +	int result = 0;
> +
> +	for_each_remote(collect_remote, &remotes_list);
> +
> +	for_each_string_list_item (item, &remotes_list) {
> +		const char *remote_name = item->string;
> +		child.git_cmd = 1;
> +		strvec_pushl(&child.args, "remote", "prune", remote_name, NULL);
> +
> +		if (run_command(&child))
> +			result = error(_("failed to prune '%s'"), remote_name);
> +	}

Hmph, is there a reason why you need two loops, instead of
for-each-remote calling a function that does the run_command()
thing?

"git grep for_each_string_list_item \*.c" tells me that we almost
never write SP between the macro name and the opening parenthesis.

This loop does not stop at the first error, but returns a non-zero
error after noticing even a single remote fail to run prune, which
sounds like a seneible design.  Would an error percolate up the same
way when two different tasks run and one of them fails in the
control folow in "git maintenance"?  Just want to see if we are
being consistent with the surrounding code.

Thanks.

  reply	other threads:[~2024-12-27  9:07 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-23  9:36 [PATCH] maintenance: add prune-remote-refs task Shubham Kanodia via GitGitGadget
2024-12-27  9:07 ` Junio C Hamano [this message]
2024-12-28  9:58   ` Shubham Kanodia
2024-12-28 16:05     ` Junio C Hamano
2024-12-28 16:24       ` Shubham Kanodia
2024-12-28 10:07 ` [PATCH v2] " Shubham Kanodia via GitGitGadget
2024-12-28 16:25   ` Junio C Hamano
2024-12-30  7:15   ` Patrick Steinhardt
2024-12-30 14:05     ` Junio C Hamano
2025-01-03  6:50       ` Shubham Kanodia
2025-01-03  7:38         ` Patrick Steinhardt
2025-01-03 18:13   ` [PATCH v3] " Shubham Kanodia via GitGitGadget
2025-01-03 19:02     ` Junio C Hamano
     [not found]       ` <CAG=Um+1ch1sKC0H8MJoFv=6iSK3pvA=03AKXmvhm5DG=H8T1rw@mail.gmail.com>
2025-01-07 17:29         ` Shubham Kanodia
2025-01-07 18:48           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqed1tv6hu.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=ps@pks.im \
    --cc=shubham.kanodia10@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).