git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Liu Zhongbo via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Liu Zhongbo <liuzhongbo.gg@gmail.com>,
	Liu Zhongbo <liuzhongbo.6666@bytedance.com>
Subject: Re: [PATCH] builtin/fetch: iterate symrefs instead of all when checking dangling refs
Date: Wed, 16 Oct 2024 09:13:49 +0200	[thread overview]
Message-ID: <Zw9npQrbBzddKxaJ@pks.im> (raw)
In-Reply-To: <pull.1812.git.git.1728962878717.gitgitgadget@gmail.com>

On Tue, Oct 15, 2024 at 03:27:58AM +0000, Liu Zhongbo via GitGitGadget wrote:
> From: Liu Zhongbo <liuzhongbo.6666@bytedance.com>
> 
> refs_warn_dangling_symref() traverse all references to check if there are
> any dangling symbolic references. The complexity is
> O(number of deleted references * total number of references).
> It will take a lot of time if there are tens of thousands of branches in
> monorepo.
> 
> So I first identified all the symbolic references, and then only traverse
> in these references. The complexity is
> O (number of deleted references * number of symbolic references).

Okay. I'm a bit curious here, mostly because I would have thought that
it should be able to make this O(number of deleted refs * logn(existing
refs)): for every deleted ref, you should only have to look up whether
its target exists or not, which typically is O(logn existing refs).

But let's read on.

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 80a64d0d269..ec4be60cfeb 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1412,15 +1412,18 @@ static int prune_refs(struct display_state *display_state,
>  
>  	if (verbosity >= 0) {
>  		int summary_width = transport_summary_width(stale_refs);
> +	    struct string_list symrefs = STRING_LIST_INIT_NODUP;
> +	    refs_get_symrefs(get_main_ref_store(the_repository), &symrefs);
>  
>  		for (ref = stale_refs; ref; ref = ref->next) {
>  			display_ref_update(display_state, '-', _("[deleted]"), NULL,
>  					   _("(none)"), ref->name,
>  					   &ref->new_oid, &ref->old_oid,
>  					   summary_width);
> -			refs_warn_dangling_symref(get_main_ref_store(the_repository),
> -						  stderr, dangling_msg, ref->name);
> +	        refs_warn_dangling_symref(get_main_ref_store(the_repository), stderr,
> +				      dangling_msg, ref->name, &symrefs);
>  		}
> +	    string_list_clear(&symrefs, 0);
>  	}
>  
>  cleanup:

Okay, so here we're now iterating through the refs which we are about to
delete. For every such ref, we call `refs_warn_dangling_symref()`, which
iterates through all references in the repository to check whether any
of them resolves to the passed ref.

That feels inefficient indeed, and I agree that reading symrefs once is
going to be way more efficient. But open-coding part of the logic here
does not make much sense to me, as the function itself should know to do
it efficiently.

Can we instead refactor the code to use `refs_warn_dangling_symrefs()`,
which does all of this in a single iteration over all refs? That'd
remove the need for us to do all of the above as we now only iterate a
single time through all refs, wouldn't it?

That still isn't quite O(number of deleted refs * logn existing refs) in
theory. It's rather O(existing refs * logn deleted refs) as before
because we have to also look up the currently iterader refname in
`struct warn_if_dangling_data`, which is using a binary search in the
sorted string list. But I think this should still be way more efficient
compared to the current solution, where we iterate through all refs
multiple times.

Patrick

      parent reply	other threads:[~2024-10-16  7:13 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-15  3:27 [PATCH] builtin/fetch: iterate symrefs instead of all when checking dangling refs Liu Zhongbo via GitGitGadget
2024-10-15 19:08 ` Taylor Blau
2024-10-16  7:13 ` Patrick Steinhardt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zw9npQrbBzddKxaJ@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=liuzhongbo.6666@bytedance.com \
    --cc=liuzhongbo.gg@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).