From: Taylor Blau <me@ttaylorr.com>
To: Jacob Vosmaer <jacob@gitlab.com>
Cc: git@vger.kernel.org, avarab@gmail.com, peff@peff.net, me@ttaylorr.com
Subject: Re: [PATCH v2 1/1] builtin/pack-objects.c: avoid iterating all refs
Date: Wed, 20 Jan 2021 09:49:20 -0500
Message-ID: <YAhC8Gsp4H17e28n@nand.local>
In-Reply-To: <20210120124514.49737-2-jacob@gitlab.com>

Hi Jacob,

On Wed, Jan 20, 2021 at 01:45:14PM +0100, Jacob Vosmaer wrote:
> In git-pack-objects, we iterate over all the tags if the --include-tag
> option is passed on the command line. For some reason this uses
> for_each_ref which is expensive if the repo has many refs. We should
> use for_each_tag_ref instead.

I don't think it's worth sending another version, but I would have liked
to see: "... because we can save time by only iterating over some of the
refs" at the end of this paragraph.

> Because the add_ref_tag callback will now only visit tags we
> simplified it a bit.
>
> The motivation for this change is that we observed performance issues
> with a repository on gitlab.com that has 500,000 refs but only 2,000
> tags. The fetch traffic on that repo is dominated by CI, and when we
> changed CI to fetch with 'git fetch --no-tags' we saw a dramatic
> change in the CPU profile of git-pack-objects. This led us to this
> particular ref walk. More details in:
> https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598
>
> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
>  builtin/pack-objects.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 2a00358f34..ad52c91bdb 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2803,13 +2803,11 @@ static void add_tag_chain(const struct object_id *oid)
>  	}
>  }
>
> -static int add_ref_tag(const char *path, const struct object_id *oid, int flag, void *cb_data)
> +static int add_ref_tag(const char *tag, const struct object_id *oid, int flag, void *cb_data)
>  {
>  	struct object_id peeled;
>
> -	if (starts_with(path, "refs/tags/") && /* is a tag? */
> -	    !peel_ref(path, &peeled)    && /* peelable? */
> -	    obj_is_packed(&peeled)) /* object packed? */
> +	if (!peel_ref(tag, &peeled) && obj_is_packed(&peeled))
>  		add_tag_chain(oid);
>  	return 0;
>  }
> @@ -3740,7 +3738,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	}
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)
> -		for_each_ref(add_ref_tag, NULL);
> +		for_each_tag_ref(add_ref_tag, NULL);

OK. Seeing another caller (builtin/pack-objects.c:compute_write_order())
that passes a callback to for_each_tag_ref() makes me feel more
comfortable about using it here.

Thanks for investigating and resolving this in a way which cleans up the
surrounding code.
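
To make the shape of the saving concrete, here is a toy sketch in C. It is
not Git's refs API: the sorted table and the names toy_for_each_ref_in(),
toy_walk_all() and toy_walk_tags() are hypothetical, made up for
illustration. It only contrasts a callback that filters every ref by
prefix with an iterator that is asked for "refs/tags/" up front.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a ref store: a name-sorted table, NOT Git's refs API. */
static const char *toy_refs[] = {
	"refs/heads/main",
	"refs/merge-requests/1/head",
	"refs/merge-requests/2/head",
	"refs/tags/v1.0",
	"refs/tags/v1.1",
};
#define TOY_NR (sizeof(toy_refs) / sizeof(toy_refs[0]))

typedef int (*toy_ref_fn)(const char *name, void *cb_data);

struct toy_stats {
	size_t visited; /* how many times the callback ran */
	size_t tags;    /* how many refs were treated as tags */
};

/* Invoke fn on every ref whose name starts with prefix ("" means all). */
static void toy_for_each_ref_in(const char *prefix, toy_ref_fn fn, void *cb_data)
{
	size_t len = strlen(prefix);
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < TOY_NR; i++) {
		int cmp = strncmp(toy_refs[i], prefix, len);
		if (cmp < 0)
			continue; /* sorts before the prefix range */
		if (cmp > 0)
			break;    /* sorted table: nothing later can match */
		if (fn(toy_refs[i], cb_data))
			break;
	}
}

/* Old shape: the callback sees every ref and checks the prefix itself. */
static int count_filtering(const char *name, void *cb_data)
{
	struct toy_stats *st = cb_data;

	st->visited++;
	if (!strncmp(name, "refs/tags/", strlen("refs/tags/")))
		st->tags++;
	return 0;
}

/* New shape: the iterator only yields tags, so no prefix check is needed. */
static int count_plain(const char *name, void *cb_data)
{
	struct toy_stats *st = cb_data;

	(void)name;
	st->visited++;
	st->tags++;
	return 0;
}

struct toy_stats toy_walk_all(void)
{
	struct toy_stats st = { 0, 0 };

	toy_for_each_ref_in("", count_filtering, &st);
	return st;
}

struct toy_stats toy_walk_tags(void)
{
	struct toy_stats st = { 0, 0 };

	toy_for_each_ref_in("refs/tags/", count_plain, &st);
	return st;
}
```

On this table both walks find the same two tags, but toy_walk_all() runs
its callback five times and toy_walk_tags() only twice. The toy loop still
steps over the earlier entries linearly; a real ref store may be able to
seek to the prefix instead, which is an assumption I have not shown here.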

>  	stop_progress(&progress_state);
>  	trace2_region_leave("pack-objects", "enumerate-objects",
>  			    the_repository);
> --
> 2.30.0

This version looks good to me, thanks for digging!

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor
