From: Taylor Blau <me@ttaylorr.com>
To: Jacob Vosmaer <jacob@gitlab.com>
Cc: git@vger.kernel.org, avarab@gmail.com, peff@peff.net, me@ttaylorr.com
Subject: Re: [PATCH v2 1/1] builtin/pack-objects.c: avoid iterating all refs
Date: Wed, 20 Jan 2021 09:49:20 -0500
Message-ID: <YAhC8Gsp4H17e28n@nand.local>
In-Reply-To: <20210120124514.49737-2-jacob@gitlab.com>

Hi Jacob,

On Wed, Jan 20, 2021 at 01:45:14PM +0100, Jacob Vosmaer wrote:
> In git-pack-objects, we iterate over all the tags if the --include-tag
> option is passed on the command line. For some reason this uses
> for_each_ref which is expensive if the repo has many refs. We should
> use for_each_tag_ref instead.

I don't think it's worth sending another version, but I would have liked
to see: "... because we can save time by only iterating over some of the
refs" at the end of this paragraph.

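
To make the saving concrete, here is a toy sketch outside of Git's own
ref API. Everything in it is hypothetical (the toy_refs array, the
toy_for_each_ref() and toy_for_each_ref_in() iterators, both callbacks);
it only assumes a ref store that is sorted by name, so a prefix walk can
seek to the first match and stop once it leaves the prefix. It passes
the full ref name to the callback and makes no claim about what Git's
for_each_tag_ref() passes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, sorted stand-in for a ref store; not Git's data structures. */
static const char *toy_refs[] = {
	"refs/heads/main",
	"refs/merge-requests/1/head",
	"refs/merge-requests/2/head",
	"refs/merge-requests/3/head",
	"refs/tags/v1.0",
	"refs/tags/v2.0",
};
#define NR_TOY_REFS (sizeof(toy_refs) / sizeof(toy_refs[0]))

typedef int (*toy_ref_fn)(const char *refname, void *cb_data);

/* Number of times an iterator invoked its callback (for comparison only). */
static size_t toy_callbacks;

/* Old shape: hand every ref to the callback and let it filter. */
static int toy_for_each_ref(toy_ref_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < NR_TOY_REFS; i++) {
		int ret;

		toy_callbacks++;
		ret = fn(toy_refs[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* New shape: binary-search to the prefix, then stop once we leave it. */
static int toy_for_each_ref_in(const char *prefix, toy_ref_fn fn, void *cb_data)
{
	size_t len = strlen(prefix);
	size_t lo = 0, hi = NR_TOY_REFS;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strncmp(toy_refs[mid], prefix, len) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	for (; lo < NR_TOY_REFS && !strncmp(toy_refs[lo], prefix, len); lo++) {
		int ret;

		toy_callbacks++;
		ret = fn(toy_refs[lo], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Callback for the full walk: it has to check the prefix itself. */
static int count_if_tag(const char *refname, void *cb_data)
{
	if (!strncmp(refname, "refs/tags/", strlen("refs/tags/")))
		(*(size_t *)cb_data)++;
	return 0;
}

/* Callback for the prefix walk: every ref it sees is already a tag. */
static int count_tag(const char *refname, void *cb_data)
{
	(void)refname;
	(*(size_t *)cb_data)++;
	return 0;
}
```

Calling toy_for_each_ref(count_if_tag, &n) invokes the callback once per
ref in the array, whereas toy_for_each_ref_in("refs/tags/", count_tag, &n)
invokes it only for the tag entries, and the callback no longer needs
its own prefix check.
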
> Because the add_ref_tag callback will now only visit tags we
> simplified it a bit.
>
> The motivation for this change is that we observed performance issues
> with a repository on gitlab.com that has 500,000 refs but only 2,000
> tags. The fetch traffic on that repo is dominated by CI, and when we
> changed CI to fetch with 'git fetch --no-tags' we saw a dramatic
> change in the CPU profile of git-pack-objects. This led us to this
> particular ref walk. More details in:
> https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598
>
> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
> builtin/pack-objects.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 2a00358f34..ad52c91bdb 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2803,13 +2803,11 @@ static void add_tag_chain(const struct object_id *oid)
>  	}
>  }
>
> -static int add_ref_tag(const char *path, const struct object_id *oid, int flag, void *cb_data)
> +static int add_ref_tag(const char *tag, const struct object_id *oid, int flag, void *cb_data)
>  {
>  	struct object_id peeled;
>
> -	if (starts_with(path, "refs/tags/") && /* is a tag? */
> -	    !peel_ref(path, &peeled) && /* peelable? */
> -	    obj_is_packed(&peeled)) /* object packed? */
> +	if (!peel_ref(tag, &peeled) && obj_is_packed(&peeled))
>  		add_tag_chain(oid);
>  	return 0;
>  }
> @@ -3740,7 +3738,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	}
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)
> -		for_each_ref(add_ref_tag, NULL);
> +		for_each_tag_ref(add_ref_tag, NULL);

OK. Seeing another caller (builtin/pack-objects.c:compute_write_order())
that passes a callback to for_each_tag_ref() makes me feel more
comfortable about using it here.

Thanks for investigating and resolving this in a way which cleans up the
surrounding code.

>  	stop_progress(&progress_state);
>  	trace2_region_leave("pack-objects", "enumerate-objects",
>  			    the_repository);
> --
> 2.30.0

This version looks good to me, thanks for digging!

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor