From: Derrick Stolee <stolee@gmail.com>
To: Patrick Steinhardt <ps@pks.im>, git@vger.kernel.org
Subject: Re: [PATCH] fetch-pack: speed up loading of refs via commit graph
Date: Wed, 4 Aug 2021 10:55:00 -0400
Message-ID: <9532b3a8-baeb-50ac-766e-e9844aa63051@gmail.com>
In-Reply-To: <08519b8ab6f395cffbcd5e530bfba6aaf64241a2.1628085347.git.ps@pks.im>

On 8/4/2021 9:56 AM, Patrick Steinhardt wrote:
> When doing reference negotiation, git-fetch-pack(1) is loading all refs
> from disk in order to determine which commits it has in common with the
> remote repository. This can be quite expensive in repositories with many
> references though: in a real-world repository with around 2.2 million
> refs, fetching a single commit by its ID takes around 44 seconds.
>
> Dominating the loading time is decompression and parsing of the objects
> which are referenced by commits. Given the fact that we only care about
> commits (or tags which can be peeled to one) in this context, there is
> thus an easy performance win by switching the parsing logic to make use
> of the commit graph in case we have one available.

Nice find!

> Like this, we avoid
> hitting the object database to parse these commits but instead only load
> them from the commit-graph. This results in a significant performance
> boost when executing git-fetch in said repository with 2.2 million refs:
>
> Benchmark #1: HEAD~: git fetch $remote $commit
>   Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
>   Range (min … max):   43.565 s … 44.577 s    10 runs
>
> Benchmark #2: HEAD: git fetch $remote $commit
>   Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
>   Range (min … max):   18.629 s … 20.454 s    10 runs
>
> Summary
>   'HEAD: git fetch $remote $commit' ran
>     2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'

That's a great improvement. I'm sure that the remaining time is dominated
by ref parsing.

> -	if (type == OBJ_COMMIT)
> -		return (struct commit *) parse_object(the_repository, oid);
> +
> +	if (type == OBJ_COMMIT) {
> +		struct commit *commit = lookup_commit(the_repository, oid);
> +		if (!commit || repo_parse_commit(the_repository, commit))
> +			return NULL;
> +		return commit;
> +	}
> +

And this change looks obviously correct to me. I'm glad that the
implementation doesn't depend on the commit-graph directly, but
rather uses the "standard" commit parsing approach instead of
side-stepping the commit-graph via parse_object().

I took a quick look for other instances where we use parse_object()
but already know to expect a commit, and did not find one.
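
To illustrate the distinction between the two parsing paths discussed
above, here is a toy model in C. It is only a sketch and not git's
code: every toy_* name, the integer IDs, and the read counter are
hypothetical stand-ins. toy_parse_object() always goes to a simulated
object database, while toy_lookup_commit() plus toy_parse_commit()
first consults a small precomputed table (playing the role of the
commit-graph) and only falls back to the database for commits the table
does not cover.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every toy_* name is hypothetical, not git's API. */

struct toy_commit {
	unsigned id;        /* stand-in for an object ID */
	unsigned parent_id; /* 0 means "no parent" */
	int parsed;         /* set once the fields above are filled in */
};

/* Stand-in for the commit-graph: precomputed data for commits 1..3. */
static const struct toy_commit toy_graph[] = {
	{ 1, 0, 1 }, { 2, 1, 1 }, { 3, 2, 1 },
};

#define TOY_POOL_SIZE 8
static struct toy_commit toy_pool[TOY_POOL_SIZE];
static size_t toy_pool_nr;
static int toy_odb_reads; /* counts simulated "inflate and parse" operations */

/* Return the in-memory commit for id, creating an unparsed one if needed. */
static struct toy_commit *toy_lookup_commit(unsigned id)
{
	size_t i;

	for (i = 0; i < toy_pool_nr; i++)
		if (toy_pool[i].id == id)
			return &toy_pool[i];
	if (toy_pool_nr == TOY_POOL_SIZE)
		return NULL;
	toy_pool[toy_pool_nr].id = id;
	toy_pool[toy_pool_nr].parent_id = 0;
	toy_pool[toy_pool_nr].parsed = 0;
	return &toy_pool[toy_pool_nr++];
}

/* Simulated expensive read from the object database. */
static void toy_read_from_odb(struct toy_commit *c)
{
	toy_odb_reads++;
	c->parent_id = c->id - 1; /* toy history is a straight line */
	c->parsed = 1;
}

/* Slow path: goes to the object database without consulting the graph. */
static struct toy_commit *toy_parse_object(unsigned id)
{
	struct toy_commit *c = toy_lookup_commit(id);

	if (c && !c->parsed)
		toy_read_from_odb(c);
	return c;
}

/* "Standard" path: use the graph when it has the commit, else fall back. */
static int toy_parse_commit(struct toy_commit *c)
{
	size_t i;

	if (c->parsed)
		return 0;
	for (i = 0; i < sizeof(toy_graph) / sizeof(toy_graph[0]); i++) {
		if (toy_graph[i].id == c->id) {
			*c = toy_graph[i];
			return 0;
		}
	}
	toy_read_from_odb(c);
	return 0;
}

/* Load n commits from a fresh state and report how many ODB reads it cost. */
static int toy_load(const unsigned *ids, size_t n, int use_standard_path)
{
	size_t i;

	toy_pool_nr = 0;
	toy_odb_reads = 0;
	for (i = 0; i < n; i++) {
		struct toy_commit *c;

		if (use_standard_path) {
			c = toy_lookup_commit(ids[i]);
			if (!c || toy_parse_commit(c))
				return -1;
		} else {
			c = toy_parse_object(ids[i]);
		}
		assert(c && c->parsed);
	}
	return toy_odb_reads;
}
```

Calling toy_load() from a main() with the IDs 1 through 4 costs four
simulated database reads on the slow path, but only one on the standard
path, because only commit 4 is missing from the toy graph. The caller
never mentions the graph at all, which is the property praised above.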
Thanks,
-Stolee