git.vger.kernel.org archive mirror
From: Derrick Stolee <stolee@gmail.com>
To: Patrick Steinhardt <ps@pks.im>, git@vger.kernel.org
Subject: Re: [PATCH] fetch-pack: speed up loading of refs via commit graph
Date: Wed, 4 Aug 2021 10:55:00 -0400
Message-ID: <9532b3a8-baeb-50ac-766e-e9844aa63051@gmail.com>
In-Reply-To: <08519b8ab6f395cffbcd5e530bfba6aaf64241a2.1628085347.git.ps@pks.im>

On 8/4/2021 9:56 AM, Patrick Steinhardt wrote:
> When doing reference negotiation, git-fetch-pack(1) is loading all refs
> from disk in order to determine which commits it has in common with the
> remote repository. This can be quite expensive in repositories with many
> references though: in a real-world repository with around 2.2 million
> refs, fetching a single commit by its ID takes around 44 seconds.
> 
> Dominating the loading time is decompression and parsing of the objects
> which are referenced by commits. Given the fact that we only care about
> commits (or tags which can be peeled to one) in this context, there is
> thus an easy performance win by switching the parsing logic to make use
> of the commit graph in case we have one available.

Nice find!

> Like this, we avoid
> hitting the object database to parse these commits but instead only load
> them from the commit-graph. This results in a significant performance
> boost when executing git-fetch in said repository with 2.2 million refs:
> 
>     Benchmark #1: HEAD~: git fetch $remote $commit
>       Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
>       Range (min … max):   43.565 s … 44.577 s    10 runs
> 
>     Benchmark #2: HEAD: git fetch $remote $commit
>       Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
>       Range (min … max):   18.629 s … 20.454 s    10 runs
> 
>     Summary
>       'HEAD: git fetch $remote $commit' ran
>         2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'

That's a great improvement. I'm sure that the remaining time is dominated
by ref parsing.

> -	if (type == OBJ_COMMIT)
> -		return (struct commit *) parse_object(the_repository, oid);
> +
> +	if (type == OBJ_COMMIT) {
> +		struct commit *commit = lookup_commit(the_repository, oid);
> +		if (!commit || repo_parse_commit(the_repository, commit))
> +			return NULL;
> +		return commit;
> +	}
> +

And this change looks obviously correct to me. I'm glad that the
implementation doesn't need to know about the commit-graph at all:
it simply uses the "standard" commit parsing approach rather than
side-stepping the commit-graph via parse_object().
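
To make that distinction concrete, here is a toy sketch in C. It is
not Git's code: every name in it (toy_commit, toy_graph_ids,
toy_parse_via_odb, toy_parse_commit, toy_odb_reads) is hypothetical,
and the "graph" is just a small array. It only models the idea that
the standard parse path can be served from cached commit metadata,
while the old path always pays for an object-database read.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are Git's real API. */
struct toy_commit {
	int id;
	int parsed;
};

/* Pretend commit-graph: ids whose metadata is cached and cheap to load. */
static const int toy_graph_ids[] = { 1, 2, 3 };

/* Counts simulated expensive object-database reads (inflate + parse). */
static int toy_odb_reads;

static int toy_in_graph(int id)
{
	size_t i;

	for (i = 0; i < sizeof(toy_graph_ids) / sizeof(toy_graph_ids[0]); i++)
		if (toy_graph_ids[i] == id)
			return 1;
	return 0;
}

/* Models the old path: always goes to the object database. */
static int toy_parse_via_odb(struct toy_commit *c)
{
	assert(c != NULL);
	toy_odb_reads++;
	c->parsed = 1;
	return 0;
}

/* Models the "standard" path: try the graph first, fall back to the odb. */
static int toy_parse_commit(struct toy_commit *c)
{
	assert(c != NULL);
	if (c->parsed)
		return 0;
	if (toy_in_graph(c->id)) {
		c->parsed = 1;
		return 0;
	}
	return toy_parse_via_odb(c);
}
```

In this model, parsing a commit whose id is in toy_graph_ids leaves
toy_odb_reads untouched, while a commit outside the graph still
falls back to the slow path, so the caller never has to check
whether a graph exists.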

I took a quick look for other places where we call parse_object()
even though we already know to expect a commit, but did not find any.

Thanks,
-Stolee
