Re: [PATCH 3/3] receive-pack: use batched reference updates

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Patrick Steinhardt <ps@pks.im>
To: Karthik Nayak <karthik.188@gmail.com>
Cc: git@vger.kernel.org, toon@iotcl.com
Subject: Re: [PATCH 3/3] receive-pack: use batched reference updates
Date: Wed, 14 May 2025 14:31:09 +0200	[thread overview]
Message-ID: <aCSNDbUX-MMJZj5S@pks.im> (raw)
In-Reply-To: <20250514-501-update-git-fetch-1-to-use-partial-transactions-v1-3-7c65f46493d4@gmail.com>

On Wed, May 14, 2025 at 11:03:49AM +0200, Karthik Nayak wrote:
[snip]
> With the reftable backend there is a 18x performance improvement, when
> performing receive-pack with 10000 refs:
> 
>   Benchmark 1: receive: many refs (refformat = reftable, refcount = 10000, revision = master)
>     Time (mean ± σ):      4.276 s ±  0.078 s    [User: 0.796 s, System: 3.318 s]
>     Range (min … max):    4.185 s …  4.430 s    10 runs
> 
>   Benchmark 2: receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
>     Time (mean ± σ):     235.4 ms ±   6.9 ms    [User: 75.4 ms, System: 157.3 ms]
>     Range (min … max):   228.5 ms … 254.2 ms    11 runs
> 
>   Summary
>     receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
>      18.16 ± 0.63 times faster than receive: many refs (refformat = reftable, refcount = 10000, revision = master)
> 
> In similar conditions, the files backend sees a 1.21x performance
> improvement:
> 
>   Benchmark 1: receive: many refs (refformat = files, refcount = 10000, revision = master)
>     Time (mean ± σ):      1.121 s ±  0.021 s    [User: 0.128 s, System: 0.975 s]
>     Range (min … max):    1.097 s …  1.156 s    10 runs
> 
>   Benchmark 2: receive: many refs (refformat = files, refcount = 10000, revision = HEAD)
>     Time (mean ± σ):     927.9 ms ±  22.6 ms    [User: 99.0 ms, System: 815.2 ms]
>     Range (min … max):   903.1 ms … 978.0 ms    10 runs
> 
>   Summary
>     receive: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
>       1.21 ± 0.04 times faster than receive: many refs (refformat = files, refcount = 10000, revision = master)

We see almost the same speedups as we saw in git-fetch(1), and the
reason why we see them is basically the same.

> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index be314879e8..b4fceb3837 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1843,35 +1843,91 @@ static void BUG_if_skipped_connectivity_check(struct command *commands,
>  	BUG_if_bug("connectivity check skipped???");
>  }
>  
> +static void ref_transaction_rejection_handler(const char *refname,
> +					      const struct object_id *old_oid UNUSED,
> +					      const struct object_id *new_oid UNUSED,
> +					      const char *old_target UNUSED,
> +					      const char *new_target UNUSED,
> +					      enum ref_transaction_error err,
> +					      void *cb_data)
> +{
> +	struct strmap *failed_refs = (struct strmap *)cb_data;

This cast is unnecessary.

> +	const char *reason = "";
> +
> +	switch (err) {
> +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
> +		reason = "refname conflict";
> +		break;
> +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
> +		reason = "reference already exists";
> +		break;
> +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
> +		reason = "reference does not exist";
> +		break;
> +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
> +		reason = "incorrect old value provided";
> +		break;
> +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
> +		reason = "invalid new value provided";
> +		break;
> +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
> +		reason = "expected symref but found regular ref";
> +		break;
> +	default:
> +		reason = "unkown failure";
> +	}
> +
> +	strmap_put(failed_refs, refname, xstrdup(reason));
> +}

I'd have expected something like this for git-fetch(1), as well, so that
we don't silently swallow failed ref updates. Would it make sense to
maybe provide an array of reasons by enum so that we can reuse those
messages?

>  static void execute_commands_non_atomic(struct command *commands,
>  					struct shallow_info *si)
>  {
>  	struct command *cmd;
>  	struct strbuf err = STRBUF_INIT;
> +	const char *reported_error = "";
> +	struct strmap failed_refs = STRMAP_INIT;
> +
> +	transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
> +						  REF_TRANSACTION_ALLOW_FAILURE, &err);
> +	if (!transaction) {
> +		rp_error("%s", err.buf);
> +		strbuf_reset(&err);
> +		reported_error = "transaction failed to start";
> +		goto failure;
> +	}

Okay. We now create a single transaction with failures being allowed.

>  	for (cmd = commands; cmd; cmd = cmd->next) {
>  		if (!should_process_cmd(cmd) || cmd->run_proc_receive)
>  			continue;
>  
> -		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
> -							  0, &err);
> -		if (!transaction) {
> -			rp_error("%s", err.buf);
> -			strbuf_reset(&err);
> -			cmd->error_string = "transaction failed to start";
> -			continue;
> -		}
> -
>  		cmd->error_string = update(cmd, si);
> +	}

So here we only need to queue each update.

> -		if (!cmd->error_string
> -		    && ref_transaction_commit(transaction, &err)) {
> -			rp_error("%s", err.buf);
> -			strbuf_reset(&err);
> -			cmd->error_string = "failed to update ref";
> -		}
> -		ref_transaction_free(transaction);
> +	if (ref_transaction_commit(transaction, &err)) {
> +		rp_error("%s", err.buf);
> +		reported_error = "failed to update refs";
> +		goto failure;
>  	}
> +
> +	ref_transaction_for_each_rejected_update(transaction,
> +						 ref_transaction_rejection_handler,
> +						 &failed_refs);
> +
> +	if (strmap_empty(&failed_refs))
> +		goto cleanup;
> +
> +failure:
> +	for (cmd = commands; cmd; cmd = cmd->next) {
> +		if (strmap_empty(&failed_refs))
> +			cmd->error_string = reported_error;

The reported error may have one of there values now:

  - The empty string. Is it correct to set the error string to that
    value? Shouldn't it rather be a `NULL` pointer?

  - "transaction failed to start". It makes sense to update every
    command accordingly, as we wouldn't have updated any of them.

  - "failed to update refs", in case the commit failed. Does the commit
    fail only in cases where we couldn't update _any_ reference, or does
    it also retrun an error when only one of the updates failed? If the
    latter, we shouldn't update all the others to "failed", should we?

In any case, it feels weird that we use `strmap_empty()` to check for
this condition. I'd have expected that we rather check for
`reported_error` to be a non-empty string directly to figure out whether
the transaction itself has failed as a whole. Like that, we'd know that
we only ever do this if we hit a `goto failure` branch.

Patrick

next prev parent reply	other threads:[~2025-05-14 12:31 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-14  9:03 [PATCH 0/3] fetch/receive: use batched reference updates Karthik Nayak
2025-05-14  9:03 ` [PATCH 1/3] fetch: " Karthik Nayak
2025-05-14 12:31   ` Patrick Steinhardt
2025-05-15 11:13     ` Karthik Nayak
2025-05-15 11:30       ` Patrick Steinhardt
2025-05-15 11:36         ` Karthik Nayak
2025-05-14 17:36   ` Junio C Hamano
2025-05-14  9:03 ` [PATCH 2/3] send-pack: fix memory leak around duplicate refs Karthik Nayak
2025-05-14 17:46   ` Junio C Hamano
2025-05-15 11:23     ` Karthik Nayak
2025-05-14  9:03 ` [PATCH 3/3] receive-pack: use batched reference updates Karthik Nayak
2025-05-14 12:31   ` Patrick Steinhardt [this message]
2025-05-14 19:00     ` Junio C Hamano
2025-05-15 11:30     ` Karthik Nayak
2025-05-15 14:07 ` [PATCH v2 0/4] fetch/receive: " Karthik Nayak
2025-05-15 14:07   ` [PATCH v2 1/4] refs: add function to translate errors to strings Karthik Nayak
2025-05-15 19:11     ` Jeff King
2025-05-16  9:11       ` Karthik Nayak
2025-05-15 20:26     ` Junio C Hamano
2025-05-16  9:12       ` Karthik Nayak
2025-05-15 14:07   ` [PATCH v2 2/4] fetch: use batched reference updates Karthik Nayak
2025-05-16  5:40     ` Patrick Steinhardt
2025-05-16  9:53       ` Karthik Nayak
2025-05-16 10:00         ` Patrick Steinhardt
2025-05-18 11:30           ` Karthik Nayak
2025-05-15 14:07   ` [PATCH v2 3/4] send-pack: fix memory leak around duplicate refs Karthik Nayak
2025-05-15 14:07   ` [PATCH v2 4/4] receive-pack: use batched reference updates Karthik Nayak
2025-05-15 18:55     ` Jeff King
2025-05-15 19:09       ` Jeff King
2025-05-16 19:49         ` Karthik Nayak
2025-05-19  9:58 ` [PATCH v3 0/4] fetch/receive: " Karthik Nayak
2025-05-19  9:58   ` [PATCH v3 1/4] refs: add function to translate errors to strings Karthik Nayak
2025-05-19  9:58   ` [PATCH v3 2/4] fetch: use batched reference updates Karthik Nayak
2025-05-19  9:58   ` [PATCH v3 3/4] send-pack: fix memory leak around duplicate refs Karthik Nayak
2025-05-19  9:58   ` [PATCH v3 4/4] receive-pack: use batched reference updates Karthik Nayak
2025-05-19 18:14   ` [PATCH v3 0/4] fetch/receive: " Junio C Hamano
2025-05-20  9:05     ` Karthik Nayak
2025-05-21 13:14       ` Junio C Hamano
2025-05-22  6:00       ` Jeff King
2025-05-22  8:50         ` Karthik Nayak
2025-05-22 15:31           ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aCSNDbUX-MMJZj5S@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=karthik.188@gmail.com \
    --cc=toon@iotcl.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).