[PATCH 0/3] fetch/receive: use batched reference updates

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] fetch/receive: use batched reference updates
@ 2025-05-14  9:03 Karthik Nayak
  2025-05-14  9:03 ` [PATCH 1/3] fetch: " Karthik Nayak
                   ` (4 more replies)
  0 siblings, 5 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-14  9:03 UTC (permalink / raw)
  To: git; +Cc: toon, ps, Karthik Nayak

The git-fetch(1) and git-receive-pack(1) commands update references as
part of the flow. Each reference update is treated as a single entity
and a transaction is created for each update.

This can be really slow, specifically in reference backends where there
are optimizations which ensure a single transaction with 'N' reference
update perform much faster than 'N' individual transactions. Also having
'N' individual transactions has buildup and teardown costs. These costs
add up in repositories with a large number of references.

Also specifically in the reftable backend, 'N' individual transactions
would also trigger auto-compaction for every transaction.

The reasoning for using individual transactions is because we want to
allow partial updates of references in these commands. Using a single
transaction would be an all-or-nothing scenario.

Recently we introduced an in-between solution called batched reference
updates in 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08). This allows us to batch a set of reference updates, where
individual updates can pass/fail without affecting the batch.

This patch series, modifies both 'git-fetch(1)' and
'git-receive-pack(1)' to use this mechanism. With this, we see a
significant performance boost:

+---------------------+---------------+------------------+
|                     | files backend | reftable backend |
+---------------------+---------------+------------------+
| git-fetch(1)        | 1.25x         | 22x              |
| git-receive-pack(1) | 1.21x         | 18x              |
+---------------------+---------------+------------------+

The first and third patch handle the changes for 'git-fetch(1)' and
'git-receive-pack(1)' respectively. The second patch fixes a small
memory leak I encountered while working on this series.

This is based on top of master: 7a1d2bd0a5 (Merge branch 'master' of
https://github.com/j6t/gitk, 2025-05-09). There were no conflicts 
observed with next or seen.

---
 builtin/fetch.c                  | 119 +++++++++++++++++++++------------------
 builtin/receive-pack.c           |  88 +++++++++++++++++++++++------
 send-pack.c                      |   7 +++
 t/t1416-ref-transaction-hooks.sh |   2 -
 t/t5408-send-pack-stdin.sh       |  14 ++++-
 5 files changed, 155 insertions(+), 75 deletions(-)

Karthik Nayak (3):
      fetch: use batched reference updates
      send-pack: fix memory leak around duplicate refs
      receive-pack: use batched reference updates

base-commit: 7a1d2bd0a596f42a8a7a68d55577967bb454fec0
change-id: 20250423-501-update-git-fetch-1-to-use-partial-transactions-7fb66339fe79

Thanks
- Karthik

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/3] fetch: use batched reference updates
  2025-05-14  9:03 [PATCH 0/3] fetch/receive: use batched reference updates Karthik Nayak
@ 2025-05-14  9:03 ` Karthik Nayak
  2025-05-14 12:31   ` Patrick Steinhardt
  2025-05-14 17:36   ` Junio C Hamano
  2025-05-14  9:03 ` [PATCH 2/3] send-pack: fix memory leak around duplicate refs Karthik Nayak
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-14  9:03 UTC (permalink / raw)
  To: git; +Cc: toon, ps, Karthik Nayak

The reference updates performed as a part of 'git-fetch(1)', take place
one at a time. For each reference update, a new transaction is created
and committed. This is necessary to ensure we can allow individual
updates to fail without failing the entire command. The command also
supports an '--atomic' mode, which uses a single transaction to update
all of the references. But this mode has an all-or-nothing approach,
where if a single update fails, all updates would fail.

In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in 'git-fetch(1)'.
This provides a significant bump in performance, especially when dealing
with repositories with large number of references.

Adding support for batched updates is simply modifying the flow to also
create a batch update transaction in the non-atomic flow.

With the reftable backend there is a 22x performance improvement, when
performing 'git-fetch(1)' with 10000 refs:

  Benchmark 1: fetch: many refs (refformat = reftable, refcount = 10000, revision = master)
    Time (mean ± σ):      3.403 s ±  0.775 s    [User: 1.875 s, System: 1.417 s]
    Range (min … max):    2.454 s …  4.529 s    10 runs

  Benchmark 2: fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     154.3 ms ±  17.6 ms    [User: 102.5 ms, System: 56.1 ms]
    Range (min … max):   145.2 ms … 220.5 ms    18 runs

  Summary
    fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
     22.06 ± 5.62 times faster than fetch: many refs (refformat = reftable, refcount = 10000, revision = master)

In similar conditions, the files backend sees a 1.25x performance
improvement:

  Benchmark 1: fetch: many refs (refformat = files, refcount = 10000, revision = master)
    Time (mean ± σ):     605.5 ms ±   9.4 ms    [User: 117.8 ms, System: 483.3 ms]
    Range (min … max):   595.6 ms … 621.5 ms    10 runs

  Benchmark 2: fetch: many refs (refformat = files, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     485.8 ms ±   4.3 ms    [User: 91.1 ms, System: 396.7 ms]
    Range (min … max):   477.6 ms … 494.3 ms    10 runs

  Summary
    fetch: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
      1.25 ± 0.02 times faster than fetch: many refs (refformat = files, refcount = 10000, revision = master)

With this we'll either be using a regular transaction or a batch update
transaction. This helps cleanup some code which is no longer needed as
we'll now always have some type of 'ref_transaction' object being
propagated.

One big change is that earlier, each individual update would propagate a
failure. Whereas now, the `ref_transaction_for_each_rejected_update`
function is called at the end of the flow to capture the exit status for
'git-fetch(1)' and also to print F/D conflict errors. This does change
the order of the errors being printed, but the behavior stays the same.

Since transaction errors are now explicitly defined as part of
76e760b999 (refs: introduce enum-based transaction error types,
2025-04-08), utilize them and get rid of custom errors defined within
'builtin/fetch.c'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/fetch.c | 119 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 65 insertions(+), 54 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5279997c96..1558f6d1e8 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -641,9 +641,6 @@ static struct ref *get_ref_map(struct remote *remote,
 	return ref_map;
 }
 
-#define STORE_REF_ERROR_OTHER 1
-#define STORE_REF_ERROR_DF_CONFLICT 2
-
 static int s_update_ref(const char *action,
 			struct ref *ref,
 			struct ref_transaction *transaction,
@@ -651,7 +648,6 @@ static int s_update_ref(const char *action,
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -661,43 +657,10 @@ static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	/*
-	 * If no transaction was passed to us, we manage the transaction
-	 * ourselves. Otherwise, we trust the caller to handle the transaction
-	 * lifecycle.
-	 */
-	if (!transaction) {
-		transaction = our_transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
-									    0, &err);
-		if (!transaction) {
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
 				     check_old ? &ref->old_oid : NULL,
 				     NULL, NULL, 0, msg, &err);
-	if (ret) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
-	}
 
-	if (our_transaction) {
-		switch (ref_transaction_commit(our_transaction, &err)) {
-		case 0:
-			break;
-		case REF_TRANSACTION_ERROR_NAME_CONFLICT:
-			ret = STORE_REF_ERROR_DF_CONFLICT;
-			goto out;
-		default:
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
-out:
-	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -1139,7 +1102,6 @@ N_("it took %.2f seconds to check forced updates; you can use\n"
    "to avoid this check\n");
 
 static int store_updated_refs(struct display_state *display_state,
-			      const char *remote_name,
 			      int connectivity_checked,
 			      struct ref_transaction *transaction, struct ref *ref_map,
 			      struct fetch_head *fetch_head,
@@ -1277,11 +1239,6 @@ static int store_updated_refs(struct display_state *display_state,
 		}
 	}
 
-	if (rc & STORE_REF_ERROR_DF_CONFLICT)
-		error(_("some local refs could not be updated; try running\n"
-		      " 'git remote prune %s' to remove any old, conflicting "
-		      "branches"), remote_name);
-
 	if (advice_enabled(ADVICE_FETCH_SHOW_FORCED_UPDATES)) {
 		if (!config->show_forced_updates) {
 			warning(_(warn_show_forced_updates));
@@ -1366,9 +1323,8 @@ static int fetch_and_consume_refs(struct display_state *display_state,
 	}
 
 	trace2_region_enter("fetch", "consume_refs", the_repository);
-	ret = store_updated_refs(display_state, transport->remote->name,
-				 connectivity_checked, transaction, ref_map,
-				 fetch_head, config);
+	ret = store_updated_refs(display_state, connectivity_checked,
+				 transaction, ref_map, fetch_head, config);
 	trace2_region_leave("fetch", "consume_refs", the_repository);
 
 out:
@@ -1688,6 +1644,32 @@ static int set_head(const struct ref *remote_refs, struct remote *remote)
 	return result;
 }
 
+struct ref_rejection_data {
+	int *retcode;
+	int conflict_msg_shown;
+	const char *remote_name;
+};
+
+static void ref_transaction_rejection_handler(const char *refname UNUSED,
+					      const struct object_id *old_oid UNUSED,
+					      const struct object_id *new_oid UNUSED,
+					      const char *old_target UNUSED,
+					      const char *new_target UNUSED,
+					      enum ref_transaction_error err,
+					      void *cb_data)
+{
+	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;
+
+	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
+		error(_("some local refs could not be updated; try running\n"
+			" 'git remote prune %s' to remove any old, conflicting "
+			"branches"), data->remote_name);
+		data->conflict_msg_shown = 1;
+	}
+
+	*data->retcode = 1;
+}
+
 static int do_fetch(struct transport *transport,
 		    struct refspec *rs,
 		    const struct fetch_config *config)
@@ -1808,6 +1790,20 @@ static int do_fetch(struct transport *transport,
 			retcode = 1;
 	}
 
+	/*
+	 * If not atomic, we can still use batched updates, which would be much
+	 * more performent. We don't initiate the transaction before pruning,
+	 * since pruning must be an independent step, to avoid F/D conflicts.
+	 */
+	if (!transaction) {
+		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
+							  REF_TRANSACTION_ALLOW_FAILURE, &err);
+		if (!transaction) {
+			retcode = -1;
+			goto cleanup;
+		}
+	}
+
 	if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
 				   &fetch_head, config)) {
 		retcode = 1;
@@ -1839,16 +1835,31 @@ static int do_fetch(struct transport *transport,
 		free_refs(tags_ref_map);
 	}
 
-	if (transaction) {
-		if (retcode)
-			goto cleanup;
+	if (retcode)
+		goto cleanup;
+
+	retcode = ref_transaction_commit(transaction, &err);
+	if (retcode) {
+		/*
+		 * Explicitly handle transaction cleanup to avoid
+		 * aborting an already closed transaction.
+		 */
+		ref_transaction_free(transaction);
+		transaction = NULL;
+		goto cleanup;
+	}
 
-		retcode = ref_transaction_commit(transaction, &err);
+	if (!atomic_fetch) {
+		struct ref_rejection_data data = {
+			.retcode = &retcode,
+			.conflict_msg_shown = 0,
+			.remote_name = transport->remote->name,
+		};
+
+		ref_transaction_for_each_rejected_update(transaction,
+							 ref_transaction_rejection_handler,
+							 &data);
 		if (retcode) {
-			/*
-			 * Explicitly handle transaction cleanup to avoid
-			 * aborting an already closed transaction.
-			 */
 			ref_transaction_free(transaction);
 			transaction = NULL;
 			goto cleanup;

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] fetch: use batched reference updates
  2025-05-14  9:03 ` [PATCH 1/3] fetch: " Karthik Nayak
@ 2025-05-14 12:31   ` Patrick Steinhardt
  2025-05-15 11:13     ` Karthik Nayak
  2025-05-14 17:36   ` Junio C Hamano
  1 sibling, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2025-05-14 12:31 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon

On Wed, May 14, 2025 at 11:03:47AM +0200, Karthik Nayak wrote:
> The reference updates performed as a part of 'git-fetch(1)', take place

s/,//

> one at a time. For each reference update, a new transaction is created
> and committed. This is necessary to ensure we can allow individual
> updates to fail without failing the entire command. The command also
> supports an '--atomic' mode, which uses a single transaction to update
> all of the references. But this mode has an all-or-nothing approach,
> where if a single update fails, all updates would fail.
> 
> In 23fc8e4f61 (refs: implement batch reference update support,
> 2025-04-08), we introduced a new mechanism to batch reference updates.
> Under the hood, this uses a single transaction to perform a batch of
> reference updates, while allowing only individual updates to fail.
> Utilize this newly introduced batch update mechanism in 'git-fetch(1)'.
> This provides a significant bump in performance, especially when dealing
> with repositories with large number of references.
> 
> Adding support for batched updates is simply modifying the flow to also
> create a batch update transaction in the non-atomic flow.
> 
> With the reftable backend there is a 22x performance improvement, when
> performing 'git-fetch(1)' with 10000 refs:
> 
>   Benchmark 1: fetch: many refs (refformat = reftable, refcount = 10000, revision = master)
>     Time (mean ± σ):      3.403 s ±  0.775 s    [User: 1.875 s, System: 1.417 s]
>     Range (min … max):    2.454 s …  4.529 s    10 runs
> 
>   Benchmark 2: fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
>     Time (mean ± σ):     154.3 ms ±  17.6 ms    [User: 102.5 ms, System: 56.1 ms]
>     Range (min … max):   145.2 ms … 220.5 ms    18 runs
> 
>   Summary
>     fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
>      22.06 ± 5.62 times faster than fetch: many refs (refformat = reftable, refcount = 10000, revision = master)

Nice. The speedup is larger than I have originally anticipated, but I
certainly won't complain about that :) For a good part, the speedup
should result from us not having to do 10000 auto-compactions anymore
for each of the updates, as well as not having to write 10000 new
tables. Instead, we only write a single new table and perform compaction
a single time, only.

> In similar conditions, the files backend sees a 1.25x performance
> improvement:
> 
>   Benchmark 1: fetch: many refs (refformat = files, refcount = 10000, revision = master)
>     Time (mean ± σ):     605.5 ms ±   9.4 ms    [User: 117.8 ms, System: 483.3 ms]
>     Range (min … max):   595.6 ms … 621.5 ms    10 runs
> 
>   Benchmark 2: fetch: many refs (refformat = files, refcount = 10000, revision = HEAD)
>     Time (mean ± σ):     485.8 ms ±   4.3 ms    [User: 91.1 ms, System: 396.7 ms]
>     Range (min … max):   477.6 ms … 494.3 ms    10 runs
> 
>   Summary
>     fetch: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
>       1.25 ± 0.02 times faster than fetch: many refs (refformat = files, refcount = 10000, revision = master)

Yup, this makes sense, as well. We lose a bunch of overhead by creating
separate transactions for each of the updates, but the slow path is
still that we have to create 10000 files for each of the references. So
it's expected to see a small performance improvement, but nothing game
changing.

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 5279997c96..1558f6d1e8 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1688,6 +1644,32 @@ static int set_head(const struct ref *remote_refs, struct remote *remote)
>  	return result;
>  }
>  
> +struct ref_rejection_data {
> +	int *retcode;
> +	int conflict_msg_shown;
> +	const char *remote_name;
> +};
> +
> +static void ref_transaction_rejection_handler(const char *refname UNUSED,
> +					      const struct object_id *old_oid UNUSED,
> +					      const struct object_id *new_oid UNUSED,
> +					      const char *old_target UNUSED,
> +					      const char *new_target UNUSED,
> +					      enum ref_transaction_error err,
> +					      void *cb_data)
> +{
> +	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;
> +
> +	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
> +		error(_("some local refs could not be updated; try running\n"
> +			" 'git remote prune %s' to remove any old, conflicting "
> +			"branches"), data->remote_name);
> +		data->conflict_msg_shown = 1;
> +	}
> +
> +	*data->retcode = 1;
> +}
> +
>  static int do_fetch(struct transport *transport,
>  		    struct refspec *rs,
>  		    const struct fetch_config *config)

Okay, so we now handle errors over here. Is the handled error the only
error that we may see, or do we also accept other errors like D/F now?
If the latter, wouldn't it mean that we don't print any error messages
for those other failures at all? That might be quite confusing.

> @@ -1808,6 +1790,20 @@ static int do_fetch(struct transport *transport,
>  			retcode = 1;
>  	}
>  
> +	/*
> +	 * If not atomic, we can still use batched updates, which would be much
> +	 * more performent. We don't initiate the transaction before pruning,

s/performent/performant/

> +	 * since pruning must be an independent step, to avoid F/D conflicts.
> +	 */
> +	if (!transaction) {
> +		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
> +							  REF_TRANSACTION_ALLOW_FAILURE, &err);
> +		if (!transaction) {
> +			retcode = -1;
> +			goto cleanup;
> +		}
> +	}
> +
>  	if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
>  				   &fetch_head, config)) {
>  		retcode = 1;

Don't transactions handle D/F conflicts for us? Isn't that the sole
reason why for example `refs_verify_refname_available()` accepts an
"extras" parameter that is supposed to contain refs that are about to be
deleted?

Patrick

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] fetch: use batched reference updates
  2025-05-14 12:31   ` Patrick Steinhardt
@ 2025-05-15 11:13     ` Karthik Nayak
  2025-05-15 11:30       ` Patrick Steinhardt
  0 siblings, 1 reply; 41+ messages in thread
From: Karthik Nayak @ 2025-05-15 11:13 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, toon

[-- Attachment #1: Type: text/plain, Size: 4777 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

[snip]

> Yup, this makes sense, as well. We lose a bunch of overhead by creating
> separate transactions for each of the updates, but the slow path is
> still that we have to create 10000 files for each of the references. So
> it's expected to see a small performance improvement, but nothing game
> changing.
>

Exactly, but it is still a good bump in speed. Which is always welcome

>> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> index 5279997c96..1558f6d1e8 100644
>> --- a/builtin/fetch.c
>> +++ b/builtin/fetch.c
>> @@ -1688,6 +1644,32 @@ static int set_head(const struct ref *remote_refs, struct remote *remote)
>>  	return result;
>>  }
>>
>> +struct ref_rejection_data {
>> +	int *retcode;
>> +	int conflict_msg_shown;
>> +	const char *remote_name;
>> +};
>> +
>> +static void ref_transaction_rejection_handler(const char *refname UNUSED,
>> +					      const struct object_id *old_oid UNUSED,
>> +					      const struct object_id *new_oid UNUSED,
>> +					      const char *old_target UNUSED,
>> +					      const char *new_target UNUSED,
>> +					      enum ref_transaction_error err,
>> +					      void *cb_data)
>> +{
>> +	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;
>> +
>> +	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
>> +		error(_("some local refs could not be updated; try running\n"
>> +			" 'git remote prune %s' to remove any old, conflicting "
>> +			"branches"), data->remote_name);
>> +		data->conflict_msg_shown = 1;
>> +	}
>> +
>> +	*data->retcode = 1;
>> +}
>> +
>>  static int do_fetch(struct transport *transport,
>>  		    struct refspec *rs,
>>  		    const struct fetch_config *config)
>
> Okay, so we now handle errors over here. Is the handled error the only
> error that we may see, or do we also accept other errors like D/F now?
> If the latter, wouldn't it mean that we don't print any error messages
> for those other failures at all? That might be quite confusing.
>

I was mostly trying to replicate the current behavior, which is
- For F/D conflicts print this error message.
- For any other error, simply propagate the error code.

I do think there is merit in changing this, and since you're pointing
it out too, I think we should make this change. I've modified this to
now print a better message for other errors.

>> @@ -1808,6 +1790,20 @@ static int do_fetch(struct transport *transport,
>>  			retcode = 1;
>>  	}
>>
>> +	/*
>> +	 * If not atomic, we can still use batched updates, which would be much
>> +	 * more performent. We don't initiate the transaction before pruning,
>
> s/performent/performant/
>

Ah! Thanks.

>> +	 * since pruning must be an independent step, to avoid F/D conflicts.
>> +	 */
>> +	if (!transaction) {
>> +		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
>> +							  REF_TRANSACTION_ALLOW_FAILURE, &err);
>> +		if (!transaction) {
>> +			retcode = -1;
>> +			goto cleanup;
>> +		}
>> +	}
>> +
>>  	if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
>>  				   &fetch_head, config)) {
>>  		retcode = 1;
>
> Don't transactions handle D/F conflicts for us? Isn't that the sole
> reason why for example `refs_verify_refname_available()` accepts an
> "extras" parameter that is supposed to contain refs that are about to be
> deleted?
>

My understanding was a little different, from the documentation for the
function:

  If extras is non-NULL, it is a list of additional refnames with which
  refname is not allowed to conflict.

This is to capture additional conflicts. We want a way to avoid said
conflicts. That said, there is a 'skip' parameter which does exactly
what you're saying. But the transaction logic doesn't incorporate this
entirely. Specifically in the files backend, where we create a lock in
the filesystem, this would cause a conflict, consider the following:

  ❯ eza --tree .git/refs/remotes/
  .git/refs/remotes
  └── origin
      ├── dir
      │   └── file.lock
      ├── dir.lock
      └── HEAD

This is from the test 'branchname D/F conflict resolved by --prune', the
test prunes the existing reference 'refs/remotes/origin/dir/file' while
adding 'refs/remotes/origin/dir'. In 'lock_raw_ref()' we lock both and
read the reference, but this causes an issue since
'refs/remotes/origin/dir' exists as a directory already.

I would say this is logically solvable if we start treating conflict
resolution within updates as a first class problem. But perhaps that's
something of a patch series in itself and better solved outside of this?

> Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] fetch: use batched reference updates
  2025-05-15 11:13     ` Karthik Nayak
@ 2025-05-15 11:30       ` Patrick Steinhardt
  2025-05-15 11:36         ` Karthik Nayak
  0 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2025-05-15 11:30 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon

On Thu, May 15, 2025 at 11:13:32AM +0000, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> >> +	 * since pruning must be an independent step, to avoid F/D conflicts.
> >> +	 */
> >> +	if (!transaction) {
> >> +		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
> >> +							  REF_TRANSACTION_ALLOW_FAILURE, &err);
> >> +		if (!transaction) {
> >> +			retcode = -1;
> >> +			goto cleanup;
> >> +		}
> >> +	}
> >> +
> >>  	if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
> >>  				   &fetch_head, config)) {
> >>  		retcode = 1;
> >
> > Don't transactions handle D/F conflicts for us? Isn't that the sole
> > reason why for example `refs_verify_refname_available()` accepts an
> > "extras" parameter that is supposed to contain refs that are about to be
> > deleted?
> >
> 
> My understanding was a little different, from the documentation for the
> function:
> 
>   If extras is non-NULL, it is a list of additional refnames with which
>   refname is not allowed to conflict.
> 
> This is to capture additional conflicts. We want a way to avoid said
> conflicts. That said, there is a 'skip' parameter which does exactly
> what you're saying.

Oh, right, my mistake -- that's what I actually meant.

> But the transaction logic doesn't incorporate this
> entirely. Specifically in the files backend, where we create a lock in
> the filesystem, this would cause a conflict, consider the following:
> 
>   ❯ eza --tree .git/refs/remotes/
>   .git/refs/remotes
>   └── origin
>       ├── dir
>       │   └── file.lock
>       ├── dir.lock
>       └── HEAD
> 
> This is from the test 'branchname D/F conflict resolved by --prune', the
> test prunes the existing reference 'refs/remotes/origin/dir/file' while
> adding 'refs/remotes/origin/dir'. In 'lock_raw_ref()' we lock both and
> read the reference, but this causes an issue since
> 'refs/remotes/origin/dir' exists as a directory already.
> 
> I would say this is logically solvable if we start treating conflict
> resolution within updates as a first class problem. But perhaps that's
> something of a patch series in itself and better solved outside of this?

Fair enough, makes sense. Might be worth it to add a TODO comment there
then?

Patrick

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] fetch: use batched reference updates
  2025-05-15 11:30       ` Patrick Steinhardt
@ 2025-05-15 11:36         ` Karthik Nayak
  0 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-15 11:36 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, toon

[-- Attachment #1: Type: text/plain, Size: 2518 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> On Thu, May 15, 2025 at 11:13:32AM +0000, Karthik Nayak wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> >> +	 * since pruning must be an independent step, to avoid F/D conflicts.
>> >> +	 */
>> >> +	if (!transaction) {
>> >> +		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
>> >> +							  REF_TRANSACTION_ALLOW_FAILURE, &err);
>> >> +		if (!transaction) {
>> >> +			retcode = -1;
>> >> +			goto cleanup;
>> >> +		}
>> >> +	}
>> >> +
>> >>  	if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
>> >>  				   &fetch_head, config)) {
>> >>  		retcode = 1;
>> >
>> > Don't transactions handle D/F conflicts for us? Isn't that the sole
>> > reason why for example `refs_verify_refname_available()` accepts an
>> > "extras" parameter that is supposed to contain refs that are about to be
>> > deleted?
>> >
>>
>> My understanding was a little different, from the documentation for the
>> function:
>>
>>   If extras is non-NULL, it is a list of additional refnames with which
>>   refname is not allowed to conflict.
>>
>> This is to capture additional conflicts. We want a way to avoid said
>> conflicts. That said, there is a 'skip' parameter which does exactly
>> what you're saying.
>
> Oh, right, my mistake -- that's what I actually meant.
>
>> But the transaction logic doesn't incorporate this
>> entirely. Specifically in the files backend, where we create a lock in
>> the filesystem, this would cause a conflict, consider the following:
>>
>>   ❯ eza --tree .git/refs/remotes/
>>   .git/refs/remotes
>>   └── origin
>>       ├── dir
>>       │   └── file.lock
>>       ├── dir.lock
>>       └── HEAD
>>
>> This is from the test 'branchname D/F conflict resolved by --prune', the
>> test prunes the existing reference 'refs/remotes/origin/dir/file' while
>> adding 'refs/remotes/origin/dir'. In 'lock_raw_ref()' we lock both and
>> read the reference, but this causes an issue since
>> 'refs/remotes/origin/dir' exists as a directory already.
>>
>> I would say this is logically solvable if we start treating conflict
>> resolution within updates as a first class problem. But perhaps that's
>> something of a patch series in itself and better solved outside of this?
>
> Fair enough, makes sense. Might be worth it to add a TODO comment there
> then?
>

Yeah totally agree, will add in a TODO here.

> Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] fetch: use batched reference updates
  2025-05-14  9:03 ` [PATCH 1/3] fetch: " Karthik Nayak
  2025-05-14 12:31   ` Patrick Steinhardt
@ 2025-05-14 17:36   ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2025-05-14 17:36 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, ps

Karthik Nayak <karthik.188@gmail.com> writes:

> With this we'll either be using a regular transaction or a batch update
> transaction. This helps cleanup some code which is no longer needed as
> we'll now always have some type of 'ref_transaction' object being
> propagated.

Great.  From the above description, I imagined that the change
involved would be removal of all "update one by one" code paths, and
addition of a new line to set one bit in the transaction object that
says "this is not the usual all-or-none transaction but is a
best-effort batch", but it does not seem to lose as many lines as I
hoped ;-)

But still, conceptually this change should simplify the things quite
a bit.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/3] send-pack: fix memory leak around duplicate refs
  2025-05-14  9:03 [PATCH 0/3] fetch/receive: use batched reference updates Karthik Nayak
  2025-05-14  9:03 ` [PATCH 1/3] fetch: " Karthik Nayak
@ 2025-05-14  9:03 ` Karthik Nayak
  2025-05-14 17:46   ` Junio C Hamano
  2025-05-14  9:03 ` [PATCH 3/3] receive-pack: use batched reference updates Karthik Nayak
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Karthik Nayak @ 2025-05-14  9:03 UTC (permalink / raw)
  To: git; +Cc: toon, ps, Karthik Nayak

The 'git-send-pack(1)' allows users to push objects to a remote
repository and explicitly list the references to be pushed. The status
of each reference pushed is captured into a list mapped by refname.

If a reference fails to be updated, its error message is captured in the
`ref->remote_status` field. While the command allows duplicate ref
inputs, the list of doesn't accommodate this behavior as a particular
refname is linked to a single `struct ref*` element. So if the user
inputs a reference twice like:

  git send-pack remote.git A:foo B:foo

where the user is trying to update the same reference 'foo' twice and
the reference fails to be updated, we first fill `ref->remote_status`
with error message for the input 'A:foo' then we override the same field
with the error message for 'B:foo'. This override happens without first
free'ing the previous value. Fix this leak.

The current tests already incorporate the above example, but in the test
'A:foo' succeeds while 'B:foo' fails, meaning that the memory leak isn't
triggered. Add a new test with multiple duplicates.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 send-pack.c                | 7 +++++++
 t/t5408-send-pack-stdin.sh | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/send-pack.c b/send-pack.c
index 5005689cb5..4cd41a64ce 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -260,6 +260,13 @@ static int receive_status(struct repository *r,
 				refname);
 			continue;
 		}
+
+		/*
+		 * Clients sending duplicate refs can cause the same value
+		 * to be overridden, causing a memory leak.
+		 */
+		free(hint->remote_status);
+
 		if (!strcmp(head, "ng")) {
 			hint->status = REF_STATUS_REMOTE_REJECT;
 			if (p)
diff --git a/t/t5408-send-pack-stdin.sh b/t/t5408-send-pack-stdin.sh
index 526a675045..45fb20179b 100755
--- a/t/t5408-send-pack-stdin.sh
+++ b/t/t5408-send-pack-stdin.sh
@@ -73,6 +73,12 @@ test_expect_success 'cmdline refs written in order' '
 	verify_push A foo
 '

+test_expect_success 'cmdline refs with multiple duplicates' '
+	clear_remote &&
+	test_must_fail git send-pack remote.git A:foo B:foo C:foo &&
+	verify_push A foo
+'
+
 test_expect_success '--stdin refs come after cmdline' '
 	clear_remote &&
 	echo A:foo >input &&

-- 
2.49.0

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] send-pack: fix memory leak around duplicate refs
  2025-05-14  9:03 ` [PATCH 2/3] send-pack: fix memory leak around duplicate refs Karthik Nayak
@ 2025-05-14 17:46   ` Junio C Hamano
  2025-05-15 11:23     ` Karthik Nayak
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2025-05-14 17:46 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, ps

Karthik Nayak <karthik.188@gmail.com> writes:

> The 'git-send-pack(1)' allows users to push objects to a remote
> repository and explicitly list the references to be pushed. The status
> of each reference pushed is captured into a list mapped by refname.
>
> If a reference fails to be updated, its error message is captured in the
> `ref->remote_status` field. While the command allows duplicate ref
> inputs, the list of doesn't accommodate this behavior as a particular

-ECANNOTPARSE around "the list of doesn't accommodate".

> refname is linked to a single `struct ref*` element. So if the user
> inputs a reference twice like:
>
>   git send-pack remote.git A:foo B:foo
>
> where the user is trying to update the same reference 'foo' twice and
> the reference fails to be updated, we first fill `ref->remote_status`
> with error message for the input 'A:foo' then we override the same field
> with the error message for 'B:foo'. This override happens without first
> free'ing the previous value. Fix this leak.

OK.  A natural question is what happens when A successfully updates
and B fails, or A fails but B successfully updates (failing both is
much less interesting).  What should happen in such cases is unclear
but my gut feeling is that the last-one wins (which you implemented)
is probably just as OK as the first-one gets retained (which might
be less work at runtime), and perhaps keeping-the-more-severe-one
might be more useful than either of these two, but I didn't think
it through.

THanks.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] send-pack: fix memory leak around duplicate refs
  2025-05-14 17:46   ` Junio C Hamano
@ 2025-05-15 11:23     ` Karthik Nayak
  0 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-15 11:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, toon, ps

[-- Attachment #1: Type: text/plain, Size: 2732 bytes --]

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The 'git-send-pack(1)' allows users to push objects to a remote
>> repository and explicitly list the references to be pushed. The status
>> of each reference pushed is captured into a list mapped by refname.
>>
>> If a reference fails to be updated, its error message is captured in the
>> `ref->remote_status` field. While the command allows duplicate ref
>> inputs, the list of doesn't accommodate this behavior as a particular
>
> -ECANNOTPARSE around "the list of doesn't accommodate".
>

Indeed, a s/of// should make this readable.

>> refname is linked to a single `struct ref*` element. So if the user
>> inputs a reference twice like:
>>
>>   git send-pack remote.git A:foo B:foo
>>
>> where the user is trying to update the same reference 'foo' twice and
>> the reference fails to be updated, we first fill `ref->remote_status`
>> with error message for the input 'A:foo' then we override the same field
>> with the error message for 'B:foo'. This override happens without first
>> free'ing the previous value. Fix this leak.
>
> OK.  A natural question is what happens when A successfully updates
> and B fails, or A fails but B successfully updates (failing both is
> much less interesting).  What should happen in such cases is unclear
> but my gut feeling is that the last-one wins (which you implemented)
> is probably just as OK as the first-one gets retained (which might
> be less work at runtime), and perhaps keeping-the-more-severe-one
> might be more useful than either of these two, but I didn't think
> it through.
>

I think the whole thing reeks a little bit. We shouldn't allow such
duplicates in the first place. This is something we tackle in the next
patch, where any such duplicates fails all updates. This is a lot more
deterministic and also the error reporting to the user is inline with
what is actually done.

As of this patch, if there are duplicates, one success one failure,
means that the ref is updated once. But we report to the user that both
the refs failed to be updated as:

  $ git send-pack remote.git A:foo B:foo
  Enumerating objects: 3, done.
  Counting objects: 100% (3/3), done.
  Delta compression using up to 20 threads
  Compressing objects: 100% (2/2), done.
  Writing objects: 100% (3/3), 226 bytes | 226.00 KiB/s, done.
  Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)
  remote: error: cannot lock ref 'refs/heads/foo': reference already exists
  To remote.git
   ! [remote rejected] A -> foo (failed to update ref)
   ! [remote failure]  B -> foo (remote failed to report status)

This is totally wrong, since we actually do update one of the refs.

> THanks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/3] receive-pack: use batched reference updates
  2025-05-14  9:03 [PATCH 0/3] fetch/receive: use batched reference updates Karthik Nayak
  2025-05-14  9:03 ` [PATCH 1/3] fetch: " Karthik Nayak
  2025-05-14  9:03 ` [PATCH 2/3] send-pack: fix memory leak around duplicate refs Karthik Nayak
@ 2025-05-14  9:03 ` Karthik Nayak
  2025-05-14 12:31   ` Patrick Steinhardt
  2025-05-15 14:07 ` [PATCH v2 0/4] fetch/receive: " Karthik Nayak
  2025-05-19  9:58 ` [PATCH v3 0/4] fetch/receive: " Karthik Nayak
  4 siblings, 1 reply; 41+ messages in thread
From: Karthik Nayak @ 2025-05-14  9:03 UTC (permalink / raw)
  To: git; +Cc: toon, ps, Karthik Nayak

The reference updates performed as a part of 'git-receive-pack(1)', take
place one at a time. For each reference update, a new transaction is
created and committed. This is necessary to ensure we can allow
individual updates to fail without failing the entire command. The
command also supports an 'atomic' mode, which uses a single transaction
to update all of the references. But this mode has an all-or-nothing
approach, where if a single update fails, all updates would fail.

In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in
'git-receive-pack(1)'. This provides a significant bump in performance,
especially when dealing with repositories with large number of
references.

With the reftable backend there is a 18x performance improvement, when
performing receive-pack with 10000 refs:

  Benchmark 1: receive: many refs (refformat = reftable, refcount = 10000, revision = master)
    Time (mean ± σ):      4.276 s ±  0.078 s    [User: 0.796 s, System: 3.318 s]
    Range (min … max):    4.185 s …  4.430 s    10 runs

  Benchmark 2: receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     235.4 ms ±   6.9 ms    [User: 75.4 ms, System: 157.3 ms]
    Range (min … max):   228.5 ms … 254.2 ms    11 runs

  Summary
    receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
     18.16 ± 0.63 times faster than receive: many refs (refformat = reftable, refcount = 10000, revision = master)

In similar conditions, the files backend sees a 1.21x performance
improvement:

  Benchmark 1: receive: many refs (refformat = files, refcount = 10000, revision = master)
    Time (mean ± σ):      1.121 s ±  0.021 s    [User: 0.128 s, System: 0.975 s]
    Range (min … max):    1.097 s …  1.156 s    10 runs

  Benchmark 2: receive: many refs (refformat = files, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     927.9 ms ±  22.6 ms    [User: 99.0 ms, System: 815.2 ms]
    Range (min … max):   903.1 ms … 978.0 ms    10 runs

  Summary
    receive: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
      1.21 ± 0.04 times faster than receive: many refs (refformat = files, refcount = 10000, revision = master)

As using batched updates requires the error handling to be moved to the
end of the flow, create and use a 'struct strset' to track the failed
refs and attribute the correct errors to them.

This change also uncovers an issue when a client provides multiple
updates to the same reference. For example:

  $ git send-pack remote.git A:foo B:foo
  Enumerating objects: 3, done.
  Counting objects: 100% (3/3), done.
  Delta compression using up to 20 threads
  Compressing objects: 100% (2/2), done.
  Writing objects: 100% (3/3), 226 bytes | 226.00 KiB/s, done.
  Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)
  remote: error: cannot lock ref 'refs/heads/foo': reference already exists
  To remote.git
   ! [remote rejected] A -> foo (failed to update ref)
   ! [remote failure]  B -> foo (remote failed to report status)

As you can see, the remote runs into an error because it cannot lock the
target reference for the second update. Furthermore, the remote complains
that the first update has been rejected whereas the second update didn't
receive any status update because we failed to lock it. Reading this status
message alone a user would probably expect that `foo` has not been updated
at all. But that's not the case: while we claim that the ref wasn't updated,
it surprisingly points to `A` now.

One could argue that this is merely an error in how we report the result of
this push. But ultimately, the user's request itself is already broken and
doesn't make any sense in the first place and cannot ever lead to a sensible
outcome that honors the full request.

The conversion to batched transactions fixes the issue because we now try to
queue both updates in the same transaction. As such, the transaction itself
will notice this conflict and refuse the update altogether before we commit
any of the values.

Note that this requires changes to a couple of tests in t5408 that happened
to exercise this behaviour. Given that the generated output is misleading
and given that the user request cannot ever be fully honored this really
feels more like a bug than properly designed behaviour. As such, changing
the behaviour feels like the right thing to do.

Since now reference updates are batched, the 'reference-transaction'
hook will be invoked with all updates together. Currently git will 'die'
when the hook returns with a non-zero exit status in the 'prepared'
stage. For 'git-receive-pack(1)', this allowed users to reject an
individual reference update, git would have applied previous updates but
immediately abort further execution. This is definitely an incorrect
usage of this hook, since the right place to do this would be the
'update' hook. This patch retains the latter behavior, but
'reference-transaction' hook now changes to a all-or-nothing behavior
when a non-zero exit status is returned in the 'prepared' stage, since
batch updates use a transaction under the hood. This explains the change
in 't1416'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/receive-pack.c           | 88 ++++++++++++++++++++++++++++++++--------
 t/t1416-ref-transaction-hooks.sh |  2 -
 t/t5408-send-pack-stdin.sh       | 12 +++---
 3 files changed, 79 insertions(+), 23 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index be314879e8..b4fceb3837 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1843,35 +1843,91 @@ static void BUG_if_skipped_connectivity_check(struct command *commands,
 	BUG_if_bug("connectivity check skipped???");
 }
 
+static void ref_transaction_rejection_handler(const char *refname,
+					      const struct object_id *old_oid UNUSED,
+					      const struct object_id *new_oid UNUSED,
+					      const char *old_target UNUSED,
+					      const char *new_target UNUSED,
+					      enum ref_transaction_error err,
+					      void *cb_data)
+{
+	struct strmap *failed_refs = (struct strmap *)cb_data;
+	const char *reason = "";
+
+	switch (err) {
+	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
+		reason = "refname conflict";
+		break;
+	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
+		reason = "reference already exists";
+		break;
+	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
+		reason = "reference does not exist";
+		break;
+	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
+		reason = "incorrect old value provided";
+		break;
+	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
+		reason = "invalid new value provided";
+		break;
+	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
+		reason = "expected symref but found regular ref";
+		break;
+	default:
+		reason = "unkown failure";
+	}
+
+	strmap_put(failed_refs, refname, xstrdup(reason));
+}
+
 static void execute_commands_non_atomic(struct command *commands,
 					struct shallow_info *si)
 {
 	struct command *cmd;
 	struct strbuf err = STRBUF_INIT;
+	const char *reported_error = "";
+	struct strmap failed_refs = STRMAP_INIT;
+
+	transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
+						  REF_TRANSACTION_ALLOW_FAILURE, &err);
+	if (!transaction) {
+		rp_error("%s", err.buf);
+		strbuf_reset(&err);
+		reported_error = "transaction failed to start";
+		goto failure;
+	}
 
 	for (cmd = commands; cmd; cmd = cmd->next) {
 		if (!should_process_cmd(cmd) || cmd->run_proc_receive)
 			continue;
 
-		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
-							  0, &err);
-		if (!transaction) {
-			rp_error("%s", err.buf);
-			strbuf_reset(&err);
-			cmd->error_string = "transaction failed to start";
-			continue;
-		}
-
 		cmd->error_string = update(cmd, si);
+	}
 
-		if (!cmd->error_string
-		    && ref_transaction_commit(transaction, &err)) {
-			rp_error("%s", err.buf);
-			strbuf_reset(&err);
-			cmd->error_string = "failed to update ref";
-		}
-		ref_transaction_free(transaction);
+	if (ref_transaction_commit(transaction, &err)) {
+		rp_error("%s", err.buf);
+		reported_error = "failed to update refs";
+		goto failure;
 	}
+
+	ref_transaction_for_each_rejected_update(transaction,
+						 ref_transaction_rejection_handler,
+						 &failed_refs);
+
+	if (strmap_empty(&failed_refs))
+		goto cleanup;
+
+failure:
+	for (cmd = commands; cmd; cmd = cmd->next) {
+		if (strmap_empty(&failed_refs))
+			cmd->error_string = reported_error;
+		else if (strmap_contains(&failed_refs, cmd->ref_name))
+			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
+	}
+
+cleanup:
+	ref_transaction_free(transaction);
+	strmap_clear(&failed_refs, 1);
 	strbuf_release(&err);
 }
 
diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
index 8c777f7cf8..d91dd3a3b5 100755
--- a/t/t1416-ref-transaction-hooks.sh
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -120,8 +120,6 @@ test_expect_success 'interleaving hook calls succeed' '
 
 	cat >expect <<-EOF &&
 		hooks/update refs/tags/PRE $ZERO_OID $PRE_OID
-		hooks/reference-transaction prepared
-		hooks/reference-transaction committed
 		hooks/update refs/tags/POST $ZERO_OID $POST_OID
 		hooks/reference-transaction prepared
 		hooks/reference-transaction committed
diff --git a/t/t5408-send-pack-stdin.sh b/t/t5408-send-pack-stdin.sh
index 45fb20179b..76fb8bbc68 100755
--- a/t/t5408-send-pack-stdin.sh
+++ b/t/t5408-send-pack-stdin.sh
@@ -69,21 +69,23 @@ test_expect_success 'stdin mixed with cmdline' '
 
 test_expect_success 'cmdline refs written in order' '
 	clear_remote &&
-	test_must_fail git send-pack remote.git A:foo B:foo &&
-	verify_push A foo
+	test_must_fail git send-pack remote.git A:foo B:foo 2>err &&
+	test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err &&
+	test_must_fail git --git-dir=remote.git rev-parse foo
 '
 
 test_expect_success 'cmdline refs with multiple duplicates' '
 	clear_remote &&
-	test_must_fail git send-pack remote.git A:foo B:foo C:foo &&
-	verify_push A foo
+	test_must_fail git send-pack remote.git A:foo B:foo C:foo 2>err &&
+	test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err &&
+	test_must_fail git --git-dir=remote.git rev-parse foo
 '
 
 test_expect_success '--stdin refs come after cmdline' '
 	clear_remote &&
 	echo A:foo >input &&
 	test_must_fail git send-pack remote.git --stdin B:foo <input &&
-	verify_push B foo
+	test_must_fail git --git-dir=remote.git rev-parse foo
 '
 
 test_expect_success 'refspecs and --mirror do not mix (cmdline)' '

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] receive-pack: use batched reference updates
  2025-05-14  9:03 ` [PATCH 3/3] receive-pack: use batched reference updates Karthik Nayak
@ 2025-05-14 12:31   ` Patrick Steinhardt
  2025-05-14 19:00     ` Junio C Hamano
  2025-05-15 11:30     ` Karthik Nayak
  0 siblings, 2 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2025-05-14 12:31 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon

On Wed, May 14, 2025 at 11:03:49AM +0200, Karthik Nayak wrote:
[snip]
> With the reftable backend there is a 18x performance improvement, when
> performing receive-pack with 10000 refs:
> 
>   Benchmark 1: receive: many refs (refformat = reftable, refcount = 10000, revision = master)
>     Time (mean ± σ):      4.276 s ±  0.078 s    [User: 0.796 s, System: 3.318 s]
>     Range (min … max):    4.185 s …  4.430 s    10 runs
> 
>   Benchmark 2: receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
>     Time (mean ± σ):     235.4 ms ±   6.9 ms    [User: 75.4 ms, System: 157.3 ms]
>     Range (min … max):   228.5 ms … 254.2 ms    11 runs
> 
>   Summary
>     receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
>      18.16 ± 0.63 times faster than receive: many refs (refformat = reftable, refcount = 10000, revision = master)
> 
> In similar conditions, the files backend sees a 1.21x performance
> improvement:
> 
>   Benchmark 1: receive: many refs (refformat = files, refcount = 10000, revision = master)
>     Time (mean ± σ):      1.121 s ±  0.021 s    [User: 0.128 s, System: 0.975 s]
>     Range (min … max):    1.097 s …  1.156 s    10 runs
> 
>   Benchmark 2: receive: many refs (refformat = files, refcount = 10000, revision = HEAD)
>     Time (mean ± σ):     927.9 ms ±  22.6 ms    [User: 99.0 ms, System: 815.2 ms]
>     Range (min … max):   903.1 ms … 978.0 ms    10 runs
> 
>   Summary
>     receive: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
>       1.21 ± 0.04 times faster than receive: many refs (refformat = files, refcount = 10000, revision = master)

We see almost the same speedups as we saw in git-fetch(1), and the
reason why we see them is basically the same.

> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index be314879e8..b4fceb3837 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1843,35 +1843,91 @@ static void BUG_if_skipped_connectivity_check(struct command *commands,
>  	BUG_if_bug("connectivity check skipped???");
>  }
>  
> +static void ref_transaction_rejection_handler(const char *refname,
> +					      const struct object_id *old_oid UNUSED,
> +					      const struct object_id *new_oid UNUSED,
> +					      const char *old_target UNUSED,
> +					      const char *new_target UNUSED,
> +					      enum ref_transaction_error err,
> +					      void *cb_data)
> +{
> +	struct strmap *failed_refs = (struct strmap *)cb_data;

This cast is unnecessary.

> +	const char *reason = "";
> +
> +	switch (err) {
> +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
> +		reason = "refname conflict";
> +		break;
> +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
> +		reason = "reference already exists";
> +		break;
> +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
> +		reason = "reference does not exist";
> +		break;
> +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
> +		reason = "incorrect old value provided";
> +		break;
> +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
> +		reason = "invalid new value provided";
> +		break;
> +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
> +		reason = "expected symref but found regular ref";
> +		break;
> +	default:
> +		reason = "unkown failure";
> +	}
> +
> +	strmap_put(failed_refs, refname, xstrdup(reason));
> +}

I'd have expected something like this for git-fetch(1), as well, so that
we don't silently swallow failed ref updates. Would it make sense to
maybe provide an array of reasons by enum so that we can reuse those
messages?

>  static void execute_commands_non_atomic(struct command *commands,
>  					struct shallow_info *si)
>  {
>  	struct command *cmd;
>  	struct strbuf err = STRBUF_INIT;
> +	const char *reported_error = "";
> +	struct strmap failed_refs = STRMAP_INIT;
> +
> +	transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
> +						  REF_TRANSACTION_ALLOW_FAILURE, &err);
> +	if (!transaction) {
> +		rp_error("%s", err.buf);
> +		strbuf_reset(&err);
> +		reported_error = "transaction failed to start";
> +		goto failure;
> +	}

Okay. We now create a single transaction with failures being allowed.

>  	for (cmd = commands; cmd; cmd = cmd->next) {
>  		if (!should_process_cmd(cmd) || cmd->run_proc_receive)
>  			continue;
>  
> -		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
> -							  0, &err);
> -		if (!transaction) {
> -			rp_error("%s", err.buf);
> -			strbuf_reset(&err);
> -			cmd->error_string = "transaction failed to start";
> -			continue;
> -		}
> -
>  		cmd->error_string = update(cmd, si);
> +	}

So here we only need to queue each update.

> -		if (!cmd->error_string
> -		    && ref_transaction_commit(transaction, &err)) {
> -			rp_error("%s", err.buf);
> -			strbuf_reset(&err);
> -			cmd->error_string = "failed to update ref";
> -		}
> -		ref_transaction_free(transaction);
> +	if (ref_transaction_commit(transaction, &err)) {
> +		rp_error("%s", err.buf);
> +		reported_error = "failed to update refs";
> +		goto failure;
>  	}
> +
> +	ref_transaction_for_each_rejected_update(transaction,
> +						 ref_transaction_rejection_handler,
> +						 &failed_refs);
> +
> +	if (strmap_empty(&failed_refs))
> +		goto cleanup;
> +
> +failure:
> +	for (cmd = commands; cmd; cmd = cmd->next) {
> +		if (strmap_empty(&failed_refs))
> +			cmd->error_string = reported_error;

The reported error may have one of there values now:

  - The empty string. Is it correct to set the error string to that
    value? Shouldn't it rather be a `NULL` pointer?

  - "transaction failed to start". It makes sense to update every
    command accordingly, as we wouldn't have updated any of them.

  - "failed to update refs", in case the commit failed. Does the commit
    fail only in cases where we couldn't update _any_ reference, or does
    it also retrun an error when only one of the updates failed? If the
    latter, we shouldn't update all the others to "failed", should we?

In any case, it feels weird that we use `strmap_empty()` to check for
this condition. I'd have expected that we rather check for
`reported_error` to be a non-empty string directly to figure out whether
the transaction itself has failed as a whole. Like that, we'd know that
we only ever do this if we hit a `goto failure` branch.

Patrick

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] receive-pack: use batched reference updates
  2025-05-14 12:31   ` Patrick Steinhardt
@ 2025-05-14 19:00     ` Junio C Hamano
  2025-05-15 11:30     ` Karthik Nayak
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2025-05-14 19:00 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, git, toon

Patrick Steinhardt <ps@pks.im> writes:

>> +	switch (err) {
>> +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
>> +		reason = "refname conflict";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
>> +		reason = "reference already exists";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
>> +		reason = "reference does not exist";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
>> +		reason = "incorrect old value provided";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
>> +		reason = "invalid new value provided";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
>> +		reason = "expected symref but found regular ref";
>> +		break;
>> +	default:
>> +		reason = "unkown failure";
>> +	}
>> +
>> +	strmap_put(failed_refs, refname, xstrdup(reason));
>> +}
>
> I'd have expected something like this for git-fetch(1), as well, so that
> we don't silently swallow failed ref updates. Would it make sense to
> maybe provide an array of reasons by enum so that we can reuse those
> messages?

Excellent.  It does not have to be an array, but a helper function
that takes an enum and returns a "const char *".

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] receive-pack: use batched reference updates
  2025-05-14 12:31   ` Patrick Steinhardt
  2025-05-14 19:00     ` Junio C Hamano
@ 2025-05-15 11:30     ` Karthik Nayak
  1 sibling, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-15 11:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, toon

[-- Attachment #1: Type: text/plain, Size: 6080 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

[snip]

>> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
>> index be314879e8..b4fceb3837 100644
>> --- a/builtin/receive-pack.c
>> +++ b/builtin/receive-pack.c
>> @@ -1843,35 +1843,91 @@ static void BUG_if_skipped_connectivity_check(struct command *commands,
>>  	BUG_if_bug("connectivity check skipped???");
>>  }
>>
>> +static void ref_transaction_rejection_handler(const char *refname,
>> +					      const struct object_id *old_oid UNUSED,
>> +					      const struct object_id *new_oid UNUSED,
>> +					      const char *old_target UNUSED,
>> +					      const char *new_target UNUSED,
>> +					      enum ref_transaction_error err,
>> +					      void *cb_data)
>> +{
>> +	struct strmap *failed_refs = (struct strmap *)cb_data;
>
> This cast is unnecessary.
>

Yes, will remove.

>> +	const char *reason = "";
>> +
>> +	switch (err) {
>> +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
>> +		reason = "refname conflict";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
>> +		reason = "reference already exists";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
>> +		reason = "reference does not exist";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
>> +		reason = "incorrect old value provided";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
>> +		reason = "invalid new value provided";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
>> +		reason = "expected symref but found regular ref";
>> +		break;
>> +	default:
>> +		reason = "unkown failure";
>> +	}
>> +
>> +	strmap_put(failed_refs, refname, xstrdup(reason));
>> +}
>
> I'd have expected something like this for git-fetch(1), as well, so that
> we don't silently swallow failed ref updates. Would it make sense to
> maybe provide an array of reasons by enum so that we can reuse those
> messages?
>

I think you've convinced me of this, As Junio also mentions, I've now
added a function to provide a mapping from 'enum -> char *'.

>>  static void execute_commands_non_atomic(struct command *commands,
>>  					struct shallow_info *si)
>>  {
>>  	struct command *cmd;
>>  	struct strbuf err = STRBUF_INIT;
>> +	const char *reported_error = "";
>> +	struct strmap failed_refs = STRMAP_INIT;
>> +
>> +	transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
>> +						  REF_TRANSACTION_ALLOW_FAILURE, &err);
>> +	if (!transaction) {
>> +		rp_error("%s", err.buf);
>> +		strbuf_reset(&err);
>> +		reported_error = "transaction failed to start";
>> +		goto failure;
>> +	}
>
> Okay. We now create a single transaction with failures being allowed.
>
>>  	for (cmd = commands; cmd; cmd = cmd->next) {
>>  		if (!should_process_cmd(cmd) || cmd->run_proc_receive)
>>  			continue;
>>
>> -		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
>> -							  0, &err);
>> -		if (!transaction) {
>> -			rp_error("%s", err.buf);
>> -			strbuf_reset(&err);
>> -			cmd->error_string = "transaction failed to start";
>> -			continue;
>> -		}
>> -
>>  		cmd->error_string = update(cmd, si);
>> +	}
>
> So here we only need to queue each update.
>
>> -		if (!cmd->error_string
>> -		    && ref_transaction_commit(transaction, &err)) {
>> -			rp_error("%s", err.buf);
>> -			strbuf_reset(&err);
>> -			cmd->error_string = "failed to update ref";
>> -		}
>> -		ref_transaction_free(transaction);
>> +	if (ref_transaction_commit(transaction, &err)) {
>> +		rp_error("%s", err.buf);
>> +		reported_error = "failed to update refs";
>> +		goto failure;
>>  	}
>> +
>> +	ref_transaction_for_each_rejected_update(transaction,
>> +						 ref_transaction_rejection_handler,
>> +						 &failed_refs);
>> +
>> +	if (strmap_empty(&failed_refs))
>> +		goto cleanup;
>> +
>> +failure:
>> +	for (cmd = commands; cmd; cmd = cmd->next) {
>> +		if (strmap_empty(&failed_refs))
>> +			cmd->error_string = reported_error;
>
> The reported error may have one of there values now:
>
>   - The empty string. Is it correct to set the error string to that
>     value? Shouldn't it rather be a `NULL` pointer?
>
>   - "transaction failed to start". It makes sense to update every
>     command accordingly, as we wouldn't have updated any of them.
>
>   - "failed to update refs", in case the commit failed. Does the commit
>     fail only in cases where we couldn't update _any_ reference, or does
>     it also retrun an error when only one of the updates failed? If the
>     latter, we shouldn't update all the others to "failed", should we?
>
> In any case, it feels weird that we use `strmap_empty()` to check for
> this condition. I'd have expected that we rather check for
> `reported_error` to be a non-empty string directly to figure out whether
> the transaction itself has failed as a whole. Like that, we'd know that
> we only ever do this if we hit a `goto failure` branch.
>

This is silly mistake on my side. This is exactly what I wanted to do,
but instead of:

  const char *reported_error = NULL;

I did:

  const char *reported_error;

Which obviously raised:

  In function ‘execute_commands_non_atomic’,
      inlined from ‘execute_commands’ at ../builtin/receive-pack.c:2059:3,
      inlined from ‘cmd_receive_pack’ at ../builtin/receive-pack.c:2636:3:
  ../builtin/receive-pack.c:1899:43: warning: ‘reported_error’ may be
used uninitialized [-Wmaybe-uninitialized]
   1899 |                         cmd->error_string = reported_error;
        |                         ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
  ../builtin/receive-pack.c: In function ‘cmd_receive_pack’:
  ../builtin/receive-pack.c:1864:21: note: ‘reported_error’ was declared here
   1864 |         const char *reported_error;
        |                     ^~~~~~~~~~~~~~
  [31/31] Linking target git-upload-archive

prompting me to do this. Let me fix this. Thanks for pointing it out :)

> Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 0/4] fetch/receive: use batched reference updates
  2025-05-14  9:03 [PATCH 0/3] fetch/receive: use batched reference updates Karthik Nayak
                   ` (2 preceding siblings ...)
  2025-05-14  9:03 ` [PATCH 3/3] receive-pack: use batched reference updates Karthik Nayak
@ 2025-05-15 14:07 ` Karthik Nayak
  2025-05-15 14:07   ` [PATCH v2 1/4] refs: add function to translate errors to strings Karthik Nayak
                     ` (3 more replies)
  2025-05-19  9:58 ` [PATCH v3 0/4] fetch/receive: " Karthik Nayak
  4 siblings, 4 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-15 14:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, ps, gitster

The git-fetch(1) and git-receive-pack(1) commands update references as
part of the flow. Each reference update is treated as a single entity
and a transaction is created for each update.

This can be really slow, specifically in reference backends where there
are optimizations which ensure a single transaction with 'N' reference
update perform much faster than 'N' individual transactions. Also having
'N' individual transactions has buildup and teardown costs. These costs
add up in repositories with a large number of references.

Also specifically in the reftable backend, 'N' individual transactions
would also trigger auto-compaction for every transaction.

The reasoning for using individual transactions is because we want to
allow partial updates of references in these commands. Using a single
transaction would be an all-or-nothing scenario.

Recently we introduced an in-between solution called batched reference
updates in 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08). This allows us to batch a set of reference updates, where
individual updates can pass/fail without affecting the batch.

This patch series, modifies both 'git-fetch(1)' and
'git-receive-pack(1)' to use this mechanism. With this, we see a
significant performance boost:

+---------------------+---------------+------------------+
|                     | files backend | reftable backend |
+---------------------+---------------+------------------+
| git-fetch(1)        | 1.25x         | 22x              |
| git-receive-pack(1) | 1.21x         | 18x              |
+---------------------+---------------+------------------+

The first and third patch handle the changes for 'git-fetch(1)' and
'git-receive-pack(1)' respectively. The second patch fixes a small
memory leak I encountered while working on this series.

This is based on top of master: 7a1d2bd0a5 (Merge branch 'master' of
https://github.com/j6t/gitk, 2025-05-09). There were no conflicts
observed with next or seen.

Junio, since the release window for 2.50 is closing in. I would prefer
we mark this for 2.51, so perhaps when/if we merge it to 'next', we let
it bake there till the next release window opens. While all the tests
pass, any bugs here would be high impact and it would be nice to catch
it before it hits a release.

---
Changes in v2:
- Added a new commit to introduce a mapping from 'ref_transaction_error'
  to a human readable string. We now use this to map the errors
  observed.
- Add a TODO regarding handling pruning and creation of refs in the same
  transaction in 'git-fetch(1)'.
- Cleanup commit message typos.
- Fix logic in 'builtin/receive-pack.c' around error reporting to be
  cleaner. Thanks Patrick for the suggestion.
- Link to v1: https://lore.kernel.org/r/20250514-501-update-git-fetch-1-to-use-partial-transactions-v1-0-7c65f46493d4@gmail.com

---
 builtin/fetch.c                  | 128 ++++++++++++++++++++++-----------------
 builtin/receive-pack.c           |  64 +++++++++++++++-----
 builtin/update-ref.c             |  26 +-------
 refs.c                           |  30 +++++++++
 refs.h                           |   5 ++
 send-pack.c                      |   7 +++
 t/t1416-ref-transaction-hooks.sh |   2 -
 t/t5408-send-pack-stdin.sh       |  14 ++++-
 8 files changed, 177 insertions(+), 99 deletions(-)

Karthik Nayak (4):
      refs: add function to translate errors to strings
      fetch: use batched reference updates
      send-pack: fix memory leak around duplicate refs
      receive-pack: use batched reference updates

Range-diff versus v1:

-:  ---------- > 1:  8c2b3d08d0 refs: add function to translate errors to strings
1:  b6a257eeff ! 2:  1683b19f10 fetch: use batched reference updates
    @@ builtin/fetch.c: static int s_update_ref(const char *action,
     -		ret = STORE_REF_ERROR_OTHER;
     -		goto out;
     -	}
    - 
    +-
     -	if (our_transaction) {
     -		switch (ref_transaction_commit(our_transaction, &err)) {
     -		case 0:
    @@ builtin/fetch.c: static int s_update_ref(const char *action,
     -			goto out;
     -		}
     -	}
    --
    + 
     -out:
     -	ref_transaction_free(our_transaction);
      	if (ret)
    @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot
     +	const char *remote_name;
     +};
     +
    -+static void ref_transaction_rejection_handler(const char *refname UNUSED,
    ++static void ref_transaction_rejection_handler(const char *refname,
     +					      const struct object_id *old_oid UNUSED,
     +					      const struct object_id *new_oid UNUSED,
     +					      const char *old_target UNUSED,
    @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot
     +			" 'git remote prune %s' to remove any old, conflicting "
     +			"branches"), data->remote_name);
     +		data->conflict_msg_shown = 1;
    ++	} else {
    ++		char *reason = ref_transaction_error_msg(err);
    ++
    ++		error(_("fetching ref %s failed: %s"), refname, reason);
    ++		free(reason);
     +	}
     +
     +	*data->retcode = 1;
    @@ builtin/fetch.c: static int do_fetch(struct transport *transport,
      
     +	/*
     +	 * If not atomic, we can still use batched updates, which would be much
    -+	 * more performent. We don't initiate the transaction before pruning,
    ++	 * more performant. We don't initiate the transaction before pruning,
     +	 * since pruning must be an independent step, to avoid F/D conflicts.
    ++	 *
    ++	 * TODO: if reference transactions gain logical conflict resolution, we
    ++	 * can delete and create refs (with F/D conflicts) in the same transaction
    ++	 * and this can be moved about the 'prune_refs()' block.
     +	 */
     +	if (!transaction) {
     +		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
    @@ builtin/fetch.c: static int do_fetch(struct transport *transport,
     -			goto cleanup;
     +	if (retcode)
     +		goto cleanup;
    -+
    + 
    +-		retcode = ref_transaction_commit(transaction, &err);
     +	retcode = ref_transaction_commit(transaction, &err);
     +	if (retcode) {
     +		/*
    @@ builtin/fetch.c: static int do_fetch(struct transport *transport,
     +		transaction = NULL;
     +		goto cleanup;
     +	}
    - 
    --		retcode = ref_transaction_commit(transaction, &err);
    ++
     +	if (!atomic_fetch) {
     +		struct ref_rejection_data data = {
     +			.retcode = &retcode,
2:  1697d15527 ! 3:  85f68863d8 send-pack: fix memory leak around duplicate refs
    @@ Commit message
     
         If a reference fails to be updated, its error message is captured in the
         `ref->remote_status` field. While the command allows duplicate ref
    -    inputs, the list of doesn't accommodate this behavior as a particular
    +    inputs, the list doesn't accommodate this behavior as a particular
         refname is linked to a single `struct ref*` element. So if the user
         inputs a reference twice like:
     
3:  0579a66c98 ! 4:  1cd2324045 receive-pack: use batched reference updates
    @@ Commit message
         batch updates use a transaction under the hood. This explains the change
         in 't1416'.
     
    +    Helped-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## builtin/receive-pack.c ##
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +					      enum ref_transaction_error err,
     +					      void *cb_data)
     +{
    -+	struct strmap *failed_refs = (struct strmap *)cb_data;
    -+	const char *reason = "";
    ++	struct strmap *failed_refs = cb_data;
     +
    -+	switch (err) {
    -+	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
    -+		reason = "refname conflict";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
    -+		reason = "reference already exists";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
    -+		reason = "reference does not exist";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
    -+		reason = "incorrect old value provided";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
    -+		reason = "invalid new value provided";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
    -+		reason = "expected symref but found regular ref";
    -+		break;
    -+	default:
    -+		reason = "unkown failure";
    -+	}
    -+
    -+	strmap_put(failed_refs, refname, xstrdup(reason));
    ++	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
     +}
     +
      static void execute_commands_non_atomic(struct command *commands,
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
      {
      	struct command *cmd;
      	struct strbuf err = STRBUF_INIT;
    -+	const char *reported_error = "";
    ++	const char *reported_error = NULL;
     +	struct strmap failed_refs = STRMAP_INIT;
     +
     +	transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +		rp_error("%s", err.buf);
     +		reported_error = "failed to update refs";
     +		goto failure;
    - 	}
    ++	}
     +
     +	ref_transaction_for_each_rejected_update(transaction,
     +						 ref_transaction_rejection_handler,
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +
     +failure:
     +	for (cmd = commands; cmd; cmd = cmd->next) {
    -+		if (strmap_empty(&failed_refs))
    ++		if (reported_error)
     +			cmd->error_string = reported_error;
     +		else if (strmap_contains(&failed_refs, cmd->ref_name))
     +			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
    -+	}
    + 	}
     +
     +cleanup:
     +	ref_transaction_free(transaction);


base-commit: 7a1d2bd0a596f42a8a7a68d55577967bb454fec0
change-id: 20250423-501-update-git-fetch-1-to-use-partial-transactions-7fb66339fe79

Thanks
- Karthik


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/4] refs: add function to translate errors to strings
  2025-05-15 14:07 ` [PATCH v2 0/4] fetch/receive: " Karthik Nayak
@ 2025-05-15 14:07   ` Karthik Nayak
  2025-05-15 19:11     ` Jeff King
  2025-05-15 20:26     ` Junio C Hamano
  2025-05-15 14:07   ` [PATCH v2 2/4] fetch: use batched reference updates Karthik Nayak
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-15 14:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, ps, gitster

The commit 76e760b999 (refs: introduce enum-based transaction error
types, 2025-04-08) introduced enum-based transaction error types. The
refs transaction logic was also modified to propagate these errors. For
clients of the ref transaction system, it would be beneficial to provide
human readable messages for these errors.

There is already an existing mapping in 'builtin/update-ref.c', move it
to 'refs.c' as `ref_transaction_error_msg()` and use the same within the
'builtin/update-ref.c'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/update-ref.c | 26 ++------------------------
 refs.c               | 30 ++++++++++++++++++++++++++++++
 refs.h               |  5 +++++
 3 files changed, 37 insertions(+), 24 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 2b1e336ba1..09b99143bf 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -575,30 +575,7 @@ static void print_rejected_refs(const char *refname,
 				void *cb_data UNUSED)
 {
 	struct strbuf sb = STRBUF_INIT;
-	const char *reason = "";
-
-	switch (err) {
-	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
-		reason = "refname conflict";
-		break;
-	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
-		reason = "reference already exists";
-		break;
-	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
-		reason = "reference does not exist";
-		break;
-	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
-		reason = "incorrect old value provided";
-		break;
-	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
-		reason = "invalid new value provided";
-		break;
-	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
-		reason = "expected symref but found regular ref";
-		break;
-	default:
-		reason = "unkown failure";
-	}
+	char *reason = ref_transaction_error_msg(err);
 
 	strbuf_addf(&sb, "rejected %s %s %s %s\n", refname,
 		    new_oid ? oid_to_hex(new_oid) : new_target,
@@ -606,6 +583,7 @@ static void print_rejected_refs(const char *refname,
 		    reason);
 
 	fwrite(sb.buf, sb.len, 1, stdout);
+	free(reason);
 	strbuf_release(&sb);
 }
 
diff --git a/refs.c b/refs.c
index 6559db3789..4548fd8fcf 100644
--- a/refs.c
+++ b/refs.c
@@ -3314,3 +3314,33 @@ int ref_update_expects_existing_old_ref(struct ref_update *update)
 	return (update->flags & REF_HAVE_OLD) &&
 		(!is_null_oid(&update->old_oid) || update->old_target);
 }
+
+char *ref_transaction_error_msg(enum ref_transaction_error err)
+{
+	const char *reason = "";
+
+	switch (err) {
+	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
+		reason = "refname conflict";
+		break;
+	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
+		reason = "reference already exists";
+		break;
+	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
+		reason = "reference does not exist";
+		break;
+	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
+		reason = "incorrect old value provided";
+		break;
+	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
+		reason = "invalid new value provided";
+		break;
+	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
+		reason = "expected symref but found regular ref";
+		break;
+	default:
+		reason = "unkown failure";
+	}
+
+	return xstrdup(reason);
+}
diff --git a/refs.h b/refs.h
index 46a6008e07..a0b2e3c43d 100644
--- a/refs.h
+++ b/refs.h
@@ -907,6 +907,11 @@ void ref_transaction_for_each_rejected_update(struct ref_transaction *transactio
 					      ref_transaction_for_each_rejected_update_fn cb,
 					      void *cb_data);
 
+/*
+ * Translate errors to human readable error messages.
+ */
+char *ref_transaction_error_msg(enum ref_transaction_error err);
+
 /*
  * Free `*transaction` and all associated data.
  */

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/4] refs: add function to translate errors to strings
  2025-05-15 14:07   ` [PATCH v2 1/4] refs: add function to translate errors to strings Karthik Nayak
@ 2025-05-15 19:11     ` Jeff King
  2025-05-16  9:11       ` Karthik Nayak
  2025-05-15 20:26     ` Junio C Hamano
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff King @ 2025-05-15 19:11 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, ps, gitster

On Thu, May 15, 2025 at 04:07:25PM +0200, Karthik Nayak wrote:

> +char *ref_transaction_error_msg(enum ref_transaction_error err)
> +{
> +	const char *reason = "";
> +
> +	switch (err) {
> +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
> +		reason = "refname conflict";
> +		break;
> +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
> +		reason = "reference already exists";
> +		break;
> +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
> +		reason = "reference does not exist";
> +		break;
> +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
> +		reason = "incorrect old value provided";
> +		break;
> +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
> +		reason = "invalid new value provided";
> +		break;
> +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
> +		reason = "expected symref but found regular ref";
> +		break;
> +	default:
> +		reason = "unkown failure";
> +	}
> +
> +	return xstrdup(reason);
> +}

The assignment of "" is dead code, I think? We will always assign
"unknown failure" as a last resort. Not a big deal, but just something I
noticed while reading this related to what's going on in patch 4.

Also, s/unkown/unknown/, but that is present in the pre-image. I hope we
don't need to retain it for bug-for-bug plumbing compatibility. :)

(I guess the dead store of "" was present in the original, too, for that
matter).

-Peff

PS Sorry for all the nit-picky comments. I was just going down the
   Coverity rabbit hole and didn't really review the rest of the series.
   But I wanted to say that the numbers you are seeing are very cool!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/4] refs: add function to translate errors to strings
  2025-05-15 19:11     ` Jeff King
@ 2025-05-16  9:11       ` Karthik Nayak
  0 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-16  9:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git, toon, ps, gitster

[-- Attachment #1: Type: text/plain, Size: 1955 bytes --]

Jeff King <peff@peff.net> writes:

> On Thu, May 15, 2025 at 04:07:25PM +0200, Karthik Nayak wrote:
>
>> +char *ref_transaction_error_msg(enum ref_transaction_error err)
>> +{
>> +	const char *reason = "";
>> +
>> +	switch (err) {
>> +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
>> +		reason = "refname conflict";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
>> +		reason = "reference already exists";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
>> +		reason = "reference does not exist";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
>> +		reason = "incorrect old value provided";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
>> +		reason = "invalid new value provided";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
>> +		reason = "expected symref but found regular ref";
>> +		break;
>> +	default:
>> +		reason = "unkown failure";
>> +	}
>> +
>> +	return xstrdup(reason);
>> +}
>
> The assignment of "" is dead code, I think? We will always assign
> "unknown failure" as a last resort. Not a big deal, but just something I
> noticed while reading this related to what's going on in patch 4.
>

Yeah, I mostly moved the code without much thought into it. This can be
cleaned up. Specifically because we can avoid all the memory allocation
and directly return the string as a 'const char *' as Junio mentioned.

> Also, s/unkown/unknown/, but that is present in the pre-image. I hope we
> don't need to retain it for bug-for-bug plumbing compatibility. :)
>

Thanks, will change.

> (I guess the dead store of "" was present in the original, too, for that
> matter).
>
> -Peff
>
> PS Sorry for all the nit-picky comments. I was just going down the
>    Coverity rabbit hole and didn't really review the rest of the series.
>    But I wanted to say that the numbers you are seeing are very cool!

I think all these points have good value. I'm happy to address them :)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/4] refs: add function to translate errors to strings
  2025-05-15 14:07   ` [PATCH v2 1/4] refs: add function to translate errors to strings Karthik Nayak
  2025-05-15 19:11     ` Jeff King
@ 2025-05-15 20:26     ` Junio C Hamano
  2025-05-16  9:12       ` Karthik Nayak
  1 sibling, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2025-05-15 20:26 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, ps

Karthik Nayak <karthik.188@gmail.com> writes:

> diff --git a/builtin/update-ref.c b/builtin/update-ref.c
> index 2b1e336ba1..09b99143bf 100644
> --- a/builtin/update-ref.c
> +++ b/builtin/update-ref.c
> @@ -575,30 +575,7 @@ static void print_rejected_refs(const char *refname,
>  				void *cb_data UNUSED)
>  {
>  	struct strbuf sb = STRBUF_INIT;
> -	const char *reason = "";
> -
> -	switch (err) {
> -	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
> -		reason = "refname conflict";
> -		break;
> -	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
> -		reason = "reference already exists";
> -		break;
> -	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
> -		reason = "reference does not exist";
> -		break;
> -	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
> -		reason = "incorrect old value provided";
> -		break;
> -	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
> -		reason = "invalid new value provided";
> -		break;
> -	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
> -		reason = "expected symref but found regular ref";
> -		break;
> -	default:
> -		reason = "unkown failure";
> -	}
> +	char *reason = ref_transaction_error_msg(err);
>  
>  	strbuf_addf(&sb, "rejected %s %s %s %s\n", refname,
>  		    new_oid ? oid_to_hex(new_oid) : new_target,
> @@ -606,6 +583,7 @@ static void print_rejected_refs(const char *refname,
>  		    reason);
>  
>  	fwrite(sb.buf, sb.len, 1, stdout);
> +	free(reason);
>  	strbuf_release(&sb);
>  }

Why free()?  Goes and reads on...

> +char *ref_transaction_error_msg(enum ref_transaction_error err)
> +{
> +	const char *reason = "";
> +
> +	switch (err) {
> +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
> +		reason = "refname conflict";
> +		break;
> +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
> +		reason = "reference already exists";
> +		break;
> +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
> +		reason = "reference does not exist";
> +		break;
> +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
> +		reason = "incorrect old value provided";
> +		break;
> +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
> +		reason = "invalid new value provided";
> +		break;
> +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
> +		reason = "expected symref but found regular ref";
> +		break;
> +	default:
> +		reason = "unkown failure";
> +	}
> +
> +	return xstrdup(reason);
> +}

Why can't this return "const char *", without xstrdup()?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/4] refs: add function to translate errors to strings
  2025-05-15 20:26     ` Junio C Hamano
@ 2025-05-16  9:12       ` Karthik Nayak
  0 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-16  9:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, toon, ps

[-- Attachment #1: Type: text/plain, Size: 1131 bytes --]

Junio C Hamano <gitster@pobox.com> writes:

[snip]

>> +char *ref_transaction_error_msg(enum ref_transaction_error err)
>> +{
>> +	const char *reason = "";
>> +
>> +	switch (err) {
>> +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
>> +		reason = "refname conflict";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
>> +		reason = "reference already exists";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
>> +		reason = "reference does not exist";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
>> +		reason = "incorrect old value provided";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
>> +		reason = "invalid new value provided";
>> +		break;
>> +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
>> +		reason = "expected symref but found regular ref";
>> +		break;
>> +	default:
>> +		reason = "unkown failure";
>> +	}
>> +
>> +	return xstrdup(reason);
>> +}
>
> Why can't this return "const char *", without xstrdup()?

Yeah I think that is much better, string literals are anyways not on the
stack and can be passed around. Will change, this would make it much
simpler.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 2/4] fetch: use batched reference updates
  2025-05-15 14:07 ` [PATCH v2 0/4] fetch/receive: " Karthik Nayak
  2025-05-15 14:07   ` [PATCH v2 1/4] refs: add function to translate errors to strings Karthik Nayak
@ 2025-05-15 14:07   ` Karthik Nayak
  2025-05-16  5:40     ` Patrick Steinhardt
  2025-05-15 14:07   ` [PATCH v2 3/4] send-pack: fix memory leak around duplicate refs Karthik Nayak
  2025-05-15 14:07   ` [PATCH v2 4/4] receive-pack: use batched reference updates Karthik Nayak
  3 siblings, 1 reply; 41+ messages in thread
From: Karthik Nayak @ 2025-05-15 14:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, ps, gitster

The reference updates performed as a part of 'git-fetch(1)', take place
one at a time. For each reference update, a new transaction is created
and committed. This is necessary to ensure we can allow individual
updates to fail without failing the entire command. The command also
supports an '--atomic' mode, which uses a single transaction to update
all of the references. But this mode has an all-or-nothing approach,
where if a single update fails, all updates would fail.

In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in 'git-fetch(1)'.
This provides a significant bump in performance, especially when dealing
with repositories with large number of references.

Adding support for batched updates is simply modifying the flow to also
create a batch update transaction in the non-atomic flow.

With the reftable backend there is a 22x performance improvement, when
performing 'git-fetch(1)' with 10000 refs:

  Benchmark 1: fetch: many refs (refformat = reftable, refcount = 10000, revision = master)
    Time (mean ± σ):      3.403 s ±  0.775 s    [User: 1.875 s, System: 1.417 s]
    Range (min … max):    2.454 s …  4.529 s    10 runs

  Benchmark 2: fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     154.3 ms ±  17.6 ms    [User: 102.5 ms, System: 56.1 ms]
    Range (min … max):   145.2 ms … 220.5 ms    18 runs

  Summary
    fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
     22.06 ± 5.62 times faster than fetch: many refs (refformat = reftable, refcount = 10000, revision = master)

In similar conditions, the files backend sees a 1.25x performance
improvement:

  Benchmark 1: fetch: many refs (refformat = files, refcount = 10000, revision = master)
    Time (mean ± σ):     605.5 ms ±   9.4 ms    [User: 117.8 ms, System: 483.3 ms]
    Range (min … max):   595.6 ms … 621.5 ms    10 runs

  Benchmark 2: fetch: many refs (refformat = files, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     485.8 ms ±   4.3 ms    [User: 91.1 ms, System: 396.7 ms]
    Range (min … max):   477.6 ms … 494.3 ms    10 runs

  Summary
    fetch: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
      1.25 ± 0.02 times faster than fetch: many refs (refformat = files, refcount = 10000, revision = master)

With this we'll either be using a regular transaction or a batch update
transaction. This helps cleanup some code which is no longer needed as
we'll now always have some type of 'ref_transaction' object being
propagated.

One big change is that earlier, each individual update would propagate a
failure. Whereas now, the `ref_transaction_for_each_rejected_update`
function is called at the end of the flow to capture the exit status for
'git-fetch(1)' and also to print F/D conflict errors. This does change
the order of the errors being printed, but the behavior stays the same.

Since transaction errors are now explicitly defined as part of
76e760b999 (refs: introduce enum-based transaction error types,
2025-04-08), utilize them and get rid of custom errors defined within
'builtin/fetch.c'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/fetch.c | 128 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 74 insertions(+), 54 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5279997c96..15eac2b1c2 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -641,9 +641,6 @@ static struct ref *get_ref_map(struct remote *remote,
 	return ref_map;
 }
 
-#define STORE_REF_ERROR_OTHER 1
-#define STORE_REF_ERROR_DF_CONFLICT 2
-
 static int s_update_ref(const char *action,
 			struct ref *ref,
 			struct ref_transaction *transaction,
@@ -651,7 +648,6 @@ static int s_update_ref(const char *action,
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -661,43 +657,10 @@ static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	/*
-	 * If no transaction was passed to us, we manage the transaction
-	 * ourselves. Otherwise, we trust the caller to handle the transaction
-	 * lifecycle.
-	 */
-	if (!transaction) {
-		transaction = our_transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
-									    0, &err);
-		if (!transaction) {
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
 				     check_old ? &ref->old_oid : NULL,
 				     NULL, NULL, 0, msg, &err);
-	if (ret) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
-	}
-
-	if (our_transaction) {
-		switch (ref_transaction_commit(our_transaction, &err)) {
-		case 0:
-			break;
-		case REF_TRANSACTION_ERROR_NAME_CONFLICT:
-			ret = STORE_REF_ERROR_DF_CONFLICT;
-			goto out;
-		default:
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
 
-out:
-	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -1139,7 +1102,6 @@ N_("it took %.2f seconds to check forced updates; you can use\n"
    "to avoid this check\n");
 
 static int store_updated_refs(struct display_state *display_state,
-			      const char *remote_name,
 			      int connectivity_checked,
 			      struct ref_transaction *transaction, struct ref *ref_map,
 			      struct fetch_head *fetch_head,
@@ -1277,11 +1239,6 @@ static int store_updated_refs(struct display_state *display_state,
 		}
 	}
 
-	if (rc & STORE_REF_ERROR_DF_CONFLICT)
-		error(_("some local refs could not be updated; try running\n"
-		      " 'git remote prune %s' to remove any old, conflicting "
-		      "branches"), remote_name);
-
 	if (advice_enabled(ADVICE_FETCH_SHOW_FORCED_UPDATES)) {
 		if (!config->show_forced_updates) {
 			warning(_(warn_show_forced_updates));
@@ -1366,9 +1323,8 @@ static int fetch_and_consume_refs(struct display_state *display_state,
 	}
 
 	trace2_region_enter("fetch", "consume_refs", the_repository);
-	ret = store_updated_refs(display_state, transport->remote->name,
-				 connectivity_checked, transaction, ref_map,
-				 fetch_head, config);
+	ret = store_updated_refs(display_state, connectivity_checked,
+				 transaction, ref_map, fetch_head, config);
 	trace2_region_leave("fetch", "consume_refs", the_repository);
 
 out:
@@ -1688,6 +1644,37 @@ static int set_head(const struct ref *remote_refs, struct remote *remote)
 	return result;
 }
 
+struct ref_rejection_data {
+	int *retcode;
+	int conflict_msg_shown;
+	const char *remote_name;
+};
+
+static void ref_transaction_rejection_handler(const char *refname,
+					      const struct object_id *old_oid UNUSED,
+					      const struct object_id *new_oid UNUSED,
+					      const char *old_target UNUSED,
+					      const char *new_target UNUSED,
+					      enum ref_transaction_error err,
+					      void *cb_data)
+{
+	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;
+
+	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
+		error(_("some local refs could not be updated; try running\n"
+			" 'git remote prune %s' to remove any old, conflicting "
+			"branches"), data->remote_name);
+		data->conflict_msg_shown = 1;
+	} else {
+		char *reason = ref_transaction_error_msg(err);
+
+		error(_("fetching ref %s failed: %s"), refname, reason);
+		free(reason);
+	}
+
+	*data->retcode = 1;
+}
+
 static int do_fetch(struct transport *transport,
 		    struct refspec *rs,
 		    const struct fetch_config *config)
@@ -1808,6 +1795,24 @@ static int do_fetch(struct transport *transport,
 			retcode = 1;
 	}
 
+	/*
+	 * If not atomic, we can still use batched updates, which would be much
+	 * more performant. We don't initiate the transaction before pruning,
+	 * since pruning must be an independent step, to avoid F/D conflicts.
+	 *
+	 * TODO: if reference transactions gain logical conflict resolution, we
+	 * can delete and create refs (with F/D conflicts) in the same transaction
+	 * and this can be moved about the 'prune_refs()' block.
+	 */
+	if (!transaction) {
+		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
+							  REF_TRANSACTION_ALLOW_FAILURE, &err);
+		if (!transaction) {
+			retcode = -1;
+			goto cleanup;
+		}
+	}
+
 	if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
 				   &fetch_head, config)) {
 		retcode = 1;
@@ -1839,16 +1844,31 @@ static int do_fetch(struct transport *transport,
 		free_refs(tags_ref_map);
 	}
 
-	if (transaction) {
-		if (retcode)
-			goto cleanup;
+	if (retcode)
+		goto cleanup;
 
-		retcode = ref_transaction_commit(transaction, &err);
+	retcode = ref_transaction_commit(transaction, &err);
+	if (retcode) {
+		/*
+		 * Explicitly handle transaction cleanup to avoid
+		 * aborting an already closed transaction.
+		 */
+		ref_transaction_free(transaction);
+		transaction = NULL;
+		goto cleanup;
+	}
+
+	if (!atomic_fetch) {
+		struct ref_rejection_data data = {
+			.retcode = &retcode,
+			.conflict_msg_shown = 0,
+			.remote_name = transport->remote->name,
+		};
+
+		ref_transaction_for_each_rejected_update(transaction,
+							 ref_transaction_rejection_handler,
+							 &data);
 		if (retcode) {
-			/*
-			 * Explicitly handle transaction cleanup to avoid
-			 * aborting an already closed transaction.
-			 */
 			ref_transaction_free(transaction);
 			transaction = NULL;
 			goto cleanup;

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/4] fetch: use batched reference updates
  2025-05-15 14:07   ` [PATCH v2 2/4] fetch: use batched reference updates Karthik Nayak
@ 2025-05-16  5:40     ` Patrick Steinhardt
  2025-05-16  9:53       ` Karthik Nayak
  0 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2025-05-16  5:40 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, gitster

On Thu, May 15, 2025 at 04:07:26PM +0200, Karthik Nayak wrote:
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 5279997c96..15eac2b1c2 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1688,6 +1644,37 @@ static int set_head(const struct ref *remote_refs, struct remote *remote)
>  	return result;
>  }
>  
> +struct ref_rejection_data {
> +	int *retcode;
> +	int conflict_msg_shown;
> +	const char *remote_name;
> +};
> +
> +static void ref_transaction_rejection_handler(const char *refname,
> +					      const struct object_id *old_oid UNUSED,
> +					      const struct object_id *new_oid UNUSED,
> +					      const char *old_target UNUSED,
> +					      const char *new_target UNUSED,
> +					      enum ref_transaction_error err,
> +					      void *cb_data)
> +{
> +	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;

Nit: unnecessary cast.

> +	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
> +		error(_("some local refs could not be updated; try running\n"
> +			" 'git remote prune %s' to remove any old, conflicting "
> +			"branches"), data->remote_name);
> +		data->conflict_msg_shown = 1;
> +	} else {
> +		char *reason = ref_transaction_error_msg(err);
> +
> +		error(_("fetching ref %s failed: %s"), refname, reason);
> +		free(reason);
> +	}
> +
> +	*data->retcode = 1;
> +}

Okay, we stopped ignoring generic errors now and will print them. What
I'm still unclear about: which exact errors do we accept now that
`REF_TRANSACTION_ALLOW_FAILURE` is specified? Most of the error codes we
probably want to accept, but what about `REF_TRANSACTION_ERROR_GENERIC`?

This makes me wonder a bit about the current layout of how we handle
these errors. If the rejection handler was invoked while preparing the
transaction for each reference as we go instead of afterwards we could
decide on-the-fly whether a specific error should be ignored or not.
That might lead to a design that is both more flexible and more obvious
at the same time because error handling is now handled explicitly by the
callsite that wants to ignore some errors.

Last but not least, I think that it would also allow us to decide ahead
of time whether we want to commit. Right now we basically say "just
commit it, whatever happens". But if I'm not mistaken, all the errors
that we care about and that callers may want to ignore are already
detected at prepare time. So if we already bubbled up relevant info
while calling `ref_transaction_prepare()` the caller may then decide to
not commit at all based on some criteria.

Sorry, I should've probably proposed this when you introducued this
mechanism. But sometimes you only see things like that as we gain more
users.

> @@ -1808,6 +1795,24 @@ static int do_fetch(struct transport *transport,
>  			retcode = 1;
>  	}
>  
> +	/*
> +	 * If not atomic, we can still use batched updates, which would be much
> +	 * more performant. We don't initiate the transaction before pruning,
> +	 * since pruning must be an independent step, to avoid F/D conflicts.
> +	 *
> +	 * TODO: if reference transactions gain logical conflict resolution, we
> +	 * can delete and create refs (with F/D conflicts) in the same transaction
> +	 * and this can be moved about the 'prune_refs()' block.

s/about/above/?

Patrick

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/4] fetch: use batched reference updates
  2025-05-16  5:40     ` Patrick Steinhardt
@ 2025-05-16  9:53       ` Karthik Nayak
  2025-05-16 10:00         ` Patrick Steinhardt
  0 siblings, 1 reply; 41+ messages in thread
From: Karthik Nayak @ 2025-05-16  9:53 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, toon, gitster

[-- Attachment #1: Type: text/plain, Size: 4967 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> On Thu, May 15, 2025 at 04:07:26PM +0200, Karthik Nayak wrote:
>> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> index 5279997c96..15eac2b1c2 100644
>> --- a/builtin/fetch.c
>> +++ b/builtin/fetch.c
>> @@ -1688,6 +1644,37 @@ static int set_head(const struct ref *remote_refs, struct remote *remote)
>>  	return result;
>>  }
>>
>> +struct ref_rejection_data {
>> +	int *retcode;
>> +	int conflict_msg_shown;
>> +	const char *remote_name;
>> +};
>> +
>> +static void ref_transaction_rejection_handler(const char *refname,
>> +					      const struct object_id *old_oid UNUSED,
>> +					      const struct object_id *new_oid UNUSED,
>> +					      const char *old_target UNUSED,
>> +					      const char *new_target UNUSED,
>> +					      enum ref_transaction_error err,
>> +					      void *cb_data)
>> +{
>> +	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;
>
> Nit: unnecessary cast.
>
>> +	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
>> +		error(_("some local refs could not be updated; try running\n"
>> +			" 'git remote prune %s' to remove any old, conflicting "
>> +			"branches"), data->remote_name);
>> +		data->conflict_msg_shown = 1;
>> +	} else {
>> +		char *reason = ref_transaction_error_msg(err);
>> +
>> +		error(_("fetching ref %s failed: %s"), refname, reason);
>> +		free(reason);
>> +	}
>> +
>> +	*data->retcode = 1;
>> +}
>
> Okay, we stopped ignoring generic errors now and will print them. What
> I'm still unclear about: which exact errors do we accept now that
> `REF_TRANSACTION_ALLOW_FAILURE` is specified? Most of the error codes we
> probably want to accept, but what about `REF_TRANSACTION_ERROR_GENERIC`?

The current mechanism in `ref_transaction_maybe_set_rejected()` doesn't
handle `REF_TRANSACTION_ERROR_GENERIC` errors. This was a design choice
(more of a requirement of what this error represents), where
`REF_TRANSACTION_ERROR_GENERIC` errors cannot be resolved on an
individual reference level. It includes:

  - System errors such as I/O errors
  - Duplicates present

Both of these represent issues which are bigger than a single ref
update, so we have to propagate these errors up.

>
> This makes me wonder a bit about the current layout of how we handle
> these errors. If the rejection handler was invoked while preparing the
> transaction for each reference as we go instead of afterwards we could
> decide on-the-fly whether a specific error should be ignored or not.
> That might lead to a design that is both more flexible and more obvious
> at the same time because error handling is now handled explicitly by the
> callsite that wants to ignore some errors.
>

I did ponder on this while I was building the batched transaction
mechanism. I decided to take it iteratively. We can, for instance,
modify `ref_transaction_maybe_set_rejected()` to work with a callback
function which would allow the users to accept/reject errors.

However, even if we go down that route, `REF_TRANSACTION_ERROR_GENERIC`
errors still cannot be overlooked, these errors will abort the entire
transaction.

That said, I'm not trying to avoid going down that route. I do agree
with the flexibility it does provide. Once we hit such a usecase, we
should make that change.

For 'git-fetch(1)' and 'git-recieve-pack(1)', do you see a usecase?

> Last but not least, I think that it would also allow us to decide ahead
> of time whether we want to commit. Right now we basically say "just
> commit it, whatever happens". But if I'm not mistaken, all the errors
> that we care about and that callers may want to ignore are already
> detected at prepare time. So if we already bubbled up relevant info
> while calling `ref_transaction_prepare()` the caller may then decide to
> not commit at all based on some criteria.
>

Indeed, that is correct. I can confirm that even now all the calls to
`ref_transaction_maybe_set_rejected()` are made in the prepare phase, so
we could already do this, since `transaction->rejections` is already
populated at this stage.

> Sorry, I should've probably proposed this when you introducued this
> mechanism. But sometimes you only see things like that as we gain more
> users.
>

You don't have to apologize. Such discussions are very important and you
shouldn't hesitate to bring up such points.

>> @@ -1808,6 +1795,24 @@ static int do_fetch(struct transport *transport,
>>  			retcode = 1;
>>  	}
>>
>> +	/*
>> +	 * If not atomic, we can still use batched updates, which would be much
>> +	 * more performant. We don't initiate the transaction before pruning,
>> +	 * since pruning must be an independent step, to avoid F/D conflicts.
>> +	 *
>> +	 * TODO: if reference transactions gain logical conflict resolution, we
>> +	 * can delete and create refs (with F/D conflicts) in the same transaction
>> +	 * and this can be moved about the 'prune_refs()' block.
>
> s/about/above/?
>

Indeed!

> Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/4] fetch: use batched reference updates
  2025-05-16  9:53       ` Karthik Nayak
@ 2025-05-16 10:00         ` Patrick Steinhardt
  2025-05-18 11:30           ` Karthik Nayak
  0 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2025-05-16 10:00 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, gitster

On Fri, May 16, 2025 at 02:53:22AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > On Thu, May 15, 2025 at 04:07:26PM +0200, Karthik Nayak wrote:
> >> diff --git a/builtin/fetch.c b/builtin/fetch.c
> >> index 5279997c96..15eac2b1c2 100644
> >> --- a/builtin/fetch.c
> >> +++ b/builtin/fetch.c
> >> @@ -1688,6 +1644,37 @@ static int set_head(const struct ref *remote_refs, struct remote *remote)
> >>  	return result;
> >>  }
> >>
> >> +struct ref_rejection_data {
> >> +	int *retcode;
> >> +	int conflict_msg_shown;
> >> +	const char *remote_name;
> >> +};
> >> +
> >> +static void ref_transaction_rejection_handler(const char *refname,
> >> +					      const struct object_id *old_oid UNUSED,
> >> +					      const struct object_id *new_oid UNUSED,
> >> +					      const char *old_target UNUSED,
> >> +					      const char *new_target UNUSED,
> >> +					      enum ref_transaction_error err,
> >> +					      void *cb_data)
> >> +{
> >> +	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;
> >
> > Nit: unnecessary cast.
> >
> >> +	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
> >> +		error(_("some local refs could not be updated; try running\n"
> >> +			" 'git remote prune %s' to remove any old, conflicting "
> >> +			"branches"), data->remote_name);
> >> +		data->conflict_msg_shown = 1;
> >> +	} else {
> >> +		char *reason = ref_transaction_error_msg(err);
> >> +
> >> +		error(_("fetching ref %s failed: %s"), refname, reason);
> >> +		free(reason);
> >> +	}
> >> +
> >> +	*data->retcode = 1;
> >> +}
> >
> > Okay, we stopped ignoring generic errors now and will print them. What
> > I'm still unclear about: which exact errors do we accept now that
> > `REF_TRANSACTION_ALLOW_FAILURE` is specified? Most of the error codes we
> > probably want to accept, but what about `REF_TRANSACTION_ERROR_GENERIC`?
> 
> The current mechanism in `ref_transaction_maybe_set_rejected()` doesn't
> handle `REF_TRANSACTION_ERROR_GENERIC` errors. This was a design choice
> (more of a requirement of what this error represents), where
> `REF_TRANSACTION_ERROR_GENERIC` errors cannot be resolved on an
> individual reference level. It includes:
> 
>   - System errors such as I/O errors
>   - Duplicates present
> 
> Both of these represent issues which are bigger than a single ref
> update, so we have to propagate these errors up.

The second case is also why the behaviour changes now, right? If we were
able to handle duplicates via the same mechanism then it would become
possible to retain current behaviour for git-receive-pack(1)?

Not that I'm proposing this -- I very much think that the current
behaviour in git-receive-pack(1) is a bug that should be fixed. Mostly
trying to understand.

> > This makes me wonder a bit about the current layout of how we handle
> > these errors. If the rejection handler was invoked while preparing the
> > transaction for each reference as we go instead of afterwards we could
> > decide on-the-fly whether a specific error should be ignored or not.
> > That might lead to a design that is both more flexible and more obvious
> > at the same time because error handling is now handled explicitly by the
> > callsite that wants to ignore some errors.
> >
> 
> I did ponder on this while I was building the batched transaction
> mechanism. I decided to take it iteratively. We can, for instance,
> modify `ref_transaction_maybe_set_rejected()` to work with a callback
> function which would allow the users to accept/reject errors.
> 
> However, even if we go down that route, `REF_TRANSACTION_ERROR_GENERIC`
> errors still cannot be overlooked, these errors will abort the entire
> transaction.

Okay, good.

> That said, I'm not trying to avoid going down that route. I do agree
> with the flexibility it does provide. Once we hit such a usecase, we
> should make that change.
> 
> For 'git-fetch(1)' and 'git-recieve-pack(1)', do you see a usecase?

No, I don't right now. I just want to avoid that we have to eventually
refactor all of this to support an alternative API. But agreed, there
isn't really much of a reason why we wouldn't be able to introduce such
a mechanism retroactively while keeping existing callers intact.

So let's stick with what we have and keep this in the back of our minds
if we ever need such a mechanism going forward.

> > Last but not least, I think that it would also allow us to decide ahead
> > of time whether we want to commit. Right now we basically say "just
> > commit it, whatever happens". But if I'm not mistaken, all the errors
> > that we care about and that callers may want to ignore are already
> > detected at prepare time. So if we already bubbled up relevant info
> > while calling `ref_transaction_prepare()` the caller may then decide to
> > not commit at all based on some criteria.
> >
> 
> Indeed, that is correct. I can confirm that even now all the calls to
> `ref_transaction_maybe_set_rejected()` are made in the prepare phase, so
> we could already do this, since `transaction->rejections` is already
> populated at this stage.

Good. After all, we shouldn't have to perform checks in the "commit"
phase. Things are locked, things have been checked, so it should
basically be a mere "let's move everything into place now". Which of
course can still fail, but the only valid reason should be system
failures.

Patrick

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/4] fetch: use batched reference updates
  2025-05-16 10:00         ` Patrick Steinhardt
@ 2025-05-18 11:30           ` Karthik Nayak
  0 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-18 11:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, toon, gitster

[-- Attachment #1: Type: text/plain, Size: 6047 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> On Fri, May 16, 2025 at 02:53:22AM -0700, Karthik Nayak wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>>
>> > On Thu, May 15, 2025 at 04:07:26PM +0200, Karthik Nayak wrote:
>> >> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> >> index 5279997c96..15eac2b1c2 100644
>> >> --- a/builtin/fetch.c
>> >> +++ b/builtin/fetch.c
>> >> @@ -1688,6 +1644,37 @@ static int set_head(const struct ref *remote_refs, struct remote *remote)
>> >>  	return result;
>> >>  }
>> >>
>> >> +struct ref_rejection_data {
>> >> +	int *retcode;
>> >> +	int conflict_msg_shown;
>> >> +	const char *remote_name;
>> >> +};
>> >> +
>> >> +static void ref_transaction_rejection_handler(const char *refname,
>> >> +					      const struct object_id *old_oid UNUSED,
>> >> +					      const struct object_id *new_oid UNUSED,
>> >> +					      const char *old_target UNUSED,
>> >> +					      const char *new_target UNUSED,
>> >> +					      enum ref_transaction_error err,
>> >> +					      void *cb_data)
>> >> +{
>> >> +	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;
>> >
>> > Nit: unnecessary cast.
>> >
>> >> +	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
>> >> +		error(_("some local refs could not be updated; try running\n"
>> >> +			" 'git remote prune %s' to remove any old, conflicting "
>> >> +			"branches"), data->remote_name);
>> >> +		data->conflict_msg_shown = 1;
>> >> +	} else {
>> >> +		char *reason = ref_transaction_error_msg(err);
>> >> +
>> >> +		error(_("fetching ref %s failed: %s"), refname, reason);
>> >> +		free(reason);
>> >> +	}
>> >> +
>> >> +	*data->retcode = 1;
>> >> +}
>> >
>> > Okay, we stopped ignoring generic errors now and will print them. What
>> > I'm still unclear about: which exact errors do we accept now that
>> > `REF_TRANSACTION_ALLOW_FAILURE` is specified? Most of the error codes we
>> > probably want to accept, but what about `REF_TRANSACTION_ERROR_GENERIC`?
>>
>> The current mechanism in `ref_transaction_maybe_set_rejected()` doesn't
>> handle `REF_TRANSACTION_ERROR_GENERIC` errors. This was a design choice
>> (more of a requirement of what this error represents), where
>> `REF_TRANSACTION_ERROR_GENERIC` errors cannot be resolved on an
>> individual reference level. It includes:
>>
>>   - System errors such as I/O errors
>>   - Duplicates present
>>
>> Both of these represent issues which are bigger than a single ref
>> update, so we have to propagate these errors up.
>
> The second case is also why the behaviour changes now, right? If we were
> able to handle duplicates via the same mechanism then it would become
> possible to retain current behaviour for git-receive-pack(1)?
>

Yeah indeed, but if we want to allow users supporting conflict
resolution of duplicates, we'll also have to think about how that would
look. Our discussion till now was around allowing a callback for each
reference update with the associated error.

With duplicates, we'd also want to provide the context of which N
updates are duplicated.

> Not that I'm proposing this -- I very much think that the current
> behaviour in git-receive-pack(1) is a bug that should be fixed. Mostly
> trying to understand.
>

Yeah I agree with you on this!

>> > This makes me wonder a bit about the current layout of how we handle
>> > these errors. If the rejection handler was invoked while preparing the
>> > transaction for each reference as we go instead of afterwards we could
>> > decide on-the-fly whether a specific error should be ignored or not.
>> > That might lead to a design that is both more flexible and more obvious
>> > at the same time because error handling is now handled explicitly by the
>> > callsite that wants to ignore some errors.
>> >
>>
>> I did ponder on this while I was building the batched transaction
>> mechanism. I decided to take it iteratively. We can, for instance,
>> modify `ref_transaction_maybe_set_rejected()` to work with a callback
>> function which would allow the users to accept/reject errors.
>>
>> However, even if we go down that route, `REF_TRANSACTION_ERROR_GENERIC`
>> errors still cannot be overlooked, these errors will abort the entire
>> transaction.
>
> Okay, good.
>
>> That said, I'm not trying to avoid going down that route. I do agree
>> with the flexibility it does provide. Once we hit such a usecase, we
>> should make that change.
>>
>> For 'git-fetch(1)' and 'git-recieve-pack(1)', do you see a usecase?
>
> No, I don't right now. I just want to avoid that we have to eventually
> refactor all of this to support an alternative API. But agreed, there
> isn't really much of a reason why we wouldn't be able to introduce such
> a mechanism retroactively while keeping existing callers intact.
>
> So let's stick with what we have and keep this in the back of our minds
> if we ever need such a mechanism going forward.
>

I think this makes sense!

>> > Last but not least, I think that it would also allow us to decide ahead
>> > of time whether we want to commit. Right now we basically say "just
>> > commit it, whatever happens". But if I'm not mistaken, all the errors
>> > that we care about and that callers may want to ignore are already
>> > detected at prepare time. So if we already bubbled up relevant info
>> > while calling `ref_transaction_prepare()` the caller may then decide to
>> > not commit at all based on some criteria.
>> >
>>
>> Indeed, that is correct. I can confirm that even now all the calls to
>> `ref_transaction_maybe_set_rejected()` are made in the prepare phase, so
>> we could already do this, since `transaction->rejections` is already
>> populated at this stage.
>
> Good. After all, we shouldn't have to perform checks in the "commit"
> phase. Things are locked, things have been checked, so it should
> basically be a mere "let's move everything into place now". Which of
> course can still fail, but the only valid reason should be system
> failures.
>
> Patrick
>

Exactly, totally agreed.

Thanks for the discussion Patrick!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 3/4] send-pack: fix memory leak around duplicate refs
  2025-05-15 14:07 ` [PATCH v2 0/4] fetch/receive: " Karthik Nayak
  2025-05-15 14:07   ` [PATCH v2 1/4] refs: add function to translate errors to strings Karthik Nayak
  2025-05-15 14:07   ` [PATCH v2 2/4] fetch: use batched reference updates Karthik Nayak
@ 2025-05-15 14:07   ` Karthik Nayak
  2025-05-15 14:07   ` [PATCH v2 4/4] receive-pack: use batched reference updates Karthik Nayak
  3 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-15 14:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, ps, gitster

The 'git-send-pack(1)' allows users to push objects to a remote
repository and explicitly list the references to be pushed. The status
of each reference pushed is captured into a list mapped by refname.

If a reference fails to be updated, its error message is captured in the
`ref->remote_status` field. While the command allows duplicate ref
inputs, the list doesn't accommodate this behavior as a particular
refname is linked to a single `struct ref*` element. So if the user
inputs a reference twice like:

  git send-pack remote.git A:foo B:foo

where the user is trying to update the same reference 'foo' twice and
the reference fails to be updated, we first fill `ref->remote_status`
with error message for the input 'A:foo' then we override the same field
with the error message for 'B:foo'. This override happens without first
free'ing the previous value. Fix this leak.

The current tests already incorporate the above example, but in the test
'A:foo' succeeds while 'B:foo' fails, meaning that the memory leak isn't
triggered. Add a new test with multiple duplicates.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 send-pack.c                | 7 +++++++
 t/t5408-send-pack-stdin.sh | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/send-pack.c b/send-pack.c
index 5005689cb5..4cd41a64ce 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -260,6 +260,13 @@ static int receive_status(struct repository *r,
 				refname);
 			continue;
 		}
+
+		/*
+		 * Clients sending duplicate refs can cause the same value
+		 * to be overridden, causing a memory leak.
+		 */
+		free(hint->remote_status);
+
 		if (!strcmp(head, "ng")) {
 			hint->status = REF_STATUS_REMOTE_REJECT;
 			if (p)
diff --git a/t/t5408-send-pack-stdin.sh b/t/t5408-send-pack-stdin.sh
index 526a675045..45fb20179b 100755
--- a/t/t5408-send-pack-stdin.sh
+++ b/t/t5408-send-pack-stdin.sh
@@ -73,6 +73,12 @@ test_expect_success 'cmdline refs written in order' '
 	verify_push A foo
 '

+test_expect_success 'cmdline refs with multiple duplicates' '
+	clear_remote &&
+	test_must_fail git send-pack remote.git A:foo B:foo C:foo &&
+	verify_push A foo
+'
+
 test_expect_success '--stdin refs come after cmdline' '
 	clear_remote &&
 	echo A:foo >input &&

-- 
2.49.0

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 4/4] receive-pack: use batched reference updates
  2025-05-15 14:07 ` [PATCH v2 0/4] fetch/receive: " Karthik Nayak
                     ` (2 preceding siblings ...)
  2025-05-15 14:07   ` [PATCH v2 3/4] send-pack: fix memory leak around duplicate refs Karthik Nayak
@ 2025-05-15 14:07   ` Karthik Nayak
  2025-05-15 18:55     ` Jeff King
  3 siblings, 1 reply; 41+ messages in thread
From: Karthik Nayak @ 2025-05-15 14:07 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, toon, ps, gitster

The reference updates performed as a part of 'git-receive-pack(1)', take
place one at a time. For each reference update, a new transaction is
created and committed. This is necessary to ensure we can allow
individual updates to fail without failing the entire command. The
command also supports an 'atomic' mode, which uses a single transaction
to update all of the references. But this mode has an all-or-nothing
approach, where if a single update fails, all updates would fail.

In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in
'git-receive-pack(1)'. This provides a significant bump in performance,
especially when dealing with repositories with large number of
references.

With the reftable backend there is a 18x performance improvement, when
performing receive-pack with 10000 refs:

  Benchmark 1: receive: many refs (refformat = reftable, refcount = 10000, revision = master)
    Time (mean ± σ):      4.276 s ±  0.078 s    [User: 0.796 s, System: 3.318 s]
    Range (min … max):    4.185 s …  4.430 s    10 runs

  Benchmark 2: receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     235.4 ms ±   6.9 ms    [User: 75.4 ms, System: 157.3 ms]
    Range (min … max):   228.5 ms … 254.2 ms    11 runs

  Summary
    receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
     18.16 ± 0.63 times faster than receive: many refs (refformat = reftable, refcount = 10000, revision = master)

In similar conditions, the files backend sees a 1.21x performance
improvement:

  Benchmark 1: receive: many refs (refformat = files, refcount = 10000, revision = master)
    Time (mean ± σ):      1.121 s ±  0.021 s    [User: 0.128 s, System: 0.975 s]
    Range (min … max):    1.097 s …  1.156 s    10 runs

  Benchmark 2: receive: many refs (refformat = files, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     927.9 ms ±  22.6 ms    [User: 99.0 ms, System: 815.2 ms]
    Range (min … max):   903.1 ms … 978.0 ms    10 runs

  Summary
    receive: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
      1.21 ± 0.04 times faster than receive: many refs (refformat = files, refcount = 10000, revision = master)

As using batched updates requires the error handling to be moved to the
end of the flow, create and use a 'struct strset' to track the failed
refs and attribute the correct errors to them.

This change also uncovers an issue when a client provides multiple
updates to the same reference. For example:

  $ git send-pack remote.git A:foo B:foo
  Enumerating objects: 3, done.
  Counting objects: 100% (3/3), done.
  Delta compression using up to 20 threads
  Compressing objects: 100% (2/2), done.
  Writing objects: 100% (3/3), 226 bytes | 226.00 KiB/s, done.
  Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)
  remote: error: cannot lock ref 'refs/heads/foo': reference already exists
  To remote.git
   ! [remote rejected] A -> foo (failed to update ref)
   ! [remote failure]  B -> foo (remote failed to report status)

As you can see, the remote runs into an error because it cannot lock the
target reference for the second update. Furthermore, the remote complains
that the first update has been rejected whereas the second update didn't
receive any status update because we failed to lock it. Reading this status
message alone a user would probably expect that `foo` has not been updated
at all. But that's not the case: while we claim that the ref wasn't updated,
it surprisingly points to `A` now.

One could argue that this is merely an error in how we report the result of
this push. But ultimately, the user's request itself is already broken and
doesn't make any sense in the first place and cannot ever lead to a sensible
outcome that honors the full request.

The conversion to batched transactions fixes the issue because we now try to
queue both updates in the same transaction. As such, the transaction itself
will notice this conflict and refuse the update altogether before we commit
any of the values.

Note that this requires changes to a couple of tests in t5408 that happened
to exercise this behaviour. Given that the generated output is misleading
and given that the user request cannot ever be fully honored this really
feels more like a bug than properly designed behaviour. As such, changing
the behaviour feels like the right thing to do.

Since now reference updates are batched, the 'reference-transaction'
hook will be invoked with all updates together. Currently git will 'die'
when the hook returns with a non-zero exit status in the 'prepared'
stage. For 'git-receive-pack(1)', this allowed users to reject an
individual reference update, git would have applied previous updates but
immediately abort further execution. This is definitely an incorrect
usage of this hook, since the right place to do this would be the
'update' hook. This patch retains the latter behavior, but
'reference-transaction' hook now changes to a all-or-nothing behavior
when a non-zero exit status is returned in the 'prepared' stage, since
batch updates use a transaction under the hood. This explains the change
in 't1416'.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/receive-pack.c           | 64 ++++++++++++++++++++++++++++++----------
 t/t1416-ref-transaction-hooks.sh |  2 --
 t/t5408-send-pack-stdin.sh       | 12 ++++----
 3 files changed, 55 insertions(+), 23 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index be314879e8..4c47d6d444 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1843,35 +1843,67 @@ static void BUG_if_skipped_connectivity_check(struct command *commands,
 	BUG_if_bug("connectivity check skipped???");
 }

+static void ref_transaction_rejection_handler(const char *refname,
+					      const struct object_id *old_oid UNUSED,
+					      const struct object_id *new_oid UNUSED,
+					      const char *old_target UNUSED,
+					      const char *new_target UNUSED,
+					      enum ref_transaction_error err,
+					      void *cb_data)
+{
+	struct strmap *failed_refs = cb_data;
+
+	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
+}
+
 static void execute_commands_non_atomic(struct command *commands,
 					struct shallow_info *si)
 {
 	struct command *cmd;
 	struct strbuf err = STRBUF_INIT;
+	const char *reported_error = NULL;
+	struct strmap failed_refs = STRMAP_INIT;
+
+	transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
+						  REF_TRANSACTION_ALLOW_FAILURE, &err);
+	if (!transaction) {
+		rp_error("%s", err.buf);
+		strbuf_reset(&err);
+		reported_error = "transaction failed to start";
+		goto failure;
+	}

 	for (cmd = commands; cmd; cmd = cmd->next) {
 		if (!should_process_cmd(cmd) || cmd->run_proc_receive)
 			continue;

-		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
-							  0, &err);
-		if (!transaction) {
-			rp_error("%s", err.buf);
-			strbuf_reset(&err);
-			cmd->error_string = "transaction failed to start";
-			continue;
-		}
-
 		cmd->error_string = update(cmd, si);
+	}

-		if (!cmd->error_string
-		    && ref_transaction_commit(transaction, &err)) {
-			rp_error("%s", err.buf);
-			strbuf_reset(&err);
-			cmd->error_string = "failed to update ref";
-		}
-		ref_transaction_free(transaction);
+	if (ref_transaction_commit(transaction, &err)) {
+		rp_error("%s", err.buf);
+		reported_error = "failed to update refs";
+		goto failure;
+	}
+
+	ref_transaction_for_each_rejected_update(transaction,
+						 ref_transaction_rejection_handler,
+						 &failed_refs);
+
+	if (strmap_empty(&failed_refs))
+		goto cleanup;
+
+failure:
+	for (cmd = commands; cmd; cmd = cmd->next) {
+		if (reported_error)
+			cmd->error_string = reported_error;
+		else if (strmap_contains(&failed_refs, cmd->ref_name))
+			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
 	}
+
+cleanup:
+	ref_transaction_free(transaction);
+	strmap_clear(&failed_refs, 1);
 	strbuf_release(&err);
 }

diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
index 8c777f7cf8..d91dd3a3b5 100755
--- a/t/t1416-ref-transaction-hooks.sh
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -120,8 +120,6 @@ test_expect_success 'interleaving hook calls succeed' '

 	cat >expect <<-EOF &&
 		hooks/update refs/tags/PRE $ZERO_OID $PRE_OID
-		hooks/reference-transaction prepared
-		hooks/reference-transaction committed
 		hooks/update refs/tags/POST $ZERO_OID $POST_OID
 		hooks/reference-transaction prepared
 		hooks/reference-transaction committed
diff --git a/t/t5408-send-pack-stdin.sh b/t/t5408-send-pack-stdin.sh
index 45fb20179b..76fb8bbc68 100755
--- a/t/t5408-send-pack-stdin.sh
+++ b/t/t5408-send-pack-stdin.sh
@@ -69,21 +69,23 @@ test_expect_success 'stdin mixed with cmdline' '

 test_expect_success 'cmdline refs written in order' '
 	clear_remote &&
-	test_must_fail git send-pack remote.git A:foo B:foo &&
-	verify_push A foo
+	test_must_fail git send-pack remote.git A:foo B:foo 2>err &&
+	test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err &&
+	test_must_fail git --git-dir=remote.git rev-parse foo
 '

 test_expect_success 'cmdline refs with multiple duplicates' '
 	clear_remote &&
-	test_must_fail git send-pack remote.git A:foo B:foo C:foo &&
-	verify_push A foo
+	test_must_fail git send-pack remote.git A:foo B:foo C:foo 2>err &&
+	test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err &&
+	test_must_fail git --git-dir=remote.git rev-parse foo
 '

 test_expect_success '--stdin refs come after cmdline' '
 	clear_remote &&
 	echo A:foo >input &&
 	test_must_fail git send-pack remote.git --stdin B:foo <input &&
-	verify_push B foo
+	test_must_fail git --git-dir=remote.git rev-parse foo
 '

 test_expect_success 'refspecs and --mirror do not mix (cmdline)' '

-- 
2.49.0

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/4] receive-pack: use batched reference updates
  2025-05-15 14:07   ` [PATCH v2 4/4] receive-pack: use batched reference updates Karthik Nayak
@ 2025-05-15 18:55     ` Jeff King
  2025-05-15 19:09       ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2025-05-15 18:55 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, ps, gitster

On Thu, May 15, 2025 at 04:07:28PM +0200, Karthik Nayak wrote:

> +failure:
> +	for (cmd = commands; cmd; cmd = cmd->next) {
> +		if (reported_error)
> +			cmd->error_string = reported_error;
> +		else if (strmap_contains(&failed_refs, cmd->ref_name))
> +			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
>  	}

Coverity complains about this code, claiming that strmap_get() could
return NULL. At first glance, I thought it was being totally stupid,
since we just called strmap_contains() above. But it's only being a
little bit stupid: even if the entry exists, it is still possible for it
to contain NULL.

However, that won't ever be the case, since it is always fed from the
set of ref transaction reasons.

Still, I wondered if:

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index bd0fb729ff..fe001bbfe8 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1897,10 +1897,11 @@ static void execute_commands_non_atomic(struct command *commands,
 
 failure:
 	for (cmd = commands; cmd; cmd = cmd->next) {
+		const char *reason;
 		if (reported_error)
 			cmd->error_string = reported_error;
-		else if (strmap_contains(&failed_refs, cmd->ref_name))
-			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
+		else if ((reason = strmap_get(&failed_refs, cmd->ref_name)))
+			cmd->error_string = xstrdup(reason);
 	}
 
 cleanup:

might be simpler, and skip the extra lookup? I dunno, it is probably
getting into bikeshedding, and it is not like this is the only Coverity
false positive we have seen. So feel free to ignore. ;)

-Peff

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/4] receive-pack: use batched reference updates
  2025-05-15 18:55     ` Jeff King
@ 2025-05-15 19:09       ` Jeff King
  2025-05-16 19:49         ` Karthik Nayak
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2025-05-15 19:09 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, toon, ps, gitster

On Thu, May 15, 2025 at 02:55:36PM -0400, Jeff King wrote:

> On Thu, May 15, 2025 at 04:07:28PM +0200, Karthik Nayak wrote:
> 
> > +failure:
> > +	for (cmd = commands; cmd; cmd = cmd->next) {
> > +		if (reported_error)
> > +			cmd->error_string = reported_error;
> > +		else if (strmap_contains(&failed_refs, cmd->ref_name))
> > +			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
> >  	}

BTW, one other funny thing about this code: we duplicate the strings
that we assign to cmd->error_string. But no other call path does so. So
now we have allocated memory, but nobody can ever free() it because
often it is not allocated!

I'm surprised LSan does not report a leak here, so I might be missing
something.

It looks like there is some magic in the struct here to handle this
like:

  cmd->error_string = cmd->error_string_owned = xstrdup(...);

which would solve it. But I wonder: do these need to be allocated at
all? We are pulling out strings owned by the strmap, which will free
them. So as-is, yes, we need to make our own copies.

But does the strmap need to own them in the first place? It is taking
the values from ref_transaction_error_msg(). That returns an allocated
string, but it doesn't need to. It could just return the string literals
directly, which are valid for the life of the program. That would save
other callers from having to call free(), though it does mean we have to
cast away the constness when putting them into the strmap (since it
accepts a non-const void pointer). Something like:

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 4a3c46eca7..cf0bcf1ad0 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1665,10 +1665,9 @@ static void ref_transaction_rejection_handler(const char *refname,
 			"branches"), data->remote_name);
 		data->conflict_msg_shown = 1;
 	} else {
-		char *reason = ref_transaction_error_msg(err);
+		const char *reason = ref_transaction_error_msg(err);
 
 		error(_("fetching ref %s failed: %s"), refname, reason);
-		free(reason);
 	}
 
 	*data->retcode = 1;
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index bd0fb729ff..4cef2ddd3d 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1855,7 +1855,7 @@ static void ref_transaction_rejection_handler(const char *refname,
 {
 	struct strmap *failed_refs = cb_data;
 
-	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
+	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));
 }
 
 static void execute_commands_non_atomic(struct command *commands,
@@ -1897,15 +1897,16 @@ static void execute_commands_non_atomic(struct command *commands,
 
 failure:
 	for (cmd = commands; cmd; cmd = cmd->next) {
+		const char *reason;
 		if (reported_error)
 			cmd->error_string = reported_error;
-		else if (strmap_contains(&failed_refs, cmd->ref_name))
-			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
+		else if ((reason = strmap_get(&failed_refs, cmd->ref_name)))
+			cmd->error_string = reason;
 	}
 
 cleanup:
 	ref_transaction_free(transaction);
-	strmap_clear(&failed_refs, 1);
+	strmap_clear(&failed_refs, 0);
 	strbuf_release(&err);
 }
 
diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 09b99143bf..1e6131e04a 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -575,15 +575,14 @@ static void print_rejected_refs(const char *refname,
 				void *cb_data UNUSED)
 {
 	struct strbuf sb = STRBUF_INIT;
-	char *reason = ref_transaction_error_msg(err);
+	const char *reason = ref_transaction_error_msg(err);
 
 	strbuf_addf(&sb, "rejected %s %s %s %s\n", refname,
 		    new_oid ? oid_to_hex(new_oid) : new_target,
 		    old_oid ? oid_to_hex(old_oid) : old_target,
 		    reason);
 
 	fwrite(sb.buf, sb.len, 1, stdout);
-	free(reason);
 	strbuf_release(&sb);
 }
 
diff --git a/refs.c b/refs.c
index 351ed52deb..953f83bb52 100644
--- a/refs.c
+++ b/refs.c
@@ -3315,7 +3315,7 @@ int ref_update_expects_existing_old_ref(struct ref_update *update)
 		(!is_null_oid(&update->old_oid) || update->old_target);
 }
 
-char *ref_transaction_error_msg(enum ref_transaction_error err)
+const char *ref_transaction_error_msg(enum ref_transaction_error err)
 {
 	const char *reason = "";
 
@@ -3342,5 +3342,5 @@ char *ref_transaction_error_msg(enum ref_transaction_error err)
 		reason = "unkown failure";
 	}
 
-	return xstrdup(reason);
+	return reason;
 }
diff --git a/refs.h b/refs.h
index a0b2e3c43d..2d58af3d88 100644
--- a/refs.h
+++ b/refs.h
@@ -910,7 +910,7 @@ void ref_transaction_for_each_rejected_update(struct ref_transaction *transactio
 /*
  * Translate errors to human readable error messages.
  */
-char *ref_transaction_error_msg(enum ref_transaction_error err);
+const char *ref_transaction_error_msg(enum ref_transaction_error err);
 
 /*
  * Free `*transaction` and all associated data.

-Peff

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/4] receive-pack: use batched reference updates
  2025-05-15 19:09       ` Jeff King
@ 2025-05-16 19:49         ` Karthik Nayak
  0 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-16 19:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git, toon, ps, gitster

[-- Attachment #1: Type: text/plain, Size: 5563 bytes --]

Jeff King <peff@peff.net> writes:

> On Thu, May 15, 2025 at 02:55:36PM -0400, Jeff King wrote:
>
>> On Thu, May 15, 2025 at 04:07:28PM +0200, Karthik Nayak wrote:
>>
>> > +failure:
>> > +	for (cmd = commands; cmd; cmd = cmd->next) {
>> > +		if (reported_error)
>> > +			cmd->error_string = reported_error;
>> > +		else if (strmap_contains(&failed_refs, cmd->ref_name))
>> > +			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
>> >  	}
>
> BTW, one other funny thing about this code: we duplicate the strings
> that we assign to cmd->error_string. But no other call path does so. So
> now we have allocated memory, but nobody can ever free() it because
> often it is not allocated!
>
> I'm surprised LSan does not report a leak here, so I might be missing
> something.
>

Yeah, indeed that is a leak and surprising that this wasn't caught by
LSan.

> It looks like there is some magic in the struct here to handle this
> like:
>
>   cmd->error_string = cmd->error_string_owned = xstrdup(...);
>
> which would solve it. But I wonder: do these need to be allocated at
> all? We are pulling out strings owned by the strmap, which will free
> them. So as-is, yes, we need to make our own copies.
>
> But does the strmap need to own them in the first place? It is taking
> the values from ref_transaction_error_msg(). That returns an allocated
> string, but it doesn't need to. It could just return the string literals
> directly, which are valid for the life of the program. That would save
> other callers from having to call free(), though it does mean we have to
> cast away the constness when putting them into the strmap (since it
> accepts a non-const void pointer). Something like:
>

Thanks Jeff, this does make a lot of sense, I think with Junio's
suggestion in the first patch, making `ref_transaction_error_msg()`
return a 'const char *' solves this issue too, we no longer will hold
allocated strings here and as such no longer have to worry about
allocation/freeing of memory. Much cleaner.

In the end it looks similar to you patch below!

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 4a3c46eca7..cf0bcf1ad0 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1665,10 +1665,9 @@ static void ref_transaction_rejection_handler(const char *refname,
>  			"branches"), data->remote_name);
>  		data->conflict_msg_shown = 1;
>  	} else {
> -		char *reason = ref_transaction_error_msg(err);
> +		const char *reason = ref_transaction_error_msg(err);
>
>  		error(_("fetching ref %s failed: %s"), refname, reason);
> -		free(reason);
>  	}
>
>  	*data->retcode = 1;
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index bd0fb729ff..4cef2ddd3d 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1855,7 +1855,7 @@ static void ref_transaction_rejection_handler(const char *refname,
>  {
>  	struct strmap *failed_refs = cb_data;
>
> -	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
> +	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));
>  }
>
>  static void execute_commands_non_atomic(struct command *commands,
> @@ -1897,15 +1897,16 @@ static void execute_commands_non_atomic(struct command *commands,
>
>  failure:
>  	for (cmd = commands; cmd; cmd = cmd->next) {
> +		const char *reason;
>  		if (reported_error)
>  			cmd->error_string = reported_error;
> -		else if (strmap_contains(&failed_refs, cmd->ref_name))
> -			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
> +		else if ((reason = strmap_get(&failed_refs, cmd->ref_name)))
> +			cmd->error_string = reason;
>  	}
>
>  cleanup:
>  	ref_transaction_free(transaction);
> -	strmap_clear(&failed_refs, 1);
> +	strmap_clear(&failed_refs, 0);
>  	strbuf_release(&err);
>  }
>
> diff --git a/builtin/update-ref.c b/builtin/update-ref.c
> index 09b99143bf..1e6131e04a 100644
> --- a/builtin/update-ref.c
> +++ b/builtin/update-ref.c
> @@ -575,15 +575,14 @@ static void print_rejected_refs(const char *refname,
>  				void *cb_data UNUSED)
>  {
>  	struct strbuf sb = STRBUF_INIT;
> -	char *reason = ref_transaction_error_msg(err);
> +	const char *reason = ref_transaction_error_msg(err);
>
>  	strbuf_addf(&sb, "rejected %s %s %s %s\n", refname,
>  		    new_oid ? oid_to_hex(new_oid) : new_target,
>  		    old_oid ? oid_to_hex(old_oid) : old_target,
>  		    reason);
>
>  	fwrite(sb.buf, sb.len, 1, stdout);
> -	free(reason);
>  	strbuf_release(&sb);
>  }
>
> diff --git a/refs.c b/refs.c
> index 351ed52deb..953f83bb52 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -3315,7 +3315,7 @@ int ref_update_expects_existing_old_ref(struct ref_update *update)
>  		(!is_null_oid(&update->old_oid) || update->old_target);
>  }
>
> -char *ref_transaction_error_msg(enum ref_transaction_error err)
> +const char *ref_transaction_error_msg(enum ref_transaction_error err)
>  {
>  	const char *reason = "";
>
> @@ -3342,5 +3342,5 @@ char *ref_transaction_error_msg(enum ref_transaction_error err)
>  		reason = "unkown failure";
>  	}
>
> -	return xstrdup(reason);
> +	return reason;
>  }
> diff --git a/refs.h b/refs.h
> index a0b2e3c43d..2d58af3d88 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -910,7 +910,7 @@ void ref_transaction_for_each_rejected_update(struct ref_transaction *transactio
>  /*
>   * Translate errors to human readable error messages.
>   */
> -char *ref_transaction_error_msg(enum ref_transaction_error err);
> +const char *ref_transaction_error_msg(enum ref_transaction_error err);
>
>  /*
>   * Free `*transaction` and all associated data.
>
> -Peff

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 0/4] fetch/receive: use batched reference updates
  2025-05-14  9:03 [PATCH 0/3] fetch/receive: use batched reference updates Karthik Nayak
                   ` (3 preceding siblings ...)
  2025-05-15 14:07 ` [PATCH v2 0/4] fetch/receive: " Karthik Nayak
@ 2025-05-19  9:58 ` Karthik Nayak
  2025-05-19  9:58   ` [PATCH v3 1/4] refs: add function to translate errors to strings Karthik Nayak
                     ` (4 more replies)
  4 siblings, 5 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Patrick Steinhardt, gitster, peff

The git-fetch(1) and git-receive-pack(1) commands update references as
part of the flow. Each reference update is treated as a single entity
and a transaction is created for each update.

This can be really slow, specifically in reference backends where there
are optimizations which ensure a single transaction with 'N' reference
update perform much faster than 'N' individual transactions. Also having
'N' individual transactions has buildup and teardown costs. These costs
add up in repositories with a large number of references.

Also specifically in the reftable backend, 'N' individual transactions
would also trigger auto-compaction for every transaction.

The reasoning for using individual transactions is because we want to
allow partial updates of references in these commands. Using a single
transaction would be an all-or-nothing scenario.

Recently we introduced an in-between solution called batched reference
updates in 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08). This allows us to batch a set of reference updates, where
individual updates can pass/fail without affecting the batch.

This patch series, modifies both 'git-fetch(1)' and
'git-receive-pack(1)' to use this mechanism. With this, we see a
significant performance boost:

+---------------------+---------------+------------------+
|                     | files backend | reftable backend |
+---------------------+---------------+------------------+
| git-fetch(1)        | 1.25x         | 22x              |
| git-receive-pack(1) | 1.21x         | 18x              |
+---------------------+---------------+------------------+

The first and third patch handle the changes for 'git-fetch(1)' and
'git-receive-pack(1)' respectively. The second patch fixes a small
memory leak I encountered while working on this series.

This is based on top of master: 7a1d2bd0a5 (Merge branch 'master' of
https://github.com/j6t/gitk, 2025-05-09). There were no conflicts
observed with next or seen.

Junio, since the release window for 2.50 is closing in. I would prefer
we mark this for 2.51, so perhaps when/if we merge it to 'next', we let
it bake there till the next release window opens. While all the tests
pass, any bugs here would be high impact and it would be nice to catch
it before it hits a release.

---
Changes in v3:
- Modify `ref_transaction_error_msg()` to no longer provided an
  allocated string, but rather a 'const char *' string literal, this
  cleans up unnecessary allocation/free calls.
- Remove some unnecessary type casts.
- Fix some typos.
- Link to v2: https://lore.kernel.org/r/20250515-501-update-git-fetch-1-to-use-partial-transactions-v2-0-80cbaaa55d2e@gmail.com

Changes in v2:
- Added a new commit to introduce a mapping from 'ref_transaction_error'
  to a human readable string. We now use this to map the errors
  observed.
- Add a TODO regarding handling pruning and creation of refs in the same
  transaction in 'git-fetch(1)'.
- Cleanup commit message typos.
- Fix logic in 'builtin/receive-pack.c' around error reporting to be
  cleaner. Thanks Patrick for the suggestion.
- Link to v1: https://lore.kernel.org/r/20250514-501-update-git-fetch-1-to-use-partial-transactions-v1-0-7c65f46493d4@gmail.com

---
 builtin/fetch.c                  | 127 ++++++++++++++++++++++-----------------
 builtin/receive-pack.c           |  64 +++++++++++++++-----
 builtin/update-ref.c             |  25 +-------
 refs.c                           |  20 ++++++
 refs.h                           |   5 ++
 send-pack.c                      |   7 +++
 t/t1416-ref-transaction-hooks.sh |   2 -
 t/t5408-send-pack-stdin.sh       |  15 ++++-
 8 files changed, 166 insertions(+), 99 deletions(-)

Karthik Nayak (4):
      refs: add function to translate errors to strings
      fetch: use batched reference updates
      send-pack: fix memory leak around duplicate refs
      receive-pack: use batched reference updates

Range-diff versus v2:

1:  5196124be4 ! 1:  5bd1ffe4d2 refs: add function to translate errors to strings
    @@ Commit message
         to 'refs.c' as `ref_transaction_error_msg()` and use the same within the
         'builtin/update-ref.c'.
     
    +    Helped-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## builtin/update-ref.c ##
    @@ builtin/update-ref.c: static void print_rejected_refs(const char *refname,
     -	default:
     -		reason = "unkown failure";
     -	}
    -+	char *reason = ref_transaction_error_msg(err);
    ++	const char *reason = ref_transaction_error_msg(err);
      
      	strbuf_addf(&sb, "rejected %s %s %s %s\n", refname,
      		    new_oid ? oid_to_hex(new_oid) : new_target,
    -@@ builtin/update-ref.c: static void print_rejected_refs(const char *refname,
    - 		    reason);
    - 
    - 	fwrite(sb.buf, sb.len, 1, stdout);
    -+	free(reason);
    - 	strbuf_release(&sb);
    - }
    - 
     
      ## refs.c ##
     @@ refs.c: int ref_update_expects_existing_old_ref(struct ref_update *update)
    @@ refs.c: int ref_update_expects_existing_old_ref(struct ref_update *update)
      		(!is_null_oid(&update->old_oid) || update->old_target);
      }
     +
    -+char *ref_transaction_error_msg(enum ref_transaction_error err)
    ++const char *ref_transaction_error_msg(enum ref_transaction_error err)
     +{
    -+	const char *reason = "";
    -+
     +	switch (err) {
     +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
    -+		reason = "refname conflict";
    -+		break;
    ++		return "refname conflict";
     +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
    -+		reason = "reference already exists";
    -+		break;
    ++		return "reference already exists";
     +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
    -+		reason = "reference does not exist";
    -+		break;
    ++		return "reference does not exist";
     +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
    -+		reason = "incorrect old value provided";
    -+		break;
    ++		return "incorrect old value provided";
     +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
    -+		reason = "invalid new value provided";
    -+		break;
    ++		return "invalid new value provided";
     +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
    -+		reason = "expected symref but found regular ref";
    -+		break;
    ++		return "expected symref but found regular ref";
     +	default:
    -+		reason = "unkown failure";
    ++		return "unknown failure";
     +	}
    -+
    -+	return xstrdup(reason);
     +}
     
      ## refs.h ##
    @@ refs.h: void ref_transaction_for_each_rejected_update(struct ref_transaction *tr
     +/*
     + * Translate errors to human readable error messages.
     + */
    -+char *ref_transaction_error_msg(enum ref_transaction_error err);
    ++const char *ref_transaction_error_msg(enum ref_transaction_error err);
     +
      /*
       * Free `*transaction` and all associated data.
2:  c2014b959f ! 2:  dadabf1918 fetch: use batched reference updates
    @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot
     +					      enum ref_transaction_error err,
     +					      void *cb_data)
     +{
    -+	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;
    ++	struct ref_rejection_data *data = cb_data;
     +
     +	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
     +		error(_("some local refs could not be updated; try running\n"
    @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot
     +			"branches"), data->remote_name);
     +		data->conflict_msg_shown = 1;
     +	} else {
    -+		char *reason = ref_transaction_error_msg(err);
    ++		const char *reason = ref_transaction_error_msg(err);
     +
     +		error(_("fetching ref %s failed: %s"), refname, reason);
    -+		free(reason);
     +	}
     +
     +	*data->retcode = 1;
    @@ builtin/fetch.c: static int do_fetch(struct transport *transport,
     +	 *
     +	 * TODO: if reference transactions gain logical conflict resolution, we
     +	 * can delete and create refs (with F/D conflicts) in the same transaction
    -+	 * and this can be moved about the 'prune_refs()' block.
    ++	 * and this can be moved above the 'prune_refs()' block.
     +	 */
     +	if (!transaction) {
     +		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
3:  43f0dd3c31 = 3:  4006905807 send-pack: fix memory leak around duplicate refs
4:  a976afa025 ! 4:  0549b78b48 receive-pack: use batched reference updates
    @@ Commit message
         batch updates use a transaction under the hood. This explains the change
         in 't1416'.
     
    +    Helped-by: Jeff King <peff@peff.net>
         Helped-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +{
     +	struct strmap *failed_refs = cb_data;
     +
    -+	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
    ++	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));
     +}
     +
      static void execute_commands_non_atomic(struct command *commands,
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +		if (reported_error)
     +			cmd->error_string = reported_error;
     +		else if (strmap_contains(&failed_refs, cmd->ref_name))
    -+			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
    ++			cmd->error_string = strmap_get(&failed_refs, cmd->ref_name);
      	}
     +
     +cleanup:
     +	ref_transaction_free(transaction);
    -+	strmap_clear(&failed_refs, 1);
    ++	strmap_clear(&failed_refs, 0);
      	strbuf_release(&err);
      }
      
    @@ t/t5408-send-pack-stdin.sh: test_expect_success 'stdin mixed with cmdline' '
      	echo A:foo >input &&
      	test_must_fail git send-pack remote.git --stdin B:foo <input &&
     -	verify_push B foo
    ++	test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err &&
     +	test_must_fail git --git-dir=remote.git rev-parse foo
      '
      


base-commit: 7a1d2bd0a596f42a8a7a68d55577967bb454fec0
change-id: 20250423-501-update-git-fetch-1-to-use-partial-transactions-7fb66339fe79

Thanks
- Karthik


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 1/4] refs: add function to translate errors to strings
  2025-05-19  9:58 ` [PATCH v3 0/4] fetch/receive: " Karthik Nayak
@ 2025-05-19  9:58   ` Karthik Nayak
  2025-05-19  9:58   ` [PATCH v3 2/4] fetch: use batched reference updates Karthik Nayak
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Patrick Steinhardt, gitster, peff

The commit 76e760b999 (refs: introduce enum-based transaction error
types, 2025-04-08) introduced enum-based transaction error types. The
refs transaction logic was also modified to propagate these errors. For
clients of the ref transaction system, it would be beneficial to provide
human readable messages for these errors.

There is already an existing mapping in 'builtin/update-ref.c', move it
to 'refs.c' as `ref_transaction_error_msg()` and use the same within the
'builtin/update-ref.c'.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/update-ref.c | 25 +------------------------
 refs.c               | 20 ++++++++++++++++++++
 refs.h               |  5 +++++
 3 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 2b1e336ba1..1e6131e04a 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -575,30 +575,7 @@ static void print_rejected_refs(const char *refname,
 				void *cb_data UNUSED)
 {
 	struct strbuf sb = STRBUF_INIT;
-	const char *reason = "";
-
-	switch (err) {
-	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
-		reason = "refname conflict";
-		break;
-	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
-		reason = "reference already exists";
-		break;
-	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
-		reason = "reference does not exist";
-		break;
-	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
-		reason = "incorrect old value provided";
-		break;
-	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
-		reason = "invalid new value provided";
-		break;
-	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
-		reason = "expected symref but found regular ref";
-		break;
-	default:
-		reason = "unkown failure";
-	}
+	const char *reason = ref_transaction_error_msg(err);
 
 	strbuf_addf(&sb, "rejected %s %s %s %s\n", refname,
 		    new_oid ? oid_to_hex(new_oid) : new_target,
diff --git a/refs.c b/refs.c
index 6559db3789..f21778c75f 100644
--- a/refs.c
+++ b/refs.c
@@ -3314,3 +3314,23 @@ int ref_update_expects_existing_old_ref(struct ref_update *update)
 	return (update->flags & REF_HAVE_OLD) &&
 		(!is_null_oid(&update->old_oid) || update->old_target);
 }
+
+const char *ref_transaction_error_msg(enum ref_transaction_error err)
+{
+	switch (err) {
+	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
+		return "refname conflict";
+	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
+		return "reference already exists";
+	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
+		return "reference does not exist";
+	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
+		return "incorrect old value provided";
+	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
+		return "invalid new value provided";
+	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
+		return "expected symref but found regular ref";
+	default:
+		return "unknown failure";
+	}
+}
diff --git a/refs.h b/refs.h
index 46a6008e07..2d58af3d88 100644
--- a/refs.h
+++ b/refs.h
@@ -907,6 +907,11 @@ void ref_transaction_for_each_rejected_update(struct ref_transaction *transactio
 					      ref_transaction_for_each_rejected_update_fn cb,
 					      void *cb_data);
 
+/*
+ * Translate errors to human readable error messages.
+ */
+const char *ref_transaction_error_msg(enum ref_transaction_error err);
+
 /*
  * Free `*transaction` and all associated data.
  */

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 2/4] fetch: use batched reference updates
  2025-05-19  9:58 ` [PATCH v3 0/4] fetch/receive: " Karthik Nayak
  2025-05-19  9:58   ` [PATCH v3 1/4] refs: add function to translate errors to strings Karthik Nayak
@ 2025-05-19  9:58   ` Karthik Nayak
  2025-05-19  9:58   ` [PATCH v3 3/4] send-pack: fix memory leak around duplicate refs Karthik Nayak
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Patrick Steinhardt, gitster, peff

The reference updates performed as a part of 'git-fetch(1)', take place
one at a time. For each reference update, a new transaction is created
and committed. This is necessary to ensure we can allow individual
updates to fail without failing the entire command. The command also
supports an '--atomic' mode, which uses a single transaction to update
all of the references. But this mode has an all-or-nothing approach,
where if a single update fails, all updates would fail.

In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in 'git-fetch(1)'.
This provides a significant bump in performance, especially when dealing
with repositories with large number of references.

Adding support for batched updates is simply modifying the flow to also
create a batch update transaction in the non-atomic flow.

With the reftable backend there is a 22x performance improvement, when
performing 'git-fetch(1)' with 10000 refs:

  Benchmark 1: fetch: many refs (refformat = reftable, refcount = 10000, revision = master)
    Time (mean ± σ):      3.403 s ±  0.775 s    [User: 1.875 s, System: 1.417 s]
    Range (min … max):    2.454 s …  4.529 s    10 runs

  Benchmark 2: fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     154.3 ms ±  17.6 ms    [User: 102.5 ms, System: 56.1 ms]
    Range (min … max):   145.2 ms … 220.5 ms    18 runs

  Summary
    fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
     22.06 ± 5.62 times faster than fetch: many refs (refformat = reftable, refcount = 10000, revision = master)

In similar conditions, the files backend sees a 1.25x performance
improvement:

  Benchmark 1: fetch: many refs (refformat = files, refcount = 10000, revision = master)
    Time (mean ± σ):     605.5 ms ±   9.4 ms    [User: 117.8 ms, System: 483.3 ms]
    Range (min … max):   595.6 ms … 621.5 ms    10 runs

  Benchmark 2: fetch: many refs (refformat = files, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     485.8 ms ±   4.3 ms    [User: 91.1 ms, System: 396.7 ms]
    Range (min … max):   477.6 ms … 494.3 ms    10 runs

  Summary
    fetch: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
      1.25 ± 0.02 times faster than fetch: many refs (refformat = files, refcount = 10000, revision = master)

With this we'll either be using a regular transaction or a batch update
transaction. This helps cleanup some code which is no longer needed as
we'll now always have some type of 'ref_transaction' object being
propagated.

One big change is that earlier, each individual update would propagate a
failure. Whereas now, the `ref_transaction_for_each_rejected_update`
function is called at the end of the flow to capture the exit status for
'git-fetch(1)' and also to print F/D conflict errors. This does change
the order of the errors being printed, but the behavior stays the same.

Since transaction errors are now explicitly defined as part of
76e760b999 (refs: introduce enum-based transaction error types,
2025-04-08), utilize them and get rid of custom errors defined within
'builtin/fetch.c'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/fetch.c | 127 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 73 insertions(+), 54 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5279997c96..f200194f77 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -641,9 +641,6 @@ static struct ref *get_ref_map(struct remote *remote,
 	return ref_map;
 }
 
-#define STORE_REF_ERROR_OTHER 1
-#define STORE_REF_ERROR_DF_CONFLICT 2
-
 static int s_update_ref(const char *action,
 			struct ref *ref,
 			struct ref_transaction *transaction,
@@ -651,7 +648,6 @@ static int s_update_ref(const char *action,
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -661,43 +657,10 @@ static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	/*
-	 * If no transaction was passed to us, we manage the transaction
-	 * ourselves. Otherwise, we trust the caller to handle the transaction
-	 * lifecycle.
-	 */
-	if (!transaction) {
-		transaction = our_transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
-									    0, &err);
-		if (!transaction) {
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
 				     check_old ? &ref->old_oid : NULL,
 				     NULL, NULL, 0, msg, &err);
-	if (ret) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
-	}
-
-	if (our_transaction) {
-		switch (ref_transaction_commit(our_transaction, &err)) {
-		case 0:
-			break;
-		case REF_TRANSACTION_ERROR_NAME_CONFLICT:
-			ret = STORE_REF_ERROR_DF_CONFLICT;
-			goto out;
-		default:
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
 
-out:
-	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -1139,7 +1102,6 @@ N_("it took %.2f seconds to check forced updates; you can use\n"
    "to avoid this check\n");
 
 static int store_updated_refs(struct display_state *display_state,
-			      const char *remote_name,
 			      int connectivity_checked,
 			      struct ref_transaction *transaction, struct ref *ref_map,
 			      struct fetch_head *fetch_head,
@@ -1277,11 +1239,6 @@ static int store_updated_refs(struct display_state *display_state,
 		}
 	}
 
-	if (rc & STORE_REF_ERROR_DF_CONFLICT)
-		error(_("some local refs could not be updated; try running\n"
-		      " 'git remote prune %s' to remove any old, conflicting "
-		      "branches"), remote_name);
-
 	if (advice_enabled(ADVICE_FETCH_SHOW_FORCED_UPDATES)) {
 		if (!config->show_forced_updates) {
 			warning(_(warn_show_forced_updates));
@@ -1366,9 +1323,8 @@ static int fetch_and_consume_refs(struct display_state *display_state,
 	}
 
 	trace2_region_enter("fetch", "consume_refs", the_repository);
-	ret = store_updated_refs(display_state, transport->remote->name,
-				 connectivity_checked, transaction, ref_map,
-				 fetch_head, config);
+	ret = store_updated_refs(display_state, connectivity_checked,
+				 transaction, ref_map, fetch_head, config);
 	trace2_region_leave("fetch", "consume_refs", the_repository);
 
 out:
@@ -1688,6 +1644,36 @@ static int set_head(const struct ref *remote_refs, struct remote *remote)
 	return result;
 }
 
+struct ref_rejection_data {
+	int *retcode;
+	int conflict_msg_shown;
+	const char *remote_name;
+};
+
+static void ref_transaction_rejection_handler(const char *refname,
+					      const struct object_id *old_oid UNUSED,
+					      const struct object_id *new_oid UNUSED,
+					      const char *old_target UNUSED,
+					      const char *new_target UNUSED,
+					      enum ref_transaction_error err,
+					      void *cb_data)
+{
+	struct ref_rejection_data *data = cb_data;
+
+	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
+		error(_("some local refs could not be updated; try running\n"
+			" 'git remote prune %s' to remove any old, conflicting "
+			"branches"), data->remote_name);
+		data->conflict_msg_shown = 1;
+	} else {
+		const char *reason = ref_transaction_error_msg(err);
+
+		error(_("fetching ref %s failed: %s"), refname, reason);
+	}
+
+	*data->retcode = 1;
+}
+
 static int do_fetch(struct transport *transport,
 		    struct refspec *rs,
 		    const struct fetch_config *config)
@@ -1808,6 +1794,24 @@ static int do_fetch(struct transport *transport,
 			retcode = 1;
 	}
 
+	/*
+	 * If not atomic, we can still use batched updates, which would be much
+	 * more performant. We don't initiate the transaction before pruning,
+	 * since pruning must be an independent step, to avoid F/D conflicts.
+	 *
+	 * TODO: if reference transactions gain logical conflict resolution, we
+	 * can delete and create refs (with F/D conflicts) in the same transaction
+	 * and this can be moved above the 'prune_refs()' block.
+	 */
+	if (!transaction) {
+		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
+							  REF_TRANSACTION_ALLOW_FAILURE, &err);
+		if (!transaction) {
+			retcode = -1;
+			goto cleanup;
+		}
+	}
+
 	if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
 				   &fetch_head, config)) {
 		retcode = 1;
@@ -1839,16 +1843,31 @@ static int do_fetch(struct transport *transport,
 		free_refs(tags_ref_map);
 	}
 
-	if (transaction) {
-		if (retcode)
-			goto cleanup;
+	if (retcode)
+		goto cleanup;
 
-		retcode = ref_transaction_commit(transaction, &err);
+	retcode = ref_transaction_commit(transaction, &err);
+	if (retcode) {
+		/*
+		 * Explicitly handle transaction cleanup to avoid
+		 * aborting an already closed transaction.
+		 */
+		ref_transaction_free(transaction);
+		transaction = NULL;
+		goto cleanup;
+	}
+
+	if (!atomic_fetch) {
+		struct ref_rejection_data data = {
+			.retcode = &retcode,
+			.conflict_msg_shown = 0,
+			.remote_name = transport->remote->name,
+		};
+
+		ref_transaction_for_each_rejected_update(transaction,
+							 ref_transaction_rejection_handler,
+							 &data);
 		if (retcode) {
-			/*
-			 * Explicitly handle transaction cleanup to avoid
-			 * aborting an already closed transaction.
-			 */
 			ref_transaction_free(transaction);
 			transaction = NULL;
 			goto cleanup;

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 3/4] send-pack: fix memory leak around duplicate refs
  2025-05-19  9:58 ` [PATCH v3 0/4] fetch/receive: " Karthik Nayak
  2025-05-19  9:58   ` [PATCH v3 1/4] refs: add function to translate errors to strings Karthik Nayak
  2025-05-19  9:58   ` [PATCH v3 2/4] fetch: use batched reference updates Karthik Nayak
@ 2025-05-19  9:58   ` Karthik Nayak
  2025-05-19  9:58   ` [PATCH v3 4/4] receive-pack: use batched reference updates Karthik Nayak
  2025-05-19 18:14   ` [PATCH v3 0/4] fetch/receive: " Junio C Hamano
  4 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Patrick Steinhardt, gitster, peff

The 'git-send-pack(1)' allows users to push objects to a remote
repository and explicitly list the references to be pushed. The status
of each reference pushed is captured into a list mapped by refname.

If a reference fails to be updated, its error message is captured in the
`ref->remote_status` field. While the command allows duplicate ref
inputs, the list doesn't accommodate this behavior as a particular
refname is linked to a single `struct ref*` element. So if the user
inputs a reference twice like:

  git send-pack remote.git A:foo B:foo

where the user is trying to update the same reference 'foo' twice and
the reference fails to be updated, we first fill `ref->remote_status`
with error message for the input 'A:foo' then we override the same field
with the error message for 'B:foo'. This override happens without first
free'ing the previous value. Fix this leak.

The current tests already incorporate the above example, but in the test
'A:foo' succeeds while 'B:foo' fails, meaning that the memory leak isn't
triggered. Add a new test with multiple duplicates.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 send-pack.c                | 7 +++++++
 t/t5408-send-pack-stdin.sh | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/send-pack.c b/send-pack.c
index 5005689cb5..4cd41a64ce 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -260,6 +260,13 @@ static int receive_status(struct repository *r,
 				refname);
 			continue;
 		}
+
+		/*
+		 * Clients sending duplicate refs can cause the same value
+		 * to be overridden, causing a memory leak.
+		 */
+		free(hint->remote_status);
+
 		if (!strcmp(head, "ng")) {
 			hint->status = REF_STATUS_REMOTE_REJECT;
 			if (p)
diff --git a/t/t5408-send-pack-stdin.sh b/t/t5408-send-pack-stdin.sh
index 526a675045..45fb20179b 100755
--- a/t/t5408-send-pack-stdin.sh
+++ b/t/t5408-send-pack-stdin.sh
@@ -73,6 +73,12 @@ test_expect_success 'cmdline refs written in order' '
 	verify_push A foo
 '

+test_expect_success 'cmdline refs with multiple duplicates' '
+	clear_remote &&
+	test_must_fail git send-pack remote.git A:foo B:foo C:foo &&
+	verify_push A foo
+'
+
 test_expect_success '--stdin refs come after cmdline' '
 	clear_remote &&
 	echo A:foo >input &&

-- 
2.49.0

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 4/4] receive-pack: use batched reference updates
  2025-05-19  9:58 ` [PATCH v3 0/4] fetch/receive: " Karthik Nayak
                     ` (2 preceding siblings ...)
  2025-05-19  9:58   ` [PATCH v3 3/4] send-pack: fix memory leak around duplicate refs Karthik Nayak
@ 2025-05-19  9:58   ` Karthik Nayak
  2025-05-19 18:14   ` [PATCH v3 0/4] fetch/receive: " Junio C Hamano
  4 siblings, 0 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-19  9:58 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Patrick Steinhardt, gitster, peff

The reference updates performed as a part of 'git-receive-pack(1)', take
place one at a time. For each reference update, a new transaction is
created and committed. This is necessary to ensure we can allow
individual updates to fail without failing the entire command. The
command also supports an 'atomic' mode, which uses a single transaction
to update all of the references. But this mode has an all-or-nothing
approach, where if a single update fails, all updates would fail.

In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in
'git-receive-pack(1)'. This provides a significant bump in performance,
especially when dealing with repositories with large number of
references.

With the reftable backend there is a 18x performance improvement, when
performing receive-pack with 10000 refs:

  Benchmark 1: receive: many refs (refformat = reftable, refcount = 10000, revision = master)
    Time (mean ± σ):      4.276 s ±  0.078 s    [User: 0.796 s, System: 3.318 s]
    Range (min … max):    4.185 s …  4.430 s    10 runs

  Benchmark 2: receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     235.4 ms ±   6.9 ms    [User: 75.4 ms, System: 157.3 ms]
    Range (min … max):   228.5 ms … 254.2 ms    11 runs

  Summary
    receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
     18.16 ± 0.63 times faster than receive: many refs (refformat = reftable, refcount = 10000, revision = master)

In similar conditions, the files backend sees a 1.21x performance
improvement:

  Benchmark 1: receive: many refs (refformat = files, refcount = 10000, revision = master)
    Time (mean ± σ):      1.121 s ±  0.021 s    [User: 0.128 s, System: 0.975 s]
    Range (min … max):    1.097 s …  1.156 s    10 runs

  Benchmark 2: receive: many refs (refformat = files, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     927.9 ms ±  22.6 ms    [User: 99.0 ms, System: 815.2 ms]
    Range (min … max):   903.1 ms … 978.0 ms    10 runs

  Summary
    receive: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
      1.21 ± 0.04 times faster than receive: many refs (refformat = files, refcount = 10000, revision = master)

As using batched updates requires the error handling to be moved to the
end of the flow, create and use a 'struct strset' to track the failed
refs and attribute the correct errors to them.

This change also uncovers an issue when a client provides multiple
updates to the same reference. For example:

  $ git send-pack remote.git A:foo B:foo
  Enumerating objects: 3, done.
  Counting objects: 100% (3/3), done.
  Delta compression using up to 20 threads
  Compressing objects: 100% (2/2), done.
  Writing objects: 100% (3/3), 226 bytes | 226.00 KiB/s, done.
  Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)
  remote: error: cannot lock ref 'refs/heads/foo': reference already exists
  To remote.git
   ! [remote rejected] A -> foo (failed to update ref)
   ! [remote failure]  B -> foo (remote failed to report status)

As you can see, the remote runs into an error because it cannot lock the
target reference for the second update. Furthermore, the remote complains
that the first update has been rejected whereas the second update didn't
receive any status update because we failed to lock it. Reading this status
message alone a user would probably expect that `foo` has not been updated
at all. But that's not the case: while we claim that the ref wasn't updated,
it surprisingly points to `A` now.

One could argue that this is merely an error in how we report the result of
this push. But ultimately, the user's request itself is already broken and
doesn't make any sense in the first place and cannot ever lead to a sensible
outcome that honors the full request.

The conversion to batched transactions fixes the issue because we now try to
queue both updates in the same transaction. As such, the transaction itself
will notice this conflict and refuse the update altogether before we commit
any of the values.

Note that this requires changes to a couple of tests in t5408 that happened
to exercise this behaviour. Given that the generated output is misleading
and given that the user request cannot ever be fully honored this really
feels more like a bug than properly designed behaviour. As such, changing
the behaviour feels like the right thing to do.

Since now reference updates are batched, the 'reference-transaction'
hook will be invoked with all updates together. Currently git will 'die'
when the hook returns with a non-zero exit status in the 'prepared'
stage. For 'git-receive-pack(1)', this allowed users to reject an
individual reference update, git would have applied previous updates but
immediately abort further execution. This is definitely an incorrect
usage of this hook, since the right place to do this would be the
'update' hook. This patch retains the latter behavior, but
'reference-transaction' hook now changes to a all-or-nothing behavior
when a non-zero exit status is returned in the 'prepared' stage, since
batch updates use a transaction under the hood. This explains the change
in 't1416'.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 builtin/receive-pack.c           | 64 ++++++++++++++++++++++++++++++----------
 t/t1416-ref-transaction-hooks.sh |  2 --
 t/t5408-send-pack-stdin.sh       | 13 ++++----
 3 files changed, 56 insertions(+), 23 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index be314879e8..2a302c836f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1843,35 +1843,67 @@ static void BUG_if_skipped_connectivity_check(struct command *commands,
 	BUG_if_bug("connectivity check skipped???");
 }

+static void ref_transaction_rejection_handler(const char *refname,
+					      const struct object_id *old_oid UNUSED,
+					      const struct object_id *new_oid UNUSED,
+					      const char *old_target UNUSED,
+					      const char *new_target UNUSED,
+					      enum ref_transaction_error err,
+					      void *cb_data)
+{
+	struct strmap *failed_refs = cb_data;
+
+	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));
+}
+
 static void execute_commands_non_atomic(struct command *commands,
 					struct shallow_info *si)
 {
 	struct command *cmd;
 	struct strbuf err = STRBUF_INIT;
+	const char *reported_error = NULL;
+	struct strmap failed_refs = STRMAP_INIT;
+
+	transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
+						  REF_TRANSACTION_ALLOW_FAILURE, &err);
+	if (!transaction) {
+		rp_error("%s", err.buf);
+		strbuf_reset(&err);
+		reported_error = "transaction failed to start";
+		goto failure;
+	}

 	for (cmd = commands; cmd; cmd = cmd->next) {
 		if (!should_process_cmd(cmd) || cmd->run_proc_receive)
 			continue;

-		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
-							  0, &err);
-		if (!transaction) {
-			rp_error("%s", err.buf);
-			strbuf_reset(&err);
-			cmd->error_string = "transaction failed to start";
-			continue;
-		}
-
 		cmd->error_string = update(cmd, si);
+	}

-		if (!cmd->error_string
-		    && ref_transaction_commit(transaction, &err)) {
-			rp_error("%s", err.buf);
-			strbuf_reset(&err);
-			cmd->error_string = "failed to update ref";
-		}
-		ref_transaction_free(transaction);
+	if (ref_transaction_commit(transaction, &err)) {
+		rp_error("%s", err.buf);
+		reported_error = "failed to update refs";
+		goto failure;
+	}
+
+	ref_transaction_for_each_rejected_update(transaction,
+						 ref_transaction_rejection_handler,
+						 &failed_refs);
+
+	if (strmap_empty(&failed_refs))
+		goto cleanup;
+
+failure:
+	for (cmd = commands; cmd; cmd = cmd->next) {
+		if (reported_error)
+			cmd->error_string = reported_error;
+		else if (strmap_contains(&failed_refs, cmd->ref_name))
+			cmd->error_string = strmap_get(&failed_refs, cmd->ref_name);
 	}
+
+cleanup:
+	ref_transaction_free(transaction);
+	strmap_clear(&failed_refs, 0);
 	strbuf_release(&err);
 }

diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
index 8c777f7cf8..d91dd3a3b5 100755
--- a/t/t1416-ref-transaction-hooks.sh
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -120,8 +120,6 @@ test_expect_success 'interleaving hook calls succeed' '

 	cat >expect <<-EOF &&
 		hooks/update refs/tags/PRE $ZERO_OID $PRE_OID
-		hooks/reference-transaction prepared
-		hooks/reference-transaction committed
 		hooks/update refs/tags/POST $ZERO_OID $POST_OID
 		hooks/reference-transaction prepared
 		hooks/reference-transaction committed
diff --git a/t/t5408-send-pack-stdin.sh b/t/t5408-send-pack-stdin.sh
index 45fb20179b..ec339761c2 100755
--- a/t/t5408-send-pack-stdin.sh
+++ b/t/t5408-send-pack-stdin.sh
@@ -69,21 +69,24 @@ test_expect_success 'stdin mixed with cmdline' '

 test_expect_success 'cmdline refs written in order' '
 	clear_remote &&
-	test_must_fail git send-pack remote.git A:foo B:foo &&
-	verify_push A foo
+	test_must_fail git send-pack remote.git A:foo B:foo 2>err &&
+	test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err &&
+	test_must_fail git --git-dir=remote.git rev-parse foo
 '

 test_expect_success 'cmdline refs with multiple duplicates' '
 	clear_remote &&
-	test_must_fail git send-pack remote.git A:foo B:foo C:foo &&
-	verify_push A foo
+	test_must_fail git send-pack remote.git A:foo B:foo C:foo 2>err &&
+	test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err &&
+	test_must_fail git --git-dir=remote.git rev-parse foo
 '

 test_expect_success '--stdin refs come after cmdline' '
 	clear_remote &&
 	echo A:foo >input &&
 	test_must_fail git send-pack remote.git --stdin B:foo <input &&
-	verify_push B foo
+	test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err &&
+	test_must_fail git --git-dir=remote.git rev-parse foo
 '

 test_expect_success 'refspecs and --mirror do not mix (cmdline)' '

-- 
2.49.0

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/4] fetch/receive: use batched reference updates
  2025-05-19  9:58 ` [PATCH v3 0/4] fetch/receive: " Karthik Nayak
                     ` (3 preceding siblings ...)
  2025-05-19  9:58   ` [PATCH v3 4/4] receive-pack: use batched reference updates Karthik Nayak
@ 2025-05-19 18:14   ` Junio C Hamano
  2025-05-20  9:05     ` Karthik Nayak
  4 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2025-05-19 18:14 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Patrick Steinhardt, peff

Karthik Nayak <karthik.188@gmail.com> writes:

> The git-fetch(1) and git-receive-pack(1) commands update references as
> part of the flow. Each reference update is treated as a single entity
> and a transaction is created for each update.
>
> This can be really slow, specifically in reference backends where there
> are optimizations which ensure a single transaction with 'N' reference
> update perform much faster than 'N' individual transactions. Also having
> 'N' individual transactions has buildup and teardown costs. These costs
> add up in repositories with a large number of references.
>
> Also specifically in the reftable backend, 'N' individual transactions
> would also trigger auto-compaction for every transaction.
>
> The reasoning for using individual transactions is because we want to
> allow partial updates of references in these commands. Using a single
> transaction would be an all-or-nothing scenario.
>
> Recently we introduced an in-between solution called batched reference
> updates in 23fc8e4f61 (refs: implement batch reference update support,
> 2025-04-08). This allows us to batch a set of reference updates, where
> individual updates can pass/fail without affecting the batch.
>
> This patch series, modifies both 'git-fetch(1)' and
> 'git-receive-pack(1)' to use this mechanism. With this, we see a
> significant performance boost:
>
> +---------------------+---------------+------------------+
> |                     | files backend | reftable backend |
> +---------------------+---------------+------------------+
> | git-fetch(1)        | 1.25x         | 22x              |
> | git-receive-pack(1) | 1.21x         | 18x              |
> +---------------------+---------------+------------------+

Very nice.

> The first and third patch handle the changes for 'git-fetch(1)' and
> 'git-receive-pack(1)' respectively. The second patch fixes a small
> memory leak I encountered while working on this series.
>
> This is based on top of master: 7a1d2bd0a5 (Merge branch 'master' of
> https://github.com/j6t/gitk, 2025-05-09). There were no conflicts
> observed with next or seen.
>
> Junio, since the release window for 2.50 is closing in. I would prefer
> we mark this for 2.51, so perhaps when/if we merge it to 'next', we let
> it bake there till the next release window opens. While all the tests
> pass, any bugs here would be high impact and it would be nice to catch
> it before it hits a release.

I've read the difference since the last iteration, "git diff @{1}",
and everything looked sensible.

Not an issue with this series at all, but one thing I wondered is if
it makes sense to change the type of strmap_get/strmap_put to deal
with "const void *".  That way, it would not be necessary to cast
away the constness like so:

>     -+	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
>     ++	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));

without harming the other side, namely

>     @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
>      +		if (reported_error)
>      +			cmd->error_string = reported_error;
>      +		else if (strmap_contains(&failed_refs, cmd->ref_name))
>     -+			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
>     ++			cmd->error_string = strmap_get(&failed_refs, cmd->ref_name);

this piece of code.

It may not make sense, and even if it did, of course, it is totally
outside of this series.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/4] fetch/receive: use batched reference updates
  2025-05-19 18:14   ` [PATCH v3 0/4] fetch/receive: " Junio C Hamano
@ 2025-05-20  9:05     ` Karthik Nayak
  2025-05-21 13:14       ` Junio C Hamano
  2025-05-22  6:00       ` Jeff King
  0 siblings, 2 replies; 41+ messages in thread
From: Karthik Nayak @ 2025-05-20  9:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, peff

[-- Attachment #1: Type: text/plain, Size: 3872 bytes --]

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The git-fetch(1) and git-receive-pack(1) commands update references as
>> part of the flow. Each reference update is treated as a single entity
>> and a transaction is created for each update.
>>
>> This can be really slow, specifically in reference backends where there
>> are optimizations which ensure a single transaction with 'N' reference
>> update perform much faster than 'N' individual transactions. Also having
>> 'N' individual transactions has buildup and teardown costs. These costs
>> add up in repositories with a large number of references.
>>
>> Also specifically in the reftable backend, 'N' individual transactions
>> would also trigger auto-compaction for every transaction.
>>
>> The reasoning for using individual transactions is because we want to
>> allow partial updates of references in these commands. Using a single
>> transaction would be an all-or-nothing scenario.
>>
>> Recently we introduced an in-between solution called batched reference
>> updates in 23fc8e4f61 (refs: implement batch reference update support,
>> 2025-04-08). This allows us to batch a set of reference updates, where
>> individual updates can pass/fail without affecting the batch.
>>
>> This patch series, modifies both 'git-fetch(1)' and
>> 'git-receive-pack(1)' to use this mechanism. With this, we see a
>> significant performance boost:
>>
>> +---------------------+---------------+------------------+
>> |                     | files backend | reftable backend |
>> +---------------------+---------------+------------------+
>> | git-fetch(1)        | 1.25x         | 22x              |
>> | git-receive-pack(1) | 1.21x         | 18x              |
>> +---------------------+---------------+------------------+
>
> Very nice.
>
>> The first and third patch handle the changes for 'git-fetch(1)' and
>> 'git-receive-pack(1)' respectively. The second patch fixes a small
>> memory leak I encountered while working on this series.
>>
>> This is based on top of master: 7a1d2bd0a5 (Merge branch 'master' of
>> https://github.com/j6t/gitk, 2025-05-09). There were no conflicts
>> observed with next or seen.
>>
>> Junio, since the release window for 2.50 is closing in. I would prefer
>> we mark this for 2.51, so perhaps when/if we merge it to 'next', we let
>> it bake there till the next release window opens. While all the tests
>> pass, any bugs here would be high impact and it would be nice to catch
>> it before it hits a release.
>
> I've read the difference since the last iteration, "git diff @{1}",
> and everything looked sensible.
>

Thanks for the review!

> Not an issue with this series at all, but one thing I wondered is if
> it makes sense to change the type of strmap_get/strmap_put to deal
> with "const void *".  That way, it would not be necessary to cast
> away the constness like so:
>
>>     -+	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
>>     ++	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));
>
> without harming the other side, namely
>
>>     @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
>>      +		if (reported_error)
>>      +			cmd->error_string = reported_error;
>>      +		else if (strmap_contains(&failed_refs, cmd->ref_name))
>>     -+			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
>>     ++			cmd->error_string = strmap_get(&failed_refs, cmd->ref_name);
>
> this piece of code.
>
> It may not make sense, and even if it did, of course, it is totally
> outside of this series.
>
> Thanks.

It definitely does, The only other typecast I did find for `strmap_put`
was within 'strmap.h'. Nevertheless, I think it makes sense to make that
change. strmap shouldn't modify the data provided. Perhaps #leftoverbits.

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/4] fetch/receive: use batched reference updates
  2025-05-20  9:05     ` Karthik Nayak
@ 2025-05-21 13:14       ` Junio C Hamano
  2025-05-22  6:00       ` Jeff King
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2025-05-21 13:14 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Patrick Steinhardt, peff

Karthik Nayak <karthik.188@gmail.com> writes:

>> Not an issue with this series at all, but one thing I wondered is if
>> it makes sense to change the type of strmap_get/strmap_put to deal
>> with "const void *".  That way, it would not be necessary to cast
>> away the constness like so:
>>
>>>     -+	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
>>>     ++	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));
>>
>> without harming the other side, namely
>>
>>>     @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
>>>      +		if (reported_error)
>>>      +			cmd->error_string = reported_error;
>>>      +		else if (strmap_contains(&failed_refs, cmd->ref_name))
>>>     -+			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
>>>     ++			cmd->error_string = strmap_get(&failed_refs, cmd->ref_name);
>>
>> this piece of code.
>>
>> It may not make sense, and even if it did, of course, it is totally
>> outside of this series.
>>
>> Thanks.
>
> It definitely does, The only other typecast I did find for `strmap_put`
> was within 'strmap.h'. Nevertheless, I think it makes sense to make that
> change. strmap shouldn't modify the data provided. Perhaps #leftoverbits.

OK.  Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/4] fetch/receive: use batched reference updates
  2025-05-20  9:05     ` Karthik Nayak
  2025-05-21 13:14       ` Junio C Hamano
@ 2025-05-22  6:00       ` Jeff King
  2025-05-22  8:50         ` Karthik Nayak
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff King @ 2025-05-22  6:00 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Junio C Hamano, git, Patrick Steinhardt

On Tue, May 20, 2025 at 02:05:09AM -0700, Karthik Nayak wrote:

> > Not an issue with this series at all, but one thing I wondered is if
> > it makes sense to change the type of strmap_get/strmap_put to deal
> > with "const void *".  That way, it would not be necessary to cast
> > away the constness like so:
> >
> >>     -+	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
> >>     ++	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));
> >
> > without harming the other side, namely
> >
> >>     @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
> >>      +		if (reported_error)
> >>      +			cmd->error_string = reported_error;
> >>      +		else if (strmap_contains(&failed_refs, cmd->ref_name))
> >>     -+			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
> >>     ++			cmd->error_string = strmap_get(&failed_refs, cmd->ref_name);
> >
> > this piece of code.
> >
> > It may not make sense, and even if it did, of course, it is totally
> > outside of this series.
> >
> > Thanks.
> 
> It definitely does, The only other typecast I did find for `strmap_put`
> was within 'strmap.h'. Nevertheless, I think it makes sense to make that
> change. strmap shouldn't modify the data provided. Perhaps #leftoverbits.

I'm not sure that is a good idea. Even though strmap does not touch the
void data pointer itself, it is accessible to the callers, and we do not
know if they stored const data or not, or how they plan to access it.

If we store a "const void *" pointer and returned that via strmap_get(),
then there will be callers who need to cast away the constness.

If we store and return a "void *" pointer as we do now, but accept a
const pointer via strmap_put(), then we're casting away potentially
important const-ness without the caller even seeing it. I think it's
safer for the client to do the cast explicitly (since they are the ones
who know how they plan to use it).

I don't think we can really represent what we want in C's type system.
But if we wanted a safe(r) interface that didn't involve type-casting,
we might be able to do something like:

  - the strmap stores an extra flag for "the data pointer is const",
    which can be set by strmap_init(). (It is tempting to replace
    strmap_clear()'s free_entries parameter by checking this flag, but
    the two are not always going to be exactly the same).

  - introduce strmap_get_const() and strmap_put_const() which give and
    receive const pointers

  - in the const functions, BUG() if the "pointers are const" flag is
    not set

But it feels gross, and it only gives runtime checking, not
compile-time. What we really want are generics that can carry the type
annotation for a particular instantiated map. That is probably pretty
easy in most modern languages, but not really in C without horrible
macros. :)

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/4] fetch/receive: use batched reference updates
  2025-05-22  6:00       ` Jeff King
@ 2025-05-22  8:50         ` Karthik Nayak
  2025-05-22 15:31           ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Karthik Nayak @ 2025-05-22  8:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Patrick Steinhardt

[-- Attachment #1: Type: text/plain, Size: 3643 bytes --]

Jeff King <peff@peff.net> writes:

> On Tue, May 20, 2025 at 02:05:09AM -0700, Karthik Nayak wrote:
>
>> > Not an issue with this series at all, but one thing I wondered is if
>> > it makes sense to change the type of strmap_get/strmap_put to deal
>> > with "const void *".  That way, it would not be necessary to cast
>> > away the constness like so:
>> >
>> >>     -+	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
>> >>     ++	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));
>> >
>> > without harming the other side, namely
>> >
>> >>     @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
>> >>      +		if (reported_error)
>> >>      +			cmd->error_string = reported_error;
>> >>      +		else if (strmap_contains(&failed_refs, cmd->ref_name))
>> >>     -+			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
>> >>     ++			cmd->error_string = strmap_get(&failed_refs, cmd->ref_name);
>> >
>> > this piece of code.
>> >
>> > It may not make sense, and even if it did, of course, it is totally
>> > outside of this series.
>> >
>> > Thanks.
>>
>> It definitely does, The only other typecast I did find for `strmap_put`
>> was within 'strmap.h'. Nevertheless, I think it makes sense to make that
>> change. strmap shouldn't modify the data provided. Perhaps #leftoverbits.
>
> I'm not sure that is a good idea. Even though strmap does not touch the
> void data pointer itself, it is accessible to the callers, and we do not
> know if they stored const data or not, or how they plan to access it.
>
> If we store a "const void *" pointer and returned that via strmap_get(),
> then there will be callers who need to cast away the constness.
>
> If we store and return a "void *" pointer as we do now, but accept a
> const pointer via strmap_put(), then we're casting away potentially
> important const-ness without the caller even seeing it. I think it's
> safer for the client to do the cast explicitly (since they are the ones
> who know how they plan to use it).
>

But isn't that the case now anyways? We always lose the const-ness since
we only accept a 'void *'. But by only changing 'strmap_put()' to accept
a 'const void *', but storing and returning a 'void *'. We simply modify
the current construct to also say that any data received is not
modified. But I do see your point, we'll have to cast there anyways and
might as well do it on the client side.


> I don't think we can really represent what we want in C's type system.
> But if we wanted a safe(r) interface that didn't involve type-casting,
> we might be able to do something like:
>
>   - the strmap stores an extra flag for "the data pointer is const",
>     which can be set by strmap_init(). (It is tempting to replace
>     strmap_clear()'s free_entries parameter by checking this flag, but
>     the two are not always going to be exactly the same).
>
>   - introduce strmap_get_const() and strmap_put_const() which give and
>     receive const pointers
>
>   - in the const functions, BUG() if the "pointers are const" flag is
>     not set
>
> But it feels gross, and it only gives runtime checking, not
> compile-time. What we really want are generics that can carry the type
> annotation for a particular instantiated map. That is probably pretty
> easy in most modern languages, but not really in C without horrible
> macros. :)
>
> -Peff

Agreed, and then there is additional load to ensure users will use
'strmap_put()' and 'strmap_put_const()' as required and simply not cast
away.

Overall I agree with what you're saying. Thanks for spelling it out.

Karthik

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/4] fetch/receive: use batched reference updates
  2025-05-22  8:50         ` Karthik Nayak
@ 2025-05-22 15:31           ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2025-05-22 15:31 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Junio C Hamano, git, Patrick Steinhardt

On Thu, May 22, 2025 at 01:50:18AM -0700, Karthik Nayak wrote:

> > If we store and return a "void *" pointer as we do now, but accept a
> > const pointer via strmap_put(), then we're casting away potentially
> > important const-ness without the caller even seeing it. I think it's
> > safer for the client to do the cast explicitly (since they are the ones
> > who know how they plan to use it).
> >
> 
> But isn't that the case now anyways? We always lose the const-ness since
> we only accept a 'void *'. But by only changing 'strmap_put()' to accept
> a 'const void *', but storing and returning a 'void *'. We simply modify
> the current construct to also say that any data received is not
> modified. But I do see your point, we'll have to cast there anyways and
> might as well do it on the client side.

Right, there is no winning here in the type system. But by pushing the
cast onto the calling side, it is more visible and closer to the code
that knows how the result will be used.

> > I don't think we can really represent what we want in C's type system.
> > But if we wanted a safe(r) interface that didn't involve type-casting,
> > we might be able to do something like:
> [...]
> Agreed, and then there is additional load to ensure users will use
> 'strmap_put()' and 'strmap_put_const()' as required and simply not cast
> away.

That part I'm not too worried about, since casts are easy-ish to catch
and flag in reviews. But I'm not sure if it's materially making the
world a better place to have the extra complexity.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2025-05-22 15:31 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-14  9:03 [PATCH 0/3] fetch/receive: use batched reference updates Karthik Nayak
2025-05-14  9:03 ` [PATCH 1/3] fetch: " Karthik Nayak
2025-05-14 12:31   ` Patrick Steinhardt
2025-05-15 11:13     ` Karthik Nayak
2025-05-15 11:30       ` Patrick Steinhardt
2025-05-15 11:36         ` Karthik Nayak
2025-05-14 17:36   ` Junio C Hamano
2025-05-14  9:03 ` [PATCH 2/3] send-pack: fix memory leak around duplicate refs Karthik Nayak
2025-05-14 17:46   ` Junio C Hamano
2025-05-15 11:23     ` Karthik Nayak
2025-05-14  9:03 ` [PATCH 3/3] receive-pack: use batched reference updates Karthik Nayak
2025-05-14 12:31   ` Patrick Steinhardt
2025-05-14 19:00     ` Junio C Hamano
2025-05-15 11:30     ` Karthik Nayak
2025-05-15 14:07 ` [PATCH v2 0/4] fetch/receive: " Karthik Nayak
2025-05-15 14:07   ` [PATCH v2 1/4] refs: add function to translate errors to strings Karthik Nayak
2025-05-15 19:11     ` Jeff King
2025-05-16  9:11       ` Karthik Nayak
2025-05-15 20:26     ` Junio C Hamano
2025-05-16  9:12       ` Karthik Nayak
2025-05-15 14:07   ` [PATCH v2 2/4] fetch: use batched reference updates Karthik Nayak
2025-05-16  5:40     ` Patrick Steinhardt
2025-05-16  9:53       ` Karthik Nayak
2025-05-16 10:00         ` Patrick Steinhardt
2025-05-18 11:30           ` Karthik Nayak
2025-05-15 14:07   ` [PATCH v2 3/4] send-pack: fix memory leak around duplicate refs Karthik Nayak
2025-05-15 14:07   ` [PATCH v2 4/4] receive-pack: use batched reference updates Karthik Nayak
2025-05-15 18:55     ` Jeff King
2025-05-15 19:09       ` Jeff King
2025-05-16 19:49         ` Karthik Nayak
2025-05-19  9:58 ` [PATCH v3 0/4] fetch/receive: " Karthik Nayak
2025-05-19  9:58   ` [PATCH v3 1/4] refs: add function to translate errors to strings Karthik Nayak
2025-05-19  9:58   ` [PATCH v3 2/4] fetch: use batched reference updates Karthik Nayak
2025-05-19  9:58   ` [PATCH v3 3/4] send-pack: fix memory leak around duplicate refs Karthik Nayak
2025-05-19  9:58   ` [PATCH v3 4/4] receive-pack: use batched reference updates Karthik Nayak
2025-05-19 18:14   ` [PATCH v3 0/4] fetch/receive: " Junio C Hamano
2025-05-20  9:05     ` Karthik Nayak
2025-05-21 13:14       ` Junio C Hamano
2025-05-22  6:00       ` Jeff King
2025-05-22  8:50         ` Karthik Nayak
2025-05-22 15:31           ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).