git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] upload-pack: don't ACK non-commits repeatedly
@ 2025-09-03  4:54 Patrick Steinhardt
  2025-09-03  4:54 ` [PATCH 1/2] t5530: modernize tests Patrick Steinhardt
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Patrick Steinhardt @ 2025-09-03  4:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Junio C Hamano

Hi,

this patch series addresses an issue with storing duplicate object IDs
sent by the client in git-upload-pack(1). If the client sends repeated
"have" lines for an object ID that doesn't refer to a commit, then we
end up storing that object ID repeatedly in the `have_obj` array. This
leads to sending out repeated "ACK"s for the same object.

The series applies on top of "maint" at c44beea485 (Git 2.51,
2025-08-17).

Thanks!

Patrick

---
Patrick Steinhardt (2):
      t5530: modernize tests
      upload-pack: don't ACK non-commits repeatedly in protocol v2

 t/t5530-upload-pack-error.sh | 68 +++++++++++++++++++++++++++-----------------
 upload-pack.c                | 19 ++++++-------
 2 files changed, 51 insertions(+), 36 deletions(-)


---
base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
change-id: 20250903-b4-pks-upload-pack-repeated-non-commit-acks-49b363e1d0e1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] t5530: modernize tests
  2025-09-03  4:54 [PATCH 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
@ 2025-09-03  4:54 ` Patrick Steinhardt
  2025-09-04  4:10   ` Junio C Hamano
  2025-09-03  4:54 ` [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2 Patrick Steinhardt
  2025-09-05  6:18 ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
  2 siblings, 1 reply; 12+ messages in thread
From: Patrick Steinhardt @ 2025-09-03  4:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Junio C Hamano

Refactor tests to follow modern best practices:

  - Merge together tests that set up and verify a single use case.

  - Drop empty newlines at the beginning and end of test bodies.

  - Don't change directories in the main test body.

  - Remove an unused `D` variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5530-upload-pack-error.sh | 33 +++++----------------------------
 1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/t/t5530-upload-pack-error.sh b/t/t5530-upload-pack-error.sh
index 558eedf25a4..8e505786f1b 100755
--- a/t/t5530-upload-pack-error.sh
+++ b/t/t5530-upload-pack-error.sh
@@ -4,8 +4,6 @@ test_description='errors in upload-pack'
 
 . ./test-lib.sh
 
-D=$(pwd)
-
 corrupt_repo () {
 	object_sha1=$(git rev-parse "$1") &&
 	ob=$(expr "$object_sha1" : "\(..\)") &&
@@ -21,11 +19,7 @@ test_expect_success 'setup and corrupt repository' '
 	test_tick &&
 	echo changed >file &&
 	git commit -a -m changed &&
-	corrupt_repo HEAD:file
-
-'
-
-test_expect_success 'fsck fails' '
+	corrupt_repo HEAD:file &&
 	test_must_fail git fsck
 '
 
@@ -40,17 +34,12 @@ test_expect_success 'upload-pack fails due to error in pack-objects packing' '
 '
 
 test_expect_success 'corrupt repo differently' '
-
 	git hash-object -w file &&
-	corrupt_repo HEAD^^{tree}
-
-'
-
-test_expect_success 'fsck fails' '
+	corrupt_repo HEAD^^{tree} &&
 	test_must_fail git fsck
 '
-test_expect_success 'upload-pack fails due to error in rev-list' '
 
+test_expect_success 'upload-pack fails due to error in rev-list' '
 	printf "%04xwant %s\n%04xshallow %s00000009done\n0000" \
 		$(($hexsz + 10)) $(git rev-parse HEAD) \
 		$(($hexsz + 12)) $(git rev-parse HEAD^) >input &&
@@ -59,7 +48,6 @@ test_expect_success 'upload-pack fails due to error in rev-list' '
 '
 
 test_expect_success 'upload-pack fails due to bad want (no object)' '
-
 	printf "%04xwant %s multi_ack_detailed\n00000009done\n0000" \
 		$(($hexsz + 29)) $(test_oid deadbeef) >input &&
 	test_must_fail git upload-pack . <input >output 2>output.err &&
@@ -69,7 +57,6 @@ test_expect_success 'upload-pack fails due to bad want (no object)' '
 '
 
 test_expect_success 'upload-pack fails due to bad want (not tip)' '
-
 	oid=$(echo an object we have | git hash-object -w --stdin) &&
 	printf "%04xwant %s multi_ack_detailed\n00000009done\n0000" \
 		$(($hexsz + 29)) "$oid" >input &&
@@ -80,7 +67,6 @@ test_expect_success 'upload-pack fails due to bad want (not tip)' '
 '
 
 test_expect_success 'upload-pack fails due to error in pack-objects enumeration' '
-
 	printf "%04xwant %s\n00000009done\n0000" \
 		$((hexsz + 10)) $(git rev-parse HEAD) >input &&
 	test_must_fail git upload-pack . <input >/dev/null 2>output.err &&
@@ -105,18 +91,9 @@ test_expect_success 'upload-pack tolerates EOF just after stateless client wants
 	test_cmp expect actual
 '
 
-test_expect_success 'create empty repository' '
-
-	mkdir foo &&
-	cd foo &&
-	git init
-
-'
-
 test_expect_success 'fetch fails' '
-
-	test_must_fail git fetch .. main
-
+	git init foo &&
+	test_must_fail git -C foo fetch .. main
 '
 
 test_done

-- 
2.51.0.384.g4c02a37b29.dirty


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2
  2025-09-03  4:54 [PATCH 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
  2025-09-03  4:54 ` [PATCH 1/2] t5530: modernize tests Patrick Steinhardt
@ 2025-09-03  4:54 ` Patrick Steinhardt
  2025-09-04  4:23   ` Junio C Hamano
  2025-09-05  6:18 ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
  2 siblings, 1 reply; 12+ messages in thread
From: Patrick Steinhardt @ 2025-09-03  4:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Junio C Hamano

When a client performs a fetch or clone they can optionally send "have"
lines to tell the server which objects they already have available
locally. These object IDs are stored by the server in an object array so
that it can remember any objects it doesn't have to include in the pack
sent to the client.

While there isn't any reason to do so, clients are free to send the same
"have" line repeatedly. git-upload-pack(1) already knows to handle this
well: every commit it has seen via a "have" line gets marked with the
`THEY_HAVE` flag, and if such a commit is seen repeatedly we know to not
process it another time. This also has the effect that we only store the
object ID once, only, in the `have_obj` array.

There is an edge case though: if the client sends an object ID that does
not refer to a commit we neither store nor check the `THEY_HAVE` flag.
This means that we repeatedly store the same object ID in our `have_obj`
array, with two consequences:

  - In protocol v2 we deduplicate ACKs for commits, but not for any
    other objects as we send ACKs for every object ID in the `have_obj`
    array.

  - The `have_obj` array can grow in size indefinitely with both
    protocols.

The potentially-more-serious issue is the second one, as we basically
have a way for an adversary to allocate arbitrarily large buffers now.
Ultimately, this doesn't seem to be all that serious though: on my
machine, the growth of that array is at around 4MB/s, and after roughly
five minutes I was only at 1GB RSS. So this is concerning, but only
mildly so.

Fix this bug by storing the `THEY_HAVE` flag independent of the object
type so that we don't store duplicate object IDs in `have_obj` anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5530-upload-pack-error.sh | 39 +++++++++++++++++++++++++++++++++++++++
 upload-pack.c                | 19 +++++++++----------
 2 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/t/t5530-upload-pack-error.sh b/t/t5530-upload-pack-error.sh
index 8e505786f1..d40292cfb7 100755
--- a/t/t5530-upload-pack-error.sh
+++ b/t/t5530-upload-pack-error.sh
@@ -96,4 +96,43 @@ test_expect_success 'fetch fails' '
 	test_must_fail git -C foo fetch .. main
 '
 
+test_expect_success 'upload-pack ACKs repeated non-commit objects repeatedly (protocol v0)' '
+	commit_id=$(git rev-parse HEAD) &&
+	tree_id=$(git rev-parse HEAD^{tree}) &&
+	test-tool pkt-line pack >request <<-EOF &&
+	want $commit_id
+	0000
+	have $tree_id
+	have $tree_id
+	0000
+	EOF
+	git upload-pack --stateless-rpc . <request >actual &&
+	depacketize <actual >actual.raw &&
+	grep ^ACK actual.raw >actual.acks &&
+	cat >expect <<-EOF &&
+	ACK $tree_id
+	ACK $tree_id
+	EOF
+	test_cmp expect actual.acks
+'
+
+test_expect_success 'upload-pack ACKs repeated non-commit objects once only (protocol v2)' '
+	commit_id=$(git rev-parse HEAD) &&
+	tree_id=$(git rev-parse HEAD^{tree}) &&
+	test-tool pkt-line pack >request <<-EOF &&
+	command=fetch
+	object-format=$(test_oid algo)
+	0001
+	want $commit_id
+	have $tree_id
+	have $tree_id
+	0000
+	EOF
+	GIT_PROTOCOL=version=2 git upload-pack . <request >actual &&
+	depacketize <actual >actual.raw &&
+	grep ^ACK actual.raw >actual.acks &&
+	echo "ACK $tree_id" >expect &&
+	test_cmp expect actual.acks
+'
+
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index 4f26f6afc7..fba3e339e2 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -476,20 +476,21 @@ static void create_pack_file(struct upload_pack_data *pack_data,
 
 static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid)
 {
-	int we_knew_they_have = 0;
 	struct object *o = parse_object_with_flags(the_repository, oid,
 						   PARSE_OBJECT_SKIP_HASH_CHECK |
 						   PARSE_OBJECT_DISCARD_TREE);
 
 	if (!o)
 		die("oops (%s)", oid_to_hex(oid));
+
+	if (o->flags & THEY_HAVE)
+		return 0;
+	o->flags |= THEY_HAVE;
+
 	if (o->type == OBJ_COMMIT) {
 		struct commit_list *parents;
 		struct commit *commit = (struct commit *)o;
-		if (o->flags & THEY_HAVE)
-			we_knew_they_have = 1;
-		else
-			o->flags |= THEY_HAVE;
+
 		if (!data->oldest_have || (commit->date < data->oldest_have))
 			data->oldest_have = commit->date;
 		for (parents = commit->parents;
@@ -497,11 +498,9 @@ static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid
 		     parents = parents->next)
 			parents->item->object.flags |= THEY_HAVE;
 	}
-	if (!we_knew_they_have) {
-		add_object_array(o, NULL, &data->have_obj);
-		return 1;
-	}
-	return 0;
+
+	add_object_array(o, NULL, &data->have_obj);
+	return 1;
 }
 
 static int got_oid(struct upload_pack_data *data,

-- 
2.51.0.384.g4c02a37b29.dirty


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] t5530: modernize tests
  2025-09-03  4:54 ` [PATCH 1/2] t5530: modernize tests Patrick Steinhardt
@ 2025-09-04  4:10   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2025-09-04  4:10 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

> -D=$(pwd)
> -
> @@ -21,11 +19,7 @@ test_expect_success 'setup and corrupt repository' '
>  	test_tick &&
>  	echo changed >file &&
>  	git commit -a -m changed &&
> -	corrupt_repo HEAD:file
> -
> -'
> -
> -test_expect_success 'fsck fails' '
> +	corrupt_repo HEAD:file &&
>  	test_must_fail git fsck
>  '
>  
> @@ -40,17 +34,12 @@ test_expect_success 'upload-pack fails due to error in pack-objects packing' '
>  '
>  
>  test_expect_success 'corrupt repo differently' '
> -
>  	git hash-object -w file &&
> -	corrupt_repo HEAD^^{tree}
> -
> -'
> -
> -test_expect_success 'fsck fails' '
> +	corrupt_repo HEAD^^{tree} &&
>  	test_must_fail git fsck
>  '

Both changes make sense.

> -test_expect_success 'create empty repository' '
> -
> -	mkdir foo &&
> -	cd foo &&
> -	git init
> -
> -'
> -
>  test_expect_success 'fetch fails' '
> -
> -	test_must_fail git fetch .. main
> -
> +	git init foo &&
> +	test_must_fail git -C foo fetch .. main
>  '

Nice.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2
  2025-09-03  4:54 ` [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2 Patrick Steinhardt
@ 2025-09-04  4:23   ` Junio C Hamano
  2025-09-04 12:42     ` Patrick Steinhardt
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2025-09-04  4:23 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

> This means that we repeatedly store the same object ID in our `have_obj`
> array, with two consequences:
>
>   - In protocol v2 we deduplicate ACKs for commits, but not for any
>     other objects as we send ACKs for every object ID in the `have_obj`
>     array.
>
>   - The `have_obj` array can grow in size indefinitely with both
>     protocols.
>
> The potentially-more-serious issue is the second one, as we basically
> have a way for an adversary to allocate arbitrarily large buffers now.
> Ultimately, this doesn't seem to be all that serious though: on my
> machine, the growth of that array is at around 4MB/s, and after roughly
> five minutes I was only at 1GB RSS. So this is concerning, but only
> mildly so.
>
> Fix this bug by storing the `THEY_HAVE` flag independent of the object
> type so that we don't store duplicate object IDs in `have_obj` anymore.

Makes sense.

> diff --git a/upload-pack.c b/upload-pack.c
> index 4f26f6afc7..fba3e339e2 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -476,20 +476,21 @@ static void create_pack_file(struct upload_pack_data *pack_data,
>  
>  static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid)
>  {
> -	int we_knew_they_have = 0;
>  	struct object *o = parse_object_with_flags(the_repository, oid,
>  						   PARSE_OBJECT_SKIP_HASH_CHECK |
>  						   PARSE_OBJECT_DISCARD_TREE);
>  
>  	if (!o)
>  		die("oops (%s)", oid_to_hex(oid));
> +
> +	if (o->flags & THEY_HAVE)
> +		return 0;
> +	o->flags |= THEY_HAVE;
> +
>  	if (o->type == OBJ_COMMIT) {
>  		struct commit_list *parents;
>  		struct commit *commit = (struct commit *)o;
> -		if (o->flags & THEY_HAVE)
> -			we_knew_they_have = 1;
> -		else
> -			o->flags |= THEY_HAVE;
> +

OK, so regardless of the type, an object that we already know they
have do not need to be processed twice, which makes sense?

For commits, we used to ensure that even if we see a commit that we
know they have for the second time, instead of returning, we marked
its immediate parents with THEY_HAVE bit.  We no longer do so.

Instead, the new code only paints the parents of a commit that we
haven't painted with THEY_HAVE.  Otherwise we return without doing
anything.

I wonder if they are equivalent.  In the original code, when our
commit A has a parent B, where neither of them we haven't painted,
we paint A with THEY_HAVE, we paint all parents of A with THEY_HAVE,
and put A in the have_obj array and return 1.  Then later when we
feed B to this function, we already know they have B, so we do not
add it to have_obj array, but otherwise we still paint B's parents
with THEY_HAVE bit.  In the new code with the same set-up, we paint
A and B with THEY_HAVE and put A in the have_obj array and return 1.
When the caller feeds B to this function, we pretty much do nothing
and return, so B's parent will not be painted, will it?

I must be misreading the patch, but I do not quite see how.

>  		if (!data->oldest_have || (commit->date < data->oldest_have))
>  			data->oldest_have = commit->date;
>  		for (parents = commit->parents;
> @@ -497,11 +498,9 @@ static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid
>  		     parents = parents->next)
>  			parents->item->object.flags |= THEY_HAVE;
>  	}
> -	if (!we_knew_they_have) {
> -		add_object_array(o, NULL, &data->have_obj);
> -		return 1;
> -	}
> -	return 0;

> +
> +	add_object_array(o, NULL, &data->have_obj);
> +	return 1;
>  }
>  
>  static int got_oid(struct upload_pack_data *data,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2
  2025-09-04  4:23   ` Junio C Hamano
@ 2025-09-04 12:42     ` Patrick Steinhardt
  2025-09-04 16:37       ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Patrick Steinhardt @ 2025-09-04 12:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Taylor Blau

On Wed, Sep 03, 2025 at 09:23:25PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/upload-pack.c b/upload-pack.c
> > index 4f26f6afc7..fba3e339e2 100644
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -476,20 +476,21 @@ static void create_pack_file(struct upload_pack_data *pack_data,
> >  
> >  static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid)
> >  {
> > -	int we_knew_they_have = 0;
> >  	struct object *o = parse_object_with_flags(the_repository, oid,
> >  						   PARSE_OBJECT_SKIP_HASH_CHECK |
> >  						   PARSE_OBJECT_DISCARD_TREE);
> >  
> >  	if (!o)
> >  		die("oops (%s)", oid_to_hex(oid));
> > +
> > +	if (o->flags & THEY_HAVE)
> > +		return 0;
> > +	o->flags |= THEY_HAVE;
> > +
> >  	if (o->type == OBJ_COMMIT) {
> >  		struct commit_list *parents;
> >  		struct commit *commit = (struct commit *)o;
> > -		if (o->flags & THEY_HAVE)
> > -			we_knew_they_have = 1;
> > -		else
> > -			o->flags |= THEY_HAVE;
> > +
> 
> OK, so regardless of the type, an object that we already know they
> have do not need to be processed twice, which makes sense?
> 
> For commits, we used to ensure that even if we see a commit that we
> know they have for the second time, instead of returning, we marked
> its immediate parents with THEY_HAVE bit.  We no longer do so.
> 
> Instead, the new code only paints the parents of a commit that we
> haven't painted with THEY_HAVE.  Otherwise we return without doing
> anything.
> 
> I wonder if they are equivalent.  In the original code, when our
> commit A has a parent B, where neither of them we haven't painted,
> we paint A with THEY_HAVE, we paint all parents of A with THEY_HAVE,
> and put A in the have_obj array and return 1.  Then later when we
> feed B to this function, we already know they have B, so we do not
> add it to have_obj array, but otherwise we still paint B's parents
> with THEY_HAVE bit.  In the new code with the same set-up, we paint
> A and B with THEY_HAVE and put A in the have_obj array and return 1.
> When the caller feeds B to this function, we pretty much do nothing
> and return, so B's parent will not be painted, will it?
> 
> I must be misreading the patch, but I do not quite see how.

Oh, I think you're indeed correct. I totally missed that we wouldn't
paint the grandparents anymore with this change. It's quite surprising
that this didn't cause any issues whatsoever. But that might be due to
how clients typically negotiate HAVEs with the remote side: when the
server ACKs an object there is no reason for the client to go deeper, so
they stop advertising any of its parents.

There will be cases where this breaks though. So we should probably move
the return until after we have marked parents, like in the below patch.

Patrick

diff --git a/upload-pack.c b/upload-pack.c
index fba3e339e2..9b9b149068 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -483,10 +483,6 @@ static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid
 	if (!o)
 		die("oops (%s)", oid_to_hex(oid));
 
-	if (o->flags & THEY_HAVE)
-		return 0;
-	o->flags |= THEY_HAVE;
-
 	if (o->type == OBJ_COMMIT) {
 		struct commit_list *parents;
 		struct commit *commit = (struct commit *)o;
@@ -499,6 +495,10 @@ static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid
 			parents->item->object.flags |= THEY_HAVE;
 	}
 
+	if (o->flags & THEY_HAVE)
+		return 0;
+	o->flags |= THEY_HAVE;
+
 	add_object_array(o, NULL, &data->have_obj);
 	return 1;
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2
  2025-09-04 12:42     ` Patrick Steinhardt
@ 2025-09-04 16:37       ` Junio C Hamano
  2025-09-05  6:11         ` Patrick Steinhardt
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2025-09-04 16:37 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

> ... when the
> server ACKs an object there is no reason for the client to go deeper, so
> they stop advertising any of its parents.

Very true.  I did point out the difference in behaviour, but I
couldn't actually figure out what goodness we are deriving in this
code by marking grandparents (and possibly their ancestors) as
something they have.  Your change in behaviour could be seen as
stopping us from making unnecessary operations ;-)

If the other side is mischievous, they are not limited to feed us a
commit and then its parent in the naturally expected order.  They
could feed us A and then skip B and give us C that is a child of B.
Pre-painting B when we got A from them, in order to prepare for
seeing B (which we can return without doing anything) in the next
round, would not help us all that much, if they give us C after
giving us A, as we haven't even heard of C yet at that point.

But in the normal case against sane clients that do not skip the
probes, marking immediate parents (like B) when processing A might
be helping?  I dunno.  I also somehow thought that even normal case
we have an option to skip the probes in order to converge faster,
but I am misremembering.

cf. https://lore.kernel.org/git/?q=upload-pack%20fibonacci




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2
  2025-09-04 16:37       ` Junio C Hamano
@ 2025-09-05  6:11         ` Patrick Steinhardt
  0 siblings, 0 replies; 12+ messages in thread
From: Patrick Steinhardt @ 2025-09-05  6:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Taylor Blau

On Thu, Sep 04, 2025 at 09:37:58AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > ... when the
> > server ACKs an object there is no reason for the client to go deeper, so
> > they stop advertising any of its parents.
> 
> Very true.  I did point out the difference in behaviour, but I
> couldn't actually figure out what goodness we are deriving in this
> code by marking grandparents (and possibly their ancestors) as
> something they have.  Your change in behaviour could be seen as
> stopping us from making unnecessary operations ;-)
> 
> If the other side is mischievous, they are not limited to feed us a
> commit and then its parent in the naturally expected order.  They
> could feed us A and then skip B and give us C that is a child of B.
> Pre-painting B when we got A from them, in order to prepare for
> seeing B (which we can return without doing anything) in the next
> round, would not help us all that much, if they give us C after
> giving us A, as we haven't even heard of C yet at that point.

Yeah, it _feels_ like the change should be fine overall. But honestly, I
feel more comfortable with keeping the status quo for now so that the
change is only doing what it's advertised to do, namely cull the memory
growth.

So I'll apply the patch I had and send a v2, but we may think about the
other angle as a #leftoverbit.

> But in the normal case against sane clients that do not skip the
> probes, marking immediate parents (like B) when processing A might
> be helping?  I dunno.  I also somehow thought that even normal case
> we have an option to skip the probes in order to converge faster,
> but I am misremembering.
> 
> cf. https://lore.kernel.org/git/?q=upload-pack%20fibonacci

You probably mean `fetch.negotiationAlgorithm=skipping`?

Patrick

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly
  2025-09-03  4:54 [PATCH 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
  2025-09-03  4:54 ` [PATCH 1/2] t5530: modernize tests Patrick Steinhardt
  2025-09-03  4:54 ` [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2 Patrick Steinhardt
@ 2025-09-05  6:18 ` Patrick Steinhardt
  2025-09-05  6:18   ` [PATCH v2 1/2] t5530: modernize tests Patrick Steinhardt
                     ` (2 more replies)
  2 siblings, 3 replies; 12+ messages in thread
From: Patrick Steinhardt @ 2025-09-05  6:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Junio C Hamano

Hi,

this patch series addresses an issue with storing duplicate object IDs
sent by the client in git-upload-pack(1). If the client sends repeated
"have" lines for an object ID that doesn't refer to a commit, then we
end up storing that object ID repeatedly in the `have_obj` array. This
leads to sending out repeated "ACK"s for the same object.

The series applies on top of "maint" at c44beea485 (Git 2.51,
2025-08-17).

Changes in v2:
  - Change ordering so that we always mark parents of already-seen
    commits as `THEY_HAVE`. The first version was _probably_ fine, but
    I don't feel too comfortable with a "probably".
  - Link to v1: https://lore.kernel.org/r/20250903-b4-pks-upload-pack-repeated-non-commit-acks-v1-0-4e019af4dddc@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (2):
      t5530: modernize tests
      upload-pack: don't ACK non-commits repeatedly in protocol v2

 t/t5530-upload-pack-error.sh | 68 +++++++++++++++++++++++++++-----------------
 upload-pack.c                | 19 ++++++-------
 2 files changed, 51 insertions(+), 36 deletions(-)

Range-diff versus v1:

1:  a42d231813 = 1:  26ecbcf460 t5530: modernize tests
2:  1544160961 ! 2:  11e32bf5e6 upload-pack: don't ACK non-commits repeatedly in protocol v2
    @@ upload-pack.c: static void create_pack_file(struct upload_pack_data *pack_data,
      
      	if (!o)
      		die("oops (%s)", oid_to_hex(oid));
    -+
    -+	if (o->flags & THEY_HAVE)
    -+		return 0;
    -+	o->flags |= THEY_HAVE;
     +
      	if (o->type == OBJ_COMMIT) {
      		struct commit_list *parents;
    @@ upload-pack.c: static int do_got_oid(struct upload_pack_data *data, const struct
     -	}
     -	return 0;
     +
    ++	if (o->flags & THEY_HAVE)
    ++		return 0;
    ++	o->flags |= THEY_HAVE;
    ++
     +	add_object_array(o, NULL, &data->have_obj);
     +	return 1;
      }

---
base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
change-id: 20250903-b4-pks-upload-pack-repeated-non-commit-acks-49b363e1d0e1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] t5530: modernize tests
  2025-09-05  6:18 ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
@ 2025-09-05  6:18   ` Patrick Steinhardt
  2025-09-05  6:18   ` [PATCH v2 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2 Patrick Steinhardt
  2025-09-05 21:39   ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Junio C Hamano
  2 siblings, 0 replies; 12+ messages in thread
From: Patrick Steinhardt @ 2025-09-05  6:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Junio C Hamano

Refactor tests to follow modern best practices:

  - Merge together tests that set up and verify a single use case.

  - Drop empty newlines at the beginning and end of test bodies.

  - Don't change directories in the main test body.

  - Remove an unused `D` variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5530-upload-pack-error.sh | 33 +++++----------------------------
 1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/t/t5530-upload-pack-error.sh b/t/t5530-upload-pack-error.sh
index 558eedf25a4..8e505786f1b 100755
--- a/t/t5530-upload-pack-error.sh
+++ b/t/t5530-upload-pack-error.sh
@@ -4,8 +4,6 @@ test_description='errors in upload-pack'
 
 . ./test-lib.sh
 
-D=$(pwd)
-
 corrupt_repo () {
 	object_sha1=$(git rev-parse "$1") &&
 	ob=$(expr "$object_sha1" : "\(..\)") &&
@@ -21,11 +19,7 @@ test_expect_success 'setup and corrupt repository' '
 	test_tick &&
 	echo changed >file &&
 	git commit -a -m changed &&
-	corrupt_repo HEAD:file
-
-'
-
-test_expect_success 'fsck fails' '
+	corrupt_repo HEAD:file &&
 	test_must_fail git fsck
 '
 
@@ -40,17 +34,12 @@ test_expect_success 'upload-pack fails due to error in pack-objects packing' '
 '
 
 test_expect_success 'corrupt repo differently' '
-
 	git hash-object -w file &&
-	corrupt_repo HEAD^^{tree}
-
-'
-
-test_expect_success 'fsck fails' '
+	corrupt_repo HEAD^^{tree} &&
 	test_must_fail git fsck
 '
-test_expect_success 'upload-pack fails due to error in rev-list' '
 
+test_expect_success 'upload-pack fails due to error in rev-list' '
 	printf "%04xwant %s\n%04xshallow %s00000009done\n0000" \
 		$(($hexsz + 10)) $(git rev-parse HEAD) \
 		$(($hexsz + 12)) $(git rev-parse HEAD^) >input &&
@@ -59,7 +48,6 @@ test_expect_success 'upload-pack fails due to error in rev-list' '
 '
 
 test_expect_success 'upload-pack fails due to bad want (no object)' '
-
 	printf "%04xwant %s multi_ack_detailed\n00000009done\n0000" \
 		$(($hexsz + 29)) $(test_oid deadbeef) >input &&
 	test_must_fail git upload-pack . <input >output 2>output.err &&
@@ -69,7 +57,6 @@ test_expect_success 'upload-pack fails due to bad want (no object)' '
 '
 
 test_expect_success 'upload-pack fails due to bad want (not tip)' '
-
 	oid=$(echo an object we have | git hash-object -w --stdin) &&
 	printf "%04xwant %s multi_ack_detailed\n00000009done\n0000" \
 		$(($hexsz + 29)) "$oid" >input &&
@@ -80,7 +67,6 @@ test_expect_success 'upload-pack fails due to bad want (not tip)' '
 '
 
 test_expect_success 'upload-pack fails due to error in pack-objects enumeration' '
-
 	printf "%04xwant %s\n00000009done\n0000" \
 		$((hexsz + 10)) $(git rev-parse HEAD) >input &&
 	test_must_fail git upload-pack . <input >/dev/null 2>output.err &&
@@ -105,18 +91,9 @@ test_expect_success 'upload-pack tolerates EOF just after stateless client wants
 	test_cmp expect actual
 '
 
-test_expect_success 'create empty repository' '
-
-	mkdir foo &&
-	cd foo &&
-	git init
-
-'
-
 test_expect_success 'fetch fails' '
-
-	test_must_fail git fetch .. main
-
+	git init foo &&
+	test_must_fail git -C foo fetch .. main
 '
 
 test_done

-- 
2.51.0.417.g1ba7204a04.dirty


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2
  2025-09-05  6:18 ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
  2025-09-05  6:18   ` [PATCH v2 1/2] t5530: modernize tests Patrick Steinhardt
@ 2025-09-05  6:18   ` Patrick Steinhardt
  2025-09-05 21:39   ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Junio C Hamano
  2 siblings, 0 replies; 12+ messages in thread
From: Patrick Steinhardt @ 2025-09-05  6:18 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Junio C Hamano

When a client performs a fetch or clone they can optionally send "have"
lines to tell the server which objects they already have available
locally. These object IDs are stored by the server in an object array so
that it can remember any objects it doesn't have to include in the pack
sent to the client.

While there isn't any reason to do so, clients are free to send the same
"have" line repeatedly. git-upload-pack(1) already knows to handle this
well: every commit it has seen via a "have" line gets marked with the
`THEY_HAVE` flag, and if such a commit is seen repeatedly we know to not
process it another time. This also has the effect that we only store the
object ID once, only, in the `have_obj` array.

There is an edge case though: if the client sends an object ID that does
not refer to a commit we neither store nor check the `THEY_HAVE` flag.
This means that we repeatedly store the same object ID in our `have_obj`
array, with two consequences:

  - In protocol v2 we deduplicate ACKs for commits, but not for any
    other objects as we send ACKs for every object ID in the `have_obj`
    array.

  - The `have_obj` array can grow in size indefinitely with both
    protocols.

The potentially-more-serious issue is the second one, as we basically
have a way for an adversary to allocate arbitrarily large buffers now.
Ultimately, this doesn't seem to be all that serious though: on my
machine, the growth of that array is at around 4MB/s, and after roughly
five minutes I was only at 1GB RSS. So this is concerning, but only
mildly so.

Fix this bug by storing the `THEY_HAVE` flag independent of the object
type so that we don't store duplicate object IDs in `have_obj` anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5530-upload-pack-error.sh | 39 +++++++++++++++++++++++++++++++++++++++
 upload-pack.c                | 19 +++++++++----------
 2 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/t/t5530-upload-pack-error.sh b/t/t5530-upload-pack-error.sh
index 8e505786f1..d40292cfb7 100755
--- a/t/t5530-upload-pack-error.sh
+++ b/t/t5530-upload-pack-error.sh
@@ -96,4 +96,43 @@ test_expect_success 'fetch fails' '
 	test_must_fail git -C foo fetch .. main
 '
 
+test_expect_success 'upload-pack ACKs repeated non-commit objects repeatedly (protocol v0)' '
+	commit_id=$(git rev-parse HEAD) &&
+	tree_id=$(git rev-parse HEAD^{tree}) &&
+	test-tool pkt-line pack >request <<-EOF &&
+	want $commit_id
+	0000
+	have $tree_id
+	have $tree_id
+	0000
+	EOF
+	git upload-pack --stateless-rpc . <request >actual &&
+	depacketize <actual >actual.raw &&
+	grep ^ACK actual.raw >actual.acks &&
+	cat >expect <<-EOF &&
+	ACK $tree_id
+	ACK $tree_id
+	EOF
+	test_cmp expect actual.acks
+'
+
+test_expect_success 'upload-pack ACKs repeated non-commit objects once only (protocol v2)' '
+	commit_id=$(git rev-parse HEAD) &&
+	tree_id=$(git rev-parse HEAD^{tree}) &&
+	test-tool pkt-line pack >request <<-EOF &&
+	command=fetch
+	object-format=$(test_oid algo)
+	0001
+	want $commit_id
+	have $tree_id
+	have $tree_id
+	0000
+	EOF
+	GIT_PROTOCOL=version=2 git upload-pack . <request >actual &&
+	depacketize <actual >actual.raw &&
+	grep ^ACK actual.raw >actual.acks &&
+	echo "ACK $tree_id" >expect &&
+	test_cmp expect actual.acks
+'
+
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index 4f26f6afc7..9b9b149068 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -476,20 +476,17 @@ static void create_pack_file(struct upload_pack_data *pack_data,
 
 static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid)
 {
-	int we_knew_they_have = 0;
 	struct object *o = parse_object_with_flags(the_repository, oid,
 						   PARSE_OBJECT_SKIP_HASH_CHECK |
 						   PARSE_OBJECT_DISCARD_TREE);
 
 	if (!o)
 		die("oops (%s)", oid_to_hex(oid));
+
 	if (o->type == OBJ_COMMIT) {
 		struct commit_list *parents;
 		struct commit *commit = (struct commit *)o;
-		if (o->flags & THEY_HAVE)
-			we_knew_they_have = 1;
-		else
-			o->flags |= THEY_HAVE;
+
 		if (!data->oldest_have || (commit->date < data->oldest_have))
 			data->oldest_have = commit->date;
 		for (parents = commit->parents;
@@ -497,11 +494,13 @@ static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid
 		     parents = parents->next)
 			parents->item->object.flags |= THEY_HAVE;
 	}
-	if (!we_knew_they_have) {
-		add_object_array(o, NULL, &data->have_obj);
-		return 1;
-	}
-	return 0;
+
+	if (o->flags & THEY_HAVE)
+		return 0;
+	o->flags |= THEY_HAVE;
+
+	add_object_array(o, NULL, &data->have_obj);
+	return 1;
 }
 
 static int got_oid(struct upload_pack_data *data,

-- 
2.51.0.417.g1ba7204a04.dirty


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly
  2025-09-05  6:18 ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
  2025-09-05  6:18   ` [PATCH v2 1/2] t5530: modernize tests Patrick Steinhardt
  2025-09-05  6:18   ` [PATCH v2 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2 Patrick Steinhardt
@ 2025-09-05 21:39   ` Junio C Hamano
  2 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2025-09-05 21:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> this patch series addresses an issue with storing duplicate object IDs
> sent by the client in git-upload-pack(1). If the client sends repeated
> "have" lines for an object ID that doesn't refer to a commit, then we
> end up storing that object ID repeatedly in the `have_obj` array. This
> leads to sending out repeated "ACK"s for the same object.
>
> The series applies on top of "maint" at c44beea485 (Git 2.51,
> 2025-08-17).
>
> Changes in v2:
>   - Change ordering so that we always mark parents of already-seen
>     commits as `THEY_HAVE`. The first version was _probably_ fine, but
>     I don't feel too comfortable with a "probably".
> ...
> 2:  1544160961 ! 2:  11e32bf5e6 upload-pack: don't ACK non-commits repeatedly in protocol v2
>     @@ upload-pack.c: static void create_pack_file(struct upload_pack_data *pack_data,
>       
>       	if (!o)
>       		die("oops (%s)", oid_to_hex(oid));
>     -+
>     -+	if (o->flags & THEY_HAVE)
>     -+		return 0;
>     -+	o->flags |= THEY_HAVE;
>      +
>       	if (o->type == OBJ_COMMIT) {
>       		struct commit_list *parents;
>     @@ upload-pack.c: static int do_got_oid(struct upload_pack_data *data, const struct
>      -	}
>      -	return 0;
>      +
>     ++	if (o->flags & THEY_HAVE)
>     ++		return 0;
>     ++	o->flags |= THEY_HAVE;
>     ++
>      +	add_object_array(o, NULL, &data->have_obj);
>      +	return 1;
>       }

OK.  So we used to do the "if the object were marked already, do not
bother adding it to the have_obj array again" only for commits, but
now we do not special case commits for that part of the logic.
Everybody is protected against getting added twice.

We still do special case commits by marking their direct parents
(but we do not add them to the have_obj array) as before, because we
want to play conservatively.

Which makes sense.

Thanks, will queue.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-09-05 21:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-03  4:54 [PATCH 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
2025-09-03  4:54 ` [PATCH 1/2] t5530: modernize tests Patrick Steinhardt
2025-09-04  4:10   ` Junio C Hamano
2025-09-03  4:54 ` [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2 Patrick Steinhardt
2025-09-04  4:23   ` Junio C Hamano
2025-09-04 12:42     ` Patrick Steinhardt
2025-09-04 16:37       ` Junio C Hamano
2025-09-05  6:11         ` Patrick Steinhardt
2025-09-05  6:18 ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
2025-09-05  6:18   ` [PATCH v2 1/2] t5530: modernize tests Patrick Steinhardt
2025-09-05  6:18   ` [PATCH v2 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2 Patrick Steinhardt
2025-09-05 21:39   ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).