git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org,  Jeff King <peff@peff.net>,
	 Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2
Date: Wed, 03 Sep 2025 21:23:25 -0700	[thread overview]
Message-ID: <xmqqtt1ic0ci.fsf@gitster.g> (raw)
In-Reply-To: <20250903-b4-pks-upload-pack-repeated-non-commit-acks-v1-2-4e019af4dddc@pks.im> (Patrick Steinhardt's message of "Wed, 03 Sep 2025 06:54:11 +0200")

Patrick Steinhardt <ps@pks.im> writes:

> This means that we repeatedly store the same object ID in our `have_obj`
> array, with two consequences:
>
>   - In protocol v2 we deduplicate ACKs for commits, but not for any
>     other objects as we send ACKs for every object ID in the `have_obj`
>     array.
>
>   - The `have_obj` array can grow in size indefinitely with both
>     protocols.
>
> The potentially-more-serious issue is the second one, as we basically
> have a way for an adversary to allocate arbitrarily large buffers now.
> Ultimately, this doesn't seem to be all that serious though: on my
> machine, the growth of that array is at around 4MB/s, and after roughly
> five minutes I was only at 1GB RSS. So this is concerning, but only
> mildly so.
>
> Fix this bug by storing the `THEY_HAVE` flag independent of the object
> type so that we don't store duplicate object IDs in `have_obj` anymore.

Makes sense.

> diff --git a/upload-pack.c b/upload-pack.c
> index 4f26f6afc7..fba3e339e2 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -476,20 +476,21 @@ static void create_pack_file(struct upload_pack_data *pack_data,
>  
>  static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid)
>  {
> -	int we_knew_they_have = 0;
>  	struct object *o = parse_object_with_flags(the_repository, oid,
>  						   PARSE_OBJECT_SKIP_HASH_CHECK |
>  						   PARSE_OBJECT_DISCARD_TREE);
>  
>  	if (!o)
>  		die("oops (%s)", oid_to_hex(oid));
> +
> +	if (o->flags & THEY_HAVE)
> +		return 0;
> +	o->flags |= THEY_HAVE;
> +
>  	if (o->type == OBJ_COMMIT) {
>  		struct commit_list *parents;
>  		struct commit *commit = (struct commit *)o;
> -		if (o->flags & THEY_HAVE)
> -			we_knew_they_have = 1;
> -		else
> -			o->flags |= THEY_HAVE;
> +

OK, so regardless of the type, an object that we already know they
have do not need to be processed twice, which makes sense?

For commits, we used to ensure that even if we see a commit that we
know they have for the second time, instead of returning, we marked
its immediate parents with THEY_HAVE bit.  We no longer do so.

Instead, the new code only paints the parents of a commit that we
haven't painted with THEY_HAVE.  Otherwise we return without doing
anything.

I wonder if they are equivalent.  In the original code, when our
commit A has a parent B, where neither of them we haven't painted,
we paint A with THEY_HAVE, we paint all parents of A with THEY_HAVE,
and put A in the have_obj array and return 1.  Then later when we
feed B to this function, we already know they have B, so we do not
add it to have_obj array, but otherwise we still paint B's parents
with THEY_HAVE bit.  In the new code with the same set-up, we paint
A and B with THEY_HAVE and put A in the have_obj array and return 1.
When the caller feeds B to this function, we pretty much do nothing
and return, so B's parent will not be painted, will it?

I must be misreading the patch, but I do not quite see how.

>  		if (!data->oldest_have || (commit->date < data->oldest_have))
>  			data->oldest_have = commit->date;
>  		for (parents = commit->parents;
> @@ -497,11 +498,9 @@ static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid
>  		     parents = parents->next)
>  			parents->item->object.flags |= THEY_HAVE;
>  	}
> -	if (!we_knew_they_have) {
> -		add_object_array(o, NULL, &data->have_obj);
> -		return 1;
> -	}
> -	return 0;

> +
> +	add_object_array(o, NULL, &data->have_obj);
> +	return 1;
>  }
>  
>  static int got_oid(struct upload_pack_data *data,

  reply	other threads:[~2025-09-04  4:23 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-03  4:54 [PATCH 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
2025-09-03  4:54 ` [PATCH 1/2] t5530: modernize tests Patrick Steinhardt
2025-09-04  4:10   ` Junio C Hamano
2025-09-03  4:54 ` [PATCH 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2 Patrick Steinhardt
2025-09-04  4:23   ` Junio C Hamano [this message]
2025-09-04 12:42     ` Patrick Steinhardt
2025-09-04 16:37       ` Junio C Hamano
2025-09-05  6:11         ` Patrick Steinhardt
2025-09-05  6:18 ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Patrick Steinhardt
2025-09-05  6:18   ` [PATCH v2 1/2] t5530: modernize tests Patrick Steinhardt
2025-09-05  6:18   ` [PATCH v2 2/2] upload-pack: don't ACK non-commits repeatedly in protocol v2 Patrick Steinhardt
2025-09-08 16:51     ` Jeff King
2025-09-05 21:39   ` [PATCH v2 0/2] upload-pack: don't ACK non-commits repeatedly Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqtt1ic0ci.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).