git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	 git@vger.kernel.org, hanyang.tony@bytedance.com
Subject: Re: [PATCH 2/3] index-pack: no blobs during outgoing link check
Date: Wed, 04 Dec 2024 07:16:24 +0900	[thread overview]
Message-ID: <xmqq1pyocszr.fsf@gitster.g> (raw)
In-Reply-To: <Z06ejDgTnC6gWXgx@pks.im> (Patrick Steinhardt's message of "Tue, 3 Dec 2024 07:00:44 +0100")

Patrick Steinhardt <ps@pks.im> writes:

>> The benefit of doing this is as above (fetch speedup), but the drawback
>> is that if the packfile to be indexed references a local blob directly
>> (that is, not through a local tree), that local blob is in danger of
>> being garbage collected. Such a situation may arise if we push local
>> commits, including one with a change to a blob in the root tree,
>> and then the server incorporates them into its main branch through a
>> "rebase" or "squash" merge strategy, and then we fetch the new main
>> branch from the server.
>
> Okay, so we know that we are basically doing the wrong thing with the
> optimization, but by skipping blobs we can get a significant speedup and
> the failure mode is that we will re-fetch the object in a later step.
> And because we think the situation is rare it shouldn't be a huge issue
> in practice.

That is how I read it, but the description may want to make the pros
and cons more explicit.  One of the reasons why the users choose to
use lazy clone is to avoid grabbing large blobs until they need
them, so I am not sure how they feel about having to discard and
then fetch again after creating a potentially large blob.  As long
as it does not happen repeatedly for the same blob, it probably is
OK.

> Without the context of the commit message this code snippet likely would
> not make any sense to a reader. The "correct" logic would be to record
> all objects, regardless of whether they are an object ID or not. But we
> explicitly choose not to as a tradeoff between performance and
> correctness.
>
> All to say that we should have a comment here that explains what is
> going on.

Thanks for reviewing and commenting.


  parent reply	other threads:[~2024-12-03 22:16 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-02 20:18 [PATCH 0/3] Performance improvements for repacking non-promisor objects Jonathan Tan
2024-12-02 20:18 ` [PATCH 1/3] index-pack: dedup first during outgoing link check Jonathan Tan
2024-12-02 21:24   ` Josh Steadmon
2024-12-02 20:18 ` [PATCH 2/3] index-pack: no blobs " Jonathan Tan
2024-12-03  6:00   ` Patrick Steinhardt
2024-12-03 21:40     ` Jonathan Tan
2024-12-03 22:16     ` Junio C Hamano [this message]
2024-12-02 20:18 ` [PATCH 3/3] index-pack: commit tree " Jonathan Tan
2024-12-03  3:10   ` Junio C Hamano
2024-12-03 21:42     ` Jonathan Tan
2024-12-04  0:21       ` Junio C Hamano
2024-12-09 20:29         ` Jonathan Tan
2024-12-09 23:51           ` Junio C Hamano
2024-12-02 21:25 ` [PATCH 0/3] Performance improvements for repacking non-promisor objects Josh Steadmon
2024-12-03  4:09   ` Junio C Hamano
2024-12-03  4:18 ` Junio C Hamano
2024-12-03  4:20   ` Junio C Hamano
2024-12-03  4:39     ` Junio C Hamano
2024-12-03 21:43 ` [PATCH v2 " Jonathan Tan
2024-12-03 21:43   ` [PATCH v2 1/3] index-pack --promisor: dedup before checking links Jonathan Tan
2024-12-03 21:43   ` [PATCH v2 2/3] index-pack --promisor: don't check blobs Jonathan Tan
2024-12-03 21:43   ` [PATCH v2 3/3] index-pack --promisor: also check commits' trees Jonathan Tan
2024-12-03 21:52 ` [PATCH v3 0/3] Performance improvements for repacking non-promisor objects Jonathan Tan
2024-12-03 21:52   ` [PATCH v3 1/3] index-pack --promisor: dedup before checking links Jonathan Tan
2024-12-04  4:36     ` Junio C Hamano
2024-12-03 21:52   ` [PATCH v3 2/3] index-pack --promisor: don't check blobs Jonathan Tan
2024-12-03 21:52   ` [PATCH v3 3/3] index-pack --promisor: also check commits' trees Jonathan Tan
2024-12-04  2:22   ` [PATCH v3 0/3] Performance improvements for repacking non-promisor objects Junio C Hamano
2024-12-04  4:46   ` [PATCH 4/3] index-pack: work around false positive use of uninitialized variable Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq1pyocszr.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=hanyang.tony@bytedance.com \
    --cc=jonathantanmy@google.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).