git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>
Subject: Re: [PATCH v2 07/13] builtin/index-pack: don't fetch promised objects for collision check
Date: Tue, 29 Apr 2025 08:15:54 +0200
Message-ID: <aBBumhDhWoR9LEb3@pks.im>
In-Reply-To: <xmqq34dsarhv.fsf@gitster.g>

On Mon, Apr 28, 2025 at 02:46:52PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Any packed objects indexed via git-index-pack(1) are subject to a
> > collision check. This collision check has the intent to determine
> > whether we already have an object with the same object ID, but different
> > contents in the repository.
> >
> > The check whether the collision check is really needed is only performed
> > in case `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)` tells
> > us that the object exists. But unless explicitly told otherwise by
> > passing `OBJECT_INFO_SKIP_FETCH_OBJECT`, this function will also cause
> > us to fetch the object in case it is part of a promisor pack. As such,
> > we may end up fetching the object only to check whether the fetched
> > object and the object that we're indexing have the same content.
> >
> > This behaviour is highly dubious and more likely than not unintended.
> > Fix it by converting to `has_object()`, which knows to neither reload
> > packfiles nor to fetch promisor objects by default.
> 
> It is unclear why you think it is highly dubious from reading the
> above paragraph three times, though.
> 
> Is it that if we are suspicious of the incoming pack data we are
> indexing, we should also not be too trusting of the object that our
> promisor remote would be giving us?  To put it in reverse, our
> attitude being "we trust the first copy of object we saw", which
> translates to "we trust where we explicitly clone and fetch from" in
> the traditional world without lazy fetching, if somebody else we are
> explicitly fetching from offers us an object that the promisor
> remote would give us, we just do not bother if they are the same
> because it is not like we trust our promisor more than we trust the
> current counterpart we are fetching from?

Yes, exactly. When we don't have an object locally, we don't have a trust
anchor for verifying that the contents of the object look as expected. So
there are only two ways to do this:

  - Use a trust-on-first-use model. We trust the object we obtain
    initially, and from then on we treat it as the "correct" object
    and verify incoming objects with the same ID against it.

  - Only trust what everyone agrees on. In that case though we
    really should be cross-verifying with _all_ remotes, not only with
    the promisor remote.

Right now we do neither, but we end up treating the promisor as "more
trusted" than any of the other remotes.
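
To make the first model concrete, here is a minimal sketch in Python. It
is a toy and not how Git implements anything: `TofuStore` and `accept()`
are hypothetical names, and the object ID is passed in explicitly so that
a collision can be demonstrated without breaking a real hash function.

```python
class TofuStore:
    """Toy object store that trusts the first copy it sees of each ID.

    Hypothetical illustration only; not Git's object database.
    """

    def __init__(self):
        self._objects = {}  # object ID -> content bytes

    def accept(self, oid: str, content: bytes) -> str:
        """Classify an incoming object against what is stored locally."""
        known = self._objects.get(oid)
        if known is None:
            # No local copy, hence no trust anchor: trust on first use.
            self._objects[oid] = content
            return "stored"
        if known == content:
            # Same ID, same bytes: nothing to do.
            return "duplicate"
        # Same ID, different bytes: the first copy wins, reject this one.
        return "collision"
```

The point of the sketch is that the "stored" branch never consults
anybody else: without a local copy there is nothing to compare against,
whichever remote the object came from.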

I think it's completely unintentional that we end up fetching the object
from the promisor to perform a collision check against the packfile we
are about to index. It is highly likely that the promisor remote and the
remote that we're fetching from are the same anyway, so all this does is
waste resources.
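
The difference between the two lookups can be sketched roughly as
follows. Again this is a toy model in Python under stated assumptions:
`ToyRepo`, its `fetch_promised` parameter and `needs_collision_check()`
are made-up names that only mimic the behaviour described above, namely
a lookup that lazily fetches promised objects versus one that only looks
at what is available locally.

```python
class ToyRepo:
    """Toy repository with a local store and a promisor remote.

    Hypothetical illustration only; names do not match Git's C API.
    """

    def __init__(self, local, promisor):
        self.local = dict(local)        # objects we already have
        self.promisor = dict(promisor)  # objects the promisor could send
        self.fetches = 0                # number of lazy fetches performed

    def has_object(self, oid, fetch_promised=False):
        """Return True if the object is available after the lookup."""
        if oid in self.local:
            return True
        if fetch_promised and oid in self.promisor:
            # Lazy fetch: costs a round trip and stores the object locally.
            self.fetches += 1
            self.local[oid] = self.promisor[oid]
            return True
        return False


def needs_collision_check(repo, oid, fetch_promised):
    """A collision check only makes sense if we end up with a copy."""
    return repo.has_object(oid, fetch_promised=fetch_promised)
```

With an empty local store and an object that only the promisor has, the
non-fetching lookup reports that no collision check is needed and does
no network work, whereas the fetching lookup downloads the object just
to compare it against the pack being indexed.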

Anyway, I'll evict these patches from this series. I think a couple of
the sites are broken, but for now I care more about the bigger picture.

Patrick
