git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic
Date: Fri, 26 Jul 2024 14:50:41 +0200
Message-ID: <ZqObobw8FsDMkllm@tanuki>
In-Reply-To: <20240724144957.3033840-4-toon@iotcl.com>

On Wed, Jul 24, 2024 at 04:49:57PM +0200, Toon Claes wrote:
> One way to achieve this is possible when the "creationToken" heuristic
> is used for bundle URIs. We attempt to download and unbundle the minimum
> number of bundles by creationToken in decreasing order. If we fail to
> unbundle (after a successful download) then move to the next
> non-downloaded bundle and attempt downloading. Once we succeed in
> applying a bundle, move to the previous unapplied bundle and attempt to
> unbundle it again. At the end the highest applied creationToken is
> written to `fetch.bundleCreationToken` in the git-config. The next time
> bundles are advertised by the server, bundles with a lower creationToken
> value are ignored. This was already implemented by
> 7903efb717 (bundle-uri: download in creationToken order, 2023-01-31) in
> fetch_bundles_by_token().
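
The download order described in the quoted commit message can be modelled
roughly as follows. This is only a simplified sketch, not the logic of
fetch_bundles_by_token() itself: the `Bundle` type and the `fetch_by_token`
function are hypothetical names, "unbundling succeeds" is modelled as "all
prerequisite objects are already present", and treating a token equal to
the stored one as already applied is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    token: int                # creationToken advertised for the bundle
    prerequisites: frozenset  # objects that must exist before unbundling
    provides: frozenset       # objects the bundle adds


def fetch_by_token(bundles, have, stored_token=0):
    """Return (highest applied token, tokens downloaded in order)."""
    have = set(have)
    # Newest first; ignore bundles that are not newer than the stored token
    # (assumption: the stored token itself counts as already applied).
    candidates = sorted(
        (b for b in bundles if b.token > stored_token),
        key=lambda b: b.token,
        reverse=True,
    )
    downloaded = []
    pending = []  # downloaded but not yet applicable, newest at index 0
    highest = stored_token
    for bundle in candidates:
        downloaded.append(bundle.token)
        if not bundle.prerequisites <= have:
            # Unbundling failed: keep it and try the next older bundle.
            pending.append(bundle)
            continue
        have |= bundle.provides
        highest = max(highest, bundle.token)
        # Walk back up through the bundles that failed earlier.
        while pending and pending[-1].prerequisites <= have:
            retried = pending.pop()
            have |= retried.provides
            highest = max(highest, retried.token)
        if not pending:
            break  # everything downloaded so far has been applied
    return highest, downloaded
```

In this model, a client that already has the objects of the two oldest of
three chained bundles downloads only the newest one, while a client
without any of them ends up downloading all three.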

I think Junio essentially asked this already, but I'm still missing the
bigger picture here. When the "creationToken" heuristic is applied, the
effect of your change is that we'll always favor bundle URIs now over
performing proper fetches, right?

Now suppose that the server creates a new bundle whenever somebody pushes
a new change to the default branch. We do not really have information
about how this bundle is structured. It _could_ be an incremental bundle, and
in that case it might be sensible to fetch that bundle. But it could
also be that the server generates a full bundle including all objects
transitively reachable from that default branch. Now if we started to
rely on the "creationToken" heuristic, we would basically end up
re-downloading the complete repository, which is a strict regression.
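
To put rough numbers on that scenario, here is a small back-of-the-envelope
model. It is purely illustrative: `download_cost` is a hypothetical helper,
the sizes are made up, and it assumes that a regular fetch transfers only
the objects the client is missing and that an incremental bundle contains
exactly those objects.

```python
def download_cost(push_sizes, client_has, full_bundles):
    """Compare bytes transferred by a regular fetch and by a bundle.

    push_sizes:   bytes of new objects introduced by each push, oldest first
    client_has:   number of pushes the client already has locally
    full_bundles: True if the server regenerates a full bundle on every push
    """
    missing = sum(push_sizes[client_has:])
    if full_bundles:
        # A full bundle contains everything reachable from the branch.
        bundle = sum(push_sizes)
    else:
        # Idealised incremental bundle: only what the client lacks.
        bundle = missing
    return {"fetch": missing, "bundle": bundle}


# Hypothetical repository: one large initial push, then two small ones.
sizes = [1_000_000, 10_000, 10_000]
print(download_cost(sizes, client_has=2, full_bundles=False))
print(download_cost(sizes, client_has=2, full_bundles=True))
```

With incremental bundles both paths transfer the same 10,000 bytes in this
model, whereas with full bundles the client would download 1,020,000 bytes
to obtain 10,000 bytes of new objects.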

Now that scenario is of course hypothetical. But the problem is that the
strategy for how bundle URIs are generated is determined by the hosting
provider. So ultimately, I expect that the reality will lie somewhere in
between and be different depending on which hosting solution you use.

All of this to me means that the "creationToken" heuristic is not really
a good signal, unless I'm missing something about the way it works. Is
there any additional signal provided by the server except for the time
when the bundle was created? If so, is that information sufficient to
determine whether it makes sense for a client to fetch a bundle instead
of performing a "proper" fetch? If not, what is the additional info that
we would need to make this scheme work properly?

So unless I'm missing something, I feel like we need to think bigger and
design a heuristic that gives us the information needed. Without such a
heuristic, default-enabling this may or may not do the right thing, and
we have no way to really argue whether it will, as we now depend on
server operators to do the right thing.

Patrick
