From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [NOTES 08/11] Resumable fetch / push
Date: Mon, 6 Oct 2025 15:20:12 -0400
Message-ID: <aOQWbHzstGKiPUnc@nand.local>
In-Reply-To: <aOQVeVYY6zadPjln@nand.local>

Topic: Resumable fetch/push
Leader: Caleb (was Scott, but he's not here)

* Is this only client side or server side too?
	* Applies to both as GitButler has a forge too. Would be nice to have protocol
		improvements.
* Both bundle-uris and packfile-uris exist and at least packfile-uris are
	resumable. Both are fetch-only, so push is unsolved.
* Could use single-threaded output or server-side caching to make pushing work.
* Maybe make it so servers could receive a bundle and make that resumable.
* Use cases: Pushing a repo for the first time to a new server, once there's
	good large file support, android/chromium. Also a problem that's independent
	of size in environments with poor connectivity (some countries, Caltrain, …).
* Servers could hand out some kind of opaque data with the fetch to indicate
	what it has cached, clients can re-share that when attempting to resume and
	the server can choose to do something with it or not.
* GitHub support has told people to create a branch with N commits at a time to
	fetch.
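
To make the "opaque data" idea concrete, here is a minimal Python sketch. It
is not an existing Git protocol feature: ToyServer, fetch(), evict() and
resume() are hypothetical names, and the "pack" is just a fixed byte string.
The point it illustrates is that the server is free to honor or ignore the
token, and the client copes with either answer.

```python
import secrets
from typing import Dict, Optional, Tuple


class ToyServer:
    """Hypothetical server that may cache generated packs under opaque tokens."""

    def __init__(self) -> None:
        self._cache: Dict[str, bytes] = {}

    def generate_pack(self) -> bytes:
        # Stand-in for real pack generation (assumption: any byte string will do).
        return b"PACK" + bytes(range(256)) * 4

    def fetch(self, token: Optional[str] = None, offset: int = 0) -> Tuple[str, int, bytes]:
        """Return (token, start_offset, data beginning at start_offset)."""
        cached = self._cache.get(token) if token else None
        if token is not None and cached is not None and 0 <= offset <= len(cached):
            # Token still valid: send only the missing tail.
            return token, offset, cached[offset:]
        # Unknown or expired token: the server chooses to start over.
        data = self.generate_pack()
        new_token = secrets.token_hex(8)
        self._cache[new_token] = data
        return new_token, 0, data

    def evict(self, token: str) -> None:
        # Caching policy is the server's business; eviction can happen any time.
        self._cache.pop(token, None)


def resume(server: ToyServer, token: str, partial: bytes) -> bytes:
    """Client side: re-share the token plus how many bytes were received."""
    _, start, data = server.fetch(token, len(partial))
    # Keep only the bytes the server agreed we already have.
    return partial[:start] + data
```

If the token has been evicted, resume() silently degrades to a full
re-download, which matches the "may or may not be valid in the future" framing.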


Scrambly notes (Jack's notes):


* Specific Forge implementation, http based communication -> easier to set up,
	keen on improvement to protocol that allows large pack files sent between
	client and server
* For packfile uris at least the pack file part that is in the uri is already
	resumable, for bundle uris may not be the same, might be low-hanging fruit
* Taylor: push side more interesting: server -> already sent you first m bytes
	of x, need something to send the resumable push
* Consider implications as an attack vector
* Brian: git's pack implementation is deterministic if you don't do
	multithreading, could use a resumable mode like gzip has rsyncable mode. On
	the client side, pack to a temporary file, this is resumable with an offset,
	and since the pack is cached locally it should be something you could resume
	with push. Some possibilities if we cache on the server side or use single
	threaded output
* An idea from packfile-uris which could help solve the fetch problem: server
	provides a url to the client, let the server be the fetcher
* Emily: that would work pretty ok using a commit cloud server, already serving
	those objects. The server side can resume necessarily.
* Servers don't receive bundles, so would be adding support for server to
	receive bundles. What's the real use case for this? It's worth its own
	protocol, not just a push protocol. When we try to mirror things in Gerrit it
	fails due to large number of refs - would need an enhancement to handle large
	numbers of refs.
* Caleb: So you suggest some sort of TCP protocol for handling these transfers?
* We have users who store binaries and hit timeouts uploading to the server,
	it's not just a migration path
* Having some way of guaranteeing forward progress on a push or a pull as long
	as you can get some smaller unit of data transfer, don't know how small to go,
	but would be very useful
* We talked about chunk format before, would introducing chunk format, small
	enough chunks help?
* If it's small enough and reproducible
* Elijah: Even if you have small chunks, if they are part of the same
	communication, if they're small enough you'll need to restart it
* If you have to resume now say you have sent X chunks then you have N - X left
* Peff: All you need to know is the byte offset.
* Elijah: Take the objects that you have received and say "I have these objects"
* What if you hash what you got, "I asked for this", the hash was this length,
	give me the rest
* Peff: Has to be able to regenerate everything from scratch, are you caching
	it? Kind of wasteful
* Doesn't need to be cached, just needs to be stable, so if there was a way to
	ask for it in a specific order
* Disable multithreading
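
The "byte offset plus hash of what you got" idea above can be sketched in a
few lines of Python. This is only an illustration under assumptions:
build_stream() stands in for deterministic (single-threaded, fixed-order) pack
generation, and resume_request() / serve_resume() are hypothetical names, not
part of any Git protocol. Nothing is cached; the server regenerates the stream
and checks that the client's prefix matches before sending the rest.

```python
import hashlib
from typing import Dict, List, Tuple


def build_stream(object_ids: List[str]) -> bytes:
    """Stand-in for deterministic pack generation: fixed order, same bytes every time."""
    return b"".join(f"{oid}\n".encode() for oid in sorted(object_ids))


def resume_request(received: bytes) -> Dict[str, object]:
    """Client: describe the partial download as a length plus a hash of the prefix."""
    return {"offset": len(received), "sha256": hashlib.sha256(received).hexdigest()}


def serve_resume(object_ids: List[str], request: Dict[str, object]) -> Tuple[int, bytes]:
    """Server: regenerate the stream and send the tail only if the prefix matches."""
    stream = build_stream(object_ids)
    offset = int(request["offset"])
    prefix_ok = (
        0 <= offset <= len(stream)
        and hashlib.sha256(stream[:offset]).hexdigest() == request["sha256"]
    )
    if prefix_ok:
        return offset, stream[offset:]
    # Stream is not reproducible for this client (or the data was corrupt): restart.
    return 0, stream
```

The hash check is what makes "stable, not cached" safe: if regeneration ever
produces different bytes, the server falls back to sending everything.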


* Peff: Looked into this with resumable clones, server can pass out some cache
	tag, here's an opaque tag that may or may not be valid in the future, I got X
	bytes of this tag can you send the rest. Becomes a heuristic on the server
	"I'll choose how much to cache", git doesn't need to know about that it's an
	implementation issue
* With a pack file uri you stop what you're doing talking to the server
* If you were trying to brute force it today, you would brute force sending a
	ref
* Peff: GitHub support has told people to do that
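
For reference, the brute-force workaround can be scripted. The Python sketch
below shows the push direction and is an assumption about how one might do it,
not something prescribed in the session: checkpoints() and push_in_steps() are
hypothetical helpers, and the step size n is arbitrary. It walks the
first-parent history oldest-first and pushes every n-th commit to the target
branch, so each push is a fast-forward that transfers a bounded slice of
history.

```python
import subprocess
from typing import List


def checkpoints(commits: List[str], n: int) -> List[str]:
    """Given commits oldest-first, pick every n-th one and always end at the tip."""
    if n <= 0:
        raise ValueError("n must be positive")
    picked = commits[n - 1::n]
    if commits and (not picked or picked[-1] != commits[-1]):
        picked.append(commits[-1])
    return picked


def push_in_steps(remote: str, branch: str, n: int) -> None:
    """Push `branch` to `remote` in steps of at most n first-parent commits."""
    out = subprocess.run(
        ["git", "rev-list", "--reverse", "--first-parent", branch],
        check=True, capture_output=True, text=True,
    ).stdout
    for sha in checkpoints(out.split(), n):
        # Each step fast-forwards the remote branch to a later ancestor of the tip.
        subprocess.run(
            ["git", "push", remote, f"{sha}:refs/heads/{branch}"],
            check=True,
        )
```

If a step fails, re-running the script only has to transfer what the remote
does not already have, which is the crude form of forward progress discussed
above.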

