From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Subject: [NOTES 08/11] Resumable fetch / push
Date: Mon, 6 Oct 2025 15:20:12 -0400
Message-ID: <aOQWbHzstGKiPUnc@nand.local>
In-Reply-To: <aOQVeVYY6zadPjln@nand.local>
Topic: Resumable fetch/push
Leader: Caleb (was Scott, but he's not here)
* Is this only client side or server side too?
* Applies to both as GitButler has a forge too. Would be nice to have protocol
improvements.
* Both bundle-uris and packfile-uris exist and at least packfile-uris are
resumable. Both are fetch-only, so push is unsolved.
* Could use single-threaded output or server-side caching to make pushing work.
* Maybe make it so servers could receive a bundle and make that resumable.
* Use cases: Pushing a repo for the first time to a new server, once there's
good large file support, android/chromium. Also a problem that's independent
of size in environments with poor connectivity (some countries, Caltrain, …).
* Servers could hand out some kind of opaque data with the fetch to indicate
what it has cached, clients can re-share that when attempting to resume and
the server can choose to do something with it or not.
* GitHub support has told people to create a branch with N commits at a time to
fetch.
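
The opaque-data idea above could look roughly like the following Python
sketch. Everything in it is an assumption made for illustration: the
ResumeCache class, its method names, and the in-memory dict are hypothetical
and are not part of Git or of any forge's API. It only shows the shape of the
exchange: the server hands out a tag, the client hands it back with the number
of bytes it already has, and the server is free to honor it or not.

```python
import secrets


class ResumeCache:
    """Hypothetical server-side cache keyed by an opaque tag."""

    def __init__(self):
        # tag -> pack bytes the server decided to keep around
        self._packs = {}

    def start_fetch(self, pack_bytes, keep=True):
        # The tag is opaque to the client; it carries no meaning on its own.
        tag = secrets.token_hex(16)
        if keep:
            self._packs[tag] = pack_bytes
        return tag, pack_bytes

    def resume(self, tag, offset):
        # The server may have dropped the entry. Returning None tells the
        # client to fall back to an ordinary fetch from scratch.
        pack = self._packs.get(tag)
        if pack is None or offset > len(pack):
            return None
        return pack[offset:]

    def evict(self, tag):
        # Caching policy is entirely up to the server.
        self._packs.pop(tag, None)
```

A client that received the first 40 bytes before the connection dropped would
call resume(tag, 40) and append the result to what it has. If the server
returns nothing, the client starts over.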
Scrambly notes (Jack's notes):
* Specific Forge implementation, http based communication -> easier to set up,
keen on improvement to protocol that allows large pack files sent between
client and server
* For packfile URIs at least, the packfile part that is in the URI is already
resumable; for bundle URIs that may not be the case, might be low-hanging fruit
* Taylor: push side more interesting: server -> already sent you first m bytes
of x, need something to send the resumable push
* Consider implications as an attack vector
* Brian: git's pack implementation is deterministic if you don't do
multithreading, could use a resumable mode like gzip has its rsyncable mode. On
the client side, pack to a temporary file; this is resumable with an offset,
and since the pack is cached locally it should be something you could resume
with push. Some possibilities if we cache on the server side or use
single-threaded output
* An idea from packfile URIs which could help solve the fetch problem: server
provides a URL to the client, let the server be the fetcher
* Emily: that would work pretty OK using a commit cloud server, already serving
those objects. The server side can resume as necessary.
* Servers don't receive bundles, so would be adding support for server to
receive bundles. What's the real use case for this? It's worth its own
protocol, not just a push protocol. When we try to mirror things in Gerrit it
fails due to large number of refs - would need an enhancement to handle large
numbers of refs.
* Caleb: So you suggest some sort of TCP protocol for handling these transfers?
* We have user-stored binaries that time out when uploading to the server; it's
not just a migration path
* Having some way of guaranteeing forward progress on a push or a pull as long
as you can get some smaller unit of data transfer, don't know how small to go,
but would be very useful
* We talked about a chunk format before; would introducing a chunk format with
small enough chunks help?
* If it's small enough and reproducible
* Elijah: Even if you have small chunks, if they are part of the same
communication, if they're small enough you'll need to restart it
* If you have to resume now say you have sent X chunks then you have N - X left
* Peff: All you need to know is the byte offset.
* Elijah: Take the objects that you have received and say "I have these objects"
* What if you hash what you got, "I asked for this", the hash was this length,
give me the rest
* Peff: Has to be able to regenerate everything from scratch, are you caching
it? Kind of wasteful
* Doesn't need to be cached, just needs to be stable, so if there was a way to
ask for it in a specific order
* Disable multithreading
* Peff: Looked into this with resumable clones, server can pass out some cache
tag, here's an opaque tag that may or may not be valid in the future, I got X
bytes of this tag can you send the rest. Becomes a heuristic on the server
"I'll choose how much to cache", git doesn't need to know about that it's an
implementation issue
* With a pack file uri you stop what you're doing talking to the server
* If you were trying to brute force it today, you would brute force sending a
ref
* Peff: GitHub support has told people to do that
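
The "hash what you got, give me the rest" and "just needs to be stable"
points above can be sketched in Python as follows. This is a toy under
stated assumptions: generate_pack is a hypothetical stand-in for a
deterministic (for example single-threaded) pack generation and does not
produce a real packfile, and serve_resume and prefix_digest are names made up
for this example. Nothing is cached; the server regenerates the stream, checks
that the client's prefix matches, and sends the remainder from the byte
offset.

```python
import hashlib


def generate_pack(want):
    # Stand-in for deterministic pack generation: the same request always
    # yields the same bytes. Not a real packfile.
    return b"".join(
        hashlib.sha256(f"{want}:{i}".encode()).digest() for i in range(64)
    )


def prefix_digest(data):
    # Hash of the bytes the client has received so far.
    return hashlib.sha256(data).hexdigest()


def serve_resume(want, offset, client_digest):
    """Regenerate the stream and return the bytes after offset.

    Returns None if the offset is out of range or the regenerated prefix
    does not match what the client says it has, in which case the client
    has to start over.
    """
    pack = generate_pack(want)
    if offset > len(pack):
        return None
    if prefix_digest(pack[:offset]) != client_digest:
        return None
    return pack[offset:]
```

The prefix check is what guards against the stream having changed between
attempts (new objects, different ordering); without a stable order the digest
would not match and the resume would be refused.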