From: Josh Triplett <josh@joshtriplett.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Duy Nguyen <pclouds@gmail.com>, Al Viro <viro@zeniv.linux.org.uk>,
Stefan Beller <sbeller@google.com>,
"git@vger.kernel.org" <git@vger.kernel.org>,
sarah@thesharps.us
Subject: Re: Resumable git clone?
Date: Wed, 2 Mar 2016 08:41:18 -0800
Message-ID: <20160302164118.GA13732@x>
In-Reply-To: <xmqq4mcp5lij.fsf@gitster.mtv.corp.google.com>

On Wed, Mar 02, 2016 at 12:31:16AM -0800, Junio C Hamano wrote:
> Josh Triplett <josh@joshtriplett.org> writes:
> > I think several simpler optimizations seem
> > preferable, such as binary object names, and abbreviating complete
> > object sets ("I have these commits/trees and everything they need
> > recursively; I also have this stack of random objects.").
>
> Given the way pack stream is organized (i.e. commits first and then
> trees and blobs that belong to the same delta chain together), and
> our assumed goal being to salvage objects from an interrupted
> transfer of a packfile, you are unlikely to ever see "I have these
> commits/trees and everything they need" that are salvaged from such
> a failed transfer. So I doubt such an optimization is worth doing.

True for the resumable clone case. For that optimization, I was
thinking of the "pull during the merge window" case that Al Viro was
also interested in optimizing.
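
A minimal sketch of the abbreviated "have" list from the quoted proposal ("these commits/trees and everything they need recursively" plus "a stack of random objects"). It assumes a hypothetical in-memory `children` map from each object to the objects it references, rather than a real object store, and `abbreviate_haves` is an illustrative name. An object counts as complete when it and everything it references are present; complete objects that no other complete object references become the roots, and everything else is listed individually.

```python
from typing import Dict, List, Set, Tuple


def abbreviate_haves(have: Set[str],
                     children: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Split `have` into complete roots and loose objects.

    `children` is a hypothetical in-memory map from each object name to the
    names it references (commit -> parents and tree, tree -> entries).
    """
    memo: Dict[str, bool] = {}

    def is_complete(obj: str) -> bool:
        # Complete = present locally, and so is everything it references.
        # Recursive for brevity; deep histories would need an iterative walk.
        if obj not in memo:
            memo[obj] = obj in have and all(
                is_complete(child) for child in children.get(obj, []))
        return memo[obj]

    complete = {obj for obj in have if is_complete(obj)}
    # A complete object referenced by another complete object is implied by it.
    implied = {child for obj in complete for child in children.get(obj, [])}
    roots = complete - implied
    # Whatever lacks a complete closure must be listed one by one.
    loose = have - complete
    return roots, loose


# Toy graph (hypothetical names): blob b3 was never received.
children = {"c2": ["c1", "t2"], "c1": ["t1"], "t1": ["b1"],
            "t2": ["b1", "b2"], "t3": ["b3"]}
have = {"c2", "c1", "t1", "t2", "b1", "b2", "t3", "b4"}
roots, loose = abbreviate_haves(have, children)
print(sorted(roots), sorted(loose))  # ['b4', 'c2'] ['t3']
```

Instead of naming eight objects, the client would say "c2 and b4, with everything they need" plus the single loose tree t3. Note that the walk visits every object the client has, which is the fsck-like cost raised in the reply below.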
> Besides it is very expensive to compute (the computation is done on
> the client side, so the cycles burned and the time the user has to
> wait is of much less concern, though); you'd essentially be doing
> "git fsck" to find the "dangling" objects.

Trading client-side computation for bandwidth can potentially be
worthwhile if you have plenty of local compute but a slow and metered
link.
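
On the bandwidth side, the "binary object names" idea from earlier in the thread is easy to quantify: a SHA-1 object name is 20 bytes, but 40 characters in hex. A sketch, assuming a plain newline-separated hex list as the baseline (not git's actual wire framing); `to_hex_text`, `to_binary` and `from_binary` are hypothetical helpers.

```python
from typing import List


def to_hex_text(names: List[str]) -> bytes:
    # Baseline: one 40-hex name per line, 41 bytes per object.
    return "".join(name + "\n" for name in names).encode("ascii")


def to_binary(names: List[str]) -> bytes:
    # Each 40-hex SHA-1 name packs into 20 raw bytes.
    out = bytearray()
    for name in names:
        raw = bytes.fromhex(name)
        if len(raw) != 20:
            raise ValueError(f"not a SHA-1 name: {name!r}")
        out += raw
    return bytes(out)


def from_binary(blob: bytes) -> List[str]:
    # Split back into 20-byte chunks and render each as lowercase hex.
    if len(blob) % 20:
        raise ValueError("length is not a multiple of 20 bytes")
    return [blob[i:i + 20].hex() for i in range(0, len(blob), 20)]
```

For the two well-known names of the empty blob (e69de29bb2d1d6434b8b29ae775ad8c2e48c5391) and the empty tree (4b825dc642cb6eb9a060e54bf8d69288fbee4904), the binary form is 40 bytes against 82 bytes of hex text.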
> The list of what would be transferred needs to come in full from the
> server end, as the list names objects that the receiving end may not
> have seen, but the response by the client could be encoded much
> tightly. For the full list of N objects from the server, we can
> think of your response to be a bitstream of N bits, each on-bit in
> which signals an unwanted object in the list. You can optimize this
> transfer by RLE compressing the bitstream, for example.
>
> As git-over-HTTP is stateless, however, you cannot assume that the
> server side remembers what it sent to the client (instead, the
> client side needs to re-post what it heard from the server in the
> previous exchange to allow the server side to use it after
> validating). So "objects at these indices in your list" kind of
> optimization may not work very well in that environment. I'd
> imagine that an exchange of "Here are the list of objects", "Give me
> these objects" done naively in full 40-hex object names would work
> OK there, though.

Good point. Between statelessness and Duy's point about the client list
usually being smaller than the server list, perhaps it would make sense
to not have the server send a list at all, and just have the client send
its own list.
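
A sketch of the RLE-compressed response bitstream Junio describes: one bit per object in the server's list, set when the client does not want that object. It assumes the runs alternate starting with a (possibly empty) run of 0s and are written as LEB128-style varints; both choices, and the function names, are illustrative rather than anything git implements.

```python
from typing import List


def rle_encode(bits: List[int]) -> List[int]:
    # Alternating run lengths, starting with a (possibly empty) run of 0s.
    runs: List[int] = []
    current, length = 0, 0
    for bit in bits:
        if bit == current:
            length += 1
        else:
            runs.append(length)
            current, length = bit, 1
    runs.append(length)
    return runs


def rle_decode(runs: List[int]) -> List[int]:
    # Even-indexed runs are 0s (wanted), odd-indexed runs are 1s (unwanted).
    bits: List[int] = []
    for i, n in enumerate(runs):
        bits.extend([i % 2] * n)
    return bits


def varint(n: int) -> bytes:
    # 7 bits per byte, high bit set on every byte except the last.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_response(bits: List[int]) -> bytes:
    return b"".join(varint(run) for run in rle_encode(bits))
```

For a 10,000-object list where the client already holds objects 0-2999 and 7000-7099, the runs are [0, 3000, 4000, 100, 2900], which encode in 8 bytes, against 1,250 bytes for a raw bitmap and 124,000 bytes for 3,100 names in 40-hex. The catch is the one raised above: the bits only mean something relative to the exact order of the server's list, so over stateless HTTP the client would have to re-post that list, which is what makes a client-only list attractive.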

- Josh Triplett