From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: "zhifeng hu" <zf@ancientrocklab.com>,
"Karsten Blees" <karsten.blees@gmail.com>,
"Trần Ngọc Quân" <vnwildman@gmail.com>,
"Git Mailing List" <git@vger.kernel.org>
Subject: Re: How to resume broke clone ?
Date: Thu, 28 Nov 2013 04:29:35 -0500
Message-ID: <20131128092935.GC11444@sigill.intra.peff.net>
In-Reply-To: <CACsJy8AOVWF2HssWNeYkVvYdmAXJOQ8HOehxJ0wpBFchA87ZWw@mail.gmail.com>

On Thu, Nov 28, 2013 at 04:09:18PM +0700, Duy Nguyen wrote:
> > Git should better support resuming transfers.
> > Right now it does not seem to do this job well.
> > Sharing code, managing code, transferring code: isn't that what we imagine a VCS to be?
>
> You're welcome to step up and do it. Off the top of my head there are a few options:
>
> - better integration with git bundles, provide a way to seamlessly
> create/fetch/resume the bundles with "git clone" and "git fetch"

I posted patches for this last year. One of the things I got hung
up on was that I spooled the bundle to disk and then cloned from it,
which meant that you briefly needed twice the disk space. I wanted
to teach index-pack to "--fix-thin" a pack that was already on disk, so
that we could spool to disk and then finalize the pack without making
another copy.
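Spooling to disk does have one upside for the resume problem: an interrupted download can pick up where it stopped instead of starting over. The following is a minimal sketch of that client-side step, assuming a hypothetical `fetch_range(offset)` transport hook that yields the rest of the bundle from a given byte offset (for a bundle served from a CDN, an HTTP Range request is one way such a hook could be built). Turning the spooled bundle into a repository afterwards is not shown.

```python
import os
from typing import Callable, Iterable


def resume_spool(path: str, fetch_range: Callable[[int], Iterable[bytes]]) -> int:
    """Append the rest of a bundle to `path`, keeping what is already there.

    `fetch_range` is a hypothetical hook: given a byte offset, it yields the
    remaining chunks of the bundle starting at that offset.
    Returns the final size of the spooled file.
    """
    # Bytes that survived the previous, interrupted attempt (0 if none).
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    with open(path, "ab") as out:
        for chunk in fetch_range(offset):
            out.write(chunk)
            out.flush()  # keep on-disk progress current in case we die again
    return os.path.getsize(path)
```

Because progress is measured by the file size itself, the client needs no extra state file; the cost is that a torn final write has to be detected later (e.g. by the bundle's own checksum).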

One of the downsides of this approach is that it requires the repo
provider (or somebody else) to provide the bundle. I think that is
something that a big site like GitHub would do (and probably push the
bundles out to a CDN, too, to make getting them faster). But it's not a
universal solution.

> - stabilize pack order so we can resume downloading a pack

I think stabilizing in all cases (e.g., including ones where the content
has changed) is hard, but I wonder if it would be enough to handle the
easy cases, where nothing has changed. If the server does not use
multiple threads for delta computation, it should deterministically
generate the same pack from the same on-disk state. We just need a way
for the client to indicate that it already has part of that same pack.

I'm thinking that the server would report some opaque hash representing
the current pack. The client would record that, along with the number of
pack bytes it received. If the transfer is interrupted, the client comes
back with the hash/bytes pair. The server starts to generate the pack,
checks whether the hash matches, and if so, says "here is the same pack,
resuming at byte X".
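The server's decision in that exchange is small enough to sketch. This assumes a hypothetical `ResumeToken` holding the hash/bytes pair the client saved; the names and the way the pair travels over the wire are illustrative only, since the message proposes the idea rather than a protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResumeToken:
    """What the client saves when a transfer breaks (hypothetical format)."""
    pack_id: str         # opaque hash the server reported for the pack
    bytes_received: int  # pack bytes the client has safely stored


def resume_offset(token: Optional[ResumeToken], current_pack_id: str, pack_size: int) -> int:
    """Return the byte offset at which the server should start sending.

    0 means "send the whole pack": either the client has no partial pack,
    or the pack the server would generate now differs from the one the
    client partially received (e.g. the repository changed in between).
    """
    if token is None or token.pack_id != current_pack_id:
        return 0
    if not 0 <= token.bytes_received <= pack_size:
        return 0  # nonsensical claim from the client; start over
    return token.bytes_received
```

On a match the server would send only `pack[offset:]`; a mismatch silently degrades to an ordinary full transfer, so a stale token is never harmful.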

What would need to go into such a hash? It would need to represent the
exact bytes that will go into the pack, but without actually generating
those bytes. Perhaps a sha1 over the sequence of <object sha1, type,
base (if applicable), length> for each object would be enough. We should
know all of that after calling compute_write_order. If the client has a
match, we should be able to skip ahead to the correct byte.
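A sketch of that hash, written in Python with `hashlib` for brevity rather than as the C code it would be inside pack-objects. `PackEntry` and `pack_identity` are hypothetical names; the entries are assumed to arrive in final write order, as they would after compute_write_order, and the exact meaning of "length" (e.g. delta size vs. full size) is left as an assumption.

```python
import hashlib
from typing import Iterable, NamedTuple, Optional


class PackEntry(NamedTuple):
    """One object as it will be written to the pack (hypothetical shape)."""
    oid: str             # object name, hex sha1
    obj_type: str        # type as written in the pack, e.g. "blob" or a delta type
    base: Optional[str]  # delta base oid, or None for a non-delta entry
    length: int          # entry length as recorded for the pack


def pack_identity(entries: Iterable[PackEntry]) -> str:
    """Hash the per-object metadata in write order, without producing pack bytes."""
    h = hashlib.sha1()
    for e in entries:
        # Space-separated fields and a newline per record keep the encoding
        # unambiguous, so different entry sequences give different input bytes.
        record = f"{e.oid} {e.obj_type} {e.base or '-'} {e.length}\n"
        h.update(record.encode("ascii"))
    return h.hexdigest()
```

Any change in order, delta base choice, or size changes the digest, which is exactly the property needed: if the digest matches, the bytes the server is about to stream should match too.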

> - remote alternates, the repo will ask for more and more objects as
> you need them (so goodbye to distributed model)

This is also something I've been playing with, but only for very large
objects (to support something like git-media, but below the object
graph layer). I don't think it would apply here, as the kernel has a lot
of small objects, and getting them in the tight delta'd pack format
increases efficiency a lot.
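The on-demand behaviour of a remote alternate can be sketched as a local store that falls back to the remote only for objects it does not have, then caches them. `LazyObjectStore` and the `fetch_remote` hook are hypothetical; deciding which (large) objects are left out of the initial clone is not shown.

```python
from typing import Callable, Dict


class LazyObjectStore:
    """Local object store that falls back to a remote for missing objects.

    `fetch_remote` is a hypothetical hook (oid -> bytes) standing in for
    whatever transport a remote alternate would use.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._local: Dict[str, bytes] = {}
        self._fetch_remote = fetch_remote
        self.remote_fetches = 0  # how many times we had to go to the network

    def add(self, oid: str, data: bytes) -> None:
        self._local[oid] = data

    def read(self, oid: str) -> bytes:
        data = self._local.get(oid)
        if data is None:
            # Only reached for objects that were never downloaded.
            data = self._fetch_remote(oid)
            self.remote_fetches += 1
            self._local[oid] = data  # cache so the next read stays local
        return data
```

This also shows why the approach suits a few huge blobs rather than the kernel's many small objects: each miss is a separate round trip, and objects fetched one at a time lose the delta compression a pack provides.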
-Peff