From: Sam Vilain <sam@vilain.net>
To: Luke Kenneth Casson Leighton <luke.leighton@gmail.com>
Cc: Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
Nicolas Pitre <nico@fluxnic.net>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: Resumable clone/Gittorrent (again)
Date: Sun, 16 Jan 2011 15:11:36 +1300
Message-ID: <4D3253D8.8030302@vilain.net>
In-Reply-To: <AANLkTi=8s4mjreC1yiJi4R5Y3G6_kErmBfk0B9ALcBO8@mail.gmail.com>
On 15/01/11 03:26, Luke Kenneth Casson Leighton wrote:
>>> and change that graph? are you _certain_ that you can write an
>>> algorithm which is capable of generating exactly the same mapping,
>>> even as more commits are added to the repository being mirrored, or,
>>> does that situation not matter?
>> For a given set of start and end points, and a given sort algorithm,
>> walking the commit tree can yield deterministic results.
> excellent. out of curiosity, is it as efficient as git pack-objects
> for the same start and end points?
That isn't a sensible question; walking the revision tree is something
that many commands, including git pack-objects, do internally.
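To make the determinism point concrete, here is a minimal sketch on a toy
history. It is not git's or gittorrent's code: the graph layout, the
function names and the "newest first, ties broken by id" ordering are all
assumptions chosen for illustration. The point is only that a fixed set of
start and end points plus a fixed sort gives the same list every time, even
after newer commits are added on top.

```python
import heapq

def reachable(graph, tips):
    """Return every commit id reachable from the given tips (inclusive)."""
    seen = set()
    stack = list(tips)
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(graph[cid][1])
    return seen

def walk(graph, starts, ends):
    """List commits reachable from `starts` but not from `ends`,
    newest first, with ties broken by commit id."""
    excluded = reachable(graph, ends)
    heap = [(-graph[c][0], c) for c in set(starts) if c not in excluded]
    heapq.heapify(heap)
    seen = {c for _, c in heap}
    order = []
    while heap:
        _, cid = heapq.heappop(heap)
        order.append(cid)
        for parent in graph[cid][1]:
            if parent not in seen and parent not in excluded:
                seen.add(parent)
                heapq.heappush(heap, (-graph[parent][0], parent))
    return order

# Hypothetical toy history: id -> (timestamp, parent ids)
GRAPH = {
    "a": (1, []),
    "b": (2, ["a"]),
    "c": (3, ["a"]),
    "d": (4, ["b", "c"]),
    "e": (5, ["d"]),  # a later commit; it does not change the walk from "d"
}

print(walk(GRAPH, ["d"], ["a"]))
```

Walking from "d" down to (but excluding) "a" lists the same commits in the
same order whether or not "e" exists, which is the property a stable
mapping needs.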
>> Did you look at any of the previous research I linked to before?
> i've been following this since you first originally started it, sam
> :) it would have been nice if it was a completed implementation
> that i could test and see "for real" what you're referring to (above)
> - the fact that it's in perl and has "TODO" at some of the critical
> points, after trying to work with it for several days i stopped and
> went "i'm not getting anywhere with this" and focussed on bittorrent
> "as a black box" instead.
>
> if i recall, the original gittorrent work that you did (mirror-sync),
> the primary aim was to rely solely and exclusively on a one-to-one
> direct link between one machine and another. in other words, whilst
> syncing, if that peer went "offline", you're screwed - you have to
> start again. is that a fair assessment? please do correct any
> assumptions that i've made.
Ok. Well, first off - I didn't start gittorrent; Jonas Fonseca did, as
his Master's thesis. Criticism about not having a completed
implementation to work with is therefore shared between him and the
people who have come along since, such as myself.
I don't know why you got the idea that the protocol is one to one. It's
one to one only in the sense that BitTorrent is - every communication is
between two nodes that share information about what they have and what
they need before transferring data. It is supposed to be restartable,
and it is not supposed to matter which node data is exchanged with. In
that way, you could in principle download from multiple nodes at once,
or you could have restartable transfers. If you lose connectivity then
the most that should have to be re-transferred is the incomplete blocks.
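A rough sketch of that have/need idea, under loud assumptions: this is not
the gittorrent wire protocol, the block numbering is made up, and each
"peer" is just a dict of the blocks it holds. It only shows that completed
blocks are kept across sessions and that it does not matter which peer
supplies a given block.

```python
def fetch_missing(store, total_blocks, peers):
    """Copy every block still missing from `store` from any peer that has it.

    `store` and each peer are dicts mapping block index -> bytes
    (a hypothetical stand-in for real storage and network peers).
    Returns the block indices fetched in this session.
    """
    fetched = []
    for index in range(total_blocks):
        if index in store:
            continue  # already complete, so it is never re-transferred
        for peer in peers:
            if index in peer:
                store[index] = peer[index]
                fetched.append(index)
                break
    return fetched

peer_a = {0: b"aa", 1: b"bb"}
peer_b = {1: b"bb", 2: b"cc", 3: b"dd"}

store = {}
first = fetch_missing(store, 4, [peer_a])   # only peer_a is reachable
second = fetch_missing(store, 4, [peer_b])  # peer_a is gone; resume with peer_b
print(first, second, sorted(store))
```

The second session picks up only the blocks the first one did not finish,
from a different peer, without starting again.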
> because on the basis _of_ that assumption, i decided not to proceed
> with mirror-sync, instead to pursue a "cache git pack-objects"
> approach and to use bittorrent "black-box-style". which i
> demonstrated (minus the cacheing) works perfectly well, several months
> back.
Right, but as others have noted, there are significant drawbacks with
this approach. For a start, to participate in such a network, you need
to get the particular exact pack that is currently being torrented; just
having a clone is not enough. This is because the result of git
pack-objects is not repeatable.
That being said, for many projects that would be an acceptable compromise
for the advantages of a restartable clone. That is why I suggest that a
torrent transfer, treated as a mirror which is infrequently updated, may
be a better approach than trying to overly automate everything.
> as well, after nicolas and others went to all the trouble to explain
> what git pack-objects is, how it works, and how damn efficient it is,
> i'm pretty much convinced that an approach to uniquely identify, then
> pick and cache the *best* git pack-object made [by all the peers
> requested to provide a particular commit range], is the best, most
> efficient - and importantly simplest and easiest to understand -
> approach so far that i've heard. perhaps that's because i came up
> with it, i dunno :) but the important thing is that i can _show_ that
> it works (http://gitorious.org/python-libbittorrent/pybtlib - go back
> a few revisions)
That's great. If you want to continue this simple approach and ignore
the gittorrent/mirror-sync path altogether, that's fine too.
Trying to determine the "best" pack-object may be counter-productive.
Here's a simple approach which allows the repository owner to easily
arrange for efficient torrenting of essential object files:
Add to the .torrent manifest just these files:
  .git/objects/pack/pack-*.pack - just the files with .keep files
  .git/packed-refs - just the references which are completely
    available via the .keep packs
In that way, a repository owner can periodically re-pack their repo,
mark the new pack files as .keep, then re-generate the .torrent file.
All nodes will just have to transfer the new packs, not everything.
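Selecting the manifest files could look something like the sketch below.
The function names are hypothetical, it assumes the usual layout where
pack-X.pack sits next to pack-X.keep, and it includes packed-refs whole
rather than filtering it down to the refs covered by the kept packs, which
would need an extra step. Building the actual .torrent file is left to
whatever tool you already use.

```python
from pathlib import Path

def kept_packs(git_dir):
    """Return pack files in `git_dir` that have a matching .keep file."""
    pack_dir = Path(git_dir) / "objects" / "pack"
    return sorted(
        p for p in pack_dir.glob("pack-*.pack")
        if p.with_suffix(".keep").exists()
    )

def manifest_files(git_dir):
    """List the paths (relative to `git_dir`) to put in the manifest.

    Note: packed-refs is included unfiltered here; restricting it to
    refs fully available via the kept packs is not implemented.
    """
    git_dir = Path(git_dir)
    files = kept_packs(git_dir)
    packed_refs = git_dir / "packed-refs"
    if packed_refs.exists():
        files.append(packed_refs)
    return [f.relative_to(git_dir).as_posix() for f in files]

if __name__ == "__main__":
    # Assumes the script is run from the top of a non-bare working tree.
    for name in manifest_files(".git"):
        print(name)
```

After a re-pack, only packs newly marked with .keep change this list, so
existing nodes would only need to fetch those.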
> so - perhaps it would help if mirrorsync was revived, so that it can
> be used to demonstrate what you mean (there aren't any instructions on
> how to set up mirrorsync, for example). that would then allow people
> to do a comparative analysis of the approaches being taken.
Ok, that sounds like a good plan - I'll see if I can devote some time to
an explanatory series with working examples with reference to the
existing code etc over the coming months.
Cheers,
Sam