git.vger.kernel.org archive mirror
From: Luke Kenneth Casson Leighton <luke.leighton@gmail.com>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: Nicolas Pitre <nico@fluxnic.net>, Git Mailing List <git@vger.kernel.org>
Subject: Re: Resumable clone/Gittorrent (again)
Date: Sun, 9 Jan 2011 13:55:04 +0000
Message-ID: <AANLkTinwb8orMBjcQjK0ogXd6rMEtRwT8SV41k8D3AXL@mail.gmail.com>
In-Reply-To: <AANLkTi=KPVMEviQhyJeWHynPa2q6NJpQ2VyAhbRcmQ1D@mail.gmail.com>

On Sun, Jan 9, 2011 at 3:34 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Sun, Jan 9, 2011 at 12:21 AM, Luke Kenneth Casson Leighton
> <luke.leighton@gmail.com> wrote:
>>  ok - you haven't answered the question: are the chains perfectly
>> fixed identical sizes?
>
> No.
>
>>  if so they can be slotted into the bittorrent protocol by simply
>> pre-selecting the size to match.  with the downside that if there are
>> a million such "chains" you now pretty much overwhelm the peers with
>> the amount of processing, network traffic and memory requirements to
>> maintain the "pieces" map.
>
> No, there are thousands of them only (less than 100k for repos I
> examined). It's precisely the reason I stay away from commits as
> pieces because commits can potentially go up to millions.

 ok - thousands is still a lot.  i recommend that you examine:

 * the heuristics algorithm in bittorrent for piece-selection
 * large repositories such as webkit (1.2GB) and the linux kernel (600MB)

 you still have to come up with a mapping from "chains" to "pieces".
in the bittorrent protocol the mapping is done *entirely* implicitly
and algorithmically.  the "meta" info in the .torrent contains
filenames and file lengths.  stack the files one after the other in a
big long data block, get a chopper and just go "whack, whack, whack"
at regular piece-length intervals: those are your "pieces".  so,
reassembly is a complete bitch, and picking just _one_ file to
download rather than the whole lot becomes a total pain.
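
The implicit mapping described above can be sketched in a few lines. This is
only an illustration under stated assumptions: the function names
`piece_spans` and `pieces_for_file`, and the tiny file list, are made up for
the example and are not taken from any bittorrent client.

```python
from typing import List, Tuple

FileEntry = Tuple[str, int]        # (filename, length in bytes), as in the metainfo
Span = Tuple[str, int, int]        # (filename, offset within file, byte count)


def piece_spans(files: List[FileEntry], piece_length: int) -> List[List[Span]]:
    """Concatenate the files conceptually and cut every piece_length bytes.

    Returns, for each piece, the file slices that the piece covers.
    Only the last piece may be shorter than piece_length.
    """
    total = sum(length for _, length in files)
    pieces = []
    start = 0
    while start < total:
        end = min(start + piece_length, total)
        spans = []
        file_start = 0
        for name, length in files:
            file_end = file_start + length
            lo = max(start, file_start)
            hi = min(end, file_end)
            if lo < hi:  # this file overlaps the current piece
                spans.append((name, lo - file_start, hi - lo))
            file_start = file_end
        pieces.append(spans)
        start = end
    return pieces


def pieces_for_file(files: List[FileEntry], piece_length: int, wanted: str) -> List[int]:
    """Indices of every piece that holds at least one byte of the wanted file."""
    return [
        index
        for index, spans in enumerate(piece_spans(files, piece_length))
        if any(name == wanted for name, _, _ in spans)
    ]


# Hypothetical example data: three small files, 4-byte pieces.
example_files = [("a", 5), ("b", 3), ("c", 4)]
for index, spans in enumerate(piece_spans(example_files, 4)):
    print(index, spans)
```

With this data the middle piece straddles "a" and "b", so fetching only "b"
still means fetching (and verifying) a piece that also carries a byte of "a",
which is the reassembly and single-file pain referred to above.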

why the bloody hell the bittorrent protocol doesn't just have a file
id i _really_ don't know, it would have made things a damn sight
easier.  anyway - if you're going to modify and "be inspired by" the
bittorrent protocol, you really should look at adding some sort of
"chain" identification - f*** the "chains"-to-"pieces" algorithm, just
add a unique chain id to the relevant bittorrent[-like] command.
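
As a rough sketch of what "add a unique chain id to the command" could look
like on the wire: everything below is hypothetical. The message id
`REQUEST_CHAIN`, the field layout, and the choice of a 20-byte SHA-1 as the
chain id are assumptions made for illustration; none of it is part of the
bittorrent protocol or of any existing gittorrent code.

```python
import hashlib
import struct

# Hypothetical message id for a "request by chain id" command.
REQUEST_CHAIN = 0x40

# Payload layout (assumed): 1-byte message id, 20-byte chain id,
# 4-byte offset into the chain, 4-byte length, all big-endian.
_PAYLOAD = struct.Struct(">B20sII")


def pack_chain_request(chain_id: bytes, begin: int, length: int) -> bytes:
    """Build a length-prefixed request that names the chain directly."""
    if len(chain_id) != 20:
        raise ValueError("chain id must be 20 bytes")
    payload = _PAYLOAD.pack(REQUEST_CHAIN, chain_id, begin, length)
    return struct.pack(">I", len(payload)) + payload


def unpack_chain_request(message: bytes):
    """Parse a message built by pack_chain_request; returns (chain_id, begin, length)."""
    (size,) = struct.unpack(">I", message[:4])
    payload = message[4:4 + size]
    if len(payload) != _PAYLOAD.size:
        raise ValueError("unexpected payload size")
    msg_id, chain_id, begin, length = _PAYLOAD.unpack(payload)
    if msg_id != REQUEST_CHAIN:
        raise ValueError("not a chain request")
    return chain_id, begin, length


# Hypothetical usage: identify a chain by the SHA-1 of some name for it.
example_id = hashlib.sha1(b"example chain").digest()
wire = pack_chain_request(example_id, 0, 16384)
print(unpack_chain_request(wire) == (example_id, 0, 16384))
```

Because the peer names the chain explicitly, chains of different lengths need
no "chains"-to-"pieces" arithmetic at all; the cost is a larger request than
one carrying a small piece index.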


>>  if not then you now need to modify the bittorrent protocol to cope
>> with variable-length block sizes: the protocol only allows for the
>> last block to be of variable-length.
>
> Ah I see. I do not reuse bittorrent code out there. Just its ideas,
> adapted to git model.

 that's hard work and you're now into "unproven" territory.  for the
successful R&D proof-of-concept code that i wrote, i _deliberately_
stayed away from "adapting" a proven bittorrent protocol, and as a
result managed to get that proof-of-concept up and running within ...
i think it was... 3 days.  most of the time was spent arseing about
adding a VFS layer to bittornado, in order to turn it into a library.

i mention that just to give you something to think about.  if you're
up to the challenge of writing your own p2p protocol, however, GREAT!
you'll become a world expert on _both_ peer-to-peer protocols _and_
git :)

 l.
