From: Maaartin <grajcar1@seznam.cz>
To: git@vger.kernel.org
Subject: Re: Resumable clone/Gittorrent (again)
Date: Wed, 5 Jan 2011 23:28:11 +0000 (UTC)
Message-ID: <loom.20110105T222915-261@post.gmane.org>
In-Reply-To: AANLkTinUV9Z_w85Gz13J+bm8xqnxJ9jBJXJm9bn5Y2ec@mail.gmail.com

Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:

> I've been analyzing the bittorrent protocol and came up with this. The
> last idea about a similar thing [1], gittorrent, was given by Nicolas.
> This keeps close to that idea (i.e. the transfer protocol must be around git
> objects, not file chunks) with a slight difference.
>
> The idea is to transfer a chain of objects (trees or blobs), including
> base object and delta chain. Objects are chained according to
> worktree layout, e.g. all objects of path/to/any/blob will form a
> chain, from a commit tip down to the root commits. Chains can have
> gaps, and don't need to start from commit tip. The transfer is
> resumable because if a delta chain is corrupt at some point, we can
> just request another chain from where it stops. Base object is
> obviously resumable.

I may be talking nonsense, please bear with me.

I'm not sure if it works well, since chains defined this way change over time. 
I may request commits A and B while declaring to possess commits C and D. One 
server may be ahead of A, so should it send me more data or repack the chain so 
that the non-requested versions get excluded? At the same time the server may 
be missing B and possess only some ancestors of it. Should it send me only a
part of the chain or should I better ask a different server?

Moreover, in case a directory gets renamed, the content may get transferred
needlessly. This is probably not a big problem.

I haven't read the whole other thread yet, but what about going the other way 
round? Use a single commit as a chain, create deltas assuming that all 
ancestors are already available. The packs may arrive out of order, so the 
decompression may have to wait. The number of commits may be one order of 
magnitude larger than the number of paths (there are currently 2254 paths
and 24235 commits in git.git), so grouping consecutive commits into one larger
pack may be useful.
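
A minimal sketch of the "decompression may have to wait" point, assuming a
purely linear history in which pack i can only be unpacked once packs 0..i-1
have been unpacked. The function name and the integer pack numbering are
hypothetical, chosen only for illustration; nothing here is an existing git
interface.

```python
def unpack_schedule(arrivals):
    """For each arriving pack, list the packs that become unpackable.

    Assumption (hypothetical model): packs are numbered 0, 1, 2, ... and
    pack i needs every earlier pack to be unpacked first.
    """
    waiting = set()      # packs received but still blocked on a predecessor
    next_needed = 0      # lowest pack number not yet unpacked
    schedule = []
    for pack in arrivals:
        waiting.add(pack)
        ready = []
        # Drain every pack whose predecessors are now all present.
        while next_needed in waiting:
            waiting.remove(next_needed)
            ready.append(next_needed)
            next_needed += 1
        schedule.append(ready)
    return schedule


if __name__ == "__main__":
    # Packs 2 and 4 arrive early and have to wait for 0/1 and 3.
    print(unpack_schedule([2, 0, 1, 4, 3]))
```

In this model the client only needs to buffer the packs that arrived early;
everything else is unpacked as soon as it comes in.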

The advantage is that the packs stay stable over time; you may create them
using the most aggressive and time-consuming settings and store them forever. 
You could create packs for single commits, packs for non-overlapping 
consecutive pairs of them, for non-overlapping pairs of pairs, etc. I mean with 
commits numbered 0, 1, 2, ... create packs [0,1], [2,3], ..., [0,3], [4,7], 
etc. The reason for this is obviously to allow reading groups of commits from 
different servers so that they fit together (similar to Buddy memory 
allocation). Of course, there are things like branches bringing chaos in this 
simple scheme, but I'm sure this can be solved somehow.
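
The pairing scheme above can be sketched as follows, under the simplifying
assumption that commits really can be numbered 0, 1, 2, ... along a single
line of history (branches are ignored). The function name is hypothetical.
It splits a requested range of commits into the aligned power-of-two blocks
that every server would have packed identically, much like buddy allocation
splits an address range.

```python
def buddy_blocks(first, last):
    """Split the inclusive range [first, last] into aligned 2^k-sized blocks.

    Each returned (start, end) pair is one of the precomputed packs
    [0,1], [2,3], [0,3], [4,7], ... or a single commit.
    """
    blocks = []
    start = first
    while start <= last:
        remaining = last - start + 1
        if start == 0:
            # Zero is aligned to every power of two; start above the range.
            size = 1 << remaining.bit_length()
        else:
            # Lowest set bit of start = largest block aligned at start.
            size = start & -start
        # Shrink until the block fits inside the requested range.
        while size > remaining:
            size //= 2
        blocks.append((start, start + size - 1))
        start += size
    return blocks


if __name__ == "__main__":
    print(buddy_blocks(2, 9))
    print(buddy_blocks(0, 7))
```

A client missing commits 2..9 would then ask for the blocks [2,3], [4,7] and
[8,9], and each of them could come from a different server.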

Another problem is the client requesting commits A and B while declaring to 
possess commits C and D. When both C and D are ancestors of either A or B, you 
can ignore it (as you assume this while packing, anyway). The other case is 
less probable, unless e.g. C is the master and A is a development branch.
Currently, I have no idea how to optimize this or whether it could be
important.
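
The "can be ignored" case above can be checked with a plain graph walk. This
is only a sketch: the commit graph is modelled as a hypothetical dict mapping
each commit name to a list of its parents, and the function names are made up
for illustration rather than taken from git.

```python
def ancestors(parents, tips):
    """Return the tips plus everything reachable from them via parent links."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def haves_can_be_ignored(parents, wants, haves):
    """True if every declared 'have' is an ancestor of some wanted commit."""
    return set(haves) <= ancestors(parents, wants)


if __name__ == "__main__":
    # Toy history: R is the root, C and D sit below A and B, E is a side branch.
    toy = {"A": ["C"], "B": ["D"], "C": ["R"], "D": ["R"], "E": ["R"], "R": []}
    print(haves_can_be_ignored(toy, ["A", "B"], ["C", "D"]))  # ancestors only
    print(haves_can_be_ignored(toy, ["A", "B"], ["E"]))       # side branch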

I see no disadvantage when compared to path-based chains, but am probably 
overlooking something obvious.

Thread overview: 25+ messages
2011-01-05 16:23 Resumable clone/Gittorrent (again) Nguyen Thai Ngoc Duy
2011-01-05 16:56 ` Luke Kenneth Casson Leighton
2011-01-05 17:13   ` Thomas Rast
2011-01-05 18:07     ` Luke Kenneth Casson Leighton
2011-01-06  1:47       ` Nguyen Thai Ngoc Duy
2011-01-06 17:50         ` Luke Kenneth Casson Leighton
2011-01-05 23:28 ` Maaartin [this message]
2011-01-06  1:32   ` Nguyen Thai Ngoc Duy
2011-01-06  3:34     ` Maaartin-1
2011-01-06  6:36       ` Nguyen Thai Ngoc Duy
2011-01-08  1:04         ` Maaartin-1
2011-01-08  2:40           ` Nguyen Thai Ngoc Duy
2011-01-07  3:21 ` Nicolas Pitre
2011-01-07  6:34   ` Nguyen Thai Ngoc Duy
2011-01-07 15:59   ` Luke Kenneth Casson Leighton
2011-01-08  2:17     ` Nguyen Thai Ngoc Duy
2011-01-08 17:21       ` Luke Kenneth Casson Leighton
2011-01-09  3:34         ` Nguyen Thai Ngoc Duy
2011-01-09 13:55           ` Luke Kenneth Casson Leighton
2011-01-09 17:48             ` Nguyen Thai Ngoc Duy
2011-01-13 11:39               ` Luke Kenneth Casson Leighton
2011-01-13 23:40                 ` Sam Vilain
2011-01-14 14:26                   ` Luke Kenneth Casson Leighton
2011-01-16  2:11                     ` Sam Vilain
2011-01-10 21:38         ` Sam Vilain
