From: Kyle Moffett <kyle@moffetthome.net>
To: Theodore Tso <tytso@mit.edu>
Cc: Nicolas Pitre <nico@fluxnic.net>,
	Junio C Hamano <gitster@pobox.com>,
	Luke Kenneth Casson Leighton <luke.leighton@gmail.com>,
	git <git@vger.kernel.org>
Subject: Re: git pack/unpack over bittorrent - works!
Date: Sat, 4 Sep 2010 01:23:54 -0400
Message-ID: <AANLkTikVf=X8cLP9s6W9VGOt0EHE4J5MYsBpgKYhrAri@mail.gmail.com>
In-Reply-To: <04755B03-EE1D-48FA-8894-33AA8E2661C0@mit.edu>

Ted,

I think your "canonical pack" idea has value, but I'd be inclined to
try to optimize more for the "common case" of developing on a fast
local network with many local checkouts, where you occasionally
push/fetch external sources via a slow link.

Specifically, let's look at the very reasonable scenario of a
developer working over a slow DSL or dialup connection.  He's probably
got many copies of various GIT repositories cloned all over the place
(hey, disk is cheap!), but right now he just wants a fresh clean copy
of somebody else's new tree, including its three feature branches.
Furthermore, he's probably even got 80% of the commit objects from
that tree archived in his last clone from linux-next.

In theory he could very carefully arrange his repositories with
judicious use of alternate object directories.  From personal
experience, though, such arrangements are *VERY* prone to accidentally
purging wanted objects, unless you *never* ever delete a branch in the
"reference" repository.

So I think the real problem to solve would be:  Given a collection of
local computers each with many local repositories, what is the best
way to optimize a clone of a "new" remote repository (over a slow
link) by copying most of the data from other local repositories
accessible via a fast link?

The goal would be to design a P2P protocol capable of rapidly and
efficiently building distributed, searchable indexes of ordered commits
that identify which peer(s) contain each commit.
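As a rough illustration of the index side, here is a minimal Python
sketch of how commit-to-peer entries might be spread across several
index nodes. It assumes rendezvous (highest-random-weight) hashing to
decide which node owns each commit id; that choice, and the names
owner_node, IndexShard, publish and find_peers, are hypothetical and
not part of any existing git tool. It only covers the "which peers have
this commit" lookup, not the commit ordering.

```python
import hashlib
from collections import defaultdict


def owner_node(commit_id, nodes):
    """Rendezvous hashing: the node with the highest hash(node, commit) owns the entry."""
    def weight(node):
        return hashlib.sha1(f"{node}:{commit_id}".encode("utf-8")).digest()
    return max(nodes, key=weight)


class IndexShard:
    """One index node's slice of the commit -> peers map (hypothetical)."""

    def __init__(self):
        self._peers = defaultdict(set)

    def add(self, peer, commit_ids):
        for commit_id in commit_ids:
            self._peers[commit_id].add(peer)

    def lookup(self, commit_id):
        # .get() avoids creating empty entries for unknown commits
        return set(self._peers.get(commit_id, ()))


def publish(shards, peer, commit_ids):
    """Record that `peer` holds each commit, on whichever shard owns it."""
    nodes = sorted(shards)
    for commit_id in commit_ids:
        shards[owner_node(commit_id, nodes)].add(peer, [commit_id])


def find_peers(shards, commit_id):
    """Ask only the owning shard which peers hold `commit_id`."""
    return shards[owner_node(commit_id, sorted(shards))].lookup(commit_id)


shards = {name: IndexShard() for name in ("idx-a", "idx-b", "idx-c")}
publish(shards, "laptop:9418", ["c1", "c2"])
publish(shards, "desktop:9418", ["c2"])
print(sorted(find_peers(shards, "c2")))  # ['desktop:9418', 'laptop:9418']
```

Because every client computes the same owner for a commit id, a lookup
touches one index node instead of broadcasting to all of them.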

When you attempt to perform a "git fetch --peer" from a repository, it
would quickly connect to a few of the metadata index nodes in the P2P
network and use them to negotiate "have"s with the upstream server.
The client would then sequentially perform the local "fetch"
operations necessary to obtain all the objects behind the "have"s it
used to narrow the commit range with the server.  Once all of those
"fetch" operations completed, it could proceed to fetch the remaining
objects from the server normally.
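The planning step of such a hypothetical "git fetch --peer" could look
something like the sketch below. It assumes the index has already told
us which local peers hold each commit the server acknowledged as a
"have", and it uses a greedy set-cover heuristic (my choice here, not
something the design above specifies) to keep the number of local
fetches small. The function name and data layout are made up for
illustration.

```python
def plan_local_fetches(acked, peers_of):
    """Greedy set cover over the server-acknowledged "have"s.

    acked:    commit ids the server said it already has
    peers_of: commit id -> set of local peers holding that commit
    Returns ({peer: [commits to fetch from it]}, set of commits no peer holds).
    """
    uncovered = set(acked)
    plan = {}
    while uncovered:
        # Count how many still-uncovered commits each peer could supply.
        counts = {}
        for commit_id in uncovered:
            for peer in peers_of.get(commit_id, ()):
                counts[peer] = counts.get(peer, 0) + 1
        if not counts:
            break  # nothing left that any local peer can provide
        # Ties are broken by peer name so the plan is deterministic.
        best = max(sorted(counts), key=counts.__getitem__)
        taken = {c for c in uncovered if best in peers_of.get(c, ())}
        plan[best] = sorted(taken)
        uncovered -= taken
    return plan, uncovered


peers_of = {"a": {"p1", "p2"}, "b": {"p2"}, "c": {"p1"}}
plan, missing = plan_local_fetches({"a", "b", "c", "d"}, peers_of)
print(plan, missing)  # {'p1': ['a', 'c'], 'p2': ['b']} {'d'}
```

Each peer in the plan would then be fetched from over the fast local
link; anything left in "missing" just falls back to the normal fetch
from the upstream server.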

Some amount of design and benchmarking would need to be done in order
to figure out the most efficient indexing algorithm for finding a
minimal set of "have"s of potentially thousands of refs, many with
independent root commits.  For example, if the index were grouped by
root commit (there may be more than one), you *should* be able to
quickly ask the server about a small list of root commits and then
continue asking only about commits whose roots are all known to the
server.
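A toy version of the root-commit grouping idea follows. It assumes the
server's repository is complete (not shallow), so if it lacks a root
commit it cannot have any descendant of that root. The server_has
callable stands in for one "have" question to the server and, like the
function name, is purely hypothetical; the point is to count how many
questions the root-first ordering needs.

```python
def negotiate_by_root(roots_of, server_has):
    """Root-first "have" negotiation.

    roots_of:   candidate commit id -> frozenset of its root commit ids
    server_has: callable standing in for one question to the server
    Returns (sorted list of commits the server has, number of questions asked).
    """
    asked = 0

    def ask(commit_id):
        nonlocal asked
        asked += 1
        return server_has(commit_id)

    # Round 1: ask only about the (few) distinct root commits.
    all_roots = sorted(set().union(*roots_of.values()))
    known_roots = {root for root in all_roots if ask(root)}

    # Round 2: ask only about commits whose roots are *all* known to the server;
    # a complete repository cannot contain a commit without its ancestors.
    haves = sorted(
        commit_id
        for commit_id, roots in roots_of.items()
        if roots <= known_roots and ask(commit_id)
    )
    return haves, asked


roots_of = {
    "c1": frozenset({"r1"}),
    "c2": frozenset({"r1"}),
    "c3": frozenset({"r2"}),
    "c4": frozenset({"r1", "r2"}),  # merge of two unrelated histories
}
server = {"r1", "c1"}
print(negotiate_by_root(roots_of, server.__contains__))  # (['c1'], 4)
```

Here "c3" and "c4" are never asked about, because the server already
said it does not know "r2".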

The actual P2P software would probably involve two different daemon
processes.  Instances of the first kind, the "index" daemons, would
communicate with each other and with the repositories, maintaining the
ref and commit indexes.  These daemons would advertise themselves with
Avahi, or alternatively in an enterprise environment they would be
managed by your sysadmins and be automatically discovered using
DNS-SD.  Clients looking to perform a P2P fetch would query these
first.

The second daemon would be a modified git-daemon that connects to the
advertised "index" daemons and advertises its own refs and commit
lists, as well as its IP address and port.
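The advertisement that the modified git-daemon sends to the index
daemons could be as simple as the record below. The class name, the
JSON encoding and the field names are assumptions for illustration
only; 9418 in the usage line is just git-daemon's default port, and
192.0.2.10 is a documentation address.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PeerAdvert:
    """What a (hypothetical) modified git-daemon tells the index daemons."""
    host: str
    port: int
    refs: dict = field(default_factory=dict)     # ref name -> commit id
    commits: list = field(default_factory=list)  # commit ids it can serve

    def encode(self) -> bytes:
        # sort_keys gives a stable byte encoding for the same advert
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def decode(cls, data: bytes) -> "PeerAdvert":
        return cls(**json.loads(data.decode("utf-8")))


advert = PeerAdvert("192.0.2.10", 9418, {"refs/heads/master": "c2"}, ["c1", "c2"])
print(advert.encode().decode("utf-8"))
# {"commits": ["c1", "c2"], "host": "192.0.2.10", "port": 9418, "refs": {"refs/heads/master": "c2"}}
```

An index daemon receiving this would feed the commit list into its
commit-to-peer index under the key "host:port".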

My apologies if there are any blatant typos or thinkos, it's a bit
later here than I would normally be writing about technical topics.

Cheers,
Kyle Moffett
