From: Jakub Narebski <jnareb@gmail.com>
To: Jan Holesovsky <kendy@suse.cz>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH] RFC: git lazy clone proof-of-concept
Date: Fri, 08 Feb 2008 11:00:55 -0800 (PST)
Message-ID: <m3ejbngtnn.fsf@localhost.localdomain>
In-Reply-To: <200802081828.43849.kendy@suse.cz>
Jan Holesovsky <kendy@suse.cz> writes:
> This is my attempt to implement the 'lazy clone' I've read about a
> bit in the git mailing list archive, but did not see implemented
> anywhere - the clone that fetches a minimal amount of data with the
> possibility to download the rest later (transparently!) when
> necessary.
It was not implemented because it was thought to be hard; git assumes
in many places that if it has an object, it has all objects referenced
by it.
But it is very nice of you to [try to] implement 'lazy clone'/'remote
alternates'.
Could you provide some benchmarks (time, network throughput, latency)
for your implementation?
> Currently we are evaluating the usage of git for OpenOffice.org as
> one of the candidates (SVN is the other one), see
>
> http://wiki.services.openoffice.org/wiki/SCM_Migration
>
> I've provided a git import of OOo with the entire history; the
> problem is that the pack has 2.5G, so it's not too convenient to
> download for casual developers that just want to try it.
One of the reasons why 'lazy clone' was not implemented was the fact
that by using a large enough window and a longer than default maximum
delta chain (depth) you can repack an "archive pack" (and keep git
from trying to repack it again by using a .keep file, see git-config(1))
much tighter than with the default (time and CPU conserving) options,
and much, much tighter than a pack that is the result of a
fast-import driven import.
Both the Mozilla import and the GCC import were packed to below 0.5 GB.
Warning: you would need a machine with a large amount of memory to
repack it tightly in a sensible amount of time!
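A minimal sketch of such a tight "archive pack" repack, assuming a
throwaway demo repository in a temporary directory in place of the real
OOo import; the --window/--depth value of 250 is only illustrative, not
tuned for OOo. The .keep file next to the resulting pack marks it so
that later "git repack -a -d" runs leave it alone.

```shell
#!/bin/sh
set -e

# Hypothetical demo repository (stands in for a real fast-import result)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Create a little history so there is something to delta-compress
for i in 1 2 3 4 5; do
    seq 1 $((i * 100)) > data.txt
    git add data.txt
    git commit -q -m "commit $i"
done

# Aggressive repack: -a packs all objects, -d drops now-redundant packs
# and loose objects, -f recomputes deltas from scratch; a larger window
# and deeper delta chains trade CPU time and memory for a smaller pack
git repack -a -d -f --window=250 --depth=250

# Mark the resulting pack with a .keep file so later repacks skip it
for pack in .git/objects/pack/pack-*.pack; do
    touch "${pack%.pack}.keep"
done

ls .git/objects/pack
```

On a repository of OOo's size the same repack command is where the
memory warning above applies; the .keep step is cheap either way.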
> Shallow clone is not a possibility - we don't get patches through
> mailing lists, so we need the pull/push, and also thanks to the OOo
> development cycle, we have too many living heads which causes the
> shallow clone to download about 1.5G even with --depth 1.
Wouldn't it be easier to try to fix the shallow clone implementation to
allow pushing from a shallow to a full clone (fetching from full to
shallow is implemented), and perhaps also push/pull between two
shallow clones?
As to the many living heads: first, you don't need to fetch all
heads. Currently git-clone has no option to select a subset of heads
to clone, but you can always use git-init + hand configuration +
git-remote and git-fetch for the actual fetching.
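The "git-init + hand configuration" route can be sketched roughly as
follows, assuming a throwaway local "upstream" repository with two
hypothetical heads, trunk and feature, in place of the real OOo
repository. The essential part is the single-branch
remote.origin.fetch refspec, which makes git-fetch transfer only trunk
and the history reachable from it.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Hypothetical upstream with two living heads: "trunk" and "feature"
git init -q upstream
cd upstream
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/trunk   # name the first branch explicitly
echo base > file.txt
git add file.txt
git commit -q -m "base"
git branch feature
git checkout -q feature
echo feature >> file.txt
git commit -q -a -m "feature work"
git checkout -q trunk
cd ..

# Downstream: git-init, git-remote, then replace the default
# "+refs/heads/*" refspec by hand so that only trunk is fetched
git init -q partial
cd partial
git remote add origin "$work/upstream"
git config remote.origin.fetch '+refs/heads/trunk:refs/remotes/origin/trunk'
git fetch -q origin

git branch -r
```

To follow a few more heads, further refspecs can be appended with
"git config --add remote.origin.fetch ...", still without downloading
the rest.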
By the way, did you try to split the OpenOffice.org repository at
component boundaries into submodules (subprojects)? This would also
limit the amount of data to download, as you don't need to download
and check out all subprojects.
The problem, of course, is _how_ to split the repository into
submodules. Submodules should be self-contained enough that a
whole-tree commit is always (or almost always) about a single
submodule only.
> Lazy clone sounded like the right idea to me. With this
> proof-of-concept implementation, just about 550M from the 2.5G is
> downloaded, which is still about twice as much in comparison with
> downloading a tarball, but bearable.
Do you have any numbers for the OOo repository, like the number of
revisions, the depth of the DAG of commits (maximum number of revisions
in one line of commits), the number of files, the size of a checkout,
the average file size, etc.?
--
Jakub Narebski
Poland
ShadeHawk on #git