git.vger.kernel.org archive mirror
From: Dmitry Potapov <dpotapov@gmail.com>
To: Noah Silverman <noah@smartmediacorp.com>
Cc: git@vger.kernel.org
Subject: Re: Advice on choosing git
Date: Wed, 12 May 2010 13:04:19 +0400
Message-ID: <20100512090418.GM14069@dpotapov.dyndns.org>
In-Reply-To: <4BEA4B46.6010009@smartmediacorp.com>

On Tue, May 11, 2010 at 11:31:34PM -0700, Noah Silverman wrote:
> 
> 1) Size.  THIS IS MY MAIN CONCERN - If I want to sync my home, office,
> and server Document directories.  From what I have read, I will
> effectively have multiple copies of each item on my hard drive, thus
> eating up a lot of space (One of the "working file" and several in the
> .git directory.)

Usually, Git uses disk space more efficiently than other DVCSs, because
it stores files in packs. Each pack contains deltified and then
zlib-compressed data, and the deltification is done not only relative
to the direct ancestor but potentially to any suitable candidate (a
heuristic picks the best one). But when you add a new file to the
repository, it is stored simply zlib-compressed inside .git/objects.
Such files are often referred to as "loose" objects in the Git
documentation. When you have a lot of loose objects, the garbage
collector is activated and packs them together. Obviously, you can run
"git gc" manually, or configure the threshold for what counts as too
many loose objects.
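
To make the loose/packed distinction concrete, here is a small sketch
that you can run in a scratch directory. The file name, the demo
identity, and the threshold of 1000 are made up for illustration; the
commands themselves (git count-objects, git gc, gc.auto) are standard.

```shell
#!/bin/sh
# Sketch: watch loose objects get packed, using a throwaway repository.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "hello" > notes.txt                  # hypothetical file
git add notes.txt
git commit -q -m "Add notes"

# In "git count-objects -v", "count" is the number of loose objects
# and "in-pack" is the number of objects stored in packs.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Pack the loose objects by hand.
git gc -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose before: $loose_before, loose after: $loose_after, packed: $packed_after"

# Change how many loose objects trigger an automatic gc
# (the documented default is 6700; 1000 is an arbitrary example).
git config gc.auto 1000
```

Running "du -sh .git" before and after is another easy way to see what
the packing buys you on your own data.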

Even files that are stored as loose objects are never transferred
separately over the network. When you pull or push, all required objects
are packed together into a single pack, and this pack is sent to the
other side. So, on the other side, they will never be stored as separate
files. But each push/pull can create a new pack; if you have too many
small packs, git-gc will combine them into a single pack.
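
The consolidation of small packs can be tried locally as well. This is
only a sketch: it creates the extra packs with "git repack" rather than
with real pushes or pulls, and the repository contents are invented.

```shell
#!/bin/sh
# Sketch: build up two small packs, then let git gc merge them into one.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Helper for this demo: count the pack files in the repository.
count_packs() {
    find .git/objects/pack -name '*.pack' | wc -l | tr -d ' '
}

for n in 1 2; do
    echo "revision $n" > notes.txt
    git add notes.txt
    git commit -q -m "Revision $n"
    # Pack only the objects that are not in a pack yet,
    # which leaves one new small pack per iteration.
    git repack -d -q
done
packs_before=$(count_packs)

# gc repacks everything into a single pack.
git gc -q
packs_after=$(count_packs)
echo "packs before gc: $packs_before, after gc: $packs_after"
```

The automatic variant of this is controlled by gc.autoPackLimit (50
packs by default), in the same way that gc.auto controls loose objects.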

However, if you have huge multimedia files, I am not sure how good Git
is at handling them. There have been some improvements to Git recently,
and there is a clone of git that specifically focuses on this problem:
http://caca.zoy.org/wiki/git-bigfile
but I don't know much about it.

> several full versions of it on my machine.  This could be a problem for
> a directory with 100GB or more, especially on a laptop with limited hard
> drive space.  I know Subversion is a dirty word around here, but it
> seemed to only annotate and send the changes

Actually, Subversion is very inefficient in space usage (at least, it
was when I last used it). I had a repository where the Subversion
checkout took much more space than the Git working tree and the whole
repository with all its history combined! Obviously, a centralized VCS
does not have to store the whole history on each client, which saves
space, but having the whole history with you is very handy, and it also
avoids having a single point of failure.

BTW, Git lets you make a shallow clone to save space by not storing the
whole history (only the specified number of revisions), but I have never
used this feature, and it has some limitations.
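
For reference, a shallow clone is requested with the --depth option of
git clone. The sketch below builds a tiny three-commit repository first
so that it runs anywhere; the paths and file names are made up.

```shell
#!/bin/sh
# Sketch: compare a full repository with a shallow clone of depth 1.
set -e
work=$(mktemp -d)
git init -q "$work/full"
cd "$work/full"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

for n in 1 2 3; do
    echo "revision $n" > notes.txt
    git add notes.txt
    git commit -q -m "Revision $n"
done

# --depth is ignored for plain local paths, so use a file:// URL here.
git clone -q --depth 1 "file://$work/full" "$work/shallow"

full_commits=$(git -C "$work/full" rev-list --count HEAD)
shallow_commits=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full history: $full_commits commits, shallow clone: $shallow_commits"
```

With a real remote you would pass its URL instead of the file:// path.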

> 
> 2) Sub-directory selection.  On my laptop, I only want a few
> sub-directories to be synced up.  I don't need my whole document tree,
> but just a few directories of things I work on.

Synchronization works on what you have committed to your repository. At
this level, directories are completely irrelevant. You probably want
a separate repository for each sub-directory that you want to
synchronize separately, and then you can bundle them together
using the git-submodule mechanism or a trivial shell script that
synchronizes all of them.
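
The "trivial shell script" could look roughly like the sync_all function
below. Everything around it (the server/office/laptop layout and the
"papers" and "photos" repositories) is a made-up setup so that the
sketch is runnable; only sync_all is the part you would keep.

```shell
#!/bin/sh
# Sketch: one repository per directory, and a loop that pulls each one.
set -e
base=$(mktemp -d)
mkdir -p "$base/server" "$base/office" "$base/laptop"

# Demo setup: two independent repositories, both present at the office.
for name in papers photos; do
    git init -q --bare "$base/server/$name.git"
    git clone -q "$base/server/$name.git" "$base/office/$name" 2>/dev/null
    (
        cd "$base/office/$name"
        git config user.name "Demo User"      # placeholder identity
        git config user.email "demo@example.com"
        echo "first draft" > "$name.txt"
        git add "$name.txt"
        git commit -q -m "Add $name.txt"
        git push -q origin HEAD
    )
done

# The laptop only carries the one directory it needs.
git clone -q "$base/server/papers.git" "$base/laptop/papers"

# New work is committed and pushed from the office.
(
    cd "$base/office/papers"
    echo "second draft" > papers.txt
    git commit -q -a -m "Update papers.txt"
    git push -q origin HEAD
)

# Pull every Git repository found directly under the given directory.
sync_all() {
    for repo in "$1"/*/; do
        [ -d "$repo/.git" ] || continue
        git -C "$repo" pull -q
    done
}

sync_all "$base/laptop"
laptop_text=$(cat "$base/laptop/papers/papers.txt")
echo "laptop now has: $laptop_text"
```

The laptop never clones "photos", so it costs no disk space there, and
adding another directory later is just one more git clone.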

In fact, the basic concept of Git is to treat a single repository
as a whole. So, if you have some pieces that are unrelated, it is
better to store them in separate repositories. It will improve
speed and possibly disk usage, because deltification will have an
easier time finding related files, so compression will be better.

> 
> Bazaar also looks like a possible option, but I'm not sure it handles
> drive usage better.  Their website has a lengthy manifesto about how
> they're better than Git, but I don't have enough experience with either
> to make an informed decision.

Well, this manifesto sounds like it was written by a marketing guy, and
it compares Bazaar to a rather old version of Git... So I am not going
to comment on it.

In fact, any meaningful comparison has to consider your workflow. Git
targets a fully distributed workflow, which may even have a hierarchy of
repositories, while Bazaar focuses on a more centralized solution that
is close to what you have with Subversion. So, people who are used to a
centralized VCS may find Bazaar easier at the beginning, but IMHO,
Git is more flexible, and once you learn the basic principles everything
feels very natural.

In any case, your main concern was the size of the repository, and
even this marketing piece from Bazaar admits that Git is better at
saving disk space.

Here you can see a comparison of repository sizes for Git,
Mercurial, and Bazaar:
http://vcscompare.blogspot.com/2008/06/git-mercurial-bazaar-repository-size.html



Dmitry

