From: "Carlos Martín Nieto" <cmn@elego.de>
To: neubyr <neubyr@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: git repository size / compression
Date: Fri, 09 Sep 2011 10:23:13 +0200
Message-ID: <1315556595.2019.11.camel@bee.lab.cmartin.tk>
In-Reply-To: <CALFxCvzVjC+u=RDkDCQp0QqPETsv8ROE8tY=37tmMWxmQoJOEw@mail.gmail.com>

On Thu, 2011-09-08 at 21:37 -0500, neubyr wrote:
> I have a test git repository with just two files in it. One of the
> file in it has a set of two lines that is repeated n times.
> e.g.:
> {{{
> $ for i in {1..5}; do cat ./lexico.txt >> lexico1.txt &&  cat
> ./lexico.txt >> lexico1.txt && mv ./lexico1.txt ./lexico.txt;  done
> }}}
> 

So you've just created some data that can be compressed quite
efficiently.

> I ran above command few times and performed commit after each run. Now
> disk usage of this repository directory is mentioned below. The 419M
> is working directory size and 2.7M is git repository/database size.
> 
> {{{
> $ du -h -d 1 .
> 2.7M    ./.git
> 419M    .
> 
> }}}
> 
> Is it because of the compression performed by git before storing data
> (or before sending commit)??
> 

Yes. Git stores its objects (the commits, the snapshots of the files,
etc.) compressed with zlib. When these objects are stored in a pack, the
size can be further reduced by storing some objects as deltas, each of
which describes the difference between that object and some other object
in the object-db.
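
To get a feel for how well this kind of data compresses, here is a rough
sketch that does not use git at all. It assumes gzip is installed; gzip
uses DEFLATE, the same algorithm zlib implements. The file name and the
two lines of text are made up for the illustration, and the exact numbers
will differ from what git stores.

```shell
# Build a file made of the same two lines repeated many times
# (hypothetical stand-in for lexico.txt).
tmp=$(mktemp -d)
for i in $(seq 1 10000); do
    printf 'first line of the pair\nsecond line of the pair\n'
done > "$tmp/sample.txt"

# Size of the file as it sits on disk.
raw_size=$(wc -c < "$tmp/sample.txt")

# Size of the same bytes after DEFLATE compression.
packed_size=$(gzip -c "$tmp/sample.txt" | wc -c)

echo "raw: $raw_size bytes, compressed: $packed_size bytes"
rm -rf "$tmp"
```

The compressed size should come out as a tiny fraction of the raw size,
which is the same effect you are seeing between your working directory
and .git.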

> Following were results with subversion:
> 
> Subversion client (redundant(?) copy exists in .svn/text-base/
> directory, hence double size in client):
> {{{
> $ du -h -d 1
> 416M    ./.svn
> 832M    .
> }}}

Subversion stores the "pristines" (the state of the files in the latest
revision) inside the .svn directory. I wouldn't call this copy
redundant, though, as it allows you to run diff locally. The pristines
are stored uncompressed, which is why half of the space is taken up by
the .svn directory.

> 
> Subversion repo/server:
> {{{
> $ du -h -d 1
>  12K    ./conf
> 1.2M    ./db
>  36K    ./hooks
> 8.0K    ./locks
> 1.2M    .
> }}}

I don't know how the repository is stored in Subversion, but it may also
be compressed. You may be able to reduce your git repository size by
(re)generating packs with 'git repack' and doing some cleanups with 'git
gc', but the repository size is rarely a concern.
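
If you want to try it, something along these lines should do. This is
only a sketch: it builds a throwaway repository in a temporary directory
so that it can be run anywhere, and the file contents and the demo
identity are made up. In your own repository you would only need the
count-objects, repack and gc commands.

```shell
# Throwaway repository with one commit (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo" && git init -q .
for i in $(seq 1 2000); do printf 'line one\nline two\n'; done > lexico.txt
git add lexico.txt
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m 'demo commit'

# Object counts and sizes (in KiB) before repacking.
git count-objects -v

# Put all objects into a single pack and drop redundant packs.
git repack -a -d -q

# General housekeeping; this also packs and prunes loose objects.
git gc -q

# Compare 'size' and 'size-pack' with the first run.
git count-objects -v
```

After the gc, 'count' (the number of loose objects) should drop to zero
and everything should live in the pack.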

   cmn


