* Re: git repository size / compression
From: Sverre Rabbelier @ 2011-09-09 14:25 UTC (permalink / raw)
To: neubyr; +Cc: Carlos Martín Nieto, git
Heya,
On Fri, Sep 9, 2011 at 16:04, neubyr <neubyr@gmail.com> wrote:
> Does git store deltas for some files? I thought it uses snapshots
> (exact copy of staged files) only.
In packs, yes: it will try to store objects as deltas as efficiently as possible.
--
Cheers,
Sverre Rabbelier
* Re: git repository size / compression
From: Carlos Martín Nieto @ 2011-09-09 14:28 UTC (permalink / raw)
To: neubyr; +Cc: git
On Fri, 2011-09-09 at 09:04 -0500, neubyr wrote:
> On Fri, Sep 9, 2011 at 3:23 AM, Carlos Martín Nieto <cmn@elego.de> wrote:
> > On Thu, 2011-09-08 at 21:37 -0500, neubyr wrote:
> >> I have a test git repository with just two files in it. One of the
> >> files has a set of two lines that is repeated n times.
> >> e.g.:
> >> {{{
> >> $ for i in {1..5}; do cat ./lexico.txt >> lexico1.txt && cat
> >> ./lexico.txt >> lexico1.txt && mv ./lexico1.txt ./lexico.txt; done
> >> }}}
> >>
> >
> > So you've just created some data that can be compressed quite
> > efficiently.
> >
> >> I ran the above command a few times and committed after each run. The
> >> disk usage of this repository directory is shown below. The 419M
> >> is the working directory size and 2.7M is the git repository/database size.
> >>
> >> {{{
> >> $ du -h -d 1 .
> >> 2.7M ./.git
> >> 419M .
> >>
> >> }}}
> >>
> >> Is it because of the compression performed by git before storing data
> >> (or before sending commit)??
> >>
> >
> > Yes. Git stores its objects (the commit, the snapshot of the files,
> > etc.) compressed. When these objects are stored in a pack, the size can
> > be further reduced by storing some objects as deltas which describe the
> > difference between itself and some other object in the object-db.
> >
>
> Does git store deltas for some files? I thought it uses snapshots
> (exact copy of staged files) only.
Yes and no. The data model for git is to always store snapshots, and it
always expects to have the full files available. In a packfile, however,
in order to save space, some objects are stored as deltas to other
objects in the same file.
http://progit.org/book/ch9-4.html
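To see this for yourself, something along these lines should work. It is only a sketch: it builds a throwaway repository (the file name, contents, commit messages and the demo identity are all made up), packs it with `git gc`, and then counts how many entries `git verify-pack -v` lists with a delta depth and base object, which is how deltified objects show up in that listing.

```shell
#!/bin/sh
# Sketch: pack a small throwaway repository and count delta entries.
# File name, contents and identity are invented for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A few kilobytes of text, so consecutive versions are worth deltifying.
awk 'BEGIN { for (i = 1; i <= 2000; i++) print "line", i }' > lexico.txt

for i in 1 2 3; do
    echo "revision $i" >> lexico.txt
    git add lexico.txt
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "revision $i"
done

# Move the loose objects into a packfile.
git gc --quiet

# Object lines have 5 fields (full object) or 7 fields (delta: the two
# extra fields are the delta depth and the base object's id).
listing=$(git verify-pack -v .git/objects/pack/pack-*.idx)
total=$(printf '%s\n' "$listing" |
    awk '$2 ~ /^(commit|tree|blob|tag)$/ { n++ } END { print n + 0 }')
deltas=$(printf '%s\n' "$listing" |
    awk '$2 ~ /^(commit|tree|blob|tag)$/ && NF == 7 { n++ } END { print n + 0 }')
echo "objects in pack: $total, stored as deltas: $deltas"
```

Even for the objects stored as deltas, `git cat-file -p <id>` still gives you the full content back, so the snapshot model is unchanged from the user's point of view.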
>
>
> >> Following were results with subversion:
> >>
> >> Subversion client (redundant(?) copy exists in .svn/text-base/
> >> directory, hence double size in client):
> >> {{{
> >> $ du -h -d 1
> >> 416M ./.svn
> >> 832M .
> >> }}}
> >
> > Subversion stores the "pristines" (which is the status of the files in
> > the latest revision) inside the .svn directory. I wouldn't call this
> > copy redundant, though, as it allows you to run diff locally. The
> > pristines are stored uncompressed, which is why half of the space is
> > taken up by the .svn directory.
> >
> >>
> >> Subversion repo/server:
> >> {{{
> >> $ du -h -d 1
> >> 12K ./conf
> >> 1.2M ./db
> >> 36K ./hooks
> >> 8.0K ./locks
> >> 1.2M .
> >> }}}
> >
> > I don't know how the repository is stored in Subversion, but it may also
> > be compressed. You may be able to reduce your git repository size by
> > (re)generating packs with 'git repack' and doing some cleanups with 'git
> > gc', but the repository size is not often a concern.
> >
> > cmn
> >
> >
> >
>
> that's helpful. thanks.
>
> --
> neuby.r
>
* Re: git repository size / compression
From: neubyr @ 2011-09-09 15:07 UTC (permalink / raw)
To: Carlos Martín Nieto; +Cc: git
On Fri, Sep 9, 2011 at 9:28 AM, Carlos Martín Nieto <cmn@elego.de> wrote:
> On Fri, 2011-09-09 at 09:04 -0500, neubyr wrote:
>> On Fri, Sep 9, 2011 at 3:23 AM, Carlos Martín Nieto <cmn@elego.de> wrote:
>> > On Thu, 2011-09-08 at 21:37 -0500, neubyr wrote:
>> >> I have a test git repository with just two files in it. One of the
>> >> files has a set of two lines that is repeated n times.
>> >> e.g.:
>> >> {{{
>> >> $ for i in {1..5}; do cat ./lexico.txt >> lexico1.txt && cat
>> >> ./lexico.txt >> lexico1.txt && mv ./lexico1.txt ./lexico.txt; done
>> >> }}}
>> >>
>> >
>> > So you've just created some data that can be compressed quite
>> > efficiently.
>> >
>> >> I ran the above command a few times and committed after each run. The
>> >> disk usage of this repository directory is shown below. The 419M
>> >> is the working directory size and 2.7M is the git repository/database size.
>> >>
>> >> {{{
>> >> $ du -h -d 1 .
>> >> 2.7M ./.git
>> >> 419M .
>> >>
>> >> }}}
>> >>
>> >> Is it because of the compression performed by git before storing data
>> >> (or before sending commit)??
>> >>
>> >
>> > Yes. Git stores its objects (the commit, the snapshot of the files,
>> > etc.) compressed. When these objects are stored in a pack, the size can
>> > be further reduced by storing some objects as deltas which describe the
>> > difference between itself and some other object in the object-db.
>> >
>>
>> Does git store deltas for some files? I thought it uses snapshots
>> (exact copy of staged files) only.
>
> Yes and no. The data model for git is to always store snapshots, and it
> always expects to have the full files available. In a packfile, however,
> in order to save space, some objects are stored as deltas to other
> objects in the same file.
>
> http://progit.org/book/ch9-4.html
>
Excellent. That explains compression and deltas really well. Thanks again.
--
neuby.r
* Re: git repository size / compression
From: Jakub Narebski @ 2011-09-09 14:54 UTC (permalink / raw)
To: neubyr; +Cc: Carlos Martín Nieto, git
neubyr <neubyr@gmail.com> writes:
> On Fri, Sep 9, 2011 at 3:23 AM, Carlos Martín Nieto <cmn@elego.de> wrote:
> > On Thu, 2011-09-08 at 21:37 -0500, neubyr wrote:
>>> I have a test git repository with just two files in it. One of the
>>> files has a set of two lines that is repeated n times.
>>> e.g.:
>>> {{{
>>> $ for i in {1..5}; do cat ./lexico.txt>> lexico1.txt && cat
>>> ./lexico.txt>> lexico1.txt && mv ./lexico1.txt ./lexico.txt; done
>>> }}}
>>>
>>
>> So you've just created some data that can be compressed quite
>> efficiently.
>>
>>> I ran the above command a few times and committed after each run. The
>>> disk usage of this repository directory is shown below. The 419M
>>> is the working directory size and 2.7M is the git repository/database size.
>>>
>>> {{{
>>> $ du -h -d 1 .
>>> 2.7M ./.git
>>> 419M .
>>>
>>> }}}
Have you tried the same but with
$ git gc --prune=now
before running `du`?
>>> Is it because of the compression performed by git before storing data
>>> (or before sending commit)??
>>
>> Yes. Git stores its objects (the commit, the snapshot of the files,
>> etc.) compressed. When these objects are stored in a pack, the size can
>> be further reduced by storing some objects as deltas which describe the
>> difference between itself and some other object in the object-db.
>
> Does git store deltas for some files? I thought it uses snapshots
> (exact copy of staged files) only.
When creating a packfile from loose objects (e.g. via `git gc`), git
does perform delta compression.
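A rough way to watch the loose objects move into a pack is `git count-objects -v`, run before and after the gc. The sketch below uses a scratch repository with a made-up file and identity, and doubles the file before each commit, similar to the test described above:

```shell
#!/bin/sh
# Sketch: compare loose vs. packed object counts around `git gc`.
# The repository, file contents and identity are invented for the demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf 'first line\nsecond line\n' > lexico.txt
for i in 1 2 3 4 5; do
    # Double the file, then commit the new snapshot.
    cat lexico.txt lexico.txt > lexico1.txt
    mv lexico1.txt lexico.txt
    git add lexico.txt
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "doubling $i"
done

# "count:" is the number of loose objects, "in-pack:" the packed ones.
loose_before=$(git count-objects -v | awk '$1 == "count:" { print $2 }')
git gc --quiet --prune=now
loose_after=$(git count-objects -v | awk '$1 == "count:" { print $2 }')
packed_after=$(git count-objects -v | awk '$1 == "in-pack:" { print $2 }')

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after (in pack: $packed_after)"
```

The `size:` and `size-pack:` lines of the same output report the disk usage in KiB, which is the number to compare if you care about space rather than object counts.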
--
Jakub Narębski
* Re: git repository size / compression
From: neubyr @ 2011-09-09 15:09 UTC (permalink / raw)
To: Jakub Narebski; +Cc: Carlos Martín Nieto, git, pjweisberg
2011/9/9 Jakub Narebski <jnareb@gmail.com>:
> neubyr <neubyr@gmail.com> writes:
>> On Fri, Sep 9, 2011 at 3:23 AM, Carlos Martín Nieto <cmn@elego.de> wrote:
>> > On Thu, 2011-09-08 at 21:37 -0500, neubyr wrote:
>
>>>> I have a test git repository with just two files in it. One of the
>>>> files has a set of two lines that is repeated n times.
>>>> e.g.:
>>>> {{{
>>>> $ for i in {1..5}; do cat ./lexico.txt>> lexico1.txt && cat
>>>> ./lexico.txt>> lexico1.txt && mv ./lexico1.txt ./lexico.txt; done
>>>> }}}
>>>>
>>>
>>> So you've just created some data that can be compressed quite
>>> efficiently.
>>>
>>>> I ran the above command a few times and committed after each run. The
>>>> disk usage of this repository directory is shown below. The 419M
>>>> is the working directory size and 2.7M is the git repository/database size.
>>>>
>>>> {{{
>>>> $ du -h -d 1 .
>>>> 2.7M ./.git
>>>> 419M .
>>>>
>>>> }}}
>
> Have you tried the same but with
>
> $ git gc --prune=now
>
> before running `du`?
>
Nope, I hadn't run git gc before. Here are the du results after running
git gc. The .git directory went from 2.7M to 924K, about two-thirds less space. Great!
{{{
$ du -d 1 -h
924K ./.git
417M .
}}}
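As a back-of-the-envelope check on the two du figures for ./.git (2.7M before, 924K after), the saving can be estimated like this. It assumes du -h uses 1024-based units, and since du rounds what it prints, the result is only approximate:

```shell
#!/bin/sh
# Rough estimate of the space saved in .git, from the du figures above.
# Assumes "M" and "K" are 1024-based; du -h rounds, so this is approximate.
before_kib=$(awk 'BEGIN { printf "%.0f", 2.7 * 1024 }')   # 2.7M in KiB
after_kib=924                                             # 924K
saved=$(awk -v b="$before_kib" -v a="$after_kib" \
    'BEGIN { printf "%.0f", (1 - a / b) * 100 }')
echo "saved roughly ${saved}% of the .git size"
```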
>>>> Is it because of the compression performed by git before storing data
>>>> (or before sending commit)??
>>>
>>> Yes. Git stores its objects (the commit, the snapshot of the files,
>>> etc.) compressed. When these objects are stored in a pack, the size can
>>> be further reduced by storing some objects as deltas which describe the
>>> difference between itself and some other object in the object-db.
>>
>> Does git store deltas for some files? I thought it uses snapshots
>> (exact copy of staged files) only.
>
> When creating a packfile from loose objects (e.g. via `git gc`), git
> does perform delta compression.
>
> --
> Jakub Narębski
>
Thank you, everyone, for explaining in detail.
--
neuby.r