* Curious about details of optimization of object database...
@ 2009-01-09 17:46 chris
  2009-01-09 17:55 ` Matthieu Moy
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: chris @ 2009-01-09 17:46 UTC (permalink / raw)
  To: git

I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
tree.

Can anyone say, in a few sentences, how git avoids needing to keep multiple
slightly different copies of entire files without just storing lots of
patches/diffs?

cs


* Re: Curious about details of optimization of object database...
  2009-01-09 17:46 Curious about details of optimization of object database chris
@ 2009-01-09 17:55 ` Matthieu Moy
  2009-01-09 18:34   ` Nicolas Pitre
  2009-01-09 17:56 ` David Brown
  2009-01-09 19:07 ` Boyd Stephen Smith Jr.
  2 siblings, 1 reply; 5+ messages in thread
From: Matthieu Moy @ 2009-01-09 17:55 UTC (permalink / raw)
  To: chris; +Cc: git

chris@seberino.org writes:

> I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
> tree.

Conceptually, yes. But obviously, the storage format (pack) does what
people usually call "delta-compression", which is basically storing
only the diff against another, similar object.
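
As a rough illustration of "storing only the diff against another, similar
object", the sketch below describes a target byte string as copy/insert
instructions against a base. It is only a toy: the real pack format is a
compact binary encoding and git picks delta bases with its own heuristics.
The names make_delta and apply_delta, and the use of Python's difflib to
find matching ranges, are assumptions made for this example.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Describe target as ('copy', offset, length) / ('insert', data) ops."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already present in the base: store only a reference.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that are new in the target: store them literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: those base bytes are simply never copied.
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from the base plus the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"line one\nline two\nline three\n" * 50
target = base.replace(b"line two\n", b"line 2\n", 1)
delta = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(len(target), "bytes in target,", literal_bytes, "stored literally")
```

Reconstructing the full object still works (apply_delta gives back the
target exactly), which is why a commit can remain "conceptually" a full
snapshot while the pack stores far less.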

-- 
Matthieu


* Re: Curious about details of optimization of object database...
  2009-01-09 17:46 Curious about details of optimization of object database chris
  2009-01-09 17:55 ` Matthieu Moy
@ 2009-01-09 17:56 ` David Brown
  2009-01-09 19:07 ` Boyd Stephen Smith Jr.
  2 siblings, 0 replies; 5+ messages in thread
From: David Brown @ 2009-01-09 17:56 UTC (permalink / raw)
  To: chris; +Cc: git

On Fri, Jan 09, 2009 at 09:46:23AM -0800, chris@seberino.org wrote:
>I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
>tree.
>
>Can anyone say, in a few sentences, how git avoids needing to keep multiple
>slightly different copies of entire files without just storing lots of
>patches/diffs?

   Documentation/technical/pack-heuristics.txt

David


* Re: Curious about details of optimization of object database...
  2009-01-09 17:55 ` Matthieu Moy
@ 2009-01-09 18:34   ` Nicolas Pitre
  0 siblings, 0 replies; 5+ messages in thread
From: Nicolas Pitre @ 2009-01-09 18:34 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: chris, git

On Fri, 9 Jan 2009, Matthieu Moy wrote:

> chris@seberino.org writes:
> 
> > I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
> > tree.
> 
> Conceptually, yes. But obviously, the storage format (pack) does what
> people usually call "delta-compression", which is basically storing
> only the diff against another, similar object.

Also, since objects representing files and directories are named after 
their actual content, having two commits with identical files and 
directories will of course share the same blob and tree objects for 
those identical parts.
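
A small sketch of what "named after their actual content" means for a
blob: the object name is the SHA-1 of a short header ("blob", the size,
a NUL byte) followed by the file's bytes. The helper name git_blob_id is
made up for this example; the result should match what
"git hash-object" reports for the same bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object name git gives a blob with these exact bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same bytes always yield the same name, regardless of file name or
# which commit refers to them, so an unchanged file is stored only once.
in_commit_1 = git_blob_id(b"hello\n")
in_commit_2 = git_blob_id(b"hello\n")
edited = git_blob_id(b"hello!\n")
print(in_commit_1 == in_commit_2, in_commit_1 == edited)
```

Tree objects are named the same way from their entries, so an untouched
directory is shared between commits as well.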


Nicolas


* Re: Curious about details of optimization of object database...
  2009-01-09 17:46 Curious about details of optimization of object database chris
  2009-01-09 17:55 ` Matthieu Moy
  2009-01-09 17:56 ` David Brown
@ 2009-01-09 19:07 ` Boyd Stephen Smith Jr.
  2 siblings, 0 replies; 5+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-09 19:07 UTC (permalink / raw)
  To: chris; +Cc: git


On Friday 2009 January 09 11:46:23 chris@seberino.org wrote:
>I'm told a commit is *not* a patch (diff), but, rather a copy of the entire
>tree.

It's even more than that.  A commit object contains its message, the SHA of 
the tree, and zero or more SHAs for its parents.
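
To make that layout concrete, here is a minimal sketch that assembles the
text of a commit object and names it the way git names any object (SHA-1
over "<type> <size>", a NUL byte, then the body). Real commits also carry
author and committer lines, which the sketch includes. The helper names
and the author identity are placeholders invented for this example; the
tree used is the empty tree.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0' + body, as git does for every object."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, author, committer, message) -> bytes:
    """Assemble the text of a commit object (hypothetical helper)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero or more parents
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode("utf-8")


empty_tree = git_object_id("tree", b"")  # a tree with no entries
who = "A U Thor <author@example.com> 1231523183 +0000"  # placeholder identity

root = commit_body(empty_tree, [], who, who, "first commit\n")
root_id = git_object_id("commit", root)
child = commit_body(empty_tree, [root_id], who, who, "second commit\n")
child_id = git_object_id("commit", child)
print(root.decode("utf-8"))
```

Both commits point at the same tree SHA, so the second one adds only its
own small commit object to the database.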

>Can anyone say, in a few sentences, how git avoids needing to keep multiple
>slightly different copies of entire files without just storing lots of
>patches/diffs?

Loose objects can have large swaths of duplicated data.  However, git also 
supports storing objects in a packed format, which uses delta compression to 
reduce the duplication to close to nothing.

Some examples:
Sizes are from "du -sh .git ."; The .git directory stores all the objects as 
well as the repository configuration, refs, reflogs, etc.  The . directory 
has .git and a clean checkout of master.

The LinuxPMI (http://linuxpmi.org/) tree:
41M     .git
83M     .
(So, the storage is actually a bit smaller than the checkout; 984 objects; 140 
commits)

A small project between me and my flatmates:
309K    .git
3.6M    .
(Here, the storage is significantly smaller than the checkout; 786 objects; 
155 commits)

My repository that tracks my dotfiles:
124K    .git
176K    .
(113 objects; 28 commits)
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     


