git.vger.kernel.org archive mirror
* GIT Performance question
@ 2010-04-17  9:55 santos2010
  2010-04-17 10:37 ` Geert Bosch
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: santos2010 @ 2010-04-17  9:55 UTC
  To: git


Hello,

Our company is evaluating SCM solutions, one of our most important
requirements is performance as we develop over 3 different sites across the
world.
I read that GIT doesn't use deltas, it uses snapshots. My question is: how
could GIT have high performance (most of the users say that) if for
synchronization (pull/push command) with e.g. a shared repository GIT
transfers all modified files (and references) instead of the respective
deltas? 

Thanks in advance,

Santos
-- 
View this message in context: http://n2.nabble.com/GIT-Performance-question-tp4917066p4917066.html
Sent from the git mailing list archive at Nabble.com.


* Re: GIT Performance question
  2010-04-17  9:55 GIT Performance question santos2010
@ 2010-04-17 10:37 ` Geert Bosch
  2010-04-17 10:37 ` Jeff King
  2010-04-17 10:40 ` Dmitry Potapov
  2 siblings, 0 replies; 6+ messages in thread
From: Geert Bosch @ 2010-04-17 10:37 UTC
  To: santos2010; +Cc: git


On Apr 17, 2010, at 05:55, santos2010 wrote:

> 
> Hello,
> 
> Our company is evaluating SCM solutions, one of our most important
> requirements is performance as we develop over 3 different sites across the
> world.
> I read that GIT doesn't use deltas, it uses snapshots.
Git does use deltas between snapshots.

  -Geert


* Re: GIT Performance question
  2010-04-17  9:55 GIT Performance question santos2010
  2010-04-17 10:37 ` Geert Bosch
@ 2010-04-17 10:37 ` Jeff King
  2010-04-17 10:40 ` Dmitry Potapov
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2010-04-17 10:37 UTC
  To: santos2010; +Cc: git

On Sat, Apr 17, 2010 at 01:55:49AM -0800, santos2010 wrote:

> Our company is evaluating SCM solutions, one of our most important
> requirements is performance as we develop over 3 different sites across the
> world.
> I read that GIT doesn't use deltas, it uses snapshots. My question is: how
> could GIT have high performance (most of the users say that) if for
> synchronization (pull/push command) with e.g. a shared repository GIT
> transfers all modified files (and references) instead of the respective
> deltas? 

Short answer: Git does store and transfer deltas. It generally beats any
other system in terms of repo size.

Longer answer:

Git separates the concept of the history graph and the actual storage
mechanism. So conceptually the history is a directed graph of snapshots,
each representing the whole tree. But there are two things that save
space:

  1. Git addresses content by its sha1. So each snapshot may refer to a
     file by the sha1 of its content, meaning we only have to store that
     content once.

  2. Git packs "objects" (where each file's content is in a single
     object) into "packfiles", in which it aggressively deltas objects
     against each other, including objects which do not come from the
     same path in your tree.
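
Point 1 can be illustrated with a small Python sketch. The blob naming
(SHA-1 over a "blob <size>\0" header plus the content) is what git does;
the in-memory `store` dict, `add_file`, and the two `snapshot_*` mappings
are hypothetical stand-ins for the object database and for trees, used
only for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object name git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical in-memory object store: object name -> content.
store = {}


def add_file(content: bytes) -> str:
    """Store content under its object name; identical content is stored once."""
    oid = blob_id(content)
    store.setdefault(oid, content)
    return oid


# Two snapshots (path -> object name). README is unchanged between them,
# so both snapshots refer to the same stored object.
snapshot_1 = {
    "README": add_file(b"hello\n"),
    "main.c": add_file(b"int main(void) { return 0; }\n"),
}
snapshot_2 = {
    "README": add_file(b"hello\n"),
    "main.c": add_file(b"int main(void) { return 1; }\n"),
}

print(snapshot_1["README"] == snapshot_2["README"])  # True
print(len(store))  # 3 stored objects for 4 file entries
```

The unchanged file costs nothing extra in the second snapshot, even
before any delta compression is applied.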

Git will store "loose" objects when performing most operations, but will
occasionally pack them when the number of objects gets too high. You can
also initiate a full pack by running "git gc".

For transferring between repositories, git will figure out which parts
of the history each side has, and will only send the objects that the
other side needs. In addition, it will send them as a packfile using
delta compression, including deltas against objects that are not being
sent but that it knows the other side has.
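
The "only send what the other side needs" step can be sketched as a set
difference over reachable objects. This is a simplification under stated
assumptions: the toy `graph`, the object names, and the helper functions
are hypothetical, and real git does not hold the receiver's graph but
learns what the other side has through a negotiation over the wire.

```python
def reachable(graph, tips):
    """Return every object reachable from the given tips."""
    seen = set()
    stack = list(tips)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(graph.get(obj, ()))
    return seen


def objects_to_send(graph, sender_tips, receiver_tips):
    """Objects the sender has that the receiver cannot already reach."""
    return reachable(graph, sender_tips) - reachable(graph, receiver_tips)


# Toy history c1 <- c2 <- c3. Each commit points at its parent and a tree;
# each tree points at blobs. The README blob never changes.
graph = {
    "c1": ["t1"],
    "t1": ["readme-v1", "main-v1"],
    "c2": ["c1", "t2"],
    "t2": ["readme-v1", "main-v2"],
    "c3": ["c2", "t3"],
    "t3": ["readme-v1", "main-v3"],
}

# The receiver already has c1; the sender is at c3.
missing = objects_to_send(graph, ["c3"], ["c1"])
print(sorted(missing))
# ['c2', 'c3', 'main-v2', 'main-v3', 't2', 't3']
```

Note that readme-v1 and main-v1 are not sent at all, and main-v2 could
additionally be sent as a delta against main-v1, which the receiver is
known to have.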

-Peff


* Re: GIT Performance question
  2010-04-17  9:55 GIT Performance question santos2010
  2010-04-17 10:37 ` Geert Bosch
  2010-04-17 10:37 ` Jeff King
@ 2010-04-17 10:40 ` Dmitry Potapov
  2010-04-17 11:21   ` santos2010
  2 siblings, 1 reply; 6+ messages in thread
From: Dmitry Potapov @ 2010-04-17 10:40 UTC
  To: santos2010; +Cc: git

On Sat, Apr 17, 2010 at 01:55:49AM -0800, santos2010 wrote:
> 
> I read that GIT doesn't use deltas, it uses snapshots. My question is: how
> could GIT have high performance (most of the users say that) if for
> synchronization (pull/push command) with e.g. a shared repository GIT
> transfers all modified files (and references) instead of the respective
> deltas? 

Well, Git _does_ use deltas for storage and synchronization, but these
deltas are unrelated to the history of changes stored in the repository.
So, conceptually, Git just stores snapshots, but files in those snapshots
are deltified against other stored files chosen by a heuristic for
finding similar content. This allows Git to create deltas not only
against the previous version of the same file (which most VCSes do), but
potentially against any file stored in the repository if it is similar
enough. So, typically, Git has the most compact storage compared to
other VCSes, in particular for complex histories with a lot of branches
and merges.
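
To make the idea of a delta against a similar object concrete, here is a
minimal Python sketch. It uses `difflib.SequenceMatcher` from the
standard library to find matching regions, which is a stand-in and not
git's own delta algorithm or on-disk format; `make_delta`,
`apply_delta`, and the sample data are hypothetical. The result is a
list of "copy from base" and "insert literal bytes" instructions.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Describe target as copy/insert instructions relative to base."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already present in base: record offset and length only.
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:
            # "replace" or "insert": carry the new bytes literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those bytes are simply not copied.
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from base and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


# Two similar objects; they need not come from the same path or history.
base = b"".join(b"entry %d = value %d\n" % (i, i * i) for i in range(30))
target = base.replace(b"entry 7 = value 49\n", b"entry 7 = value 50\n")

ops = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in ops if op[0] == "insert")

print(apply_delta(base, ops) == target)  # True
print(literal_bytes < len(target))       # True
```

Only the bytes that differ travel as literals; everything else is a
reference into the base object, whichever file that base came from.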


Dmitry


* Re: GIT Performance question
  2010-04-17 10:40 ` Dmitry Potapov
@ 2010-04-17 11:21   ` santos2010
  2010-04-17 11:55     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: santos2010 @ 2010-04-17 11:21 UTC
  To: git


Thanks a lot for the quick answer. Are there any references (online books
or web pages) where I could find details about this approach? I need this to
justify my evaluation :)
-- 
View this message in context: http://n2.nabble.com/GIT-Performance-question-tp4917066p4917251.html
Sent from the git mailing list archive at Nabble.com.


* Re: GIT Performance question
  2010-04-17 11:21   ` santos2010
@ 2010-04-17 11:55     ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2010-04-17 11:55 UTC
  To: santos2010; +Cc: git

On Sat, Apr 17, 2010 at 03:21:13AM -0800, santos2010 wrote:

> Thanks a lot for the quick answer. Are there any references (online books
> or web pages) where I could find details about this approach? I need this to
> justify my evaluation :)

Try the "Git Internals" chapter of Scott Chacon's Pro Git book, which is
available online here:

  http://progit.org/book/ch9-0.html

He has some pretty pictures, too.

-Peff

