From: Nicolas Pitre <nico@cam.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: Why Git is so fast (was: Re: Eric Sink's blog - notes on git, dscms and a "whole product" approach)
Date: Thu, 30 Apr 2009 14:56:23 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.0904301401120.6741@xanadu.home>
In-Reply-To: <m3fxfqnxn5.fsf_-_@localhost.localdomain>
On Thu, 30 Apr 2009, Jakub Narebski wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
> es> Two: Maybe Git is fast because Linus Torvalds is so smart.
>
> [non answer; the details are important]
I think Linus is certainly responsible for a big part of Git's speed.
He came up with the basic data structure used by Git, which has a lot to
do with that. Also, he designed Git specifically to fulfill a need for
which none of the alternatives were fast enough. Hence Git was designed
from the ground up with speed as one of the primary design goals, such
as being able to create multiple commits per second instead of the other
way around (several seconds per commit). And yes, Linus is usually smart
enough, and has the proper mindset, to achieve such goals.
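For readers who have not looked at that data structure: Git's object
store is content-addressed, meaning an object's name is a hash of its
type, size, and content. The following is a minimal sketch of how an
object ID is derived in the SHA-1 object format. The function name
git_object_id is made up for this illustration and is not part of Git;
the sketch leaves out compression and on-disk storage entirely.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the hex object ID for a payload in Git's SHA-1 object format.

    The hashed bytes are "<type> <size>\\0" followed by the raw payload.
    (git_object_id is a hypothetical helper name, not a Git API.)
    """
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID.
    print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the name depends only on the content, identical files are
stored once, and checking whether two objects are equal comes down to
comparing their IDs.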
> es> Three: Maybe Git is fast because it's written in C instead of one
> es> of those newfangled higher-level languages.
> es>
> es> Nah, probably not. Lots of people have written fast software in
> es> C#, Java or Python.
> es>
> es> And lots of people have written really slow software in
> es> traditional native languages like C/C++. [...]
>
> Well, I guess that access to low-level optimization techniques like
> mmap are important for performance. But here I am guessing and
> speculating like Eric did; well, I am asking on a proper forum ;-)
>
> We have some anecdotal evidence supporting this possibility (which
> Eric dismisses), namely the fact that pure-Python Bazaar is the slowest
> of the three most common open source DVCS (Git, Mercurial, Bazaar) and
> the fact that parts of Mercurial were written in C for better performance.
>
> We can also compare implementations of Git in other, higher-level
> languages with the reference implementation in C (and shell scripts,
> and Perl ;-)). For example, the most complete (though I think still
> not fully complete) Java implementation: JGit. I hope that JGit
> developers can tell us whether using a higher-level language affects
> performance, by how much, and which features of the higher-level
> language cause the decrease in performance. Of course we have to take
> into account the possibility that JGit simply isn't as well optimized
> because of less manpower.
One of the main JGit developers is Shawn Pearce. If you look at Shawn's
contributions to C Git, they are mostly related to performance
issues. Among other things, he is the author of git-fast-import, he
contributed the pack access windowing code, and he was also involved in
the initial design of pack v4. Hence Shawn is a smart guy who certainly
knows a thing or two about performance optimizations. Yet he
reported on this list that his efforts to make JGit faster were no
longer yielding much, most probably due to the language overhead.
> es> Four: Maybe Git is fast because being fast is the primary goal for
> es> Git.
>
> [non answer; the details are important]
Still, this is actually true (see about Linus above). Without such a
goal, you quickly lose sight of performance regressions.
> es> Maybe Git is fast because every time they faced one of these "buy
> es> vs. build" choices, they decided to just write it themselves.
>
> I don't think so. Rather the opposite is true. Git uses libcurl for
> HTTP transport. Git uses zlib for compression. Git uses SHA-1 from
> OpenSSL or from Mozilla. Git uses (modified, internal) LibXDiff for
> (binary) deltaifying, for diffs and for merges.
Well, I think he's right on this point as well. libcurl is not so
relevant since it is rarely the bottleneck (the network bandwidth itself
usually is). zlib is already about as fast as it can be: multiple
attempts to make it faster didn't succeed. Git already carries its own
version of SHA-1 code for ARM and PPC because the alternatives were
slower. libxdiff was made internal precisely to get a better impedance
match with the core code; otherwise it could have remained fully
external just like zlib. And the binary delta code is not libxdiff
anymore but a much smaller, straightforward, and optimized-to-death
version that favors speed over versatility (no need to be versatile
when strictly dealing with Git's needs only).
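To make "binary delta" concrete, here is a minimal sketch of the
general copy/insert idea: a target buffer is described as a list of
"copy this range from the base" and "insert these literal bytes"
instructions. This is not Git's actual delta code or its on-disk
encoding; the names make_delta, apply_delta, and BLOCK, and the tuple
representation of instructions, are assumptions made up for this
illustration, and the matching strategy is deliberately naive.

```python
from typing import Dict, List, Tuple, Union

Copy = Tuple[str, int, int]   # ("copy", offset_in_base, length)
Insert = Tuple[str, bytes]    # ("insert", literal_bytes)
Op = Union[Copy, Insert]

BLOCK = 4  # hypothetical minimum match length for this sketch


def make_delta(base: bytes, target: bytes) -> List[Op]:
    """Describe target as copies from base plus literal inserts."""
    # Index every BLOCK-sized window of base; the first occurrence wins.
    index: Dict[bytes, int] = {}
    for i in range(len(base) - BLOCK + 1):
        index.setdefault(base[i:i + BLOCK], i)

    ops: List[Op] = []
    literal = bytearray()
    pos = 0
    while pos < len(target):
        start = index.get(target[pos:pos + BLOCK])
        if start is None:
            # No match in base: this byte becomes part of a literal run.
            literal.append(target[pos])
            pos += 1
            continue
        # Extend the match as far as both buffers agree.
        length = BLOCK
        while (start + length < len(base)
               and pos + length < len(target)
               and base[start + length] == target[pos + length]):
            length += 1
        if literal:
            ops.append(("insert", bytes(literal)))
            literal.clear()
        ops.append(("copy", start, length))
        pos += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def apply_delta(base: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target from base and a list of instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

A round trip such as apply_delta(base, make_delta(base, target))
returns target for any pair of buffers. A special-purpose delta coder
like this has very little to do compared with a general diff library,
which is the trade of versatility for speed described above.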
> es> Seven: Maybe Git isn't really that fast.
> es>
> es> If there is one thing I've learned about version control it's that
> es> everybody's situation is different. It is quite likely that Git
> es> is a lot faster for some scenarios than it is for others.
> es>
> es> How does Git handle really large trees? Git was designed primarily
> es> to support the efforts of the Linux kernel developers. A lot of
> es> people think the Linux kernel is a large tree, but it's really
> es> not. Many enterprise configuration management repositories are
> es> FAR bigger than the Linux kernel.
>
> c.f. "Why Perforce is more scalable than Git" by Steve Hanov
> http://gandolf.homelinux.org/blog/index.php?id=50
>
> I don't really know about this.
Git certainly sucks big time with large files.

Git also sucks to a lesser extent (but still) with very large
repositories.

But large trees? I don't think Git is worse than anything out there
with a large tree of average-size files.

Yet this point is misleading, because when people give Git the
reputation of being faster, it certainly comes from comparing
operations performed on the same source tree. Who cares about scenarios
for which the tool was not designed? Those "enterprise configuration
management repositories" are indeed not what Git was designed for, but
neither were Mercurial, Bazaar, or any other contender to which Git is
usually compared.


Nicolas