From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>, Ben Clifford <benc@hawaga.org.uk>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Handling large files with GIT
Date: Wed, 15 Feb 2006 09:16:21 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0602150904310.3691@g5.osdl.org>
In-Reply-To: <Pine.LNX.4.64.0602150715470.3691@g5.osdl.org>
Btw, some actual numbers: I did the recent kernel networking merge (which
is a trivial in-index merge) with the standard three-way
    git-read-tree -m <base> <branch> <branch>
and with the new git-merge-tree to compare performance.
Doing git-read-tree took ~0.35s, while git-merge-tree took 0.015s.
Now, that's not a really fair comparison, because the end result is very
different: the git-read-tree has populated the index, ready for a
git-write-tree, while the git-merge-tree has not.
However, the interesting part is that especially for a trivial merge, we
don't actually _want_ to necessarily populate the index, because doing a
"git-write-tree" is actually a pretty expensive operation (on the kernel,
it will try to write 1000+ directory trees, most of which already exist.
Admittedly we don't actually have to write the objects, since we figure
out that they already exist, but we have to do the SHA1 calculations to
do so).
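To make the cost concrete, here is a minimal sketch of how a tree object's id is derived. git can only tell that a tree "already exists" after serializing it and hashing the result. The helper names (git_object_id, tree_body) are hypothetical and are not git's internal API; the object layout (a "<type> <size>\0" header, then "<mode> <name>\0<20-byte id>" per entry) is git's standard on-disk format.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 id git assigns to an object of this type and content."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Serialize (mode, name, hex_id) entries into the body of a tree object."""
    def sort_key(entry):
        mode, name, _ = entry
        # git orders directory entries as if their names ended in '/'
        return name.encode() + (b"/" if mode == "40000" else b"")

    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        out += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return out

# Hypothetical one-file directory: an empty README with regular file mode.
empty_blob = git_object_id("blob", b"")
tree_id = git_object_id("tree", tree_body([("100644", "README", empty_blob)]))
```

Even when the object database already holds tree_id, the serialization and SHA-1 above still have to run once per directory to find that out.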
So if we made the git-merge-tree based merge work entirely on trees all
the way, and never even necessarily populate the index at all (unless it
has to, due to actual data conflicts that want to be fixed up), that would
actually be another performance advantage. The only downside there is that
we would literally have to write the resulting tree objects by hand (ie
we'd need a new helper for doing that, and another thing to validate).
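As a rough illustration of what such a helper might look like, the sketch below builds tree objects bottom-up from a flat mapping of paths to blob ids, without going through an index. Everything here is an assumption made for illustration: write_tree_from_paths is a hypothetical name, the object store is a plain dict, every file is given mode 100644, and a real helper would also have to handle executable bits, symlinks, and file/directory name clashes.

```python
import hashlib

def object_id(obj_type, body):
    """SHA-1 id of a git object: hash of '<type> <size>\\0' followed by the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def write_tree_from_paths(paths, store):
    """Build tree objects for paths ('a/b/c' -> blob hex id) and return the root id.

    store is a dict standing in for the object database (tree id -> tree body).
    """
    files, subdirs = {}, {}
    for path, blob_id in paths.items():
        head, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(head, {})[rest] = blob_id
        else:
            files[head] = blob_id

    # Assumption: every file is a regular, non-executable file.
    entries = [("100644", name, blob_id) for name, blob_id in files.items()]
    for name, sub in subdirs.items():
        entries.append(("40000", name, write_tree_from_paths(sub, store)))

    # git orders directory entries as if their names ended in '/'
    entries.sort(key=lambda e: e[1].encode() + (b"/" if e[0] == "40000" else b""))
    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id in entries
    )
    tree_id = object_id("tree", body)
    store.setdefault(tree_id, body)  # only "write" trees we have not seen before
    return tree_id

EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # id of the empty blob
store = {}
root = write_tree_from_paths(
    {"Makefile": EMPTY_BLOB, "net/ipv4/tcp.c": EMPTY_BLOB, "net/ipv6/tcp.c": EMPTY_BLOB},
    store,
)
```

In this example net/ipv4 and net/ipv6 serialize to identical trees, so the store ends up with three tree objects (the shared leaf tree, net, and the root) rather than four.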
Anyway, that should almost certainly make it possible to scale up git
merges to hundreds of thousands of files without huge performance problems
(still, that depends a bit on layout - again, flat directory structures
won't scale as well, so it might not be enough for maildir handling).
But just at a guess, I think there's at least an order of magnitude to be
had there. So if a maildir merge currently takes an hour, at least we
should be able to get it down to a few minutes.
Ben, are you interested in trying this out in your maildir experiments?
Linus