From: Linus Torvalds <torvalds@osdl.org>
To: Josh Triplett <josh@freedesktop.org>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
Date: Mon, 23 Oct 2006 14:19:45 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0610231411200.3962@g5.osdl.org>
In-Reply-To: <Pine.LNX.4.64.0610231402560.3962@g5.osdl.org>
On Mon, 23 Oct 2006, Linus Torvalds wrote:
>
> Try it. The default "extreme" simplification is a _hell_ of a lot faster
> than doing the full history.
[ timings removed ]
Btw, the reason it is so much faster is that it can be done early, and
allows us to prune out parts of the history that we don't care about.
For example, when we hit a merge, and the result of that merge is
identical to one of the parents (in the set of filenames that we are
interested in), we can simply choose to totally ignore the other parent,
and we don't need to traverse that history at _all_. Because clearly, all
the actual _data_ came from just the parent we kept.
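A minimal sketch of that pruning rule, assuming a toy in-memory commit graph rather than git's real object store. `COMMITS`, `restrict` and `simplified_walk` are hypothetical names for illustration, not git internals. At each commit the tree is cut down to the paths of interest; if some parent's cut-down tree is identical, only that parent is followed and the others are never visited:

```python
from collections import deque

# Hypothetical toy graph (an assumption, not git's data model):
# commit id -> (list of parent ids, tree as {path: blob_id}).
COMMITS = {
    "A":  ([],          {"f": "v1", "g": "v1"}),
    "B":  (["A"],       {"f": "v2", "g": "v1"}),  # mainline edits f
    "S1": (["A"],       {"f": "v3", "g": "v1"}),  # side branch edits f too
    "S2": (["S1"],      {"f": "v3", "g": "v2"}),  # side branch edits g
    "M":  (["B", "S2"], {"f": "v2", "g": "v2"}),  # merge keeps B's f
}

def restrict(tree, paths):
    """Keep only the tree entries for the paths we are interested in."""
    return {p: blob for p, blob in tree.items() if p in paths}

def simplified_walk(commits, head, paths):
    """Breadth-first walk from head that prunes merge parents early."""
    visited, seen, queue = [], {head}, deque([head])
    while queue:
        cid = queue.popleft()
        visited.append(cid)
        parents, tree = commits[cid]
        mine = restrict(tree, paths)
        same = [p for p in parents if restrict(commits[p][1], paths) == mine]
        # If some parent is identical in the paths we care about, all the
        # data came from it: follow just that one parent and never look at
        # the others. Otherwise follow every parent.
        for p in (same[:1] if same else parents):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return visited

print(simplified_walk(COMMITS, "M", {"f"}))  # ['M', 'B', 'A']
```

Looking only at `f`, the merge `M` matches `B`, so the side branch `S1`/`S2` is never traversed. Looking only at `g`, `M` matches `S2` instead, and the walk goes down the side branch and skips `B`.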
So the "extreme" simplification is way way faster, because in the presence
of a lot of merges, it can choose to go down just one of the paths, and
totally ignore the other ones. In practice, for a fairly "bushy" history
tree like the kernel, that can cut down the number of commits you need to
compare by a factor of two or more.
In many ways, it is also actually a _better_ result, in that it's a
"closer to minimal" way of reaching a particular state. So if you're just
interested in how something came to be, and want to just cut through the
crap, the result of extreme simplification really _is_ better.
So the branches that were dismissed really _aren't_ important - they might
contain real work, but from the point of view of the end result, that real work
might as well not have happened, since the simpler history we chose _also_
explains the end result sufficiently.
So I think the default simplification is really a good default: not only
because it's fundamentally cheaper, but because it is actually more likely
to distill what you actually care about if you wonder what happened to
a file or a set of files.
But if you care about all the "side efforts" that didn't actually matter
for the end result too, then you'd want the more expensive, and more
complete graph. But it _will_ be a lot more expensive to compute.
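To make the trade-off concrete, here is a hedged sketch on a hypothetical five-commit toy graph (all names, including the `full_history` flag, are illustrative assumptions; in current git the expensive mode is requested with `git log --full-history`). The full walk visits more commits and finds the side-branch edit to `f` (`S1`), work that the merge threw away and that the simplified walk never even looks at:

```python
from collections import deque

# Hypothetical toy graph: commit id -> (parents, {path: blob_id}).
COMMITS = {
    "A":  ([],          {"f": "v1", "g": "v1"}),
    "B":  (["A"],       {"f": "v2", "g": "v1"}),
    "S1": (["A"],       {"f": "v3", "g": "v1"}),  # side-branch work on f
    "S2": (["S1"],      {"f": "v3", "g": "v2"}),
    "M":  (["B", "S2"], {"f": "v2", "g": "v2"}),  # merge keeps B's f
}

def restrict(tree, paths):
    """Keep only the tree entries for the paths we are interested in."""
    return {p: blob for p, blob in tree.items() if p in paths}

def walk(commits, head, paths, full_history):
    """Breadth-first walk; prune merge parents unless full_history is set."""
    visited, seen, queue = [], {head}, deque([head])
    while queue:
        cid = queue.popleft()
        visited.append(cid)
        parents, tree = commits[cid]
        mine = restrict(tree, paths)
        same = [p for p in parents if restrict(commits[p][1], paths) == mine]
        follow = parents if (full_history or not same) else same[:1]
        for p in follow:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return visited

def touching(commits, visited, paths):
    """Non-merge commits among `visited` that change the restricted tree."""
    out = []
    for cid in visited:
        parents, tree = commits[cid]
        if len(parents) > 1:
            continue  # merges are skipped in this simplified report
        before = restrict(commits[parents[0]][1], paths) if parents else {}
        if restrict(tree, paths) != before:
            out.append(cid)
    return out

for full in (False, True):
    seen = walk(COMMITS, "M", {"f"}, full)
    print(full, seen, touching(COMMITS, seen, {"f"}))
# False ['M', 'B', 'A'] ['B', 'A']
# True ['M', 'B', 'S2', 'A', 'S1'] ['B', 'A', 'S1']
```

On this toy graph the difference is only two extra commits, but every dismissed branch in a real, bushy history adds its whole sub-history to the full walk.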
Linus