git.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Josh Triplett <josh@freedesktop.org>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
Date: Mon, 23 Oct 2006 14:19:45 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0610231411200.3962@g5.osdl.org>
In-Reply-To: <Pine.LNX.4.64.0610231402560.3962@g5.osdl.org>



On Mon, 23 Oct 2006, Linus Torvalds wrote:
> 
> Try it. The default "extreme" simplification is a _hell_ of a lot faster 
> than doing the full history.
[ timings removed ]

Btw, the reason it is so much faster is that it can be done early, and 
allows us to prune out parts of the history that we don't care about.

For example, when we hit a merge, and the result of that merge is 
identical to one of the parents (in the set of filenames that we are 
interested in), we can simply choose to totally ignore the other parent, 
and we don't need to traverse that history at _all_. Because clearly, all 
the actual _data_ came from just the parent that matched.
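
A minimal sketch of that rule, as a toy model and not git's actual code. The Commit class, restrict() and parents_to_follow() are hypothetical names made up for illustration, and a plain dict of path -> content stands in for a real tree object:

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    name: str
    files: dict                      # path -> content (toy stand-in for a tree)
    parents: list = field(default_factory=list)


def restrict(commit, paths):
    """Project a commit's contents onto the paths we are interested in."""
    return {p: commit.files.get(p) for p in paths}


def parents_to_follow(commit, paths):
    """Pick which parents a path-limited history walk needs to visit.

    If the commit is identical to one parent on `paths`, follow only that
    parent and ignore the rest. Otherwise follow all of them.
    """
    mine = restrict(commit, paths)
    for parent in commit.parents:
        if restrict(parent, paths) == mine:
            return [parent]
    return list(commit.parents)


# Two branches off a common base: one touches a.c, the other touches b.c.
base = Commit("base", {"a.c": "v1", "b.c": "v1"})
main = Commit("main", {"a.c": "v2", "b.c": "v1"}, [base])
side = Commit("side", {"a.c": "v1", "b.c": "v2"}, [base])
merge = Commit("merge", {"a.c": "v2", "b.c": "v2"}, [main, side])

only_a = [p.name for p in parents_to_follow(merge, ["a.c"])]
both = [p.name for p in parents_to_follow(merge, ["a.c", "b.c"])]
```

When only a.c is of interest, the merge matches "main" on that path, so "side" is never looked at. When both files are of interest, neither parent matches and both have to be followed.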

So the "extreme" simplification is way way faster, because in the presence 
of a lot of merges, it can select to go down just one of the paths, and 
totally ignore the other ones. In practice, for a fairly "bushy" history 
tree like the kernel, that can cut down the number of commits you need to 
compare by a factor of two or more.
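
To make the saving concrete, here is a small sketch that counts how many commits a walk visits with and without the pruning. It is a toy model under stated assumptions, not how git implements it: walk() and the graph layout are hypothetical, and each commit carries a single "state" string standing in for the contents of the files of interest.

```python
def walk(graph, tip, simplify):
    """Return the set of commits visited when walking back from `tip`.

    `graph` maps commit name -> (state, parents). With simplify=True, a
    commit whose state equals that of one of its parents follows only
    that parent.
    """
    seen, stack = set(), [tip]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        state, parents = graph[name]
        if simplify:
            same = [p for p in parents if graph[p][0] == state]
            if same:
                parents = same[:1]   # one matching parent explains the result
        stack.extend(parents)
    return seen


# The side branch changes the file (s1) and then changes it back (s2),
# so the merge result is identical to the main-line parent m1.
graph = {
    "root":  ("v1", []),
    "m1":    ("v2", ["root"]),
    "s1":    ("x",  ["root"]),
    "s2":    ("v1", ["s1"]),
    "merge": ("v2", ["m1", "s2"]),
}

simplified = walk(graph, "merge", simplify=True)
full = walk(graph, "merge", simplify=False)
print(len(simplified), "commits visited with simplification,", len(full), "without")
```

In this example the simplified walk never touches s1 or s2 at all, which is the whole point: the side branch did real work, but none of it is needed to explain the end result.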

In many ways, it is also actually a _better_ result, in that it's a 
"closer to minimal" way of reaching a particular state. So if you're just 
interested in how something came to be, and want to just cut through the 
crap, the result of extreme simplification really _is_ better.

So the branches that were dismissed really _aren't_ important - they might 
contain real work, but from the point of the end result, that real work 
might as well not have happened, since the simpler history we chose _also_ 
explains the end result sufficiently.

So I think the default simplification is really a good default: not only 
because it's fundamentally cheaper, but because it is actually more likely 
to distill what you actually care about if you wonder what happened to 
a file or a set of files.

But if you care about all the "side efforts" that didn't actually matter 
for the end result too, then you'd want the more expensive, and more 
complete graph. But it _will_ be a lot more expensive to compute.

		Linus
