From: Linus Torvalds <torvalds@osdl.org>
To: Josh Triplett <josh@freedesktop.org>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
Date: Mon, 23 Oct 2006 12:50:58 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0610231237080.3962@g5.osdl.org>
In-Reply-To: <453D17B5.6070203@freedesktop.org>
On Mon, 23 Oct 2006, Josh Triplett wrote:
>
> > Without the "--full-history", you get a simplified history, but it's
> > likely to be _too_ simplified for your use, since it will not only
> > collapse multiple identical parents, it will also totally _remove_ parents
> > that don't introduce any new content.
>
> Considering that git-split does exactly that (remove parents that don't
> introduce new content, assuming they changed things outside the
> subtree), that might actually work for us. I just checked, and the
> output of "git log --parents -- $project" on one of my repositories
> seems to show the same sequence of commits as git log --parents on the
> head commit printed by git-split $project (apart from the rewritten
> sha1s), including elimination of irrelevant merges.
Ok. In that case, you're good to go, and just use the current
simplification entirely.
Although I think that somebody (Dscho?) also had a patch to remove
multiple identical parents, which he claimed could happen with
simplification otherwise. I didn't look any closer at it.
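To illustrate how duplicate parents can show up: when each parent of a merge is rewritten to its nearest remaining ancestor, two different parents can end up pointing at the same commit. A minimal sketch of collapsing them, assuming a hypothetical `rewrite` mapping from original parent to rewritten parent (the function name and the mapping are made up for illustration, not taken from any patch):

```python
def collapse_identical_parents(parents, rewrite):
    """Map each parent through `rewrite`, then drop repeats.

    `parents` is a list of commit names; `rewrite` is a hypothetical dict
    from an original commit to the commit it was simplified to. Parents
    missing from `rewrite` are kept unchanged. First-seen order is kept.
    """
    seen = set()
    result = []
    for parent in parents:
        target = rewrite.get(parent, parent)
        if target not in seen:
            seen.add(target)
            result.append(target)
    return result


# Both sides of the merge simplify down to the same commit "a",
# so the merge is left with a single parent.
merged = collapse_identical_parents(["b", "c"], {"b": "a", "c": "a"})
```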
> > So there are multiple levels of history simplification, and right now the
> > internal git revision parser only gives you two choices: "none"
> > (--full-history) and "extreme" (which is the default when you give a set
> > of filenames).
>
> I don't think we need any middle ground here; why might we want less
> simplification?
There are really three levels of simplification:
- none at all ("--full-history"). This is really annoying, but if you
want to guarantee that you see all the changes (even duplicate ones)
done along all branches, you currently need to do this one.
Currently "git whatchanged" uses this one (and that ignores merges by
default, making it quite palatable). So with "git whatchanged", you
will get _every_ commit that changed the file, even if there are
duplicates along different histories.
- extreme (the current default). This one is really nice, in that it
shows the simplest history you can make that explains the end result.
But it means that if you had two branches that ended up with the same
result, we will pick just one of them. And the other one may have done
it differently, and the different way of reaching the same result might
be interesting. We'll never know.
As an example: the extreme simplification can also throw away branches
that had work reverted on them - the branch ended up the _same_ as the
one we chose, but it did so because it had some experimental work that
was deemed to be bad. Extreme simplification may or may not remove that
experiment, simply depending on which branch it _happened_ to pick.
Currently, this is what most git users see if they ask for pathname
simplification, i.e. "gitk drivers/char" or "git log -p kernel/sched.c"
use this simplification. It's extremely useful, but it definitely
culls real history too.
- The nice one that doesn't throw away potentially interesting
duplicate paths to reach the same end result. We don't have this one,
so no git commands do this yet.
The way to do this one would be "--full-history", but then removing all
parents that are "redundant". In other words, for any merge that
remains (because of the --full-history), check if one parent is a full
superset of another one, and if so, remove the "dominated" parent,
which simplifies the merge. Continue until nothing can be simplified
any more.
This would _usually_ end up giving the same graph as the "extreme"
simplification, but if there were two branches that really _did_
generate the same end result using different commits, they'd remain in
the end result.
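The parent-pruning step of the third level could be sketched roughly as below. This is only an illustration under stated assumptions, not existing git code: the history is assumed to be already path-limited (as with "--full-history") and given as a plain dict from commit name to its list of parents, and the function names are invented. A parent counts as "dominated" when it is identical to, or an ancestor of, another parent of the same merge.

```python
def ancestors(graph, commit):
    """Return every commit reachable from `commit` through parent links."""
    seen = set()
    stack = list(graph[commit])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def drop_dominated_parents(graph):
    """Return a copy of `graph` with redundant merge parents removed.

    `graph` maps commit -> list of parents (an assumed toy representation).
    A parent is dropped when another parent of the same commit already
    contains it in its history.
    """
    result = {}
    for commit, parents in graph.items():
        unique = list(dict.fromkeys(parents))  # collapse identical parents
        kept = [
            p for p in unique
            if not any(p in ancestors(graph, q) for q in unique if q != p)
        ]
        result[commit] = kept
    return result


# B and C are independent branches off A, merged in M: both are kept.
# N merges M with B again; B is already part of M's history, so it goes.
history = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "M": ["B", "C"],
    "N": ["M", "B"],
}
simplified = drop_dominated_parents(history)
```

The sketch recomputes ancestors for every pair, which is roughly where the cost comes from; it also leaves out the follow-up step of dropping merges that have become uninteresting once they are down to one parent.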
The problem with the "nice one" is that it's expensive as hell. There may
be clever tricks to make it less so, though. But I think it's the
RightThing(tm) to do, at least as an option for when you really want to
see a reasonable history that still contains everything that is relevant.
Linus