git.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Josh Triplett <josh@freedesktop.org>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
Date: Mon, 23 Oct 2006 12:50:58 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0610231237080.3962@g5.osdl.org>
In-Reply-To: <453D17B5.6070203@freedesktop.org>



On Mon, 23 Oct 2006, Josh Triplett wrote:
>
> > Without the "--full-history", you get a simplified history, but it's 
> > likely to be _too_ simplified for your use, since it will not only 
> > collapse multiple identical parents, it will also totally _remove_ parents 
> > that don't introduce any new content.
> 
> Considering that git-split does exactly that (remove parents that don't
> introduce new content, assuming they changed things outside the
> subtree), that might actually work for us.  I just checked, and the
> output of "git log --parents -- $project" on one of my repositories
> seems to show the same sequence of commits as git log --parents on the
> head commit printed by git-split $project (apart from the rewritten
> sha1s), including elimination of irrelevant merges.

Ok. In that case, you're good to go, and just use the current 
simplification entirely.

That said, I think somebody (Dscho?) also had a patch to remove 
multiple identical parents, which he claimed could otherwise show up 
after simplification. I didn't look any closer at it.
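To make the duplicate-parent case concrete, here is a minimal Python sketch of how it can arise. It is not git's code and uses a toy model: history is a dict mapping hypothetical commit names to parent lists, and `keep` is the set of commits that survive path limiting. Each survivor's parents are rewritten to its nearest surviving ancestors, which can map two different parents onto the same commit. The hypothetical `dedupe_parents` helper then collapses those duplicates.

```python
def rewrite_parents(parents, keep):
    """Hypothetical helper: drop commits not in `keep` and point each
    surviving commit at its nearest surviving ancestors."""
    memo = {}

    def nearest_kept(c):
        # A surviving commit stands for itself; a dropped one is
        # replaced by the nearest survivors behind it.
        if c in keep:
            return [c]
        if c not in memo:
            found = []
            for p in parents.get(c, []):
                found.extend(nearest_kept(p))
            memo[c] = found
        return memo[c]

    rewritten = {}
    for c in keep:
        new_parents = []
        for p in parents.get(c, []):
            new_parents.extend(nearest_kept(p))
        rewritten[c] = new_parents
    return rewritten


def dedupe_parents(rewritten):
    """Hypothetical helper: collapse identical parents, keeping the
    order in which each parent first appears."""
    return {c: list(dict.fromkeys(ps)) for c, ps in rewritten.items()}


# B and C don't touch the path, so both get rewritten to A.
parents = {"A": [], "B": ["A"], "C": ["A"], "M": ["B", "C"]}
rewritten = rewrite_parents(parents, keep={"A", "M"})
print(rewritten["M"])                  # ['A', 'A']
print(dedupe_parents(rewritten)["M"])  # ['A']
```

The recursion in `nearest_kept` is fine for a toy graph. A real history would need an explicit stack.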

> > So there are multiple levels of history simplification, and right now the 
> > internal git revision parser only gives you two choices: "none" 
> > (--full-history) and "extreme" (which is the default when you give a set 
> > of filenames). 
> 
> I don't think we need any middle ground here; why might we want less
> simplification?

There are really three levels of simplification:

 - none at all ("--full-history"). This is really annoying, but if you 
   want to guarantee that you see all the changes (even duplicate ones) 
   done along all branches, you currently need to do this one.

   Currently "git whatchanged" uses this one (and it ignores merges by
   default, which makes it quite palatable). So with "git whatchanged", you 
   will get _every_ commit that changed the file, even if there are 
   duplicates along different histories.

 - extreme (the current default). This one is really nice, in that it 
   shows the simplest history you can make that explains the end result. 
   But it means that if you had two branches that ended up with the same 
   result, we will pick just one of them. And the other one may have done 
   it differently, and the different way of reaching the same result might 
   be interesting. We'll never know.

   As an example, the extreme simplification can also throw away branches 
   that had work reverted on them: the branch ended up the _same_ as the 
   one we chose, but only because it had some experimental work that was 
   deemed to be bad and got reverted. Extreme simplification may or may 
   not remove that experiment, simply depending on which branch it 
   _happened_ to pick.

   Currently, this is what most git users see when they ask for pathname 
   simplification: both "gitk drivers/char" and "git log -p kernel/sched.c"
   use it. It's extremely useful, but it definitely culls real history 
   too.

 - The nice one that doesn't throw away potentially interesting 
   duplicate paths to reach the same end result. We don't have this one, 
   so no git commands do this yet.

   The way to do this one would be to start from "--full-history" and 
   then remove all parents that are "redundant". In other words, for any 
   merge that remains (because of the --full-history), check whether one 
   parent's history is a full superset of another's, and if so, remove 
   the "dominated" parent, which simplifies the merge. Continue until 
   nothing more can be simplified.

   This would _usually_ end up giving the same graph as the "extreme" 
   simplification, but if there were two branches that really _did_ 
   generate the same end result using different commits, they'd remain in 
   the end result.
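The "nice" simplification can be sketched in Python under a similar toy model. This is again not git's code, and all names are hypothetical. The input is assumed to be a full-history graph already limited to commits that touch the path, plus merges. `changed` holds the commits that actually modify the path, and `tip` is the starting commit. "One parent is a full superset of another" is read here as follows: parent p is dominated when p is already in the history of another parent q. Splicing out a collapsed merge that changed nothing is an assumption about where "simplifies the merge" leads.

```python
def ancestors(graph, c):
    """Every commit reachable from c through parent links (c excluded)."""
    seen, stack = set(), list(graph.get(c, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(graph.get(p, []))
    return seen


def nice_simplify(parents, changed, tip):
    """Hypothetical sketch: remove dominated merge parents, splice out
    commits that no longer explain anything, repeat until stable."""
    graph = {c: list(ps) for c, ps in parents.items()}  # don't mutate input
    progress = True
    while progress:
        progress = False
        # Step 1: at each merge, drop duplicate parents and any parent p
        # that is already in the history of another parent q.
        for c, ps in graph.items():
            if len(ps) < 2:
                continue
            uniq = list(dict.fromkeys(ps))
            kept = [p for p in uniq
                    if not any(p in ancestors(graph, q) for q in uniq if q != p)]
            if kept != ps:
                graph[c] = kept
                progress = True
        # Step 2: a non-tip commit with one parent that did not change the
        # path (typically a merge that collapsed in step 1) is replaced by
        # its parent in every child, then removed.
        for c in list(graph):
            if c != tip and c not in changed and len(graph[c]) == 1:
                (p,) = graph[c]
                for child, ps in graph.items():
                    if c in ps:
                        graph[child] = [p if x == c else x for x in ps]
                del graph[c]
                progress = True
    return graph


# R is the root; X is an experiment on a side branch and Y reverts it;
# M merges the side branch back with the same end result as R; T is the tip.
parents = {"R": [], "X": ["R"], "Y": ["X"], "M": ["R", "Y"], "T": ["M"]}
print(nice_simplify(parents, changed={"R", "X", "Y", "T"}, tip="T"))
# {'R': [], 'X': ['R'], 'Y': ['X'], 'T': ['Y']}
```

In the example, the experiment X and its revert Y both survive, even though the merge M reached the same result as R. The extreme mode follows only one parent at such a merge, so it could have kept R alone, depending on which parent it happened to follow. Every dominance check in step 1 is a full walk of one parent's history. That gives a feel for why this approach is costly on a real repository.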

The problem with the "nice one" is that it's expensive as hell. There may 
be clever tricks to make it less so, though. But I think it's the 
RightThing(tm) to do, at least as an option for when you really want to 
see a reasonable history that still contains everything that is relevant.

			Linus


Thread overview: 19+ messages
2006-09-27  8:05 [RFC] git-split: Split the history of a git repository by subdirectories and ranges Josh Triplett
2006-09-27 10:13 ` Junio C Hamano
2006-09-27 11:59   ` Andy Whitcroft
2006-09-27 19:08     ` Junio C Hamano
2006-09-27 19:31       ` Junio C Hamano
2006-10-23 10:17   ` Josh Triplett
2006-10-23 15:52     ` Linus Torvalds
2006-10-23 19:27       ` Josh Triplett
2006-10-23 19:50         ` Linus Torvalds [this message]
2006-10-23 20:07           ` Jakub Narebski
2006-10-23 20:52           ` Josh Triplett
2006-10-23 21:06             ` Linus Torvalds
2006-10-23 21:19               ` Linus Torvalds
2006-10-24 14:56           ` Johannes Schindelin
2006-10-24 15:19             ` Linus Torvalds
2006-10-25  0:10         ` Junio C Hamano
2006-10-25  0:19           ` Jakub Narebski
2006-10-25  1:59           ` Josh Triplett
2006-10-25  2:13             ` Junio C Hamano
