git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] git-split: Split the history of a git repository by subdirectories and ranges
@ 2006-09-27  8:05 Josh Triplett
  2006-09-27 10:13 ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2006-09-27  8:05 UTC (permalink / raw)
  To: git; +Cc: Jamey Sharp


[-- Attachment #1.1: Type: text/plain, Size: 1334 bytes --]

Hello,

I co-maintain the X C Binding (XCB) project with Jamey Sharp.
Previously, several XCB-related projects all existed under the umbrella
of a single monolithic GIT repository with per-project subdirectories.
We have split this repository into individual per-project repositories.

Jamey Sharp and I wrote a script called git-split to accomplish this
repository split. git-split reconstructs the history of a sub-project
previously stored in a subdirectory of a larger repository. It
constructs new commit objects based on the existing tree objects for the
subtree in each commit, and discards commits which do not affect the
history of the sub-project, as well as merges made unnecessary due to
these discarded commits.  When git-split finishes, it will output the
sha1 for the new head commit, suitable for redirection into a file in
.git/refs/heads.  At that point, you can clone the new head, or copy the
repository and prune out undesired heads, tags, and objects.

I have attached git-split for review.  If the GIT community has any
interest in seeing git-split become a part of GIT, we can write up the
necessary documentation and patch.

We would like to acknowledge the work of the gobby team in creating a
collaborative editor which greatly aided the development of git-split.

- Josh Triplett

[-- Attachment #1.2: git-split --]
[-- Type: text/plain, Size: 4980 bytes --]

#!/usr/bin/python
# git-split: Split the history of a git repository by subdirectories and ranges
# Copyright (C) 2006 Jamey Sharp, Josh Triplett
#
# You can redistribute this software and/or modify it under the terms of
# the GNU General Public License as published by the Free Software
# Foundation; version 2 dated June, 1991.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.

from itertools import izip
from subprocess import Popen, PIPE
import os, sys

def run(cmd, stdin=None, env={}):
    newenv = os.environ.copy()
    newenv.update(env)
    return Popen(cmd, stdin=PIPE, stdout=PIPE, env=newenv).communicate(stdin)[0]

def parse_author_date(s):
    """Given a GIT author or committer string, return (name, email, date)"""
    (name, email, time, timezone) = s.rsplit(None, 3)
    return (name, email[1:-1], time + " " + timezone)

def get_subtree(tree, name):
    output = run(["git-ls-tree", tree, name])
    if not output:
        return None
    return output.split()[2]

def is_ancestor(new_commits, cur, other):
    """Return True if cur has other as an ancestor, or False otherwise."""
    return run(["git-merge-base", cur, other]).strip() == other

def walk(commits, new_commits, commit_hash, project):
    commit = commits[commit_hash]
    if not(commit.has_key("new_hash")):
        tree = get_subtree(commit["tree"], project)
        commit["new_tree"] = tree
        if not tree:
            raise Exception("Did not find project in tree for commit " + commit_hash)
        new_parents = list(set([walk(commits, new_commits, parent, project)
                                for parent in commit["parents"]]))

        new_hash = None
        if len(new_parents) == 1:
            new_hash = new_parents[0]
        elif len(new_parents) == 2: # Check for unnecessary merge
            if is_ancestor(new_commits, new_parents[0], new_parents[1]):
                new_hash = new_parents[0]
            elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
                new_hash = new_parents[1]
        if new_hash and new_commits[new_hash]["new_tree"] != tree:
            new_hash = None

        if not new_hash:
            args = ["git-commit-tree", tree]
            for new_parent in new_parents:
                args.extend(["-p", new_parent])
            env = dict(zip(["GIT_AUTHOR_"+n for n in ["NAME", "EMAIL", "DATE"]],
                           parse_author_date(commit["author"]))
                       +zip(["GIT_COMMITTER_"+n for n in ["NAME", "EMAIL", "DATE"]],
                            parse_author_date(commit["committer"])))
            new_hash = run(args, commit["message"], env).strip()

        commit["new_parents"] = new_parents
        commit["new_hash"] = new_hash
        if new_hash not in new_commits:
            new_commits[new_hash] = commit
    return commit["new_hash"]

def main(args):
    if not(1 <= len(args) <= 3):
        print "Usage: git-split subdir [newest [oldest]]"
        return 1

    project = args[0]
    if len(args) > 1:
        newest = args[1]
    else:
        newest = "HEAD"
    newest_hash = run(["git-rev-parse", newest]).strip()
    if len(args) > 2:
        oldest = args[2]
        oldest_hash = run(["git-rev-parse", oldest]).strip()
    else:
        oldest_hash = None

    grafts = {}
    try:
        for line in file(".git/info/grafts").read().split("\n"):
            if line:
                child, parents = line.split(None, 1)
                parents = parents.split()
                grafts[child] = parents
    except IOError:
        pass

    temp = run(["git-log", "--pretty=raw", newest_hash]).split("\n\n")
    commits = {}
    for headers,message in izip(temp[0::2], temp[1::2]):
        commit = {}
        commit_hash = None
        headers = [header.split(None, 1) for header in headers.split("\n")]
        for key,value in headers:
            if key == "parent":
                commit.setdefault("parents", []).append(value)
            elif key == "commit":
                commit_hash = value
            else:
                if key in commit:
                    raise Exception('Duplicate key "%s"' % key)
                commit[key] = value
        commit["message"] = "".join([line[4:]+"\n"
                                      for line in message.split("\n")])
        if commit_hash is None:
            raise Exception("Commit without hash")
        if commit_hash in grafts:
            commit["parents"] = grafts[commit_hash]
        if commit_hash == oldest_hash or "parents" not in commit:
            commit["parents"] = []
        commits[commit_hash] = commit

    print walk(commits, dict(), newest_hash, project)

try:
    import psyco
    psyco.full()
except ImportError:
    pass

if __name__ == "__main__": sys.exit(main(sys.argv[1:]))

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-09-27  8:05 [RFC] git-split: Split the history of a git repository by subdirectories and ranges Josh Triplett
@ 2006-09-27 10:13 ` Junio C Hamano
  2006-09-27 11:59   ` Andy Whitcroft
  2006-10-23 10:17   ` Josh Triplett
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2006-09-27 10:13 UTC (permalink / raw)
  To: Josh Triplett; +Cc: git

Josh Triplett <josh@freedesktop.org> writes:

> Jamey Sharp and I wrote a script called git-split to accomplish this
> repository split. git-split reconstructs the history of a sub-project
> previously stored in a subdirectory of a larger repository. It
> constructs new commit objects based on the existing tree objects for the
> subtree in each commit, and discards commits which do not affect the
> history of the sub-project, as well as merges made unnecessary due to
> these discarded commits.

Very nicely done.

> We would like to acknowledge the work of the gobby team in creating a
> collaborative editor which greatly aided the development of git-split.

> from itertools import izip
> from subprocess import Popen, PIPE
> import os, sys

How recent a Python are we assuming here?  Is late 2.4 recent
enough?

> def walk(commits, new_commits, commit_hash, project):
>     commit = commits[commit_hash]
>     if not(commit.has_key("new_hash")):
>         tree = get_subtree(commit["tree"], project)
>         commit["new_tree"] = tree
>         if not tree:
>             raise Exception("Did not find project in tree for commit " + commit_hash)
>         new_parents = list(set([walk(commits, new_commits, parent, project)
>                                 for parent in commit["parents"]]))
>
>         new_hash = None
>         if len(new_parents) == 1:
>             new_hash = new_parents[0]
>         elif len(new_parents) == 2: # Check for unnecessary merge
>             if is_ancestor(new_commits, new_parents[0], new_parents[1]):
>                 new_hash = new_parents[0]
>             elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
>                 new_hash = new_parents[1]
>         if new_hash and new_commits[new_hash]["new_tree"] != tree:
>             new_hash = None

This is a real gem.  I really like reading well-written Python
programs.

When git-rev-list (or "git-log --pretty=raw" that you use in
your main()) simplifies the merge history based on subtree, we
look at the merge and if the tree matches any of the parent we
discard other parents and make the history a single strand of
pearls.  However for this application that is not what you want,
so I can see why you run full "git-log" and prune history by
hand here like this.

I wonder if using "git-log --full-history -- $project" to let
the core side omit commits that do not change the $project (but
still give you all merged branches) would have made your job any
easier?

You are handling grafts by hand because --pretty=raw is special
in that it displays the real parents (although traversal does
use grafts).  Maybe it would have helped if we had a --pretty
format that is similar to raw but rewrites the parents?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-09-27 10:13 ` Junio C Hamano
@ 2006-09-27 11:59   ` Andy Whitcroft
  2006-09-27 19:08     ` Junio C Hamano
  2006-10-23 10:17   ` Josh Triplett
  1 sibling, 1 reply; 19+ messages in thread
From: Andy Whitcroft @ 2006-09-27 11:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Triplett, git

Junio C Hamano wrote:
> Josh Triplett <josh@freedesktop.org> writes:
> 
>> Jamey Sharp and I wrote a script called git-split to accomplish this
>> repository split. git-split reconstructs the history of a sub-project
>> previously stored in a subdirectory of a larger repository. It
>> constructs new commit objects based on the existing tree objects for the
>> subtree in each commit, and discards commits which do not affect the
>> history of the sub-project, as well as merges made unnecessary due to
>> these discarded commits.
> 
> Very nicely done.
> 
>> We would like to acknowledge the work of the gobby team in creating a
>> collaborative editor which greatly aided the development of git-split.
> 
>> from itertools import izip
>> from subprocess import Popen, PIPE
>> import os, sys
> 
> How recent a Python are we assuming here?  Is late 2.4 recent
> enough?
> 
>> def walk(commits, new_commits, commit_hash, project):
>>     commit = commits[commit_hash]
>>     if not(commit.has_key("new_hash")):
>>         tree = get_subtree(commit["tree"], project)
>>         commit["new_tree"] = tree
>>         if not tree:
>>             raise Exception("Did not find project in tree for commit " + commit_hash)
>>         new_parents = list(set([walk(commits, new_commits, parent, project)
>>                                 for parent in commit["parents"]]))
>>
>>         new_hash = None
>>         if len(new_parents) == 1:
>>             new_hash = new_parents[0]
>>         elif len(new_parents) == 2: # Check for unnecessary merge
>>             if is_ancestor(new_commits, new_parents[0], new_parents[1]):
>>                 new_hash = new_parents[0]
>>             elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
>>                 new_hash = new_parents[1]
>>         if new_hash and new_commits[new_hash]["new_tree"] != tree:
>>             new_hash = None
> 
> This is a real gem.  I really like reading well-written Python
> programs.
> 
> When git-rev-list (or "git-log --pretty=raw" that you use in
> your main()) simplifies the merge history based on subtree, we
> look at the merge and if the tree matches any of the parent we
> discard other parents and make the history a single strand of
> pearls.  However for this application that is not what you want,
> so I can see why you run full "git-log" and prune history by
> hand here like this.
> 
> I wonder if using "git-log --full-history -- $project" to let
> the core side omit commits that do not change the $project (but
> still give you all merged branches) would have made your job any
> easier?
> 
> You are handling grafts by hand because --pretty=raw is special
> in that it displays the real parents (although traversal does
> use grafts).  Maybe it would have helped if we had a --pretty
> format that is similar to raw but rewrites the parents?

I have wondered recently why grafts are hidden in this way.  I feel they
are something I want to know is occuring in my history as this history
is being manipulated.  Perhaps we could emit a graft record in the
output, indeed have a graft object there?  Someone could commit a
change, then graft over it in the history so I'd not see it even though
its in my working copy.

For instance in my historical git tree I have the following, note the
lack of a parent.  If I move the graft up one commit, then I get a
parent, but not a parent that points at the next commit.

  commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
  tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d
  author Linus Torvalds <torvalds@ppc970.osdl.org> 1113690036 -0700
  committer Linus Torvalds <torvalds@ppc970.osdl.org> 1113690036 -0700

  [...]

  commit e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
  tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d
  parent 607899e17218b485a021c6ebb1cff771fd690eec
  author Linus Torvalds <torvalds@ppc970.osdl.org> 1112580513 -0700
  committer Linus Torvalds <torvalds@ppc970.osdl.org> 1112580513 -0700

It might be nice to have it more like the following, with a graft in
there, N*40 would obviously be fakes in the first instance as the object
isn't modified.  M*40 would refer to the old parent if there was one,
else NONE or 0*40 perhaps.

  commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
  tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d
  parent NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
  author Linus Torvalds <torvalds@ppc970.osdl.org> 1113690036 -0700
  committer Linus Torvalds <torvalds@ppc970.osdl.org> 1113690036 -0700

  [...]

  graft NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
  parent e7e173af42dbf37b1d946f9ee00219cb3b2bea6a

    OLD: MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

  commit e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
  tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d
  parent 607899e17218b485a021c6ebb1cff771fd690eec
  author Linus Torvalds <torvalds@ppc970.osdl.org> 1112580513 -0700
  committer Linus Torvalds <torvalds@ppc970.osdl.org> 1112580513 -0700

I guess the other option would be to annotate the previous commit,
perhaps on the parent line so we can see the 'right' data in the normal
place, but the overridden data is right there and grepable.

  parent e7e173af42dbf37b1d946f9ee00219cb3b2bea6a GRAFT M*40
or
  parent e7e173af42dbf37b1d946f9ee00219cb3b2bea6a GRAFT NONE

-apw

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-09-27 11:59   ` Andy Whitcroft
@ 2006-09-27 19:08     ` Junio C Hamano
  2006-09-27 19:31       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2006-09-27 19:08 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Josh Triplett, git

Andy Whitcroft <apw@shadowen.org> writes:

>> You are handling grafts by hand because --pretty=raw is special
>> in that it displays the real parents (although traversal does
>> use grafts).  Maybe it would have helped if we had a --pretty
>> format that is similar to raw but rewrites the parents?
>
> I have wondered recently why grafts are hidden in this way.  I feel they
> are something I want to know is occuring in my history as this history
> is being manipulated.

Just to make sure we are on the same page, only "raw" format
output is special and it is special only on output.  Ancestry
traversal always honors what you have in grafts.

However, you can do:

$ git rev-list --parents --pretty=raw

which would give you "commit $this_commit $its $parents" lines
and "parent $true_parent" lines at the same time.

And they will be inconsistent when you have grafts or path
limiter.  The former honor grafts and path limiter, and the
latter show the true set of parents.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-09-27 19:08     ` Junio C Hamano
@ 2006-09-27 19:31       ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2006-09-27 19:31 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Josh Triplett, git

Junio C Hamano <junkio@cox.net> writes:

> However, you can do:
>
> $ git rev-list --parents --pretty=raw
>
> which would give you "commit $this_commit $its $parents" lines
> and "parent $true_parent" lines at the same time.
>
> And they will be inconsistent when you have grafts or path
> limiter.  The former honor grafts and path limiter, and the
> latter show the true set of parents.

-- >8 --
An illustration of rev-list --parents --pretty=raw

This script creates two separate histories, A and B, each of
which does:

      (A0, B0): create fileA and subdir/fileB
      (A1, B1): modify fileA
      (A2, B2): modify subdir/fileB

and then grafts them together to make B0 a child of A2.  So
the final history looks like (time flows from top to bottom):

		true parent	touches subdir?

	A0	none		yes (creates it)
        A1      A0		no
        A2	A1		yes
        B0	none		yes (different from what's in A2)
        B1	B0		no
        B2	B1		yes

"git rev-list --parents --pretty=raw B2" would give "fake"
parents on the "commit " header lines while "parent " header
lines show the parent as recorded in the commit object (i.e. B0
appears to have A2 as its parent on "commit " header but there
is no "parent A2" header line in it).

When you have path limiters, we simplify history to omit
commits that do not affect the specified paths.  

So "git rev-list --parents --pretty=raw B2 subdir" would return
"B2 B0 A2 A0" (because B1 and A1 do not touch the path).  When
it does so, the "commit " header lines have "fake" parents
(i.e. B2 appears to have B0 as its parent on "commit " header),
but you can still get the true parents by looking at "parent "
header.

---
diff --git a/t/t6001-rev-list-graft.sh b/t/t6001-rev-list-graft.sh
new file mode 100755
index 0000000..08a6cff
--- /dev/null
+++ b/t/t6001-rev-list-graft.sh
@@ -0,0 +1,113 @@
+#!/bin/sh
+
+test_description='Revision traversal vs grafts and path limiter'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	mkdir subdir &&
+	echo >fileA fileA &&
+	echo >subdir/fileB fileB &&
+	git add fileA subdir/fileB &&
+	git commit -a -m "Initial in one history." &&
+	A0=`git rev-parse --verify HEAD` &&
+
+	echo >fileA fileA modified &&
+	git commit -a -m "Second in one history." &&
+	A1=`git rev-parse --verify HEAD` &&
+
+	echo >subdir/fileB fileB modified &&
+	git commit -a -m "Third in one history." &&
+	A2=`git rev-parse --verify HEAD` &&
+
+	rm -f .git/refs/heads/master .git/index &&
+	
+	echo >fileA fileA again &&
+	echo >subdir/fileB fileB again &&
+	git add fileA subdir/fileB &&
+	git commit -a -m "Initial in alternate history." &&
+	B0=`git rev-parse --verify HEAD` &&
+
+	echo >fileA fileA modified in alternate history &&
+	git commit -a -m "Second in alternate history." &&
+	B1=`git rev-parse --verify HEAD` &&
+
+	echo >subdir/fileB fileB modified in alternate history &&
+	git commit -a -m "Third in alternate history." &&
+	B2=`git rev-parse --verify HEAD` &&
+	: done
+'
+
+check () {
+	type=$1
+	shift
+
+	arg=
+	which=arg
+	rm -f test.expect
+	for a
+	do
+		if test "z$a" = z--
+		then
+			which=expect
+			child=
+			continue
+		fi
+		if test "$which" = arg
+		then
+			arg="$arg$a "
+			continue
+		fi
+		if test "$type" = basic
+		then
+			echo "$a"
+		else
+			if test "z$child" != z
+			then
+				echo "$child $a"
+			fi
+			child="$a"
+		fi
+	done >test.expect
+	if test "$type" != basic && test "z$child" != z
+	then
+		echo >>test.expect $child
+	fi
+	if test $type = basic
+	then
+		git rev-list $arg >test.actual
+	elif test $type = parents
+	then
+		git rev-list --parents $arg >test.actual
+	elif test $type = parents-raw
+	then
+		git rev-list --parents --pretty=raw $arg |
+		sed -n -e 's/^commit //p' >test.actual
+	fi
+	diff test.expect test.actual
+}
+
+for type in basic parents parents-raw
+do
+	test_expect_success 'without grafts' "
+		rm -f .git/info/grafts
+		check $type $B2 -- $B2 $B1 $B0
+	"
+
+	test_expect_success 'with grafts' "
+		echo '$B0 $A2' >.git/info/grafts
+		check $type $B2 -- $B2 $B1 $B0 $A2 $A1 $A0
+	"
+
+	test_expect_success 'without grafts, with pathlimit' "
+		rm -f .git/info/grafts
+		check $type $B2 subdir -- $B2 $B0
+	"
+
+	test_expect_success 'with grafts, with pathlimit' "
+		echo '$B0 $A2' >.git/info/grafts
+		check $type $B2 subdir -- $B2 $B0 $A2 $A0
+	"
+
+done
+test_done

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-09-27 10:13 ` Junio C Hamano
  2006-09-27 11:59   ` Andy Whitcroft
@ 2006-10-23 10:17   ` Josh Triplett
  2006-10-23 15:52     ` Linus Torvalds
  1 sibling, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2006-10-23 10:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3301 bytes --]

Junio C Hamano wrote:
> Josh Triplett <josh@freedesktop.org> writes:
>> Jamey Sharp and I wrote a script called git-split to accomplish this
>> repository split. git-split reconstructs the history of a sub-project
>> previously stored in a subdirectory of a larger repository. It
>> constructs new commit objects based on the existing tree objects for the
>> subtree in each commit, and discards commits which do not affect the
>> history of the sub-project, as well as merges made unnecessary due to
>> these discarded commits.
> 
> Very nicely done.

Thanks!

>> We would like to acknowledge the work of the gobby team in creating a
>> collaborative editor which greatly aided the development of git-split.
> 
>> from itertools import izip
>> from subprocess import Popen, PIPE
>> import os, sys
> 
> How recent a Python are we assuming here?  Is late 2.4 recent
> enough?

We ran it with 2.4, so yes.  git-split does require at least 2.4,
though, because it uses set(), str.rsplit(), and subprocess, none of
which existed in 2.3.

>> def walk(commits, new_commits, commit_hash, project):
>>     commit = commits[commit_hash]
>>     if not(commit.has_key("new_hash")):
>>         tree = get_subtree(commit["tree"], project)
>>         commit["new_tree"] = tree
>>         if not tree:
>>             raise Exception("Did not find project in tree for commit " + commit_hash)
>>         new_parents = list(set([walk(commits, new_commits, parent, project)
>>                                 for parent in commit["parents"]]))
>>
>>         new_hash = None
>>         if len(new_parents) == 1:
>>             new_hash = new_parents[0]
>>         elif len(new_parents) == 2: # Check for unnecessary merge
>>             if is_ancestor(new_commits, new_parents[0], new_parents[1]):
>>                 new_hash = new_parents[0]
>>             elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
>>                 new_hash = new_parents[1]
>>         if new_hash and new_commits[new_hash]["new_tree"] != tree:
>>             new_hash = None
> 
> This is a real gem.  I really like reading well-written Python
> programs.

Thanks.  We had some fun writing this; git's elegant repository
structure made it a joy to work with.

> I wonder if using "git-log --full-history -- $project" to let
> the core side omit commits that do not change the $project (but
> still give you all merged branches) would have made your job any
> easier?

I don't think it would.  We still need to know what commit to use as the
parent of any given commit, so we don't want commits in the log output
with parents that don't exist in the log output.  And rewriting parents
in git-log based on which revisions change the specified subdirectory
seems like a bad idea.

> You are handling grafts by hand because --pretty=raw is special
> in that it displays the real parents (although traversal does
> use grafts).  Maybe it would have helped if we had a --pretty
> format that is similar to raw but rewrites the parents?

Yes, that would help.  We could then avoid dealing with grafts manually.


How would you feel about including git-split in the GIT tree?  We could
certainly write up the necessary documentation for it.

- Josh Triplett


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-23 10:17   ` Josh Triplett
@ 2006-10-23 15:52     ` Linus Torvalds
  2006-10-23 19:27       ` Josh Triplett
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2006-10-23 15:52 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Junio C Hamano, git



On Mon, 23 Oct 2006, Josh Triplett wrote:
> 
> > I wonder if using "git-log --full-history -- $project" to let the core 
> > side omit commits that do not change the $project (but still give you 
> > all merged branches) would have made your job any easier?
> 
> I don't think it would.  We still need to know what commit to use as the
> parent of any given commit, so we don't want commits in the log output
> with parents that don't exist in the log output.  And rewriting parents
> in git-log based on which revisions change the specified subdirectory
> seems like a bad idea.

Umm.. You didn't realize that git log already _does_ exactly that?

You need to rewrite the parents in order to get a nice and readable 
history, which in turn is needed for any visualizer. So git has long done 
the parent rewriting in order to be able to do things like

	gitk drivers/char

on the kernel.

And yes, that's done by the core revision parsing code, so when you do

	git log --full-history --parents -- $project

you do get the rewritten parent output (of course, it's not actually 
_simplified_, so you get a fair amount of duplicate parents etc which 
you'd still have to simplify and which don't do anything at all).

Without the "--full-history", you get a simplified history, but it's 
likely to be _too_ simplified for your use, since it will not only 
collapse multiple identical parents, it will also totally _remove_ parents 
that don't introduce any new content.

So there are multiple levels of history simplification, and right now the 
internal git revision parser only gives you two choices: "none" 
(--full-history) and "extreme" (which is the default when you give a set 
of filenames). 

			Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-23 15:52     ` Linus Torvalds
@ 2006-10-23 19:27       ` Josh Triplett
  2006-10-23 19:50         ` Linus Torvalds
  2006-10-25  0:10         ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Josh Triplett @ 2006-10-23 19:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 2586 bytes --]

Linus Torvalds wrote:
> 
> On Mon, 23 Oct 2006, Josh Triplett wrote:
>>> I wonder if using "git-log --full-history -- $project" to let the core 
>>> side omit commits that do not change the $project (but still give you 
>>> all merged branches) would have made your job any easier?
>> I don't think it would.  We still need to know what commit to use as the
>> parent of any given commit, so we don't want commits in the log output
>> with parents that don't exist in the log output.  And rewriting parents
>> in git-log based on which revisions change the specified subdirectory
>> seems like a bad idea.
> 
> Umm.. You didn't realize that git log already _does_ exactly that?

No, I didn't, primarily because the git log output I've scrutinized most
carefully came from git log --pretty=raw, which doesn't rewrite parents
even when pointed at a subdirectory.

> You need to rewrite the parents in order to get a nice and readable 
> history, which in turn is needed for any visualizer. So git has long done 
> the parent rewriting in order to be able to do things like
> 
> 	gitk drivers/char
> 
> on the kernel.
>
> And yes, that's done by the core revision parsing code, so when you do
> 
> 	git log --full-history --parents -- $project
> 
> you do get the rewritten parent output (of course, it's not actually 
> _simplified_, so you get a fair amount of duplicate parents etc which 
> you'd still have to simplify and which don't do anything at all).
> 
> Without the "--full-history", you get a simplified history, but it's 
> likely to be _too_ simplified for your use, since it will not only 
> collapse multiple identical parents, it will also totally _remove_ parents 
> that don't introduce any new content.

Considering that git-split does exactly that (remove parents that don't
introduce new content, assuming they changed things outside the
subtree), that might actually work for us.  I just checked, and the
output of "git log --parents -- $project" on one of my repositories
seems to show the same sequence of commits as git log --parents on the
head commit printed by git-split $project (apart from the rewritten
sha1s), including elimination of irrelevant merges.

> So there are multiple levels of history simplification, and right now the 
> internal git revision parser only gives you two choices: "none" 
> (--full-history) and "extreme" (which is the default when you give a set 
> of filenames). 

I don't think we need any middle ground here; why might we want less
simplification?

- Josh Triplett



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-23 19:27       ` Josh Triplett
@ 2006-10-23 19:50         ` Linus Torvalds
  2006-10-23 20:07           ` Jakub Narebski
                             ` (2 more replies)
  2006-10-25  0:10         ` Junio C Hamano
  1 sibling, 3 replies; 19+ messages in thread
From: Linus Torvalds @ 2006-10-23 19:50 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Junio C Hamano, git



On Mon, 23 Oct 2006, Josh Triplett wrote:
>
> > Without the "--full-history", you get a simplified history, but it's 
> > likely to be _too_ simplified for your use, since it will not only 
> > collapse multiple identical parents, it will also totally _remove_ parents 
> > that don't introduce any new content.
> 
> Considering that git-split does exactly that (remove parents that don't
> introduce new content, assuming they changed things outside the
> subtree), that might actually work for us.  I just checked, and the
> output of "git log --parents -- $project" on one of my repositories
> seems to show the same sequence of commits as git log --parents on the
> head commit printed by git-split $project (apart from the rewritten
> sha1s), including elimination of irrelevant merges.

Ok. In that case, you're good to go, and just use the current 
simplification entirely.

Although I think that somebody (Dscho?) also had a patch to remove 
multiple identical parents, which he claimed could happen with 
simplification otherwise. I didn't look any closer at it.

> > So there are multiple levels of history simplification, and right now the 
> > internal git revision parser only gives you two choices: "none" 
> > (--full-history) and "extreme" (which is the default when you give a set 
> > of filenames). 
> 
> I don't think we need any middle ground here; why might we want less
> simplification?

There's really three levels of simplification:

 - none at all ("--full-history"). This is really annoying, but if you 
   want to guarantee that you see all the changes (even duplicate ones) 
   done along all branches, you currently need to do this one.

   Currently "git whatchanged" uses this one (and that ignores merges by
   default, making it quite palatable). So with "git whatchanged", you 
   will get _every_ commit that changed the file, even if there are 
   duplicates alogn different histories.

 - extreme (the current default). This one is really nice, in that it 
   shows the simplest history you can make that explains the end result. 
   But it means that if you had two branches that ended up with the same 
   result, we will pick just one of them. And the other one may have done 
   it differently, and the different way of reaching the same result might 
   be interesting. We'll never know.

   As an exmple: the extreme simplification can also throw away branches 
   that had work reverted on them - the branch ended up the _same_ as the 
   one we chose, but it did so because it had some experimental work that 
   was deemed to be bad. Extreme simplification may or may not remove that 
   experiment, simply depending on which branch it _happened_ to pick.

   Currently, this is what most git users see if they ask for pathname 
   simplification, ie "gitk drivers/char" or "git log -p kernel/sched.c"
   uses this simplification. It's extremely useful, but it definitely 
   culls real history too.

 - The nice one that doesn't throw away potentially interesting 
   duplicate paths to reach the same end result. We don't have this one, 
   so no git commands do this yet.

   The way to do this one would be "--full-history", but then removing all 
   parents that are "redundant". In other words, for any merge that 
   remains (because of the --full-history), check if one parent is a full 
   superset of another one, and if so, remove the "dominated" parent, 
   which simplifies the merge. Continue until nothing can be simplified 
   any more.

   This would _usually_ end up giving the same graph as the "extreme" 
   simplification, but if there were two branches that really _did_ 
   generate the same end result using different commits, they'd remain in 
   the end result.

The problem with the "nice one" is that it's expensive as hell. There may 
be clever tricks to make it less so, though. But I think it's the 
RightThing(tm) to do, at least as an option for when you really want to 
see a reasonable history that still contains everything that is relevant.

			Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-23 19:50         ` Linus Torvalds
@ 2006-10-23 20:07           ` Jakub Narebski
  2006-10-23 20:52           ` Josh Triplett
  2006-10-24 14:56           ` Johannes Schindelin
  2 siblings, 0 replies; 19+ messages in thread
From: Jakub Narebski @ 2006-10-23 20:07 UTC (permalink / raw)
  To: git

There is also not-that-obvious result that

  git rev-log --parents --full-history <head> -- <pathspec> 

generates different result than if either --parents or --full-history are
absent.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-23 19:50         ` Linus Torvalds
  2006-10-23 20:07           ` Jakub Narebski
@ 2006-10-23 20:52           ` Josh Triplett
  2006-10-23 21:06             ` Linus Torvalds
  2006-10-24 14:56           ` Johannes Schindelin
  2 siblings, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2006-10-23 20:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 2415 bytes --]

Linus Torvalds wrote:
> On Mon, 23 Oct 2006, Josh Triplett wrote:
>  - The nice one that doesn't throw away potentially interesting 
>    duplicate paths to reach the same end result. We don't have this one, 
>    so no git commands do this yet.
> 
>    The way to do this one would be "--full-history", but then removing all 
>    parents that are "redundant". In other words, for any merge that 
>    remains (because of the --full-history), check if one parent is a full 
>    superset of another one, and if so, remove the "dominated" parent, 
>    which simplifies the merge. Continue until nothing can be simplified 
>    any more.
> 
>    This would _usually_ end up giving the same graph as the "extreme" 
>    simplification, but if there were two branches that really _did_ 
>    generate the same end result using different commits, they'd remain in 
>    the end result.
> 
> The problem with the "nice one" is that it's expensive as hell. There may 
> be clever tricks to make it less so, though. But I think it's the 
> RightThing(tm) to do, at least as an option for when you really want to 
> see a reasonable history that still contains everything that is relevant.

So, if a commit has more than one parent (a merge), you want to
eliminate any parents that end up as ancestors to other parents in the
merge (including if their head has the same commit ID), but not
eliminate multiple parents with different head commits but the same tree
object?  That seems simple enough; I *think* git-split actually already
does that, though I haven't actually tested that particular case.  If
git log eliminates all but one of the parents with different commits but
the same tree, I believe the commit sequence generated by git-split will
differ from that of git log in that case, by including all such parents.

I do agree that the behavior you describe seems like the best
simplification, and I don't think the alternative you describe as
"extreme simplification" makes any sense at all (picking a parent
arbitrarily), nor does it seem any simpler to generate; either way, you
still have to figure out if one parent has another as an ancestor, while
the additional "extreme simplification" just *adds* a comparison of tree
hashes.

Or have I misunderstood the case you have concerns about?  Why would the
"nice" format incur additional cost?

- Josh Triplett



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-23 20:52           ` Josh Triplett
@ 2006-10-23 21:06             ` Linus Torvalds
  2006-10-23 21:19               ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2006-10-23 21:06 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Junio C Hamano, git



On Mon, 23 Oct 2006, Josh Triplett wrote:
> 
> Or have I misunderstood the case you have concerns about?  Why would the
> "nice" format incur additional cost?

Try it. The default "extreme" simplification is a _hell_ of a lot faster 
than doing the full history.

	[torvalds@g5 linux]$ time git-rev-list --full-history --parents HEAD -- kernel/sched.c >/dev/null

	real    0m4.660s
	user    0m4.612s
	sys     0m0.044s

	[torvalds@g5 linux]$ time git-rev-list --parents HEAD -- kernel/sched.c >/dev/null
	
	real    0m1.684s
	user    0m1.680s
	sys     0m0.004s

and the "nice" thing will be much slower still: just trying to figure out 
whether a commit is a parent of another commit is expensive. Doing so for 
_each_ merge is more expensive still. I think it's O(n^3), but what do I 
know..

			Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-23 21:06             ` Linus Torvalds
@ 2006-10-23 21:19               ` Linus Torvalds
  0 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2006-10-23 21:19 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Junio C Hamano, git



On Mon, 23 Oct 2006, Linus Torvalds wrote:
> 
> Try it. The default "extreme" simplification is a _hell_ of a lot faster 
> than doing the full history.
[ timings removed ]

Btw, the reason it is so much faster is that it can be done early, and 
allows us to prune out parts of the history that we don't care about.

For example, when we hit a merge, and the result of that merge is 
identical to one of the parents (in the set of filenames that we are 
interested in), we can simply choose to totally ignore the other parent, 
and we don't need to traverse that history at _all_. Because clearly, all 
the actual _data_ came from just the other one.

So the "extreme" simplification is way way faster, because in the presense 
of a lot of merges, it can select to go down just one of the paths, and 
totally ignore the other ones. In practice, for a fairly "bushy" history 
tree like the kernel, that can cut down the number of commits you need to 
compare by a factor of two or more.

In many ways, it is also actually a _better_ result, in that it's a 
"closer to minimal" way of reaching a particular state. So if you're just 
interested in how something came to be, and want to just cut through the 
crap, the result extreme simplification really _is_ better.

So the branches that were dismissed really _aren't_ important - they might 
contain real work, but from the point of the end result, that real work 
might as well not have happened, since the simpler history we chose _also_ 
explain the end result sufficiently.

So I think the default simplification is really a good default: not only 
because it's fundamentally cheaper, but because it is actually more likely 
to be distill what you actually care about if you wonder what happened to 
a file or a set of files.

But if you care about all the "side efforts" that didn't actually matter 
for the end result too, then you'd want the more expensive, and more 
complete graph. But it _will_ be a lot more expensive to compute.

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-23 19:50         ` Linus Torvalds
  2006-10-23 20:07           ` Jakub Narebski
  2006-10-23 20:52           ` Josh Triplett
@ 2006-10-24 14:56           ` Johannes Schindelin
  2006-10-24 15:19             ` Linus Torvalds
  2 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2006-10-24 14:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Josh Triplett, Junio C Hamano, git

Hi,

On Mon, 23 Oct 2006, Linus Torvalds wrote:

> Although I think that somebody (Dscho?) also had a patch to remove 
> multiple identical parents, which he claimed could happen with 
> simplification otherwise. I didn't look any closer at it.

IIRC It only happened when full history was wished for, _and_ path 
limiting. And Junio said that in that case, culling identical parents 
would be the wrong thing to do.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-24 14:56           ` Johannes Schindelin
@ 2006-10-24 15:19             ` Linus Torvalds
  0 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2006-10-24 15:19 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Josh Triplett, Junio C Hamano, git



On Tue, 24 Oct 2006, Johannes Schindelin wrote:
> 
> On Mon, 23 Oct 2006, Linus Torvalds wrote:
> 
> > Although I think that somebody (Dscho?) also had a patch to remove 
> > multiple identical parents, which he claimed could happen with 
> > simplification otherwise. I didn't look any closer at it.
> 
> IIRC It only happened when full history was wished for, _and_ path 
> limiting. And Junio said that in that case, culling identical parents 
> would be the wrong thing to do.

Yeah, with full history, you might as well keep the trivially identical 
parents too. In the "nice rewrite", you'd get rid of them, but you'd get 
rid of them not because they are trivially identical, but because they are 
just the trivial case of "one parent totally dominates the other".

Sadly, while it may be the _trivial_ case in --full-history, it's probably 
not even the common case, at least if you have lots of merges (because you 
end up having one parent point to one merge, and the other to another, and 
you need to simplify things in multiple passes).

			Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-23 19:27       ` Josh Triplett
  2006-10-23 19:50         ` Linus Torvalds
@ 2006-10-25  0:10         ` Junio C Hamano
  2006-10-25  0:19           ` Jakub Narebski
  2006-10-25  1:59           ` Josh Triplett
  1 sibling, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2006-10-25  0:10 UTC (permalink / raw)
  To: git; +Cc: Josh Triplett

Josh Triplett <josh@freedesktop.org> writes:

> Linus Torvalds wrote:
>> 
>> And yes, that's done by the core revision parsing code, so when you do
>> 
>> 	git log --full-history --parents -- $project
>> 
>> you do get the rewritten parent output (of course, it's not actually 
>> _simplified_, so you get a fair amount of duplicate parents etc which 
>> you'd still have to simplify and which don't do anything at all).
>> 
>> Without the "--full-history", you get a simplified history, but it's 
>> likely to be _too_ simplified for your use, since it will not only 
>> collapse multiple identical parents, it will also totally _remove_ parents 
>> that don't introduce any new content.
>
> Considering that git-split does exactly that (remove parents that don't
> introduce new content, assuming they changed things outside the
> subtree), that might actually work for us.  I just checked, and the
> output of "git log --parents -- $project" on one of my repositories
> seems to show the same sequence of commits as git log --parents on the
> head commit printed by git-split $project (apart from the rewritten
> sha1s), including elimination of irrelevant merges.

So one potential action item that came out from this discussion
for me is to either modify --pretty=raw (or add --pretty=rawish)
that gives the rewritten parents instead of real parents?  With
that, you can drop the code to simplify ancestry by hand in your
loop, and also you do not have to do the grafts inforamation
yourself either?

If that is the case I'd be very happy.

The only thing left for us to decide is if reporting the true
parenthood like the current --pretty=raw makes sense (if so we
need to keep it and introduce --pretty=rawfish).

The only in-tree user of --pretty=raw seems to be git-svn but it
only looks at path-unlimited log/rev-list from one given commit,
so the only difference between dumping what is recorded in the
commit object and listing what parents we _think_ the commit has
is what we read from grafts.  I think we are safe to just "fix"
the behaviour of --pretty=raw

Comments?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-25  0:10         ` Junio C Hamano
@ 2006-10-25  0:19           ` Jakub Narebski
  2006-10-25  1:59           ` Josh Triplett
  1 sibling, 0 replies; 19+ messages in thread
From: Jakub Narebski @ 2006-10-25  0:19 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

> So one potential action item that came out from this discussion
> for me is to either modify --pretty=raw (or add --pretty=rawish)
> that gives the rewritten parents instead of real parents?  With
> that, you can drop the code to simplify ancestry by hand in your
> loop, and also you do not have to do the grafts inforamation
> yourself either?
> 
> If that is the case I'd be very happy.
> 
> The only thing left for us to decide is if reporting the true
> parenthood like the current --pretty=raw makes sense (if so we
> need to keep it and introduce --pretty=rawfish).
> 
> The only in-tree user of --pretty=raw seems to be git-svn but it
> only looks at path-unlimited log/rev-list from one given commit,
> so the only difference between dumping what is recorded in the
> commit object and listing what parents we _think_ the commit has
> is what we read from grafts.  I think we are safe to just "fix"
> the behaviour of --pretty=raw
> 
> Comments?

The name --pretty=raw suggest output of info directly from commit object,
but perhaps that just me (--pretty=rawish or ==pretty=headers).

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-25  0:10         ` Junio C Hamano
  2006-10-25  0:19           ` Jakub Narebski
@ 2006-10-25  1:59           ` Josh Triplett
  2006-10-25  2:13             ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2006-10-25  1:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3067 bytes --]

Junio C Hamano wrote:
> Josh Triplett <josh@freedesktop.org> writes:
>> Linus Torvalds wrote:
>>> And yes, that's done by the core revision parsing code, so when you do
>>>
>>> 	git log --full-history --parents -- $project
>>>
>>> you do get the rewritten parent output (of course, it's not actually 
>>> _simplified_, so you get a fair amount of duplicate parents etc which 
>>> you'd still have to simplify and which don't do anything at all).
>>>
>>> Without the "--full-history", you get a simplified history, but it's 
>>> likely to be _too_ simplified for your use, since it will not only 
>>> collapse multiple identical parents, it will also totally _remove_ parents 
>>> that don't introduce any new content.
>> Considering that git-split does exactly that (remove parents that don't
>> introduce new content, assuming they changed things outside the
>> subtree), that might actually work for us.  I just checked, and the
>> output of "git log --parents -- $project" on one of my repositories
>> seems to show the same sequence of commits as git log --parents on the
>> head commit printed by git-split $project (apart from the rewritten
>> sha1s), including elimination of irrelevant merges.
> 
> So one potential action item that came out from this discussion
> for me is to either modify --pretty=raw (or add --pretty=rawish)
> that gives the rewritten parents instead of real parents?  With
> that, you can drop the code to simplify ancestry by hand in your
> loop, and also you do not have to do the grafts inforamation
> yourself either?
> 
> If that is the case I'd be very happy.
> 
> The only thing left for us to decide is if reporting the true
> parenthood like the current --pretty=raw makes sense (if so we
> need to keep it and introduce --pretty=rawfish).
> 
> The only in-tree user of --pretty=raw seems to be git-svn but it
> only looks at path-unlimited log/rev-list from one given commit,
> so the only difference between dumping what is recorded in the
> commit object and listing what parents we _think_ the commit has
> is what we read from grafts.  I think we are safe to just "fix"
> the behaviour of --pretty=raw

I actually think I want to look further into the idea of just using git
--pretty=raw --parents -- $project, and see if I can find any corner cases
where it generates a different history than what we want.  This combination of
options seems like it provides everything we need: redundant history
simplification, parent rewriting based on simplification and grafts, and easy
parsing.  If the only case in which it differs occurs when you have two
distinct commits with identical trees, I don't know that I care too much; that
particular scenario seems unlikely to occur in any of the trees I care about,
and any sane simplification behavior for it seems OK. :) As long as it runs
correctly with various ancestor/descendant/cousin/unrelated relationships
between merged branches (which I want to test further), I think it will do the
job nicely.

- Josh Triplett


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
  2006-10-25  1:59           ` Josh Triplett
@ 2006-10-25  2:13             ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2006-10-25  2:13 UTC (permalink / raw)
  To: Josh Triplett; +Cc: git

Josh Triplett <josh@freedesktop.org> writes:

> Junio C Hamano wrote:
>
>> The only thing left for us to decide is if reporting the true
>> parenthood like the current --pretty=raw makes sense (if so we
>> need to keep it and introduce --pretty=rawfish).
>> 
>> The only in-tree user of --pretty=raw seems to be git-svn but it
>> only looks at path-unlimited log/rev-list from one given commit,
>> so the only difference between dumping what is recorded in the
>> commit object and listing what parents we _think_ the commit has
>> is what we read from grafts.  I think we are safe to just "fix"
>> the behaviour of --pretty=raw
>
> I actually think I want to look further into the idea of just using git
> --pretty=raw --parents -- $project, and see if I can find any corner cases
> where it generates a different history than what we want.

I do not mind _coding_ the --pretty=rawfish change if needed but
I do not think it is necessary, which is pretty good news.

After I wrote the message I realized that I probably do not have
to do anything, since --parents would give you the rewritten
parents already anyway.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2006-10-25  2:14 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-27  8:05 [RFC] git-split: Split the history of a git repository by subdirectories and ranges Josh Triplett
2006-09-27 10:13 ` Junio C Hamano
2006-09-27 11:59   ` Andy Whitcroft
2006-09-27 19:08     ` Junio C Hamano
2006-09-27 19:31       ` Junio C Hamano
2006-10-23 10:17   ` Josh Triplett
2006-10-23 15:52     ` Linus Torvalds
2006-10-23 19:27       ` Josh Triplett
2006-10-23 19:50         ` Linus Torvalds
2006-10-23 20:07           ` Jakub Narebski
2006-10-23 20:52           ` Josh Triplett
2006-10-23 21:06             ` Linus Torvalds
2006-10-23 21:19               ` Linus Torvalds
2006-10-24 14:56           ` Johannes Schindelin
2006-10-24 15:19             ` Linus Torvalds
2006-10-25  0:10         ` Junio C Hamano
2006-10-25  0:19           ` Jakub Narebski
2006-10-25  1:59           ` Josh Triplett
2006-10-25  2:13             ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).