* [RFC] git-split: Split the history of a git repository by subdirectories and ranges
@ 2006-09-27 8:05 Josh Triplett
2006-09-27 10:13 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2006-09-27 8:05 UTC (permalink / raw)
To: git; +Cc: Jamey Sharp
[-- Attachment #1.1: Type: text/plain, Size: 1334 bytes --]
Hello,
I co-maintain the X C Binding (XCB) project with Jamey Sharp.
Previously, several XCB-related projects all existed under the umbrella
of a single monolithic GIT repository with per-project subdirectories.
We have split this repository into individual per-project repositories.
Jamey Sharp and I wrote a script called git-split to accomplish this
repository split. git-split reconstructs the history of a sub-project
previously stored in a subdirectory of a larger repository. It
constructs new commit objects based on the existing tree objects for the
subtree in each commit, and discards commits which do not affect the
history of the sub-project, as well as merges made unnecessary due to
these discarded commits. When git-split finishes, it will output the
sha1 for the new head commit, suitable for redirection into a file in
.git/refs/heads. At that point, you can clone the new head, or copy the
repository and prune out undesired heads, tags, and objects.
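As a rough illustration of the rewriting described above, here is a minimal pure-Python sketch that works on a toy in-memory history instead of real git objects. The names `split_history`, the `"subtree"` field, and the `"new-"` prefix are assumptions made for this example only; the attached script calls git-ls-tree, git-merge-base, and git-commit-tree instead.

```python
def split_history(commits, head):
    """Rewrite a toy history so that only subtree changes remain.

    commits maps an id to {"subtree": <subtree id>, "parents": [ids]}.
    Returns (new_head, new_commits).
    """
    new_commits = {}  # rewritten id -> {"subtree": ..., "parents": [...]}
    rewritten = {}    # original id -> rewritten id

    def is_ancestor(cur, other):
        # True if `other` is reachable from `cur` in the rewritten graph.
        stack, seen = [cur], set()
        while stack:
            c = stack.pop()
            if c == other:
                return True
            if c not in seen:
                seen.add(c)
                stack.extend(new_commits[c]["parents"])
        return False

    def walk(cid):
        if cid in rewritten:
            return rewritten[cid]
        commit = commits[cid]
        parents = []
        for parent in commit["parents"]:
            new_parent = walk(parent)
            if new_parent not in parents:
                parents.append(new_parent)
        reuse = None
        if len(parents) == 1:
            reuse = parents[0]
        elif len(parents) == 2:  # possibly an unnecessary merge
            a, b = parents
            if is_ancestor(a, b):
                reuse = a
            elif is_ancestor(b, a):
                reuse = b
        # Only reuse a parent if the subtree did not change.
        if reuse is not None and new_commits[reuse]["subtree"] != commit["subtree"]:
            reuse = None
        if reuse is None:
            reuse = "new-" + cid  # stands in for a freshly created commit
            new_commits[reuse] = {"subtree": commit["subtree"], "parents": parents}
        rewritten[cid] = reuse
        return reuse

    return walk(head), new_commits


history = {
    "c1": {"subtree": "t1", "parents": []},
    "c2": {"subtree": "t1", "parents": ["c1"]},  # touched another directory only
    "c3": {"subtree": "t2", "parents": ["c2"]},
}
new_head, new_commits = split_history(history, "c3")
```

In this toy run, c2 leaves the subtree untouched, so it collapses into the rewritten c1 and the rewritten c3 ends up with a single parent.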
I have attached git-split for review. If the GIT community has any
interest in seeing git-split become a part of GIT, we can write up the
necessary documentation and patch.
We would like to acknowledge the work of the gobby team in creating a
collaborative editor which greatly aided the development of git-split.
- Josh Triplett
[-- Attachment #1.2: git-split --]
[-- Type: text/plain, Size: 4980 bytes --]
#!/usr/bin/python
# git-split: Split the history of a git repository by subdirectories and ranges
# Copyright (C) 2006 Jamey Sharp, Josh Triplett
#
# You can redistribute this software and/or modify it under the terms of
# the GNU General Public License as published by the Free Software
# Foundation; version 2 dated June, 1991.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
from itertools import izip
from subprocess import Popen, PIPE
import os, sys

def run(cmd, stdin=None, env={}):
    newenv = os.environ.copy()
    newenv.update(env)
    return Popen(cmd, stdin=PIPE, stdout=PIPE, env=newenv).communicate(stdin)[0]

def parse_author_date(s):
    """Given a GIT author or committer string, return (name, email, date)"""
    (name, email, time, timezone) = s.rsplit(None, 3)
    return (name, email[1:-1], time + " " + timezone)

def get_subtree(tree, name):
    output = run(["git-ls-tree", tree, name])
    if not output:
        return None
    return output.split()[2]

def is_ancestor(new_commits, cur, other):
    """Return True if cur has other as an ancestor, or False otherwise."""
    return run(["git-merge-base", cur, other]).strip() == other

def walk(commits, new_commits, commit_hash, project):
    commit = commits[commit_hash]
    if not(commit.has_key("new_hash")):
        tree = get_subtree(commit["tree"], project)
        commit["new_tree"] = tree
        if not tree:
            raise Exception("Did not find project in tree for commit " + commit_hash)
        new_parents = list(set([walk(commits, new_commits, parent, project)
                                for parent in commit["parents"]]))

        new_hash = None
        if len(new_parents) == 1:
            new_hash = new_parents[0]
        elif len(new_parents) == 2: # Check for unnecessary merge
            if is_ancestor(new_commits, new_parents[0], new_parents[1]):
                new_hash = new_parents[0]
            elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
                new_hash = new_parents[1]
        if new_hash and new_commits[new_hash]["new_tree"] != tree:
            new_hash = None

        if not new_hash:
            args = ["git-commit-tree", tree]
            for new_parent in new_parents:
                args.extend(["-p", new_parent])
            env = dict(zip(["GIT_AUTHOR_"+n for n in ["NAME", "EMAIL", "DATE"]],
                           parse_author_date(commit["author"]))
                       +zip(["GIT_COMMITTER_"+n for n in ["NAME", "EMAIL", "DATE"]],
                            parse_author_date(commit["committer"])))
            new_hash = run(args, commit["message"], env).strip()

        commit["new_parents"] = new_parents
        commit["new_hash"] = new_hash
        if new_hash not in new_commits:
            new_commits[new_hash] = commit
    return commit["new_hash"]

def main(args):
    if not(1 <= len(args) <= 3):
        print "Usage: git-split subdir [newest [oldest]]"
        return 1
    project = args[0]
    if len(args) > 1:
        newest = args[1]
    else:
        newest = "HEAD"
    newest_hash = run(["git-rev-parse", newest]).strip()
    if len(args) > 2:
        oldest = args[2]
        oldest_hash = run(["git-rev-parse", oldest]).strip()
    else:
        oldest_hash = None

    grafts = {}
    try:
        for line in file(".git/info/grafts").read().split("\n"):
            if line:
                child, parents = line.split(None, 1)
                parents = parents.split()
                grafts[child] = parents
    except IOError:
        pass

    temp = run(["git-log", "--pretty=raw", newest_hash]).split("\n\n")
    commits = {}
    for headers,message in izip(temp[0::2], temp[1::2]):
        commit = {}
        commit_hash = None
        headers = [header.split(None, 1) for header in headers.split("\n")]
        for key,value in headers:
            if key == "parent":
                commit.setdefault("parents", []).append(value)
            elif key == "commit":
                commit_hash = value
            else:
                if key in commit:
                    raise Exception('Duplicate key "%s"' % key)
                commit[key] = value
        commit["message"] = "".join([line[4:]+"\n"
                                     for line in message.split("\n")])

        if commit_hash is None:
            raise Exception("Commit without hash")
        if commit_hash in grafts:
            commit["parents"] = grafts[commit_hash]
        if commit_hash == oldest_hash or "parents" not in commit:
            commit["parents"] = []
        commits[commit_hash] = commit

    print walk(commits, dict(), newest_hash, project)

try:
    import psyco
    psyco.full()
except ImportError:
    pass

if __name__ == "__main__": sys.exit(main(sys.argv[1:]))
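For readers who want to try the log-parsing step of main() without a repository, the following is a small sketch in Python 3 run against a hand-written sample. The sample text, its placeholder ids (c1, c2, t1, t2), and the name `parse_raw_log` are assumptions for illustration; like the script, it assumes headers and message alternate when the text is split on blank lines, which holds here because each sample message is a single paragraph.

```python
SAMPLE_LOG = (
    "commit c2\n"
    "tree t2\n"
    "parent c1\n"
    "author A U Thor <author@example.com> 1159344300 +0000\n"
    "committer A U Thor <author@example.com> 1159344300 +0000\n"
    "\n"
    "    Second commit\n"
    "\n"
    "commit c1\n"
    "tree t1\n"
    "author A U Thor <author@example.com> 1159344200 +0000\n"
    "committer A U Thor <author@example.com> 1159344200 +0000\n"
    "\n"
    "    First commit\n"
)


def parse_raw_log(text):
    """Parse --pretty=raw style text into {commit id: commit dict}."""
    chunks = text.strip("\n").split("\n\n")
    commits = {}
    # Even chunks are header blocks, odd chunks are indented messages.
    for headers, message in zip(chunks[0::2], chunks[1::2]):
        commit = {"parents": []}
        commit_hash = None
        for line in headers.split("\n"):
            key, value = line.split(None, 1)
            if key == "commit":
                commit_hash = value
            elif key == "parent":
                commit["parents"].append(value)
            else:
                commit[key] = value
        # Strip the four-space indent that the raw format puts on messages.
        commit["message"] = "".join(l[4:] + "\n" for l in message.split("\n"))
        commits[commit_hash] = commit
    return commits


commits = parse_raw_log(SAMPLE_LOG)
```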
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-09-27 8:05 [RFC] git-split: Split the history of a git repository by subdirectories and ranges Josh Triplett
@ 2006-09-27 10:13 ` Junio C Hamano
2006-09-27 11:59 ` Andy Whitcroft
2006-10-23 10:17 ` Josh Triplett
0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2006-09-27 10:13 UTC (permalink / raw)
To: Josh Triplett; +Cc: git
Josh Triplett <josh@freedesktop.org> writes:
> Jamey Sharp and I wrote a script called git-split to accomplish this
> repository split. git-split reconstructs the history of a sub-project
> previously stored in a subdirectory of a larger repository. It
> constructs new commit objects based on the existing tree objects for the
> subtree in each commit, and discards commits which do not affect the
> history of the sub-project, as well as merges made unnecessary due to
> these discarded commits.
Very nicely done.
> We would like to acknowledge the work of the gobby team in creating a
> collaborative editor which greatly aided the development of git-split.
> from itertools import izip
> from subprocess import Popen, PIPE
> import os, sys
How recent a Python are we assuming here? Is late 2.4 recent
enough?
> def walk(commits, new_commits, commit_hash, project):
>     commit = commits[commit_hash]
>     if not(commit.has_key("new_hash")):
>         tree = get_subtree(commit["tree"], project)
>         commit["new_tree"] = tree
>         if not tree:
>             raise Exception("Did not find project in tree for commit " + commit_hash)
>         new_parents = list(set([walk(commits, new_commits, parent, project)
>                                 for parent in commit["parents"]]))
>
>         new_hash = None
>         if len(new_parents) == 1:
>             new_hash = new_parents[0]
>         elif len(new_parents) == 2: # Check for unnecessary merge
>             if is_ancestor(new_commits, new_parents[0], new_parents[1]):
>                 new_hash = new_parents[0]
>             elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
>                 new_hash = new_parents[1]
>         if new_hash and new_commits[new_hash]["new_tree"] != tree:
>             new_hash = None
This is a real gem. I really like reading well-written Python
programs.
When git-rev-list (or "git-log --pretty=raw" that you use in
your main()) simplifies the merge history based on subtree, we
look at the merge and if the tree matches any of the parents we
discard other parents and make the history a single strand of
pearls. However for this application that is not what you want,
so I can see why you run full "git-log" and prune history by
hand here like this.
I wonder if using "git-log --full-history -- $project" to let
the core side omit commits that do not change the $project (but
still give you all merged branches) would have made your job any
easier?
You are handling grafts by hand because --pretty=raw is special
in that it displays the real parents (although traversal does
use grafts). Maybe it would have helped if we had a --pretty
format that is similar to raw but rewrites the parents?
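To make the "grafts by hand" step concrete, here is a minimal Python 3 sketch of reading graft entries from text shaped like .git/info/grafts (a commit id followed by its replacement parents on each line). The function name `parse_grafts` and the short placeholder ids are assumptions for this example. Unlike the posted script, the sketch also accepts a line carrying only a commit id, which I understand to mean "no parents".

```python
def parse_grafts(text):
    """Return {child id: [parent ids]} from grafts-file text."""
    grafts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # First field is the commit, the rest are its grafted parents.
        grafts[fields[0]] = fields[1:]
    return grafts


def effective_parents(commit_id, recorded_parents, grafts):
    """Parents to use for traversal: the graft entry wins if present."""
    return grafts.get(commit_id, recorded_parents)


grafts = parse_grafts("B0 A2\nM P1 P2\nR\n\n")
```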
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-09-27 10:13 ` Junio C Hamano
@ 2006-09-27 11:59 ` Andy Whitcroft
2006-09-27 19:08 ` Junio C Hamano
2006-10-23 10:17 ` Josh Triplett
1 sibling, 1 reply; 19+ messages in thread
From: Andy Whitcroft @ 2006-09-27 11:59 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Josh Triplett, git
Junio C Hamano wrote:
> Josh Triplett <josh@freedesktop.org> writes:
>
>> Jamey Sharp and I wrote a script called git-split to accomplish this
>> repository split. git-split reconstructs the history of a sub-project
>> previously stored in a subdirectory of a larger repository. It
>> constructs new commit objects based on the existing tree objects for the
>> subtree in each commit, and discards commits which do not affect the
>> history of the sub-project, as well as merges made unnecessary due to
>> these discarded commits.
>
> Very nicely done.
>
>> We would like to acknowledge the work of the gobby team in creating a
>> collaborative editor which greatly aided the development of git-split.
>
>> from itertools import izip
>> from subprocess import Popen, PIPE
>> import os, sys
>
> How recent a Python are we assuming here? Is late 2.4 recent
> enough?
>
>> def walk(commits, new_commits, commit_hash, project):
>>     commit = commits[commit_hash]
>>     if not(commit.has_key("new_hash")):
>>         tree = get_subtree(commit["tree"], project)
>>         commit["new_tree"] = tree
>>         if not tree:
>>             raise Exception("Did not find project in tree for commit " + commit_hash)
>>         new_parents = list(set([walk(commits, new_commits, parent, project)
>>                                 for parent in commit["parents"]]))
>>
>>         new_hash = None
>>         if len(new_parents) == 1:
>>             new_hash = new_parents[0]
>>         elif len(new_parents) == 2: # Check for unnecessary merge
>>             if is_ancestor(new_commits, new_parents[0], new_parents[1]):
>>                 new_hash = new_parents[0]
>>             elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
>>                 new_hash = new_parents[1]
>>         if new_hash and new_commits[new_hash]["new_tree"] != tree:
>>             new_hash = None
>
> This is a real gem. I really like reading well-written Python
> programs.
>
> When git-rev-list (or "git-log --pretty=raw" that you use in
> your main()) simplifies the merge history based on subtree, we
> look at the merge and if the tree matches any of the parent we
> discard other parents and make the history a single strand of
> pearls. However for this application that is not what you want,
> so I can see why you run full "git-log" and prune history by
> hand here like this.
>
> I wonder if using "git-log --full-history -- $project" to let
> the core side omit commits that do not change the $project (but
> still give you all merged branches) would have made your job any
> easier?
>
> You are handling grafts by hand because --pretty=raw is special
> in that it displays the real parents (although traversal does
> use grafts). Maybe it would have helped if we had a --pretty
> format that is similar to raw but rewrites the parents?
I have wondered recently why grafts are hidden in this way. I feel they
are something I want to know is occurring in my history as this history
is being manipulated. Perhaps we could emit a graft record in the
output, indeed have a graft object there? Someone could commit a
change, then graft over it in the history so I'd not see it even though
it's in my working copy.
For instance in my historical git tree I have the following, note the
lack of a parent. If I move the graft up one commit, then I get a
parent, but not a parent that points at the next commit.
commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d
author Linus Torvalds <torvalds@ppc970.osdl.org> 1113690036 -0700
committer Linus Torvalds <torvalds@ppc970.osdl.org> 1113690036 -0700
[...]
commit e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d
parent 607899e17218b485a021c6ebb1cff771fd690eec
author Linus Torvalds <torvalds@ppc970.osdl.org> 1112580513 -0700
committer Linus Torvalds <torvalds@ppc970.osdl.org> 1112580513 -0700
It might be nice to have it more like the following, with a graft in
there, N*40 would obviously be fakes in the first instance as the object
isn't modified. M*40 would refer to the old parent if there was one,
else NONE or 0*40 perhaps.
commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d
parent NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
author Linus Torvalds <torvalds@ppc970.osdl.org> 1113690036 -0700
committer Linus Torvalds <torvalds@ppc970.osdl.org> 1113690036 -0700
[...]
graft NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
parent e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
OLD: MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
commit e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d
parent 607899e17218b485a021c6ebb1cff771fd690eec
author Linus Torvalds <torvalds@ppc970.osdl.org> 1112580513 -0700
committer Linus Torvalds <torvalds@ppc970.osdl.org> 1112580513 -0700
I guess the other option would be to annotate the previous commit,
perhaps on the parent line so we can see the 'right' data in the normal
place, but the overridden data is right there and grepable.
parent e7e173af42dbf37b1d946f9ee00219cb3b2bea6a GRAFT M*40
or
parent e7e173af42dbf37b1d946f9ee00219cb3b2bea6a GRAFT NONE
-apw
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-09-27 11:59 ` Andy Whitcroft
@ 2006-09-27 19:08 ` Junio C Hamano
2006-09-27 19:31 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2006-09-27 19:08 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: Josh Triplett, git
Andy Whitcroft <apw@shadowen.org> writes:
>> You are handling grafts by hand because --pretty=raw is special
>> in that it displays the real parents (although traversal does
>> use grafts). Maybe it would have helped if we had a --pretty
>> format that is similar to raw but rewrites the parents?
>
> I have wondered recently why grafts are hidden in this way. I feel they
> are something I want to know is occurring in my history as this history
> is being manipulated.
Just to make sure we are on the same page, only "raw" format
output is special and it is special only on output. Ancestry
traversal always honors what you have in grafts.
However, you can do:
$ git rev-list --parents --pretty=raw
which would give you "commit $this_commit $its $parents" lines
and "parent $true_parent" lines at the same time.
And they will be inconsistent when you have grafts or a path
limiter. The former honor grafts and the path limiter, and the
latter show the true set of parents.
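The difference between the two kinds of lines can be checked mechanically. Below is a small Python 3 sketch that reads text shaped like the output of the command above and records, per commit, the parents from the "commit" line next to those from the "parent" header lines. The sample is hand-written and abbreviated (no author or committer lines, placeholder ids) and the name `compare_parents` is an assumption for this example.

```python
SAMPLE = (
    "commit B1 B0\n"
    "tree tB1\n"
    "parent B0\n"
    "\n"
    "    Second in alternate history.\n"
    "\n"
    "commit B0 A2\n"
    "tree tB0\n"
    "\n"
    "    Initial in alternate history.\n"
    "\n"
    "commit A2 A1\n"
    "tree tA2\n"
    "parent A1\n"
    "\n"
    "    Third in one history.\n"
)


def compare_parents(text):
    """Map commit id -> parents used for traversal vs. parents recorded."""
    result = {}
    current = None
    for line in text.split("\n"):
        # Message lines are indented, so they never match these prefixes.
        if line.startswith("commit "):
            fields = line.split()
            current = fields[1]
            result[current] = {"traversal": fields[2:], "recorded": []}
        elif line.startswith("parent ") and current is not None:
            result[current]["recorded"].append(line.split()[1])
    return result


parents = compare_parents(SAMPLE)
# Commits whose two parent lists disagree were affected by a graft
# (or, with a path limiter, by history simplification).
mismatched = sorted(c for c, p in parents.items()
                    if p["traversal"] != p["recorded"])
```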
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-09-27 19:08 ` Junio C Hamano
@ 2006-09-27 19:31 ` Junio C Hamano
0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2006-09-27 19:31 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: Josh Triplett, git
Junio C Hamano <junkio@cox.net> writes:
> However, you can do:
>
> $ git rev-list --parents --pretty=raw
>
> which would give you "commit $this_commit $its $parents" lines
> and "parent $true_parent" lines at the same time.
>
> And they will be inconsistent when you have grafts or path
> limiter. The former honor grafts and path limiter, and the
> latter show the true set of parents.
-- >8 --
An illustration of rev-list --parents --pretty=raw
This script creates two separate histories, A and B, each of
which does:
    (A0, B0): create fileA and subdir/fileB
    (A1, B1): modify fileA
    (A2, B2): modify subdir/fileB

and then grafts them together to make B0 a child of A2.  So
the final history looks like (time flows from top to bottom):

            true parent     touches subdir?
    A0      none            yes (creates it)
    A1      A0              no
    A2      A1              yes
    B0      none            yes (different from what's in A2)
    B1      B0              no
    B2      B1              yes
"git rev-list --parents --pretty=raw B2" would give "fake"
parents on the "commit " header lines while "parent " header
lines show the parent as recorded in the commit object (i.e. B0
appears to have A2 as its parent on "commit " header but there
is no "parent A2" header line in it).
When you have path limiters, we simplify history to omit
commits that do not affect the specified paths.
So "git rev-list --parents --pretty=raw B2 subdir" would return
"B2 B0 A2 A0" (because B1 and A1 do not touch the path). When
it does so, the "commit " header lines have "fake" parents
(i.e. B2 appears to have B0 as its parent on "commit " header),
but you can still get the true parents by looking at "parent "
header.
---
diff --git a/t/t6001-rev-list-graft.sh b/t/t6001-rev-list-graft.sh
new file mode 100755
index 0000000..08a6cff
--- /dev/null
+++ b/t/t6001-rev-list-graft.sh
@@ -0,0 +1,113 @@
+#!/bin/sh
+
+test_description='Revision traversal vs grafts and path limiter'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	mkdir subdir &&
+	echo >fileA fileA &&
+	echo >subdir/fileB fileB &&
+	git add fileA subdir/fileB &&
+	git commit -a -m "Initial in one history." &&
+	A0=`git rev-parse --verify HEAD` &&
+
+	echo >fileA fileA modified &&
+	git commit -a -m "Second in one history." &&
+	A1=`git rev-parse --verify HEAD` &&
+
+	echo >subdir/fileB fileB modified &&
+	git commit -a -m "Third in one history." &&
+	A2=`git rev-parse --verify HEAD` &&
+
+	rm -f .git/refs/heads/master .git/index &&
+
+	echo >fileA fileA again &&
+	echo >subdir/fileB fileB again &&
+	git add fileA subdir/fileB &&
+	git commit -a -m "Initial in alternate history." &&
+	B0=`git rev-parse --verify HEAD` &&
+
+	echo >fileA fileA modified in alternate history &&
+	git commit -a -m "Second in alternate history." &&
+	B1=`git rev-parse --verify HEAD` &&
+
+	echo >subdir/fileB fileB modified in alternate history &&
+	git commit -a -m "Third in alternate history." &&
+	B2=`git rev-parse --verify HEAD` &&
+	: done
+'
+
+check () {
+	type=$1
+	shift
+
+	arg=
+	which=arg
+	rm -f test.expect
+	for a
+	do
+		if test "z$a" = z--
+		then
+			which=expect
+			child=
+			continue
+		fi
+		if test "$which" = arg
+		then
+			arg="$arg$a "
+			continue
+		fi
+		if test "$type" = basic
+		then
+			echo "$a"
+		else
+			if test "z$child" != z
+			then
+				echo "$child $a"
+			fi
+			child="$a"
+		fi
+	done >test.expect
+	if test "$type" != basic && test "z$child" != z
+	then
+		echo >>test.expect $child
+	fi
+	if test $type = basic
+	then
+		git rev-list $arg >test.actual
+	elif test $type = parents
+	then
+		git rev-list --parents $arg >test.actual
+	elif test $type = parents-raw
+	then
+		git rev-list --parents --pretty=raw $arg |
+		sed -n -e 's/^commit //p' >test.actual
+	fi
+	diff test.expect test.actual
+}
+
+for type in basic parents parents-raw
+do
+	test_expect_success 'without grafts' "
+		rm -f .git/info/grafts
+		check $type $B2 -- $B2 $B1 $B0
+	"
+
+	test_expect_success 'with grafts' "
+		echo '$B0 $A2' >.git/info/grafts
+		check $type $B2 -- $B2 $B1 $B0 $A2 $A1 $A0
+	"
+
+	test_expect_success 'without grafts, with pathlimit' "
+		rm -f .git/info/grafts
+		check $type $B2 subdir -- $B2 $B0
+	"
+
+	test_expect_success 'with grafts, with pathlimit' "
+		echo '$B0 $A2' >.git/info/grafts
+		check $type $B2 subdir -- $B2 $B0 $A2 $A0
+	"
+
+done
+test_done
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-09-27 10:13 ` Junio C Hamano
2006-09-27 11:59 ` Andy Whitcroft
@ 2006-10-23 10:17 ` Josh Triplett
2006-10-23 15:52 ` Linus Torvalds
1 sibling, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2006-10-23 10:17 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 3301 bytes --]
Junio C Hamano wrote:
> Josh Triplett <josh@freedesktop.org> writes:
>> Jamey Sharp and I wrote a script called git-split to accomplish this
>> repository split. git-split reconstructs the history of a sub-project
>> previously stored in a subdirectory of a larger repository. It
>> constructs new commit objects based on the existing tree objects for the
>> subtree in each commit, and discards commits which do not affect the
>> history of the sub-project, as well as merges made unnecessary due to
>> these discarded commits.
>
> Very nicely done.
Thanks!
>> We would like to acknowledge the work of the gobby team in creating a
>> collaborative editor which greatly aided the development of git-split.
>
>> from itertools import izip
>> from subprocess import Popen, PIPE
>> import os, sys
>
> How recent a Python are we assuming here? Is late 2.4 recent
> enough?
We ran it with 2.4, so yes. git-split does require at least 2.4,
though, because it uses set(), str.rsplit(), and subprocess, none of
which existed in 2.3.
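As a small illustration of why str.rsplit() is convenient here: splitting an author line from the right with a limit of three keeps a multi-word name in one piece. This sketch is written for Python 3 (where rsplit behaves the same way) and reuses the logic of parse_author_date from the script; the sample line is taken from the raw commit shown earlier in this thread.

```python
def parse_author_date(s):
    """Given an author or committer string, return (name, email, date)."""
    # Split from the right: timezone, timestamp, and <email> are always
    # single tokens, while the name may contain spaces.
    name, email, time, timezone = s.rsplit(None, 3)
    return (name, email[1:-1], time + " " + timezone)


parsed = parse_author_date(
    "Linus Torvalds <torvalds@ppc970.osdl.org> 1113690036 -0700")
```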
>> def walk(commits, new_commits, commit_hash, project):
>>     commit = commits[commit_hash]
>>     if not(commit.has_key("new_hash")):
>>         tree = get_subtree(commit["tree"], project)
>>         commit["new_tree"] = tree
>>         if not tree:
>>             raise Exception("Did not find project in tree for commit " + commit_hash)
>>         new_parents = list(set([walk(commits, new_commits, parent, project)
>>                                 for parent in commit["parents"]]))
>>
>>         new_hash = None
>>         if len(new_parents) == 1:
>>             new_hash = new_parents[0]
>>         elif len(new_parents) == 2: # Check for unnecessary merge
>>             if is_ancestor(new_commits, new_parents[0], new_parents[1]):
>>                 new_hash = new_parents[0]
>>             elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
>>                 new_hash = new_parents[1]
>>         if new_hash and new_commits[new_hash]["new_tree"] != tree:
>>             new_hash = None
>
> This is a real gem. I really like reading well-written Python
> programs.
Thanks. We had some fun writing this; git's elegant repository
structure made it a joy to work with.
> I wonder if using "git-log --full-history -- $project" to let
> the core side omit commits that do not change the $project (but
> still give you all merged branches) would have made your job any
> easier?
I don't think it would. We still need to know what commit to use as the
parent of any given commit, so we don't want commits in the log output
with parents that don't exist in the log output. And rewriting parents
in git-log based on which revisions change the specified subdirectory
seems like a bad idea.
> You are handling grafts by hand because --pretty=raw is special
> in that it displays the real parents (although traversal does
> use grafts). Maybe it would have helped if we had a --pretty
> format that is similar to raw but rewrites the parents?
Yes, that would help. We could then avoid dealing with grafts manually.
How would you feel about including git-split in the GIT tree? We could
certainly write up the necessary documentation for it.
- Josh Triplett
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-23 10:17 ` Josh Triplett
@ 2006-10-23 15:52 ` Linus Torvalds
2006-10-23 19:27 ` Josh Triplett
0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2006-10-23 15:52 UTC (permalink / raw)
To: Josh Triplett; +Cc: Junio C Hamano, git
On Mon, 23 Oct 2006, Josh Triplett wrote:
>
> > I wonder if using "git-log --full-history -- $project" to let the core
> > side omit commits that do not change the $project (but still give you
> > all merged branches) would have made your job any easier?
>
> I don't think it would. We still need to know what commit to use as the
> parent of any given commit, so we don't want commits in the log output
> with parents that don't exist in the log output. And rewriting parents
> in git-log based on which revisions change the specified subdirectory
> seems like a bad idea.
Umm.. You didn't realize that git log already _does_ exactly that?
You need to rewrite the parents in order to get a nice and readable
history, which in turn is needed for any visualizer. So git has long done
the parent rewriting in order to be able to do things like
gitk drivers/char
on the kernel.
And yes, that's done by the core revision parsing code, so when you do
git log --full-history --parents -- $project
you do get the rewritten parent output (of course, it's not actually
_simplified_, so you get a fair amount of duplicate parents etc which
you'd still have to simplify and which don't do anything at all).
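The duplicate-parent cleanup mentioned above is easy to do on the consumer side. Here is a minimal Python 3 sketch that takes one "commit <id> <parents...>" line, as printed when --parents is given, and drops repeated parents while keeping their order. The function name `dedupe_parents` and the placeholder ids are assumptions for this example.

```python
def dedupe_parents(commit_line):
    """Return (commit id, parents with duplicates removed, order kept)."""
    fields = commit_line.split()
    commit, parents = fields[1], fields[2:]
    unique = []
    for parent in parents:
        if parent not in unique:
            unique.append(parent)
    return commit, unique


merge = dedupe_parents("commit M P P Q")
root = dedupe_parents("commit R")
```

A merge whose parents collapse to a single entry this way is a candidate for being treated as an ordinary commit.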
Without the "--full-history", you get a simplified history, but it's
likely to be _too_ simplified for your use, since it will not only
collapse multiple identical parents, it will also totally _remove_ parents
that don't introduce any new content.
So there are multiple levels of history simplification, and right now the
internal git revision parser only gives you two choices: "none"
(--full-history) and "extreme" (which is the default when you give a set
of filenames).
Linus
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-23 15:52 ` Linus Torvalds
@ 2006-10-23 19:27 ` Josh Triplett
2006-10-23 19:50 ` Linus Torvalds
2006-10-25 0:10 ` Junio C Hamano
0 siblings, 2 replies; 19+ messages in thread
From: Josh Triplett @ 2006-10-23 19:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git
[-- Attachment #1: Type: text/plain, Size: 2586 bytes --]
Linus Torvalds wrote:
>
> On Mon, 23 Oct 2006, Josh Triplett wrote:
>>> I wonder if using "git-log --full-history -- $project" to let the core
>>> side omit commits that do not change the $project (but still give you
>>> all merged branches) would have made your job any easier?
>> I don't think it would. We still need to know what commit to use as the
>> parent of any given commit, so we don't want commits in the log output
>> with parents that don't exist in the log output. And rewriting parents
>> in git-log based on which revisions change the specified subdirectory
>> seems like a bad idea.
>
> Umm.. You didn't realize that git log already _does_ exactly that?
No, I didn't, primarily because the git log output I've scrutinized most
carefully came from git log --pretty=raw, which doesn't rewrite parents
even when pointed at a subdirectory.
> You need to rewrite the parents in order to get a nice and readable
> history, which in turn is needed for any visualizer. So git has long done
> the parent rewriting in order to be able to do things like
>
> gitk drivers/char
>
> on the kernel.
>
> And yes, that's done by the core revision parsing code, so when you do
>
> git log --full-history --parents -- $project
>
> you do get the rewritten parent output (of course, it's not actually
> _simplified_, so you get a fair amount of duplicate parents etc which
> you'd still have to simplify and which don't do anything at all).
>
> Without the "--full-history", you get a simplified history, but it's
> likely to be _too_ simplified for your use, since it will not only
> collapse multiple identical parents, it will also totally _remove_ parents
> that don't introduce any new content.
Considering that git-split does exactly that (remove parents that don't
introduce new content, assuming they changed things outside the
subtree), that might actually work for us. I just checked, and the
output of "git log --parents -- $project" on one of my repositories
seems to show the same sequence of commits as git log --parents on the
head commit printed by git-split $project (apart from the rewritten
sha1s), including elimination of irrelevant merges.
> So there are multiple levels of history simplification, and right now the
> internal git revision parser only gives you two choices: "none"
> (--full-history) and "extreme" (which is the default when you give a set
> of filenames).
I don't think we need any middle ground here; why might we want less
simplification?
- Josh Triplett
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-23 19:27 ` Josh Triplett
@ 2006-10-23 19:50 ` Linus Torvalds
2006-10-23 20:07 ` Jakub Narebski
` (2 more replies)
2006-10-25 0:10 ` Junio C Hamano
1 sibling, 3 replies; 19+ messages in thread
From: Linus Torvalds @ 2006-10-23 19:50 UTC (permalink / raw)
To: Josh Triplett; +Cc: Junio C Hamano, git
On Mon, 23 Oct 2006, Josh Triplett wrote:
>
> > Without the "--full-history", you get a simplified history, but it's
> > likely to be _too_ simplified for your use, since it will not only
> > collapse multiple identical parents, it will also totally _remove_ parents
> > that don't introduce any new content.
>
> Considering that git-split does exactly that (remove parents that don't
> introduce new content, assuming they changed things outside the
> subtree), that might actually work for us. I just checked, and the
> output of "git log --parents -- $project" on one of my repositories
> seems to show the same sequence of commits as git log --parents on the
> head commit printed by git-split $project (apart from the rewritten
> sha1s), including elimination of irrelevant merges.
Ok. In that case, you're good to go, and just use the current
simplification entirely.
Although I think that somebody (Dscho?) also had a patch to remove
multiple identical parents, which he claimed could happen with
simplification otherwise. I didn't look any closer at it.
> > So there are multiple levels of history simplification, and right now the
> > internal git revision parser only gives you two choices: "none"
> > (--full-history) and "extreme" (which is the default when you give a set
> > of filenames).
>
> I don't think we need any middle ground here; why might we want less
> simplification?
There are really three levels of simplification:
- none at all ("--full-history"). This is really annoying, but if you
want to guarantee that you see all the changes (even duplicate ones)
done along all branches, you currently need to do this one.
Currently "git whatchanged" uses this one (and that ignores merges by
default, making it quite palatable). So with "git whatchanged", you
will get _every_ commit that changed the file, even if there are
duplicates along different histories.
- extreme (the current default). This one is really nice, in that it
shows the simplest history you can make that explains the end result.
But it means that if you had two branches that ended up with the same
result, we will pick just one of them. And the other one may have done
it differently, and the different way of reaching the same result might
be interesting. We'll never know.
As an example: the extreme simplification can also throw away branches
that had work reverted on them - the branch ended up the _same_ as the
one we chose, but it did so because it had some experimental work that
was deemed to be bad. Extreme simplification may or may not remove that
experiment, simply depending on which branch it _happened_ to pick.
Currently, this is what most git users see if they ask for pathname
simplification, ie "gitk drivers/char" or "git log -p kernel/sched.c"
uses this simplification. It's extremely useful, but it definitely
culls real history too.
- The nice one that doesn't throw away potentially interesting
duplicate paths to reach the same end result. We don't have this one,
so no git commands do this yet.
The way to do this one would be "--full-history", but then removing all
parents that are "redundant". In other words, for any merge that
remains (because of the --full-history), check if one parent is a full
superset of another one, and if so, remove the "dominated" parent,
which simplifies the merge. Continue until nothing can be simplified
any more.
This would _usually_ end up giving the same graph as the "extreme"
simplification, but if there were two branches that really _did_
generate the same end result using different commits, they'd remain in
the end result.
The problem with the "nice one" is that it's expensive as hell. There may
be clever tricks to make it less so, though. But I think it's the
RightThing(tm) to do, at least as an option for when you really want to
see a reasonable history that still contains everything that is relevant.
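
A minimal Python sketch of the parent-pruning step described above, under stated assumptions: the history is already available as a plain dict mapping each commit name to its list of parents, and the names `ancestors` and `drop_dominated_parents` are hypothetical helpers, not part of git. It covers only one pruning pass; dropping commits that become single-parent no-ops afterwards, and then repeating, is left out.

```python
from collections import deque


def ancestors(parents, start):
    """Return every commit reachable from `start` through parent links."""
    seen = set()
    queue = deque(parents.get(start, ()))
    while queue:
        commit = queue.popleft()
        if commit not in seen:
            seen.add(commit)
            queue.extend(parents.get(commit, ()))
    return seen


def drop_dominated_parents(parents):
    """Drop duplicate parents and parents that are ancestors of a sibling parent."""
    simplified = {}
    for commit, plist in parents.items():
        unique = list(dict.fromkeys(plist))  # remove exact duplicates, keep order
        simplified[commit] = [
            p for p in unique
            if not any(p in ancestors(parents, q) for q in unique if q != p)
        ]
    return simplified


# "A" is an ancestor of "B", so the merge "M" keeps only "B".
linear = {"M": ["B", "A", "B"], "B": ["A"], "A": []}
print(drop_dominated_parents(linear)["M"])

# "X" and "Y" are independent branches, so both stay as parents of "M".
forked = {"M": ["X", "Y"], "X": ["R"], "Y": ["R"], "R": []}
print(drop_dominated_parents(forked)["M"])
```

Each merge triggers a fresh reachability walk per parent, which gives a feel for why this gets expensive on a large history.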
Linus
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-23 19:50 ` Linus Torvalds
@ 2006-10-23 20:07 ` Jakub Narebski
2006-10-23 20:52 ` Josh Triplett
2006-10-24 14:56 ` Johannes Schindelin
2 siblings, 0 replies; 19+ messages in thread
From: Jakub Narebski @ 2006-10-23 20:07 UTC (permalink / raw)
To: git
There is also the not-so-obvious result that
git rev-list --parents --full-history <head> -- <pathspec>
generates a different result than when either --parents or --full-history is
absent.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-23 19:50 ` Linus Torvalds
2006-10-23 20:07 ` Jakub Narebski
@ 2006-10-23 20:52 ` Josh Triplett
2006-10-23 21:06 ` Linus Torvalds
2006-10-24 14:56 ` Johannes Schindelin
2 siblings, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2006-10-23 20:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git
Linus Torvalds wrote:
> On Mon, 23 Oct 2006, Josh Triplett wrote:
> - The nice one that doesn't throw away potentially interesting
> duplicate paths to reach the same end result. We don't have this one,
> so no git commands do this yet.
>
> The way to do this one would be "--full-history", but then removing all
> parents that are "redundant". In other words, for any merge that
> remains (because of the --full-history), check if one parent is a full
> superset of another one, and if so, remove the "dominated" parent,
> which simplifies the merge. Continue until nothing can be simplified
> any more.
>
> This would _usually_ end up giving the same graph as the "extreme"
> simplification, but if there were two branches that really _did_
> generate the same end result using different commits, they'd remain in
> the end result.
>
> The problem with the "nice one" is that it's expensive as hell. There may
> be clever tricks to make it less so, though. But I think it's the
> RightThing(tm) to do, at least as an option for when you really want to
> see a reasonable history that still contains everything that is relevant.
So, if a commit has more than one parent (a merge), you want to
eliminate any parents that end up as ancestors to other parents in the
merge (including if their head has the same commit ID), but not
eliminate multiple parents with different head commits but the same tree
object? That seems simple enough; I *think* git-split actually already
does that, though I haven't actually tested that particular case. If
git log eliminates all but one of the parents with different commits but
the same tree, I believe the commit sequence generated by git-split will
differ from that of git log in that case, by including all such parents.
I do agree that the behavior you describe seems like the best
simplification, and I don't think the alternative you describe as
"extreme simplification" makes any sense at all (picking a parent
arbitrarily), nor does it seem any simpler to generate; either way, you
still have to figure out if one parent has another as an ancestor, while
the additional "extreme simplification" just *adds* a comparison of tree
hashes.
Or have I misunderstood the case you have concerns about? Why would the
"nice" format incur additional cost?
- Josh Triplett
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-23 20:52 ` Josh Triplett
@ 2006-10-23 21:06 ` Linus Torvalds
2006-10-23 21:19 ` Linus Torvalds
0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2006-10-23 21:06 UTC (permalink / raw)
To: Josh Triplett; +Cc: Junio C Hamano, git
On Mon, 23 Oct 2006, Josh Triplett wrote:
>
> Or have I misunderstood the case you have concerns about? Why would the
> "nice" format incur additional cost?
Try it. The default "extreme" simplification is a _hell_ of a lot faster
than doing the full history.
[torvalds@g5 linux]$ time git-rev-list --full-history --parents HEAD -- kernel/sched.c >/dev/null
real 0m4.660s
user 0m4.612s
sys 0m0.044s
[torvalds@g5 linux]$ time git-rev-list --parents HEAD -- kernel/sched.c >/dev/null
real 0m1.684s
user 0m1.680s
sys 0m0.004s
and the "nice" thing will be much slower still: just trying to figure out
whether a commit is a parent of another commit is expensive. Doing so for
_each_ merge is more expensive still. I think it's O(n^3), but what do I
know..
Linus
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-23 21:06 ` Linus Torvalds
@ 2006-10-23 21:19 ` Linus Torvalds
0 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2006-10-23 21:19 UTC (permalink / raw)
To: Josh Triplett; +Cc: Junio C Hamano, git
On Mon, 23 Oct 2006, Linus Torvalds wrote:
>
> Try it. The default "extreme" simplification is a _hell_ of a lot faster
> than doing the full history.
[ timings removed ]
Btw, the reason it is so much faster is that it can be done early, and
allows us to prune out parts of the history that we don't care about.
For example, when we hit a merge, and the result of that merge is
identical to one of the parents (in the set of filenames that we are
interested in), we can simply choose to totally ignore the other parent,
and we don't need to traverse that history at _all_. Because clearly, all
the actual _data_ came from just the other one.
So the "extreme" simplification is way way faster, because in the presence
of a lot of merges, it can select to go down just one of the paths, and
totally ignore the other ones. In practice, for a fairly "bushy" history
tree like the kernel, that can cut down the number of commits you need to
compare by a factor of two or more.
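
The pruning described above can be sketched in Python. This is an illustration under assumptions, not git's actual revision walker: `parents` maps each commit to its parent list, `tree` maps each commit to an identifier for the contents of the paths of interest, and `simplified_walk` is a hypothetical name. Root commits are simply treated as changes.

```python
def simplified_walk(head, parents, tree):
    """Walk from `head`, following only one parent whenever a merge (or any
    commit) has the same path contents as that parent.

    Returns (visited, shown): every commit the walk had to look at, and the
    subset that actually changed the paths of interest.
    """
    visited, shown = [], []
    stack, seen = [head], set()
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        visited.append(commit)
        plist = parents.get(commit, [])
        same = [p for p in plist if tree[p] == tree[commit]]
        if same:
            # All the data came from this parent: ignore the other ones.
            stack.append(same[0])
        else:
            shown.append(commit)
            stack.extend(reversed(plist))
    return visited, shown


# Merge "M" has the same contents as "A", so the side branch S2 -> S1
# is never traversed at all.
parents = {"M": ["A", "S2"], "A": ["R"], "S2": ["S1"], "S1": ["R"], "R": []}
tree = {"M": "t1", "A": "t1", "S2": "t2", "S1": "t3", "R": "t0"}
visited, shown = simplified_walk("M", parents, tree)
print(visited)  # ['M', 'A', 'R']
print(shown)    # ['A', 'R']
```

The side branch costs nothing here because the decision is made at the merge, before any of its commits are loaded.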
In many ways, it is also actually a _better_ result, in that it's a
"closer to minimal" way of reaching a particular state. So if you're just
interested in how something came to be, and want to just cut through the
crap, the result of extreme simplification really _is_ better.
So the branches that were dismissed really _aren't_ important - they might
contain real work, but from the point of the end result, that real work
might as well not have happened, since the simpler history we chose _also_
explains the end result sufficiently.
So I think the default simplification is really a good default: not only
because it's fundamentally cheaper, but because it is actually more likely
to distill what you actually care about if you wonder what happened to
a file or a set of files.
But if you care about all the "side efforts" that didn't actually matter
for the end result too, then you'd want the more expensive, and more
complete graph. But it _will_ be a lot more expensive to compute.
Linus
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-23 19:50 ` Linus Torvalds
2006-10-23 20:07 ` Jakub Narebski
2006-10-23 20:52 ` Josh Triplett
@ 2006-10-24 14:56 ` Johannes Schindelin
2006-10-24 15:19 ` Linus Torvalds
2 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2006-10-24 14:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Josh Triplett, Junio C Hamano, git
Hi,
On Mon, 23 Oct 2006, Linus Torvalds wrote:
> Although I think that somebody (Dscho?) also had a patch to remove
> multiple identical parents, which he claimed could happen with
> simplification otherwise. I didn't look any closer at it.
IIRC It only happened when full history was wished for, _and_ path
limiting. And Junio said that in that case, culling identical parents
would be the wrong thing to do.
Ciao,
Dscho
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-24 14:56 ` Johannes Schindelin
@ 2006-10-24 15:19 ` Linus Torvalds
0 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2006-10-24 15:19 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Josh Triplett, Junio C Hamano, git
On Tue, 24 Oct 2006, Johannes Schindelin wrote:
>
> On Mon, 23 Oct 2006, Linus Torvalds wrote:
>
> > Although I think that somebody (Dscho?) also had a patch to remove
> > multiple identical parents, which he claimed could happen with
> > simplification otherwise. I didn't look any closer at it.
>
> IIRC It only happened when full history was wished for, _and_ path
> limiting. And Junio said that in that case, culling identical parents
> would be the wrong thing to do.
Yeah, with full history, you might as well keep the trivially identical
parents too. In the "nice rewrite", you'd get rid of them, but you'd get
rid of them not because they are trivially identical, but because they are
just the trivial case of "one parent totally dominates the other".
Sadly, while it may be the _trivial_ case in --full-history, it's probably
not even the common case, at least if you have lots of merges (because you
end up having one parent point to one merge, and the other to another, and
you need to simplify things in multiple passes).
Linus
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-23 19:27 ` Josh Triplett
2006-10-23 19:50 ` Linus Torvalds
@ 2006-10-25 0:10 ` Junio C Hamano
2006-10-25 0:19 ` Jakub Narebski
2006-10-25 1:59 ` Josh Triplett
1 sibling, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2006-10-25 0:10 UTC (permalink / raw)
To: git; +Cc: Josh Triplett
Josh Triplett <josh@freedesktop.org> writes:
> Linus Torvalds wrote:
>>
>> And yes, that's done by the core revision parsing code, so when you do
>>
>> git log --full-history --parents -- $project
>>
>> you do get the rewritten parent output (of course, it's not actually
>> _simplified_, so you get a fair amount of duplicate parents etc which
>> you'd still have to simplify and which don't do anything at all).
>>
>> Without the "--full-history", you get a simplified history, but it's
>> likely to be _too_ simplified for your use, since it will not only
>> collapse multiple identical parents, it will also totally _remove_ parents
>> that don't introduce any new content.
>
> Considering that git-split does exactly that (remove parents that don't
> introduce new content, assuming they changed things outside the
> subtree), that might actually work for us. I just checked, and the
> output of "git log --parents -- $project" on one of my repositories
> seems to show the same sequence of commits as git log --parents on the
> head commit printed by git-split $project (apart from the rewritten
> sha1s), including elimination of irrelevant merges.
So one potential action item that came out from this discussion
for me is to either modify --pretty=raw (or add --pretty=rawish)
that gives the rewritten parents instead of real parents? With
that, you can drop the code to simplify ancestry by hand in your
loop, and also you do not have to do the grafts information
yourself either?
If that is the case I'd be very happy.
The only thing left for us to decide is if reporting the true
parenthood like the current --pretty=raw makes sense (if so we
need to keep it and introduce --pretty=rawfish).
The only in-tree user of --pretty=raw seems to be git-svn but it
only looks at path-unlimited log/rev-list from one given commit,
so the only difference between dumping what is recorded in the
commit object and listing what parents we _think_ the commit has
is what we read from grafts. I think we are safe to just "fix"
the behaviour of --pretty=raw
Comments?
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-25 0:10 ` Junio C Hamano
@ 2006-10-25 0:19 ` Jakub Narebski
2006-10-25 1:59 ` Josh Triplett
1 sibling, 0 replies; 19+ messages in thread
From: Jakub Narebski @ 2006-10-25 0:19 UTC (permalink / raw)
To: git
Junio C Hamano wrote:
> So one potential action item that came out from this discussion
> for me is to either modify --pretty=raw (or add --pretty=rawish)
> that gives the rewritten parents instead of real parents? With
> that, you can drop the code to simplify ancestry by hand in your
> loop, and also you do not have to do the grafts information
> yourself either?
>
> If that is the case I'd be very happy.
>
> The only thing left for us to decide is if reporting the true
> parenthood like the current --pretty=raw makes sense (if so we
> need to keep it and introduce --pretty=rawfish).
>
> The only in-tree user of --pretty=raw seems to be git-svn but it
> only looks at path-unlimited log/rev-list from one given commit,
> so the only difference between dumping what is recorded in the
> commit object and listing what parents we _think_ the commit has
> is what we read from grafts. I think we are safe to just "fix"
> the behaviour of --pretty=raw
>
> Comments?
The name --pretty=raw suggests output of info directly from the commit object,
but perhaps that's just me (--pretty=rawish or --pretty=headers).
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-25 0:10 ` Junio C Hamano
2006-10-25 0:19 ` Jakub Narebski
@ 2006-10-25 1:59 ` Josh Triplett
2006-10-25 2:13 ` Junio C Hamano
1 sibling, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2006-10-25 1:59 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano wrote:
> Josh Triplett <josh@freedesktop.org> writes:
>> Linus Torvalds wrote:
>>> And yes, that's done by the core revision parsing code, so when you do
>>>
>>> git log --full-history --parents -- $project
>>>
>>> you do get the rewritten parent output (of course, it's not actually
>>> _simplified_, so you get a fair amount of duplicate parents etc which
>>> you'd still have to simplify and which don't do anything at all).
>>>
>>> Without the "--full-history", you get a simplified history, but it's
>>> likely to be _too_ simplified for your use, since it will not only
>>> collapse multiple identical parents, it will also totally _remove_ parents
>>> that don't introduce any new content.
>> Considering that git-split does exactly that (remove parents that don't
>> introduce new content, assuming they changed things outside the
>> subtree), that might actually work for us. I just checked, and the
>> output of "git log --parents -- $project" on one of my repositories
>> seems to show the same sequence of commits as git log --parents on the
>> head commit printed by git-split $project (apart from the rewritten
>> sha1s), including elimination of irrelevant merges.
>
> So one potential action item that came out from this discussion
> for me is to either modify --pretty=raw (or add --pretty=rawish)
> that gives the rewritten parents instead of real parents? With
> that, you can drop the code to simplify ancestry by hand in your
> loop, and also you do not have to do the grafts information
> yourself either?
>
> If that is the case I'd be very happy.
>
> The only thing left for us to decide is if reporting the true
> parenthood like the current --pretty=raw makes sense (if so we
> need to keep it and introduce --pretty=rawfish).
>
> The only in-tree user of --pretty=raw seems to be git-svn but it
> only looks at path-unlimited log/rev-list from one given commit,
> so the only difference between dumping what is recorded in the
> commit object and listing what parents we _think_ the commit has
> is what we read from grafts. I think we are safe to just "fix"
> the behaviour of --pretty=raw
I actually think I want to look further into the idea of just using git
--pretty=raw --parents -- $project, and see if I can find any corner cases
where it generates a different history than what we want. This combination of
options seems like it provides everything we need: redundant history
simplification, parent rewriting based on simplification and grafts, and easy
parsing. If the only case in which it differs occurs when you have two
distinct commits with identical trees, I don't know that I care too much; that
particular scenario seems unlikely to occur in any of the trees I care about,
and any sane simplification behavior for it seems OK. :) As long as it runs
correctly with various ancestor/descendant/cousin/unrelated relationships
between merged branches (which I want to test further), I think it will do the
job nicely.
- Josh Triplett
* Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
2006-10-25 1:59 ` Josh Triplett
@ 2006-10-25 2:13 ` Junio C Hamano
0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2006-10-25 2:13 UTC (permalink / raw)
To: Josh Triplett; +Cc: git
Josh Triplett <josh@freedesktop.org> writes:
> Junio C Hamano wrote:
>
>> The only thing left for us to decide is if reporting the true
>> parenthood like the current --pretty=raw makes sense (if so we
>> need to keep it and introduce --pretty=rawfish).
>>
>> The only in-tree user of --pretty=raw seems to be git-svn but it
>> only looks at path-unlimited log/rev-list from one given commit,
>> so the only difference between dumping what is recorded in the
>> commit object and listing what parents we _think_ the commit has
>> is what we read from grafts. I think we are safe to just "fix"
>> the behaviour of --pretty=raw
>
> I actually think I want to look further into the idea of just using git
> --pretty=raw --parents -- $project, and see if I can find any corner cases
> where it generates a different history than what we want.
I do not mind _coding_ the --pretty=rawfish change if needed but
I do not think it is necessary, which is pretty good news.
After I wrote the message I realized that I probably do not have
to do anything, since --parents would give you the rewritten
parents already anyway.
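
As a rough illustration of how a script could consume those rewritten parents, here is a small Python sketch. It assumes the one-line-per-commit format of `git rev-list --parents` (the commit followed by its parents, separated by spaces); the helper names `parse_rev_list_parents` and `read_rewritten_graph` are hypothetical.

```python
import subprocess


def parse_rev_list_parents(output):
    """Map each commit to its list of (rewritten) parents.

    Expects one commit per line: "<commit> <parent> <parent> ...".
    """
    graph = {}
    for line in output.splitlines():
        fields = line.split()
        if fields:
            graph[fields[0]] = fields[1:]
    return graph


def read_rewritten_graph(path, rev="HEAD"):
    """Run git in the current repository and parse the path-limited history."""
    result = subprocess.run(
        ["git", "rev-list", "--parents", rev, "--", path],
        capture_output=True, text=True, check=True,
    )
    return parse_rev_list_parents(result.stdout)


# Parsing a hand-written sample instead of a real repository:
sample = "c3 c2 c1\nc2 c0\nc1 c0\nc0\n"
print(parse_rev_list_parents(sample))
```

A root commit shows up with an empty parent list, so a rewriting loop can treat it the same way as any other entry.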
Thread overview: 19+ messages
2006-09-27 8:05 [RFC] git-split: Split the history of a git repository by subdirectories and ranges Josh Triplett
2006-09-27 10:13 ` Junio C Hamano
2006-09-27 11:59 ` Andy Whitcroft
2006-09-27 19:08 ` Junio C Hamano
2006-09-27 19:31 ` Junio C Hamano
2006-10-23 10:17 ` Josh Triplett
2006-10-23 15:52 ` Linus Torvalds
2006-10-23 19:27 ` Josh Triplett
2006-10-23 19:50 ` Linus Torvalds
2006-10-23 20:07 ` Jakub Narebski
2006-10-23 20:52 ` Josh Triplett
2006-10-23 21:06 ` Linus Torvalds
2006-10-23 21:19 ` Linus Torvalds
2006-10-24 14:56 ` Johannes Schindelin
2006-10-24 15:19 ` Linus Torvalds
2006-10-25 0:10 ` Junio C Hamano
2006-10-25 0:19 ` Jakub Narebski
2006-10-25 1:59 ` Josh Triplett
2006-10-25 2:13 ` Junio C Hamano