From: Josh Triplett <josh@freedesktop.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges
Date: Mon, 23 Oct 2006 03:17:45 -0700
Message-ID: <453C96C9.4010005@freedesktop.org>
In-Reply-To: <7vlko5d3bx.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano wrote:
> Josh Triplett <josh@freedesktop.org> writes:
>> Jamey Sharp and I wrote a script called git-split to accomplish this
>> repository split. git-split reconstructs the history of a sub-project
>> previously stored in a subdirectory of a larger repository. It
>> constructs new commit objects based on the existing tree objects for the
>> subtree in each commit, and discards commits which do not affect the
>> history of the sub-project, as well as merges made unnecessary due to
>> these discarded commits.
>
> Very nicely done.
Thanks!
>> We would like to acknowledge the work of the gobby team in creating a
>> collaborative editor which greatly aided the development of git-split.
>
>> from itertools import izip
>> from subprocess import Popen, PIPE
>> import os, sys
>
> How recent a Python are we assuming here? Is late 2.4 recent
> enough?
We ran it with 2.4, so yes. git-split does require at least 2.4,
though, because it uses set(), str.rsplit(), and subprocess, none of
which existed in 2.3.
>> def walk(commits, new_commits, commit_hash, project):
>>     commit = commits[commit_hash]
>>     if not(commit.has_key("new_hash")):
>>         tree = get_subtree(commit["tree"], project)
>>         commit["new_tree"] = tree
>>         if not tree:
>>             raise Exception("Did not find project in tree for commit " + commit_hash)
>>         new_parents = list(set([walk(commits, new_commits, parent, project)
>>                                 for parent in commit["parents"]]))
>>
>>         new_hash = None
>>         if len(new_parents) == 1:
>>             new_hash = new_parents[0]
>>         elif len(new_parents) == 2: # Check for unnecessary merge
>>             if is_ancestor(new_commits, new_parents[0], new_parents[1]):
>>                 new_hash = new_parents[0]
>>             elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
>>                 new_hash = new_parents[1]
>>         if new_hash and new_commits[new_hash]["new_tree"] != tree:
>>             new_hash = None
>
> This is a real gem. I really like reading well-written Python
> programs.
Thanks. We had some fun writing this; git's elegant repository
structure made it a joy to work with.
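
For readers following the quoted excerpt, the merge-collapsing idea can be
sketched on a toy history. This is only an illustration in modern Python,
not the code from git-split: the commit names, the helper collapse(), and
the argument order of is_ancestor() here (ancestor first, descendant
second) are assumptions and may differ from the script itself.

```python
def is_ancestor(commits, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parents."""
    stack, seen = [descendant], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(commits[current]["parents"])
    return False


def collapse(commits, new_parents, tree):
    """Return an existing commit that can stand in for this one, or None.

    A commit is redundant when, after de-duplicating its rewritten parents,
    one parent already contains the other and carries the same subtree.
    """
    candidate = None
    unique = list(dict.fromkeys(new_parents))  # de-duplicate, keep order
    if len(unique) == 1:
        candidate = unique[0]
    elif len(unique) == 2:  # possible unnecessary merge
        a, b = unique
        if is_ancestor(commits, a, b):
            candidate = b
        elif is_ancestor(commits, b, a):
            candidate = a
    # Only reuse the candidate if the subtree did not change.
    if candidate is not None and commits[candidate]["new_tree"] != tree:
        candidate = None
    return candidate


# Hypothetical rewritten history: B is a child of A, C is unrelated.
commits = {
    "A": {"parents": [], "new_tree": "t1"},
    "B": {"parents": ["A"], "new_tree": "t2"},
    "C": {"parents": [], "new_tree": "t9"},
}

print(collapse(commits, ["A", "B"], "t2"))  # merge of a commit with its own ancestor
print(collapse(commits, ["B", "C"], "t2"))  # genuine merge of unrelated lines
```

In this sketch a merge whose parents are A and B collapses to B when the
subtree is unchanged, while a merge of unrelated lines is kept.
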
> I wonder if using "git-log --full-history -- $project" to let
> the core side omit commits that do not change the $project (but
> still give you all merged branches) would have made your job any
> easier?
I don't think it would. We still need to know what commit to use as the
parent of any given commit, so we don't want commits in the log output
with parents that don't exist in the log output. And rewriting parents
in git-log based on which revisions change the specified subdirectory
seems like a bad idea.
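
To make the concern concrete, here is a small sketch on a toy history,
assuming a filtered listing that still reports each commit's real parents.
The names graph, kept, dangling_parents() and nearest_kept_ancestors() are
hypothetical and exist neither in git nor in git-split.

```python
def dangling_parents(graph, kept):
    """Map each kept commit to those of its parents that were filtered out."""
    missing = {}
    for commit in kept:
        absent = [p for p in graph[commit] if p not in kept]
        if absent:
            missing[commit] = absent
    return missing


def nearest_kept_ancestors(graph, kept, commit):
    """Walk past filtered-out parents until kept commits are reached."""
    found, seen = [], set()
    stack = list(reversed(graph[commit]))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in kept:
            found.append(current)
        else:
            # This parent was dropped, so look through it to its own parents.
            stack.extend(reversed(graph[current]))
    return found


# Hypothetical linear history A <- B <- C <- D, where only A and D
# touch the sub-project and therefore survive path filtering.
graph = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
kept = {"A", "D"}

print(dangling_parents(graph, kept))
print(nearest_kept_ancestors(graph, kept, "D"))
```

Here D names C as its parent although C is absent from the filtered
listing; a tool consuming that listing has to look through C and B itself
to find A.
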
> You are handling grafts by hand because --pretty=raw is special
> in that it displays the real parents (although traversal does
> use grafts). Maybe it would have helped if we had a --pretty
> format that is similar to raw but rewrites the parents?
Yes, that would help. We could then avoid dealing with grafts manually.
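
As a rough illustration of what dealing with grafts manually involves (a
sketch, not git-split's actual code): each line of .git/info/grafts names a
commit followed by the parents it should appear to have, and those
override the parents recorded in the commit object. The function names and
the short placeholder hashes below are assumptions; real entries are full
SHA-1 hashes.

```python
def parse_grafts(text):
    """Parse grafts-file text into {commit: [grafted parents]}."""
    grafts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        commit, *parents = line.split()
        grafts[commit] = parents
    return grafts


def effective_parents(real_parents, grafts):
    """Return the parent map with grafted commits overridden."""
    return {commit: grafts.get(commit, parents)
            for commit, parents in real_parents.items()}


# Placeholder hashes for illustration only.
grafts_text = "c3 c1\n# make c2 a root commit\n\nc2\n"
real_parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}

grafts = parse_grafts(grafts_text)
print(effective_parents(real_parents, grafts))
```

A commit listed with no parents becomes a root, which is how a graft can
cut history at a chosen point.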

How would you feel about including git-split in the GIT tree? We could
certainly write up the necessary documentation for it.

- Josh Triplett
Thread overview: 19+ messages
2006-09-27 8:05 [RFC] git-split: Split the history of a git repository by subdirectories and ranges Josh Triplett
2006-09-27 10:13 ` Junio C Hamano
2006-09-27 11:59 ` Andy Whitcroft
2006-09-27 19:08 ` Junio C Hamano
2006-09-27 19:31 ` Junio C Hamano
2006-10-23 10:17 ` Josh Triplett [this message]
2006-10-23 15:52 ` Linus Torvalds
2006-10-23 19:27 ` Josh Triplett
2006-10-23 19:50 ` Linus Torvalds
2006-10-23 20:07 ` Jakub Narebski
2006-10-23 20:52 ` Josh Triplett
2006-10-23 21:06 ` Linus Torvalds
2006-10-23 21:19 ` Linus Torvalds
2006-10-24 14:56 ` Johannes Schindelin
2006-10-24 15:19 ` Linus Torvalds
2006-10-25 0:10 ` Junio C Hamano
2006-10-25 0:19 ` Jakub Narebski
2006-10-25 1:59 ` Josh Triplett
2006-10-25 2:13 ` Junio C Hamano