From: Jan Harkes <jaharkes@cs.cmu.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>,
    Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: History cleanup/rewriting script for git
Date: Fri, 20 Apr 2007 11:54:46 -0400
Message-ID: <20070420155446.GA11506@delft.aura.cs.cmu.edu>
In-Reply-To: <alpine.LFD.0.98.0704190940330.9964@woody.linux-foundation.org>

On Thu, Apr 19, 2007 at 09:43:50AM -0700, Linus Torvalds wrote:
> On Thu, 19 Apr 2007, Johannes Schindelin wrote:
> > Hmm. However, I have to say that cogito serves/d another purpose quite
> > well: Look at what came from cogito into git. Loads of useful
> > enhancements. So, I really have to point to "at this stage", because that
> > sure was not true 18 months ago.
>
> Absolutely. I think there are still some pieces of cogito that we might
> want to migrate into git too, although they're fairly esoteric (ie the
> whole history rewriting thing). And I think we still have some places

I actually have a fairly simple history rewriting script (written in Python)
that I used when I converted some CVS archives to git. It is really intended
for that kind of initial import and history cleanup, so it doesn't deal with
reflogs and such.

The basic workflow I used is:

- Import the CVS archive into a git repository.
- Use gitk + the grafts file to clean up history as much as feasible.
- Run git-rewrite-history.py, which will
    - write out new commit objects with the corrected set of parents
    - copy existing refs to .git/newrefs, pointing them at the new commits.
- Start gitk --all to see the tree before the rewrite.
- mv .git/refs .git/oldrefs ; mv .git/newrefs .git/refs
- Start a second gitk --all to see the tree after the rewrite.
- Compare the gitk output to check that everything matches up.
- Run git repack/prune/gc to get rid of the old commits, or clone the repo.

Jan
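
For readers unfamiliar with the grafts file used in the second step of the workflow: it lives at .git/info/grafts, and each line names a commit followed by the parents git should pretend that commit has (no parents listed means a root commit). The following is only a minimal Python 3 sketch of reading such a file, not part of the posted script. The function name parse_grafts is hypothetical, the short ids stand in for full 40-character hashes, and skipping '#' comment lines is an assumption about how git treats them.

```python
def parse_grafts(text):
    """Map each grafted commit id to its replacement list of parent ids."""
    grafts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # assumption: blank and '#' lines carry no graft
        commit, *parents = line.split()
        grafts[commit] = parents
    return grafts


# Placeholder ids; a real grafts file uses full commit hashes.
sample = """\
# pretend c3 is a merge of c1 and c2
c3 c1 c2
# pretend c1 has no parents at all
c1
"""
print(parse_grafts(sample))
```

A commit whose graft entry differs from the parents recorded in its commit object is exactly what the script below has to rewrite.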
--8<-----------------------------------------------------------------------
#!/usr/bin/python
import os, sys

def git_write_object(type, blob):
    # store blob as a new object of the given type and return its hash
    stdin, stdout = os.popen2("git-hash-object -t %s -w --stdin" % type)
    stdin.write(blob)
    stdin.close()
    return stdout.readline().strip()

def git_commits(branch):
    # with --header, rev-list terminates each commit record with a NUL byte
    f = os.popen('git-rev-list --parents --header --topo-order %s' % branch)
    buf = ''
    while 1:
        buf = buf + f.read(4096)
        if not buf: break
        if not '\0' in buf: continue
        commit, buf = buf.split('\0', 1)
        yield Commit(commit)

def git_update_ref(name, hash):
    os.system('git-update-ref "%s" "%s"' % (name, hash))

grafts = []       # hashes of commits whose parents were changed by a graft
pending = []      # hashes of descendants that still have to be marked
rewriteable = []  # commits whose parents are all final
remap = {}        # old commit hash -> new commit hash
todo = 0          # number of commits that need to be rewritten

class Commit:
    def __init__(self, commit):
        global grafts
        lines = commit.split('\n')
        # first line (from --parents): hash followed by the effective parents
        parts = lines.pop(0).split()
        self.hash, self.parents = parts[0], parts[1:]
        self.tree = lines.pop(0)

        # parents as recorded in the commit object itself
        parents = []
        while lines[0][:7] == 'parent ':
            parents = parents + lines.pop(0).split()[1:]

        if parents != self.parents:
            grafts.append(self.hash)

        # remaining header lines, up to and including the blank separator
        commit = []
        while 1:
            line = lines.pop(0)
            commit.append(line)
            if not line: break

        # message lines are indented by 4 spaces in rev-list output
        for line in lines:
            commit.append(line[4:])

        self.commit = '\n'.join(commit)
        self.wait = 0
        self.children = []

    def mark(self):
        global todo, pending
        self.wait = self.wait + 1
        if self.wait == 1:
            # first time we see this commit, queue its children as well
            todo = todo + 1
            for child in self.children:
                pending.append(child.hash)

    def pick(self):
        global rewriteable
        self.wait = self.wait - 1
        if not self.wait:
            rewriteable.append(self)

    def fixup(self, old_hash, new_hash):
        i = self.parents.index(old_hash)
        self.parents[i] = new_hash
        self.pick()

    def rehash(self):
        global todo, remap
        todo = todo - 1
        blob = self.tree + '\n'
        for parent in self.parents:
            blob = blob + 'parent %s\n' % parent
        blob = blob + self.commit
        new_hash = git_write_object('commit', blob)
        remap[self.hash] = new_hash
        for child in self.children:
            child.fixup(self.hash, new_hash)

print "Reading commits... ",
commits = {}
for commit in git_commits('--all'):
    commits[commit.hash] = commit
print "read %d commits, found %d grafts" % (len(commits), len(grafts))

print "Setting up reverse linkage"
for commit in commits.values():
    for parent in commit.parents:
        commits[parent].children.append(commit)

print "Propagating graft information... ",
# first mark all commits that will have to be rewritten.
for commit in grafts:
    commits[commit].mark()
# pending grows while we iterate, so this walks all descendants
for commit in pending:
    commits[commit].mark()
# pick those commits that do not depend on any earlier rewrites
for commit in grafts:
    commits[commit].pick()
print "%d commits need to be rewritten" % todo

print "Rewriting commits... "
while rewriteable:
    print "\rrewriting %5d/%5d commits" % (len(rewriteable), todo),
    rewriteable.pop().rehash()
print "done..."

print "Rewriting refs..."
for ref in os.popen('git-for-each-ref'):
    hash, type, name = ref.split()
    if type != 'commit': continue
    if remap.has_key(hash):
        hash = remap[hash]
    # write updated refs to .git/newrefs
    git_update_ref('new' + name, hash)
print "done..."
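
The ordering the script relies on can be illustrated without a repository. A commit's id is a hash over its contents, including its parent ids, so changing one commit's parents forces every descendant to be rewritten too, and a commit can only be rewritten once all of its rewritten parents have their new ids. The following Python 3 sketch restates that idea on a toy graph; it is not the script above (which is Python 2). The names object_id and rewrite are hypothetical, the commit bodies are placeholders without tree or author lines, and hashlib stands in for git-hash-object.

```python
import hashlib


def object_id(kind, body):
    """Hash an object the way git does: SHA-1 over '<kind> <size>', NUL, body."""
    header = ('%s %d\0' % (kind, len(body))).encode()
    return hashlib.sha1(header + body).hexdigest()


def rewrite(parents, changed):
    """parents: {commit: [parent, ...]} with grafts already applied.
    changed: commits whose stored parent list differs from `parents`.
    Returns {old_id: new_id} for every commit that had to be rewritten."""
    children = {c: [] for c in parents}
    for c, ps in parents.items():
        for p in ps:
            children[p].append(c)

    # Mark: each changed commit and all of its descendants are dirty.
    dirty, stack = set(), list(changed)
    while stack:
        c = stack.pop()
        if c not in dirty:
            dirty.add(c)
            stack.extend(children[c])

    # wait[c] counts dirty parents that must get a new id before c does.
    wait = {c: sum(p in dirty for p in parents[c]) for c in dirty}
    ready = [c for c in dirty if wait[c] == 0]
    remap = {}
    while ready:
        c = ready.pop()
        new_parents = [remap.get(p, p) for p in parents[c]]
        # Toy commit body: parent lines plus a placeholder message.
        body = ''.join('parent %s\n' % p for p in new_parents)
        body += '\nmessage of %s\n' % c
        remap[c] = object_id('commit', body.encode())
        for child in children[c]:
            wait[child] -= 1
            if wait[child] == 0:
                ready.append(child)
    return remap


# a <- b <- c, and a <- d; only b's parents were changed by a graft.
history = {'a': [], 'b': ['a'], 'c': ['b'], 'd': ['a']}
print(sorted(rewrite(history, ['b'])))
```

In this toy history only b and c end up in the mapping; a and d keep their ids because nothing they depend on changed.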