git.vger.kernel.org archive mirror
From: Jan Harkes <jaharkes@cs.cmu.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: History cleanup/rewriting script for git
Date: Fri, 20 Apr 2007 11:54:46 -0400
Message-ID: <20070420155446.GA11506@delft.aura.cs.cmu.edu>
In-Reply-To: <alpine.LFD.0.98.0704190940330.9964@woody.linux-foundation.org>

On Thu, Apr 19, 2007 at 09:43:50AM -0700, Linus Torvalds wrote:
> On Thu, 19 Apr 2007, Johannes Schindelin wrote:
> > Hmm. However, I have to say that cogito serves/d another purpose quite 
> > well: Look at what came from cogito into git. Loads of useful 
> > enhancements. So, I really have to point to "at this stage", because that 
> > sure was not true 18 months ago.
> 
> Absolutely. I think there are still some pieces of cogito that we might 
> want to migrate into git too, although they're fairly esoteric (ie the 
> whole history rewriting thing). And I think we still have some places 

I actually have a fairly simple history rewriting script (written in Python)
that I used when I converted some CVS archives to git. It is really intended
for that kind of initial import and history cleanup, so it doesn't deal with
reflogs and such.

The basic workflow I used is:

- Import CVS archive into a git repository
- Use gitk + the grafts file to clean up history as much as feasible
- Run git-rewrite-history.py which will
    - write out new commit objects with the corrected set of parents
    - copy existing refs to .git/newrefs, pointing them at the new commits.

- start gitk --all to see the tree before the rewrite.
- mv .git/refs .git/oldrefs ; mv .git/newrefs .git/refs
- start a second gitk --all to see the tree after the rewrite.
- compare gitk output to check if everything matches up.

- run git repack/prune/gc to get rid of the old commits, or clone the repo.
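
For readers unfamiliar with the grafts file used in the second step: each
line of .git/info/grafts names a commit followed by the parents it should
appear to have. The Python 3 sketch below is not part of the posted script;
parse_grafts is a hypothetical helper and the hashes are made up. It only
illustrates how such a file maps onto a commit-to-parents table.

```python
def parse_grafts(text):
    """Map each grafted commit hash to its list of replacement parents."""
    grafts = {}
    for line in text.splitlines():
        line = line.strip()
        # This sketch skips blank lines and '#' comment lines.
        if not line or line.startswith('#'):
            continue
        commit, *parents = line.split()
        grafts[commit] = parents
    return grafts


# Made-up 40-character hashes, for illustration only.
A, B, C = 'a' * 40, 'b' * 40, 'c' * 40
example = "%s %s %s\n%s\n" % (C, A, B, A)
table = parse_grafts(example)
print(table[C] == [A, B])  # C is made to look like a merge of A and B
print(table[A] == [])      # A is made to look like a root commit
```

A line with no parents after the commit hash turns that commit into a root,
which is a common way to cut off history during a cleanup.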

Jan

--8<-----------------------------------------------------------------------

#!/usr/bin/python
# Python 2 script: it relies on print statements and os.popen2.

import os, sys

def git_write_object(type, blob):
    stdin, stdout = os.popen2("git-hash-object -t %s -w --stdin" % type)
    stdin.write(blob)
    stdin.close()
    return stdout.readline().strip()

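def git_commits(branch):
    # Stream NUL-separated commit records from git-rev-list. The first line of
    # each record lists the parents with grafts applied (--parents); the raw
    # commit header follows (--header).
    f = os.popen('git-rev-list --parents --header --topo-order %s' % branch)
    buf = ''
    while 1:
        data = f.read(4096)
        buf = buf + data
        # hand out every complete record that is already in the buffer
        while '\0' in buf:
            commit, buf = buf.split('\0', 1)
            yield Commit(commit)
        # stop at end of input, even if unterminated data is left over
        if not data: break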

def git_update_ref(name, hash):
    os.system('git-update-ref "%s" "%s"' % (name, hash))

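# Shared state for the rewrite pass.
grafts = []       # hashes of commits whose grafted parents differ from the recorded ones
pending = []      # hashes of descendants that still have to be marked
rewriteable = []  # commits whose parents are all final, ready to be rehashed
remap = {}        # old commit hash -> new commit hash
todo = 0          # number of commits still to be rewritten
class Commit:
    def __init__(self, commit):
        global grafts
        lines = commit.split('\n')
        # first line: "<hash> <parent>...", with grafts already applied
        parts = lines.pop(0).split()
        self.hash, self.parents = parts[0], parts[1:]

        self.tree = lines.pop(0)

        # parents as recorded in the commit object itself
        parents = []
        while lines[0][:7] == 'parent ':
            parents = parents + lines.pop(0).split()[1:]

        # a mismatch means a graft overrides this commit's parents
        if parents != self.parents:
            grafts.append(self.hash)

        # keep the remaining headers (author, committer, ...) up to the blank line
        commit = []
        while 1:
            line = lines.pop(0)
            commit.append(line)
            if not line: break

        # rev-list indents the message by four spaces; strip that again
        for line in lines:
            commit.append(line[4:])
        self.commit = '\n'.join(commit)

        self.wait = 0       # outstanding dependencies before this commit can be rehashed
        self.children = []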

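    def mark(self):
        # Each mark is one more thing to wait for: the commit's own graft, or
        # a parent that will be rewritten first.
        global todo, pending
        self.wait = self.wait + 1
        if self.wait == 1:
            # first mark: count it and queue its children for marking too
            todo = todo + 1
            for child in self.children:
                pending.append(child.hash)

    def pick(self):
        # One dependency is resolved; at zero the commit can be rewritten.
        global rewriteable
        self.wait = self.wait - 1
        if not self.wait:
            rewriteable.append(self)

    def fixup(self, old_hash, new_hash):
        # A parent was rewritten: point at its new hash.
        i = self.parents.index(old_hash)
        self.parents[i] = new_hash
        self.pick()

    def rehash(self):
        global todo, remap
        todo = todo - 1

        # rebuild the commit object with the corrected parent list
        blob = self.tree + '\n'
        for parent in self.parents:
            blob = blob + 'parent %s\n' % parent
        blob = blob + self.commit

        new_hash = git_write_object('commit', blob)
        remap[self.hash] = new_hash

        # children can now replace the old parent hash with the new one
        for child in self.children:
            child.fixup(self.hash, new_hash)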

print "Reading commits... ",
commits = {}
for commit in git_commits('--all'):
    commits[commit.hash] = commit
print "read %d commits, found %d grafts" % (len(commits), len(grafts))

print "Setting up reverse linkage"
for commit in commits.values():
    for parent in commit.parents:
	commits[parent].children.append(commit)

print "Propagating graft information... ",
# first mark all commits that will have to be rewritten.
for commit in grafts:
    commits[commit].mark()

for commit in pending:
    commits[commit].mark()

# pick those commits that do not depend on any earlier rewrites
for commit in grafts:
    commits[commit].pick()
print "%d commits need to be rewritten" % todo

print "Rewriting commits... "
while rewriteable:
    print "\rrewriting %5d/%5d commits" % (len(rewriteable), todo),
    rewriteable.pop().rehash()
print "done..."

print "Rewriting refs..."
for ref in os.popen('git-for-each-ref'):
    hash, type, name = ref.split()
    if type != 'commit': continue

    if hash in remap:
        hash = remap[hash]

    # write updated refs to .git/newrefs
    git_update_ref('new' + name, hash)

print "done..."
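
The core of the script is the ordering problem: a commit's hash covers its
parent hashes, so changing one commit's parents forces every descendant to be
rewritten, and each commit can only be rewritten after all of its rewritten
parents have their new hashes. The following Python 3 sketch shows the same
idea on an in-memory toy graph, without calling git. It is an illustration
under assumptions, not the script's code: it uses a set plus per-commit
counters instead of mark/pick, the short commit names and author lines are
made up, and object_id, commit_body and rewrite are hypothetical helpers.
Every graft key and every parent is assumed to be present in the input.

```python
import hashlib


def object_id(kind, body):
    """Object name as git computes it: SHA-1 of '<kind> <size>', NUL, body."""
    data = body.encode()
    header = ('%s %d' % (kind, len(data))).encode() + b'\0'
    return hashlib.sha1(header + data).hexdigest()


def commit_body(tree, parents, rest):
    """Commit text: tree line, one parent line each, then the other headers."""
    lines = ['tree %s' % tree] + ['parent %s' % p for p in parents]
    return '\n'.join(lines) + '\n' + rest


def rewrite(commits, grafts):
    """commits: old id -> (tree, recorded parents, rest of the commit text).
    grafts: old id -> replacement parents. Returns {old id: new id}."""
    parents = {c: list(grafts.get(c, p)) for c, (_, p, _) in commits.items()}
    children = {c: [] for c in commits}
    for c, ps in parents.items():
        for p in ps:
            children[p].append(c)

    # Every grafted commit and all of its descendants must be rewritten.
    dirty, stack = set(), list(grafts)
    while stack:
        c = stack.pop()
        if c not in dirty:
            dirty.add(c)
            stack.extend(children[c])

    # A commit is ready once no parent is still waiting for its own rewrite.
    waiting = {c: sum(p in dirty for p in parents[c]) for c in dirty}
    ready = [c for c in dirty if waiting[c] == 0]
    remap = {}
    while ready:
        c = ready.pop()
        tree, _, rest = commits[c]
        new_parents = [remap.get(p, p) for p in parents[c]]
        remap[c] = object_id('commit', commit_body(tree, new_parents, rest))
        for child in children[c]:
            waiting[child] -= 1
            if waiting[child] == 0:
                ready.append(child)
    return remap


EMPTY_TREE = '4b825dc642cb6eb9a060e54bf8d69288fbee4904'
REST = 'author A U Thor <a@example.com> 0 +0000\n' \
       'committer A U Thor <a@example.com> 0 +0000\n\nmessage\n'
# Toy history: r <- a <- b, plus x on top of r. Short names stand in for hashes.
history = {
    'r': (EMPTY_TREE, [], REST),
    'a': (EMPTY_TREE, ['r'], REST),
    'b': (EMPTY_TREE, ['a'], REST),
    'x': (EMPTY_TREE, ['r'], REST),
}
result = rewrite(history, {'a': []})   # graft: cut a loose from r
print(sorted(result))                  # a and its descendant b get new ids
```

Commits that are neither grafted nor descended from a graft (r and x here)
keep their ids, which is why the script only remaps refs found in its table.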
