* mercurial to git
@ 2007-03-06 21:06 Rocco Rutte
2007-03-06 21:54 ` Theodore Tso
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-06 21:06 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 1964 bytes --]
Hi,
attached are two files of take #1 of writing a hg2git converter/tracker
using git-fast-import. It basically works so use at your own risk and
send patches... :)
"Basically" means that it gets tags, branches and merges right (working
tree md5 sums match after imports). It also means that it is horribly
slow for the repos I tested it on (only mutt and hg-crew).
The performance bottleneck is hg exporting data. As discovered by people
on #mercurial, the problem is not really fixable and is due to hg's
revlog handling. As a result, I needed to let the script feed the full
contents of the repository at each revision we walk (i.e. all for the
initial import) into git-fast-import. This is horribly slow. For mutt,
which contains several tags, a handful of branches and only 5k commits,
this takes roughly two hours at 1 commit/sec. My earlier version, not
using 'deleteall' and feeding only files that changed, took 15 minutes
altogether; git-fast-import from a text file takes 1 min 30 sec.
As I'll use this for my daily work (more or less), I think I'll
"maintain" and keep improving it, so if anyone has comments, criticism,
hints, patches, ...
Somewhat related: It would be really nice to teach git-fast-import to
init from a previously saved mark file. Right now I use hg revision
numbers as marks, let git-fast-import save them, and read them back next
time. These are needed to map hg revisions to git SHA1s in case I need
to reference something in an incremental import from an earlier run. It
would be nice if git-fast-import could do this on its own so that all
consumers can benefit and can have persistent marks across sessions.
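For reference, the marks file that git-fast-import writes is plain text with one `:<mark> <sha1>` pair per line, so reading it back in a frontend is simple. Below is a minimal sketch in Python 3; the function name `parse_marks` is made up for illustration and is not part of hg2git.

```python
def parse_marks(lines):
    """Map fast-import marks to SHA1s from ':<mark> <sha1>' lines.

    Lines that do not match the expected format are skipped.
    """
    marks = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        mark, sha1 = fields
        if not mark.startswith(':') or not mark[1:].isdigit():
            continue
        marks[int(mark[1:])] = sha1
    return marks


sample = [
    ":1 1111111111111111111111111111111111111111\n",
    ":2 2222222222222222222222222222222222222222\n",
    "not a marks line\n",
]
marks = parse_marks(sample)
# The attached script exports hg revision N as mark N+1,
# so hg revision 1 is looked up under mark 2.
print(marks[2])
```

Keying the table by the integer mark keeps the hg-revision-to-mark arithmetic (revision + 1) in one obvious place.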
About the attached files: hg2git.py is the worker script; it uses the
mercurial python package so that no slow shell pipelines or forks
are needed for the raw export. hg2git.sh is a convenience shell
wrapper taking care of the state files for incremental imports.
bye, Rocco
--
:wq!
[-- Attachment #2: hg2git.sh --]
[-- Type: application/x-sh, Size: 1691 bytes --]
[-- Attachment #3: hg2git.py --]
[-- Type: text/plain, Size: 6961 bytes --]
#!/usr/bin/env python
# Copyright (c) 2007 Rocco Rutte <pdmef@gmx.net>
# License: GPLv2
"""hg2git.py - A mercurial-to-git filter for git-fast-import(1)
Usage: hg2git.py <hg repo url> <max revision> <marks file> <heads file> <tip file>
"""
from mercurial import repo,hg,cmdutil,util,ui,revlog
from tempfile import mkstemp
import re
import sys
import os
# silly regex to see if user field has email address
user_re=re.compile('[^<]+ <[^>]+>$')
# git branch for hg's default 'HEAD' branch
cfg_master='master'
# insert 'checkpoint' command after this many commits
cfg_checkpoint_count=1000
def usage(ret):
  sys.stderr.write(__doc__)
  return ret

def setup_repo(url):
  myui=ui.ui()
  return myui,hg.repository(myui,url)

def get_changeset(ui,repo,revision):
  def get_branch(name):
    if name=='HEAD':
      name=cfg_master
    return name
  def fixup_user(user):
    if user_re.match(user)==None:
      if '@' not in user:
        return user+' <none@none>'
      return user+' <'+user+'>'
    return user
  node=repo.lookup(revision)
  (manifest,user,(time,timezone),files,desc,extra)=repo.changelog.read(node)
  tz="%+03d%02d" % (-timezone / 3600, ((-timezone % 3600) / 60))
  branch=get_branch(extra.get('branch','master'))
  return (manifest,fixup_user(user),(time,tz),files,desc,branch,extra)

def gitmode(x):
  return x and '100755' or '100644'

def wr(msg=''):
  print msg
  #map(lambda x: sys.stderr.write('\t[%s]\n' % x),msg.split('\n'))

def checkpoint(count):
  count=count+1
  if count%cfg_checkpoint_count==0:
    sys.stderr.write("Checkpoint after %d commits\n" % count)
    wr('checkpoint')
    wr()
  return count

def get_parent_mark(parent,marks):
  p=marks.get(str(parent),None)
  if p==None:
    # if we didn't see parent previously, assume we saw it in this run
    p=':%d' % (parent+1)
  return p
def export_commit(ui,repo,revision,marks,heads,last,max,count):
  sys.stderr.write('Exporting revision %d (tip %d) as [:%d]\n' % (revision,max,revision+1))

  (_,user,(time,timezone),files,desc,branch,_)=get_changeset(ui,repo,revision)
  parents=repo.changelog.parentrevs(revision)

  # we need this later to write out tags
  marks[str(revision)]=':%d'%(revision+1)

  wr('commit refs/heads/%s' % branch)
  wr('mark :%d' % (revision+1))
  wr('committer %s %d %s' % (user,time,timezone))
  # +1 accounts for the newline that wr() (print) appends to desc
  wr('data %d' % (len(desc)+1))
  wr(desc)
  wr()

  src=heads.get(branch,'')
  link=''
  if src!='':
    # if we have a cached head, this is an incremental import: initialize it
    # and kill reference so we won't init it again
    wr('from %s' % src)
    heads[branch]=''
  elif not heads.has_key(branch) and revision>0:
    # newly created branch and not the first one: connect to parent
    tmp=get_parent_mark(parents[0],marks)
    wr('from %s' % tmp)
    sys.stderr.write('Link new branch [%s] to parent [%s]\n' %
        (branch,tmp))
    link=tmp # avoid making a merge commit for branch fork

  if parents:
    l=last.get(branch,revision)
    for p in parents:
      # 1) as this commit implicitly is the child of the most recent
      #    commit of this branch, ignore this parent
      # 2) ignore nonexistent parents
      # 3) merge otherwise
      if p==l or p==revision or p<0:
        continue
      tmp=get_parent_mark(p,marks)
      # if we fork off a branch, don't merge via 'merge' as we have
      # 'from' already above
      if tmp==link:
        continue
      sys.stderr.write('Merging branch [%s] with parent [%s] from [r%d]\n' %
          (branch,tmp,p))
      wr('merge %s' % tmp)

  last[branch]=revision
  heads[branch]=''
  # just wipe the branch clean, all full manifest contents
  wr('deleteall')

  ctx=repo.changectx(str(revision))
  man=ctx.manifest()

  #for f in man.keys():
  #  fctx=ctx.filectx(f)
  #  d=fctx.data()
  #  wr('M %s inline %s' % (gitmode(man.execf(f)),f))
  #  wr('data %d' % len(d)) # had some trouble with size()
  #  wr(d)

  for fctx in ctx.filectxs():
    f=fctx.path()
    d=fctx.data()
    wr('M %s inline %s' % (gitmode(man.execf(f)),f))
    wr('data %d' % len(d)) # had some trouble with size()
    wr(d)

  wr()
  return checkpoint(count)
def export_tags(ui,repo,cache,count):
  l=repo.tagslist()
  for tag,node in l:
    if tag=='tip':
      continue
    rev=repo.changelog.rev(node)
    ref=cache.get(str(rev),None)
    if ref==None:
      sys.stderr.write('Failed to find reference for creating tag'
          ' %s at r%d\n' % (tag,rev))
      continue
    (_,user,(time,timezone),_,desc,branch,_)=get_changeset(ui,repo,rev)
    sys.stderr.write('Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag,rev,ref))
    wr('tag %s' % tag)
    wr('from %s' % ref)
    wr('tagger %s %d %s' % (user,time,timezone))
    msg='hg2git created tag %s for hg revision %d on branch %s on (summary):\n\t%s' % (tag,
        rev,branch,desc.split('\n')[0])
    wr('data %d' % (len(msg)+1))
    wr(msg)
    wr()
    count=checkpoint(count)
  return count
def load_cache(filename):
  cache={}
  if not os.path.exists(filename):
    return cache
  f=open(filename,'r')
  l=0
  for line in f.readlines():
    l+=1
    fields=line.split(' ')
    if fields==None or not len(fields)==2 or fields[0][0]!=':':
      sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l))
      continue
    # put key:value in cache, key without ^:
    cache[fields[0][1:]]=fields[1].split('\n')[0]
  f.close()
  return cache

def save_cache(filename,cache):
  f=open(filename,'w+')
  map(lambda x: f.write(':%s %s\n' % (str(x),str(cache.get(x)))),cache.keys())
  f.close()
def verify_heads(ui,repo,cache):
  def getsha1(branch):
    f=open(os.getenv('GIT_DIR','/dev/null')+'/refs/heads/'+branch)
    sha1=f.readlines()[0].split('\n')[0]
    f.close()
    return sha1

  for b in cache.keys():
    sys.stderr.write('Verifying branch [%s]\n' % b)
    sha1=getsha1(b)
    c=cache.get(b)
    if sha1!=c:
      sys.stderr.write('Warning: Branch [%s] modified outside hg2git:'
          '\n%s (repo) != %s (cache)\n' % (b,sha1,c))
  return True
if __name__=='__main__':
  if len(sys.argv)!=6: sys.exit(usage(1))
  repourl,m,marksfile,headsfile,tipfile=sys.argv[1:]
  _max=int(m)

  marks_cache=load_cache(marksfile)
  heads_cache=load_cache(headsfile)
  state_cache=load_cache(tipfile)

  ui,repo=setup_repo(repourl)

  if not verify_heads(ui,repo,heads_cache):
    sys.exit(1)

  tip=repo.changelog.count()

  min=int(state_cache.get('tip',0))
  max=_max
  if _max<0:
    max=tip

  c=int(state_cache.get('count',0))
  last={}
  for rev in range(min,max):
    c=export_commit(ui,repo,rev,marks_cache,heads_cache,last,tip,c)

  c=export_tags(ui,repo,marks_cache,c)

  state_cache['tip']=max
  state_cache['count']=c
  state_cache['repo']=repourl
  save_cache(tipfile,state_cache)
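A note on the timezone conversion in get_changeset: hg records the offset in seconds west of UTC, while git-fast-import expects `+HHMM` or `-HHMM`. The formula in the script relies on floor division, which as far as I can tell mis-rounds offsets west of UTC that have a non-zero minutes part (an offset of three and a half hours west would come out as -0430). The sketch below, in Python 3, separates sign and magnitude first; `git_timezone` is a hypothetical helper name, not part of the attached script.

```python
def git_timezone(hg_offset):
    """Convert hg's offset (seconds west of UTC) to git's +HHMM form."""
    east = -hg_offset                      # git counts east of UTC as positive
    sign = '+' if east >= 0 else '-'
    hours, rem = divmod(abs(east), 3600)   # work on the magnitude only
    return '%s%02d%02d' % (sign, hours, rem // 60)


print(git_timezone(-3600))    # one hour east of UTC
print(git_timezone(12600))    # three and a half hours west of UTC
print(git_timezone(0))
```

Taking the absolute value before dividing avoids any dependence on how the language rounds negative quotients.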
* Re: mercurial to git
2007-03-06 21:06 mercurial to git Rocco Rutte
@ 2007-03-06 21:54 ` Theodore Tso
2007-03-06 22:47 ` Rocco Rutte
` (2 more replies)
2007-03-07 15:59 ` Shawn O. Pearce
` (2 subsequent siblings)
3 siblings, 3 replies; 19+ messages in thread
From: Theodore Tso @ 2007-03-06 21:54 UTC (permalink / raw)
To: git
On Tue, Mar 06, 2007 at 09:06:29PM +0000, Rocco Rutte wrote:
>
> attached are two files of take #1 of writing a hg2git converter/tracker
> using git-fast-import. It basically works so use at your own risk and
> send patches... :)
I was actually thinking about doing this too, but apparently you beat
me to it. :-)
> The performance bottleneck is hg exporting data, as discovered by people
> on #mercurial, the problem is not really fixable and is due to hg's
> revlog handling. As a result, I needed to let the script feed the full
> contents of the repository at each revision we walk (i.e. all for the
> initial import) into git-fast-import. This is horribly slow. For mutt
> which contains several tags, a handfull of branches and only 5k commits
> this takes roughly two hours at 1 commit/sec. My earlier version not
> using 'deleteall' and feeding only files that changed took 15 minutes
> alltogether, git-fast-import from a textfile 1 min 30 sec.
Hmm.... the way I was planning on handling the performance bottleneck
was to use "hg manifest --debug <rev>" and diffing the hashes against
its parents. Using "hg manifest" only hits .hg/00manifest.[di] and
.hg/00changelog.[di] files, so it's highly efficient. With the
--debug option to hg manifest (not needed on some earlier versions of
hg, but it seems to be needed on the latest development version of
hg), it outputs the mode and SHA1 hash of the files, so it becomes
easy to see which files were changed relative to the revision's
parent(s).
Once we know which files we need to feed to git-fast-import, it's just
a matter of using "hg cat -r <rev> <pathname>" to feed the individual
changed file to git-fast-import. For each file, you only have to
touch the .hg/data/pathname.[di] files. So this should allow us to feed
input into git-fast-import without needing to feed the full
contents of the repository for each revision.
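The manifest comparison described above boils down to a dictionary diff. The sketch below (Python 3) assumes the two manifests have already been parsed into `{path: (mode, hash)}` dictionaries; parsing the actual "hg manifest --debug" output is left out, and `diff_manifests` is a hypothetical name.

```python
def diff_manifests(parent, child):
    """Compare two {path: (mode, hash)} dicts.

    Returns (modified, removed): paths to send with 'M' (new or changed
    content or mode) and paths to drop with 'D' in the fast-import stream.
    """
    modified = sorted(p for p in child if parent.get(p) != child[p])
    removed = sorted(p for p in parent if p not in child)
    return modified, removed


parent = {'README': ('644', 'aaa'), 'old.c': ('644', 'bbb')}
child = {'README': ('644', 'ccc'), 'new.c': ('755', 'ddd')}
modified, removed = diff_manifests(parent, child)
print(modified)
print(removed)
```

Only the paths in `modified` would then need an "hg cat" (or the equivalent API call) to fetch their contents.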
The other thing that I've been working on in my design is how to make the
converter bidirectional. That is, if a changeset is made in the
hg repository, it should be possible to push it over to the git
repository, and vice versa: if there are changes made in the git
repository, it should be possible to push them back to hg.
In order to do this it becomes necessary to special case the .hgrc
file, and in fact we need to make sure that the .hgrc file does *not*
show up in the git repository, but the contents of the .hgrc file
needs to be stored in the state file that lives alongside the git and
hg repositories.
Regards,
- Ted
* Re: mercurial to git
2007-03-06 21:54 ` Theodore Tso
@ 2007-03-06 22:47 ` Rocco Rutte
2007-03-06 23:08 ` Josef Sipek
2007-03-08 9:01 ` Rocco Rutte
2 siblings, 0 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-06 22:47 UTC (permalink / raw)
To: git
Hi,
* Theodore Tso [07-03-06 16:54:59 -0500] wrote:
>Hmm.... the way I was planning on handling the performance bottleneck
>was to use "hg manifest --debug <rev>" and diffing the hashes against
>its parents. Using "hg manifest" only hits .hg/00manifest.[di] and
>.hg/00changelog.[di] files, so it's highly efficient. With the
>--debug option to hg manifest (not needed on some earlier versions of
>hg, but it seems to be needed on the latest development version of
>hg), it outputs the mode and SHA1 hash of the files, so it becomes
>easy to see which files were changed relative to the revision's
>parent(s).
I started looking at hg only a few days ago, and mainly at the source,
so I've likely missed some things...
Hmm. I'll need to further read the hg source to see how they do it. I
now switched to defaulting to use the hg changes for normal changesets
and the full manifest for merges. That's a huge boost already. Your
approach sounds even better... so that I'll use it. :)
bye, Rocco
--
:wq!
* Re: mercurial to git
2007-03-06 21:54 ` Theodore Tso
2007-03-06 22:47 ` Rocco Rutte
@ 2007-03-06 23:08 ` Josef Sipek
2007-03-07 0:11 ` Theodore Tso
2007-03-08 9:01 ` Rocco Rutte
2 siblings, 1 reply; 19+ messages in thread
From: Josef Sipek @ 2007-03-06 23:08 UTC (permalink / raw)
To: Theodore Tso; +Cc: git
On Tue, Mar 06, 2007 at 04:54:59PM -0500, Theodore Tso wrote:
...
> The other thing that I've been working in my design is how to make the
> converter to be bidrectional.
A while back, I tried to write an extension to mercurial that would export a
hg repo using the git protocol. One side-effect was that it converted the
entire repository to a git repo with many loose objects.
It "worked" (I never finished it enough) on a small repo in a bidirectional
way.
I'll try to dig up the code, and put it up somewhere...
Josef "Jeff" Sipek.
--
I'm somewhere between geek and normal.
- Linus Torvalds
* Re: mercurial to git
2007-03-06 23:08 ` Josef Sipek
@ 2007-03-07 0:11 ` Theodore Tso
[not found] ` <20070314111257.GA4526@peter.daprodeges.fqdn.th-h.de>
0 siblings, 1 reply; 19+ messages in thread
From: Theodore Tso @ 2007-03-07 0:11 UTC (permalink / raw)
To: Josef Sipek; +Cc: git
On Tue, Mar 06, 2007 at 06:08:02PM -0500, Josef Sipek wrote:
> I'll try to dig up the code, and put it up somewhere...
Here's a hacked up version of Stelian Pop's converter code that I used
for an initial test conversion of e2fsprogs from hg to git. The main
improvements over Stelian's are that it's a bit faster, thanks to caching
the results of "hg log", and that it parses the Signed-off-by:
headers to feed them into the ChangeSet's Author identity (as distinct
from the committer identity, which it gets from the hg information).
The other change which I added was a pretty kludgy committer name
canonicalizer, since the committer information is pretty
grotty. That's because the e2fsprogs source repository has over the
years been converted from CVS, to BitKeeper, to Mercurial, and now at
some point soon, when I'm happy with a decent hg-to-git tool, to git.
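As an illustration of the Signed-off-by: handling, here is a small Python 3 sketch that picks the first Signed-off-by: line out of a commit message and falls back to the committer when there is none. It mirrors what the script below does with grep and cut; the names `SOB_RE` and `author_from_message` are invented for this example.

```python
import re

# First "Signed-off-by:" line anywhere in the message, without the prefix.
SOB_RE = re.compile(r'^Signed-off-by:\s*(.+?)\s*$', re.MULTILINE)


def author_from_message(message, committer):
    """Return the first Signed-off-by identity, else the committer."""
    m = SOB_RE.search(message)
    if m is None:
        return committer
    # Drop the double quotes that sometimes wrap the name part.
    return m.group(1).replace('"', '')


msg = "Fix a typo\n\nSigned-off-by: \"Theodore Ts'o\" <tytso@mit.edu>\n"
print(author_from_message(msg, "Someone Else <someone@example.org>"))
print(author_from_message("No trailer here\n", "Someone Else <someone@example.org>"))
```

Taking only the first match corresponds to the `grep -m 1` in the script; a patch that passed through several hands may carry more than one sign-off.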
My plan was to rewrite the converter to call Mercurial's python
classes directly (using the equivalent python code to 'hg manifest'
and 'hg cat' to speed things up enormously, compared to checking out
each revision one at a time and then using git to figure out which
files had been added/changed/deleted), and to interface it into
git-fast-import, and make the necessary changes (including more
intelligent handling of .hgtags) so that the conversion could be
bidirectional.
But if I can convince someone else to do the work, especially if their
converter handles the Signed-off-by: parsing, and making sure the
author and commit dates are properly set, that would certainly be a
bonus. :-)
- Ted
P.S. Oh yes, my plan was to use Python's ConfigParser class to store
the author canonicalization information, instead of hard-coding the
data into the python script. Code snippets to do this available on
request; it was pretty trivial to do.
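One way the ConfigParser idea could look is sketched below. This is not Ted's snippet: it uses the Python 3 `configparser` module rather than the Python 2 `ConfigParser` of the time, and the `[authors]` section name and the helper names are assumptions made for the example.

```python
import configparser

# Hypothetical mapping file: raw hg user string = canonical identity.
SAMPLE = """
[authors]
tytso@think.thunk.org = Theodore Ts'o <tytso@mit.edu>
root@lynx.adilger.int = Andreas Dilger <adilger@clusterfs.com>
"""


def load_author_map(text):
    """Parse the [authors] section into a plain dict."""
    parser = configparser.ConfigParser(delimiters=('=',), interpolation=None)
    parser.optionxform = str          # keep keys case-sensitive
    parser.read_string(text)
    return dict(parser.items('authors'))


def canonical_user(user, author_map):
    """Return the canonical identity, or the user unchanged if unmapped."""
    return author_map.get(user, user)


author_map = load_author_map(SAMPLE)
print(canonical_user('tytso@think.thunk.org', author_map))
print(canonical_user('unknown@example.org', author_map))
```

A lookup table like this would replace the long chain of `if user == ...` tests in the script that follows.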
#! /usr/bin/python
""" hg-to-git.py - A Mercurial to GIT converter
Copyright (C)2007 Stelian Pop <stelian@xxxxxxxxxx>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
"""
import os, os.path, sys
import tempfile, popen2, pickle, getopt
import re
# Maps hg version -> git version
hgvers = {}
# List of children for each hg revision
hgchildren = {}
# Current branch for each hg revision
hgbranch = {}
#------------------------------------------------------------------------------
def usage():

    print """\
%s: [OPTIONS] <hgprj>

options:
    -s, --gitstate=FILE: name of the state to be saved/read
                         for incrementals

required:
    hgprj:  name of the HG project to import (directory)
""" % sys.argv[0]
#------------------------------------------------------------------------------
def getgitenv(user, author, date):
    env = ''
    if author == '':
        author = user
    # note: the variables are spelled GIT_COMMITTER_* (double T)
    elems = re.compile('(.*?)\s+<(.*)>').match(user)
    if elems:
        env += 'export GIT_COMMITTER_NAME="%s" ;' % elems.group(1)
        env += 'export GIT_COMMITTER_EMAIL="%s" ;' % elems.group(2)
    else:
        env += 'export GIT_COMMITTER_NAME="%s" ;' % user
        env += 'export GIT_COMMITTER_EMAIL= ;'
    elems = re.compile('(.*?)\s+<(.*)>').match(author)
    if elems:
        env += 'export GIT_AUTHOR_NAME="%s" ;' % elems.group(1)
        env += 'export GIT_AUTHOR_EMAIL="%s" ;' % elems.group(2)
    else:
        env += 'export GIT_AUTHOR_NAME="%s" ;' % author
        env += 'export GIT_AUTHOR_EMAIL= ;'
    env += 'export GIT_AUTHOR_DATE="%s" ;' % date
    env += 'export GIT_COMMITTER_DATE="%s" ;' % date
    return env
#------------------------------------------------------------------------------
state = ''

try:
    opts, args = getopt.getopt(sys.argv[1:], 's:t:', ['gitstate=', 'tempdir='])
    for o, a in opts:
        if o in ('-s', '--gitstate'):
            state = a
            state = os.path.abspath(state)

    if len(args) != 1:
        raise('params')
except:
    usage()
    sys.exit(1)

hgprj = args[0]
os.chdir(hgprj)

if state:
    if os.path.exists(state):
        print 'State does exist, reading'
        f = open(state, 'r')
        hgvers = pickle.load(f)
    else:
        print 'State does not exist, first run'

tip = os.popen('hg tip | head -1 | cut -f 2 -d :').read().strip()
print 'tip is', tip
# Calculate the branches
print 'analysing the branches...'
hgchildren["0"] = ()
hgbranch["0"] = "master"
for cset in range(1, int(tip) + 1):
    hgchildren[str(cset)] = ()
    prnts = os.popen('hg log -r %d | grep ^parent: | cut -f 2 -d :' % cset).readlines()
    if len(prnts) > 0:
        parent = prnts[0].strip()
    else:
        parent = str(cset - 1)
    hgchildren[parent] += ( str(cset), )
    if len(prnts) > 1:
        mparent = prnts[1].strip()
        hgchildren[mparent] += ( str(cset), )
    else:
        mparent = None

    if mparent:
        # For merge changesets, take either one, preferably the 'master' branch
        if hgbranch[mparent] == 'master':
            hgbranch[str(cset)] = 'master'
        else:
            hgbranch[str(cset)] = hgbranch[parent]
    else:
        # Normal changesets
        # For first children, take the parent branch, for the others create a new branch
        if hgchildren[parent][0] == str(cset):
            hgbranch[str(cset)] = hgbranch[parent]
        else:
            hgbranch[str(cset)] = "branch-" + str(cset)

if not hgvers.has_key("0"):
    print 'creating repository'
    os.system('git-init-db')
# loop through every hg changeset
for cset in range(int(tip) + 1):

    # incremental, already seen
    if hgvers.has_key(str(cset)):
        continue

    # get info
    prnts = os.popen('hg log -r %d | grep ^parent: | cut -f 2 -d :' % cset).readlines()
    if len(prnts) > 0:
        parent = prnts[0].strip()
    else:
        parent = str(cset - 1)
    if len(prnts) > 1:
        mparent = prnts[1].strip()
    else:
        mparent = None

    (fdlog, filelog) = tempfile.mkstemp()
    logtxt = os.popen('hg log -r %d -v' % cset).read().strip()
    os.write(fdlog, logtxt)
    os.close(fdlog)

    (fdcomment, filecomment) = tempfile.mkstemp()
    csetcomment = os.popen('grep -v ^changeset: < %s | grep -v ^parent: | grep -v ^user: | grep -v ^date | grep -v ^files: | grep -v ^description: | grep -v ^tag:' % filelog).read().strip()
    os.write(fdcomment, csetcomment)
    os.close(fdcomment)

    date = os.popen('grep -m 1 ^date: < %s | cut -f 2- -d :' % filelog).read().strip()
    tag = os.popen('grep -m 1 ^tag: < %s | cut -f 2- -d :' % filelog).read().strip()
    user = os.popen('grep -m 1 ^user: < %s | cut -f 2- -d :' % filelog).read().strip()
    if user == 'tytso@mit.edu':
        user = "Theodore Ts'o <tytso@mit.edu>"
    if user == 'tytso@think.thunk.org':
        user = "Theodore Ts'o <tytso@mit.edu>"
    if user == 'tytso@snap.thunk.org':
        user = "Theodore Ts'o <tytso@mit.edu>"
    if user == 'tytso@fs.thunk.org':
        user = "Theodore Ts'o <tytso@mit.edu>"
    if user == 'tytso@voltaire.debian.org':
        user = "Theodore Ts'o <tytso@mit.edu>"
    if user == 'tytso@who-could-of.thunk.org':
        user = "Theodore Ts'o <tytso@mit.edu>"
    if user == 'tytso@universal.(none)':
        user = "Theodore Ts'o <tytso@mit.edu>"
    if user == 'tytso@theodore-tsos-computer.local':
        user = "Theodore Ts'o <tytso@mit.edu>"
    if user == 'adilger@clusterfs.com':
        user = "Andreas Dilger <adilger@clusterfs.com>"
    if user == 'adilger@lynx.adilger.int':
        user = "Andreas Dilger <adilger@clusterfs.com>"
    if user == 'root@lynx.adilger.int':
        user = "Andreas Dilger <adilger@clusterfs.com>"
    if user == 'matthias.andree@gmx.de':
        user = "Matthias Andree <matthias.andree@gmx.de>"
    if user == 'laptop@duncow.home.oldelvet.org.uk':
        user = "Richard Mortimer <richm@oldelvet.org.uk>"
    if user == 'sct@redhat.com':
        user = 'Stephen Tweedie <sct@redhat.com>'
    if user == 'sct@sisko.scot.redhat.com':
        user = 'Stephen Tweedie <sct@redhat.com>'
    if user == 'paubert@gra-vd1.iram.es':
        user = 'Gabriel Paubert <paubert@iram.es>'

    author = os.popen('grep -m 1 ^Signed-off-by: < %s | cut -f 2- -d :' % filelog).read().strip()
    if author == '"Theodore Ts\'o" <tytso@mit.edu>':
        author = "Theodore Ts'o <tytso@mit.edu>"
    os.unlink(filelog)

    print '-----------------------------------------'
    print 'cset:', cset
    print 'branch:', hgbranch[str(cset)]
    print 'user:', user
    print 'author:', author
    print 'date:', date
    print 'comment:', csetcomment
    print 'parent:', parent
    if mparent:
        print 'mparent:', mparent
    if tag:
        print 'tag:', tag
    print '-----------------------------------------'

    # checkout the parent if necessary
    if cset != 0:
        if hgbranch[str(cset)] == "branch-" + str(cset):
            print 'creating new branch', hgbranch[str(cset)]
            os.system('git-checkout -b %s %s' % (hgbranch[str(cset)], hgvers[parent]))
        else:
            print 'checking out branch', hgbranch[str(cset)]
            os.system('git-checkout %s' % hgbranch[str(cset)])

    # merge
    if mparent:
        if hgbranch[parent] == hgbranch[str(cset)]:
            otherbranch = hgbranch[mparent]
        else:
            otherbranch = hgbranch[parent]
        print 'merging', otherbranch, 'into', hgbranch[str(cset)]
        os.system(getgitenv(user, author, date) + 'git-merge --no-commit -s ours "" %s %s' % (hgbranch[str(cset)], otherbranch))

    # remove everything except .git and .hg directories
    os.system('find . \( -path "./.hg" -o -path "./.git" \) -prune -o ! -name "." -print | xargs rm -rf')

    # repopulate with checkouted files
    os.system('hg update -C %d' % cset)

    # add new files
    os.system('git-ls-files -x .hg --others | git-update-index --add --stdin')
    # delete removed files
    os.system('git-ls-files -x .hg --deleted | git-update-index --remove --stdin')

    # commit
    os.system(getgitenv(user, author, date) + 'git-commit -a -F %s' % filecomment)
    os.unlink(filecomment)

    # tag
    if tag and tag != 'tip':
        os.system(getgitenv(user, author, date) + 'git-tag %s' % tag)

    # delete branch if not used anymore...
    if mparent and len(hgchildren[str(cset)]):
        print "Deleting unused branch:", otherbranch
        os.system('git-branch -d %s' % otherbranch)

    # retrieve and record the version
    vvv = os.popen('git-show | head -1').read()
    vvv = vvv[vvv.index(' ') + 1 : ].strip()
    print 'record', cset, '->', vvv
    hgvers[str(cset)] = vvv

os.system('git-repack -a -d')

# write the state for incrementals
if state:
    print 'Writing state'
    f = open(state, 'w')
    pickle.dump(hgvers, f)

# vim: et ts=8 sw=4 sts=4
* Re: mercurial to git
2007-03-06 21:54 ` Theodore Tso
2007-03-06 22:47 ` Rocco Rutte
2007-03-06 23:08 ` Josef Sipek
@ 2007-03-08 9:01 ` Rocco Rutte
2 siblings, 0 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-08 9:01 UTC (permalink / raw)
To: git
Hi,
* Theodore Tso [07-03-06 16:54:59 -0500] wrote:
>Hmm.... the way I was planning on handling the performance bottleneck
>was to use "hg manifest --debug <rev>" and diffing the hashes against
>its parents. Using "hg manifest" only hits .hg/00manifest.[di] and
>.hg/00changelog.[di] files, so it's highly efficient. With the
>--debug option to hg manifest (not needed on some earlier versions of
>hg, but it seems to be needed on the latest development version of
>hg), it outputs the mode and SHA1 hash of the files, so it becomes
>easy to see which files were changed relative to the revision's
>parent(s).
>Once we know which files we need to feed to git-fast-import, it's just
>a matter of using "hg cat -r <rev> <pathname>" to feed the individual
>changed file to git-fast-import.
I've done that now and the repositories come out as before in about 10
minutes. Also I sanitized the tags handling and will push out the
changed version somewhere soon.
bye, Rocco
--
:wq!
* Re: mercurial to git
2007-03-06 21:06 mercurial to git Rocco Rutte
2007-03-06 21:54 ` Theodore Tso
@ 2007-03-07 15:59 ` Shawn O. Pearce
2007-03-08 8:56 ` Rocco Rutte
2007-03-07 23:14 ` Shawn O. Pearce
2007-03-08 10:49 ` Rocco Rutte
3 siblings, 1 reply; 19+ messages in thread
From: Shawn O. Pearce @ 2007-03-07 15:59 UTC (permalink / raw)
To: Rocco Rutte; +Cc: git
Rocco Rutte <pdmef@gmx.net> wrote:
> The performance bottleneck is hg exporting data, as discovered by people
> on #mercurial, the problem is not really fixable and is due to hg's
> revlog handling. As a result, I needed to let the script feed the full
> contents of the repository at each revision we walk (i.e. all for the
> initial import) into git-fast-import.
I thought that hg stored file revisions such that each source file
(e.g. foo.c) had its own revision file (e.g. foo.revdata) and that
every revision of foo.c was stored in that one file, ordered from
oldest to newest? If that is the case why not strip all of those
into fast-import up front, doing one source file at a time as a
huge series of blobs and mark them, then do the commit/trees later
on using only the marks?
Or am I just missing something about hg?
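To make the "blobs first, commits later" idea concrete, here is a rough Python 3 sketch of the stream a frontend could write: each file revision goes in once as a marked blob, and the commit then refers to blobs by mark instead of carrying inline data. The helper names and sample identities are invented for the example, and it says nothing about how the revisions would be pulled out of hg.

```python
def blob(mark, data):
    """Return a fast-import 'blob' command storing data under :mark."""
    return b'blob\nmark :%d\ndata %d\n%s\n' % (mark, len(data), data)


def commit(branch, mark, committer, when, message, files):
    """Return a 'commit' command; files maps path -> (mode, blob mark)."""
    out = [
        b'commit refs/heads/%s\n' % branch,
        b'mark :%d\n' % mark,
        b'committer %s %s\n' % (committer, when),
        b'data %d\n%s\n' % (len(message), message),
    ]
    for path in sorted(files):
        mode, blob_mark = files[path]
        # 'M <mode> :<mark> <path>' reuses a blob sent earlier in the stream
        out.append(b'M %s :%d %s\n' % (mode, blob_mark, path))
    out.append(b'\n')
    return b''.join(out)


stream = blob(1, b'int main(void) { return 0; }\n')
stream += commit(b'master', 2, b'A U Thor <author@example.com>',
                 b'1173283169 +0000', b'initial import\n',
                 {b'foo.c': (b'100644', 1)})
print(stream.decode('ascii'))
```

Blob marks and commit marks share one number space in a fast-import stream, so a frontend taking this route would need a numbering scheme that keeps the two apart.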
> This is horribly slow. For mutt
> which contains several tags, a handfull of branches and only 5k commits
> this takes roughly two hours at 1 commit/sec.
Not fast-import's fault. ;-)
> Somewhat related: It would be really nice to teach git-fast-import to
> init from a previously saved mark file. Right now I use hg revision
> numbers as marks, let git-fast-import save them, and read them back next
> time. These are needed to map hg revisions to git SHA1s in case I need
> to reference something in an incremental import from an earlier run. It
> would be nice if git-fast-import could do this on its own so that all
> consumers can benefit and can have persistent marks accross sessions.
Sure, that sounds pretty easy. I'll try to work that up later
today or tomorrow.
--
Shawn.
* Re: mercurial to git
2007-03-07 15:59 ` Shawn O. Pearce
@ 2007-03-08 8:56 ` Rocco Rutte
0 siblings, 0 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-08 8:56 UTC (permalink / raw)
To: git
Hi,
* Shawn O. Pearce [07-03-07 10:59:29 -0500] wrote:
>I thought that hg stored file revisions such that each source file
>(e.g. foo.c) had its own revision file (e.g. foo.revdata) and that
>every revision of foo.c was stored in that one file, ordered from
>oldest to newest? If that is the case why not strip all of those
>into fast-import up front, doing one source file at a time as a
>huge series of blobs and mark them, then do the commit/trees later
>on using only the marks?
>Or am I just missing something about hg?
I don't want to use anything except the Mercurial API, so that in
theory the importer could work even for remote hg repositories.
But the "blob feed" approach doesn't seem quite right to me,
especially for incremental imports. There would have to be state files
and internal tables telling what revisions of what files there are with
what content. With thousands of files I think it gets quite messy to
find even the minimum set to start off with for an incremental import.
Also, you can already specify up to which revision to import, so it would
get even more complicated.
bye, Rocco
--
:wq!
* Re: mercurial to git
2007-03-06 21:06 mercurial to git Rocco Rutte
2007-03-06 21:54 ` Theodore Tso
2007-03-07 15:59 ` Shawn O. Pearce
@ 2007-03-07 23:14 ` Shawn O. Pearce
2007-03-08 10:49 ` Rocco Rutte
3 siblings, 0 replies; 19+ messages in thread
From: Shawn O. Pearce @ 2007-03-07 23:14 UTC (permalink / raw)
To: Rocco Rutte, Junio C Hamano; +Cc: git
Rocco Rutte <pdmef@gmx.net> wrote:
> Somewhat related: It would be really nice to teach git-fast-import to
> init from a previously saved mark file. Right now I use hg revision
> numbers as marks, let git-fast-import save them, and read them back next
> time. These are needed to map hg revisions to git SHA1s in case I need
> to reference something in an incremental import from an earlier run. It
> would be nice if git-fast-import could do this on its own so that all
> consumers can benefit and can have persistent marks accross sessions.
Done. See the new --import-marks option.
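For a frontend wrapper, using the option could look roughly like the Python 3 sketch below. It assumes the marks are both loaded from and saved to the same file, and that the import option is left out on the very first run because the file does not exist yet; `fast_import_argv` is a hypothetical helper, and the modern `git fast-import` spelling is used instead of `git-fast-import`.

```python
def fast_import_argv(marks_file, first_run):
    """Build the fast-import command line for an incremental import."""
    argv = ['git', 'fast-import', '--export-marks=' + marks_file]
    if not first_run:
        # Reload the marks table written by the previous session.
        argv.append('--import-marks=' + marks_file)
    return argv


print(' '.join(fast_import_argv('hg2git.marks', first_run=True)))
print(' '.join(fast_import_argv('hg2git.marks', first_run=False)))
```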
The following changes since commit c390ae97beb9e8cdab159b593ea9659e8096c4db:
Li Yang (1):
gitweb: Change to use explicitly function call cgi->escapHTML()
are found in the git repository at:
git://repo.or.cz:/git/fastimport.git
Shawn O. Pearce (3):
Preallocate memory earlier in fast-import
Use atomic updates to the fast-import mark file
Allow fast-import frontends to reload the marks table
Documentation/git-fast-import.txt | 13 +++++-
fast-import.c | 85 +++++++++++++++++++++++++++++-------
t/t9300-fast-import.sh | 8 ++++
3 files changed, 88 insertions(+), 18 deletions(-)
--
Shawn.
* Re: mercurial to git
2007-03-06 21:06 mercurial to git Rocco Rutte
` (2 preceding siblings ...)
2007-03-07 23:14 ` Shawn O. Pearce
@ 2007-03-08 10:49 ` Rocco Rutte
3 siblings, 0 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-08 10:49 UTC (permalink / raw)
To: git
Hi,
* Rocco Rutte [07-03-06 21:06:29 +0000] wrote:
[...]
I've now pushed the changes out to:
http://repo.or.cz/w/hg2git.git
I don't know what the status and/or future plans are for:
http://repo.or.cz/w/fast-export.git
...but these two seem worth combining, IMHO.
I haven't followed git development consistently lately, so are there any
plans to include these or to replace the current importers with them?
bye, Rocco
--
:wq!
Thread overview: 19+ messages
2007-03-06 21:06 mercurial to git Rocco Rutte
2007-03-06 21:54 ` Theodore Tso
2007-03-06 22:47 ` Rocco Rutte
2007-03-06 23:08 ` Josef Sipek
2007-03-07 0:11 ` Theodore Tso
[not found] ` <20070314111257.GA4526@peter.daprodeges.fqdn.th-h.de>
2007-03-15 0:25 ` Theodore Tso
2007-03-15 10:19 ` Rocco Rutte
2007-03-15 14:12 ` Theodore Tso
2007-03-15 15:19 ` Rocco Rutte
2007-03-15 15:56 ` Linus Torvalds
[not found] ` <20070314132951.GE12710@thunk.org>
[not found] ` <20070315094434.GA4425@peter.daprodeges.fqdn.th-h.de>
2007-03-15 21:04 ` Theodore Tso
2007-03-15 22:07 ` Rocco Rutte
2007-03-17 11:37 ` Simon 'corecode' Schubert
2007-03-16 4:53 ` Len Brown
2007-03-08 9:01 ` Rocco Rutte
2007-03-07 15:59 ` Shawn O. Pearce
2007-03-08 8:56 ` Rocco Rutte
2007-03-07 23:14 ` Shawn O. Pearce
2007-03-08 10:49 ` Rocco Rutte