git.vger.kernel.org archive mirror
* mercurial to git
@ 2007-03-06 21:06 Rocco Rutte
  2007-03-06 21:54 ` Theodore Tso
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-06 21:06 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1964 bytes --]

Hi,

attached are two files of take #1 of writing a hg2git converter/tracker 
using git-fast-import. It basically works so use at your own risk and 
send patches... :)

"Basically" means that it gets tags, branches and merges right (working 
tree md5 sums match after imports). It also means that it is horribly 
slow for the repos I tested it on (only mutt and hg-crew).

The performance bottleneck is hg exporting data. As discovered by people 
on #mercurial, the problem is not really fixable and is due to hg's 
revlog handling. As a result, I needed to let the script feed the full 
contents of the repository at each revision we walk (i.e. all for the 
initial import) into git-fast-import. This is horribly slow. For mutt, 
which contains several tags, a handful of branches and only 5k commits, 
this takes roughly two hours at 1 commit/sec. My earlier version, not 
using 'deleteall' and feeding only files that changed, took 15 minutes 
altogether; git-fast-import from a text file took 1 min 30 sec.

As I'll use this for my daily work (more or less), I think I'll 
"maintain" and keep improving it, so if anyone has comments, criticism, 
hints, patches, ...

Somewhat related: It would be really nice to teach git-fast-import to 
init from a previously saved mark file. Right now I use hg revision 
numbers as marks, let git-fast-import save them, and read them back next 
time. These are needed to map hg revisions to git SHA1s in case I need 
to reference something in an incremental import from an earlier run. It 
would be nice if git-fast-import could do this on its own so that all 
consumers can benefit and can have persistent marks across sessions.
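
The mark bookkeeping described above can be sketched as follows. This is 
only an illustration, assuming the marks file has one ":<mark> <sha1>" 
entry per line and that mark N stands for hg revision N-1 (the attached 
script writes 'mark :%d' with revision+1). The function name load_marks 
is hypothetical.

```python
def load_marks(lines):
    """Parse ':<mark> <sha1>' lines into a {hg_revision: git_sha1} dict."""
    marks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        mark, sha1 = line.split(" ", 1)
        if not mark.startswith(":"):
            raise ValueError("bad marks line: %r" % line)
        # Assumption: mark N was assigned to hg revision N-1.
        marks[int(mark[1:]) - 1] = sha1
    return marks
```

With such a table, an incremental run can refer to a parent imported in 
an earlier session by its SHA1 instead of by a mark.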

About the attached files: hg2git.py is the worker script using the 
mercurial python package, so that no slow shell pipelines or forks 
are needed for the raw export; hg2git.sh is a convenience shell 
wrapper taking care of the state files for incremental imports.

   bye, Rocco
-- 
:wq!

[-- Attachment #2: hg2git.sh --]
[-- Type: application/x-sh, Size: 1691 bytes --]

[-- Attachment #3: hg2git.py --]
[-- Type: text/plain, Size: 6961 bytes --]

#!/usr/bin/env python

# Copyright (c) 2007 Rocco Rutte <pdmef@gmx.net>
# License: GPLv2

"""hg2git.py - A mercurial-to-git filter for git-fast-import(1)
Usage: hg2git.py <hg repo url> <max revision> <marks file> <heads file> <tip file>
A negative <max revision> means: import up to tip.
"""

from mercurial import repo,hg,cmdutil,util,ui,revlog
from tempfile import mkstemp
import re
import sys
import os

# silly regex to see if user field has email address
user_re=re.compile('[^<]+ <[^>]+>$')
# git branch for hg's default 'HEAD' branch
cfg_master='master'
# insert 'checkpoint' command after this many commits
cfg_checkpoint_count=1000

def usage(ret):
  sys.stderr.write(__doc__)
  return ret

def setup_repo(url):
  myui=ui.ui()
  return myui,hg.repository(myui,url)

def get_changeset(ui,repo,revision):
  def get_branch(name):
    if name=='HEAD':
      name=cfg_master
    return name
  def fixup_user(user):
    if user_re.match(user)==None:
      if '@' not in user:
        return user+' <none@none>'
      return user+' <'+user+'>'
    return user
  node=repo.lookup(revision)
  (manifest,user,(time,timezone),files,desc,extra)=repo.changelog.read(node)
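  # hg stores the offset as seconds west of UTC, git wants +HHMM east of UTC;
  # work on the absolute value so negative half-hour offsets come out right
  sign=timezone>0 and '-' or '+'
  tz="%s%02d%02d" % (sign,abs(timezone)/3600,(abs(timezone)%3600)/60)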
  branch=get_branch(extra.get('branch','master'))
  return (manifest,fixup_user(user),(time,tz),files,desc,branch,extra)

def gitmode(x):
  return x and '100755' or '100644'

def wr(msg=''):
  print msg
  #map(lambda x: sys.stderr.write('\t[%s]\n' % x),msg.split('\n'))

def checkpoint(count):
  count=count+1
  if count%cfg_checkpoint_count==0:
    sys.stderr.write("Checkpoint after %d commits\n" % count)
    wr('checkpoint')
    wr()
  return count

def get_parent_mark(parent,marks):
  p=marks.get(str(parent),None)
  if p==None:
    # if we didn't see parent previously, assume we saw it in this run
    p=':%d' % (parent+1)
  return p

def export_commit(ui,repo,revision,marks,heads,last,max,count):
  sys.stderr.write('Exporting revision %d (tip %d) as [:%d]\n' % (revision,max,revision+1))

  (_,user,(time,timezone),files,desc,branch,_)=get_changeset(ui,repo,revision)
  parents=repo.changelog.parentrevs(revision)

  # we need this later to write out tags
  marks[str(revision)]=':%d'%(revision+1)

  wr('commit refs/heads/%s' % branch)
  wr('mark :%d' % (revision+1))
  wr('committer %s %d %s' % (user,time,timezone))
  wr('data %d' % (len(desc)+1)) # +1 for the newline that print appends to desc
  wr(desc)
  wr()

  src=heads.get(branch,'')
  link=''
  if src!='':
    # if we have a cached head, this is an incremental import: initialize it
    # and kill reference so we won't init it again
    wr('from %s' % src)
    heads[branch]=''
  elif not heads.has_key(branch) and revision>0:
    # newly created branch and not the first one: connect to parent
    tmp=get_parent_mark(parents[0],marks)
    wr('from %s' % tmp)
    sys.stderr.write('Link new branch [%s] to parent [%s]\n' %
        (branch,tmp))
    link=tmp # avoid making a merge commit for branch fork

  if parents:
    l=last.get(branch,revision)
    for p in parents:
      # 1) as this commit implicitly is the child of the most recent
      #    commit of this branch, ignore this parent
      # 2) ignore nonexistent parents
      # 3) merge otherwise
      if p==l or p==revision or p<0:
        continue
      tmp=get_parent_mark(p,marks)
      # if we fork off a branch, don't merge via 'merge' as we have
      # 'from' already above
      if tmp==link:
        continue
      sys.stderr.write('Merging branch [%s] with parent [%s] from [r%d]\n' %
          (branch,tmp,p))
      wr('merge %s' % tmp)

  last[branch]=revision
  heads[branch]=''

  # just wipe the branch clean, all full manifest contents
  wr('deleteall')

  ctx=repo.changectx(str(revision))
  man=ctx.manifest()

  #for f in man.keys():
  #  fctx=ctx.filectx(f)
  #  d=fctx.data()
  #  wr('M %s inline %s' % (gitmode(man.execf(f)),f))
  #  wr('data %d' % len(d)) # had some trouble with size()
  #  wr(d)

  for fctx in ctx.filectxs():
    f=fctx.path()
    d=fctx.data()
    wr('M %s inline %s' % (gitmode(man.execf(f)),f))
    wr('data %d' % len(d)) # had some trouble with size()
    wr(d)

  wr()
  return checkpoint(count)

def export_tags(ui,repo,cache,count):
  l=repo.tagslist()
  for tag,node in l:
    if tag=='tip':
      continue
    rev=repo.changelog.rev(node)
    ref=cache.get(str(rev),None)
    if ref==None:
      sys.stderr.write('Failed to find reference for creating tag'
          ' %s at r%d\n' % (tag,rev))
      continue
    (_,user,(time,timezone),_,desc,branch,_)=get_changeset(ui,repo,rev)
    sys.stderr.write('Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag,rev,ref))
    wr('tag %s' % tag)
    wr('from %s' % ref)
    wr('tagger %s %d %s' % (user,time,timezone))
    msg='hg2git created tag %s for hg revision %d on branch %s on (summary):\n\t%s' % (tag,
        rev,branch,desc.split('\n')[0])
    wr('data %d' % (len(msg)+1))
    wr(msg)
    wr()
    count=checkpoint(count)
  return count

def load_cache(filename):
  cache={}
  if not os.path.exists(filename):
    return cache
  f=open(filename,'r')
  l=0
  for line in f.readlines():
    l+=1
    fields=line.split(' ')
    if fields==None or not len(fields)==2 or fields[0][0]!=':':
      sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l))
      continue
    # put key:value in cache, key without ^:
    cache[fields[0][1:]]=fields[1].split('\n')[0]
  f.close()
  return cache

def save_cache(filename,cache):
  f=open(filename,'w+')
  map(lambda x: f.write(':%s %s\n' % (str(x),str(cache.get(x)))),cache.keys())
  f.close()

def verify_heads(ui,repo,cache):
  def getsha1(branch):
    f=open(os.getenv('GIT_DIR','/dev/null')+'/refs/heads/'+branch)
    sha1=f.readlines()[0].split('\n')[0]
    f.close()
    return sha1

  for b in cache.keys():
    sys.stderr.write('Verifying branch [%s]\n' % b)
    sha1=getsha1(b)
    c=cache.get(b)
    if sha1!=c:
      sys.stderr.write('Warning: Branch [%s] modified outside hg2git:'
        '\n%s (repo) != %s (cache)\n' % (b,sha1,c))
  return True

if __name__=='__main__':
  if len(sys.argv)!=6: sys.exit(usage(1))
  repourl,m,marksfile,headsfile,tipfile=sys.argv[1:]
  _max=int(m)

  marks_cache=load_cache(marksfile)
  heads_cache=load_cache(headsfile)
  state_cache=load_cache(tipfile)

  ui,repo=setup_repo(repourl)

  if not verify_heads(ui,repo,heads_cache):
    sys.exit(1)

  tip=repo.changelog.count()

  min=int(state_cache.get('tip',0))
  max=_max
  if _max<0:
    max=tip

  c=int(state_cache.get('count',0))
  last={}
  for rev in range(min,max):
    c=export_commit(ui,repo,rev,marks_cache,heads_cache,last,tip,c)

  c=export_tags(ui,repo,marks_cache,c)

  state_cache['tip']=max
  state_cache['count']=c
  state_cache['repo']=repourl
  save_cache(tipfile,state_cache)


* Re: mercurial to git
  2007-03-06 21:06 mercurial to git Rocco Rutte
@ 2007-03-06 21:54 ` Theodore Tso
  2007-03-06 22:47   ` Rocco Rutte
                     ` (2 more replies)
  2007-03-07 15:59 ` Shawn O. Pearce
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 19+ messages in thread
From: Theodore Tso @ 2007-03-06 21:54 UTC (permalink / raw)
  To: git

On Tue, Mar 06, 2007 at 09:06:29PM +0000, Rocco Rutte wrote:
> 
> attached are two files of take #1 of writing a hg2git converter/tracker 
> using git-fast-import. It basically works so use at your own risk and 
> send patches... :)

I was actually thinking about doing this too, but apparently you beat
me to it.  :-)

> The performance bottleneck is hg exporting data, as discovered by people 
> on #mercurial, the problem is not really fixable and is due to hg's 
> revlog handling. As a result, I needed to let the script feed the full 
> contents of the repository at each revision we walk (i.e. all for the 
> initial import) into git-fast-import. This is horribly slow. For mutt 
> which contains several tags, a handful of branches and only 5k commits 
> this takes roughly two hours at 1 commit/sec. My earlier version not 
> using 'deleteall' and feeding only files that changed took 15 minutes 
> altogether, git-fast-import from a textfile 1 min 30 sec.

Hmm.... the way I was planning on handling the performance bottleneck
was to use "hg manifest --debug <rev>" and diffing the hashes against
its parents.  Using "hg manifest" only hits .hg/00manifest.[di] and
.hg/00changelog.[di] files, so it's highly efficient.  With the
--debug option to hg manifest (not needed on some earlier versions of
hg, but it seems to be needed on the latest development version of
hg), it outputs the mode and SHA1 hash of the files, so it becomes
easy to see which files were changed relative to the revision's
parent(s).

Once we know which files we need to feed to git-fast-import, it's just
a matter of using "hg cat -r <rev> <pathname>" to feed the individual
changed file to git-fast-import.  For each file, you only have to
touch .hg/data/pathname.[di] files.  So this should allow us to feed
input into git-fast-import without needing to feed the full
contents of the repository for each revision.
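
The manifest comparison described above could be sketched like this. It 
is a minimal illustration, not the converter's actual code: each 
manifest is assumed to be a plain dict mapping a path to a 
(file hash, mode) pair, as one might build from the "hg manifest 
--debug" output, and the name diff_manifests is hypothetical.

```python
def diff_manifests(parent, child):
    """Return (changed, removed) paths between two manifests.

    Each manifest maps path -> (file_hash, mode). A path counts as
    changed when it is new or when its hash or mode differs.
    """
    changed = sorted(p for p, entry in child.items() if parent.get(p) != entry)
    removed = sorted(p for p in parent if p not in child)
    return changed, removed
```

Only the paths in "changed" need their contents fetched with "hg cat" 
and fed to git-fast-import; the paths in "removed" only need a delete 
command.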

The other thing that I've been working on in my design is how to make
the converter bidirectional.  That is, if a changeset is made in the
hg repository, it should be possible to push it over to the git
repository, and vice versa: if there are changes made in the git
repository, it should be possible to push them back to hg.

In order to do this it becomes necessary to special case the .hgrc
file, and in fact we need to make sure that the .hgrc file does *not*
show up in the git repository, but the contents of the .hgrc file
needs to be stored in the state file that lives alongside the git and
hg repositories.

Regards,
						- Ted


* Re: mercurial to git
  2007-03-06 21:54 ` Theodore Tso
@ 2007-03-06 22:47   ` Rocco Rutte
  2007-03-06 23:08   ` Josef Sipek
  2007-03-08  9:01   ` Rocco Rutte
  2 siblings, 0 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-06 22:47 UTC (permalink / raw)
  To: git

Hi,

* Theodore Tso [07-03-06 16:54:59 -0500] wrote:

>Hmm.... the way I was planning on handling the performance bottleneck
>was to use "hg manifest --debug <rev>" and diffing the hashes against
>its parents.  Using "hg manifest" only hits .hg/00manifest.[di] and
>.hg/00changelog.[di] files, so it's highly efficient.  With the
>--debug option to hg manifest (not needed on some earlier versions of
>hg, but it seems to be needed on the latest development version of
>hg), it outputs the mode and SHA1 hash of the files, so it becomes
>easy to see which files were changed relative to the revision's
>parent(s).

I only started looking at hg a few days ago, and mainly at the source, 
so I have likely missed some things...

Hmm. I'll need to read the hg source further to see how they do it. I 
have now switched to using the hg changes for normal changesets by 
default and the full manifest for merges. That's a huge boost already. 
Your approach sounds even better... so I'll use it. :)

   bye, Rocco
-- 
:wq!


* Re: mercurial to git
  2007-03-06 21:54 ` Theodore Tso
  2007-03-06 22:47   ` Rocco Rutte
@ 2007-03-06 23:08   ` Josef Sipek
  2007-03-07  0:11     ` Theodore Tso
  2007-03-08  9:01   ` Rocco Rutte
  2 siblings, 1 reply; 19+ messages in thread
From: Josef Sipek @ 2007-03-06 23:08 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git

On Tue, Mar 06, 2007 at 04:54:59PM -0500, Theodore Tso wrote:
...
> The other thing that I've been working in my design is how to make the
> converter to be bidirectional.
 
A while back, I tried to write an extension to mercurial that would export a
hg repo using the git protocol. One side-effect was that it converted the
entire repository to a git repo with many loose objects.

It "worked" (I never finished it enough) on a small repo in a bidirectional
way.

I'll try to dig up the code, and put it up somewhere...

Josef "Jeff" Sipek.

-- 
I'm somewhere between geek and normal.
		- Linus Torvalds


* Re: mercurial to git
  2007-03-06 23:08   ` Josef Sipek
@ 2007-03-07  0:11     ` Theodore Tso
       [not found]       ` <20070314111257.GA4526@peter.daprodeges.fqdn.th-h.de>
  0 siblings, 1 reply; 19+ messages in thread
From: Theodore Tso @ 2007-03-07  0:11 UTC (permalink / raw)
  To: Josef Sipek; +Cc: git

On Tue, Mar 06, 2007 at 06:08:02PM -0500, Josef Sipek wrote:
> I'll try to dig up the code, and put it up somewhere...

Here's a hacked up version of Stelian Pop's converter code that I used
for an initial test conversion of e2fsprogs from hg to git.  The main
improvements over Stelian's are that it's a bit faster by caching the
results of "hg log", and that it parses the Signed-off-by:
headers to feed them into the changeset's author identity (as distinct
from the committer identity, which it gets from the hg information).
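
The Signed-off-by: handling described above could look roughly like the 
following sketch. It assumes, as the script below does with 
"grep -m 1", that the first Signed-off-by: line names the author and 
that the committer is used when there is none. The names SIGNOFF_RE and 
author_from_message are hypothetical.

```python
import re

# First "Signed-off-by:" trailer at the start of a line.
SIGNOFF_RE = re.compile(r"^Signed-off-by:[ \t]*(.+?)[ \t]*$", re.MULTILINE)

def author_from_message(message, committer):
    """Return the first Signed-off-by identity, or the committer if none."""
    match = SIGNOFF_RE.search(message)
    return match.group(1) if match else committer
```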

The other change I added was a pretty kludgy committer name
canonicalizer, since the committer information there is pretty
grotty.  That's because the e2fsprogs source repository has over the
years been converted from CVS, to BitKeeper, to Mercurial, and at
some point soon, when I'm happy with a decent hg-to-git tool, to git.

My plan was to rewrite the converter to call Mercurial's python
classes directly (using the equivalent python code to 'hg manifest'
and 'hg cat' to speed things up enormously, compared to checking out
each revision one at a time and then using git to figure out which
files had been added/changed/deleted), and to interface it into
git-fast-import, and make the necessary changes (including more
intelligent handling of .hgtags) so that the conversion could be
bidirectional.

But if I can convince someone else to do the work, especially if their
converter handles the Signed-off-by: parsing, and making sure the
author and commit dates are properly set, that would certainly be a
bonus.  :-)

						- Ted

P.S.  Oh yes, my plan was to use Python's ConfigParser class to store
the author canonicalization information, instead of hard-coding the
data into the python script.  Code snippets to do this available on
request; it was pretty trivial to do.
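
One possible shape for the ConfigParser idea in the P.S. is sketched 
below, using the modern configparser module. This is not the snippet 
offered above: the [authors] section name, the file layout and both 
function names are assumptions made for illustration.

```python
import configparser

# Hypothetical config layout: raw hg user on the left, canonical identity
# on the right.
SAMPLE = """\
[authors]
tytso@think.thunk.org = Theodore Ts'o <tytso@mit.edu>
sct@sisko.scot.redhat.com = Stephen Tweedie <sct@redhat.com>
"""

def load_author_map(text):
    """Read the [authors] section into a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    if not parser.has_section("authors"):
        return {}
    return dict(parser["authors"])

def canonical_user(user, author_map):
    """Map a raw user string to its canonical form, or leave it as is."""
    return author_map.get(user, user)
```

Keeping the table in a file means a new committer alias only needs a 
config change, not an edit to the converter.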

#! /usr/bin/python

""" hg-to-git.py - A Mercurial to GIT converter

    Copyright (C)2007 Stelian Pop <stelian@xxxxxxxxxx>

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2, or (at your option)
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
"""

import os, os.path, sys
import tempfile, popen2, pickle, getopt
import re

# Maps hg version -> git version
hgvers = {}
# List of children for each hg revision
hgchildren = {}
# Current branch for each hg revision
hgbranch = {}

#------------------------------------------------------------------------------

def usage():

        print """\
%s: [OPTIONS] <hgprj>

options:
    -s, --gitstate=FILE: name of the state to be saved/read
                         for incrementals

required:
    hgprj:  name of the HG project to import (directory)
""" % sys.argv[0]

#------------------------------------------------------------------------------

def getgitenv(user, author, date):
    env = ''
    if author == '':
        author = user

    # note: git reads GIT_COMMITTER_* (double T)
    elems = re.compile('(.*?)\s+<(.*)>').match(user)
    if elems:
        env += 'export GIT_COMMITTER_NAME="%s" ;' % elems.group(1)
        env += 'export GIT_COMMITTER_EMAIL="%s" ;' % elems.group(2)
    else:
        env += 'export GIT_COMMITTER_NAME="%s" ;' % user
        env += 'export GIT_COMMITTER_EMAIL= ;'

    elems = re.compile('(.*?)\s+<(.*)>').match(author)
    if elems:
        env += 'export GIT_AUTHOR_NAME="%s" ;' % elems.group(1)
        env += 'export GIT_AUTHOR_EMAIL="%s" ;' % elems.group(2)
    else:
        env += 'export GIT_AUTHOR_NAME="%s" ;' % author
        env += 'export GIT_AUTHOR_EMAIL= ;'

    env += 'export GIT_AUTHOR_DATE="%s" ;' % date
    env += 'export GIT_COMMITTER_DATE="%s" ;' % date
    return env

#------------------------------------------------------------------------------

state = ''

try:
    opts, args = getopt.getopt(sys.argv[1:], 's:t:', ['gitstate=', 'tempdir='])
    for o, a in opts:
        if o in ('-s', '--gitstate'):
            state = a
            state = os.path.abspath(state)

    if len(args) != 1:
        raise('params')
except:
    usage()
    sys.exit(1)

hgprj = args[0]
os.chdir(hgprj)

if state:
    if os.path.exists(state):
        print 'State does exist, reading'
        f = open(state, 'r')
        hgvers = pickle.load(f)
    else:
        print 'State does not exist, first run'

tip = os.popen('hg tip | head -1 | cut -f 2 -d :').read().strip()
print 'tip is', tip

# Calculate the branches
print 'analysing the branches...'
hgchildren["0"] = ()
hgbranch["0"] = "master"
for cset in range(1, int(tip) + 1):
    hgchildren[str(cset)] = ()
    prnts = os.popen('hg log -r %d | grep ^parent: | cut -f 2 -d :' % cset).readlines()
    if len(prnts) > 0:
        parent = prnts[0].strip()
    else:
        parent = str(cset - 1)
    hgchildren[parent] += ( str(cset), )
    if len(prnts) > 1:
        mparent = prnts[1].strip()
        hgchildren[mparent] += ( str(cset), )
    else:
        mparent = None

    if mparent:
        # For merge changesets, take either one, preferably the 'master' branch
        if hgbranch[mparent] == 'master':
            hgbranch[str(cset)] = 'master'
        else:
            hgbranch[str(cset)] = hgbranch[parent]
    else:
        # Normal changesets
        # For first children, take the parent branch, for the others create a new branch
        if hgchildren[parent][0] == str(cset):
            hgbranch[str(cset)] = hgbranch[parent]
        else:
            hgbranch[str(cset)] = "branch-" + str(cset)

if not hgvers.has_key("0"):
    print 'creating repository'
    os.system('git-init-db')

# loop through every hg changeset
for cset in range(int(tip) + 1):

    # incremental, already seen
    if hgvers.has_key(str(cset)):
        continue

    # get info
    prnts = os.popen('hg log -r %d | grep ^parent: | cut -f 2 -d :' % cset).readlines()
    if len(prnts) > 0:
        parent = prnts[0].strip()
    else:
        parent = str(cset - 1)
    if len(prnts) > 1:
        mparent = prnts[1].strip()
    else:
        mparent = None

    (fdlog, filelog) = tempfile.mkstemp()
    logtxt = os.popen('hg log -r %d -v' % cset).read().strip()
    os.write(fdlog, logtxt)
    os.close(fdlog)

    (fdcomment, filecomment) = tempfile.mkstemp()
    csetcomment = os.popen('grep -v ^changeset: < %s | grep -v ^parent: | grep -v ^user: | grep -v ^date | grep -v ^files: | grep -v ^description: | grep -v ^tag:' % filelog).read().strip()
    os.write(fdcomment, csetcomment)
    os.close(fdcomment)

    date = os.popen('grep -m 1 ^date: < %s | cut -f 2- -d :' % filelog).read().strip()

    tag = os.popen('grep -m 1 ^tag: < %s | cut -f 2- -d :' % filelog).read().strip()

    user = os.popen('grep -m 1 ^user: < %s | cut -f 2- -d :' % filelog).read().strip()
    # canonicalize committer identities (kludge: hard-coded lookup table)
    canonical_users = {
        'tytso@mit.edu': "Theodore Ts'o <tytso@mit.edu>",
        'tytso@think.thunk.org': "Theodore Ts'o <tytso@mit.edu>",
        'tytso@snap.thunk.org': "Theodore Ts'o <tytso@mit.edu>",
        'tytso@fs.thunk.org': "Theodore Ts'o <tytso@mit.edu>",
        'tytso@voltaire.debian.org': "Theodore Ts'o <tytso@mit.edu>",
        'tytso@who-could-of.thunk.org': "Theodore Ts'o <tytso@mit.edu>",
        'tytso@universal.(none)': "Theodore Ts'o <tytso@mit.edu>",
        'tytso@theodore-tsos-computer.local': "Theodore Ts'o <tytso@mit.edu>",

        'adilger@clusterfs.com': "Andreas Dilger <adilger@clusterfs.com>",
        'adilger@lynx.adilger.int': "Andreas Dilger <adilger@clusterfs.com>",
        'root@lynx.adilger.int': "Andreas Dilger <adilger@clusterfs.com>",
        'matthias.andree@gmx.de': "Matthias Andree <matthias.andree@gmx.de>",

        'laptop@duncow.home.oldelvet.org.uk': "Richard Mortimer <richm@oldelvet.org.uk>",

        'sct@redhat.com': 'Stephen Tweedie <sct@redhat.com>',
        'sct@sisko.scot.redhat.com': 'Stephen Tweedie <sct@redhat.com>',

        'paubert@gra-vd1.iram.es': 'Gabriel Paubert <paubert@iram.es>',
    }
    user = canonical_users.get(user, user)

    author = os.popen('grep -m 1 ^Signed-off-by: < %s | cut -f 2- -d :' % filelog).read().strip()
    if author == '"Theodore Ts\'o" <tytso@mit.edu>':
        author = "Theodore Ts'o <tytso@mit.edu>"

    os.unlink(filelog)

    print '-----------------------------------------'
    print 'cset:', cset
    print 'branch:', hgbranch[str(cset)]
    print 'user:', user
    print 'author:', author
    print 'date:', date
    print 'comment:', csetcomment
    print 'parent:', parent
    if mparent:
        print 'mparent:', mparent
    if tag:
        print 'tag:', tag
    print '-----------------------------------------'

    # checkout the parent if necessary
    if cset != 0:
        if hgbranch[str(cset)] == "branch-" + str(cset):
            print 'creating new branch', hgbranch[str(cset)]
            os.system('git-checkout -b %s %s' % (hgbranch[str(cset)], hgvers[parent]))
        else:
            print 'checking out branch', hgbranch[str(cset)]
            os.system('git-checkout %s' % hgbranch[str(cset)])

    # merge
    if mparent:
        if hgbranch[parent] == hgbranch[str(cset)]:
            otherbranch = hgbranch[mparent]
        else:
            otherbranch = hgbranch[parent]
        print 'merging', otherbranch, 'into', hgbranch[str(cset)]
        os.system(getgitenv(user, author, date) + 'git-merge --no-commit -s ours "" %s %s' % (hgbranch[str(cset)], otherbranch))

    # remove everything except .git and .hg directories
    os.system('find . \( -path "./.hg" -o -path "./.git" \) -prune -o ! -name "." -print | xargs rm -rf')

    # repopulate with checkouted files
    os.system('hg update -C %d' % cset)

    # add new files
    os.system('git-ls-files -x .hg --others | git-update-index --add --stdin')
    # delete removed files
    os.system('git-ls-files -x .hg --deleted | git-update-index --remove --stdin')

    # commit
    os.system(getgitenv(user, author, date) + 'git-commit -a -F %s' % filecomment)
    os.unlink(filecomment)

    # tag
    if tag and tag != 'tip':
        os.system(getgitenv(user, author, date) + 'git-tag %s' % tag)

    # delete branch if not used anymore...
    if mparent and len(hgchildren[str(cset)]):
        print "Deleting unused branch:", otherbranch
        os.system('git-branch -d %s' % otherbranch)

    # retrieve and record the version
    vvv = os.popen('git-show | head -1').read()
    vvv = vvv[vvv.index(' ') + 1 : ].strip()
    print 'record', cset, '->', vvv
    hgvers[str(cset)] = vvv

os.system('git-repack -a -d')

# write the state for incrementals
if state:
    print 'Writing state'
    f = open(state, 'w')
    pickle.dump(hgvers, f)

# vim: et ts=8 sw=4 sts=4


* Re: mercurial to git
  2007-03-06 21:06 mercurial to git Rocco Rutte
  2007-03-06 21:54 ` Theodore Tso
@ 2007-03-07 15:59 ` Shawn O. Pearce
  2007-03-08  8:56   ` Rocco Rutte
  2007-03-07 23:14 ` Shawn O. Pearce
  2007-03-08 10:49 ` Rocco Rutte
  3 siblings, 1 reply; 19+ messages in thread
From: Shawn O. Pearce @ 2007-03-07 15:59 UTC (permalink / raw)
  To: Rocco Rutte; +Cc: git

Rocco Rutte <pdmef@gmx.net> wrote:
> The performance bottleneck is hg exporting data, as discovered by people 
> on #mercurial, the problem is not really fixable and is due to hg's 
> revlog handling. As a result, I needed to let the script feed the full 
> contents of the repository at each revision we walk (i.e. all for the 
> initial import) into git-fast-import.

I thought that hg stored file revisions such that each source file
(e.g. foo.c) had its own revision file (e.g. foo.revdata) and that
every revision of foo.c was stored in that one file, ordered from
oldest to newest?  If that is the case why not strip all of those
into fast-import up front, doing one source file at a time as a
huge series of blobs and mark them, then do the commit/trees later
on using only the marks?

Or am I just missing something about hg?
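
The blobs-first idea above can be sketched as a fast-import stream 
generator. This is only an illustration under stated assumptions: the 
helper names blob and commit are hypothetical, every file is emitted 
with mode 100644, and blob marks and commit marks are assumed not to 
collide since they share one mark namespace.

```python
def blob(mark, content):
    """Emit one fast-import 'blob' command carrying a mark."""
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(content), content)

def commit(ref, mark, committer, when, message, files):
    """Emit a 'commit' command whose files refer to earlier blob marks.

    files is a list of (path, blob_mark) pairs; 'when' is a string such
    as '1173215189 +0000'.
    """
    msg = message.encode("utf-8")
    out = [
        b"commit %s\n" % ref.encode("utf-8"),
        b"mark :%d\n" % mark,
        b"committer %s %s\n" % (committer.encode("utf-8"), when.encode("utf-8")),
        b"data %d\n%s\n" % (len(msg), msg),
    ]
    for path, blob_mark in files:
        out.append(b"M 100644 :%d %s\n" % (blob_mark, path.encode("utf-8")))
    out.append(b"\n")
    return b"".join(out)
```

A frontend would first write every blob for every file, then write the 
commits, each listing only paths and marks.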

> This is horribly slow. For mutt 
> which contains several tags, a handful of branches and only 5k commits 
> this takes roughly two hours at 1 commit/sec.

Not fast-import's fault.  ;-)

> Somewhat related: It would be really nice to teach git-fast-import to 
> init from a previously saved mark file. Right now I use hg revision 
> numbers as marks, let git-fast-import save them, and read them back next 
> time. These are needed to map hg revisions to git SHA1s in case I need 
> to reference something in an incremental import from an earlier run. It 
> would be nice if git-fast-import could do this on its own so that all 
> consumers can benefit and can have persistent marks across sessions.

Sure, that sounds pretty easy.  I'll try to work that up later
today or tomorrow.
 
-- 
Shawn.


* Re: mercurial to git
  2007-03-06 21:06 mercurial to git Rocco Rutte
  2007-03-06 21:54 ` Theodore Tso
  2007-03-07 15:59 ` Shawn O. Pearce
@ 2007-03-07 23:14 ` Shawn O. Pearce
  2007-03-08 10:49 ` Rocco Rutte
  3 siblings, 0 replies; 19+ messages in thread
From: Shawn O. Pearce @ 2007-03-07 23:14 UTC (permalink / raw)
  To: Rocco Rutte, Junio C Hamano; +Cc: git

Rocco Rutte <pdmef@gmx.net> wrote:
> Somewhat related: It would be really nice to teach git-fast-import to 
> init from a previously saved mark file. Right now I use hg revision 
> numbers as marks, let git-fast-import save them, and read them back next 
> time. These are needed to map hg revisions to git SHA1s in case I need 
> to reference something in an incremental import from an earlier run. It 
> would be nice if git-fast-import could do this on its own so that all 
> consumers can benefit and can have persistent marks across sessions.

Done.  See the new --import-marks option.
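
A frontend might use the option roughly as in the sketch below. The 
helper name fast_import_argv and the first_run flag are hypothetical; 
the sketch assumes the marks file from the previous run is passed to 
--import-marks and rewritten via --export-marks, and that the import 
option is left out on the first run when no marks file exists yet.

```python
def fast_import_argv(marks_file, first_run):
    """Build a git fast-import command line that keeps marks across runs."""
    argv = ["git", "fast-import", "--export-marks=" + marks_file]
    if not first_run:
        # Reload the marks saved by the previous session.
        argv.append("--import-marks=" + marks_file)
    return argv
```

The resulting list can be handed to subprocess.Popen with the frontend's 
output connected to its standard input.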

The following changes since commit c390ae97beb9e8cdab159b593ea9659e8096c4db:
  Li Yang (1):
        gitweb: Change to use explicitly function call cgi->escapHTML()

are found in the git repository at:

  git://repo.or.cz:/git/fastimport.git

Shawn O. Pearce (3):
      Preallocate memory earlier in fast-import
      Use atomic updates to the fast-import mark file
      Allow fast-import frontends to reload the marks table

 Documentation/git-fast-import.txt |   13 +++++-
 fast-import.c                     |   85 +++++++++++++++++++++++++++++-------
 t/t9300-fast-import.sh            |    8 ++++
 3 files changed, 88 insertions(+), 18 deletions(-)

-- 
Shawn.


* Re: mercurial to git
  2007-03-07 15:59 ` Shawn O. Pearce
@ 2007-03-08  8:56   ` Rocco Rutte
  0 siblings, 0 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-08  8:56 UTC (permalink / raw)
  To: git

Hi,

* Shawn O. Pearce [07-03-07 10:59:29 -0500] wrote:

>I thought that hg stored file revisions such that each source file
>(e.g. foo.c) had its own revision file (e.g. foo.revdata) and that
>every revision of foo.c was stored in that one file, ordered from
>oldest to newest?  If that is the case why not strip all of those
>into fast-import up front, doing one source file at a time as a
>huge series of blobs and mark them, then do the commit/trees later
>on using only the marks?

>Or am I just missing something about hg?

I don't want to use anything except the Mercurial API, so that in 
theory the importer could work even for remote hg repositories.

But the "blob feed" approach doesn't seem quite right to me, 
especially for incremental imports. There would have to be state files 
and internal tables telling what revisions of what files there are with 
what content. With thousands of files I think it gets quite messy to 
find even the minimum set to start off with for an incremental import. 
Also, you can already specify up to which revision to import, so it 
would get even more complicated.

   bye, Rocco
-- 
:wq!


* Re: mercurial to git
  2007-03-06 21:54 ` Theodore Tso
  2007-03-06 22:47   ` Rocco Rutte
  2007-03-06 23:08   ` Josef Sipek
@ 2007-03-08  9:01   ` Rocco Rutte
  2 siblings, 0 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-08  9:01 UTC (permalink / raw)
  To: git

Hi,

* Theodore Tso [07-03-06 16:54:59 -0500] wrote:

>Hmm.... the way I was planning on handling the performance bottleneck
>was to use "hg manifest --debug <rev>" and diffing the hashes against
>its parents.  Using "hg manifest" only hits .hg/00manifest.[di] and
>.hg/00changelog.[di] files, so it's highly efficient.  With the
>--debug option to hg manifest (not needed on some earlier versions of
>hg, but it seems to be needed on the latest development version of
>hg), it outputs the mode and SHA1 hash of the files, so it becomes
>easy to see which files were changed relative to the revision's
>parent(s).

>Once we know which files we need to feed to git-fast-import, it's just
>a matter of using "hg cat -r <rev> <pathname>" to feed the individual
>changed file to git-fast-import.

I've done that now and the repositories come out as before in about 10 
minutes. Also I sanitized the tags handling and will push out the 
changed version somewhere soon.
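
The manifest comparison described in the quoted text can be sketched roughly as follows. This is only an illustration, not the code in the script: the function name `diff_manifests` and the dict layout (path mapped to a hash/mode pair) are assumptions, and parsing the real "hg manifest --debug" output into such dicts is left out.

```python
def diff_manifests(parent, child):
    """Return (changed_or_added, removed) paths between two manifests.

    Each manifest is a dict mapping path -> (file_hash, mode).
    The layout is hypothetical; it only mirrors the idea of comparing
    per-file hashes of a revision against those of its parent.
    """
    # A path needs to be fed to the importer if it is new in the child
    # or if its hash or mode differs from the parent's entry.
    changed = sorted(
        path for path, entry in child.items()
        if parent.get(path) != entry
    )
    # A path present in the parent but missing in the child was deleted.
    removed = sorted(path for path in parent if path not in child)
    return changed, removed


parent = {"README": ("a1", "644"), "foo.c": ("b2", "644"), "old.c": ("c3", "644")}
child = {"README": ("a1", "644"), "foo.c": ("d4", "644"), "run.sh": ("e5", "755")}
changed, removed = diff_manifests(parent, child)
print(changed)  # ['foo.c', 'run.sh']
print(removed)  # ['old.c']
```

Only the paths in `changed` would then need their contents fetched, and those in `removed` would need a delete; for a merge the same comparison would presumably be run against each parent.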

   bye, Rocco
-- 
:wq!


* Re: mercurial to git
  2007-03-06 21:06 mercurial to git Rocco Rutte
                   ` (2 preceding siblings ...)
  2007-03-07 23:14 ` Shawn O. Pearce
@ 2007-03-08 10:49 ` Rocco Rutte
  3 siblings, 0 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-08 10:49 UTC (permalink / raw)
  To: git

Hi,

* Rocco Rutte [07-03-06 21:06:29 +0000] wrote:

[...]

I've now pushed the changes out to:

   http://repo.or.cz/w/hg2git.git

I don't know what the status and/or future plans are for:

   http://repo.or.cz/w/fast-export.git

...but these two seem worth combining, IMHO.

I haven't followed git development consistently lately, so are there any 
plans of including these or replacing current importers with these?

   bye, Rocco
-- 
:wq!


* Re: mercurial to git
       [not found]       ` <20070314111257.GA4526@peter.daprodeges.fqdn.th-h.de>
@ 2007-03-15  0:25         ` Theodore Tso
  2007-03-15 10:19           ` Rocco Rutte
       [not found]         ` <20070314132951.GE12710@thunk.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Theodore Tso @ 2007-03-15  0:25 UTC (permalink / raw)
  To: Rocco Rutte; +Cc: git

On Wed, Mar 14, 2007 at 11:12:57AM +0000, Rocco Rutte wrote:
> 
> I tried the import on the e2fsprogs repo and the files come out 
> identical, authors/committers look okay to me, too.

Very cool!  It looks like some of the git author dates are only
getting set if the -s flag is set.  Was that intentional?

						- Ted


* Re: mercurial to git
  2007-03-15  0:25         ` Theodore Tso
@ 2007-03-15 10:19           ` Rocco Rutte
  2007-03-15 14:12             ` Theodore Tso
  0 siblings, 1 reply; 19+ messages in thread
From: Rocco Rutte @ 2007-03-15 10:19 UTC (permalink / raw)
  To: git

Hi,

* Theodore Tso [07-03-14 20:25:07 -0400] wrote:
>On Wed, Mar 14, 2007 at 11:12:57AM +0000, Rocco Rutte wrote:

I failed to send a response to the list and it went to Theodore privately 
only, sorry. I merged hg2git into fast-export.git at repo.or.cz and 
named it 'hg-fast-export' to match the other importers there. It 
can now parse Signed-off-by lines and supports author maps (as 
git-cvsimport and git-svnimport do, same syntax).

>> I tried the import on the e2fsprogs repo and the files come out 
>> identical, authors/committers look okay to me, too.

>Very cool!  It looks like some of the git author dates are only
>getting set if the -s flag is set.  Was that intentional?

For which changesets exactly? The script only attempts to write out the 
'author' command if -s (for parsing signed-off-by) is given. But for 
both commands the time information written out is identical and is 
exactly what hg gives us. So the bug must be elsewhere.
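
To make the two commands concrete, here is a minimal sketch of a commit header as fed to git-fast-import, with an optional 'author' line and a mandatory 'committer' line sharing one timestamp. The helper name `format_commit` and its parameters are made up for illustration and are not taken from the script; when 'author' is omitted, git-fast-import uses the committer information for the author as well.

```python
def format_commit(ref, mark, committer, when, message, author=None):
    """Build the text of one fast-import 'commit' command (no file changes).

    'when' is a raw date string: seconds since the epoch plus UTC offset.
    """
    # fast-import expects the exact byte length of the commit message.
    body = message.encode("utf-8")
    lines = ["commit %s" % ref, "mark :%d" % mark]
    if author is not None:
        # Optional: only emitted when an explicit author is known.
        lines.append("author %s %s" % (author, when))
    lines.append("committer %s %s" % (committer, when))
    lines.append("data %d" % len(body))
    return "\n".join(lines) + "\n" + message + "\n"


# 1173272950 -0500 corresponds to Wed Mar 7 08:09:10 2007 -0500.
text = format_commit(
    "refs/heads/master", 1,
    "Theodore Ts'o <tytso@mit.edu>", "1173272950 -0500",
    "example message",
    author="Theodore Ts'o <tytso@mit.edu>",
)
print(text)
```

A timestamp of 0 in either line would show up as Jan 1, 1970, so printing the generated stream is a quick way to see which of the two lines carries a bad date.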

   bye, Rocco
-- 
:wq!


* Re: mercurial to git
  2007-03-15 10:19           ` Rocco Rutte
@ 2007-03-15 14:12             ` Theodore Tso
  2007-03-15 15:19               ` Rocco Rutte
  2007-03-15 15:56               ` Linus Torvalds
  0 siblings, 2 replies; 19+ messages in thread
From: Theodore Tso @ 2007-03-15 14:12 UTC (permalink / raw)
  To: git

On Thu, Mar 15, 2007 at 10:19:13AM +0000, Rocco Rutte wrote:
> I failed to send a response to the list and it went to Theodore privately 
> only, sorry. I merged hg2git into fast-export.git at repo.or.cz and 
> named it 'hg-fast-export' to match the other importers there. It 
> can now parse Signed-off-by lines and supports author maps (as 
> git-cvsimport and git-svnimport do, same syntax).

BTW, there are a number of places where the old name (hg2git) is still
being used for filenames, et al., because $PFX is still being set to
hg2git.

> For which changesets exactly? The script only attempts to write out the 
> 'author' command if -s (for parsing signed-off-by) is given. But for 
> both commands the time information written out is identical and is 
> exactly what hg gives us. So the bug must be elsewhere.

All of them.  :-)

Upon doing more investigation, the failure case seems to be if -A is
specified but NOT -s.  Compare:

(Generated using: hg-fast-export.sh -A ../e2fsprogs.authors -r ../e2fsprogs)
commit b584b9c57ecbbeef91970ca2924d66662029ab29
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Thu Jan 1 00:00:00 1970 +0000  <=============

with

(Generated using: hg-fast-export.sh -s -A ../e2fsprogs.authors -r ../e2fsprogs)
commit 9e9a5867e4d4985bde6d6be072efb96e901e08cc
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Wed Mar 7 08:09:10 2007 -0500  <=============

The date seems to be correctly generated using

	hg-fast-export.sh -s -A ../e2fsprogs.authors -r ../e2fsprogs
	hg-fast-export.sh -s -r ../e2fsprogs
	hg-fast-export.sh -r ../e2fsprogs

It seems to be this combination of options:

	hg-fast-export.sh -A ../e2fsprogs.authors -r ../e2fsprogs

Where all of the dates end up being Jan 1, 1970.

Regards,

						- Ted


* Re: mercurial to git
  2007-03-15 14:12             ` Theodore Tso
@ 2007-03-15 15:19               ` Rocco Rutte
  2007-03-15 15:56               ` Linus Torvalds
  1 sibling, 0 replies; 19+ messages in thread
From: Rocco Rutte @ 2007-03-15 15:19 UTC (permalink / raw)
  To: git

Hi,

* Theodore Tso [07-03-15 10:12:27 -0400] wrote:
>On Thu, Mar 15, 2007 at 10:19:13AM +0000, Rocco Rutte wrote:
>> I failed to send a response to the list and it went to Theodore privately 
>> only, sorry. I merged hg2git into fast-export.git at repo.or.cz and 
>> named it 'hg-fast-export' to match the other importers there. It 
>> can now parse Signed-off-by lines and supports author maps (as 
>> git-cvsimport and git-svnimport do, same syntax).

>BTW, there are a number of places where the old name (hg2git) is still
>being used for filenames, et al., because $PFX is still being set to
>hg2git.

I know and intend to leave it that way as the filenames are shorter.

>The date seems to be correctly generated using

>	hg-fast-export.sh -s -A ../e2fsprogs.authors -r ../e2fsprogs
>	hg-fast-export.sh -s -r ../e2fsprogs
>	hg-fast-export.sh -r ../e2fsprogs

>It seems to be this combination of options:

>	hg-fast-export.sh -A ../e2fsprogs.authors -r ../e2fsprogs

>Where all of the dates end up being Jan 1, 1970.

Hmm. Strange, I cannot reproduce this. Can you mail me your authors file 
along with version information privately please?

   bye, Rocco
-- 
:wq!


* Re: mercurial to git
  2007-03-15 14:12             ` Theodore Tso
  2007-03-15 15:19               ` Rocco Rutte
@ 2007-03-15 15:56               ` Linus Torvalds
  1 sibling, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2007-03-15 15:56 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git



On Thu, 15 Mar 2007, Theodore Tso wrote:
>
> (Generated using: hg-fast-export.sh -A ../e2fsprogs.authors -r ../e2fsprogs)
> commit b584b9c57ecbbeef91970ca2924d66662029ab29
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Thu Jan 1 00:00:00 1970 +0000  <=============

Is the committer date ok (does it even exist)? Use "--pretty=fuller" or 
perhaps even "--pretty=raw" to see both author and committer date.

The normal log only shows author date, since that's usually the one people 
care about (git itself doesn't at all, it uses the committer date-stamp to 
do its "choose most recent path to go down").

		Linus


* Re: mercurial to git
       [not found]           ` <20070315094434.GA4425@peter.daprodeges.fqdn.th-h.de>
@ 2007-03-15 21:04             ` Theodore Tso
  2007-03-15 22:07               ` Rocco Rutte
  2007-03-16  4:53               ` Len Brown
  0 siblings, 2 replies; 19+ messages in thread
From: Theodore Tso @ 2007-03-15 21:04 UTC (permalink / raw)
  To: Rocco Rutte; +Cc: git

Hopefully you won't mind that I'm adding the git list back to the cc
line, since it would be useful for others to provide some feedback.

On Thu, Mar 15, 2007 at 09:44:35AM +0000, Rocco Rutte wrote:
> >So I'll go try it out in the near future.  Are you planning on being
> >able to make it be bi-directional?  (i.e., so that changes in the git
> >tree can get propagated back to the hg tree?)
> 

> But as there's no hg-fast-import, I think git to hg is not so trivial to 
> implement, and convert-repo already exists, so I'd prefer 
> extending it to do the job.

Actually, there *is* an hg-fast-import.  It exists in the hg sources
in contrib/convert-repo, and it is being used in production to do
incremental conversion from the Linux kernel git tree to an hg tree.
So it does handle octopus merges already (it has to, the ACPI folks
are very octopus merge happy :-).

> However, I never even used hg and have only some knowledge about the API 
> so that I see some difficulties and need more time to think about it 
> (e.g. how to detect whether a change in hg originates at git and vice 
> versa, what to do with octopus merges, cherry-picks, etc).

So actually I have thought about this a fair amount, so if you don't
mind my pontificating a bit.   :-)

At the highest architectural viewpoint, there are three levels of
difficulty of SCM conversions:

A) One-way conversion utilities.  Examples of this would be the
	hg2git, hg-fast-export scripts that convert from hg to git,
	and the convert-repo script which will convert from git to hg.

B) Single point bidirectional conversion.  At this level, the hg/git
	gateway will run on a single machine, and with a state file,
	can recognize new git changesets, and create a functionally
	equivalent hg changeset and commit it to the hg repository,
	and can also recognize new hg changesets, and create a
	functionally equivalent git changeset, and commit it to the git
	repository.

C) Multisite bidirectional conversion.  At this level, multiple users
	can be gatewaying between the two DSCM systems, and as long as
	they are using the same configuration parameters (more on this
	in a moment), if user A converts a changeset from hg to git, and
	that changeset is passed along via git to user B, who then,
	running the bidirectional gateway program, converts it back from
	git to hg, the hg changeset is identical so that hg recognizes
	it as the same changeset when it gets propagated back to user A.

(C) would be the ideal, given the distributed nature of hg and git.
It is also the most difficult, because it means that we need to be
able to do a lossless, one-to-one conversion of a Changeset.  It is
also somewhat at odds with doing author mapping and signed-off-by
parsing, since that could prevent a reversible transformation.
However, what may very well be common for projects is for them to
start with (B), and to convert over some of the historical changesets,
and then later on allow multiple users to clone from the two git/hg
repositories and then do the multisite conversion.

So what that also means is that even if we only do (B) at first, it
might be useful if we have some of the characteristics needed to
eventually get to (C), even if we can't get there right away.

So more practically, here are some of the things that we would need to
do, looking at hg-fast-export:

*) Change the index/marks file to map between hg SHA hash ID's instead
of the small integer ordinals.  This is useful for enabling multisite
conversion, but it is also useful for tracking tag changes in .hgtags.

*) Have a mode that, instead of only checking changes newer than the
last run, simply iterates over all changesets in mercurial and checks
to see if the hg SHA1 commit ID is already in the marks file; if so, skip
it.

*) Have a mode where the COMMITTER id is "hg2git" and the COMMITTER_DATE
is the same as the AUTHOR_DATE (so that the changelog conversion is
the same no matter where or who does the conversion).  This is mainly
to enable multisite conversion.
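
The first two items could look roughly like the sketch below. It is only an outline under stated assumptions: `load_marks` reads the ":<mark> <sha1>" lines that git-fast-import writes when exporting marks, while `hg_to_mark` (a side table from hg changeset hash to mark number) and `pending_changesets` are hypothetical names, since fast-import marks themselves have to be integers.

```python
def load_marks(lines):
    """Parse ':<mark> <git-sha1>' lines into a dict of mark -> git SHA1."""
    marks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        mark, sha = line.split()
        marks[int(mark.lstrip(":"))] = sha
    return marks


def pending_changesets(hg_nodes, hg_to_mark, marks):
    """Return the hg changeset IDs that have no git commit yet.

    hg_nodes:   iterable of hg changeset hashes, in the order to import
    hg_to_mark: hypothetical side table, hg hash -> fast-import mark
    marks:      dict as returned by load_marks()
    """
    # A changeset is skipped only if its hg hash is known AND the
    # corresponding mark resolved to a git commit in an earlier run.
    return [node for node in hg_nodes if hg_to_mark.get(node) not in marks]


marks = load_marks([":1 1111111111111111111111111111111111111111\n"])
hg_to_mark = {"aaaa": 1, "bbbb": 2}
print(pending_changesets(["aaaa", "bbbb", "cccc"], hg_to_mark, marks))
# ['bbbb', 'cccc']
```

Because the lookup is keyed on the hg hash rather than on a local revision number, two sites that share the side table would agree on which changesets are already converted.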

						- Ted


* Re: mercurial to git
  2007-03-15 21:04             ` Theodore Tso
@ 2007-03-15 22:07               ` Rocco Rutte
  2007-03-17 11:37                 ` Simon 'corecode' Schubert
  2007-03-16  4:53               ` Len Brown
  1 sibling, 1 reply; 19+ messages in thread
From: Rocco Rutte @ 2007-03-15 22:07 UTC (permalink / raw)
  To: git

Hi,

* Theodore Tso [07-03-15 17:04:07 -0400] wrote:
>Hopefully you won't mind that I'm adding the git list back to the cc
>line, since it would be useful for others to provide some feedback.

Not at all. Just wondering when others would get too bored... :)

>Actually, there *is* an hg-fast-import.  It exists in the hg sources
>in contrib/convert-repo, and it is being used in production to do
>incremental conversion from the Linux kernel git tree to an hg tree.
>So it does handle octopus merges already (it has to, the ACPI folks
>are very octopus merge happy :-).

I know convert-repo and like it as a starting point. But it has some 
problems, like not properly creating hg branches and only importing one 
branch at a time, which must also be checked out on the git side, etc.

With 'hg-fast-import' I meant something like git-fast-import where 
clients can feed in more raw data instead of preparing each commit on 
its own and committing it.

[...]

>So more practically, here are some of the things that we would need to
>do, looking at hg-fast-export:

>*) Change the index/marks file to map between hg SHA hash ID's instead
>of the small integer ordinals.  This is useful for enabling multisite
>conversion, but it is also useful for tracking tag changes in .hgtags.

The small numbers are the hg revision numbers which we'll need for 
git-fast-import. Ideally git-fast-import would allow us to use anything 
for a mark we want as long as it's unique. But I'm sure there's a cheap 
way of mapping revision to SHA1 in the hg API.

So, if anybody wants to join in writing up such a hybrid system, I'm for 
it. :)

   bye, Rocco
-- 


* Re: mercurial to git
  2007-03-15 21:04             ` Theodore Tso
  2007-03-15 22:07               ` Rocco Rutte
@ 2007-03-16  4:53               ` Len Brown
  1 sibling, 0 replies; 19+ messages in thread
From: Len Brown @ 2007-03-16  4:53 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Rocco Rutte, git

On Thursday 15 March 2007 17:04, Theodore Tso wrote:
> So it does handle octopus merges already (it has to, the ACPI folks
> are very octopus merge happy :-).

Well, just to set the record straight...

So yes, I did a 12-way merge in the kernel a long while back on a lark.
I don't generally do them any more in the official kernel tree
because I think they make bisect more complicated than it needs to be.

cheers,
-Len


* Re: mercurial to git
  2007-03-15 22:07               ` Rocco Rutte
@ 2007-03-17 11:37                 ` Simon 'corecode' Schubert
  0 siblings, 0 replies; 19+ messages in thread
From: Simon 'corecode' Schubert @ 2007-03-17 11:37 UTC (permalink / raw)
  To: git


Rocco Rutte wrote:
> With 'hg-fast-import' I meant something like git-fast-import where 
> clients can feed in more raw data instead of preparing each commit on 
> its own and committing it.

I wrote something like that for fromcvs: <http://ww2.fs.ei.tum.de/~corecode/hg/fromcvs?f=7992019d6861;file=tohg.py>

it still communicates blobs via files, because that's what hg wants to see.  performance is quite okay.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \

