git.vger.kernel.org archive mirror
From: Theodore Tso <tytso@mit.edu>
To: git@vger.kernel.org
Subject: Re: mercurial to git
Date: Tue, 6 Mar 2007 16:54:59 -0500
Message-ID: <20070306215459.GI18370@thunk.org>
In-Reply-To: <20070306210629.GA42331@peter.daprodeges.fqdn.th-h.de>

On Tue, Mar 06, 2007 at 09:06:29PM +0000, Rocco Rutte wrote:
> 
> attached are two files of take #1 of writing a hg2git converter/tracker 
> using git-fast-import. It basically works so use at your own risk and 
> send patches... :)

I was actually thinking about doing this too, but apparently you beat
me to it.  :-)

> The performance bottleneck is hg exporting data, as discovered by people 
> on #mercurial, the problem is not really fixable and is due to hg's 
> revlog handling. As a result, I needed to let the script feed the full 
> contents of the repository at each revision we walk (i.e. all for the 
> initial import) into git-fast-import. This is horribly slow. For mutt 
> which contains several tags, a handful of branches and only 5k commits 
> this takes roughly two hours at 1 commit/sec. My earlier version not 
> using 'deleteall' and feeding only files that changed took 15 minutes 
> altogether, git-fast-import from a textfile 1 min 30 sec.

Hmm.... the way I was planning on handling the performance bottleneck
was to use "hg manifest --debug <rev>" and diffing the hashes against
its parents.  Using "hg manifest" only hits .hg/00manifest.[di] and
.hg/00changelog.[di] files, so it's highly efficient.  With the
--debug option to hg manifest (not needed on some earlier versions of
hg, but it seems to be needed on the latest development version of
hg), it outputs the mode and SHA1 hash of the files, so it becomes
easy to see which files were changed relative to the revision's
parent(s).
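
A rough sketch of that comparison in Python, for illustration only.
It assumes each line of "hg manifest --debug" output has the layout
"<hash> <mode> <path>"; the exact layout differs between hg versions,
so the parser would need adjusting.  The helper names are made up
here, and a merge would need the comparison run against each parent.

```python
import subprocess


def run_manifest(rev):
    """Run "hg manifest --debug <rev>" in the current repository."""
    result = subprocess.run(["hg", "manifest", "--debug", rev],
                            check=True, stdout=subprocess.PIPE)
    return result.stdout.decode()


def parse_manifest(text):
    """Map path -> (hash, mode), assuming "<hash> <mode> <path>" lines."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice so that paths containing spaces survive.
        node, mode, path = line.split(None, 2)
        entries[path] = (node, mode)
    return entries


def changed_files(parent, child):
    """Return (modified_or_added, deleted) paths between two manifests."""
    modified = sorted(p for p, v in child.items() if parent.get(p) != v)
    deleted = sorted(p for p in parent if p not in child)
    return modified, deleted
```

Only the manifest and changelog are read here; no file contents are
touched until we know which paths actually differ.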

Once we know which files we need to feed to git-fast-import, it's just
a matter of using "hg cat -r <rev> <pathname>" to feed the individual
changed file to git-fast-import.  For each file, you only have to
touch .hg/data/pathname.[di] files.  So this should allow us to feed
input into git-fast-import without needing to feed the full
contents of the repository for each revision.
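
As a sketch of what that could look like (not a tested converter), the
Python below builds one git-fast-import commit record from a list of
changed and deleted paths.  The function names, the "644"/"755" mode
strings, and the read_file callback are assumptions for illustration;
hg_cat simply wraps "hg cat -r <rev> <pathname>".  Paths are written
unquoted, which assumes they contain no newline and do not start with
a double quote, and merge parents are left out entirely.

```python
import subprocess


def hg_cat(rev, path):
    """Return the contents of `path` at `rev` via "hg cat -r <rev> <path>"."""
    result = subprocess.run(["hg", "cat", "-r", rev, path],
                            check=True, stdout=subprocess.PIPE)
    return result.stdout


def data_block(payload):
    """Format bytes as a fast-import "data <count>" block."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def commit_record(ref, committer, message, modified, deleted, read_file):
    """Build one commit for git-fast-import containing only the changes.

    committer is "Name <email> <epoch> <tz>"; modified is a list of
    (path, mode) pairs; read_file(path) returns the file's bytes.
    """
    out = [b"commit " + ref.encode() + b"\n",
           b"committer " + committer.encode() + b"\n",
           data_block(message.encode())]
    for path, mode in modified:
        git_mode = b"100755" if mode == "755" else b"100644"
        out.append(b"M " + git_mode + b" inline " + path.encode() + b"\n")
        out.append(data_block(read_file(path)))
    for path in deleted:
        out.append(b"D " + path.encode() + b"\n")
    out.append(b"\n")
    return b"".join(out)
```

With a real repository, read_file would be something like
"lambda p: hg_cat(rev, p)", and the returned bytes would be written to
the standard input of a running git-fast-import process.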

The other thing that I've been working on in my design is how to make
the converter bidirectional.  That is, if a changeset is made in the
hg repository, it should be possible to push it over to the git
repository, and vice versa: if there are changes made in the git
repository, it should be possible to push them back to hg.

In order to do this it becomes necessary to special-case the .hgrc
file, and in fact we need to make sure that the .hgrc file does *not*
show up in the git repository; instead, the contents of the .hgrc file
need to be stored in the state file that lives alongside the git and
hg repositories.

Regards,
						- Ted
