From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Martin Langhoff" <martin.langhoff@gmail.com>
Cc: git <git@vger.kernel.org>
Subject: Re: packs and trees
Date: Tue, 20 Jun 2006 10:35:49 -0400
Message-ID: <9e4733910606200735u5741a9adr83264ae7d51dd37@mail.gmail.com>
In-Reply-To: <46a038f90606192313l16b16132r1523f5e05ae1566a@mail.gmail.com>
On 6/20/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On 6/20/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> > The plan is to modify rcs2git from parsecvs to create all of the git
> > objects for the tree.
>
> Sounds like a good plan. Have you seen recent discussions about it
> being impossible to repack usefully when you don't have trees (and
> resulting performance problems on ext3).
No, I will look back in the archives. If needed we can do a repack
after each file is added. I would hope that git can handle a repack
when the new stuff is 100% deltas from a single file.
If I can't pack, the exploded deltas need about 35GB of disk space. That
is an awful lot to feed to pack all at once, but it will have trees,
>
> > cvs2svn seems to do a good job at generating the trees.
>
> No doubt. Gut the last stage, and use all the data in the intermediate
> DBs to run a git import. It's a great plan, and if you can understand
> that Python code... all yours ;-)
How hard would it be to adjust cvsps to use cvs2svn's algorithm for
grouping the changesets? I'd rather do this in a C app but I haven't
figured out the guts of parsecvs or cvsps well enough to change the
algorithms. There is no requirement to use external databases, sorting
everything in RAM is fine.
If you are interested in changing the cvsps grouping algorithm I can
look at modifying it to write out the revisions as they are parsed.
Then you only need to save the git sha1 in memory instead of the
file:rev when sorting.
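
A minimal sketch of that idea, assuming the importer hashes each revision's
text as a git blob the moment it is parsed. The names blob_sha1,
record_revision and revision_ids are made up for illustration and do not come
from cvsps or parsecvs; only the "blob <size>\0" header in front of the
content is git's own object format.

```python
import hashlib


def blob_sha1(content: bytes) -> str:
    """Return the object id git assigns to a blob holding `content`."""
    # git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical in-memory index: (path, rev) -> 40-character hex object id.
# The revision text itself can be written out and dropped right away.
revision_ids = {}


def record_revision(path: str, rev: str, content: bytes) -> str:
    """Hash one parsed revision and remember only its id."""
    sha = blob_sha1(content)
    revision_ids[(path, rev)] = sha
    return sha
```

With something like this, the changeset sort only has to carry a fixed-size
id per revision and never has to go back to the ,v file for the text.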
> > exactly sure how the changeset detection algorithms in the three apps
> > compare, but cvs2svn is not having any trouble building changesets for
> > Mozilla. The other two apps have some issues, cvsps throws away some
> > of the branches and parsecvs can't complete the analysis.
>
> Have you tried a recent parsecvs from Keith's tree? There's been quite
> a bit of activity there too. And Keith's interested in sorting out
> incremental imports too, which you need for a reasonable Moz
> transition plan as well.
Keith's parsecvs run ended up in a loop, and mine hit a parsecvs error
and then had memory corruption after about eight hours. That was last
week; I just checked the logs and I don't see any comments about
fixing it.
Even after spending eight hours building the changeset info it is
still going to take it a couple of days to retrieve the versions one
at a time and write them to git. Reparsing 50MB delta files n^2/2
times is a major bottleneck for all three programs.
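
To make the n^2/2 figure concrete, here is a rough count, assuming that
pulling out the k-th revision (counted from the full-text end of the RCS
file) means replaying k deltas from scratch, and that a single-pass extractor
would apply each delta only once. Both function names are hypothetical.

```python
def deltas_applied_one_at_a_time(n: int) -> int:
    """Total delta applications when each of n revisions is extracted separately."""
    # Revision k costs k delta applications: 1 + 2 + ... + n = n(n+1)/2.
    return sum(range(1, n + 1))


def deltas_applied_single_pass(n: int) -> int:
    """Total delta applications when all n revisions come out of one pass."""
    return n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, deltas_applied_one_at_a_time(n), deltas_applied_single_pass(n))
```

For a file with 1000 revisions that is roughly half a million delta
applications instead of a thousand, before counting the cost of reparsing the
file each time.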
--
Jon Smirl
jonsmirl@gmail.com