From: Keith Packard <keithp@keithp.com>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: keithp@keithp.com, Linus Torvalds <torvalds@osdl.org>,
	git <git@vger.kernel.org>
Subject: Re: Why so much time in the kernel?
Date: Fri, 16 Jun 2006 11:02:05 -0700
Message-ID: <1150480925.6983.15.camel@neko.keithp.com>
In-Reply-To: <9e4733910606161044h736c9675kc91ff77904c5a1d0@mail.gmail.com>

On Fri, 2006-06-16 at 13:44 -0400, Jon Smirl wrote:

> I've been extracting versions from cvs and adding them to git now for
> 2.5 days and the process still isn't finished. It is completely CPU
> bound. It's just a loop of cvs co, add it to git, make tree, commit,
> etc.

Converting all of mozilla using parsecvs (even with the quadratic
algorithm) takes about three hours on annarchy.freedesktop.org (two
dual-core Opterons with 4GB memory), including all conversion to packs.
The pack time is a tiny fraction of that.

> What about the cvs2svn algorithm described in the attachment? A ram
> based version could be faster. Compression could be achieved by
> switching from using the full path to a version to the sha1 for it.

Yes, parsecvs currently keeps everything in memory when doing the tree
conversion, which means it grows to a huge size to compute the full tree
of revisions. Computing git tree objects from the top down, then
computing commit objects from the bottom up should allow us to free most
of that during the full branch history computation process. I'm starting
a rewrite of parsecvs to try this approach and see how well it works.
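
A minimal sketch of the memory point above, in Python rather than
parsecvs's C, and not taken from the parsecvs source. It assumes a
snapshot is held as a nested dict (a hypothetical representation chosen
for illustration) and uses git's standard blob and tree object
encodings. Once a directory has been hashed, only its 20-byte ID is
needed to build the parent tree, so the fuller in-memory structure for
that subtree could in principle be freed. The names object_id, tree_id,
and snapshot are illustrative only.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> bytes:
    """Return the 20-byte SHA-1 git assigns to an object of this kind."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(directory: dict) -> bytes:
    """Hash a snapshot given as {name: file_bytes or nested dict}.

    Each subdirectory is reduced to its 20-byte ID before the parent is
    hashed, so the parent needs only one ID per child.
    """
    entries = []
    for name, node in directory.items():
        raw = name.encode("utf-8")
        if isinstance(node, dict):
            # git orders a directory entry as if its name ended in "/".
            entries.append((raw + b"/", b"40000", raw, tree_id(node)))
        else:
            # Assumption: every file is a regular, non-executable file.
            entries.append((raw, b"100644", raw, object_id(b"blob", node)))
    entries.sort(key=lambda e: e[0])
    body = b"".join(
        mode + b" " + raw + b"\0" + oid for _, mode, raw, oid in entries
    )
    return object_id(b"tree", body)


# Hypothetical two-level snapshot, for illustration only.
snapshot = {
    "README": b"hello world\n",
    "src": {"main.c": b"int main(void) { return 0; }\n"},
}
print(tree_id(snapshot).hex())
```

The sketch only computes IDs; it does not write objects to a repository,
and it says nothing about how commits would then be linked together.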

If you've looked at the parsecvs source code, you'll notice it's a mess
at present; I started by attempting to do pair-wise tree merges in a
mistaken attempt to convert a linear term to log. Hacking that code into
its present form should be viewed more as a demonstration of how the
overall process can work, not as an optimal expression of the algorithm.

-- 
keith.packard@intel.com
