From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Linus Torvalds" <torvalds@osdl.org>
Cc: git <git@vger.kernel.org>
Subject: Re: Why so much time in the kernel?
Date: Fri, 16 Jun 2006 11:25:22 -0400
Message-ID: <9e4733910606160825hb538d6fo4c9f1d7d9768e100@mail.gmail.com>
In-Reply-To: <Pine.LNX.4.64.0606160755170.5498@g5.osdl.org>
On 6/16/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Fri, 16 Jun 2006, Jon Smirl wrote:
> >
> > I am spending over 40% of the time in the kernel. This looks to be
> > caused by forks and starting small tasks; is that the correct
> > interpretation?
>
> Yes. Your kernel profile is all for stuff related to setting up and
> tearing down process space (well, __mutex_lock_slowpath at 1.88% and
> __d_lookup at 1.3% is not, but every single one before that does seem to
> be about fork/exec/exit).
>
> I think it's both the CVS server that continually forks/exits (it doesn't
> actually do an exec at all - it seems to be using fork/exit as a way to
> control its memory usage - knowing that the OS will free all the temporary
> memory on exit - I think the newer CVS development trees don't do this,
> but that also seems to be why they leak memory like mad and eventually run
> out ;).
I am using cvs-1.11.21-3.2.
I can try running their development tree.
>
> AND it's git-cvsimport forking and exec'ing git helper processes.
Is it worthwhile to make a library version of these? Svn has lib
versions and they barely show up in oprofile. cvsimport is only using
4-5 low-level git functions.
>
> So that process overhead is expected.
>
> What I would _not_ have expected is:
>
> > 933646 2.0983 /usr/local/bin/git-read-tree
>
> I don't see why git-read-tree is so hot for you. We should never need to
> read a tree when we're importing something, unless there are tons of
> branches and we switch back and forth between them.
>
> I guess mozilla really does use a fair number of branches?
Is 1,800 a lot?
>
> Martin sent out a patch (that I don't think has been merged yet) to avoid
> the git-read-tree overhead when switching branches. Look for an email with
> a subject like "cvsimport: keep one index per branch during import", I
> suspect that would speed up the git part a lot.
I'll check this out.
> (It will also avoid a few fork/exec's, but you'll still have most of them,
> so I don't think you'll see any really _fundamental_ changes to this, but
> the git-read-tree overhead should be basically gone, and some of the
> libz.so pressure would also be gone with it. It should also avoid
> rewriting the index file, so you'd get lower disk pressure, but it looks
> like none of your problems are really due to IO, so again, that probably
> won't make much of a difference for you).
I have been CPU bound for two days; disk activity is minor.
git-cvsimport is 250MB and I have 2GB of disk cache.
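
A minimal sketch of the "one index per branch" idea mentioned above, assuming it is done by pointing git at a separate index file for each branch through the GIT_INDEX_FILE environment variable. The helper names and the state directory layout are hypothetical, and this is not necessarily how the patch itself works.

```python
import os

def index_path_for(branch, state_dir):
    """Return a per-branch index file path (hypothetical naming scheme)."""
    # Branch names may contain '/', which would create subdirectories.
    safe = branch.replace("/", "_")
    return os.path.join(state_dir, "index-" + safe)

def env_for_branch(branch, state_dir, base_env=None):
    """Copy the environment and select the index file for this branch."""
    env = dict(os.environ if base_env is None else base_env)
    # GIT_INDEX_FILE tells git which index file to read and write.
    env["GIT_INDEX_FILE"] = index_path_for(branch, state_dir)
    return env
```

An importer could pass the returned environment to each helper it spawns (for example via `subprocess.run(..., env=env_for_branch(branch, state_dir))`), so that switching branches only changes which index file is in use rather than rebuilding one shared index with git-read-tree.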
After looking at this process for about a week, it doesn't look like
processing chronologically is the best strategy. cvsps can work out
the changesets quickly, in about 15 minutes. Then it might be better to
walk the CVS files one at a time, generating git IDs for each revision.
Next, use the IDs and changeset info to build the git trees. Finally,
pack everything. This strategy would minimize the work load on the CVS
files (adding up all those deltas to get random revs).

Can git build a repository in this manner? If this is feasible, it may
be possible to do all of this in a single pass over the CVS tree by
modifying cvsps.
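
A rough sketch of the file-at-a-time strategy described above, in Python. It assumes the revision contents and the changeset lists are already available as in-memory dictionaries and lists (hypothetical data shapes; real input would come from the RCS files and from cvsps). The blob ID is the SHA-1 of a "blob <size>" header, a NUL byte, and the content, which is how git names blobs. The tree step only tracks a path-to-blob-ID mapping per changeset; it does not write git tree objects or packs.

```python
import hashlib

def git_blob_id(content):
    """Return the ID git would assign to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def blob_ids_per_file(revisions):
    """Pass 1: visit each file once and hash every revision of it.

    revisions: {path: {rev: content_bytes}}  (hypothetical shape)
    returns:   {(path, rev): blob_id}
    """
    ids = {}
    for path, revs in revisions.items():
        for rev, content in revs.items():
            ids[(path, rev)] = git_blob_id(content)
    return ids

def apply_changesets(changesets, ids):
    """Pass 2: replay changesets in order using only the precomputed IDs.

    changesets: list of lists of (path, rev) pairs  (hypothetical shape)
    returns:    one {path: blob_id} snapshot per changeset
    """
    tree = {}
    snapshots = []
    for changeset in changesets:
        for path, rev in changeset:
            tree[path] = ids[(path, rev)]
        snapshots.append(dict(tree))
    return snapshots
```

The point of the split is that pass 1 touches each CVS file exactly once, while pass 2 never needs file contents at all, only the IDs and the changeset order.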
--
Jon Smirl
jonsmirl@gmail.com