From: Jeff King <peff@peff.net>
To: Henning Moll <newsScott@gmx.de>
Cc: git@vger.kernel.org
Subject: Re: filter-branch performance
Date: Tue, 9 Dec 2014 13:59:34 -0500
Message-ID: <20141209185933.GC31158@peff.net>
In-Reply-To: <548744F1.9000902@gmx.de>

On Tue, Dec 09, 2014 at 07:52:33PM +0100, Henning Moll wrote:

> i am running this command
> 
> git filter-branch --env-filter 'export
> GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
> GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"'
> --prune-empty --tag-name-filter cat -- --all
> 
> in a repository which i copied to /dev/shm before. According to "top", the
> git process only consumes about 5 percent of the CPU. The load is between
> 0.70 and 1.00.
> 
> I assume that there is a lot of process forking going on. Could that be the
> cause?

Yes. filter-branch is a shell script, and it is probably running
multiple git commands per commit it is filtering.

> Any ideas how to further improve?

In your case you are not touching the tree contents at all. Last time I
looked into this, I believe that filter-branch always loaded the index
for each commit, even if no --index-filter is being used. So teaching
filter-branch to optimize this case would be one strategy.

Another is to try using "git fast-export | git fast-import", and munging
the data stream in between. That may be more work, depending on how fancy
you want to get with accurate parsing (look into fast-export's
--no-data, which omits blob data; that should make things faster and
make hacky context-less parsing less likely to cause problems).
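
As a rough illustration of that second approach, here is a minimal sketch
of a stream filter that copies each commit's author identity and date onto
its committer line. The function name copy_author_to_committer is made up
for this example, and the parsing is deliberately the hacky kind: it only
tracks "are we between a 'commit' line and its 'data' line", so a commit
message containing lines that look like headers could still confuse it. It
does not attempt an equivalent of --prune-empty, and tags may need extra
fast-export options.

```shell
# Hypothetical helper: rewrite "committer" lines in a fast-export stream
# so they carry the same name, email and date as the preceding "author" line.
copy_author_to_committer() {
    LC_ALL=C awk '
        # A "commit <ref>" line starts a new commit header.
        /^commit /  { in_header = 1; author = "" }
        # The "data <n>" line introduces the commit message; header is over.
        /^data /    { in_header = 0 }
        # Remember everything after "author " (7 characters).
        in_header && /^author /  { author = substr($0, 8) }
        # Replace the committer line with the remembered author details.
        in_header && /^committer / && author != "" {
            print "committer " author
            next
        }
        { print }
    '
}

# Intended use (assumption: run in a scratch copy of the repository):
#   git fast-export --no-data --all |
#       copy_author_to_committer |
#       git fast-import --force
```

Because --no-data leaves blobs out of the stream and refers to them by
object name, the import has to happen in a repository that already contains
those blobs, such as the one the export came from.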

-Peff


Thread overview: 7+ messages
2014-12-09 18:52 filter-branch performance Henning Moll
2014-12-09 18:59 ` Jeff King [this message]
2014-12-10 14:18   ` Roberto Tyley
2014-12-10 14:37     ` Jeff King
2014-12-10 15:25       ` Roberto Tyley
2014-12-10 16:05     ` Junio C Hamano
2014-12-10 23:44       ` Roberto Tyley
