From: Jonathan Nieder <jrnieder@gmail.com>
To: Ken Brownfield <krb@irridia.com>
Cc: git@vger.kernel.org, David Barr <david.barr@cordelta.com>,
	Thomas Rast <trast@student.ethz.ch>,
	Jakub Narebski <jnareb@gmail.com>
Subject: Re: Performance issue exposed by git-filter-branch
Date: Thu, 16 Dec 2010 21:37:15 -0600
Message-ID: <20101217033715.GA7302@burratino>
In-Reply-To: <20101217032232.GC7003@burratino>

Jonathan Nieder wrote:
> Ken Brownfield wrote:

>> The thread titled "git and larger trees, not so fast?".
>
> Here it is[1].  Sorry to say, the improvements discussed there
> were made right away and indeed had a dramatic effect.

Of course I missed your point. :)

filter-branch --index-filter works roughly like this.  For each
commit:

. find the underlying tree
. read-tree: unpack that tree and all of its subtrees into
the index file.  That is, convert from a recursive structure
   /:
	COPYING
	Documentation/
	INSTALL
	Makefile
	...

   Documentation/:
	CodingGuidelines
	Makefile
	...

into a flat structure

	COPYING
	Documentation/CodingGuidelines
	Documentation/Makefile
	Documentation/RelNotes/1.5.0.txt
	...
. rm: find entries matching certain patterns and remove them
from the index file.  This takes two passes through the index:
first to find matching entries, second to write the result to
disk.
. write-tree: write new tree objects to the object store.  That is,
convert from a flat structure back to a recursive structure.
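
To make the per-commit cost concrete, here is a toy model of those
three steps.  It is only a sketch: trees are assumed to be nested
Python dicts, read_tree/rm/write_tree are hypothetical stand-ins for
the git commands, and the pattern match is a plain glob rather than
git's pathspec rules.

```python
import fnmatch

# Toy model (assumption, not git's on-disk format): a tree is a nested dict
# mapping a name to either blob contents (str) or a subtree (dict).


def read_tree(tree, prefix=""):
    """Flatten a recursive tree into a sorted list of (path, blob) entries."""
    entries = []
    for name, item in tree.items():
        path = prefix + name
        if isinstance(item, dict):
            entries.extend(read_tree(item, path + "/"))
        else:
            entries.append((path, item))
    return sorted(entries)


def rm(entries, pattern):
    """Two passes over the flat index: find matching paths, then keep the rest."""
    doomed = {path for path, _ in entries if fnmatch.fnmatchcase(path, pattern)}
    return [(path, blob) for path, blob in entries if path not in doomed]


def write_tree(entries):
    """Rebuild the recursive structure from the flat list of paths."""
    root = {}
    for path, blob in entries:
        *dirs, name = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = blob
    return root


# One commit's worth of work, using the example tree from the text.
commit_tree = {
    "COPYING": "license",
    "INSTALL": "install notes",
    "Makefile": "top-level rules",
    "Documentation": {
        "CodingGuidelines": "style",
        "Makefile": "doc rules",
        "RelNotes": {"1.5.0.txt": "release notes"},
    },
}
index = read_tree(commit_tree)         # every subtree is unpacked
index = rm(index, "Documentation/*")   # glob match; "*" also matches "/"
new_tree = write_tree(index)           # every tree is rebuilt from the list
```

In this sketch each of the three steps touches every file in the
commit, even though the filter only cares about one directory.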

This is convenient, but it does not sound to me like the most
efficient way to eliminate a few subtrees from each commit.  That is
why I was suggesting a method that avoids unpacking some trees
altogether.
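
The specific method suggested earlier in the thread is not reproduced
here.  As a hypothetical illustration of the general idea only, the
sketch below edits trees directly in a toy content-addressed store
(ObjectStore and drop_path are made-up names, and the ids are SHA-1
hashes of JSON, not git objects).  Only the trees on the path to the
removed entry are read; sibling subtrees are carried over by id.

```python
import hashlib
import json

# Hypothetical toy object store, not git's object format: ids are SHA-1
# hashes of a JSON encoding. A blob is a str; a tree is a dict mapping
# name -> ("blob", id) or ("tree", id).


class ObjectStore:
    def __init__(self):
        self.objects = {}
        self.reads = 0  # how many objects have been unpacked

    def put(self, obj):
        data = json.dumps(obj, sort_keys=True).encode()
        oid = hashlib.sha1(data).hexdigest()
        self.objects[oid] = obj
        return oid

    def get(self, oid):
        self.reads += 1
        return self.objects[oid]


def drop_path(store, tree_id, path):
    """Return the id of a copy of tree_id with `path` removed.

    Only the trees along `path` are read; every sibling subtree is kept
    by id without being unpacked. (Empty trees are not pruned here.)
    """
    head, _, rest = path.partition("/")
    tree = dict(store.get(tree_id))  # copy so the stored tree is untouched
    if head not in tree:
        return tree_id
    if rest:
        kind, child = tree[head]
        if kind != "tree":
            return tree_id
        tree[head] = ("tree", drop_path(store, child, rest))
    else:
        del tree[head]
    return store.put(tree)


store = ObjectStore()
relnotes = store.put({"1.5.0.txt": ("blob", store.put("release notes"))})
docs = store.put({"CodingGuidelines": ("blob", store.put("style")),
                  "RelNotes": ("tree", relnotes)})
src = store.put({"main.c": ("blob", store.put("int main(void);"))})
root = store.put({"COPYING": ("blob", store.put("license")),
                  "Documentation": ("tree", docs),
                  "src": ("tree", src)})

store.reads = 0
new_root = drop_path(store, root, "Documentation")
# Only the root tree was read; "src" and its contents were never unpacked.
```

Flattening this commit would read all four trees; dropping a
top-level directory this way reads one.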

That said, speedups for read-tree, rm, and write-tree would certainly
be nice to have.  One project of interest to some people is to give
the index file a recursive structure, so finding the entries to remove
in the "git rm" example could be faster.
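
A rough sketch of why a recursive index could help, under a toy model
(RecursiveIndex is a hypothetical name, not git's index format):
removing a directory means unlinking one node, so the work depends on
the depth of the path rather than on how many entries live under it,
compared with scanning every entry of a flat list.

```python
# Hypothetical sketch of a recursive index, not git's index format: each
# directory node is a dict mapping a name to a nested node (dict) or a
# blob id (str).


class RecursiveIndex:
    def __init__(self, entries=()):
        self.root = {}
        for path, blob_id in entries:
            self.add(path, blob_id)

    def add(self, path, blob_id):
        *dirs, name = path.split("/")
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = blob_id

    def remove(self, path):
        """Drop `path` and everything under it by unlinking a single node.

        The work done is proportional to the depth of `path`, not to the
        number of entries that disappear with it.
        """
        *dirs, name = path.split("/")
        node = self.root
        for d in dirs:
            node = node.get(d)
            if not isinstance(node, dict):
                return False
        return node.pop(name, None) is not None

    def paths(self):
        """List remaining entries depth-first, each directory's names sorted."""
        out = []

        def walk(node, prefix):
            for name in sorted(node):
                item = node[name]
                if isinstance(item, dict):
                    walk(item, prefix + name + "/")
                else:
                    out.append(prefix + name)

        walk(self.root, "")
        return out


index = RecursiveIndex([
    ("COPYING", "b1"),
    ("Documentation/CodingGuidelines", "b2"),
    ("Documentation/Makefile", "b3"),
    ("Documentation/RelNotes/1.5.0.txt", "b4"),
    ("Makefile", "b5"),
])
index.remove("Documentation")  # one dict deletion removes three entries
```

Pattern-based removal would still need to walk the directories that
can match, but whole subtrees that cannot match are skipped.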

