git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: Strange O(N^3) behavior in "git filter-branch"
Date: Sat, 16 Jul 2011 07:26:43 +0200	[thread overview]
Message-ID: <4E212113.6000906@alum.mit.edu> (raw)
In-Reply-To: <20110715212059.GA2117@sigill.intra.peff.net>

On 07/15/2011 11:20 PM, Jeff King wrote:
> On Fri, Jul 15, 2011 at 11:51:49AM -0700, Junio C Hamano wrote:
>> But I think the replace-object codepath should be optimized to realize
>> there is no funky replacement (which _is_ a rare configuration) going on
>> much early so that it does not incur that much overhead you observed. IOW,
>> I tend to agree with your 3. below.
>> [...]
>>> 3. Optimize the specific case where there is no refs/replace
>>> directory--if this directory is missing, then defer populating the loose
>>> refs cache in the hope that it will never be needed.
> 
> It already tries to do so. It looks like it calls:
> 
>   for_each_replace_ref(...)
> 
> which calls:
> 
>   do_for_each_ref("refs/replace", ...)
> 
> which reads _every_ loose ref, regardless of the prefix we have given
> it. So the optimization should go into the for_each_ref code, which
> should avoid looking at parts of the hierarchy that are just going to be
> culled, no?

The way to implement the deferral of reading refs if for_each_ref() is
called on an empty subtree (alternative 4 or 5 from my earlier email) is
to change get_loose_refs() to take a base argument, have it check
whether the directory corresponding to base is empty, and if so return
NULL without reading anything into the cache.

Currently, the loose ref cache is stored as a single linked list, so
there is no easy way to populate part of it now and part of it later.
So with the current data structure, the loose refs cache is
all-or-nothing.  It would be possible to avoid filling it if there are
not replace references, but if there is even one loose replace reference
then the whole refs tree would have to be crawled.  Implementing this
variation is alternative 4 from the early email.

More flexible would be to change the way the loose ref cache is stored
from a linked list into a tree (probably mirroring the directory tree).
 If this were done, then it would be possible to populate the cache
lazily, only crawling the part of the refs tree that is needed for a
particular call of for_each_ref() and reusing any part of the cache that
is already in memory.  This (alternative "5") is considerably more
involved because the data structure has to be changed, but also
potentially a much bigger win because the presence of a single reference
in refs/replace would not force all of the refs/heads and refs/tags and
... to be read.

> And then we would see immediately that there is no refs/replace at all,
> and quit early. And as a bonus, things like "for_each_tag_ref" would get
> faster in a repository with many branches, too.

Only if the data structure used to hold the loose refs cache is changed...

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2011-07-16  5:31 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-14  7:16 Strange O(N^3) behavior in "git filter-branch" Michael Haggerty
2011-07-14  9:24 ` Michael Haggerty
2011-07-15  9:19   ` Michael Haggerty
2011-07-15 18:51     ` Junio C Hamano
2011-07-15 21:20       ` Jeff King
2011-07-16  5:26         ` Michael Haggerty [this message]
2011-07-17 13:24           ` Drew Northup
2011-07-18  8:59             ` Jakub Narebski
2011-07-18 16:01               ` Drew Northup
2011-08-03 13:33     ` Michael Haggerty
2011-08-03 19:37       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E212113.6000906@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).