All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: Martin Fick <mfick@codeaurora.org>,
	git discussion list <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)
Date: Tue, 22 May 2012 05:59:29 +0200	[thread overview]
Message-ID: <4FBB0F21.5080608@alum.mit.edu> (raw)
In-Reply-To: <20120521235219.GA5589@sigill.intra.peff.net>

On 05/22/2012 01:52 AM, Jeff King wrote:
> On Mon, May 21, 2012 at 06:14:17PM -0400, Jeff King wrote:
>
>> The rails and git cases run in ~28s and ~37s, respectively, again mostly
>> going to the actual object transfer. So I think this series removes all
>> of the asymptotically bad behavior from this code path.
>>
>> One thing to note about all of these repos is that they tend to have
>> several refs pointing to a single commit. None of the speedups in this
>> series depends on that fact, but it may be that on a repo with more
>> independent refs, we may uncover other code paths (e.g., I know that my
>> fix for mark_complete in ea5f220 improves the case with duplicate refs,
>> but would not help if you really have 400K refs pointing to unique
>> commits[1]).
>
> Hmm. So I started to do some experimentation with this and noticed
> something odd.
>
> Try doing "git fetch . refs/*:refs/*" in a repository with a large
> number of refs (e.g., 400K). With git v1.7.10, this takes about 9.5s on
> my machine. But using the version of git in "next" takes about 16.5s.
> Bisection points to your 432ad41 (refs: store references hierarchically,
> 2012-04-10). Perf shows sort_ref_dir and msort_with_tmp as hot spots.

I'm looking into this.

For your test, were the 400k references all in one "directory", or were 
they sharded somehow?

A large number of packed references in a single namespace is the 
worst-case scenario for the hierarchical refs change.  Since the refs in 
a namespace are all stored in a single array, there is no benefit from 
the hierarchical storage whereas there is some extra cost from 
traversing the namespace tree for each lookup.  But the performance hit 
shouldn't be this large.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  parent reply	other threads:[~2012-05-22  3:59 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-21  8:13 remove_duplicates() in builtin/fetch-pack.c is O(N^2) Michael Haggerty
2012-05-21  9:09 ` Junio C Hamano
2012-05-21  9:42 ` demerphq
2012-05-21 17:45 ` Jeff King
2012-05-21 22:14   ` Jeff King
2012-05-21 22:17     ` [PATCH 1/5] fetch-pack: sort incoming heads Jeff King
2012-05-22 20:08       ` Junio C Hamano
2012-05-22 20:23         ` Jeff King
2012-05-24  6:04           ` Jeff King
2012-05-21 22:17     ` [PATCH 2/5] fetch-pack: avoid quadratic behavior in remove_duplicates Jeff King
2012-05-21 22:19     ` [PATCH 3/5] add sorting infrastructure for list refs Jeff King
2012-05-21 22:19     ` [PATCH 4/5] fetch-pack: sort the list of incoming refs Jeff King
2012-05-21 22:23     ` [PATCH 5/5] fetch-pack: avoid quadratic loop in filter_refs Jeff King
2012-05-22 20:16       ` Junio C Hamano
2012-05-21 23:52     ` remove_duplicates() in builtin/fetch-pack.c is O(N^2) Jeff King
2012-05-22  0:07       ` Jeff King
2012-05-22  3:59       ` Michael Haggerty [this message]
2012-05-22  4:11         ` Jeff King
2012-05-22  7:15           ` Michael Haggerty
2012-05-22  7:37             ` Jeff King
2012-05-22 13:28               ` Michael Haggerty
2012-05-22 17:33                 ` Jeff King
2012-05-24 12:05                   ` Michael Haggerty
2012-05-25  0:17     ` Martin Fick
2012-05-25  0:39       ` Jeff King
2012-05-25  0:54         ` Martin Fick
2012-05-25  1:04           ` Jeff King
2012-05-25  1:32             ` Martin Fick
2012-05-25  6:50               ` Jeff King
2012-05-22 12:18   ` Nguyen Thai Ngoc Duy
2012-05-22 13:35     ` Michael Haggerty
2012-05-22 17:01     ` Jeff King
2012-05-22 17:35       ` Junio C Hamano
2012-05-22 17:46         ` Jeff King
2012-05-24  4:54         ` Michael Haggerty
2012-05-23  1:20       ` Nguyen Thai Ngoc Duy
2012-05-22 16:16   ` Junio C Hamano
2012-05-21 18:15 ` Martin Fick
2012-05-21 19:41   ` Jeff King
2012-05-21 22:08     ` Junio C Hamano
2012-05-21 22:24       ` Jeff King
2012-05-22  5:51     ` Martin Fick
2012-05-22 18:21       ` Jeff King
2012-05-22 22:19         ` Martin Fick
2012-05-22 23:23           ` Junio C Hamano
2012-05-23  0:46             ` Martin Fick

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FBB0F21.5080608@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mfick@codeaurora.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.