All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: Martin Fick <mfick@codeaurora.org>,
	git discussion list <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)
Date: Tue, 22 May 2012 09:15:55 +0200	[thread overview]
Message-ID: <4FBB3D2B.4010300@alum.mit.edu> (raw)
In-Reply-To: <20120522041123.GA9972@sigill.intra.peff.net>

On 05/22/2012 06:11 AM, Jeff King wrote:
> On Tue, May 22, 2012 at 05:59:29AM +0200, Michael Haggerty wrote:
>
>>> Try doing "git fetch . refs/*:refs/*" in a repository with a large
>>> number of refs (e.g., 400K). With git v1.7.10, this takes about 9.5s on
>>> my machine. But using the version of git in "next" takes about 16.5s.
>>> Bisection points to your 432ad41 (refs: store references hierarchically,
>>> 2012-04-10). Perf shows sort_ref_dir and msort_with_tmp as hot spots.
>>
>> I'm looking into this.
>>
>> For your test, were the 400k references all in one "directory", or
>> were they sharded somehow?
>
> Sharded. This was the rails network repo, so it basically looks like
> this:
>
>    refs/remotes/XXXXXXX/heads/master
>    refs/remotes/XXXXXXX/tags/v1.0
>    refs/remotes/XXXXXXX/tags/v1.1
>    ...
>    refs/remotes/XXXXXXX/tags/v3.5
>    refs/remotes/YYYYYYY/heads/master
>    refs/remotes/YYYYYYY/tags/v1.0
>    refs/remotes/YYYYYYY/tags/v1.1
>    ...
>
> Basically the same 90 or so refs you would have in a normal repository
> (a branch or two, and a bunch of tags), repeated over and over under
> numbered remote hierarchies (i.e., each remote is basically a copy of
> the refs from somebody's fork of the repo).
>
> I can provide you with the exact repo if you want, but it is about 100M.
>
> Interestingly, I don't seem to get nearly as much slow-down in my "fake"
> many-refs repo, which has all 100K entries directly in refs/heads.

That could explain why I cannot reproduce your result.  I have done all 
of my testing in fake repos (with sharded and non-sharded variants) with 
very little file content.

If it is not too much trouble, please let me know where I can obtain 
your test repo and what commands you used to get your result.  (Was the 
local repo already a full clone of the remote, including all 400k 
references?  How was the remote set up--sharing data or not, local or 
remote?  Warm or cold disk cache?)

Thanks,
Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2012-05-22  7:23 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-21  8:13 remove_duplicates() in builtin/fetch-pack.c is O(N^2) Michael Haggerty
2012-05-21  9:09 ` Junio C Hamano
2012-05-21  9:42 ` demerphq
2012-05-21 17:45 ` Jeff King
2012-05-21 22:14   ` Jeff King
2012-05-21 22:17     ` [PATCH 1/5] fetch-pack: sort incoming heads Jeff King
2012-05-22 20:08       ` Junio C Hamano
2012-05-22 20:23         ` Jeff King
2012-05-24  6:04           ` Jeff King
2012-05-21 22:17     ` [PATCH 2/5] fetch-pack: avoid quadratic behavior in remove_duplicates Jeff King
2012-05-21 22:19     ` [PATCH 3/5] add sorting infrastructure for list refs Jeff King
2012-05-21 22:19     ` [PATCH 4/5] fetch-pack: sort the list of incoming refs Jeff King
2012-05-21 22:23     ` [PATCH 5/5] fetch-pack: avoid quadratic loop in filter_refs Jeff King
2012-05-22 20:16       ` Junio C Hamano
2012-05-21 23:52     ` remove_duplicates() in builtin/fetch-pack.c is O(N^2) Jeff King
2012-05-22  0:07       ` Jeff King
2012-05-22  3:59       ` Michael Haggerty
2012-05-22  4:11         ` Jeff King
2012-05-22  7:15           ` Michael Haggerty [this message]
2012-05-22  7:37             ` Jeff King
2012-05-22 13:28               ` Michael Haggerty
2012-05-22 17:33                 ` Jeff King
2012-05-24 12:05                   ` Michael Haggerty
2012-05-25  0:17     ` Martin Fick
2012-05-25  0:39       ` Jeff King
2012-05-25  0:54         ` Martin Fick
2012-05-25  1:04           ` Jeff King
2012-05-25  1:32             ` Martin Fick
2012-05-25  6:50               ` Jeff King
2012-05-22 12:18   ` Nguyen Thai Ngoc Duy
2012-05-22 13:35     ` Michael Haggerty
2012-05-22 17:01     ` Jeff King
2012-05-22 17:35       ` Junio C Hamano
2012-05-22 17:46         ` Jeff King
2012-05-24  4:54         ` Michael Haggerty
2012-05-23  1:20       ` Nguyen Thai Ngoc Duy
2012-05-22 16:16   ` Junio C Hamano
2012-05-21 18:15 ` Martin Fick
2012-05-21 19:41   ` Jeff King
2012-05-21 22:08     ` Junio C Hamano
2012-05-21 22:24       ` Jeff King
2012-05-22  5:51     ` Martin Fick
2012-05-22 18:21       ` Jeff King
2012-05-22 22:19         ` Martin Fick
2012-05-22 23:23           ` Junio C Hamano
2012-05-23  0:46             ` Martin Fick

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FBB3D2B.4010300@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mfick@codeaurora.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.