git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Martin Fick <mfick@codeaurora.org>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	git discussion list <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)
Date: Mon, 21 May 2012 18:14:17 -0400	[thread overview]
Message-ID: <20120521221417.GA22664@sigill.intra.peff.net> (raw)
In-Reply-To: <20120521174525.GA22643@sigill.intra.peff.net>

On Mon, May 21, 2012 at 01:45:25PM -0400, Jeff King wrote:

> > I don't plan to work on this, but I thought I would point it out in
> > case it is causing somebody pain.
> 
> I'll clean up the patch and make one for the filter_refs case, too.

Here are the patches. I tested these with three repositories:

  - the rails/rails network repo, with ~400K refs

  - the git/git network repo, with ~220 refs

  - a fake repo made from git.git, with 3 numbered refs pointing at each
    commit for a total of ~100K refs (I didn't have a real dataset that
    was ~100K).

Before these patches, I never managed to complete a "git clone --mirror
file://$repo" on the first two, as I got impatient after 5-10 minutes
and killed the process. The third one completed in ~110s.

  [1/5]: fetch-pack: sort incoming heads
  [2/5]: fetch-pack: avoid quadratic behavior in remove_duplicates

After this one, the "fake" case was down to ~63s.

  [3/5]: add sorting infrastructure for list refs
  [4/5]: fetch-pack: sort the list of incoming refs
  [5/5]: fetch-pack: avoid quadratic loop in filter_refs

And after this one, the "fake" case was down to ~32s. Notably, most of
the time was spent on the actual object transfer (i.e., before it would
hang before even getting to "Counting objects...", and now it starts
that almost immediately). Perf corroborates this by showing most of the
time in zlib inflate and delta resolution.

The rails and git cases run in ~28s and ~37s, respectively, again mostly
going to the actual object transfer. So I think this series removes all
of the asymptotically bad behavior from this code path.

One thing to note about all of these repos is that they tend to have
several refs pointing to a single commit. None of the speedups in this
series depends on that fact, but it may be that on a repo with more
independent refs, we may uncover other code paths (e.g., I know that my
fix for mark_complete in ea5f220 improves the case with duplicate refs,
but would not help if you really have 400K refs pointing to unique
commits[1]).

Martin, let me know if this improves things for your many-ref cases (and
if you are still seeing slowness in your repos with many refs, let me
know which operations cause it).

-Peff

[1] At the time of that commit, I proposed a fix with a priority queue
    replacing the commit_list, but we deemed it too much code for
    a case that was unlikely to happen. But now it sounds like that case
    is less unlikely than we thought, and now that we have René's
    linked-list merge-sort code, I think we could fix it with a 2-line
    change. I'll look into it.

  reply	other threads:[~2012-05-21 22:14 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-21  8:13 remove_duplicates() in builtin/fetch-pack.c is O(N^2) Michael Haggerty
2012-05-21  9:09 ` Junio C Hamano
2012-05-21  9:42 ` demerphq
2012-05-21 17:45 ` Jeff King
2012-05-21 22:14   ` Jeff King [this message]
2012-05-21 22:17     ` [PATCH 1/5] fetch-pack: sort incoming heads Jeff King
2012-05-22 20:08       ` Junio C Hamano
2012-05-22 20:23         ` Jeff King
2012-05-24  6:04           ` Jeff King
2012-05-21 22:17     ` [PATCH 2/5] fetch-pack: avoid quadratic behavior in remove_duplicates Jeff King
2012-05-21 22:19     ` [PATCH 3/5] add sorting infrastructure for list refs Jeff King
2012-05-21 22:19     ` [PATCH 4/5] fetch-pack: sort the list of incoming refs Jeff King
2012-05-21 22:23     ` [PATCH 5/5] fetch-pack: avoid quadratic loop in filter_refs Jeff King
2012-05-22 20:16       ` Junio C Hamano
2012-05-21 23:52     ` remove_duplicates() in builtin/fetch-pack.c is O(N^2) Jeff King
2012-05-22  0:07       ` Jeff King
2012-05-22  3:59       ` Michael Haggerty
2012-05-22  4:11         ` Jeff King
2012-05-22  7:15           ` Michael Haggerty
2012-05-22  7:37             ` Jeff King
2012-05-22 13:28               ` Michael Haggerty
2012-05-22 17:33                 ` Jeff King
2012-05-24 12:05                   ` Michael Haggerty
2012-05-25  0:17     ` Martin Fick
2012-05-25  0:39       ` Jeff King
2012-05-25  0:54         ` Martin Fick
2012-05-25  1:04           ` Jeff King
2012-05-25  1:32             ` Martin Fick
2012-05-25  6:50               ` Jeff King
2012-05-22 12:18   ` Nguyen Thai Ngoc Duy
2012-05-22 13:35     ` Michael Haggerty
2012-05-22 17:01     ` Jeff King
2012-05-22 17:35       ` Junio C Hamano
2012-05-22 17:46         ` Jeff King
2012-05-24  4:54         ` Michael Haggerty
2012-05-23  1:20       ` Nguyen Thai Ngoc Duy
2012-05-22 16:16   ` Junio C Hamano
2012-05-21 18:15 ` Martin Fick
2012-05-21 19:41   ` Jeff King
2012-05-21 22:08     ` Junio C Hamano
2012-05-21 22:24       ` Jeff King
2012-05-22  5:51     ` Martin Fick
2012-05-22 18:21       ` Jeff King
2012-05-22 22:19         ` Martin Fick
2012-05-22 23:23           ` Junio C Hamano
2012-05-23  0:46             ` Martin Fick

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120521221417.GA22664@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mfick@codeaurora.org \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).