From: Eric Sunshine <sunshine@sunshineco.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
Martin Fick <mfick@codeaurora.org>,
Git List <git@vger.kernel.org>
Subject: Re: [PATCH 3/3] fetch-pack: avoid quadratic behavior in rev_list_push
Date: Tue, 2 Jul 2013 03:52:04 -0400 [thread overview]
Message-ID: <CAPig+cRYdinZVA0+nU6tU_TcusFu6Pe0qaQ5Bv4Xbmg1izDbLQ@mail.gmail.com> (raw)
In-Reply-To: <20130702062420.GC27856@sigill.intra.peff.net>
On Tue, Jul 2, 2013 at 2:24 AM, Jeff King <peff@peff.net> wrote:
> When we call find_common to start finding common ancestors
> with the remote side of a fetch, the first thing we do is
> insert the tip of each ref into our rev_list linked list. We
> keep the list sorted the whole time with
> commit_list_insert_by_date, which means our insertion ends
> up doing O(n^2) timestamp comparisons.
>
> We could teach rev_list_push to use an unsorted list, and
> then sort it once after we have added each ref. However, in
> get_rev, we process the list by popping commits off the
> front and adding parents back in timestamp-sorted order. So
> that procedure would still operate on the large list.
>
> Instead, we can replace the linked list with a heap-based
> priority queue, which can do O(log n) insertion, making the
> whole insertion procedure O(n log n).
>
> As a result of switching to the prio_queue struct, we fix
> two minor bugs:
>
> 1. When we "pop" a commit in get_rev, and when we clear
> the rev_list in find_common, we do not take care to
> free the "struct commit_list", and just leak its
> memory. With the prio_queue implementation, the memory
> management is handled for us.
>
> 2. In get_rev, we look at the head commit of the list,
> possibly push its parents onto the list, and then "pop"
> the front of the list off, assuming it is the same
> element that we just peeked at. This is typically going
> to be the case, but would not be in the face of clock
> skew: the parents are inserted by date, and could
> potentailly be inserted at the head of the list if they
s/potentailly/potentially/
> have a timestamp newer than their descendent. In this
> case, we would accidentally pop the parent, and never
> process it at all.
>
> The new implementation pulls the commit off of the
> queue as we examine it, and so does not suffer from
> this problem.
>
> With this patch, a fetch of a single commit into a
> repository with 50,000 refs went from:
>
> real 0m7.984s
> user 0m7.852s
> sys 0m0.120s
>
> to:
>
> real 0m2.017s
> user 0m1.884s
> sys 0m0.124s
>
> Before this patch, a larger case with 370K refs still had
> not completed after tens of minutes; with this patch, it
> completes in about 12 seconds.
>
> Signed-off-by: Jeff King <peff@peff.net>
next prev parent reply other threads:[~2013-07-02 7:52 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-02 3:02 How to still kill git fetch with too many refs Martin Fick
2013-07-02 4:07 ` Jeff King
2013-07-02 4:41 ` Jeff King
2013-07-02 5:01 ` Jeff King
2013-07-02 5:19 ` Junio C Hamano
2013-07-02 5:28 ` Jeff King
2013-07-02 6:11 ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Jeff King
2013-07-02 6:16 ` [PATCH 1/3] fetch-pack: avoid quadratic list insertion in mark_complete Jeff King
2013-07-02 6:21 ` [PATCH 2/3] commit.c: make compare_commits_by_commit_date global Jeff King
2013-07-02 6:24 ` [PATCH 3/3] fetch-pack: avoid quadratic behavior in rev_list_push Jeff King
2013-07-02 7:52 ` Eric Sunshine [this message]
2013-07-02 17:45 ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Martin Fick
2013-07-02 17:52 ` How to still kill git fetch with too many refs Brandon Casey
2013-07-02 9:24 ` Michael Haggerty
2013-07-02 16:58 ` Martin Fick
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAPig+cRYdinZVA0+nU6tU_TcusFu6Pe0qaQ5Bv4Xbmg1izDbLQ@mail.gmail.com \
--to=sunshine@sunshineco.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mfick@codeaurora.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).