From: Andreas Ericsson <ae@op5.se>
To: Shawn Pearce <spearce@spearce.org>
Cc: A Large Angry SCM <gitzilla@gmail.com>,
Sverre Rabbelier <srabbelier@gmail.com>,
NAKAMURA Takumi <geek4civic@gmail.com>, git <git@vger.kernel.org>
Subject: Re: Git is not scalable with too many refs/*
Date: Mon, 13 Jun 2011 09:08:09 +0200
Message-ID: <4DF5B759.8090401@op5.se>
In-Reply-To: <BANLkTi=4zfO5jKKzbncJk7ihcoHX7Rst4Q@mail.gmail.com>
On 06/10/2011 09:41 PM, Shawn Pearce wrote:
> On Fri, Jun 10, 2011 at 00:41, Andreas Ericsson<ae@op5.se> wrote:
>> On 06/09/2011 05:56 PM, Shawn Pearce wrote:
>>>
>>> A lot of operations toss every commit that a reference points at into
>>> the revision walker's LRU queue. If you have a tag pointing to every
>>> commit, then the entire project history enters the LRU queue at once,
>>> up front. That queue is managed with O(N^2) insertion time. And the
>>> entire queue has to be filled before anything can be output.
>>
>> Hmm. Since we're using pre-hashed data with an obvious lookup method
>> we should be able to do much, much better than O(n^2) for insertion
>> and better than O(n) for worst-case lookups. I'm thinking a 1-byte
>> trie, resulting in a depth, lookup and insertion complexity of 20. It
>> would waste some memory but it might be worth it for fixed asymptotic
>> complexity for both insertion and lookup.
>
> Not really.
>
> The queue isn't sorting by SHA-1. It's sorting by commit timestamp,
> descending. Those aren't pre-hashed. The O(N^2) insertion is because
> the code is trying to find where this commit belongs in the list of
> commits as sorted by commit timestamp.
>
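The minimal Python sketch below makes Shawn's point concrete. It is not git's actual code, which is written in C. `Node`, `DateQueue` and the comparison counter are hypothetical names chosen for illustration. The list is kept sorted by date, newest first, and every insert scans from the head for its slot. When commits arrive newest first, each one belongs at the tail, so N inserts cost N(N-1)/2 comparisons.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a commit; only its timestamp matters here."""
    date: int
    next: Optional["Node"] = None


class DateQueue:
    """Singly linked list kept sorted by date, newest first.

    Every insert scans from the head to find its slot, so one insert is
    O(position) and N inserts are O(N^2) in the worst case.
    """

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.comparisons = 0  # date comparisons performed so far

    def insert(self, date: int) -> None:
        new = Node(date)
        prev, cur = None, self.head
        # Skip every entry that is at least as new as `date`.
        while cur is not None:
            self.comparisons += 1
            if cur.date < date:
                break
            prev, cur = cur, cur.next
        new.next = cur
        if prev is None:
            self.head = new
        else:
            prev.next = new

    def dates(self) -> List[int]:
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.date)
            cur = cur.next
        return out


# Worst case: commits arrive newest first, so each belongs at the tail
# and every insert walks the entire list.
n = 1000
q = DateQueue()
for d in range(n, 0, -1):
    q.insert(d)
print(q.comparisons)  # n * (n - 1) / 2 = 499500
```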
Hmm. We should still be able to do better than that, particularly
for the "tag-each-commit" workflow. Since those tags are most likely
generated using incrementing numbers, we could have a cut-off where
we first parse all the refs and make the optimistic assumption that an
alphabetical sort of the refs provides a map of insertion points for
the commits. Since the best-case behaviour of insertion sort is still
O(1) per insertion, and it's unlikely that thousands of refs are in random
order, that should cause the vast majority of the refs we insert to follow
the best-case scenario.
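The idea above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not a proposed patch. `HintedDateQueue`, the "hint" (the node inserted most recently) and the `refs/tags/rNNNN` names are all hypothetical. The sketch also assumes tag names are zero-padded, so that an alphabetical sort matches numeric order, and that tag number tracks commit date. When the guess is right, each insert lands right after the hint in O(1). When it is wrong, the insert falls back to a scan from the head, so the worst case is unchanged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    date: int
    next: Optional["Node"] = None


class HintedDateQueue:
    """Date-sorted list (newest first) that resumes scanning from the node
    inserted last, when that is safe; otherwise it scans from the head."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.hint: Optional[Node] = None  # node inserted most recently
        self.comparisons = 0

    def insert(self, date: int) -> None:
        new = Node(date)
        # The list is sorted descending, so every node up to and including
        # the hint has a date >= hint.date. If hint.date >= date, a scan
        # from the head would skip all of them, so start after the hint.
        start_at_hint = False
        if self.hint is not None:
            self.comparisons += 1
            start_at_hint = self.hint.date >= date
        if start_at_hint:
            prev, cur = self.hint, self.hint.next
        else:
            prev, cur = None, self.head
        while cur is not None:
            self.comparisons += 1
            if cur.date < date:
                break
            prev, cur = cur, cur.next
        new.next = cur
        if prev is None:
            self.head = new
        else:
            prev.next = new
        self.hint = new

    def dates(self) -> List[int]:
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.date)
            cur = cur.next
        return out


# Hypothetical tag-per-commit refs: zero-padded names, so alphabetical
# order matches numeric order, and tag number tracks commit date.
refs = {f"refs/tags/r{i:04d}": 1_300_000_000 + i for i in range(1, 1001)}
q = HintedDateQueue()
for name in sorted(refs, reverse=True):  # newest tag first
    q.insert(refs[name])
print(q.comparisons)  # 999: one hint check per insert after the first
```

Refs named by hashes carry no date information in their sort order. With such refs the hint check fails often, and inserts drift back toward full scans from the head.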

This will fall on its arse when people start doing hg-ref -> git-commit
tags, of course, but that doesn't seem to be happening, or at least not to
the same extent as with svn-revisions -> git-commit mapping.

We're still not improving the asymptotic complexity, but it's a pretty
safe bet that, for the vast majority of cases, we improve wallclock runtime
by a hefty amount with relatively minor effort.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.