git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin Fick <mfick@codeaurora.org>
To: Julian Phillips <julian@quantumfyre.co.uk>
Cc: Christian Couder <christian.couder@gmail.com>,
	git@vger.kernel.org, Christian Couder <chriscool@tuxfamily.org>,
	Thomas Rast <trast@student.ethz.ch>
Subject: Re: Git is not scalable with too many refs/*
Date: Thu, 29 Sep 2011 10:38:44 -0600	[thread overview]
Message-ID: <201109291038.45290.mfick@codeaurora.org> (raw)
In-Reply-To: <7c0105c6cca7dd0aa336522f90617fe4@quantumfyre.co.uk>

On Wednesday, September 28, 2011 08:19:16 pm Julian Phillips 
wrote:
> On Wed, 28 Sep 2011 19:37:18 -0600, Martin Fick wrote:
> > On Wednesday 28 September 2011 18:59:09 Martin Fick 
wrote:
> >> Julian Phillips <julian@quantumfyre.co.uk> wrote:
> -- snip --
> 
> >> I've created a test repo with ~100k refs/changes/...
> >> style refs, and ~40000 refs/heads/... style refs, and
> >> checkout can walk the list of ~140k refs seven times
> >> in 85ms user time including doing whatever other
> >> processing is needed for checkout. The real time is
> >> only 114ms - but then my test repo has no real data
> >> in.
> > 
> > If I understand what you are saying, it sounds like you
> > do not have a very good test case. The amount of time
> > it takes for checkout depends on how long it takes to
> > find a ref with the sha1 that you are on. If that sha1
> > is so early in the list of refs that it only took you
> > 7 traversals to find it, then that is not a very good
> > testcase. I think that you should probably try making
> > an orphaned ref (checkout a detached head, commit to
> > it), that is probably the worst testcase since it
> > should then have to search all 140K refs to eventually
> > give up.
> > 
> > Again, if I understand what you are saying, if it took
> > 85ms for 7 traversals, then it takes approximately
> > 10ms per traversal, that's only 100/s! If you have to
> > traverse it 140K times, that should work out to 1400s
> > ~ 23mins.
> 
> Well, it's no more than 10ms per traversal - since the
> rest of the work presumably takes some time too ...
> 
> However, I had forgotten to make the orphaned commit as
> you suggest - and then _bang_ 7N^2, it tries seven
> different variants of each ref (which is silly as they
> are all fully qualified), and with packed refs it has to
> search for them each time, all to turn names into hashes
> that we already know to start with.
> 
> So, yes - it is that list traversal.
> 
> Does the following help?
> 
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 5e356a6..f0f4ca1 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -605,7 +605,7 @@ static int
> add_one_ref_to_rev_list_arg(const char *refname,
>                                         int flags,
>                                         void *cb_data)
>   {
> -       add_one_rev_list_arg(cb_data, refname);
> +       add_one_rev_list_arg(cb_data,
> strdup(sha1_to_hex(sha1))); return 0;
>   }


Yes, but in some strange ways. :)

First, let me clarify that all the tests here involve your 
"sort fix" from 2 days ago applied first.

In the packed ref repo, it brings the time down to about 
~10s (from > 5 mins).  In the unpacked ref repo, it brings 
it down to about the same thing ~10s, but it was only 
starting at about ~20s.

So, I have to ask, what does that change do, I don't quite 
understand it?  Does it just do only one lookup per ref by 
normalizing it?  Is the list still being traversed, just 
about 7 time less now?  Should the packed_ref list simply be 
put in an array which could be binary searched instead, it 
is a fixed list once loaded, no?  

I prototyped a packed_ref implementation using the hash.c 
provided in the git sources and it seemed to speed a 
checkout up to almost instantaneous, but I was getting a few 
collisions so the implementation was not good enough.  That 
is when I started to wonder if an array wouldn't be better 
in this case?



Now I also decided to go back and test a noop fetch (a 
refetch) of all the changes (since this use case is still 
taking way longer than I think it should, even with the 
submodule fix posted earlier).  Up until this point, even 
the sorting fix did not help.  So I tried it with this fix.  
In the unpackref case, it did not seem to change (2~4mins).  
However, in the packed ref change (which was previously also 
about 2-4mins), this now only takes about 10-15s!

Any clues as to why the unpacked refs would still be so slow 
on noop fetches and not be sped up by this?


-Martin


-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum

  reply	other threads:[~2011-09-29 16:38 UTC|newest]

Thread overview: 126+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-09  3:44 Git is not scalable with too many refs/* NAKAMURA Takumi
2011-06-09  6:50 ` Sverre Rabbelier
2011-06-09 15:23   ` Shawn Pearce
2011-06-09 15:52     ` A Large Angry SCM
2011-06-09 15:56       ` Shawn Pearce
2011-06-09 16:26         ` Jeff King
2011-06-10  3:59           ` NAKAMURA Takumi
2011-06-13 22:27             ` Jeff King
2011-06-14  0:17             ` Andreas Ericsson
2011-06-14  0:30               ` Jeff King
2011-06-14  4:41                 ` Junio C Hamano
2011-06-14  7:26                   ` Sverre Rabbelier
2011-06-14 10:02                     ` Johan Herland
2011-06-14 10:34                       ` Sverre Rabbelier
2011-06-14 17:02                       ` Jeff King
2011-06-14 19:20                         ` Shawn Pearce
2011-06-14 19:47                           ` Jeff King
2011-06-14 20:12                             ` Shawn Pearce
2011-09-08 19:53                               ` Martin Fick
2011-09-09  0:52                                 ` Martin Fick
2011-09-09  1:05                                   ` Thomas Rast
2011-09-09  1:13                                     ` Thomas Rast
2011-09-09 15:59                                   ` Jens Lehmann
2011-09-25 20:43                                   ` Martin Fick
2011-09-26 12:41                                     ` Christian Couder
2011-09-26 17:47                                       ` Martin Fick
2011-09-26 18:56                                         ` Christian Couder
2011-09-30 16:41                                           ` Martin Fick
2011-09-30 19:26                                             ` Martin Fick
2011-09-30 21:02                                             ` Martin Fick
2011-09-30 22:06                                               ` Martin Fick
2011-10-01 20:41                                                 ` Junio C Hamano
2011-10-02  5:19                                                   ` Michael Haggerty
2011-10-03  0:46                                                     ` Martin Fick
2011-10-04  8:08                                                       ` Michael Haggerty
2011-10-03 18:12                                                 ` Martin Fick
2011-10-03 19:42                                                   ` Junio C Hamano
2011-10-04  8:16                                                   ` Michael Haggerty
2011-10-08 20:59                                                 ` Martin Fick
2011-10-09  5:43                                                   ` Michael Haggerty
2011-09-28 19:38                                       ` Martin Fick
2011-09-28 22:10                                         ` Martin Fick
2011-09-29  0:54                                           ` Julian Phillips
2011-09-29  1:37                                             ` Martin Fick
2011-09-29  2:19                                               ` Julian Phillips
2011-09-29 16:38                                                 ` Martin Fick [this message]
2011-09-29 18:26                                                   ` Julian Phillips
2011-09-29 18:27                                                 ` René Scharfe
2011-09-29 19:10                                                   ` Junio C Hamano
2011-09-29  4:18                                                     ` [PATCH] refs: Use binary search to lookup refs faster Julian Phillips
2011-09-29 21:57                                                       ` Junio C Hamano
2011-09-29 22:04                                                       ` [PATCH v2] " Julian Phillips
2011-09-29 22:06                                                       ` [PATCH] " Junio C Hamano
2011-09-29 22:11                                                         ` [PATCH v3] " Julian Phillips
2011-09-29 23:48                                                           ` Junio C Hamano
2011-09-30 15:30                                                             ` Michael Haggerty
2011-09-30 16:38                                                               ` Junio C Hamano
2011-09-30 17:56                                                                 ` [PATCH] refs: Remove duplicates after sorting with qsort Julian Phillips
2011-10-02  5:15                                                                 ` [PATCH v3] refs: Use binary search to lookup refs faster Michael Haggerty
2011-10-02  5:45                                                                   ` Junio C Hamano
2011-10-04 20:58                                                                     ` Junio C Hamano
2011-09-30  1:13                                                           ` Martin Fick
2011-09-30  3:44                                                             ` Junio C Hamano
2011-09-30  8:04                                                               ` Julian Phillips
2011-09-30 15:45                                                               ` Martin Fick
2011-09-29 20:44                                                     ` Git is not scalable with too many refs/* Martin Fick
2011-09-29 19:10                                                   ` Julian Phillips
2011-09-29 20:11                                                   ` Martin Fick
2011-09-30  9:12                                                     ` René Scharfe
2011-09-30 16:09                                                       ` Martin Fick
2011-09-30 16:52                                                       ` Junio C Hamano
2011-09-30 18:17                                                         ` René Scharfe
2011-10-01 15:28                                                           ` René Scharfe
2011-10-01 15:38                                                             ` [PATCH 1/8] checkout: check for "Previous HEAD" notice in t2020 René Scharfe
2011-10-01 19:02                                                               ` Sverre Rabbelier
2011-10-01 15:43                                                             ` [PATCH 2/8] revision: factor out add_pending_sha1 René Scharfe
2011-10-01 15:51                                                             ` [PATCH 3/8] checkout: use add_pending_{object,sha1} in orphan check René Scharfe
2011-10-01 15:56                                                             ` [PATCH 4/8] revision: add leak_pending flag René Scharfe
2011-10-01 16:01                                                             ` [PATCH 5/8] bisect: use " René Scharfe
2011-10-01 16:02                                                             ` [PATCH 6/8] bundle: " René Scharfe
2011-10-01 16:09                                                             ` [PATCH 7/8] checkout: " René Scharfe
2011-10-01 16:16                                                             ` [PATCH 8/8] commit: factor out clear_commit_marks_for_object_array René Scharfe
2011-09-26 15:15                                     ` Git is not scalable with too many refs/* Martin Fick
2011-09-26 15:21                                       ` Sverre Rabbelier
2011-09-26 15:48                                         ` Martin Fick
2011-09-26 15:56                                           ` Sverre Rabbelier
2011-09-26 16:38                                             ` Martin Fick
2011-09-26 16:49                                               ` Julian Phillips
2011-09-26 18:07                                       ` Martin Fick
2011-09-26 18:37                                         ` Julian Phillips
2011-09-26 20:01                                           ` Martin Fick
2011-09-26 20:07                                             ` Junio C Hamano
2011-09-26 20:28                                             ` Julian Phillips
2011-09-26 21:39                                               ` Martin Fick
2011-09-26 21:52                                                 ` Martin Fick
2011-09-26 23:26                                                   ` Julian Phillips
2011-09-26 23:37                                                     ` David Michael Barr
2011-09-27  1:01                                                       ` [PATCH] refs.c: Fix slowness with numerous loose refs David Barr
2011-09-27  2:04                                                         ` David Michael Barr
2011-09-26 23:38                                                     ` Git is not scalable with too many refs/* Junio C Hamano
2011-09-27  0:00                                                       ` [PATCH] Don't sort ref_list too early Julian Phillips
2011-10-02  4:58                                                         ` Michael Haggerty
2011-09-27  0:12                                                     ` Git is not scalable with too many refs/* Martin Fick
2011-09-27  0:22                                                       ` Julian Phillips
2011-09-27  2:34                                                         ` Martin Fick
2011-09-27  7:59                                                           ` Julian Phillips
2011-09-27  8:20                                                     ` Sverre Rabbelier
2011-09-27  9:01                                                       ` Julian Phillips
2011-09-27 10:01                                                         ` Sverre Rabbelier
2011-09-27 10:25                                                           ` Nguyen Thai Ngoc Duy
2011-09-27 11:07                                                         ` Michael Haggerty
2011-09-27 12:10                                                           ` Julian Phillips
2011-09-26 22:30                                                 ` Julian Phillips
2011-09-26 15:32                                     ` Michael Haggerty
2011-09-26 15:42                                       ` Martin Fick
2011-09-26 16:25                                         ` Thomas Rast
2011-09-09 13:50                                 ` Michael Haggerty
2011-09-09 15:51                                   ` Michael Haggerty
2011-09-09 16:03                                   ` Jens Lehmann
2011-06-10  7:41         ` Andreas Ericsson
2011-06-10 19:41           ` Shawn Pearce
2011-06-10 20:12             ` Jakub Narebski
2011-06-10 20:35             ` Jeff King
2011-06-13  7:08             ` Andreas Ericsson
2011-06-09 11:18 ` Jakub Narebski
2011-06-09 15:42   ` Stephen Bash

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201109291038.45290.mfick@codeaurora.org \
    --to=mfick@codeaurora.org \
    --cc=chriscool@tuxfamily.org \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=julian@quantumfyre.co.uk \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).