All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Fick <mfick@codeaurora.org>
To: Julian Phillips <julian@quantumfyre.co.uk>
Cc: Christian Couder <christian.couder@gmail.com>,
	git@vger.kernel.org, Christian Couder <chriscool@tuxfamily.org>,
	Thomas Rast <trast@student.ethz.ch>
Subject: Re: Git is not scalable with too many refs/*
Date: Thu, 29 Sep 2011 10:38:44 -0600	[thread overview]
Message-ID: <201109291038.45290.mfick@codeaurora.org> (raw)
In-Reply-To: <7c0105c6cca7dd0aa336522f90617fe4@quantumfyre.co.uk>

On Wednesday, September 28, 2011 08:19:16 pm Julian Phillips 
wrote:
> On Wed, 28 Sep 2011 19:37:18 -0600, Martin Fick wrote:
> > On Wednesday 28 September 2011 18:59:09 Martin Fick 
wrote:
> >> Julian Phillips <julian@quantumfyre.co.uk> wrote:
> -- snip --
> 
> >> I've created a test repo with ~100k refs/changes/...
> >> style refs, and ~40000 refs/heads/... style refs, and
> >> checkout can walk the list of ~140k refs seven times
> >> in 85ms user time including doing whatever other
> >> processing is needed for checkout. The real time is
> >> only 114ms - but then my test repo has no real data
> >> in.
> > 
> > If I understand what you are saying, it sounds like you
> > do not have a very good test case. The amount of time
> > it takes for checkout depends on how long it takes to
> > find a ref with the sha1 that you are on. If that sha1
> > is so early in the list of refs that it only took you
> > 7 traversals to find it, then that is not a very good
> > testcase. I think that you should probably try making
> > an orphaned ref (checkout a detached head, commit to
> > it), that is probably the worst testcase since it
> > should then have to search all 140K refs to eventually
> > give up.
> > 
> > Again, if I understand what you are saying, if it took
> > 85ms for 7 traversals, then it takes approximately
> > 10ms per traversal, that's only 100/s! If you have to
> > traverse it 140K times, that should work out to 1400s
> > ~ 23mins.
> 
> Well, it's no more than 10ms per traversal - since the
> rest of the work presumably takes some time too ...
> 
> However, I had forgotten to make the orphaned commit as
> you suggest - and then _bang_ 7N^2, it tries seven
> different variants of each ref (which is silly as they
> are all fully qualified), and with packed refs it has to
> search for them each time, all to turn names into hashes
> that we already know to start with.
> 
> So, yes - it is that list traversal.
> 
> Does the following help?
> 
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 5e356a6..f0f4ca1 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -605,7 +605,7 @@ static int
> add_one_ref_to_rev_list_arg(const char *refname,
>                                         int flags,
>                                         void *cb_data)
>   {
> -       add_one_rev_list_arg(cb_data, refname);
> +       add_one_rev_list_arg(cb_data,
> strdup(sha1_to_hex(sha1))); return 0;
>   }


Yes, but in some strange ways. :)

First, let me clarify that all the tests here involve your 
"sort fix" from 2 days ago applied first.

In the packed ref repo, it brings the time down to about 
~10s (from > 5 mins).  In the unpacked ref repo, it brings 
it down to about the same thing ~10s, but it was only 
starting at about ~20s.

So, I have to ask, what does that change do, I don't quite 
understand it?  Does it just do only one lookup per ref by 
normalizing it?  Is the list still being traversed, just 
about 7 time less now?  Should the packed_ref list simply be 
put in an array which could be binary searched instead, it 
is a fixed list once loaded, no?  

I prototyped a packed_ref implementation using the hash.c 
provided in the git sources and it seemed to speed a 
checkout up to almost instantaneous, but I was getting a few 
collisions so the implementation was not good enough.  That 
is when I started to wonder if an array wouldn't be better 
in this case?



Now I also decided to go back and test a noop fetch (a 
refetch) of all the changes (since this use case is still 
taking way longer than I think it should, even with the 
submodule fix posted earlier).  Up until this point, even 
the sorting fix did not help.  So I tried it with this fix.  
In the unpackref case, it did not seem to change (2~4mins).  
However, in the packed ref change (which was previously also 
about 2-4mins), this now only takes about 10-15s!

Any clues as to why the unpacked refs would still be so slow 
on noop fetches and not be sped up by this?


-Martin


-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum

  reply	other threads:[~2011-09-29 16:38 UTC|newest]

Thread overview: 126+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-09  3:44 Git is not scalable with too many refs/* NAKAMURA Takumi
2011-06-09  6:50 ` Sverre Rabbelier
2011-06-09 15:23   ` Shawn Pearce
2011-06-09 15:52     ` A Large Angry SCM
2011-06-09 15:56       ` Shawn Pearce
2011-06-09 16:26         ` Jeff King
2011-06-10  3:59           ` NAKAMURA Takumi
2011-06-13 22:27             ` Jeff King
2011-06-14  0:17             ` Andreas Ericsson
2011-06-14  0:30               ` Jeff King
2011-06-14  4:41                 ` Junio C Hamano
2011-06-14  7:26                   ` Sverre Rabbelier
2011-06-14 10:02                     ` Johan Herland
2011-06-14 10:34                       ` Sverre Rabbelier
2011-06-14 17:02                       ` Jeff King
2011-06-14 19:20                         ` Shawn Pearce
2011-06-14 19:47                           ` Jeff King
2011-06-14 20:12                             ` Shawn Pearce
2011-09-08 19:53                               ` Martin Fick
2011-09-09  0:52                                 ` Martin Fick
2011-09-09  1:05                                   ` Thomas Rast
2011-09-09  1:13                                     ` Thomas Rast
2011-09-09 15:59                                   ` Jens Lehmann
2011-09-25 20:43                                   ` Martin Fick
2011-09-26 12:41                                     ` Christian Couder
2011-09-26 17:47                                       ` Martin Fick
2011-09-26 18:56                                         ` Christian Couder
2011-09-30 16:41                                           ` Martin Fick
2011-09-30 19:26                                             ` Martin Fick
2011-09-30 21:02                                             ` Martin Fick
2011-09-30 22:06                                               ` Martin Fick
2011-10-01 20:41                                                 ` Junio C Hamano
2011-10-02  5:19                                                   ` Michael Haggerty
2011-10-03  0:46                                                     ` Martin Fick
2011-10-04  8:08                                                       ` Michael Haggerty
2011-10-03 18:12                                                 ` Martin Fick
2011-10-03 19:42                                                   ` Junio C Hamano
2011-10-04  8:16                                                   ` Michael Haggerty
2011-10-08 20:59                                                 ` Martin Fick
2011-10-09  5:43                                                   ` Michael Haggerty
2011-09-28 19:38                                       ` Martin Fick
2011-09-28 22:10                                         ` Martin Fick
2011-09-29  0:54                                           ` Julian Phillips
2011-09-29  1:37                                             ` Martin Fick
2011-09-29  2:19                                               ` Julian Phillips
2011-09-29 16:38                                                 ` Martin Fick [this message]
2011-09-29 18:26                                                   ` Julian Phillips
2011-09-29 18:27                                                 ` René Scharfe
2011-09-29 19:10                                                   ` Junio C Hamano
2011-09-29  4:18                                                     ` [PATCH] refs: Use binary search to lookup refs faster Julian Phillips
2011-09-29 21:57                                                       ` Junio C Hamano
2011-09-29 22:04                                                       ` [PATCH v2] " Julian Phillips
2011-09-29 22:06                                                       ` [PATCH] " Junio C Hamano
2011-09-29 22:11                                                         ` [PATCH v3] " Julian Phillips
2011-09-29 23:48                                                           ` Junio C Hamano
2011-09-30 15:30                                                             ` Michael Haggerty
2011-09-30 16:38                                                               ` Junio C Hamano
2011-09-30 17:56                                                                 ` [PATCH] refs: Remove duplicates after sorting with qsort Julian Phillips
2011-10-02  5:15                                                                 ` [PATCH v3] refs: Use binary search to lookup refs faster Michael Haggerty
2011-10-02  5:45                                                                   ` Junio C Hamano
2011-10-04 20:58                                                                     ` Junio C Hamano
2011-09-30  1:13                                                           ` Martin Fick
2011-09-30  3:44                                                             ` Junio C Hamano
2011-09-30  8:04                                                               ` Julian Phillips
2011-09-30 15:45                                                               ` Martin Fick
2011-09-29 20:44                                                     ` Git is not scalable with too many refs/* Martin Fick
2011-09-29 19:10                                                   ` Julian Phillips
2011-09-29 20:11                                                   ` Martin Fick
2011-09-30  9:12                                                     ` René Scharfe
2011-09-30 16:09                                                       ` Martin Fick
2011-09-30 16:52                                                       ` Junio C Hamano
2011-09-30 18:17                                                         ` René Scharfe
2011-10-01 15:28                                                           ` René Scharfe
2011-10-01 15:38                                                             ` [PATCH 1/8] checkout: check for "Previous HEAD" notice in t2020 René Scharfe
2011-10-01 19:02                                                               ` Sverre Rabbelier
2011-10-01 15:43                                                             ` [PATCH 2/8] revision: factor out add_pending_sha1 René Scharfe
2011-10-01 15:51                                                             ` [PATCH 3/8] checkout: use add_pending_{object,sha1} in orphan check René Scharfe
2011-10-01 15:56                                                             ` [PATCH 4/8] revision: add leak_pending flag René Scharfe
2011-10-01 16:01                                                             ` [PATCH 5/8] bisect: use " René Scharfe
2011-10-01 16:02                                                             ` [PATCH 6/8] bundle: " René Scharfe
2011-10-01 16:09                                                             ` [PATCH 7/8] checkout: " René Scharfe
2011-10-01 16:16                                                             ` [PATCH 8/8] commit: factor out clear_commit_marks_for_object_array René Scharfe
2011-09-26 15:15                                     ` Git is not scalable with too many refs/* Martin Fick
2011-09-26 15:21                                       ` Sverre Rabbelier
2011-09-26 15:48                                         ` Martin Fick
2011-09-26 15:56                                           ` Sverre Rabbelier
2011-09-26 16:38                                             ` Martin Fick
2011-09-26 16:49                                               ` Julian Phillips
2011-09-26 18:07                                       ` Martin Fick
2011-09-26 18:37                                         ` Julian Phillips
2011-09-26 20:01                                           ` Martin Fick
2011-09-26 20:07                                             ` Junio C Hamano
2011-09-26 20:28                                             ` Julian Phillips
2011-09-26 21:39                                               ` Martin Fick
2011-09-26 21:52                                                 ` Martin Fick
2011-09-26 23:26                                                   ` Julian Phillips
2011-09-26 23:37                                                     ` David Michael Barr
2011-09-27  1:01                                                       ` [PATCH] refs.c: Fix slowness with numerous loose refs David Barr
2011-09-27  2:04                                                         ` David Michael Barr
2011-09-26 23:38                                                     ` Git is not scalable with too many refs/* Junio C Hamano
2011-09-27  0:00                                                       ` [PATCH] Don't sort ref_list too early Julian Phillips
2011-10-02  4:58                                                         ` Michael Haggerty
2011-09-27  0:12                                                     ` Git is not scalable with too many refs/* Martin Fick
2011-09-27  0:22                                                       ` Julian Phillips
2011-09-27  2:34                                                         ` Martin Fick
2011-09-27  7:59                                                           ` Julian Phillips
2011-09-27  8:20                                                     ` Sverre Rabbelier
2011-09-27  9:01                                                       ` Julian Phillips
2011-09-27 10:01                                                         ` Sverre Rabbelier
2011-09-27 10:25                                                           ` Nguyen Thai Ngoc Duy
2011-09-27 11:07                                                         ` Michael Haggerty
2011-09-27 12:10                                                           ` Julian Phillips
2011-09-26 22:30                                                 ` Julian Phillips
2011-09-26 15:32                                     ` Michael Haggerty
2011-09-26 15:42                                       ` Martin Fick
2011-09-26 16:25                                         ` Thomas Rast
2011-09-09 13:50                                 ` Michael Haggerty
2011-09-09 15:51                                   ` Michael Haggerty
2011-09-09 16:03                                   ` Jens Lehmann
2011-06-10  7:41         ` Andreas Ericsson
2011-06-10 19:41           ` Shawn Pearce
2011-06-10 20:12             ` Jakub Narebski
2011-06-10 20:35             ` Jeff King
2011-06-13  7:08             ` Andreas Ericsson
2011-06-09 11:18 ` Jakub Narebski
2011-06-09 15:42   ` Stephen Bash

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201109291038.45290.mfick@codeaurora.org \
    --to=mfick@codeaurora.org \
    --cc=chriscool@tuxfamily.org \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=julian@quantumfyre.co.uk \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.