From: Michael Haggerty <mhagger@alum.mit.edu>
To: Martin Fick <mfick@codeaurora.org>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: How to still kill git fetch with too many refs
Date: Tue, 02 Jul 2013 11:24:14 +0200
Message-ID: <51D29C3E.5070600@alum.mit.edu>
In-Reply-To: <201307012102.31384.mfick@codeaurora.org>

On 07/02/2013 05:02 AM, Martin Fick wrote:
> I have often reported problems with git fetch when there are 
> many refs in a repo, and I have been pleasantly surprised 
> how many problems I reported were so quickly fixed. :) With 
> time, others have created various synthetic test cases to 
> ensure that git can handle many many refs.  A simple 
> synthetic test case with 1M refs all pointing to the same 
> sha1 seems to be easily handled by git these days.  However, 
> in our experience with our internal git repo, we still have 
> performance issues related to having too many refs; our 
> kernel/msm instance has around 400K.
> 
> I tried the simple synthetic test case and could not 
> reproduce bad results, so I tried something just a little 
> more complex and was able to get atrocious results!!! 
> Basically, I generate a packed-refs file with many refs 
> which each point to a different sha1.  To get a list of 
> valid but unique sha1s for the repo, I simply used rev-list.  
> The result, a copy of linus' repo with a million unique 
> valid refs and a git fetch of a single updated ref taking a 
> very long time (55mins and it did not complete yet).  Note, 
> with 100K refs it completes in about 2m40s.  It is likely 
> not linear since 2m40s * 10 would be ~26m (but the 
> difference could also just be how the data in the sha1s are 
> ordered).
> 
> 
> Here is my small reproducible test case for this issue:
> 
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> cp -rp linux linux.1Mrefs-revlist
> 
> cd linux
> echo "Hello" > hello ; git add hello ; git commit -a -m 'hello'
> cd ..
> 
> cd linux.1Mrefs-revlist
> git rev-list HEAD |
> for nn in $(seq 0 100) ; do
>   for c in $(seq 0 10000) ; do
>     read sha ; echo $sha refs/c/$nn/$c$nn
>   done
> done > .git/packed-refs

I believe this generates a packed-refs file that is not sorted
lexicographically by refname, whereas all Git-generated packed-refs
files are sorted.  There are some optimizations in refs.c for adding
references in order that might therefore be circumvented by your
unsorted file.  Please try sorting the file by refname and see if that
helps.  (You can do so by deleting one of the packed references; then
git will sort the remainder while rewriting the file.)
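
If you would rather sort the file by hand, something like the following
might do.  It is only a sketch: sort_packed_refs is a made-up helper
name, and it assumes every non-comment line has the form
"<sha1> <refname>" with no peeled "^<sha1>" lines, which is true of the
file generated by the loop above but not of packed-refs files in
general.

```shell
# Hypothetical helper (not part of git): print the entries of a
# packed-refs file sorted bytewise by refname.
# Assumption: the file contains only "# ..." header lines and
# "<sha1> <refname>" lines, with no peeled "^<sha1>" lines.
sort_packed_refs() {
    file=$1
    # Keep any header line(s) first; grep exits non-zero when there
    # are none, so ignore its status.
    grep '^#' "$file" || true
    # Sort the remaining lines on the second space-separated field
    # (the refname), using the C locale for plain byte order.
    grep -v '^#' "$file" | LC_ALL=C sort -t ' ' -k 2,2
}
```

It could be used, for example, as

  sort_packed_refs .git/packed-refs > packed-refs.sorted &&
  mv packed-refs.sorted .git/packed-refs

in a scratch copy of the repository.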

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
