git.vger.kernel.org archive mirror
From: Martin Fick <mfick@codeaurora.org>
To: git@vger.kernel.org
Subject: How to still kill git fetch with too many refs
Date: Mon, 1 Jul 2013 21:02:31 -0600
Message-ID: <201307012102.31384.mfick@codeaurora.org>

I have often reported problems with git fetch when there are
many refs in a repo, and I have been pleasantly surprised by
how quickly many of them were fixed. :) Over time, others
have created various synthetic test cases to ensure that git
can handle very large numbers of refs. A simple synthetic
test case with 1M refs all pointing to the same sha1 seems
to be easily handled by git these days. However, we still
see performance issues related to having too many refs in
our internal git repos; our kernel/msm instance has around
400K refs.

When I tried the simple synthetic test case, I could not
reproduce bad results, so I tried something just a little
more complex and got atrocious results! Basically, I
generate a packed-refs file with many refs, each pointing
to a different sha1. To get a list of valid but unique
sha1s for the repo, I simply used rev-list. The result is a
copy of Linus's repo with a million unique valid refs, in
which a git fetch of a single updated ref takes a very long
time (55 minutes so far, and it has not completed yet).
Note that with 100K refs it completes in about 2m40s. The
growth is likely not linear, since 2m40s * 10 would be only
~27m (but the difference could also just be due to how the
data in the sha1s is ordered).
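
To put rough numbers on that, here is a back-of-the-envelope extrapolation from the 100K-ref timing. The quadratic model is purely an assumption for comparison, not something measured, and the variable names are only illustrative.

```shell
# Extrapolate the measured 100K-ref fetch time (2m40s = 160s) to 1M refs.
# Both growth models are assumptions for comparison, not measurements.
base_secs=160   # measured with 100K refs
scale=10        # 1M refs / 100K refs

linear=$((base_secs * scale))             # time grows in proportion to refs
quadratic=$((base_secs * scale * scale))  # time grows with refs squared

echo "linear:    $((linear / 60))m$((linear % 60))s"                # linear:    26m40s
echo "quadratic: $((quadratic / 3600))h$((quadratic % 3600 / 60))m" # quadratic: 4h26m
```

A run that is still going at 55 minutes has already passed the linear estimate, so the growth looks superlinear, though these two data points alone cannot say how much.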


Here is my small reproducible test case for this issue:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cp -rp linux linux.1Mrefs-revlist

cd linux
echo "Hello" > hello ; git add hello ; git commit -a -m 'hello'
cd ..

cd linux.1Mrefs-revlist
git rev-list HEAD |
for nn in $(seq 0 100) ; do
  for c in $(seq 0 10000) ; do
    read sha ; echo $sha refs/c/$nn/$c$nn
  done
done > .git/packed-refs

time git fetch file:///$(dirname $PWD)/linux refs/heads/master
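
To try other ref counts (for example, the 100K case), the generation step could be wrapped in a small helper along these lines. This is only a sketch: the function name gen_packed_refs and the refs/c/<bucket>/<n> naming are illustrative choices, not anything git requires. Unlike the nested loops above, it stops when the input runs out of sha1s instead of emitting refs with an empty sha1.

```shell
# Hypothetical helper (not part of git): read sha1s on stdin and print
# up to $1 packed-refs lines of the form "<sha1> refs/c/<bucket>/<n>".
gen_packed_refs() {
    want=$1
    n=0
    # Check the count first so no extra input line is consumed at the end;
    # the loop also ends early if stdin has fewer than $want lines.
    while [ "$n" -lt "$want" ] && read -r sha ; do
        # 10000 refs per bucket directory, roughly mirroring the loops above
        echo "$sha refs/c/$((n / 10000))/$n"
        n=$((n + 1))
    done
}

# Example use, from inside linux.1Mrefs-revlist (like the original
# command, this overwrites any existing packed-refs file):
#   git rev-list HEAD | gen_packed_refs 100000 > .git/packed-refs
#   wc -l .git/packed-refs    # how many refs were actually written
```

Comparing the wc -l result with the requested count shows whether rev-list supplied enough sha1s.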

Any insights as to why it is so slow, and how we could 
possibly speed it up?

Thanks,

-Martin

PS: My tests were performed with git version 1.8.2.1 on 
linux 2.6.32-37-generic #81-Ubuntu SMP 


-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation
 

Thread overview: 15+ messages
2013-07-02  3:02 Martin Fick [this message]
2013-07-02  4:07 ` How to still kill git fetch with too many refs Jeff King
2013-07-02  4:41   ` Jeff King
2013-07-02  5:01     ` Jeff King
2013-07-02  5:19       ` Junio C Hamano
2013-07-02  5:28         ` Jeff King
2013-07-02  6:11           ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Jeff King
2013-07-02  6:16             ` [PATCH 1/3] fetch-pack: avoid quadratic list insertion in mark_complete Jeff King
2013-07-02  6:21             ` [PATCH 2/3] commit.c: make compare_commits_by_commit_date global Jeff King
2013-07-02  6:24             ` [PATCH 3/3] fetch-pack: avoid quadratic behavior in rev_list_push Jeff King
2013-07-02  7:52               ` Eric Sunshine
2013-07-02 17:45             ` [PATCH 0/3] avoid quadratic behavior in fetch-pack Martin Fick
2013-07-02 17:52       ` How to still kill git fetch with too many refs Brandon Casey
2013-07-02  9:24 ` Michael Haggerty
2013-07-02 16:58   ` Martin Fick
