From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>,
Brandon Casey <casey@nrlssc.navy.mil>,
Johannes Schindelin <johannes.schindelin@gmx.de>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: "git reflog expire --all" very slow
Date: Mon, 30 Mar 2009 18:43:59 -0700 (PDT)
Message-ID: <alpine.LFD.2.00.0903301803190.4093@localhost.localdomain>

I haven't checked in detail what is up, but I just did a "git gc --prune",
and it was quiet for about half a minute before anything seemed to happen.
Very irritating, as normally the expensive stuff at least gives you some
kind of indication of what it's doing.
It turns out that it's the reflog expiration. On my crazy beefy
Nehalem machine:

    [torvalds@nehalem linux]$ time git reflog expire --all

    real    0m37.596s
    user    0m37.554s
    sys     0m0.040s

and that really isn't good. 37 cpu-seconds on this machine is like half a
decade on some laptops I could name.
The flat gprof for this thing (user-land oprofile isn't doing Nehalem
yet) looks like this:

      %   cumulative   self                self     total
     time   seconds   seconds     calls   s/call   s/call  name
     60.94     30.24    30.24 301120211     0.00     0.00  interesting
     12.37     36.38     6.14 301338513     0.00     0.00  insert_by_date
     11.35     42.01     5.63      8776     0.00     0.00  clear_commit_marks
      9.96     46.95     4.94      4388     0.00     0.01  merge_bases_many
      2.16     48.02     1.07 301486366     0.00     0.00  commit_list_insert
      1.21     48.62     0.60 301329737     0.00     0.00  parse_commit
      0.87     49.05     0.43 301637945     0.00     0.00  xmalloc
      0.34     49.22     0.17        24     0.01     0.01  xstrdup
     ...

Ok, so my reflog on this thing has 1583 entries on HEAD (yes, in the last
90 days, the problem is _not_ that I have a long reflog and am pruning it,
it _is_ already pruned). Add to that the reflogs for the branches (mainly
master: 1294), and you end up with apparently a nice total of 4388 reflog
entries.
And then it looks like for _each_ reflog entry we have:

    expire_reflog_ent()
    in_merge_bases()

which then calls

    get_merge_bases()
    get_merge_bases_many()
    ..

each of which probably often traverses an appreciable part of the kernel
tree, since my reflog entries are often merges, and finding their merge
bases can easily require walking thousands of commits.
Which explains how you end up with 301 _million_ commits inserted into the
lists and checked if they are interesting. Since the whole kernel tree has
only something like 140k commits, and my reflog doesn't even go back more
than three months, I guess that means that we'll be traversing the same
commits tens of thousands of times each.
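
As a rough sanity check on those numbers, the figures quoted in this mail can be divided out directly. This is only back-of-the-envelope arithmetic on the call count for interesting(), the 4388 reflog entries, and the approximate 140k commit total; the variable names are made up for the illustration.

```python
# Figures taken from the profile and text of this mail.
interesting_calls = 301_120_211   # calls to interesting() in the profile
reflog_entries = 4_388            # total reflog entries being expired
kernel_commits = 140_000          # "something like 140k", an approximation

# Average number of interesting() calls per reflog entry.
calls_per_entry = interesting_calls / reflog_entries

# Average number of visits per commit if the work were spread
# evenly over the whole tree.
visits_per_commit = interesting_calls / kernel_commits

print(f"per reflog entry: about {calls_per_entry:,.0f} calls")
print(f"per commit (whole-tree average): about {visits_per_commit:,.0f} visits")
```

That works out to roughly 68,600 calls per reflog entry, and about 2,150 visits per commit when averaged over the whole tree. If only a recent slice of history is actually walked, the per-commit figure for that slice would be correspondingly higher.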
Even on this machine, that whole cluster-f*ck takes a little while. Oops.
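
To make the cost pattern concrete, here is a minimal toy sketch of what repeating an ancestry walk once per reflog entry does to the total work. It is not git's code: the linear history, the count_visits_is_ancestor() helper, and the 50-entry reflog are all hypothetical, and a plain breadth-first reachability walk stands in for the real merge-base computation.

```python
from collections import deque

def count_visits_is_ancestor(parents, tip, target):
    """Walk the ancestors of `tip` breadth-first until `target` is found.

    Returns (found, commits_visited). A toy stand-in for an ancestry
    check, not git's actual merge-base algorithm.
    """
    seen = {tip}
    queue = deque([tip])
    visited = 0
    while queue:
        commit = queue.popleft()
        visited += 1
        if commit == target:
            return True, visited
        for parent in parents.get(commit, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False, visited

# Toy linear history: commit i has parent i - 1, commit 0 is the root.
n_commits = 1000
parents = {i: [i - 1] for i in range(1, n_commits)}
tip = n_commits - 1

# Hypothetical reflog: 50 entries pointing at old commits near the root.
reflog_entries = list(range(50))

total_visited = 0
for entry in reflog_entries:
    # Every entry starts a fresh walk, so nothing is shared between entries.
    found, visited = count_visits_is_ancestor(parents, tip, entry)
    total_visited += visited

print(total_visited)
```

In this toy setup each walk covers nearly the whole history, so the total grows roughly as (number of entries) x (commits walked per entry), which is the same shape as the numbers in the profile.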
I have not checked if there is anything really obvious going on that could
change that whole logic that causes us to do merge-bases into something
saner, since the reflog code is not a part of git I'm familiar with.
Instead, I'm just sending this to Junio, Brandon, and Dscho, who are
getting the main blame for 'builtin-reflog.c'. Although I'm pretty sure
this is all Junio, but just in case..
Linus