git.vger.kernel.org archive mirror
* "git reflog expire --all" very slow
@ 2009-03-31  1:43 Linus Torvalds
  2009-03-31  4:34 ` Junio C Hamano
From: Linus Torvalds @ 2009-03-31  1:43 UTC
  To: Junio C Hamano, Brandon Casey, Johannes Schindelin; +Cc: Git Mailing List


I haven't checked in detail what is up, but I just did a "git gc --prune", 
and it was quiet for about half a minute before anything seemed to happen.

Very irritating, as normally the expensive stuff at least gives you some 
kind of indication of what it's doing.

It turns out that it's the reflog expiration. On my crazy beefy 
Nehalem machine:

	[torvalds@nehalem linux]$ time git reflog expire --all

	real	0m37.596s
	user	0m37.554s
	sys	0m0.040s

and that really isn't good. 37 cpu-seconds on this machine is like half a 
decade on some laptops I could name.

The flat pgprof for this thing (user-land oprofile isn't doing Nehalem 
yet) looks like this:

      %   cumulative   self              self     total           
     time   seconds   seconds    calls   s/call   s/call  name    
     60.94     30.24    30.24 301120211     0.00     0.00  interesting
     12.37     36.38     6.14 301338513     0.00     0.00  insert_by_date
     11.35     42.01     5.63     8776     0.00     0.00  clear_commit_marks
      9.96     46.95     4.94     4388     0.00     0.01  merge_bases_many
      2.16     48.02     1.07 301486366     0.00     0.00  commit_list_insert
      1.21     48.62     0.60 301329737     0.00     0.00  parse_commit
      0.87     49.05     0.43 301637945     0.00     0.00  xmalloc
      0.34     49.22     0.17       24     0.01     0.01  xstrdup
      ...

Ok, so my reflog on this thing has 1583 entries on HEAD (yes, in the last 
90 days, the problem is _not_ that I have a long reflog and am pruning it, 
it _is_ already pruned). Add to that the reflogs for the branches (mainly 
master: 1294), and you end up with apparently a nice total of 4388 reflog 
entries.

And then it looks like for _each_ reflog entry we have:

  expire_reflog_ent()
    in_merge_bases()

which then calls 

  get_merge_bases()
    get_merge_bases_many()
      ..

each of which probably often traverses an appreciable part of the kernel 
tree, since my reflog entries are often merges, and the merge bases need 
easily thousands of commits to look up.
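As a rough illustration of why one walk per reflog entry adds up, here is a minimal Python sketch on a toy linear history. It is not git's code: `make_linear_history`, `is_reachable`, `check_each_entry` and `check_with_shared_walk` are hypothetical names, the walk is a plain reachability search rather than the real merge-base computation, and the shared-walk variant is only there for comparison, not a fix proposed in this message.

```python
from collections import deque


def make_linear_history(n):
    """Hypothetical toy history: commit i has the single parent i - 1."""
    return {i: ([i - 1] if i > 0 else []) for i in range(n)}


def is_reachable(parents, target, tip, visits):
    """Walk back from tip until target is found; visits[0] counts commits examined."""
    seen = {tip}
    queue = deque([tip])
    while queue:
        commit = queue.popleft()
        visits[0] += 1
        if commit == target:
            return True
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def check_each_entry(parents, entries, tip):
    """One independent walk per reflog entry, mirroring the per-entry call chain."""
    visits = [0]
    results = [is_reachable(parents, e, tip, visits) for e in entries]
    return results, visits[0]


def check_with_shared_walk(parents, entries, tip):
    """For comparison only: walk the history once and reuse the result."""
    visits = [0]
    seen = {tip}
    queue = deque([tip])
    while queue:
        commit = queue.popleft()
        visits[0] += 1
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return [e in seen for e in entries], visits[0]


history = make_linear_history(1000)
entries = list(range(0, 1000, 10))  # 100 toy "reflog entries"
per_entry_results, per_entry_visits = check_each_entry(history, entries, tip=999)
shared_results, shared_visits = check_with_shared_walk(history, entries, tip=999)
print(per_entry_visits, shared_visits)  # prints: 50500 1000
```

With only 1000 commits and 100 entries, the per-entry version already examines 50500 commits against 1000 for a single shared walk. The real history is a DAG with merges, so the actual numbers behave differently, but the repeated-traversal effect is the same kind of thing.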

Which explains how you end up with 301 _million_ commits inserted into the 
lists and checked if they are interesting. Since the whole kernel tree has 
only something like 140k commits, and my reflog doesn't even go back more 
than three months, I guess that means that we'll be traversing the same 
commits tens of thousands of times each.
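The averages behind that estimate can be worked out from the profile. This is a small sketch using only the call counts from the table and the figures quoted in the text; the variable names are made up for illustration, and the 140k commit count is approximate.

```python
# Call counts copied from the flat profile in the message.
interesting_calls = 301_120_211
merge_bases_many_calls = 4388
clear_commit_marks_calls = 8776

# Figures quoted in the text.
reflog_entries = 4388
kernel_commits = 140_000  # "something like 140k", so only approximate

# Average work per reflog entry, and per commit if spread over the whole history.
calls_per_entry = interesting_calls / reflog_entries
calls_per_commit = interesting_calls / kernel_commits

print(f"interesting() calls per reflog entry:  {calls_per_entry:,.0f}")
print(f"interesting() calls per kernel commit: {calls_per_commit:,.0f}")

# The profile is consistent with one merge-base computation per reflog entry
# and two clear_commit_marks() calls for each of them.
print(merge_bases_many_calls == reflog_entries)
print(clear_commit_marks_calls == 2 * reflog_entries)
```

That comes to roughly 68,600 `interesting()` calls per reflog entry, and about 2,150 per commit when averaged over the entire kernel history. The second figure is only an average: how often any particular commit is revisited depends on where the walks actually go.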

Even on this machine, that whole cluster-f*ck takes a little while. Oops.

I have not checked if there is anything really obvious going on that could 
change that whole logic that causes us to do merge-bases into something 
saner, since the reflog code is not a part of git I'm familiar with. 

Instead, I'm just sending this to Junio, Brandon, and Dscho, who are 
getting the main blame for 'builtin-reflog.c'. Although I'm pretty sure 
this is all Junio, but just in case..

			Linus

