git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
	Junio C Hamano <junkio@cox.net>,
	git@vger.kernel.org
Subject: Re: [PATCH] Limit the size of the new delta_base_cache
Date: Mon, 19 Mar 2007 14:08:36 -0400 (EDT)	[thread overview]
Message-ID: <alpine.LFD.0.83.0703191332240.18328@xanadu.home> (raw)
In-Reply-To: <Pine.LNX.4.64.0703191008400.6730@woody.linux-foundation.org>


First this is Shawn's patch not mine.  But I still don't think it is 
that bad even as is, and it was already merged by Junio.  I believe it 
can be made better without segregating big objects though.

On Mon, 19 Mar 2007, Linus Torvalds wrote:

> And no, I don't have any "hard numbers" to back this up, but if you want 
> another argument, realize that delta chains are limited to a depth of ten, 
> and if a *single* chain can overflow the delta-chain cache, then the cache 
> ends up being 100% *in*effective.

No.  You forget what I said about delta direction.

Deltas are meant to be ordered so a newer object should be a base for an 
older objects.  And we usually parse objects from newer to older ones 
too.

So let's suppose that the cache has only two entries.  If there was only 
one delta chain in the pack and it is ideally laid out so the delta 
chain is effectively going deeper with older objects as we currently 
mean it to, then the cache would still be perfectly effective because, 
as we move further back in history, objects deeper in the delta chain 
will be requested, and the delta base a single level up will be there to 
be used just before being evicted.  And so on all through the delta 
chain.

> And do the math: if the object size is 
> 10-20% of the total allowed cache size, how many objects in the chain are 
> you going to fit in the cache?

Only 5 objects in the 20% case.  But as I say you could effectively have 
only _one_ entry in the cache and it would still be 100% effective in 
the single perfect delta chain I described above.

Of course in practice there is more than a single delta chain, and the 
delta direction might be the other way around.  But on _average_ it 
should work fine pretty well.

And notice how blobs are the first victims for cache eviction.  So for a 
mixed tree+blob usage case, if big blobs are to trash the cache, then 
with LRU eviction performance should auto-regulate itself quite fine.  
Of course a single blob might clear the cache entirely in the worst 
case.  But there is a possibility that this big blob could be the base 
for the next delta.  Adding a size treshold simply throw that 
possibility away in which case your performance will suffer _anyway_ 
because you'll experience the O(n**2) behavior with that delta chain of 
big blobs.  Compared to that the lost cache of small objects won't even 
be on the radar.  But if you let large objects reach the cache then 
there is at least some possibility that those large objects might be the 
next wanted delta base effectively saving big time on performance 
especially if that base was itself a deep delta object.

So to resume:

 - the issue concerns only operations needing blob parsing

 - if large deltified blobs are denied the cache you will _always_ have
   a sudden and drastic performance degradation when the size treshold 
   is crossed as the O(n**2) behavior on those big blob deltas will 
   dominate the profile

 - if you instead don't impose any treshold then performance will 
   degrade smoothly with object size down to the absolute worst case of 
   a single object in the cache, and for some cases it might even be 
   quite effective!

I think it is therefore best to let the cache regulate itself with LRU 
eviction than using arbitrary size treshold to segregate objects.  The 
cache will quickly repopulate itself if the big blob happens to be  an 
odd ball in a repository anyway.


Nicolas

  reply	other threads:[~2007-03-19 18:08 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-19  5:14 [PATCH] Limit the size of the new delta_base_cache Shawn O. Pearce
2007-03-19 16:10 ` Linus Torvalds
2007-03-19 16:41   ` Nicolas Pitre
2007-03-19 16:54     ` Nicolas Pitre
2007-03-19 17:18       ` Linus Torvalds
2007-03-19 17:07     ` Linus Torvalds
2007-03-19 17:17       ` Linus Torvalds
2007-03-19 18:08         ` Nicolas Pitre [this message]
  -- strict thread matches above, loose matches on Subject: below --
2007-03-19  4:48 Shawn O. Pearce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.0.83.0703191332240.18328@xanadu.home \
    --to=nico@cam.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=spearce@spearce.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).