git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Nicolas Pitre <nico@cam.org>, Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 2/2] Implement a simple delta_base cache
Date: Sat, 17 Mar 2007 16:09:57 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0703171557360.4964@woody.linux-foundation.org> (raw)
In-Reply-To: <7vfy83qyxh.fsf@assigned-by-dhcp.cox.net>



On Sat, 17 Mar 2007, Junio C Hamano wrote:
> 
> When unpacking a depth-3 deltified object A, the code finds the
> target object A (which is a delta), ask for its base B and put B
> in the cache after using it to reconstitute A.  While doing so,
> the first-generation base B is also a delta so its base C (which
> is a non-delta) is found and placed in the cache.  When A is
> returned, the cache has B and C.  If you ask for B at this
> point, we read the delta, pick up its base C from the cache,
> apply, and return while putting C back in the cache.  If you ask
> for A after that, we do not read from the cache, although it is
> available.

Yes.

I debated that a bit with myself, but decided that:

 (a) it probably doesn't really matter a lot (but I don't have the 
     numbers)

 (b) trying to *also* fill non-delta-base queries from the delta-base 
     cache actually complicates things a lot. Surprisingly much so (the 
     current logic of removing the entry from the cache only to re-insert 
     it after being used made the memory management totally trivial, as 
     you noticed)

 (c) and regardless, we could decide to do a more extensive caching layer 
     later if we really wanted to, and at that point it probably makes 
     more sense to integrate it with the delta-base cache.

     Most git objects are use-once, which is why we really *just* save the 
     flag bits and the SHA1 hash name itself in "struct object", but doing 
     a generic caching layer for object content would likely obviate the 
     need for the current logic to do "save_commit_buffer".

That (c) in particular was what made me think that it's better to keep it 
simple and obvious for now, since even the simple thing largely fixes the 
performance issue.  Almost three seconds I felt bad about, while just over 
a second for something as complex as "git log drivers/usb/" I just cannot 
make myself worry about.

> In any way, your code makes a deeply delitified packfiles a lot
> more practical.  As long as the working set of delta chains fits
> in the cache, after unpacking the longuest delta, the objects on
> the chain can be had by one lookup and one delta application.

Yeah. I think it would be good to probably (separately and as "further 
tweaks"):

 - have somebody actually look at hit-rates for different repositories and 
   hash sizes.

 - possibly allow people to set the hash size as a config option, if it 
   turns out that certain repository layouts or usage scenarios end up 
   preferring bigger caches.

   For example, it may be that for historical archives you might want to 
   have deeper delta queues to make the repository smaller, and if they 
   are big anyway maybe they would prefer to have a larger-than-normal 
   cache as a result. On the other hand, if you are memory-constrained, 
   maybe you'd prefer to re-generate the objects and waste a bit of CPU 
   rather than cache the results.

But neither of the above is really an argument against the patch, just a 
"there's certainly room for more work here if anybody cares".

> Very good job.

I'm pretty happy with the results myself. Partly because the patches just 
ended up looking so *nice*.

		Linus

  reply	other threads:[~2007-03-17 23:10 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-16  1:04 cleaner/better zlib sources? Linus Torvalds
2007-03-16  1:10 ` Shawn O. Pearce
2007-03-16  1:11 ` Jeff Garzik
2007-03-16  1:14   ` Matt Mackall
2007-03-16  1:46   ` Linus Torvalds
2007-03-16  1:54     ` Linus Torvalds
2007-03-16  2:43       ` Davide Libenzi
2007-03-16  2:56         ` Linus Torvalds
2007-03-16  3:16           ` Davide Libenzi
2007-03-16 16:21             ` Linus Torvalds
2007-03-16 16:24               ` Davide Libenzi
2007-03-16 16:35                 ` Linus Torvalds
2007-03-16 19:21                   ` Davide Libenzi
2007-03-17  0:01                     ` Linus Torvalds
2007-03-17  1:11                       ` Linus Torvalds
2007-03-17  3:28                         ` Nicolas Pitre
2007-03-17  5:19                           ` Shawn O. Pearce
2007-03-17 17:55                           ` Linus Torvalds
2007-03-17 19:40                             ` Linus Torvalds
2007-03-17 19:42                               ` [PATCH 1/2] Make trivial wrapper functions around delta base generation and freeing Linus Torvalds
2007-03-17 19:44                               ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
2007-03-17 21:45                                 ` Linus Torvalds
2007-03-17 22:37                                   ` Junio C Hamano
2007-03-17 23:09                                     ` Linus Torvalds [this message]
2007-03-17 23:54                                       ` Linus Torvalds
2007-03-18  1:13                                     ` Nicolas Pitre
2007-03-18  7:47                                       ` Junio C Hamano
2007-03-17 23:12                                   ` Junio C Hamano
2007-03-17 23:24                                     ` Linus Torvalds
2007-03-17 23:52                                       ` Jon Smirl
2007-03-18  1:14                                   ` Morten Welinder
2007-03-18  1:29                                     ` Linus Torvalds
2007-03-18  1:38                                       ` Nicolas Pitre
2007-03-18  1:55                                         ` Linus Torvalds
2007-03-18  2:03                                           ` Nicolas Pitre
2007-03-18  2:20                                             ` Linus Torvalds
2007-03-18  3:00                                               ` Nicolas Pitre
2007-03-18  3:31                                                 ` Linus Torvalds
2007-03-18  5:30                                                   ` Julian Phillips
2007-03-18 17:23                                                     ` Linus Torvalds
2007-03-18 10:53                                                   ` Robin Rosenberg
2007-03-18 17:34                                                     ` Linus Torvalds
2007-03-18 18:29                                                       ` Robin Rosenberg
2007-03-18 21:25                                                         ` Shawn O. Pearce
2007-03-19 13:16                                                         ` David Brodsky
2007-03-20  6:35                                                           ` Robin Rosenberg
2007-03-20  9:13                                                             ` David Brodsky
2007-03-21  2:37                                                               ` Linus Torvalds
2007-03-21  2:54                                                                 ` Nicolas Pitre
2007-03-18  3:06                                               ` [PATCH 3/2] Avoid unnecessary strlen() calls Linus Torvalds
2007-03-18  9:45                                                 ` Junio C Hamano
2007-03-18 15:54                                                   ` Linus Torvalds
2007-03-18 15:57                                                     ` Linus Torvalds
2007-03-18 21:38                                                       ` Shawn O. Pearce
2007-03-18 21:48                                                         ` Linus Torvalds
2007-03-20  3:05                                                     ` Johannes Schindelin
2007-03-20  3:29                                                       ` Shawn O. Pearce
2007-03-20  3:40                                                         ` Shawn O. Pearce
2007-03-20  4:11                                                           ` Linus Torvalds
2007-03-20  4:18                                                             ` Shawn O. Pearce
2007-03-20  4:45                                                               ` Linus Torvalds
2007-03-20  5:44                                                             ` Junio C Hamano
2007-03-20  3:16                                                     ` Junio C Hamano
2007-03-20  4:31                                                       ` Linus Torvalds
2007-03-20  4:39                                                         ` Shawn O. Pearce
2007-03-20  4:57                                                           ` Linus Torvalds
2007-03-18  1:44                                       ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
2007-03-18  6:28                                     ` Avi Kivity
2007-03-17 22:44                                 ` Linus Torvalds
2007-03-16 16:35               ` cleaner/better zlib sources? Jeff Garzik
2007-03-16 16:42                 ` Matt Mackall
2007-03-16 16:51                 ` Linus Torvalds
2007-03-16 17:12                 ` Nicolas Pitre
2007-03-16 23:22                 ` Shawn O. Pearce
2007-03-16 17:06               ` Nicolas Pitre
2007-03-16 17:51                 ` Linus Torvalds
2007-03-16 18:09                   ` Nicolas Pitre
2007-03-16  1:33 ` Davide Libenzi
2007-03-16  2:06   ` Davide Libenzi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0703171557360.4964@woody.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).