git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <junkio@cox.net>, Nicolas Pitre <nico@cam.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 2/2] Implement a simple delta_base cache
Date: Sat, 17 Mar 2007 14:45:49 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0703171420420.4964@woody.linux-foundation.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0703171242180.4964@woody.linux-foundation.org>



On Sat, 17 Mar 2007, Linus Torvalds wrote:
> 
> Instead of always re-generating the delta bases (possibly over and over 
> and over again), just cache the last few ones. They often can get re-used.

Not just to compare actual timings, this shows the difference in the 
traces I did. Remember, before we had:

	[torvalds@woody linux]$ grep Needs delta-base-trace | wc -l
	469334
	[torvalds@woody linux]$ grep Needs delta-base-trace |sort -u | wc -l
	21933

and now with the simple cache, I get:

	[torvalds@woody linux]$ grep Needs delta-base-trace-new | wc -l
	28688
	[torvalds@woody linux]$ grep Needs delta-base-trace-new | sort -u | wc -l
	21933

ie, we still re-generate some of the objects multiple times, but now, 
rather than generating them (on average) 20+ times each, we now generate 
them an average of just 1.3 times each. Which explains why the wall-time 
goes down by over a factor of two.

Changing the (statically sized) cache from 256 entries to 1024 (and 
updating the hash function appropriately of course) gets the number down 
to 23953 delta-base lookups (the number of unique ones obviously stays the 
same), for an average of just 1.1 object generates per unique object, and 
also means that you occasionally get sub-second times for my test-case of 
logging drivers/usb/.

It all also means that libz isn't really even the top entry in the 
profiles any more, although it's still pretty high. But the profile now 
says:

	samples  %        app name                 symbol name
	41527    15.6550  git                      strlen
	30215    11.3905  git                      inflate
	27504    10.3685  git                      inflate_table
	20321     7.6607  git                      find_pack_entry_one
	16892     6.3680  git                      interesting
	16259     6.1294  vmlinux                  __copy_user_nocache
	16010     6.0355  git                      inflate_fast
	9240      3.4833  git                      get_mode
	8863      3.3412  git                      tree_entry_extract
	7145      2.6935  git                      strncmp
	7131      2.6883  git                      memcpy
	6863      2.5872  git                      diff_tree
	6113      2.3045  git                      adler32
	4515      1.7021  git                      _int_malloc
	3022      1.1392  git                      update_tree_entry
	...

(Adding up all of libz is still ~31%, but it's lower as a percentage *and* 
it's obviously a smaller percentage of a much lower absolute time, so the 
zlib overhead went down much more than any other git overheads did)

In general, this all seems very cool. The patches are simple enough that I 
think this is very safe to merge indeed: the only question I have is that 
somebody should verify that the "struct packed_git *p" is stable over the 
whole lifetime of a process - so that we can use it as a hash key without 
having to invalidate hashes if we unmap a pack (I *think* we just unmap 
the virtual mapping, and "struct packed_git *" stays valid, but Junio 
should ack that for me).

Here's the trivial patch to extend the caching to 1k entries if somebody 
cares. I don't know if the small added performance is worth it.

		Linus
---
diff --git a/sha1_file.c b/sha1_file.c
index a7e3a2a..372af60 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1352,7 +1352,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	return buffer;
 }
 
-#define MAX_DELTA_CACHE (256)
+#define MAX_DELTA_CACHE (1024)
 
 static struct delta_base_cache_entry {
 	struct packed_git *p;
@@ -1367,8 +1367,8 @@ static unsigned long pack_entry_hash(struct packed_git *p, off_t base_offset)
 	unsigned long hash;
 
 	hash = (unsigned long)p + (unsigned long)base_offset;
-	hash += (hash >> 8) + (hash >> 16);
-	return hash & 0xff;
+	hash += (hash >> 10) + (hash >> 20);
+	return hash & (MAX_DELTA_CACHE-1);
 }
 
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,

  reply	other threads:[~2007-03-17 21:46 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-16  1:04 cleaner/better zlib sources? Linus Torvalds
2007-03-16  1:10 ` Shawn O. Pearce
2007-03-16  1:11 ` Jeff Garzik
2007-03-16  1:14   ` Matt Mackall
2007-03-16  1:46   ` Linus Torvalds
2007-03-16  1:54     ` Linus Torvalds
2007-03-16  2:43       ` Davide Libenzi
2007-03-16  2:56         ` Linus Torvalds
2007-03-16  3:16           ` Davide Libenzi
2007-03-16 16:21             ` Linus Torvalds
2007-03-16 16:24               ` Davide Libenzi
2007-03-16 16:35                 ` Linus Torvalds
2007-03-16 19:21                   ` Davide Libenzi
2007-03-17  0:01                     ` Linus Torvalds
2007-03-17  1:11                       ` Linus Torvalds
2007-03-17  3:28                         ` Nicolas Pitre
2007-03-17  5:19                           ` Shawn O. Pearce
2007-03-17 17:55                           ` Linus Torvalds
2007-03-17 19:40                             ` Linus Torvalds
2007-03-17 19:42                               ` [PATCH 1/2] Make trivial wrapper functions around delta base generation and freeing Linus Torvalds
2007-03-17 19:44                               ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
2007-03-17 21:45                                 ` Linus Torvalds [this message]
2007-03-17 22:37                                   ` Junio C Hamano
2007-03-17 23:09                                     ` Linus Torvalds
2007-03-17 23:54                                       ` Linus Torvalds
2007-03-18  1:13                                     ` Nicolas Pitre
2007-03-18  7:47                                       ` Junio C Hamano
2007-03-17 23:12                                   ` Junio C Hamano
2007-03-17 23:24                                     ` Linus Torvalds
2007-03-17 23:52                                       ` Jon Smirl
2007-03-18  1:14                                   ` Morten Welinder
2007-03-18  1:29                                     ` Linus Torvalds
2007-03-18  1:38                                       ` Nicolas Pitre
2007-03-18  1:55                                         ` Linus Torvalds
2007-03-18  2:03                                           ` Nicolas Pitre
2007-03-18  2:20                                             ` Linus Torvalds
2007-03-18  3:00                                               ` Nicolas Pitre
2007-03-18  3:31                                                 ` Linus Torvalds
2007-03-18  5:30                                                   ` Julian Phillips
2007-03-18 17:23                                                     ` Linus Torvalds
2007-03-18 10:53                                                   ` Robin Rosenberg
2007-03-18 17:34                                                     ` Linus Torvalds
2007-03-18 18:29                                                       ` Robin Rosenberg
2007-03-18 21:25                                                         ` Shawn O. Pearce
2007-03-19 13:16                                                         ` David Brodsky
2007-03-20  6:35                                                           ` Robin Rosenberg
2007-03-20  9:13                                                             ` David Brodsky
2007-03-21  2:37                                                               ` Linus Torvalds
2007-03-21  2:54                                                                 ` Nicolas Pitre
2007-03-18  3:06                                               ` [PATCH 3/2] Avoid unnecessary strlen() calls Linus Torvalds
2007-03-18  9:45                                                 ` Junio C Hamano
2007-03-18 15:54                                                   ` Linus Torvalds
2007-03-18 15:57                                                     ` Linus Torvalds
2007-03-18 21:38                                                       ` Shawn O. Pearce
2007-03-18 21:48                                                         ` Linus Torvalds
2007-03-20  3:05                                                     ` Johannes Schindelin
2007-03-20  3:29                                                       ` Shawn O. Pearce
2007-03-20  3:40                                                         ` Shawn O. Pearce
2007-03-20  4:11                                                           ` Linus Torvalds
2007-03-20  4:18                                                             ` Shawn O. Pearce
2007-03-20  4:45                                                               ` Linus Torvalds
2007-03-20  5:44                                                             ` Junio C Hamano
2007-03-20  3:16                                                     ` Junio C Hamano
2007-03-20  4:31                                                       ` Linus Torvalds
2007-03-20  4:39                                                         ` Shawn O. Pearce
2007-03-20  4:57                                                           ` Linus Torvalds
2007-03-18  1:44                                       ` [PATCH 2/2] Implement a simple delta_base cache Linus Torvalds
2007-03-18  6:28                                     ` Avi Kivity
2007-03-17 22:44                                 ` Linus Torvalds
2007-03-16 16:35               ` cleaner/better zlib sources? Jeff Garzik
2007-03-16 16:42                 ` Matt Mackall
2007-03-16 16:51                 ` Linus Torvalds
2007-03-16 17:12                 ` Nicolas Pitre
2007-03-16 23:22                 ` Shawn O. Pearce
2007-03-16 17:06               ` Nicolas Pitre
2007-03-16 17:51                 ` Linus Torvalds
2007-03-16 18:09                   ` Nicolas Pitre
2007-03-16  1:33 ` Davide Libenzi
2007-03-16  2:06   ` Davide Libenzi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0703171420420.4964@woody.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).