git.vger.kernel.org archive mirror
* [PATCH] Limit the size of the new delta_base_cache
@ 2007-03-19  4:48 Shawn O. Pearce
  0 siblings, 0 replies; 9+ messages in thread
From: Shawn O. Pearce @ 2007-03-19  4:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

The new configuration variable core.deltaBaseCacheLimit allows the
user to control how much memory they are willing to give to Git for
caching base objects of deltas.  This is not normally meant to be
a user tweakable knob; the "out of the box" settings are meant to
be suitable for almost all workloads.

We default to 16 MiB under the assumption that the cache is not
meant to consume all of the user's available memory, and that the
cache's main purpose was to cache trees, for faster path limiters
during revision traversal.  Since trees tend to be relatively small
objects, this relatively small limit should still allow a large
number of objects.

On the other hand we don't want the cache to start storing 200
different versions of a 200 MiB blob, as this could easily blow
the entire address space of a 32 bit process.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 Documentation/config.txt |   13 +++++++++++++
 cache.h                  |    1 +
 config.c                 |    5 +++++
 environment.c            |    1 +
 sha1_file.c              |   23 ++++++++++++++++++++---
 5 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 953acae..8796929 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -240,6 +240,19 @@ the largest projects.  You probably do not need to adjust this value.
 +
 Common unit suffixes of 'k', 'm', or 'g' are supported.
 
+core.deltaBaseCacheLimit::
+	Maximum number of bytes to reserve for caching base objects
+	that multiple deltafied objects reference.  By storing the
+	entire decompressed base objects in a cache Git is able
+	to avoid unpacking and decompressing frequently used base
+	objects multiple times.
++
+Default is 16 MiB on all platforms.  This should be reasonable
+for all users/operating systems, except on the largest projects.
+You probably do not need to adjust this value.
++
+Common unit suffixes of 'k', 'm', or 'g' are supported.
+
 alias.*::
 	Command aliases for the gitlink:git[1] command wrapper - e.g.
 	after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/cache.h b/cache.h
index 3818e10..9a3c8c8 100644
--- a/cache.h
+++ b/cache.h
@@ -228,6 +228,7 @@ extern const char *apply_default_whitespace;
 extern int zlib_compression_level;
 extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
+extern size_t delta_base_cache_limit;
 extern int auto_crlf;
 
 #define GIT_REPO_VERSION 0
diff --git a/config.c b/config.c
index d537311..6479855 100644
--- a/config.c
+++ b/config.c
@@ -331,6 +331,11 @@ int git_default_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.deltabasecachelimit")) {
+		delta_base_cache_limit = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.autocrlf")) {
 		if (value && !strcasecmp(value, "input")) {
 			auto_crlf = -1;
diff --git a/environment.c b/environment.c
index 0151ad0..ef2f490 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,7 @@ const char *apply_default_whitespace;
 int zlib_compression_level = Z_DEFAULT_COMPRESSION;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
+size_t delta_base_cache_limit = 16 * 1024 * 1024;
 int pager_in_use;
 int pager_use_color = 1;
 int auto_crlf = 0;	/* 1: both ways, -1: only when adding git objects */
diff --git a/sha1_file.c b/sha1_file.c
index ee64865..6d8b86a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1354,6 +1354,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
 
 #define MAX_DELTA_CACHE (256)
 
+static size_t delta_base_cached;
 static struct delta_base_cache_entry {
 	struct packed_git *p;
 	off_t base_offset;
@@ -1384,9 +1385,10 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 	return unpack_entry(p, base_offset, type, base_size);
 
 found_cache_entry:
-	if (!keep_cache)
+	if (!keep_cache) {
 		ent->data = NULL;
-	else {
+		delta_base_cached -= ent->size;
+	} else {
 		ret = xmalloc(ent->size + 1);
 		memcpy(ret, ent->data, ent->size);
 		((char *)ret)[ent->size] = 0;
@@ -1401,9 +1403,24 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 {
 	unsigned long hash = pack_entry_hash(p, base_offset);
 	struct delta_base_cache_entry *ent = delta_base_cache + hash;
+	struct delta_base_cache_entry *f = delta_base_cache;
+	struct delta_base_cache_entry *e = f + ARRAY_SIZE(delta_base_cache);
 
-	if (ent->data)
+	if (ent->data) {
 		free(ent->data);
+		ent->data = NULL;
+		delta_base_cached -= ent->size;
+	}
+
+	delta_base_cached += base_size;
+	for (; delta_base_cached > delta_base_cache_limit && f < e; f++) {
+		if (f->data) {
+			free(f->data);
+			f->data = NULL;
+			delta_base_cached -= f->size;
+		}
+	}
+
 	ent->p = p;
 	ent->base_offset = base_offset;
 	ent->type = type;
-- 
1.5.0.5.1030.gc69a


* [PATCH] Limit the size of the new delta_base_cache
@ 2007-03-19  5:14 Shawn O. Pearce
  2007-03-19 16:10 ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Shawn O. Pearce @ 2007-03-19  5:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Round two, based on comments on IRC from Junio:

-->8--
[PATCH] Limit the size of the new delta_base_cache

The new configuration variable core.deltaBaseCacheLimit allows the
user to control how much memory they are willing to give to Git for
caching base objects of deltas.  This is not normally meant to be
a user tweakable knob; the "out of the box" settings are meant to
be suitable for almost all workloads.

We default to 16 MiB under the assumption that the cache is not
meant to consume all of the user's available memory, and that the
cache's main purpose was to cache trees, for faster path limiters
during revision traversal.  Since trees tend to be relatively small
objects, this relatively small limit should still allow a large
number of objects.

On the other hand we don't want the cache to start storing 200
different versions of a 200 MiB blob, as this could easily blow
the entire address space of a 32 bit process.

We evict OBJ_BLOB from the cache first (credit goes to Junio) as
we want to favor OBJ_TREE within the cache.  These are the objects
that have the highest inflate() startup penalty, as they tend to
be small and thus don't have that much of a chance to amortize
that penalty over the entire data.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 Documentation/config.txt |   13 +++++++++++++
 cache.h                  |    1 +
 config.c                 |    5 +++++
 environment.c            |    1 +
 sha1_file.c              |   31 ++++++++++++++++++++++++++-----
 5 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 953acae..8796929 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -240,6 +240,19 @@ the largest projects.  You probably do not need to adjust this value.
 +
 Common unit suffixes of 'k', 'm', or 'g' are supported.
 
+core.deltaBaseCacheLimit::
+	Maximum number of bytes to reserve for caching base objects
+	that multiple deltafied objects reference.  By storing the
+	entire decompressed base objects in a cache Git is able
+	to avoid unpacking and decompressing frequently used base
+	objects multiple times.
++
+Default is 16 MiB on all platforms.  This should be reasonable
+for all users/operating systems, except on the largest projects.
+You probably do not need to adjust this value.
++
+Common unit suffixes of 'k', 'm', or 'g' are supported.
+
 alias.*::
 	Command aliases for the gitlink:git[1] command wrapper - e.g.
 	after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/cache.h b/cache.h
index 3818e10..9a3c8c8 100644
--- a/cache.h
+++ b/cache.h
@@ -228,6 +228,7 @@ extern const char *apply_default_whitespace;
 extern int zlib_compression_level;
 extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
+extern size_t delta_base_cache_limit;
 extern int auto_crlf;
 
 #define GIT_REPO_VERSION 0
diff --git a/config.c b/config.c
index d537311..6479855 100644
--- a/config.c
+++ b/config.c
@@ -331,6 +331,11 @@ int git_default_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.deltabasecachelimit")) {
+		delta_base_cache_limit = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.autocrlf")) {
 		if (value && !strcasecmp(value, "input")) {
 			auto_crlf = -1;
diff --git a/environment.c b/environment.c
index 0151ad0..ef2f490 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,7 @@ const char *apply_default_whitespace;
 int zlib_compression_level = Z_DEFAULT_COMPRESSION;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
+size_t delta_base_cache_limit = 16 * 1024 * 1024;
 int pager_in_use;
 int pager_use_color = 1;
 int auto_crlf = 0;	/* 1: both ways, -1: only when adding git objects */
diff --git a/sha1_file.c b/sha1_file.c
index ee64865..0d039a3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1354,6 +1354,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
 
 #define MAX_DELTA_CACHE (256)
 
+static size_t delta_base_cached;
 static struct delta_base_cache_entry {
 	struct packed_git *p;
 	off_t base_offset;
@@ -1384,9 +1385,10 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 	return unpack_entry(p, base_offset, type, base_size);
 
 found_cache_entry:
-	if (!keep_cache)
+	if (!keep_cache) {
 		ent->data = NULL;
-	else {
+		delta_base_cached -= ent->size;
+	} else {
 		ret = xmalloc(ent->size + 1);
 		memcpy(ret, ent->data, ent->size);
 		((char *)ret)[ent->size] = 0;
@@ -1396,14 +1398,33 @@ found_cache_entry:
 	return ret;
 }
 
+static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
+{
+	if (ent->data) {
+		free(ent->data);
+		ent->data = NULL;
+		delta_base_cached -= ent->size;
+	}
+}
+
 static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	void *base, unsigned long base_size, enum object_type type)
 {
-	unsigned long hash = pack_entry_hash(p, base_offset);
+	unsigned long i, hash = pack_entry_hash(p, base_offset);
 	struct delta_base_cache_entry *ent = delta_base_cache + hash;
 
-	if (ent->data)
-		free(ent->data);
+	release_delta_base_cache(ent);
+	delta_base_cached += base_size;
+	for (i = 0; delta_base_cached > delta_base_cache_limit
+		&& i < ARRAY_SIZE(delta_base_cache); i++) {
+		struct delta_base_cache_entry *f = delta_base_cache + i;
+		if (f->type == OBJ_BLOB)
+			release_delta_base_cache(f);
+	}
+	for (i = 0; delta_base_cached > delta_base_cache_limit
+		&& i < ARRAY_SIZE(delta_base_cache); i++)
+		release_delta_base_cache(delta_base_cache + i);
+
 	ent->p = p;
 	ent->base_offset = base_offset;
 	ent->type = type;
-- 
1.5.0.5.1030.gc69a


* Re: [PATCH] Limit the size of the new delta_base_cache
  2007-03-19  5:14 [PATCH] Limit the size of the new delta_base_cache Shawn O. Pearce
@ 2007-03-19 16:10 ` Linus Torvalds
  2007-03-19 16:41   ` Nicolas Pitre
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2007-03-19 16:10 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git



On Mon, 19 Mar 2007, Shawn O. Pearce wrote:
>
> Round two, based on comments on IRC from Junio:

One more change: please don't even *add* objects to the cache that are 
bigger than 10-20% of the cache limit!

If you start adding big objects, and you have some cache limit, that will 
make the cache seriously less useful by causing eviction of potentially 
much more interesting objects, and also obviously causing the cache code 
itself to spend much more time picking objects (since it's enough to have 
just a few big objects to make the cache eviction decide it needs to 
evict).

Limiting by size is also effective since anything that is more than a 
megabyte is likely to be a blob anyway, and thus much less useful for 
caching in the first place. So there are just tons of reasons to say 
"don't even add it in the first place" if you decide to go with any limit 
at all.
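
[A minimal sketch of the guard suggested here, reusing the names from
Shawn's patch above.  The one-tenth cut-off is only illustrative, and it
assumes, as the surrounding code suggests, that the cache owns "base"
once add_delta_base_cache() is called, so an uncached base is freed here:]

	/* At the top of add_delta_base_cache(), before the existing
	 * insertion and eviction code from the patch above: */
	if (base_size > delta_base_cache_limit / 10) {
		free(base);	/* too big to be worth caching */
		return;
	}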

		Linus


* Re: [PATCH] Limit the size of the new delta_base_cache
  2007-03-19 16:10 ` Linus Torvalds
@ 2007-03-19 16:41   ` Nicolas Pitre
  2007-03-19 16:54     ` Nicolas Pitre
  2007-03-19 17:07     ` Linus Torvalds
  0 siblings, 2 replies; 9+ messages in thread
From: Nicolas Pitre @ 2007-03-19 16:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn O. Pearce, Junio C Hamano, git

On Mon, 19 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Mon, 19 Mar 2007, Shawn O. Pearce wrote:
> >
> > Round two, based on comments on IRC from Junio:
> 
> One more change: please don't even *add* objects to the cache that are 
> bigger than 10-20% of the cache limit!
> 
> If you start adding big objects, and you have some cache limit, that will 
> make the cache seriously less useful by causing eviction of potentially 
> much more interesting objects, and also obviously causing the cache code 
> itself to spend much more time picking objects (since it's enough to have 
> just a few big objects to make the cache eviction decide it needs to 
> evict).
> 
> Limiting by size is also effective since anything that is more than a 
> megabyte is likely to be a blob anyway, and thus much less useful for 
> caching in the first place. So there are just tons of reasons to say 
> "don't even add it in the first place" if you decide to go with any limit 
> at all.

On the other hand......

Either you parse blobs or you don't.  If you only do logs with path 
limiting then you won't add blobs to the cache anyway.

If you do end up adding blobs to the cache that means you have blob 
deltas to resolve, and even that operation should benefit from the cache 
regardless of the object size.

In fact, the bigger the object, the more effective the cache will be.
You certainly don't want a complete breakdown in performance just
because a blob crossed the 20% threshold.

And because we usually walk objects from newest to oldest, and because
deltas are usually oriented in the same direction, we only need to tweak
the current eviction loop a bit so that, on average, the oldest objects
are evicted first; that way the current base will still be there for the
next delta depth the next time around.  Given that the hash incorporates
the object's offset, that means starting the loop at the next entry
index instead of zero, which should do the trick pretty well.
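
[Roughly, the tweak being described, expressed against the eviction loop
in Shawn's second patch; a sketch only, not code from this thread:]

	/* Start evicting at the slot after the one just filled and wrap
	 * around, instead of always scanning from index zero. */
	unsigned long i = hash;
	while (delta_base_cached > delta_base_cache_limit) {
		i = (i + 1) % MAX_DELTA_CACHE;
		if (i == hash)
			break;	/* every other slot has been visited once */
		release_delta_base_cache(delta_base_cache + i);
	}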

Of course if you end up in a condition where you have to prune the cache
continuously, you'll spend more cycles picking the object to evict, but
that is likely to be far less work than re-entering the O(n**2) behavior
with deflate that we had without the cache, which would be even worse
here since we are talking about big objects in this case.

So I wouldn't add any rule of that sort unless it is actually proven to 
be bad.


Nicolas


* Re: [PATCH] Limit the size of the new delta_base_cache
  2007-03-19 16:41   ` Nicolas Pitre
@ 2007-03-19 16:54     ` Nicolas Pitre
  2007-03-19 17:18       ` Linus Torvalds
  2007-03-19 17:07     ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Nicolas Pitre @ 2007-03-19 16:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn O. Pearce, Junio C Hamano, git

On Mon, 19 Mar 2007, Nicolas Pitre wrote:

> And because we usually walk objects from newest to oldest, and because
> deltas are usually oriented in the same direction, we only need to tweak
> the current eviction loop a bit so that, on average, the oldest objects
> are evicted first; that way the current base will still be there for the
> next delta depth the next time around.  Given that the hash incorporates
> the object's offset, that means starting the loop at the next entry
> index instead of zero, which should do the trick pretty well.

OK, two flaws above: to be clear, it is the newest objects in terms of
absolute age that should be evicted first, which means the oldest to
have been cached.  Also, the hash cannot be representative of the object
ordering, because it is reduced to 8 bits and objects are often more
than 256 bytes apart, making this hash-based ordering essentially
random.  A simple LRU might be quite effective instead.
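
[One way to picture a simple LRU on top of the existing 256-slot table;
a sketch only, using a hypothetical last_used field and tick counter
that are not part of any patch in this thread:]

	/* Stamp each entry with ++delta_base_cache_tick in a new
	 * last_used field on every cache hit and insertion, then evict
	 * the least recently stamped entries until the cache fits under
	 * the limit again. */
	static unsigned long delta_base_cache_tick;

	static void evict_lru_delta_base(void)
	{
		while (delta_base_cached > delta_base_cache_limit) {
			struct delta_base_cache_entry *victim = NULL;
			int i;
			for (i = 0; i < MAX_DELTA_CACHE; i++) {
				struct delta_base_cache_entry *f = delta_base_cache + i;
				if (f->data && (!victim || f->last_used < victim->last_used))
					victim = f;
			}
			if (!victim)
				break;
			release_delta_base_cache(victim);
		}
	}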


Nicolas


* Re: [PATCH] Limit the size of the new delta_base_cache
  2007-03-19 16:41   ` Nicolas Pitre
  2007-03-19 16:54     ` Nicolas Pitre
@ 2007-03-19 17:07     ` Linus Torvalds
  2007-03-19 17:17       ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2007-03-19 17:07 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Shawn O. Pearce, Junio C Hamano, git



On Mon, 19 Mar 2007, Nicolas Pitre wrote:
> 
> In fact, the bigger the object, the more effective the cache will be.

Yes. BUT ONLY IF THE CACHE ISN'T SIZE-LIMITED!

Once you limit the size of the cache, you change the whole equation.

It's usually *more* expensive to re-parse two half-sized objects than it 
is to parse one large object. So if you have to choose between 
throwing out two small objects or throwing out one large one, you 
should choose the single large one.

Which totally throws your argument out of the window. It's simply not true 
any more: the cache will *not* be more effective the larger the objects 
are, because you are ignoring the fact that adding a large object will 
*remove* many small ones.

So if I were Junio, I wouldn't accept your patch as it is now. Your 
argument is simply fundamentally flawed.

		Linus


* Re: [PATCH] Limit the size of the new delta_base_cache
  2007-03-19 17:07     ` Linus Torvalds
@ 2007-03-19 17:17       ` Linus Torvalds
  2007-03-19 18:08         ` Nicolas Pitre
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2007-03-19 17:17 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Shawn O. Pearce, Junio C Hamano, git



On Mon, 19 Mar 2007, Linus Torvalds wrote:
> 
> Which totally throws your argument out of the window. It's simply not true 
> any more: the cache will *not* be more effective the larger the objects 
> are, because you are ignoring the fact that adding a large object will 
> *remove* many small ones.

And btw, if you are adding a delta-base object that is bigger than 10%-20% 
(those particular numbers taken out of where the sun don't shine, but they 
are reasonable) of your cache size, you are pretty much *guaranteed* to be 
removing many small ones.

Why? 

The whole *point* of the delta-base cache is that it wants to avoid the 
O(n**2) costs of the long delta chains. So a delta-base object on its own 
is almost not interesting for the cache: the cache comes into its own only 
if you have a *chain* of delta-base objects.

So rather than thinking about "one big object", you need to think about "a 
chain of objects", and realize that if one of them is big, then the others 
will be too (otherwise they'd not have generated a delta-chain in the 
first place).

So at an object size of 10-20% of the total allowed cache size, and a 
total cache _index_ of 256 entries, if you start adding big objects, you 
can pretty much be guaranteed that either

 (a) the delta-base cache won't be effective for the big object anyway, 
     since there's just one or two of them (and the cache size limit isn't 
     triggered)
OR
 (b) you start probing the cache size limit, and you'll be throwing out 
     many small objects.

Finally, since the delta-chain cache is likely not as useful for blobs
anyway (you don't tend to use tons of big related blobs, and if you do,
the real cost is likely the xdl diff generation between them rather than
the object creation itself!), you're also likely optimizing the wrong
case, since big delta-base objects are almost certainly going to be blobs.

And no, I don't have any "hard numbers" to back this up, but if you want 
another argument, realize that delta chains are limited to a depth of ten, 
and if a *single* chain can overflow the delta-chain cache, then the cache 
ends up being 100% *in*effective. And do the math: if the object size is 
10-20% of the total allowed cache size, how many objects in the chain are 
you going to fit in the cache?

So there are multiple different reasons to think that big objects are 
actually *bad* for caching when you have a total size limit, not good.

		Linus


* Re: [PATCH] Limit the size of the new delta_base_cache
  2007-03-19 16:54     ` Nicolas Pitre
@ 2007-03-19 17:18       ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2007-03-19 17:18 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Shawn O. Pearce, Junio C Hamano, git



On Mon, 19 Mar 2007, Nicolas Pitre wrote:
>
> A simple LRU might be quite effective instead.

I definitely agree on that. A simple LRU is probably the right replacement 
strategy.

		Linus


* Re: [PATCH] Limit the size of the new delta_base_cache
  2007-03-19 17:17       ` Linus Torvalds
@ 2007-03-19 18:08         ` Nicolas Pitre
  0 siblings, 0 replies; 9+ messages in thread
From: Nicolas Pitre @ 2007-03-19 18:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn O. Pearce, Junio C Hamano, git


First, this is Shawn's patch, not mine.  But I still don't think it is
that bad even as is, and it was already merged by Junio.  I believe it
can be made better without segregating big objects, though.

On Mon, 19 Mar 2007, Linus Torvalds wrote:

> And no, I don't have any "hard numbers" to back this up, but if you want 
> another argument, realize that delta chains are limited to a depth of ten, 
> and if a *single* chain can overflow the delta-chain cache, then the cache 
> ends up being 100% *in*effective.

No.  You forget what I said about delta direction.

Deltas are meant to be ordered so that a newer object is the base for an
older object.  And we usually parse objects from newer to older ones,
too.

So let's suppose the cache has only two entries.  If there were only
one delta chain in the pack, laid out ideally so that the chain goes
deeper with older objects (as we currently intend), then the cache would
still be perfectly effective: as we move further back in history,
objects deeper in the delta chain are requested, and the delta base one
level up will still be there to be used just before being evicted.  And
so on all through the delta chain.

> And do the math: if the object size is 
> 10-20% of the total allowed cache size, how many objects in the chain are 
> you going to fit in the cache?

Only 5 objects in the 20% case.  But as I said, you could effectively
have only _one_ entry in the cache and it would still be 100% effective
in the single perfect delta chain scenario I described above.

Of course in practice there is more than a single delta chain, and the
delta direction might be the other way around.  But on _average_ it
should work pretty well.

And notice how blobs are the first victims for cache eviction.  So for
a mixed tree+blob usage case, if big blobs are to trash the cache, then
with LRU eviction performance should auto-regulate itself quite well.
Of course a single blob might clear the cache entirely in the worst
case.  But there is a possibility that this big blob could be the base
for the next delta.  Adding a size threshold simply throws that
possibility away, in which case your performance will suffer _anyway_,
because you'll experience the O(n**2) behavior with that delta chain of
big blobs.  Compared to that, the lost cache of small objects won't even
be on the radar.  But if you let large objects reach the cache then
there is at least some possibility that those large objects might be the
next wanted delta base, saving big on performance, especially if that
base was itself a deep delta object.

So to sum up:

 - the issue concerns only operations needing blob parsing

 - if large deltified blobs are denied the cache, you will _always_ have
   a sudden and drastic performance degradation when the size threshold
   is crossed, as the O(n**2) behavior on those big blob deltas will
   dominate the profile

 - if you instead don't impose any threshold, then performance will
   degrade smoothly with object size, down to the absolute worst case of
   a single object in the cache, and in some cases it might even be
   quite effective!

I think it is therefore best to let the cache regulate itself with LRU
eviction rather than using an arbitrary size threshold to segregate
objects.  The cache will quickly repopulate itself if the big blob
happens to be an oddball in the repository anyway.


Nicolas
