* [PATCH 1/5] Don't try to delta if target is much smaller than source
2007-07-12 3:14 [PATCH 0/5] Memory-limited pack-object window support Brian Downing
@ 2007-07-12 3:14 ` Brian Downing
2007-07-12 3:14 ` [PATCH 2/5] Support fetching the memory usage of a delta index Brian Downing
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Brian Downing @ 2007-07-12 3:14 UTC
To: git; +Cc: Junio C Hamano, Brian Downing
Add a new try_delta heuristic: Don't bother trying to make a delta if
the target object size is much smaller (currently 1/32) than the source,
as it's very likely not going to get a match. Even if it does, you will
have to read at least 32x the size of the new file to reassemble it,
which isn't such a good deal. This leads to a considerable performance
improvement when deltifying a mix of small and large files with a very
large window, because you don't have to wait for the large files to
percolate out of the window before things start going fast again.
Signed-off-by: Brian Downing <bdowning@lavos.net>
---
builtin-pack-objects.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 54b9d26..132ce96 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1342,6 +1342,8 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
sizediff = src_size < trg_size ? trg_size - src_size : 0;
if (sizediff >= max_size)
return 0;
+ if (trg_size < src_size / 32)
+ return 0;
/* Load data if not already done */
if (!trg->data) {
--
1.5.2.GIT
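
A minimal sketch of the size check this patch adds, pulled out into a standalone function so the boundary behaviour is easy to see. The helper name target_too_small is hypothetical and does not appear in the patch; only the comparison itself is taken from the hunk above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not in the patch): returns true when try_delta()
 * would bail out under the new heuristic.  The division is integer
 * division, exactly as in the posted hunk, so src_size / 32 truncates.
 */
static bool target_too_small(unsigned long trg_size, unsigned long src_size)
{
	return trg_size < src_size / 32;
}
```

With a 3200-byte source, a 99-byte target is skipped while a 100-byte target is still tried. Because of the truncation, any source under 32 bytes never triggers the check.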
* [PATCH 2/5] Support fetching the memory usage of a delta index
2007-07-12 3:14 [PATCH 0/5] Memory-limited pack-object window support Brian Downing
2007-07-12 3:14 ` [PATCH 1/5] Don't try to delta if target is much smaller than source Brian Downing
@ 2007-07-12 3:14 ` Brian Downing
2007-07-12 3:14 ` [PATCH 3/5] Add pack-objects window memory usage limit Brian Downing
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Brian Downing @ 2007-07-12 3:14 UTC
To: git; +Cc: Junio C Hamano, Brian Downing
Delta indexes, at least on 64-bit platforms, tend to be larger than
the actual uncompressed data. As such, keeping track of this storage
is important if you want to successfully limit the memory size of your
pack window.

Squirrel away the total allocation size inside the delta_index struct,
and add an accessor "sizeof_delta_index" to access it.
Signed-off-by: Brian Downing <bdowning@lavos.net>
---
delta.h | 7 +++++++
diff-delta.c | 10 ++++++++++
2 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/delta.h b/delta.h
index 7b3f86d..40ccf5a 100644
--- a/delta.h
+++ b/delta.h
@@ -24,6 +24,13 @@ create_delta_index(const void *buf, unsigned long bufsize);
extern void free_delta_index(struct delta_index *index);
/*
+ * sizeof_delta_index: returns memory usage of delta index
+ *
+ * Given pointer must be what create_delta_index() returned, or NULL.
+ */
+extern unsigned long sizeof_delta_index(struct delta_index *index);
+
+/*
* create_delta: create a delta from given index for the given buffer
*
* This function may be called multiple times with different buffers using
diff --git a/diff-delta.c b/diff-delta.c
index faf96e4..3af5835 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -119,6 +119,7 @@ struct index_entry {
};
struct delta_index {
+ unsigned long memsize;
const void *src_buf;
unsigned long src_size;
unsigned int hash_mask;
@@ -159,6 +160,7 @@ struct delta_index * create_delta_index(const void *buf, unsigned long bufsize)
mem = hash + hsize;
entry = mem;
+ index->memsize = memsize;
index->src_buf = buf;
index->src_size = bufsize;
index->hash_mask = hmask;
@@ -228,6 +230,14 @@ void free_delta_index(struct delta_index *index)
free(index);
}
+unsigned long sizeof_delta_index(struct delta_index *index)
+{
+ if (index)
+ return index->memsize;
+ else
+ return 0;
+}
+
/*
* The maximum size for any opcode sequence, including the initial header
* plus Rabin window plus biggest copy.
--
1.5.2.GIT
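
The pattern used here (record the allocation size at creation time, expose it through a NULL-safe accessor) can be sketched in isolation. Everything named toy_* below is a hypothetical stand-in; the real struct delta_index has more fields and a different layout.

```c
#include <stdlib.h>

/* Toy stand-in for struct delta_index (assumption: simplified layout). */
struct toy_index {
	unsigned long memsize;    /* total bytes allocated for this index */
	unsigned int nr_entries;
	unsigned int entries[];   /* flexible array member, C99 */
};

/* Allocate header and entries in one block and remember the block size. */
static struct toy_index *toy_index_create(unsigned int nr_entries)
{
	unsigned long memsize = sizeof(struct toy_index) +
				nr_entries * sizeof(unsigned int);
	struct toy_index *index = malloc(memsize);

	if (!index)
		return NULL;
	index->memsize = memsize;
	index->nr_entries = nr_entries;
	return index;
}

/* NULL-safe accessor, mirroring the shape of sizeof_delta_index(). */
static unsigned long toy_index_sizeof(const struct toy_index *index)
{
	return index ? index->memsize : 0;
}
```

Accepting NULL lets a caller subtract the size unconditionally before freeing, without first checking whether an index was ever built.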
* [PATCH 3/5] Add pack-objects window memory usage limit
2007-07-12 3:14 [PATCH 0/5] Memory-limited pack-object window support Brian Downing
2007-07-12 3:14 ` [PATCH 1/5] Don't try to delta if target is much smaller than source Brian Downing
2007-07-12 3:14 ` [PATCH 2/5] Support fetching the memory usage of a delta index Brian Downing
@ 2007-07-12 3:14 ` Brian Downing
2007-07-12 4:25 ` Nicolas Pitre
2007-07-12 3:14 ` [PATCH 4/5] Add --window-bytes option to git-repack Brian Downing
` (2 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Brian Downing @ 2007-07-12 3:14 UTC
To: git; +Cc: Junio C Hamano, Brian Downing
This adds an option (--window-bytes=N) and configuration variable
(pack.windowBytes = N) to limit the memory size of the pack-objects
delta search window. This works by removing the oldest unpacked objects
whenever the total size goes above the limit. It will always leave
at least one object, though, so as not to completely eliminate the
possibility of computing deltas.

This is an extra limit on top of the normal window size (--window=N);
the window will not dynamically grow above the fixed number of entries
specified to fill the memory limit.

With this, repacking a repository with a mix of large and small objects
is possible even with a very large window.
Signed-off-by: Brian Downing <bdowning@lavos.net>
---
builtin-pack-objects.c | 56 ++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 49 insertions(+), 7 deletions(-)
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 132ce96..6e441b7 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -16,8 +16,9 @@
#include "progress.h"
static const char pack_usage[] = "\
-git-pack-objects [{ -q | --progress | --all-progress }] [--max-pack-size=N] \n\
- [--local] [--incremental] [--window=N] [--depth=N] \n\
+git-pack-objects [{ -q | --progress | --all-progress }] \n\
+ [--max-pack-size=N] [--local] [--incremental] \n\
+ [--window=N] [--window-bytes=N] [--depth=N] \n\
[--no-reuse-delta] [--no-reuse-object] [--delta-base-offset] \n\
[--non-empty] [--revs [--unpacked | --all]*] [--reflog] \n\
[--stdout | base-name] [<ref-list | <object-list]";
@@ -79,6 +80,9 @@ static unsigned long delta_cache_size = 0;
static unsigned long max_delta_cache_size = 0;
static unsigned long cache_max_small_delta_size = 1000;
+static unsigned long window_memory_usage = 0;
+static unsigned long window_memory_limit = 0;
+
/*
* The object names in objects array are hashed with this hashtable,
* to help looking up the entry by object name.
@@ -1351,12 +1355,14 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
if (sz != trg_size)
die("object %s inconsistent object length (%lu vs %lu)",
sha1_to_hex(trg_entry->idx.sha1), sz, trg_size);
+ window_memory_usage += sz;
}
if (!src->data) {
src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
if (sz != src_size)
die("object %s inconsistent object length (%lu vs %lu)",
sha1_to_hex(src_entry->idx.sha1), sz, src_size);
+ window_memory_usage += sz;
}
if (!src->index) {
src->index = create_delta_index(src->data, src_size);
@@ -1366,6 +1372,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
warning("suboptimal pack - out of memory");
return 0;
}
+ window_memory_usage += sizeof_delta_index(src->index);
}
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
@@ -1408,9 +1415,22 @@ static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)
return m;
}
+static void free_unpacked(struct unpacked *n)
+{
+ window_memory_usage -= sizeof_delta_index(n->index);
+ free_delta_index(n->index);
+ n->index = NULL;
+ if (n->data) {
+ free(n->data);
+ n->data = NULL;
+ window_memory_usage -= n->entry->size;
+ }
+ n->entry = NULL;
+}
+
static void find_deltas(struct object_entry **list, int window, int depth)
{
- uint32_t i = nr_objects, idx = 0, processed = 0;
+ uint32_t i = nr_objects, idx = 0, count = 0, processed = 0;
unsigned int array_size = window * sizeof(struct unpacked);
struct unpacked *array;
int max_depth;
@@ -1445,12 +1465,21 @@ static void find_deltas(struct object_entry **list, int window, int depth)
if (entry->no_try_delta)
continue;
- free_delta_index(n->index);
- n->index = NULL;
- free(n->data);
- n->data = NULL;
+ free_unpacked(n);
n->entry = entry;
+ while (window_memory_limit &&
+ window_memory_usage > window_memory_limit &&
+ count > 1) {
+ uint32_t tail = idx - count;
+ if (tail > idx) {
+ tail += window + 1;
+ tail %= window;
+ }
+ free_unpacked(array + tail);
+ count--;
+ }
+
/*
* If the current object is at pack edge, take the depth the
* objects that depend on the current object into account
@@ -1485,6 +1514,8 @@ static void find_deltas(struct object_entry **list, int window, int depth)
next:
idx++;
+ if (count < window)
+ count++;
if (idx >= window)
idx = 0;
} while (i > 0);
@@ -1523,6 +1554,10 @@ static int git_pack_config(const char *k, const char *v)
window = git_config_int(k, v);
return 0;
}
+ if(!strcmp(k, "pack.windowbytes")) {
+ window_memory_limit = git_config_int(k, v);
+ return 0;
+ }
if(!strcmp(k, "pack.depth")) {
depth = git_config_int(k, v);
return 0;
@@ -1699,6 +1734,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
usage(pack_usage);
continue;
}
+ if (!prefixcmp(arg, "--window-bytes=")) {
+ char *end;
+ window_memory_limit = strtoul(arg+15, &end, 0);
+ if (!arg[15] || *end)
+ usage(pack_usage);
+ continue;
+ }
if (!prefixcmp(arg, "--depth=")) {
char *end;
depth = strtoul(arg+8, &end, 0);
--
1.5.2.GIT
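
The eviction policy described in the commit message (drop the oldest entries while over the limit, but always keep at least one) can be illustrated with a small model. This is a sketch under stated assumptions, not the patch's code: the toy_window type is hypothetical, only object sizes are tracked (no delta indexes), sizes are assumed nonzero, and the tail slot is found with a modulo expression rather than the arithmetic in the hunk above.

```c
#define TOY_WINDOW 4

/* Hypothetical model of the delta window: only byte counts are tracked. */
struct toy_window {
	unsigned long size[TOY_WINDOW]; /* bytes per slot, 0 means empty */
	unsigned int idx;               /* slot the next object goes into */
	unsigned int count;             /* populated slots, newest included */
	unsigned long usage;            /* sum of size[] */
	unsigned long limit;            /* 0 means unlimited */
};

static void toy_window_add(struct toy_window *w, unsigned long bytes)
{
	/* A populated slot at idx means the ring is full: overwrite it. */
	if (w->size[w->idx]) {
		w->usage -= w->size[w->idx];
		w->count--;
	}
	w->size[w->idx] = bytes;
	w->usage += bytes;
	w->count++;

	/* Evict oldest first; never evict the object just added. */
	while (w->limit && w->usage > w->limit && w->count > 1) {
		unsigned int tail =
			(w->idx + TOY_WINDOW - (w->count - 1)) % TOY_WINDOW;
		w->usage -= w->size[tail];
		w->size[tail] = 0;
		w->count--;
	}
	w->idx = (w->idx + 1) % TOY_WINDOW;
}
```

With a limit of 100, adding objects of 40, 40 and 40 bytes leaves two entries (80 bytes). Adding a 500-byte object then evicts everything else but keeps the large object itself, which matches the "always leave at least one object" rule.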
* Re: [PATCH 3/5] Add pack-objects window memory usage limit
2007-07-12 3:14 ` [PATCH 3/5] Add pack-objects window memory usage limit Brian Downing
@ 2007-07-12 4:25 ` Nicolas Pitre
2007-07-12 10:02 ` Brian Downing
0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Pitre @ 2007-07-12 4:25 UTC
To: Brian Downing; +Cc: git, Junio C Hamano
On Wed, 11 Jul 2007, Brian Downing wrote:
> + while (window_memory_limit &&
> + window_memory_usage > window_memory_limit &&
> + count > 1) {
> + uint32_t tail = idx - count;
> + if (tail > idx) {
> + tail += window + 1;
> + tail %= window;
> + }
> + free_unpacked(array + tail);
> + count--;
> + }
This is bogus. Suppose window = 10 and only array entries 8, 9, 0, 1
and 2 are populated. In that case idx = 2 and count should be 4 (not
counting the current entry yet). You want to evict entry 8.

 -- tail = 2 - 4 = -2 (or a big uint32_t value)
 -- tail > idx is true
 -- tail += window + 1 -> -2 + 10 + 1 = 9
 -- tail %= window is useless
 -- you free entry 9 instead of entry 8.

Instead, you should do:

	tail = idx - count;
	if (tail > idx)
		tail += window;

or even:

	tail = (idx + window - count) % window;

> next:
> idx++;
> + if (count < window)
> + count++;

And of course you want:

	if (count + 1 < window)
		count++;

So as not to count the new entry when the window gets full.
Nicolas
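
The arithmetic in this review can be checked mechanically. The sketch below wraps the posted computation and the two suggested replacements in functions; the function names are hypothetical, and the bodies are transcribed from the quoted hunk and the suggestions above.

```c
#include <stdint.h>

/* Tail computation as posted in the patch. */
static uint32_t tail_as_posted(uint32_t idx, uint32_t count, uint32_t window)
{
	uint32_t tail = idx - count; /* wraps around when count > idx */

	if (tail > idx) {
		tail += window + 1;
		tail %= window;
	}
	return tail;
}

/* First suggestion: undo the unsigned wraparound by adding window. */
static uint32_t tail_add_window(uint32_t idx, uint32_t count, uint32_t window)
{
	uint32_t tail = idx - count;

	if (tail > idx)
		tail += window;
	return tail;
}

/* Second suggestion: bias by window first, then reduce modulo window. */
static uint32_t tail_modulo(uint32_t idx, uint32_t count, uint32_t window)
{
	return (idx + window - count) % window;
}
```

For window = 10, idx = 2 and count = 4, the posted version yields 9 while both suggestions yield 8. When no wraparound occurs (for example idx = 5, count = 3), all three agree on 2.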
* Re: [PATCH 3/5] Add pack-objects window memory usage limit
2007-07-12 4:25 ` Nicolas Pitre
@ 2007-07-12 10:02 ` Brian Downing
0 siblings, 0 replies; 10+ messages in thread
From: Brian Downing @ 2007-07-12 10:02 UTC
To: Nicolas Pitre; +Cc: git, Junio C Hamano
On Thu, Jul 12, 2007 at 12:25:54AM -0400, Nicolas Pitre wrote:
> On Wed, 11 Jul 2007, Brian Downing wrote:
>
> > + while (window_memory_limit &&
> > + window_memory_usage > window_memory_limit &&
> > + count > 1) {
> > + uint32_t tail = idx - count;
> > + if (tail > idx) {
> > + tail += window + 1;
> > + tail %= window;
> > + }
> > + free_unpacked(array + tail);
> > + count--;
> > + }
>
> This is bogus. Suppose window = 10 and only array entries 8, 9, 0, 1
> and 2 are populated. In that case idx = 2 and count should be 4 (not
> counting the current entry yet). You want to evict entry 8.
The current idx has already been depopulated by the time that code is
run, and count is probably one higher than you are expecting, so this
does actually work.

However, looking at it again, I think if the window hasn't been saturated
yet in my current code count will be what you expect in this situation
and it will screw up as you describe.

Besides, it is admittedly clumsy as hell (a common affliction when
dealing with circular buffers for me it seems). I'll see if I can get
something better that works.
Thanks,
-bcd
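
The saturated case mentioned here can be checked with a short sketch. It assumes, as the reply says, that count equals window once the ring is full, so count includes the current slot. The function names are hypothetical; posted_tail transcribes the arithmetic from the quoted hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the posted hunk, wrapped in a function. */
static uint32_t posted_tail(uint32_t idx, uint32_t count, uint32_t window)
{
	uint32_t tail = idx - count;

	if (tail > idx) {
		tail += window + 1;
		tail %= window;
	}
	return tail;
}

/*
 * For a full ring (count == window), check that the posted arithmetic
 * always selects the slot right after idx, which holds the oldest entry.
 */
static bool posted_tail_ok_when_full(uint32_t window)
{
	for (uint32_t idx = 0; idx < window; idx++)
		if (posted_tail(idx, window, window) != (idx + 1) % window)
			return false;
	return true;
}
```

With window = 10, idx = 2 and count = 10 the result is slot 3, the oldest entry in a full ring. With count = 4, as in the review's scenario, the same arithmetic gives 9 rather than 8.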
* [PATCH 4/5] Add --window-bytes option to git-repack
2007-07-12 3:14 [PATCH 0/5] Memory-limited pack-object window support Brian Downing
` (2 preceding siblings ...)
2007-07-12 3:14 ` [PATCH 3/5] Add pack-objects window memory usage limit Brian Downing
@ 2007-07-12 3:14 ` Brian Downing
2007-07-12 3:14 ` [PATCH 5/5] Add documentation for --window-bytes, pack.windowBytes Brian Downing
2007-07-12 4:38 ` [PATCH 0/5] Memory-limited pack-object window support Nicolas Pitre
5 siblings, 0 replies; 10+ messages in thread
From: Brian Downing @ 2007-07-12 3:14 UTC
To: git; +Cc: Junio C Hamano, Brian Downing
Signed-off-by: Brian Downing <bdowning@lavos.net>
---
git-repack.sh | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/git-repack.sh b/git-repack.sh
index b5c6671..4cff812 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -3,7 +3,7 @@
# Copyright (c) 2005 Linus Torvalds
#
-USAGE='[-a] [-d] [-f] [-l] [-n] [-q] [--max-pack-size=N] [--window=N] [--depth=N]'
+USAGE='[-a] [-d] [-f] [-l] [-n] [-q] [--max-pack-size=N] [--window=N] [--window-bytes=N] [--depth=N]'
SUBDIRECTORY_OK='Yes'
. git-sh-setup
@@ -20,6 +20,7 @@ do
-l) local=--local ;;
--max-pack-size=*) extra="$extra $1" ;;
--window=*) extra="$extra $1" ;;
+ --window-bytes=*) extra="$extra $1" ;;
--depth=*) extra="$extra $1" ;;
*) usage ;;
esac
--
1.5.2.GIT
* [PATCH 5/5] Add documentation for --window-bytes, pack.windowBytes
2007-07-12 3:14 [PATCH 0/5] Memory-limited pack-object window support Brian Downing
` (3 preceding siblings ...)
2007-07-12 3:14 ` [PATCH 4/5] Add --window-bytes option to git-repack Brian Downing
@ 2007-07-12 3:14 ` Brian Downing
2007-07-12 4:35 ` Nicolas Pitre
2007-07-12 4:38 ` [PATCH 0/5] Memory-limited pack-object window support Nicolas Pitre
5 siblings, 1 reply; 10+ messages in thread
From: Brian Downing @ 2007-07-12 3:14 UTC
To: git; +Cc: Junio C Hamano, Brian Downing
Signed-off-by: Brian Downing <bdowning@lavos.net>
---
Documentation/config.txt | 5 +++++
Documentation/git-pack-objects.txt | 8 ++++++++
Documentation/git-repack.txt | 8 ++++++++
3 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index aeece84..83c7dc1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -592,6 +592,11 @@ pack.depth::
The maximum delta depth used by gitlink:git-pack-objects[1] when no
maximum depth is given on the command line. Defaults to 50.
+pack.windowBytes::
+ This option provides an additional limit on top of `pack.window`;
+ the window size will dynamically scale down so as to not take
+ up more than N bytes in memory.
+
pack.compression::
An integer -1..9, indicating the compression level for objects
in a pack file. -1 is the zlib default. 0 means no
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index e3549b5..21ed198 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -85,6 +85,14 @@ base-name::
times to get to the necessary object.
The default value for --window is 10 and --depth is 50.
+--window-bytes=[N]::
+ This option provides an additional limit on top of `--window`;
+ the window size will dynamically scale down so as to not take
+ up more than N bytes in memory. This is useful in
+ repositories with a mix of large and small objects to not run
+ out of memory with a large window, but still be able to take
+ advantage of the large window for the smaller objects.
+
--max-pack-size=<n>::
Maximum size of each output packfile, expressed in MiB.
If specified, multiple packfiles may be created.
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 2894939..805d930 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -68,6 +68,14 @@ OPTIONS
to be applied that many times to get to the necessary object.
The default value for --window is 10 and --depth is 50.
+--window-bytes=[N]::
+ This option provides an additional limit on top of `--window`;
+ the window size will dynamically scale down so as to not take
+ up more than N bytes in memory. This is useful in
+ repositories with a mix of large and small objects to not run
+ out of memory with a large window, but still be able to take
+ advantage of the large window for the smaller objects.
+
--max-pack-size=<n>::
Maximum size of each output packfile, expressed in MiB.
If specified, multiple packfiles may be created.
--
1.5.2.GIT
* Re: [PATCH 5/5] Add documentation for --window-bytes, pack.windowBytes
2007-07-12 3:14 ` [PATCH 5/5] Add documentation for --window-bytes, pack.windowBytes Brian Downing
@ 2007-07-12 4:35 ` Nicolas Pitre
0 siblings, 0 replies; 10+ messages in thread
From: Nicolas Pitre @ 2007-07-12 4:35 UTC
To: Brian Downing; +Cc: git, Junio C Hamano
On Wed, 11 Jul 2007, Brian Downing wrote:
> Signed-off-by: Brian Downing <bdowning@lavos.net>
> ---
> Documentation/config.txt | 5 +++++
> Documentation/git-pack-objects.txt | 8 ++++++++
> Documentation/git-repack.txt | 8 ++++++++
> 3 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index aeece84..83c7dc1 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -592,6 +592,11 @@ pack.depth::
> The maximum delta depth used by gitlink:git-pack-objects[1] when no
> maximum depth is given on the command line. Defaults to 50.
>
> +pack.windowBytes::
> + This option provides an additional limit on top of `pack.window`;
> + the window size will dynamically scale down so as to not take
> + up more than N bytes in memory.
> +
This doesn't say what the default (unlimited) is.
> pack.compression::
> An integer -1..9, indicating the compression level for objects
> in a pack file. -1 is the zlib default. 0 means no
> diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
> index e3549b5..21ed198 100644
> --- a/Documentation/git-pack-objects.txt
> +++ b/Documentation/git-pack-objects.txt
> @@ -85,6 +85,14 @@ base-name::
> times to get to the necessary object.
> The default value for --window is 10 and --depth is 50.
>
> +--window-bytes=[N]::
> + This option provides an additional limit on top of `--window`;
> + the window size will dynamically scale down so as to not take
> + up more than N bytes in memory. This is useful in
> + repositories with a mix of large and small objects to not run
> + out of memory with a large window, but still be able to take
> + advantage of the large window for the smaller objects.
Ditto here.
Also it is a bit awkward to specify a size in bytes when you probably
want to specify a limit which is in the megabyte range. I'd call them
--window_mem and pack.windowmemory, and allow for unit suffixes of 'k',
'm', or 'g' to be supported if not already.
Nicolas
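
One way the suggested 'k', 'm' and 'g' suffixes could be parsed is sketched below. The helper parse_size_with_unit is hypothetical and is not taken from git; it assumes binary multiples (1k = 1024 bytes) and rejects anything that does not start with a digit.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "N", "Nk", "Nm" or "Ng" into a byte count.
 * Returns 0 on success and stores the result in *out; returns -1 on
 * malformed input or overflow.
 */
static int parse_size_with_unit(const char *s, unsigned long *out)
{
	char *end;
	unsigned long val, factor = 1;

	/* strtoul() would accept a sign or leading spaces; we do not. */
	if (!isdigit((unsigned char)*s))
		return -1;
	errno = 0;
	val = strtoul(s, &end, 0);
	if (errno == ERANGE)
		return -1;

	switch (*end) {
	case 'k': case 'K': factor = 1024UL; end++; break;
	case 'm': case 'M': factor = 1024UL * 1024; end++; break;
	case 'g': case 'G': factor = 1024UL * 1024 * 1024; end++; break;
	}
	if (*end)                       /* trailing garbage */
		return -1;
	if (val > ULONG_MAX / factor)   /* multiplication would overflow */
		return -1;
	*out = val * factor;
	return 0;
}
```

A caller could then accept a limit such as "64m" in place of spelling out 67108864.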
* Re: [PATCH 0/5] Memory-limited pack-object window support
2007-07-12 3:14 [PATCH 0/5] Memory-limited pack-object window support Brian Downing
` (4 preceding siblings ...)
2007-07-12 3:14 ` [PATCH 5/5] Add documentation for --window-bytes, pack.windowBytes Brian Downing
@ 2007-07-12 4:38 ` Nicolas Pitre
5 siblings, 0 replies; 10+ messages in thread
From: Nicolas Pitre @ 2007-07-12 4:38 UTC
To: Brian Downing; +Cc: git, Junio C Hamano
On Wed, 11 Jul 2007, Brian Downing wrote:
> This patch series implements a memory limit on the window size for
> pack-objects and repack. Basically, the window size will temporarily
> grow smaller than the --window option specifies if the total memory
> usage of the window is over the specified limit.
Besides the small issues I've pointed out already, I think this is a
very good thing.
Nicolas