git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Pack window memory limit, take 2
@ 2007-07-12 12:55 Brian Downing
  2007-07-12 12:55 ` [PATCH 1/6] Don't try to delta if target is much smaller than source Brian Downing
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Brian Downing @ 2007-07-12 12:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Brian Downing

This series has my circular buffer errors (hopefully) corrected, and
the code is a lot cleaner as a bonus.  Also, the options are now named
--window-memory and pack.windowMemory, and both can take {k,m,g} suffixes.

I split out the {k,m,g} parsing code from git_config_int into
git_parse_long and git_parse_ulong, so it can be used for command-line
arguments as well.  Hopefully these will be useful elsewhere.

Finally the documentation has been cleaned up a bit and information on
the defaults has been added.

Patches 1 and 2 are unmodified from last time.

 [PATCH 1/6] Don't try to delta if target is much smaller than source
 [PATCH 2/6] Support fetching the memory usage of a delta index
 [PATCH 3/6] Add functions for parsing integers with size suffixes
 [PATCH 4/6] Add pack-objects window memory usage limit
 [PATCH 5/6] Add --window-memory option to git-repack
 [PATCH 6/6] Add documentation for --window-memory, pack.windowMemory

 Documentation/config.txt           |    6 +++
 Documentation/git-pack-objects.txt |   11 ++++++
 Documentation/git-repack.txt       |   11 ++++++
 builtin-pack-objects.c             |   52 ++++++++++++++++++++++++++----
 cache.h                            |    3 ++
 config.c                           |   61 +++++++++++++++++++++++++++++------
 delta.h                            |    7 ++++
 diff-delta.c                       |   10 ++++++
 git-repack.sh                      |    3 +-
 9 files changed, 145 insertions(+), 19 deletions(-)

-bcd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6] Don't try to delta if target is much smaller than source
  2007-07-12 12:55 [PATCH 0/6] Pack window memory limit, take 2 Brian Downing
@ 2007-07-12 12:55 ` Brian Downing
  2007-07-12 12:55 ` [PATCH 2/6] Support fetching the memory usage of a delta index Brian Downing
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Brian Downing @ 2007-07-12 12:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Brian Downing

Add a new try_delta heuristic:  Don't bother trying to make a delta if
the target object size is much smaller (currently 1/32) than the source,
as it's very likely not going to get a match.  Even if it does, you will
have to read at least 32x the size of the new file to reassemble it,
which isn't such a good deal.  This leads to a considerable performance
improvement when deltifying a mix of small and large files with a very
large window, because you don't have to wait for the large files to
percolate out of the window before things start going fast again.

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 builtin-pack-objects.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 54b9d26..132ce96 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1342,6 +1342,8 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 	sizediff = src_size < trg_size ? trg_size - src_size : 0;
 	if (sizediff >= max_size)
 		return 0;
+	if (trg_size < src_size / 32)
+		return 0;
 
 	/* Load data if not already done */
 	if (!trg->data) {
-- 
1.5.2.GIT

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6] Support fetching the memory usage of a delta index
  2007-07-12 12:55 [PATCH 0/6] Pack window memory limit, take 2 Brian Downing
  2007-07-12 12:55 ` [PATCH 1/6] Don't try to delta if target is much smaller than source Brian Downing
@ 2007-07-12 12:55 ` Brian Downing
  2007-07-12 12:55 ` [PATCH 3/6] Add functions for parsing integers with size suffixes Brian Downing
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Brian Downing @ 2007-07-12 12:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Brian Downing

Delta indexes, at least on 64-bit platforms, tend to be larger than
the actual uncompressed data.  As such, keeping track of this storage
is important if you want to successfully limit the memory size of your
pack window.

Squirrel away the total allocation size inside the delta_index struct,
and add an accessor "sizeof_delta_index" to access it.

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 delta.h      |    7 +++++++
 diff-delta.c |   10 ++++++++++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/delta.h b/delta.h
index 7b3f86d..40ccf5a 100644
--- a/delta.h
+++ b/delta.h
@@ -24,6 +24,13 @@ create_delta_index(const void *buf, unsigned long bufsize);
 extern void free_delta_index(struct delta_index *index);
 
 /*
+ * sizeof_delta_index: returns memory usage of delta index
+ *
+ * Given pointer must be what create_delta_index() returned, or NULL.
+ */
+extern unsigned long sizeof_delta_index(struct delta_index *index);
+
+/*
  * create_delta: create a delta from given index for the given buffer
  *
  * This function may be called multiple times with different buffers using
diff --git a/diff-delta.c b/diff-delta.c
index faf96e4..3af5835 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -119,6 +119,7 @@ struct index_entry {
 };
 
 struct delta_index {
+	unsigned long memsize;
 	const void *src_buf;
 	unsigned long src_size;
 	unsigned int hash_mask;
@@ -159,6 +160,7 @@ struct delta_index * create_delta_index(const void *buf, unsigned long bufsize)
 	mem = hash + hsize;
 	entry = mem;
 
+	index->memsize = memsize;
 	index->src_buf = buf;
 	index->src_size = bufsize;
 	index->hash_mask = hmask;
@@ -228,6 +230,14 @@ void free_delta_index(struct delta_index *index)
 	free(index);
 }
 
+unsigned long sizeof_delta_index(struct delta_index *index)
+{
+	if (index)
+		return index->memsize;
+	else
+		return 0;
+}
+
 /*
  * The maximum size for any opcode sequence, including the initial header
  * plus Rabin window plus biggest copy.
-- 
1.5.2.GIT

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6] Add functions for parsing integers with size suffixes
  2007-07-12 12:55 [PATCH 0/6] Pack window memory limit, take 2 Brian Downing
  2007-07-12 12:55 ` [PATCH 1/6] Don't try to delta if target is much smaller than source Brian Downing
  2007-07-12 12:55 ` [PATCH 2/6] Support fetching the memory usage of a delta index Brian Downing
@ 2007-07-12 12:55 ` Brian Downing
  2007-07-12 13:07   ` Johannes Schindelin
  2007-07-12 12:55 ` [PATCH 4/6] Add pack-objects window memory usage limit Brian Downing
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Brian Downing @ 2007-07-12 12:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Brian Downing

Split out the nnn{k,m,g} parsing code from git_config_int into
git_parse_long, so command-line parameters can enjoy the same
functionality.  Also add get_parse_ulong for unsigned values.

Make git_config_int use git_parse_long, and add get_config_ulong
as well.

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 cache.h  |    3 +++
 config.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index e64071e..917a7e3 100644
--- a/cache.h
+++ b/cache.h
@@ -521,7 +521,10 @@ typedef int (*config_fn_t)(const char *, const char *);
 extern int git_default_config(const char *, const char *);
 extern int git_config_from_file(config_fn_t fn, const char *);
 extern int git_config(config_fn_t fn);
+extern int git_parse_long(const char *, long *);
+extern int git_parse_ulong(const char *, unsigned long *);
 extern int git_config_int(const char *, const char *);
+extern unsigned long git_config_ulong(const char *, const char *);
 extern int git_config_bool(const char *, const char *);
 extern int git_config_set(const char *, const char *);
 extern int git_config_set_multivar(const char *, const char *, const char *, int);
diff --git a/config.c b/config.c
index 561ee3b..ee338d1 100644
--- a/config.c
+++ b/config.c
@@ -233,21 +233,60 @@ static int git_parse_file(config_fn_t fn)
 	die("bad config file line %d in %s", config_linenr, config_file_name);
 }
 
-int git_config_int(const char *name, const char *value)
+int git_parse_long(const char *value, long *ret)
+{
+	if (value && *value) {
+		char *end;
+		long val = strtol(value, &end, 0);
+		if (!*end)
+			*ret = val;
+		else if (!strcasecmp(end, "k"))
+			*ret = val * 1024;
+		else if (!strcasecmp(end, "m"))
+			*ret = val * 1024 * 1024;
+		else if (!strcasecmp(end, "g"))
+			*ret = val * 1024 * 1024 * 1024;
+		else
+			return 0;
+		return 1;
+	}
+	return 0;
+}
+
+int git_parse_ulong(const char *value, unsigned long *ret)
 {
 	if (value && *value) {
 		char *end;
-		int val = strtol(value, &end, 0);
+		unsigned long val = strtoul(value, &end, 0);
 		if (!*end)
-			return val;
-		if (!strcasecmp(end, "k"))
-			return val * 1024;
-		if (!strcasecmp(end, "m"))
-			return val * 1024 * 1024;
-		if (!strcasecmp(end, "g"))
-			return val * 1024 * 1024 * 1024;
-	}
-	die("bad config value for '%s' in %s", name, config_file_name);
+			*ret = val;
+		else if (!strcasecmp(end, "k"))
+			*ret = val * 1024;
+		else if (!strcasecmp(end, "m"))
+			*ret = val * 1024 * 1024;
+		else if (!strcasecmp(end, "g"))
+			*ret = val * 1024 * 1024 * 1024;
+		else
+			return 0;
+		return 1;
+	}
+	return 0;
+}
+
+int git_config_int(const char *name, const char *value)
+{
+	long ret;
+	if (!git_parse_long(value, &ret))
+		die("bad config value for '%s' in %s", name, config_file_name);
+	return ret;
+}
+
+unsigned long git_config_ulong(const char *name, const char *value)
+{
+	unsigned long ret;
+	if (!git_parse_ulong(value, &ret))
+		die("bad config value for '%s' in %s", name, config_file_name);
+	return ret;
 }
 
 int git_config_bool(const char *name, const char *value)
-- 
1.5.2.GIT

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6] Add pack-objects window memory usage limit
  2007-07-12 12:55 [PATCH 0/6] Pack window memory limit, take 2 Brian Downing
                   ` (2 preceding siblings ...)
  2007-07-12 12:55 ` [PATCH 3/6] Add functions for parsing integers with size suffixes Brian Downing
@ 2007-07-12 12:55 ` Brian Downing
  2007-07-12 13:04   ` Brian Downing
  2007-07-12 12:55 ` [PATCH 5/6] Add --window-memory option to git-repack Brian Downing
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Brian Downing @ 2007-07-12 12:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Brian Downing

This adds an option (--window-memory=N) and configuration variable
(pack.windowMemory = N) to limit the memory size of the pack-objects
delta search window.  This works by removing the oldest unpacked objects
whenever the total size goes above the limit.  It will always leave
at least one object, though, so as not to completely eliminate the
possibility of computing deltas.

This is an extra limit on top of the normal window size (--window=N);
the window will not dynamically grow above the fixed number of entries
specified to fill the memory limit.

With this, repacking a repository with a mix of large and small objects
is possible even with a very large window.

Cleaner and correct circular buffer handling courtesy of Nicolas Pitre.

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 builtin-pack-objects.c |   50 +++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 132ce96..dc6a5f4 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -16,8 +16,9 @@
 #include "progress.h"
 
 static const char pack_usage[] = "\
-git-pack-objects [{ -q | --progress | --all-progress }] [--max-pack-size=N] \n\
-	[--local] [--incremental] [--window=N] [--depth=N] \n\
+git-pack-objects [{ -q | --progress | --all-progress }] \n\
+	[--max-pack-size=N] [--local] [--incremental] \n\
+	[--window=N] [--window-memory=N] [--depth=N] \n\
 	[--no-reuse-delta] [--no-reuse-object] [--delta-base-offset] \n\
 	[--non-empty] [--revs [--unpacked | --all]*] [--reflog] \n\
 	[--stdout | base-name] [<ref-list | <object-list]";
@@ -79,6 +80,9 @@ static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 0;
 static unsigned long cache_max_small_delta_size = 1000;
 
+static unsigned long window_memory_usage = 0;
+static unsigned long window_memory_limit = 0;
+
 /*
  * The object names in objects array are hashed with this hashtable,
  * to help looking up the entry by object name.
@@ -1351,12 +1355,14 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 		if (sz != trg_size)
 			die("object %s inconsistent object length (%lu vs %lu)",
 			    sha1_to_hex(trg_entry->idx.sha1), sz, trg_size);
+		window_memory_usage += sz;
 	}
 	if (!src->data) {
 		src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
 		if (sz != src_size)
 			die("object %s inconsistent object length (%lu vs %lu)",
 			    sha1_to_hex(src_entry->idx.sha1), sz, src_size);
+		window_memory_usage += sz;
 	}
 	if (!src->index) {
 		src->index = create_delta_index(src->data, src_size);
@@ -1366,6 +1372,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 				warning("suboptimal pack - out of memory");
 			return 0;
 		}
+		window_memory_usage += sizeof_delta_index(src->index);
 	}
 
 	delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
@@ -1408,9 +1415,22 @@ static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)
 	return m;
 }
 
+static void free_unpacked(struct unpacked *n)
+{
+	window_memory_usage -= sizeof_delta_index(n->index);
+	free_delta_index(n->index);
+	n->index = NULL;
+	if (n->data) {
+		free(n->data);
+		n->data = NULL;
+		window_memory_usage -= n->entry->size;
+	}
+	n->entry = NULL;
+}
+
 static void find_deltas(struct object_entry **list, int window, int depth)
 {
-	uint32_t i = nr_objects, idx = 0, processed = 0;
+	uint32_t i = nr_objects, idx = 0, count = 0, processed = 0;
 	unsigned int array_size = window * sizeof(struct unpacked);
 	struct unpacked *array;
 	int max_depth;
@@ -1445,12 +1465,17 @@ static void find_deltas(struct object_entry **list, int window, int depth)
 		if (entry->no_try_delta)
 			continue;
 
-		free_delta_index(n->index);
-		n->index = NULL;
-		free(n->data);
-		n->data = NULL;
+		free_unpacked(n);
 		n->entry = entry;
 
+		while (window_memory_limit &&
+		       window_memory_usage > window_memory_limit &&
+		       count > 1) {
+			uint32_t tail = (idx + window - count) % window;
+			free_unpacked(array + tail);
+			count--;
+		}
+
 		/*
 		 * If the current object is at pack edge, take the depth the
 		 * objects that depend on the current object into account
@@ -1485,6 +1510,8 @@ static void find_deltas(struct object_entry **list, int window, int depth)
 
 		next:
 		idx++;
+		if (count + 1 < window)
+			count++;
 		if (idx >= window)
 			idx = 0;
 	} while (i > 0);
@@ -1523,6 +1550,10 @@ static int git_pack_config(const char *k, const char *v)
 		window = git_config_int(k, v);
 		return 0;
 	}
+	if(!strcmp(k, "pack.windowmemory")) {
+		window_memory_limit = git_config_ulong(k, v);
+		return 0;
+	}
 	if(!strcmp(k, "pack.depth")) {
 		depth = git_config_int(k, v);
 		return 0;
@@ -1699,6 +1730,11 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 				usage(pack_usage);
 			continue;
 		}
+		if (!prefixcmp(arg, "--window-memory=")) {
+			if (!git_parse_ulong(arg+15, &window_memory_limit))
+				usage(pack_usage);
+			continue;
+		}
 		if (!prefixcmp(arg, "--depth=")) {
 			char *end;
 			depth = strtoul(arg+8, &end, 0);
-- 
1.5.2.GIT

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6] Add --window-memory option to git-repack
  2007-07-12 12:55 [PATCH 0/6] Pack window memory limit, take 2 Brian Downing
                   ` (3 preceding siblings ...)
  2007-07-12 12:55 ` [PATCH 4/6] Add pack-objects window memory usage limit Brian Downing
@ 2007-07-12 12:55 ` Brian Downing
  2007-07-12 12:55 ` [PATCH 6/6] Add documentation for --window-memory, pack.windowMemory Brian Downing
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Brian Downing @ 2007-07-12 12:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Brian Downing

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 git-repack.sh |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/git-repack.sh b/git-repack.sh
index b5c6671..156c5e8 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -3,7 +3,7 @@
 # Copyright (c) 2005 Linus Torvalds
 #
 
-USAGE='[-a] [-d] [-f] [-l] [-n] [-q] [--max-pack-size=N] [--window=N] [--depth=N]'
+USAGE='[-a] [-d] [-f] [-l] [-n] [-q] [--max-pack-size=N] [--window=N] [--window-memory=N] [--depth=N]'
 SUBDIRECTORY_OK='Yes'
 . git-sh-setup
 
@@ -20,6 +20,7 @@ do
 	-l)	local=--local ;;
 	--max-pack-size=*) extra="$extra $1" ;;
 	--window=*) extra="$extra $1" ;;
+	--window-memory=*) extra="$extra $1" ;;
 	--depth=*) extra="$extra $1" ;;
 	*)	usage ;;
 	esac
-- 
1.5.2.GIT

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6] Add documentation for --window-memory, pack.windowMemory
  2007-07-12 12:55 [PATCH 0/6] Pack window memory limit, take 2 Brian Downing
                   ` (4 preceding siblings ...)
  2007-07-12 12:55 ` [PATCH 5/6] Add --window-memory option to git-repack Brian Downing
@ 2007-07-12 12:55 ` Brian Downing
  2007-07-12 15:46 ` [PATCH 0/6] Pack window memory limit, take 2 Nicolas Pitre
  2007-07-13 18:23 ` Junio C Hamano
  7 siblings, 0 replies; 13+ messages in thread
From: Brian Downing @ 2007-07-12 12:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Brian Downing

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 Documentation/config.txt           |    6 ++++++
 Documentation/git-pack-objects.txt |   11 +++++++++++
 Documentation/git-repack.txt       |   11 +++++++++++
 3 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4b67f0a..11b3321 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -589,6 +589,12 @@ pack.depth::
 	The maximum delta depth used by gitlink:git-pack-objects[1] when no
 	maximum depth is given on the command line. Defaults to 50.
 
+pack.windowMemory::
+	The window memory size limit used by gitlink:git-pack-objects[1]
+	when no limit is given on the command line.  The value can be
+	suffixed with "k", "m", or "g".  Defaults to 0, meaning no
+	limit.
+
 pack.compression::
 	An integer -1..9, indicating the compression level for objects
 	in a pack file. -1 is the zlib default. 0 means no
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index e3549b5..6f17cff 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -85,6 +85,17 @@ base-name::
 	times to get to the necessary object.
 	The default value for --window is 10 and --depth is 50.
 
+--window-memory=[N]::
+	This option provides an additional limit on top of `--window`;
+	the window size will dynamically scale down so as to not take
+	up more than N bytes in memory.  This is useful in
+	repositories with a mix of large and small objects to not run
+	out of memory with a large window, but still be able to take
+	advantage of the large window for the smaller objects.  The
+	size can be suffixed with "k", "m", or "g".
+	`--window-memory=0` makes memory usage unlimited, which is the
+	default.
+
 --max-pack-size=<n>::
 	Maximum size of each output packfile, expressed in MiB.
 	If specified,  multiple packfiles may be created.
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 2894939..5283ef8 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -68,6 +68,17 @@ OPTIONS
 	to be applied that many times to get to the necessary object.
 	The default value for --window is 10 and --depth is 50.
 
+--window-memory=[N]::
+	This option provides an additional limit on top of `--window`;
+	the window size will dynamically scale down so as to not take
+	up more than N bytes in memory.  This is useful in
+	repositories with a mix of large and small objects to not run
+	out of memory with a large window, but still be able to take
+	advantage of the large window for the smaller objects.  The
+	size can be suffixed with "k", "m", or "g".
+	`--window-memory=0` makes memory usage unlimited, which is the
+	default.
+
 --max-pack-size=<n>::
 	Maximum size of each output packfile, expressed in MiB.
 	If specified,  multiple packfiles may be created.
-- 
1.5.2.GIT

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/6] Add pack-objects window memory usage limit
  2007-07-12 12:55 ` [PATCH 4/6] Add pack-objects window memory usage limit Brian Downing
@ 2007-07-12 13:04   ` Brian Downing
  2007-07-12 13:07     ` [PATCH] " Brian Downing
  0 siblings, 1 reply; 13+ messages in thread
From: Brian Downing @ 2007-07-12 13:04 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre

On Thu, Jul 12, 2007 at 07:55:50AM -0500, Brian Downing wrote:
> +		if (!prefixcmp(arg, "--window-memory=")) {
> +			if (!git_parse_ulong(arg+15, &window_memory_limit))
> +				usage(pack_usage);
> +			continue;
> +		}

This is incorrect.  I had this fixed to +16, but somewhere in my remaking
the series it got lost.  I will resend the correct patch.

-bcd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6] Add functions for parsing integers with size suffixes
  2007-07-12 12:55 ` [PATCH 3/6] Add functions for parsing integers with size suffixes Brian Downing
@ 2007-07-12 13:07   ` Johannes Schindelin
  2007-07-12 13:32     ` [PATCH] " Brian Downing
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2007-07-12 13:07 UTC (permalink / raw)
  To: Brian Downing; +Cc: git, Junio C Hamano, Nicolas Pitre

Hi,

On Thu, 12 Jul 2007, Brian Downing wrote:

> Split out the nnn{k,m,g} parsing code from git_config_int into 
> git_parse_long, so command-line parameters can enjoy the same 
> functionality.  Also add get_parse_ulong for unsigned values.

Nice!

> +		if (!*end)
> +			*ret = val;
> +		else if (!strcasecmp(end, "k"))
> +			*ret = val * 1024;
> +		else if (!strcasecmp(end, "m"))
> +			*ret = val * 1024 * 1024;
> +		else if (!strcasecmp(end, "g"))
> +			*ret = val * 1024 * 1024 * 1024;
> +		else
> +			return 0;

This could be an own static function, like this:

unsigned long get_unit_factor(const char *end)
{
	if (!*end)
		return 1;
	if (!strcasecmp(end, "k"))
		return 1024;
	...
	error("Unknown unit: %s", end);
	return 1;
}

to avoid duplicated code.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] Add pack-objects window memory usage limit
  2007-07-12 13:04   ` Brian Downing
@ 2007-07-12 13:07     ` Brian Downing
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Downing @ 2007-07-12 13:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Brian Downing

This adds an option (--window-memory=N) and configuration variable
(pack.windowMemory = N) to limit the memory size of the pack-objects
delta search window.  This works by removing the oldest unpacked objects
whenever the total size goes above the limit.  It will always leave
at least one object, though, so as not to completely eliminate the
possibility of computing deltas.

This is an extra limit on top of the normal window size (--window=N);
the window will not dynamically grow above the fixed number of entries
specified to fill the memory limit.

With this, repacking a repository with a mix of large and small objects
is possible even with a very large window.

Cleaner and correct circular buffer handling courtesy of Nicolas Pitre.

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 builtin-pack-objects.c |   50 +++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 132ce96..5cc2148 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -16,8 +16,9 @@
 #include "progress.h"
 
 static const char pack_usage[] = "\
-git-pack-objects [{ -q | --progress | --all-progress }] [--max-pack-size=N] \n\
-	[--local] [--incremental] [--window=N] [--depth=N] \n\
+git-pack-objects [{ -q | --progress | --all-progress }] \n\
+	[--max-pack-size=N] [--local] [--incremental] \n\
+	[--window=N] [--window-memory=N] [--depth=N] \n\
 	[--no-reuse-delta] [--no-reuse-object] [--delta-base-offset] \n\
 	[--non-empty] [--revs [--unpacked | --all]*] [--reflog] \n\
 	[--stdout | base-name] [<ref-list | <object-list]";
@@ -79,6 +80,9 @@ static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 0;
 static unsigned long cache_max_small_delta_size = 1000;
 
+static unsigned long window_memory_usage = 0;
+static unsigned long window_memory_limit = 0;
+
 /*
  * The object names in objects array are hashed with this hashtable,
  * to help looking up the entry by object name.
@@ -1351,12 +1355,14 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 		if (sz != trg_size)
 			die("object %s inconsistent object length (%lu vs %lu)",
 			    sha1_to_hex(trg_entry->idx.sha1), sz, trg_size);
+		window_memory_usage += sz;
 	}
 	if (!src->data) {
 		src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
 		if (sz != src_size)
 			die("object %s inconsistent object length (%lu vs %lu)",
 			    sha1_to_hex(src_entry->idx.sha1), sz, src_size);
+		window_memory_usage += sz;
 	}
 	if (!src->index) {
 		src->index = create_delta_index(src->data, src_size);
@@ -1366,6 +1372,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 				warning("suboptimal pack - out of memory");
 			return 0;
 		}
+		window_memory_usage += sizeof_delta_index(src->index);
 	}
 
 	delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
@@ -1408,9 +1415,22 @@ static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)
 	return m;
 }
 
+static void free_unpacked(struct unpacked *n)
+{
+	window_memory_usage -= sizeof_delta_index(n->index);
+	free_delta_index(n->index);
+	n->index = NULL;
+	if (n->data) {
+		free(n->data);
+		n->data = NULL;
+		window_memory_usage -= n->entry->size;
+	}
+	n->entry = NULL;
+}
+
 static void find_deltas(struct object_entry **list, int window, int depth)
 {
-	uint32_t i = nr_objects, idx = 0, processed = 0;
+	uint32_t i = nr_objects, idx = 0, count = 0, processed = 0;
 	unsigned int array_size = window * sizeof(struct unpacked);
 	struct unpacked *array;
 	int max_depth;
@@ -1445,12 +1465,17 @@ static void find_deltas(struct object_entry **list, int window, int depth)
 		if (entry->no_try_delta)
 			continue;
 
-		free_delta_index(n->index);
-		n->index = NULL;
-		free(n->data);
-		n->data = NULL;
+		free_unpacked(n);
 		n->entry = entry;
 
+		while (window_memory_limit &&
+		       window_memory_usage > window_memory_limit &&
+		       count > 1) {
+			uint32_t tail = (idx + window - count) % window;
+			free_unpacked(array + tail);
+			count--;
+		}
+
 		/*
 		 * If the current object is at pack edge, take the depth the
 		 * objects that depend on the current object into account
@@ -1485,6 +1510,8 @@ static void find_deltas(struct object_entry **list, int window, int depth)
 
 		next:
 		idx++;
+		if (count + 1 < window)
+			count++;
 		if (idx >= window)
 			idx = 0;
 	} while (i > 0);
@@ -1523,6 +1550,10 @@ static int git_pack_config(const char *k, const char *v)
 		window = git_config_int(k, v);
 		return 0;
 	}
+	if(!strcmp(k, "pack.windowmemory")) {
+		window_memory_limit = git_config_ulong(k, v);
+		return 0;
+	}
 	if(!strcmp(k, "pack.depth")) {
 		depth = git_config_int(k, v);
 		return 0;
@@ -1699,6 +1730,11 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 				usage(pack_usage);
 			continue;
 		}
+		if (!prefixcmp(arg, "--window-memory=")) {
+			if (!git_parse_ulong(arg+16, &window_memory_limit))
+				usage(pack_usage);
+			continue;
+		}
 		if (!prefixcmp(arg, "--depth=")) {
 			char *end;
 			depth = strtoul(arg+8, &end, 0);
-- 
1.5.2.GIT

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] Add functions for parsing integers with size suffixes
  2007-07-12 13:07   ` Johannes Schindelin
@ 2007-07-12 13:32     ` Brian Downing
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Downing @ 2007-07-12 13:32 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nicolas Pitre, Johannes Schindelin, Brian Downing

Split out the nnn{k,m,g} parsing code from git_config_int into
git_parse_long, so command-line parameters can enjoy the same
functionality.  Also add get_parse_ulong for unsigned values.

Make git_config_int use git_parse_long, and add get_config_ulong
as well.

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
    Good idea!

 cache.h  |    3 +++
 config.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/cache.h b/cache.h
index e64071e..917a7e3 100644
--- a/cache.h
+++ b/cache.h
@@ -521,7 +521,10 @@ typedef int (*config_fn_t)(const char *, const char *);
 extern int git_default_config(const char *, const char *);
 extern int git_config_from_file(config_fn_t fn, const char *);
 extern int git_config(config_fn_t fn);
+extern int git_parse_long(const char *, long *);
+extern int git_parse_ulong(const char *, unsigned long *);
 extern int git_config_int(const char *, const char *);
+extern unsigned long git_config_ulong(const char *, const char *);
 extern int git_config_bool(const char *, const char *);
 extern int git_config_set(const char *, const char *);
 extern int git_config_set_multivar(const char *, const char *, const char *, int);
diff --git a/config.c b/config.c
index 561ee3b..f89a611 100644
--- a/config.c
+++ b/config.c
@@ -233,21 +233,55 @@ static int git_parse_file(config_fn_t fn)
 	die("bad config file line %d in %s", config_linenr, config_file_name);
 }
 
-int git_config_int(const char *name, const char *value)
+static unsigned long get_unit_factor(const char *end)
+{
+	if (!*end)
+		return 1;
+	else if (!strcasecmp(end, "k"))
+		return 1024;
+	else if (!strcasecmp(end, "m"))
+		return 1024 * 1024;
+	else if (!strcasecmp(end, "g"))
+		return 1024 * 1024 * 1024;
+	die("unknown unit: '%s'", end);
+}
+
+int git_parse_long(const char *value, long *ret)
+{
+	if (value && *value) {
+		char *end;
+		long val = strtol(value, &end, 0);
+		*ret = val * get_unit_factor(end);
+		return 1;
+	}
+	return 0;
+}
+
+int git_parse_ulong(const char *value, unsigned long *ret)
 {
 	if (value && *value) {
 		char *end;
-		int val = strtol(value, &end, 0);
-		if (!*end)
-			return val;
-		if (!strcasecmp(end, "k"))
-			return val * 1024;
-		if (!strcasecmp(end, "m"))
-			return val * 1024 * 1024;
-		if (!strcasecmp(end, "g"))
-			return val * 1024 * 1024 * 1024;
-	}
-	die("bad config value for '%s' in %s", name, config_file_name);
+		unsigned long val = strtoul(value, &end, 0);
+		*ret = val * get_unit_factor(end);
+		return 1;
+	}
+	return 0;
+}
+
+int git_config_int(const char *name, const char *value)
+{
+	long ret;
+	if (!git_parse_long(value, &ret))
+		die("bad config value for '%s' in %s", name, config_file_name);
+	return ret;
+}
+
+unsigned long git_config_ulong(const char *name, const char *value)
+{
+	unsigned long ret;
+	if (!git_parse_ulong(value, &ret))
+		die("bad config value for '%s' in %s", name, config_file_name);
+	return ret;
 }
 
 int git_config_bool(const char *name, const char *value)
-- 
1.5.2.GIT

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] Pack window memory limit, take 2
  2007-07-12 12:55 [PATCH 0/6] Pack window memory limit, take 2 Brian Downing
                   ` (5 preceding siblings ...)
  2007-07-12 12:55 ` [PATCH 6/6] Add documentation for --window-memory, pack.windowMemory Brian Downing
@ 2007-07-12 15:46 ` Nicolas Pitre
  2007-07-13 18:23 ` Junio C Hamano
  7 siblings, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2007-07-12 15:46 UTC (permalink / raw)
  To: Brian Downing; +Cc: git, Junio C Hamano

On Thu, 12 Jul 2007, Brian Downing wrote:

> This series has my circular buffer errors (hopefully) corrected, and
> the code is a lot cleaner as a bonus.  Also, the options are now named
> --window-memory and pack.windowMemory, and both can take {k,m,g} suffixes.

Acked-by: Nicolas Pitre <nico@cam.org>
(including the amended patches).


Nicolas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] Pack window memory limit, take 2
  2007-07-12 12:55 [PATCH 0/6] Pack window memory limit, take 2 Brian Downing
                   ` (6 preceding siblings ...)
  2007-07-12 15:46 ` [PATCH 0/6] Pack window memory limit, take 2 Nicolas Pitre
@ 2007-07-13 18:23 ` Junio C Hamano
  7 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2007-07-13 18:23 UTC (permalink / raw)
  To: Brian Downing; +Cc: git, Nicolas Pitre

Very nicely done.  Thanks, everybody involved.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2007-07-13 18:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-12 12:55 [PATCH 0/6] Pack window memory limit, take 2 Brian Downing
2007-07-12 12:55 ` [PATCH 1/6] Don't try to delta if target is much smaller than source Brian Downing
2007-07-12 12:55 ` [PATCH 2/6] Support fetching the memory usage of a delta index Brian Downing
2007-07-12 12:55 ` [PATCH 3/6] Add functions for parsing integers with size suffixes Brian Downing
2007-07-12 13:07   ` Johannes Schindelin
2007-07-12 13:32     ` [PATCH] " Brian Downing
2007-07-12 12:55 ` [PATCH 4/6] Add pack-objects window memory usage limit Brian Downing
2007-07-12 13:04   ` Brian Downing
2007-07-12 13:07     ` [PATCH] " Brian Downing
2007-07-12 12:55 ` [PATCH 5/6] Add --window-memory option to git-repack Brian Downing
2007-07-12 12:55 ` [PATCH 6/6] Add documentation for --window-memory, pack.windowMemory Brian Downing
2007-07-12 15:46 ` [PATCH 0/6] Pack window memory limit, take 2 Nicolas Pitre
2007-07-13 18:23 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).