* [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible @ 2007-05-28 21:20 Martin Koegler 2007-05-28 21:20 ` [PATCH 2/3] git-pack-objects: cache small deltas between big objects Martin Koegler 2007-05-29 2:45 ` [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible Nicolas Pitre 0 siblings, 2 replies; 6+ messages in thread From: Martin Koegler @ 2007-05-28 21:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Martin Koegler If builtin-pack-objects runs out of memory while finding the best deltas, it bails out with an error. If the delta index creation fails (because there is not enough memory), we can downgrade the error message to a warning and continue with the next object. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> --- The patches apply on top of next. builtin-pack-objects.c | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index e52332d..17627b3 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1454,8 +1454,12 @@ static int try_delta(struct unpacked *trg, struct unpacked *src, } if (!src->index) { src->index = create_delta_index(src->data, src_size); - if (!src->index) - die("out of memory"); + if (!src->index) { + static int warned = 0; + if (!warned++) + warning("suboptimal pack - out of memory"); + return 0; + } } delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size); -- 1.5.2.846.g9a144 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] git-pack-objects: cache small deltas between big objects 2007-05-28 21:20 [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible Martin Koegler @ 2007-05-28 21:20 ` Martin Koegler 2007-05-28 21:20 ` [PATCH 3/3] builtin-pack-object: cache small deltas Martin Koegler 2007-05-29 2:45 ` [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible Nicolas Pitre 1 sibling, 1 reply; 6+ messages in thread From: Martin Koegler @ 2007-05-28 21:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Martin Koegler Creating deltas between big blobs is a CPU and memory intensive task. In the writing phase, all (not reused) deltas are redone. This patch adds support for caching deltas from the deltifing phase, so that that the writing phase is faster. The caching is limited to small deltas to avoid increasing memory usage very much. The implemented limit is (memory needed to create the delta)/1024. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> --- The patch is an improved version of the last patch. I added a limit for the cache size. It's only configureable by git-config, as builtin-back-objects (git-repack) has already enough options. The patch applies on top of next. Documentation/config.txt | 5 +++ builtin-pack-objects.c | 69 ++++++++++++++++++++++++++++++++++++---------- 2 files changed, 59 insertions(+), 15 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 3d8f03d..83cc4cd 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -567,6 +567,11 @@ pack.compression:: slowest. If not set, defaults to core.compression. If that is not set, defaults to -1. +pack.deltaCacheSize:: + The maxium memory in bytes used for caching deltas in + gitlink:git-pack-objects[1]. + A value of 0 means no limit. Defaults to 0. + pull.octopus:: The default merge strategy to use when pulling multiple branches at once. diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 17627b3..85e08dc 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -36,6 +36,7 @@ struct object_entry { struct object_entry *delta_sibling; /* other deltified objects who * uses the same base as me */ + void *delta_data; /* cached delta (uncompressed) */ unsigned long delta_size; /* delta data size (uncompressed) */ enum object_type type; enum object_type in_pack_type; /* could be delta */ @@ -76,6 +77,9 @@ static struct progress progress_state; static int pack_compression_level = Z_DEFAULT_COMPRESSION; static int pack_compression_seen; +static unsigned long delta_cache_size = 0; +static unsigned long max_delta_cache_size = 0; + /* * The object names in objects array are hashed with this hashtable, * to help looking up the entry by object name. @@ -405,24 +409,31 @@ static unsigned long write_object(struct sha1file *f, z_stream stream; unsigned long maxsize; void *out; - buf = read_sha1_file(entry->sha1, &type, &size); - if (!buf) - die("unable to read %s", sha1_to_hex(entry->sha1)); - if (size != entry->size) - die("object %s size inconsistency (%lu vs %lu)", - sha1_to_hex(entry->sha1), size, entry->size); - if (usable_delta) { - buf = delta_against(buf, size, entry); + if (entry->delta_data && usable_delta) { + buf = entry->delta_data; size = entry->delta_size; obj_type = (allow_ofs_delta && entry->delta->offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; } else { - /* - * recover real object type in case - * check_object() wanted to re-use a delta, - * but we couldn't since base was in previous split pack - */ - obj_type = type; + buf = read_sha1_file(entry->sha1, &type, &size); + if (!buf) + die("unable to read %s", sha1_to_hex(entry->sha1)); + if (size != entry->size) + die("object %s size inconsistency (%lu vs %lu)", + sha1_to_hex(entry->sha1), size, entry->size); + if (usable_delta) { + buf = delta_against(buf, size, entry); + size = entry->delta_size; + obj_type = (allow_ofs_delta && entry->delta->offset) ? + OBJ_OFS_DELTA : OBJ_REF_DELTA; + } else { + /* + * recover real object type in case + * check_object() wanted to re-use a delta, + * but we couldn't since base was in previous split pack + */ + obj_type = type; + } } /* compress the data to store and put compressed length in datalen */ memset(&stream, 0, sizeof(stream)); @@ -1385,6 +1396,20 @@ struct unpacked { struct delta_index *index; }; +static int delta_cacheable (struct unpacked *trg, struct unpacked *src, + unsigned long src_size, unsigned long trg_size, + unsigned long delta_size) +{ + if (max_delta_cache_size && delta_cache_size + delta_size > max_delta_cache_size) + return 0; + + /* cache delta, if objects are large enough compared to delta size */ + if ((src_size >> 20) + (trg_size >> 21) > (delta_size >> 10)) + return 1; + + return 0; +} + /* * We search for deltas _backwards_ in a list sorted by type and * by size, so that we see progressively smaller and smaller files. @@ -1466,10 +1491,20 @@ static int try_delta(struct unpacked *trg, struct unpacked *src, if (!delta_buf) return 0; + if (trg_entry->delta_data) { + delta_cache_size -= trg_entry->delta_size; + free (trg_entry->delta_data); + } + trg_entry->delta_data = 0; trg_entry->delta = src_entry; trg_entry->delta_size = delta_size; trg_entry->depth = src_entry->depth + 1; - free(delta_buf); + + if (delta_cacheable (src, trg, src_size, trg_size, delta_size)) { + trg_entry->delta_data = realloc(delta_buf, delta_size); + delta_cache_size += trg_entry->delta_size; + } else + free(delta_buf); return 1; } @@ -1615,6 +1650,10 @@ static int git_pack_config(const char *k, const char *v) pack_compression_seen = 1; return 0; } + if(!strcmp(k, "pack.deltacachesize")) { + max_delta_cache_size = git_config_int(k, v); + return 0; + } return git_default_config(k, v); } -- 1.5.2.846.g9a144 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/3] builtin-pack-object: cache small deltas 2007-05-28 21:20 ` [PATCH 2/3] git-pack-objects: cache small deltas between big objects Martin Koegler @ 2007-05-28 21:20 ` Martin Koegler 2007-05-29 0:33 ` Dana How 0 siblings, 1 reply; 6+ messages in thread From: Martin Koegler @ 2007-05-28 21:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Martin Koegler Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> --- Caching small deltas improves packing time even on small repostistories. Repacking git.git with a delta size limit of 1000 brings CPU time from 66 to 49 seconds down. A limit of 500 bytes is only two secondes slower. The implicit cache size limit is (#objects)*(delta size limit). Documentation/config.txt | 4 ++++ builtin-pack-objects.c | 8 ++++++++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 83cc4cd..0061f7f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -572,6 +572,10 @@ pack.deltaCacheSize:: gitlink:git-pack-objects[1]. A value of 0 means no limit. Defaults to 0. +pack.deltaCacheLimit:: + The maxium size of a delta, that is cached in + gitlink:git-pack-objects[1]. Defaults to 1000. + pull.octopus:: The default merge strategy to use when pulling multiple branches at once. diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 85e08dc..c316fea 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -79,6 +79,7 @@ static int pack_compression_seen; static unsigned long delta_cache_size = 0; static unsigned long max_delta_cache_size = 0; +static unsigned long cache_max_small_delta_size = 1000; /* * The object names in objects array are hashed with this hashtable, @@ -1403,6 +1404,9 @@ static int delta_cacheable (struct unpacked *trg, struct unpacked *src, if (max_delta_cache_size && delta_cache_size + delta_size > max_delta_cache_size) return 0; + if (delta_size < cache_max_small_delta_size) + return 1; + /* cache delta, if objects are large enough compared to delta size */ if ((src_size >> 20) + (trg_size >> 21) > (delta_size >> 10)) return 1; @@ -1654,6 +1658,10 @@ static int git_pack_config(const char *k, const char *v) max_delta_cache_size = git_config_int(k, v); return 0; } + if(!strcmp(k, "pack.deltacachelimit")) { + cache_max_small_delta_size = git_config_int(k, v); + return 0; + } return git_default_config(k, v); } -- 1.5.2.846.g9a144 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 3/3] builtin-pack-object: cache small deltas 2007-05-28 21:20 ` [PATCH 3/3] builtin-pack-object: cache small deltas Martin Koegler @ 2007-05-29 0:33 ` Dana How 0 siblings, 0 replies; 6+ messages in thread From: Dana How @ 2007-05-29 0:33 UTC (permalink / raw) To: Martin Koegler; +Cc: Junio C Hamano, git, danahow On 5/28/07, Martin Koegler <mkoegler@auto.tuwien.ac.at> wrote: > Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> > --- > Caching small deltas improves packing time even on small repostistories. > Repacking git.git with a delta size limit of 1000 brings CPU time from > 66 to 49 seconds down. A limit of 500 bytes is only two secondes slower. > > The implicit cache size limit is (#objects)*(delta size limit). This patchset appears a lot more robust over more special cases. Thanks for your extra work, -- Dana L. How danahow@gmail.com +1 650 804 5991 cell ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible 2007-05-28 21:20 [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible Martin Koegler 2007-05-28 21:20 ` [PATCH 2/3] git-pack-objects: cache small deltas between big objects Martin Koegler @ 2007-05-29 2:45 ` Nicolas Pitre 2007-05-29 2:53 ` Shawn O. Pearce 1 sibling, 1 reply; 6+ messages in thread From: Nicolas Pitre @ 2007-05-29 2:45 UTC (permalink / raw) To: Martin Koegler; +Cc: Junio C Hamano, git On Mon, 28 May 2007, Martin Koegler wrote: > If builtin-pack-objects runs out of memory while finding > the best deltas, it bails out with an error. > > If the delta index creation fails (because there is not enough memory), > we can downgrade the error message to a warning and continue with the > next object. In the same vain, there is one realloc() that was turned into a xrealloc() in diff-delta.c. I think this was a mistake and should probably be a non fatal realloc again to let the caller go on. Nicolas ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible 2007-05-29 2:45 ` [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible Nicolas Pitre @ 2007-05-29 2:53 ` Shawn O. Pearce 0 siblings, 0 replies; 6+ messages in thread From: Shawn O. Pearce @ 2007-05-29 2:53 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Martin Koegler, Junio C Hamano, git Nicolas Pitre <nico@cam.org> wrote: > On Mon, 28 May 2007, Martin Koegler wrote: > > > If builtin-pack-objects runs out of memory while finding > > the best deltas, it bails out with an error. > > > > If the delta index creation fails (because there is not enough memory), > > we can downgrade the error message to a warning and continue with the > > next object. > > In the same vain, there is one realloc() that was turned into a > xrealloc() in diff-delta.c. I think this was a mistake and should > probably be a non fatal realloc again to let the caller go on. And if those two calls fail to alloc their memory, they might want to try calling the pack window gc thingy (release_pack_memory(need, -1)) and then retry the alloc before they fail the delta generation. Its possible that we are better off releasing the LRU pack window and produce the delta, then to fail because we're hanging onto some mmap we don't need... And actaully release_pack_memory could get more aggressive now that the index_data can be lazily loaded. We could actually unload LRU indexes too. -- Shawn. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-05-29 2:54 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-05-28 21:20 [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible Martin Koegler 2007-05-28 21:20 ` [PATCH 2/3] git-pack-objects: cache small deltas between big objects Martin Koegler 2007-05-28 21:20 ` [PATCH 3/3] builtin-pack-object: cache small deltas Martin Koegler 2007-05-29 0:33 ` Dana How 2007-05-29 2:45 ` [PATCH 1/3] builtin-pack-objects: don't fail, if delta is not possible Nicolas Pitre 2007-05-29 2:53 ` Shawn O. Pearce
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).