* [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc @ 2007-05-08 2:54 Theodore Ts'o 2007-05-08 3:13 ` Nicolas Pitre 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o 0 siblings, 2 replies; 40+ messages in thread From: Theodore Ts'o @ 2007-05-08 2:54 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Sometimes users might want to use more aggressive packing options when doing a git-gc. This allows them to do so without having to use the low-level plumbing commands. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- Documentation/git-gc.txt | 3 ++- Documentation/git-pack-objects.txt | 19 +------------------ Documentation/pack-options.txt | 18 ++++++++++++++++++ builtin-gc.c | 30 ++++++++++++++++++++++++++++-- 4 files changed, 49 insertions(+), 21 deletions(-) create mode 100644 Documentation/pack-options.txt diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index bc16584..3c807c2 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -8,7 +8,7 @@ git-gc - Cleanup unnecessary files and optimize the local repository SYNOPSIS -------- -'git-gc' [--prune] +'git-gc' [--prune] [--no-reuse-delta] [--window=N] [--depth=N] DESCRIPTION ----------- @@ -35,6 +35,7 @@ OPTIONS repository at the same time (e.g. never use this option in a cron script). +include::pack-options.txt[] Configuration ------------- diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index d9e11c6..aac71f6 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -73,18 +73,6 @@ base-name:: as if all refs under `$GIT_DIR/refs` are specified to be included. ---window=[N], --depth=[N]:: - These two options affect how the objects contained in - the pack are stored using delta compression. The - objects are first internally sorted by type, size and - optionally names and compared against the other objects - within --window to see if using delta compression saves - space. --depth limits the maximum delta depth; making - it too deep affects the performance on the unpacker - side, because delta data needs to be applied that many - times to get to the necessary object. - The default value for both --window and --depth is 10. - --incremental:: This flag causes an object already in a pack ignored even if it appears in the standard input. @@ -120,12 +108,7 @@ base-name:: This flag makes the command not to report its progress on the standard error stream. ---no-reuse-delta:: - When creating a packed archive in a repository that - has existing packs, the command reuses existing deltas. - This sometimes results in a slightly suboptimal pack. - This flag tells the command not to reuse existing deltas - but compute them from scratch. +include::pack-options.txt[] --delta-base-offset:: A packed archive can express base object of a delta as diff --git a/Documentation/pack-options.txt b/Documentation/pack-options.txt new file mode 100644 index 0000000..7b0ae5f --- /dev/null +++ b/Documentation/pack-options.txt @@ -0,0 +1,18 @@ +--window=[N], --depth=[N]:: + These two options affect how the objects contained in + the pack are stored using delta compression. The + objects are first internally sorted by type, size and + optionally names and compared against the other objects + within --window to see if using delta compression saves + space. --depth limits the maximum delta depth; making + it too deep affects the performance on the unpacker + side, because delta data needs to be applied that many + times to get to the necessary object. + The default value for both --window and --depth is 10. + +--no-reuse-delta:: + When creating a packed archive in a repository that + has existing packs, the command reuses existing deltas. + This sometimes results in a slightly suboptimal pack. + This flag tells the command not to reuse existing deltas + but compute them from scratch. diff --git a/builtin-gc.c b/builtin-gc.c index 3b1f8c2..7e7775d 100644 --- a/builtin-gc.c +++ b/builtin-gc.c @@ -15,13 +15,15 @@ #define FAILED_RUN "failed to run %s" -static const char builtin_gc_usage[] = "git-gc [--prune]"; +static const char builtin_gc_usage[] = "git-gc [--prune] [--no-reuse-delta] [--window=N] [--depth=N]"; static int pack_refs = -1; +#define MAX_ADD 10 + static const char *argv_pack_refs[] = {"pack-refs", "--prune", NULL}; static const char *argv_reflog[] = {"reflog", "expire", "--all", NULL}; -static const char *argv_repack[] = {"repack", "-a", "-d", "-l", NULL}; +static const char *argv_repack[MAX_ADD] = {"repack", "-a", "-d", "-l", NULL}; static const char *argv_prune[] = {"prune", NULL}; static const char *argv_rerere[] = {"rerere", "gc", NULL}; @@ -37,6 +39,21 @@ static int gc_config(const char *var, const char *value) return git_default_config(var, value); } +static append_option(const char **cmd, const char *opt, int max_length) +{ + int i; + + for (i=0; cmd[i]; i++) + ; + + if (i+2 >= max_length) { + fprintf(stderr, "Too many options specified\n"); + exit(1); + } + cmd[i++] = opt; + cmd[i] = 0; +} + int cmd_gc(int argc, const char **argv, const char *prefix) { int i; @@ -53,6 +70,15 @@ int cmd_gc(int argc, const char **argv, const char *prefix) prune = 1; continue; } + if (!strcmp(arg, "--no-reuse-delta")) { + append_option(argv_repack, "-f", MAX_ADD); + continue; + } + if (!strncmp(arg, "--window", 8) || + !strncmp(arg, "--depth", 7)) { + append_option(argv_repack, arg, MAX_ADD); + continue; + } /* perhaps other parameters later... */ break; } -- 1.5.2.rc1.20.g86b9-dirty ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc 2007-05-08 2:54 [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc Theodore Ts'o @ 2007-05-08 3:13 ` Nicolas Pitre 2007-05-08 3:21 ` Theodore Tso 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o 1 sibling, 1 reply; 40+ messages in thread From: Nicolas Pitre @ 2007-05-08 3:13 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Junio C Hamano, git On Mon, 7 May 2007, Theodore Ts'o wrote: > Sometimes users might want to use more aggressive packing options > when doing a git-gc. This allows them to do so without having > to use the low-level plumbing commands. The 'git repack' command isn't _that_ low level, is it? git-pack-objects is plumbing for sure, but not git-repack? Especially if you're aware and interested in those options, you won't be afraid of 'git repack -a -f -d --window=...". In the context of "gc", having an option that reads "window" looks a bit strange too. Maybe it's just me... Nicolas ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc 2007-05-08 3:13 ` Nicolas Pitre @ 2007-05-08 3:21 ` Theodore Tso 2007-05-08 3:38 ` Dana How 2007-05-08 4:43 ` Junio C Hamano 0 siblings, 2 replies; 40+ messages in thread From: Theodore Tso @ 2007-05-08 3:21 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, git On Mon, May 07, 2007 at 11:13:58PM -0400, Nicolas Pitre wrote: > On Mon, 7 May 2007, Theodore Ts'o wrote: > > > Sometimes users might want to use more aggressive packing options > > when doing a git-gc. This allows them to do so without having > > to use the low-level plumbing commands. > > The 'git repack' command isn't _that_ low level, is it? > git-pack-objects is plumbing for sure, but not git-repack? > > Especially if you're aware and interested in those options, you won't be > afraid of 'git repack -a -f -d --window=...". > > In the context of "gc", having an option that reads "window" looks a bit > strange too. I suppose, but you either need to then know all of the other commands which git-gc runs, and do them manually, skipping git-gc altogether, or use git-gc, and end up rewriting the pack twice, ince using the git-repack in git-gc, and then once manually so you can give the options that you really want to give to git-repack. Maybe the right approach is to have a way to specify default --window and --depth as git configuration variables? Looks like there is a pack.window already, but not a pack.depth. What if we add a pack.depth configuration option, and add only --no-reuse-delta to git-gc? Would that be better? - Ted ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc 2007-05-08 3:21 ` Theodore Tso @ 2007-05-08 3:38 ` Dana How 2007-05-08 4:43 ` Junio C Hamano 1 sibling, 0 replies; 40+ messages in thread From: Dana How @ 2007-05-08 3:38 UTC (permalink / raw) To: Theodore Tso; +Cc: Nicolas Pitre, Junio C Hamano, git, danahow On 5/7/07, Theodore Tso <tytso@mit.edu> wrote: > On Mon, May 07, 2007 at 11:13:58PM -0400, Nicolas Pitre wrote: > > On Mon, 7 May 2007, Theodore Ts'o wrote: > > > Sometimes users might want to use more aggressive packing options > > > when doing a git-gc. This allows them to do so without having > > > to use the low-level plumbing commands. > > In the context of "gc", having an option that reads "window" looks a bit > > strange too. > I suppose, but you either need to then know all of the other commands > which git-gc runs, and do them manually, skipping git-gc altogether, > or use git-gc, and end up rewriting the pack twice, ince using the > git-repack in git-gc, and then once manually so you can give the > options that you really want to give to git-repack. > > Maybe the right approach is to have a way to specify default --window > and --depth as git configuration variables? Looks like there is a > pack.window already, but not a pack.depth. > > What if we add a pack.depth configuration option, and add only > --no-reuse-delta to git-gc? Would that be better? I would use pack.depth . Thanks, -- Dana L. How danahow@gmail.com +1 650 804 5991 cell ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc 2007-05-08 3:21 ` Theodore Tso 2007-05-08 3:38 ` Dana How @ 2007-05-08 4:43 ` Junio C Hamano 2007-05-08 13:46 ` Nicolas Pitre 1 sibling, 1 reply; 40+ messages in thread From: Junio C Hamano @ 2007-05-08 4:43 UTC (permalink / raw) To: Theodore Tso; +Cc: Nicolas Pitre, git Theodore Tso <tytso@mit.edu> writes: > On Mon, May 07, 2007 at 11:13:58PM -0400, Nicolas Pitre wrote: >> ... >> Especially if you're aware and interested in those options, you won't be >> afraid of 'git repack -a -f -d --window=...". >> >> In the context of "gc", having an option that reads "window" looks a bit >> strange too. > > I suppose, but you either need to then know all of the other commands > which git-gc runs, and do them manually, skipping git-gc altogether, > or use git-gc, and end up rewriting the pack twice,... If the user really wants to tweak the parameters that much and that often, I think what Nico says, plus your pack.depth/window configuration variables, make more sense. git-gc is meant to be a shorthand with reasonable "one size fits all" default, and there is something wrong if a user has to give customization option every time it is run. It could be that the default parameters are grossly off for _everybody_, in which case we should fix the default. With the recent introduction of delta base caching code, we might want to tweak the default pack depth to larger value for everybody, by the way. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc 2007-05-08 4:43 ` Junio C Hamano @ 2007-05-08 13:46 ` Nicolas Pitre 0 siblings, 0 replies; 40+ messages in thread From: Nicolas Pitre @ 2007-05-08 13:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: Theodore Tso, git On Mon, 7 May 2007, Junio C Hamano wrote: > Theodore Tso <tytso@mit.edu> writes: > > > On Mon, May 07, 2007 at 11:13:58PM -0400, Nicolas Pitre wrote: > >> ... > >> Especially if you're aware and interested in those options, you won't be > >> afraid of 'git repack -a -f -d --window=...". > >> > >> In the context of "gc", having an option that reads "window" looks a bit > >> strange too. > > > > I suppose, but you either need to then know all of the other commands > > which git-gc runs, and do them manually, skipping git-gc altogether, > > or use git-gc, and end up rewriting the pack twice,... > > If the user really wants to tweak the parameters that much and > that often, I think what Nico says, plus your pack.depth/window > configuration variables, make more sense. Also, once you've run git-repack with -f and the window/depth params you want, those deltas will be reused as is by default afterwards even if you use git-gc. So the big repack to tighten a large repo needs only to be done once. Nicolas ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to 2007-05-08 2:54 [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc Theodore Ts'o 2007-05-08 3:13 ` Nicolas Pitre @ 2007-05-08 13:28 ` Theodore Ts'o 2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o 2007-05-08 15:30 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Nicolas Pitre 1 sibling, 2 replies; 40+ messages in thread From: Theodore Ts'o @ 2007-05-08 13:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List OK, here's a patch to implement pack.depth (with the default tweaked to 50 --- is that too high?), followed by a simplified and reworked patch to git-gc that only implements --no-reuse-delta. I don't imagine that most users will want to use that feature most of the time, hence the long option name, but occasionally, it might be useful. Yes, the user could just run "git-repack -a -d -f -l" after running git-gc, but then the "git-repack -a -d -l" in git-gc is just a wasted disk i/o. I don't know if I'll manage to convince you, if not, just drop the second patch, I guess. :-) - Ted ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o @ 2007-05-08 13:28 ` Theodore Ts'o 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o 2007-05-08 15:38 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Nicolas Pitre 2007-05-08 15:30 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Nicolas Pitre 1 sibling, 2 replies; 40+ messages in thread From: Theodore Ts'o @ 2007-05-08 13:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List, Theodore Ts'o Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- Documentation/config.txt | 6 +++++- Documentation/git-pack-objects.txt | 2 +- Documentation/git-repack.txt | 2 +- builtin-pack-objects.c | 6 +++++- 4 files changed, 12 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 24f9655..c7674c2 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -544,7 +544,11 @@ merge.<driver>.recursive:: pack.window:: The size of the window used by gitlink:git-pack-objects[1] when no - window size is given on the command line. Defaults to 10. + window size is given on the command line. Defaults to 10. + +pack.depth:: + The maximum delta depth used by gitlink:git-pack-objects[1] when no + maximum depth is given on the command line. Defaults to 50. pull.octopus:: The default merge strategy to use when pulling multiple branches diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index d9e11c6..bd3ee45 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -83,7 +83,7 @@ base-name:: it too deep affects the performance on the unpacker side, because delta data needs to be applied that many times to get to the necessary object. - The default value for both --window and --depth is 10. + The default value for --window is 10 and --depth is 50. --incremental:: This flag causes an object already in a pack ignored diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index d39abc1..cc3b0b2 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -63,7 +63,7 @@ OPTIONS space. `--depth` limits the maximum delta depth; making it too deep affects the performance on the unpacker side, because delta data needs to be applied that many times to get to the necessary object. - The default value for both --window and --depth is 10. + The default value for --window is 10 and --depth is 50. Configuration diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 7bff8ea..966f843 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -64,6 +64,7 @@ static char tmpname[PATH_MAX]; static unsigned char pack_file_sha1[20]; static int progress = 1; static int window = 10; +static int depth = 50; static int pack_to_stdout; static int num_preferred_base; static struct progress progress_state; @@ -1489,6 +1490,10 @@ static int git_pack_config(const char *k, const char *v) window = git_config_int(k, v); return 0; } + if(!strcmp(k, "pack.depth")) { + depth = git_config_int(k, v); + return 0; + } return git_default_config(k, v); } @@ -1584,7 +1589,6 @@ static int adjust_perm(const char *path, mode_t mode) int cmd_pack_objects(int argc, const char **argv, const char *prefix) { - int depth = 10; int use_internal_rev_list = 0; int thin = 0; uint32_t i; -- 1.5.2.rc2.22.ga39d ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH] Add --no-reuse-delta option to git-gc 2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o @ 2007-05-08 13:28 ` Theodore Ts'o 2007-05-08 15:35 ` Nicolas Pitre 2007-05-09 5:05 ` Daniel Barkalow 2007-05-08 15:38 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Nicolas Pitre 1 sibling, 2 replies; 40+ messages in thread From: Theodore Ts'o @ 2007-05-08 13:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List, Theodore Ts'o This allows the user to regenerate the deltas in packs while doing a git-gc. The user could just run git-repack -a -d -f -l after running git-gc, but then the first git-repack run by git-gc is a bit of waste. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- Documentation/git-gc.txt | 7 ++++++- builtin-gc.c | 24 ++++++++++++++++++++++-- 2 files changed, 28 insertions(+), 3 deletions(-) diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index bc16584..0493d06 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -8,7 +8,7 @@ git-gc - Cleanup unnecessary files and optimize the local repository SYNOPSIS -------- -'git-gc' [--prune] +'git-gc' [--prune] [--no-reuse-delta] DESCRIPTION ----------- @@ -35,6 +35,11 @@ OPTIONS repository at the same time (e.g. never use this option in a cron script). +--no-reuse-delta:: + This causes deltas in existing packs to be recalculated instead + of reusing the existing deltas. This can save disk space at + the cost of taking more time to recalculate them from scratch. + Configuration ------------- diff --git a/builtin-gc.c b/builtin-gc.c index 3b1f8c2..5cb7ffd 100644 --- a/builtin-gc.c +++ b/builtin-gc.c @@ -15,13 +15,14 @@ #define FAILED_RUN "failed to run %s" -static const char builtin_gc_usage[] = "git-gc [--prune]"; +static const char builtin_gc_usage[] = "git-gc [--prune] [--no-reuse-delta]"; static int pack_refs = -1; +#define MAX_ADD 10 static const char *argv_pack_refs[] = {"pack-refs", "--prune", NULL}; static const char *argv_reflog[] = {"reflog", "expire", "--all", NULL}; -static const char *argv_repack[] = {"repack", "-a", "-d", "-l", NULL}; +static const char *argv_repack[MAX_ADD] = {"repack", "-a", "-d", "-l", NULL}; static const char *argv_prune[] = {"prune", NULL}; static const char *argv_rerere[] = {"rerere", "gc", NULL}; @@ -37,6 +38,21 @@ static int gc_config(const char *var, const char *value) return git_default_config(var, value); } +static append_option(const char **cmd, const char *opt, int max_length) +{ + int i; + + for (i=0; cmd[i]; i++) + ; + + if (i+2 >= max_length) { + fprintf(stderr, "Too many options specified\n"); + exit(1); + } + cmd[i++] = opt; + cmd[i] = 0; +} + int cmd_gc(int argc, const char **argv, const char *prefix) { int i; @@ -53,6 +69,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix) prune = 1; continue; } + if (!strcmp(arg, "--no-reuse-delta")) { + append_option(argv_repack, "-f", MAX_ADD); + continue; + } /* perhaps other parameters later... */ break; } -- 1.5.2.rc2.22.ga39d ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o @ 2007-05-08 15:35 ` Nicolas Pitre 2007-05-09 5:05 ` Daniel Barkalow 1 sibling, 0 replies; 40+ messages in thread From: Nicolas Pitre @ 2007-05-08 15:35 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Junio C Hamano, Git Mailing List On Tue, 8 May 2007, Theodore Ts'o wrote: > This allows the user to regenerate the deltas in packs while doing > a git-gc. The user could just run git-repack -a -d -f -l after > running git-gc, but then the first git-repack run by git-gc is > a bit of waste. Given the patch I just posted and my previous arguments I don't think this patch is in the spirit of git-gc. I'd prefer if git-gc remained as simple with the least options as possible, but I don't care that much either. Nicolas ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o 2007-05-08 15:35 ` Nicolas Pitre @ 2007-05-09 5:05 ` Daniel Barkalow 2007-05-09 8:15 ` Junio C Hamano 1 sibling, 1 reply; 40+ messages in thread From: Daniel Barkalow @ 2007-05-09 5:05 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Junio C Hamano, Git Mailing List On Tue, 8 May 2007, Theodore Ts'o wrote: > This allows the user to regenerate the deltas in packs while doing > a git-gc. The user could just run git-repack -a -d -f -l after > running git-gc, but then the first git-repack run by git-gc is > a bit of waste. Maybe git-gc should have an option for "compress hard"? It seems to me like a two-sizes-fit-all solution would be good here; "git gc" for daily use, and "git gc --squeeze" for when you want to make the result as small as possible, with compute time not being a major factor. If all you know is that you're going to burn this repository onto a stack of CDs and mail it to somebody (but don't know about delta reuse, window sizes, and depths, let alone good values for these), it would be good to have an option where it picks an appropriate different set of defaults for you. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-05-09 5:05 ` Daniel Barkalow @ 2007-05-09 8:15 ` Junio C Hamano 2007-05-09 9:02 ` Steven Grimm 2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso 0 siblings, 2 replies; 40+ messages in thread From: Junio C Hamano @ 2007-05-09 8:15 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Theodore Ts'o, Git Mailing List Daniel Barkalow <barkalow@iabervon.org> writes: > On Tue, 8 May 2007, Theodore Ts'o wrote: > >> This allows the user to regenerate the deltas in packs while doing >> a git-gc. The user could just run git-repack -a -d -f -l after >> running git-gc, but then the first git-repack run by git-gc is >> a bit of waste. > > Maybe git-gc should have an option for "compress hard"? It seems to me > like a two-sizes-fit-all solution would be good here; "git gc" for daily > use, and "git gc --squeeze" for when you want to make the result as small > as possible, with compute time not being a major factor. I think that sounds saner and more user friendly than specific knob to tune "window", "depth" and friends which are too technical. It has an added attraction that we can redefine what exactly "hard" means later. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-05-09 8:15 ` Junio C Hamano @ 2007-05-09 9:02 ` Steven Grimm 2007-05-09 11:35 ` Other compression?, was " Johannes Schindelin ` (2 more replies) 2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso 1 sibling, 3 replies; 40+ messages in thread From: Steven Grimm @ 2007-05-09 9:02 UTC (permalink / raw) To: Junio C Hamano; +Cc: Daniel Barkalow, Theodore Ts'o, Git Mailing List Junio C Hamano wrote: > I think that sounds saner and more user friendly than specific > knob to tune "window", "depth" and friends which are too > technical. It has an added attraction that we can redefine what > exactly "hard" means later. > On that note, has any thought been given to looking at other compression algorithms? Gzip is a great high-speed compressor, but there are others out there (some a bit slower, some much slower at both compression and decompression) that produce substantially smaller output. One could even, if one were in a particularly twisted state of mind, envision using CPU-intensive compression for less frequently-accessed objects and using gzip for active ones, on the theory that the best time/space tradeoff is not uniform across all the objects in a git repository. Presumably most of us never actually unpack the vast majority of objects in a git repository of reasonable age, so the fact that it'd take a little longer if we *did* want to unpack them isn't much of a downside compared to the upside of reclaiming disk space. That would mitigate the impact of using an algorithm that's slow at decompression. I think it'd be kind of neat to have my .git directory shrink by another 20+%. That's conservative; on maximumcompression.com's test of a mix of different file types including images, gzip compresses 64% and the best-scoring one does 80%. On English text gzip does 71% and the top scorer does 89%. Most of the top-tier compressors are proprietary, but there are some open-source ones that do pretty well. Maybe not worth the added complexity, but I thought I'd toss it out there. It probably makes more sense (if it makes any at all) after Linus's suggestion to not unpack after cloning is in place. Once the upstream has gone to the trouble of CPU-intensive compressing, you certainly don't want to force clones to have to spend the time repeating the same work. -Steve (who suspects this is a "yes, we talked this over early in git's history" question, but what the heck) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Other compression?, was Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-05-09 9:02 ` Steven Grimm @ 2007-05-09 11:35 ` Johannes Schindelin 2007-05-09 15:15 ` Junio C Hamano 2007-05-09 19:10 ` Shawn O. Pearce 2 siblings, 0 replies; 40+ messages in thread From: Johannes Schindelin @ 2007-05-09 11:35 UTC (permalink / raw) To: Steven Grimm; +Cc: Git Mailing List Hi, On Wed, 9 May 2007, Steven Grimm wrote: > On that note, has any thought been given to looking at other compression > algorithms? I think you could do that. But you would lose compatibility with all existing Git clients. IOW nobody could pull from you. Ciao, Dscho ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-05-09 9:02 ` Steven Grimm 2007-05-09 11:35 ` Other compression?, was " Johannes Schindelin @ 2007-05-09 15:15 ` Junio C Hamano 2007-05-09 19:10 ` Shawn O. Pearce 2 siblings, 0 replies; 40+ messages in thread From: Junio C Hamano @ 2007-05-09 15:15 UTC (permalink / raw) To: Steven Grimm; +Cc: Daniel Barkalow, Theodore Ts'o, Git Mailing List Steven Grimm <koreth@midwinter.com> writes: > -Steve (who suspects this is a "yes, we talked this over early in > git's history" question, but what the heck) AFAICR, the answer is "heck, we have never bothered about such low level implementation details as gzip has been good enough for us. If you feel so inclined, go wild." ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-05-09 9:02 ` Steven Grimm 2007-05-09 11:35 ` Other compression?, was " Johannes Schindelin 2007-05-09 15:15 ` Junio C Hamano @ 2007-05-09 19:10 ` Shawn O. Pearce 2007-06-10 7:40 ` Sam Vilain 2 siblings, 1 reply; 40+ messages in thread From: Shawn O. Pearce @ 2007-05-09 19:10 UTC (permalink / raw) To: Steven Grimm Cc: Junio C Hamano, Daniel Barkalow, Theodore Ts'o, Git Mailing List Steven Grimm <koreth@midwinter.com> wrote: > On that note, has any thought been given to looking at other compression > algorithms? Gzip is a great high-speed compressor, but there are others > out there (some a bit slower, some much slower at both compression and > decompression) that produce substantially smaller output. Its been discussed once before on the list, in very recent history, but not by a whole lot. As Junio pointed out, I don't think there ever really was any discussion of is gzip the best way to deflate the objects. I think gzip was just chosen simply because it was readily available in libz, stable, and has a pretty decent speed/size ratio. > I think it'd be kind of neat to have my .git directory shrink by another > 20+%. That's conservative; on maximumcompression.com's test of a mix of > different file types including images, gzip compresses 64% and the > best-scoring one does 80%. On English text gzip does 71% and the top > scorer does 89%. Most of the top-tier compressors are proprietary, but Yes. But in many cases we might actually be able to do even better by going with a pack-wide dictionary. Why? Think about source code structure. E.g. $ git grep --cached 'struct object'| cut -d: -f1|wc -l 402 So 402 files in git.git use the term 'struct object', and that's just the current revision I had in my index. With our current packfile organization we are likely to store this string at least 402 times. We'll store it once in each file's delta chain, assuming each file's blobs largely fall into a single delta chain for that file (reasonable assumption, but certainly not always true). That's just one string that does appear somewhat frequently in any file its used in. Now try 'unsigned char' (its 944 files, but an even higher frequency-per-file). So anyway, for the past year I've been thinking about trying to implement a blob-level dictionary prototype to see if it helps on a project like linux-2.6.git, but I haven't gotten to it. The pack v4 work was about applying that basic dicationary principal to trees and commits, and I think it pays off nicely there. Just need to get it cleaned up, rebased onto current master, and submitted to the list for wider testing. ;-) -- Shawn. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-05-09 19:10 ` Shawn O. Pearce @ 2007-06-10 7:40 ` Sam Vilain 2007-06-11 1:51 ` Nicolas Pitre 0 siblings, 1 reply; 40+ messages in thread From: Sam Vilain @ 2007-06-10 7:40 UTC (permalink / raw) To: Steven Grimm Cc: Shawn O. Pearce, Junio C Hamano, Daniel Barkalow, Theodore Ts'o, Git Mailing List Shawn O. Pearce wrote: >> On that note, has any thought been given to looking at other compression >> algorithms? Gzip is a great high-speed compressor, but there are others >> out there (some a bit slower, some much slower at both compression and >> decompression) that produce substantially smaller output. >> > Its been discussed once before on the list, in very recent history, > but not by a whole lot. As Junio pointed out, I don't think there > ever really was any discussion of is gzip the best way to deflate the > objects. I think gzip was just chosen simply because it was readily > available in libz, stable, and has a pretty decent speed/size ratio. > I think it's the right tool. I just don't see any point in changing to anything slower for the sake of 20% space saving. Especially bzip2. Consider this. Compression works primarily through two things: huffman coding and string matching. The larger the window for your string matching, the slower the compression and the more memory you need thrashing your CPU memory cache when decompressing. Now I'm not an expert on compression algorithms but I think a large part of the reason gzip is blindingly faster than bzip2 is because gzip uses a 64k buffer and bzip2 a 900k one. Only now are CPUs getting caches large enough to deal with that size of buffer, the rest of the time you're waiting for your RAM. Moore's law was supposed to make bzip2 fast one of these days but I'm still waiting. But with git-repack the window is effectively the size of your repository. So that blows bzip2 out of the water. Why else can git make compressed packs smaller than a .bz2 of the raw files? This is the same observation Shawn makes with the pack-wide dictionary, but he sounds like he wants to apply it to the huffman coding stage as well as the current delta/string matching stage. Now that would be interesting... Anyway it's a free world so be my guest to implement it, I guess if this was selectable it would only be a minor annoyance waiting a bit longer pulling from from some repositories, and it would be interesting to see if it did make a big difference with pack file sizes. Sam ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-06-10 7:40 ` Sam Vilain @ 2007-06-11 1:51 ` Nicolas Pitre 2007-06-11 6:20 ` Steven Grimm 2007-06-11 10:20 ` Johannes Schindelin 0 siblings, 2 replies; 40+ messages in thread From: Nicolas Pitre @ 2007-06-11 1:51 UTC (permalink / raw) To: Sam Vilain Cc: Steven Grimm, Shawn O. Pearce, Junio C Hamano, Daniel Barkalow, Theodore Ts'o, Git Mailing List On Sun, 10 Jun 2007, Sam Vilain wrote: > Now I'm not an expert on compression algorithms but I think a large part > of the reason gzip is blindingly faster than bzip2 is because gzip uses > a 64k buffer and bzip2 a 900k one. Only now are CPUs getting caches > large enough to deal with that size of buffer, the rest of the time > you're waiting for your RAM. Moore's law was supposed to make bzip2 fast > one of these days but I'm still waiting. > > Anyway it's a free world so be my guest to implement it, I guess if this > was selectable it would only be a minor annoyance waiting a bit longer > pulling from from some repositories, and it would be interesting to see > if it did make a big difference with pack file sizes. It won't happen for a simple reason: to be backward compatible with older GIT clients. If you have your repo compressed with bzip2 and an old client pulls it then the server would have to decompress and recompress everything with gzip. If instead your repo remains with gzip and a new client asks for bzip2 then you have to recompress as well (slow). So in practice it is best to remain with a single compression method. Nicolas ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-06-11 1:51 ` Nicolas Pitre @ 2007-06-11 6:20 ` Steven Grimm 2007-06-11 6:31 ` Shawn O. Pearce 2007-06-11 10:20 ` Johannes Schindelin 1 sibling, 1 reply; 40+ messages in thread From: Steven Grimm @ 2007-06-11 6:20 UTC (permalink / raw) To: Nicolas Pitre Cc: Sam Vilain, Shawn O. Pearce, Junio C Hamano, Daniel Barkalow, Theodore Ts'o, Git Mailing List Nicolas Pitre wrote: > > It won't happen for a simple reason: to be backward compatible with > older GIT clients. If you have your repo compressed with bzip2 and an > old client pulls it then the server would have to decompress and > recompress everything with gzip. If instead your repo remains with gzip > and a new client asks for bzip2 then you have to recompress as well > (slow). So in practice it is best to remain with a single compression > method. > Not that I really think this is all that important (my original question was more out of curiosity than anything) but I don't think those are really issues. Even with a better compression scheme, nobody would want to remove gzip support; if you are creating a repo that needs to be compatible with old clients -- and of course not all repositories need that -- you'd just configure it to use gzip (or more likely, *not* configure it to use the other method.) There are already options whose documentation says, "Don't enable this if you want your repo to be accessed by git clients older than version X." In other words, git already has an established approach for adding new non-backward-compatible features. Some projects, e.g. internal corporate ones, can mandate that everyone upgrade their clients, and more importantly, once any new git feature has been out for a long enough time, even public projects can reasonably say, "You need version X to access our repo." If alternate compression were introduced today, in five years would anyone care that there'd be some ancient, ancient clients that couldn't use it? And since the hypothetical new client would have support for both compression types, pulling from a gzip-based repo could be accomplished totally transparently; you could, one assumes, even pull from a gzip repo and pack locally using the other scheme if you felt like it. -Steve ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-06-11 6:20 ` Steven Grimm @ 2007-06-11 6:31 ` Shawn O. Pearce 0 siblings, 0 replies; 40+ messages in thread From: Shawn O. Pearce @ 2007-06-11 6:31 UTC (permalink / raw) To: Steven Grimm Cc: Nicolas Pitre, Sam Vilain, Junio C Hamano, Daniel Barkalow, Theodore Ts'o, Git Mailing List Steven Grimm <koreth@midwinter.com> wrote: > Nicolas Pitre wrote: > > > >It won't happen for a simple reason: to be backward compatible with > >older GIT clients. If you have your repo compressed with bzip2 and an > >old client pulls it then the server would have to decompress and > >recompress everything with gzip. If instead your repo remains with gzip > >and a new client asks for bzip2 then you have to recompress as well > >(slow). So in practice it is best to remain with a single compression > >method. > > > > Not that I really think this is all that important (my original question > was more out of curiosity than anything) but I don't think those are > really issues. ... > And since the hypothetical new client would have support for both > compression types, pulling from a gzip-based repo could be accomplished > totally transparently; you could, one assumes, even pull from a gzip > repo and pack locally using the other scheme if you felt like it. Right, I agree with Steven here. Pack v4 (which is rapidly becoming vaporware it seems) uses such a radically different encoding for its commit and tree objects that you cannot send them as-is to a client unless that client also understands pack v4. Consequently you need to decompress and recompress those objects when talking to an older peer; this isn't very different from a switch to bzip2. That said, I think switching to a different generic compressor isn't that interesting. We can probably do a lot better by organizing files into clusters, where those clusters have large number of symbols in common, and then compress everything in the same cluster with the same pack-wide dictionary. Specifically I'm thinking that you may be able to cluster the *.h/*.c into one cluster, and the Makefile into another, and the .gitignore into a third cluster, and then leverage the large commonalities between those blobs when building a dictionary and compressing them. No, I haven't prototyped it out yet. -- Shawn. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-06-11 1:51 ` Nicolas Pitre 2007-06-11 6:20 ` Steven Grimm @ 2007-06-11 10:20 ` Johannes Schindelin 2007-06-11 14:01 ` Nicolas Pitre 1 sibling, 1 reply; 40+ messages in thread From: Johannes Schindelin @ 2007-06-11 10:20 UTC (permalink / raw) To: Nicolas Pitre Cc: Sam Vilain, Steven Grimm, Shawn O. Pearce, Junio C Hamano, Daniel Barkalow, Theodore Ts'o, Git Mailing List Hi, On Sun, 10 Jun 2007, Nicolas Pitre wrote: > On Sun, 10 Jun 2007, Sam Vilain wrote: > > > Anyway it's a free world so be my guest to implement it, I guess if > > this was selectable it would only be a minor annoyance waiting a bit > > longer pulling from from some repositories, and it would be > > interesting to see if it did make a big difference with pack file > > sizes. > > It won't happen for a simple reason: to be backward compatible with > older GIT clients. If you have your repo compressed with bzip2 and an > old client pulls it then the server would have to decompress and > recompress everything with gzip. If instead your repo remains with gzip > and a new client asks for bzip2 then you have to recompress as well > (slow). So in practice it is best to remain with a single compression > method. With the extension mechanism we have in place, the client can send what kind of compression it supports, and the server can actually refuse to send anything if it does not want to recompress. What I am trying to say: you do not necessarily have to allow every client to access that particular repository. I agree that mixed-compression repos are evil, but nothing stands in the way of a flag allowing (or disallowing) recompression in a different format when fetching. So if you should decide someday to track data with Git (remember: Generic Information Tracker, not just source code), that is particularly unfit for compression with gzip, but that you _need_ to store in a different compressed manner, you can set up a repository which will _only_ _ever_ use that compression. Of course, you'd need to prepare Git for that, but I could imagine something like a music library, which stores everything as ogg encoded snippets. It might even use some perception-based hash on small chunks of the music, and store the music as tree objects, which concatenate the small chunks. I might even try to do this for fun, some day in the distant future. It's a wild world, Dscho ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-06-11 10:20 ` Johannes Schindelin @ 2007-06-11 14:01 ` Nicolas Pitre 2007-06-11 21:40 ` Johannes Schindelin 0 siblings, 1 reply; 40+ messages in thread From: Nicolas Pitre @ 2007-06-11 14:01 UTC (permalink / raw) To: Johannes Schindelin Cc: Sam Vilain, Steven Grimm, Shawn O. Pearce, Junio C Hamano, Daniel Barkalow, Theodore Ts'o, Git Mailing List On Mon, 11 Jun 2007, Johannes Schindelin wrote: > Hi, > > On Sun, 10 Jun 2007, Nicolas Pitre wrote: > > > On Sun, 10 Jun 2007, Sam Vilain wrote: > > > > > Anyway it's a free world so be my guest to implement it, I guess if > > > this was selectable it would only be a minor annoyance waiting a bit > > > longer pulling from from some repositories, and it would be > > > interesting to see if it did make a big difference with pack file > > > sizes. > > > > It won't happen for a simple reason: to be backward compatible with > > older GIT clients. If you have your repo compressed with bzip2 and an > > old client pulls it then the server would have to decompress and > > recompress everything with gzip. If instead your repo remains with gzip > > and a new client asks for bzip2 then you have to recompress as well > > (slow). So in practice it is best to remain with a single compression > > method. > > With the extension mechanism we have in place, the client can send what > kind of compression it supports, and the server can actually refuse to > send anything if it does not want to recompress. > > What I am trying to say: you do not necessarily have to allow every client > to access that particular repository. I agree that mixed-compression repos > are evil, but nothing stands in the way of a flag allowing (or > disallowing) recompression in a different format when fetching. I know. But is it worthwhile? I think not. However I won't stand in the way of anyone who wants to try and provide numbers. I just don't believe this is worthwhile and am not inclined to do it. OK... Well, I just performed a really quick test: $ mkdir test-bzip2 $ mkdir test-gzip $ cp git/*.[cho] test-bzip2 $ cp git/*.[cho] test-gzip $ bzip2 test-bzip2/* $ gzip test-gzip/* $ du -s test-bzip2 test-gzip 5016 test-bzip2 4956 test-gzip It is true that bzip2 is better with large files, but we typically have very few of them in a Git repo, and in the presence of large files bzip2 then becomes _much_ slower than gzip. So, given that the nature of Git objects are likely to be small in 98% of the cases due to deltas, it appears that bzip2 won't be a gain at all but rather a waste, making a poor case for supporting it forever afterwards. > So if you should decide someday to track data with Git (remember: Generic > Information Tracker, not just source code), Bah... if you please. > that is particularly unfit for > compression with gzip, but that you _need_ to store in a different > compressed manner, you can set up a repository which will _only_ _ever_ > use that compression. Maybe. But you'd better have a concrete data set and result numbers to convince me. Designing software for hypothetical situations before they actually exist leads to bloatware. Nicolas ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc 2007-06-11 14:01 ` Nicolas Pitre @ 2007-06-11 21:40 ` Johannes Schindelin 0 siblings, 0 replies; 40+ messages in thread From: Johannes Schindelin @ 2007-06-11 21:40 UTC (permalink / raw) To: Nicolas Pitre Cc: Sam Vilain, Steven Grimm, Shawn O. Pearce, Junio C Hamano, Daniel Barkalow, Theodore Ts'o, Git Mailing List Hi, On Mon, 11 Jun 2007, Nicolas Pitre wrote: > But you'd better have a concrete data set and result numbers to convince > me. Designing software for hypothetical situations before they actually > exist leads to bloatware. Right you are. So I'll not mention it until I have it ;-) Ciao, Dscho ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH] Add --aggressive option to 'git gc' 2007-05-09 8:15 ` Junio C Hamano 2007-05-09 9:02 ` Steven Grimm @ 2007-05-09 19:48 ` Theodore Tso 2007-05-09 20:19 ` Junio C Hamano 2007-05-10 7:38 ` Junio C Hamano 1 sibling, 2 replies; 40+ messages in thread From: Theodore Tso @ 2007-05-09 19:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: Daniel Barkalow, Git Mailing List On Wed, May 09, 2007 at 01:15:07AM -0700, Junio C Hamano wrote: > > Maybe git-gc should have an option for "compress hard"? It seems to me > > like a two-sizes-fit-all solution would be good here; "git gc" for daily > > use, and "git gc --squeeze" for when you want to make the result as small > > as possible, with compute time not being a major factor. > > I think that sounds saner and more user friendly than specific > knob to tune "window", "depth" and friends which are too > technical. It has an added attraction that we can redefine what > exactly "hard" means later. OK, here's a patch that does exactly that. I choose git-gc --aggressive, since I thought that was more descriptive than --hard or --squeeze. Junio, would you be willing to apply this? - Ted === Cut here === Add --aggressive option to 'git gc' This option causes 'git gc' to more aggressively optimize the repository at the cost of taking much more wall clock and CPU time. Today this option causes git-pack-objects to use --no-use-delta option, and it allows the --window parameter to be set via the gc.aggressiveWindow configuration parameter. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- Documentation/config.txt | 5 +++++ Documentation/git-gc.txt | 16 +++++++++++++++- builtin-gc.c | 35 +++++++++++++++++++++++++++++++++-- 3 files changed, 53 insertions(+), 3 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ea434af..efcf301 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -384,6 +384,11 @@ format.suffix:: `.patch`. Use this variable to change that suffix (make sure to include the dot if you want it). +gc.aggressiveWindow:: + The window size parameter used in the delta compression + algorithm used by 'git gc --aggressive'. This defaults + to 10. + gc.packrefs:: `git gc` does not run `git pack-refs` in a bare repository by default so that older dumb-transport clients can still fetch diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index bc16584..56575e8 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -8,7 +8,7 @@ git-gc - Cleanup unnecessary files and optimize the local repository SYNOPSIS -------- -'git-gc' [--prune] +'git-gc' [--prune] [--aggressive] DESCRIPTION ----------- @@ -35,6 +35,13 @@ OPTIONS repository at the same time (e.g. never use this option in a cron script). +--aggressive:: + Usually 'git-gc' runs very quickly while providing good disk + space utilization and performance. This option will cause + git-gc to more aggressive optimize the repository at the expense + of taking much more time. The effects of this optimization are + persistent, so this option only needs to be sporadically; every + few hundred changesets or so. Configuration ------------- @@ -67,6 +74,13 @@ The optional configuration variable 'gc.packrefs' determines if is not run in bare repositories by default, to allow older dumb-transport clients fetch from the repository, but this will change in the future. +The optional configuration variable 'gc.aggressiveWindow' controls how +much time is spent optimizing the delta compression of the objects in +the repository when the --aggressive option is specified. The larger +the value, the more time is spent optimizing the delta compression. See +the documentation for the --window' option in gitlink:git-repack[1] for +more details. This defaults to 10. + See Also -------- gitlink:git-prune[1] diff --git a/builtin-gc.c b/builtin-gc.c index 3b1f8c2..10f92f1 100644 --- a/builtin-gc.c +++ b/builtin-gc.c @@ -15,13 +15,15 @@ #define FAILED_RUN "failed to run %s" -static const char builtin_gc_usage[] = "git-gc [--prune]"; +static const char builtin_gc_usage[] = "git-gc [--prune] [--aggressive]"; static int pack_refs = -1; +static int aggressive_window = -1; +#define MAX_ADD 10 static const char *argv_pack_refs[] = {"pack-refs", "--prune", NULL}; static const char *argv_reflog[] = {"reflog", "expire", "--all", NULL}; -static const char *argv_repack[] = {"repack", "-a", "-d", "-l", NULL}; +static const char *argv_repack[MAX_ADD] = {"repack", "-a", "-d", "-l", NULL}; static const char *argv_prune[] = {"prune", NULL}; static const char *argv_rerere[] = {"rerere", "gc", NULL}; @@ -34,13 +36,34 @@ static int gc_config(const char *var, const char *value) pack_refs = git_config_bool(var, value); return 0; } + if (!strcmp(var, "gc.aggressiveWindow")) { + aggressive_window = git_config_int(var, value); + printf("aggressive_window = %d\n", aggressive_window); + return 0; + } return git_default_config(var, value); } +static append_option(const char **cmd, const char *opt, int max_length) +{ + int i; + + for (i=0; cmd[i]; i++) + ; + + if (i+2 >= max_length) { + fprintf(stderr, "Too many options specified\n"); + exit(1); + } + cmd[i++] = opt; + cmd[i] = 0; +} + int cmd_gc(int argc, const char **argv, const char *prefix) { int i; int prune = 0; + char buf[80]; git_config(gc_config); @@ -53,6 +76,14 @@ int cmd_gc(int argc, const char **argv, const char *prefix) prune = 1; continue; } + if (!strcmp(arg, "--aggressive")) { + append_option(argv_repack, "-f", MAX_ADD); + if (aggressive_window > 0) { + sprintf(buf, "--window=%d", aggressive_window); + append_option(argv_repack, buf, MAX_ADD); + } + continue; + } /* perhaps other parameters later... */ break; } -- 1.5.2.rc2.22.ga39d ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --aggressive option to 'git gc' 2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso @ 2007-05-09 20:19 ` Junio C Hamano 2007-05-09 22:22 ` Theodore Tso 2007-05-10 7:38 ` Junio C Hamano 1 sibling, 1 reply; 40+ messages in thread From: Junio C Hamano @ 2007-05-09 20:19 UTC (permalink / raw) To: Theodore Tso; +Cc: Daniel Barkalow, Git Mailing List Theodore Tso <tytso@mit.edu> writes: > On Wed, May 09, 2007 at 01:15:07AM -0700, Junio C Hamano wrote: >> > Maybe git-gc should have an option for "compress hard"? It seems to me >> > like a two-sizes-fit-all solution would be good here; "git gc" for daily >> > use, and "git gc --squeeze" for when you want to make the result as small >> > as possible, with compute time not being a major factor. >> >> I think that sounds saner and more user friendly than specific >> knob to tune "window", "depth" and friends which are too >> technical. It has an added attraction that we can redefine what >> exactly "hard" means later. > > OK, here's a patch that does exactly that. I choose git-gc > --aggressive, since I thought that was more descriptive than --hard or > --squeeze. Junio, would you be willing to apply this? Willing? Yes. It's tricky that it defaults to 10 and still called aggressive. When the configuration variable is left unspecified, the only reason it is called aggressive is because it passes '-f' to repack, right? It was not very clear at the first sight and I was about to ask why the default is 10, not higher. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --aggressive option to 'git gc' 2007-05-09 20:19 ` Junio C Hamano @ 2007-05-09 22:22 ` Theodore Tso 0 siblings, 0 replies; 40+ messages in thread From: Theodore Tso @ 2007-05-09 22:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: Daniel Barkalow, Git Mailing List On Wed, May 09, 2007 at 01:19:54PM -0700, Junio C Hamano wrote: > It's tricky that it defaults to 10 and still called aggressive. > When the configuration variable is left unspecified, the only > reason it is called aggressive is because it passes '-f' to > repack, right? It was not very clear at the first sight and I > was about to ask why the default is 10, not higher. Yes, it's called aggressive because it causes git-gc to recalculate the delta's. So that means that git-gc --aggressive will take around 9-10 times longer than git-gc. If the user changes gc.aggressive-window to some larger value, say like 30, then git-gc --aggressive would take around 20 times longer than git-gc. So that's why I didn't make the default bigger, but instead left it as something which could be configured by the user. Maybe the default should be larger; I don't strong opposition towards making it be more like 30, but it seemed to me that doubling the run time for a 5% decrease in pack size wasn't worth it. Regards, - Ted ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --aggressive option to 'git gc' 2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso 2007-05-09 20:19 ` Junio C Hamano @ 2007-05-10 7:38 ` Junio C Hamano 1 sibling, 0 replies; 40+ messages in thread From: Junio C Hamano @ 2007-05-10 7:38 UTC (permalink / raw) To: Theodore Tso; +Cc: Daniel Barkalow, Git Mailing List Theodore Tso <tytso@mit.edu> writes: > Junio, would you be willing to apply this? Yes, but ;-). > static int pack_refs = -1; > +static int aggressive_window = -1; > > +#define MAX_ADD 10 > static const char *argv_pack_refs[] = {"pack-refs", "--prune", NULL}; > static const char *argv_reflog[] = {"reflog", "expire", "--all", NULL}; > -static const char *argv_repack[] = {"repack", "-a", "-d", "-l", NULL}; > +static const char *argv_repack[MAX_ADD] = {"repack", "-a", "-d", "-l", NULL}; > static const char *argv_prune[] = {"prune", NULL}; > static const char *argv_rerere[] = {"rerere", "gc", NULL}; > > @@ -34,13 +36,34 @@ static int gc_config(const char *var, const char *value) > pack_refs = git_config_bool(var, value); > return 0; > } > + if (!strcmp(var, "gc.aggressiveWindow")) { Callbacks to git_config() are called with variable names downcased (except for the 2nd level for 3-level variables. E.g. [REMOTE "Foo"] URL = ...; becomes var = "remote.Foo.url", val = ...). > + aggressive_window = git_config_int(var, value); > + printf("aggressive_window = %d\n", aggressive_window); Did you mean to leave this in? Looks like a debug remnant... > + return 0; > + } > return git_default_config(var, value); > } > > +static append_option(const char **cmd, const char *opt, int max_length) Type is "static void" I presume. > +{ > + int i; Funny tab here. > + > + for (i=0; cmd[i]; i++) > + ; SP around operator '='. > + > + if (i+2 >= max_length) { Same for '+' > + fprintf(stderr, "Too many options specified\n"); > + exit(1); die("Too many...specified"); /* note the lack of \n at the end */ > + } > + cmd[i++] = opt; > + cmd[i] = 0; We tend to spell out NULL, although we all are aware that C says literal '0' is the null pointer. All fixups are trivial so I'd take the patch and amend locally. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o @ 2007-05-08 15:38 ` Nicolas Pitre 2007-05-08 16:30 ` Theodore Tso 1 sibling, 1 reply; 40+ messages in thread From: Nicolas Pitre @ 2007-05-08 15:38 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Junio C Hamano, Git Mailing List On Tue, 8 May 2007, Theodore Ts'o wrote: > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> I'd prefer if tests were performed on the performance impact before changing the default depth. If done separately from this patch then the commit log could contain those results as well. Nicolas ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 15:38 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Nicolas Pitre @ 2007-05-08 16:30 ` Theodore Tso 2007-05-08 16:49 ` Johannes Schindelin ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Theodore Tso @ 2007-05-08 16:30 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, Git Mailing List On Tue, May 08, 2007 at 11:38:46AM -0400, Nicolas Pitre wrote: > On Tue, 8 May 2007, Theodore Ts'o wrote: > > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > > I'd prefer if tests were performed on the performance impact before > changing the default depth. If done separately from this patch then the > commit log could contain those results as well. The following results are on a recent git repository, using time to record the real, user, and sys times on the two commands: "git-gc --no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline -S'object' > /dev/null". All of these tests were done with a hot cache, so disk speed didn't enter into the calculations. git-gc git log -S'object' pack size w=10,d=10 27.1s/25.2s/0.3s 20.8s/20.4s/0.1s 15292k w=10,d=30 23.8s/22.3s/0.2s 21.2s/20.9s/0.1s 12996k w=10,d=50 24.8s/22.4s/0.4s 21.8s/21.2s/0.1s 12340k w=100,d=100 24.1s/22.8s/0.3s 22.4s/21.8s/0.2s 11772k w=30,d=10 45.0s/43.1s/0.4s 20.8s/20.5s/0.1s 14388k w=30,d=30 35.8s/34.1s/0.3s 21.6s/21.1s/0.1s 11800k w=30,d=50 34.6s/33.0s/0.3s 22.1s/21.4s/0.1s 11376k w=30,d=100 34.0s/32.2s/0.3s 22.2s/21.6s/0.1s 11012k w=50,d=10 56.1s/54.3s/0.4s 21.3s/20.5s/0.1s 14224k w=50,d=30 47.2s/45.4s/0.4s 21.6s/21.0s/0.1s 11496k w=50,d=50 44.5s/43.0s/0.3s 21.7s/21.2s/0.1s 11108k w=50,d=100 44.3s/42.7s/0.4s 22.4s/21.7s/0.1s 10824k So a couple of things immediately become evident. First of all, as Junio predicted, changing --depth makes no difference to the git-gc or git log times. The latter is thanks to the delta chaching. Secondly, changing --depth does make a signficiant difference to the pack size. Finally, --window does help somewhat in reducing the pack size, but it _significantly_ increases the time to calculate the pack. My conclusion given this quick benchmark is that it seems to me that changing the defaults of --depth to 50, and keeping --window at 10, is a reasonable thing to do. Regards, - Ted ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 16:30 ` Theodore Tso @ 2007-05-08 16:49 ` Johannes Schindelin 2007-05-08 18:09 ` Theodore Tso 2007-05-08 17:07 ` Dana How 2007-05-08 17:35 ` Nicolas Pitre 2 siblings, 1 reply; 40+ messages in thread From: Johannes Schindelin @ 2007-05-08 16:49 UTC (permalink / raw) To: Theodore Tso; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List Hi, On Tue, 8 May 2007, Theodore Tso wrote: > On Tue, May 08, 2007 at 11:38:46AM -0400, Nicolas Pitre wrote: > > On Tue, 8 May 2007, Theodore Ts'o wrote: > > > > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > > > > I'd prefer if tests were performed on the performance impact before > > changing the default depth. If done separately from this patch then the > > commit log could contain those results as well. > > The following results are on a recent git repository, using time to > record the real, user, and sys times on the two commands: "git-gc > --no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline > -S'object' > /dev/null". All of these tests were done with a hot > cache, so disk speed didn't enter into the calculations. > > git-gc git log -S'object' pack size > w=10,d=10 27.1s/25.2s/0.3s 20.8s/20.4s/0.1s 15292k > w=10,d=30 23.8s/22.3s/0.2s 21.2s/20.9s/0.1s 12996k > w=10,d=50 24.8s/22.4s/0.4s 21.8s/21.2s/0.1s 12340k > w=100,d=100 24.1s/22.8s/0.3s 22.4s/21.8s/0.2s 11772k > > w=30,d=10 45.0s/43.1s/0.4s 20.8s/20.5s/0.1s 14388k > w=30,d=30 35.8s/34.1s/0.3s 21.6s/21.1s/0.1s 11800k > w=30,d=50 34.6s/33.0s/0.3s 22.1s/21.4s/0.1s 11376k > w=30,d=100 34.0s/32.2s/0.3s 22.2s/21.6s/0.1s 11012k > > w=50,d=10 56.1s/54.3s/0.4s 21.3s/20.5s/0.1s 14224k > w=50,d=30 47.2s/45.4s/0.4s 21.6s/21.0s/0.1s 11496k > w=50,d=50 44.5s/43.0s/0.3s 21.7s/21.2s/0.1s 11108k > w=50,d=100 44.3s/42.7s/0.4s 22.4s/21.7s/0.1s 10824k > > So a couple of things immediately become evident. First of all, as > Junio predicted, changing --depth makes no difference to the git-gc or > git log times. The latter is thanks to the delta chaching. Secondly, > changing --depth does make a signficiant difference to the pack size. > > Finally, --window does help somewhat in reducing the pack size, but it > _significantly_ increases the time to calculate the pack. > > My conclusion given this quick benchmark is that it seems to me that > changing the defaults of --depth to 50, and keeping --window at 10, is > a reasonable thing to do. I'd be happier if that test was done on _at least_ the kernel repo, if not something larger, _plus_ having the numbers on page faults. Swapping can kill performance substantially... git.git is small. Ciao, Dscho ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 16:49 ` Johannes Schindelin @ 2007-05-08 18:09 ` Theodore Tso 2007-05-08 18:46 ` Nicolas Pitre 0 siblings, 1 reply; 40+ messages in thread From: Theodore Tso @ 2007-05-08 18:09 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List On Tue, May 08, 2007 at 06:49:47PM +0200, Johannes Schindelin wrote: > I'd be happier if that test was done on _at least_ the kernel repo, if not > something larger, _plus_ having the numbers on page faults. Swapping can > kill performance substantially... Given how the delta cache works, I really don't think it's going to matter. In any case, my laptop has 2gigs of memory, and the kernel pack file is only 134megs, so you're not going to see any major page faults.... In any case, here is a quick run: git-gc git-log -S'object' real/user/sys/min.faults real/user/sys/min.faults pack size w=10,d=10 4:31/257.7/6.2/391711 5:53/326.9/1.7/255156 155940k w=10,d=30 4:16/242.7/6.5/378193 5:39/331.6/2.3/437283 143144k w=10,d=50 4:29/250.1/6.7/554493 5:43/334.5/1.9/362574 140080k You'll note that it's the same thing; git-gc, git-log doesn't change much, while the pack size decreases as --depth increases. We're only seeing at 10% decrease in the pack size, compared to the 20% decrease with the git repository, but that's probably because of the HTML and man branches, which no doubt delta compress really, really well. I can run a full set of benchmarks, varying both --window and --depth, and also including a non-pickaxe git-log test as requested, but not until tonight. I really don't think we'll see any surprises compared to the earlier runs, though. After all, if we just stop and think about how the delta caching works, and how the repacking algorithm works, it's pretty clear that there shouldn't be any scaling issues with increasing --depth, and that increasing --window is just going to be painful, and these should hold true regardless of the size of the repo. - Ted ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 18:09 ` Theodore Tso @ 2007-05-08 18:46 ` Nicolas Pitre 2007-05-09 13:49 ` Theodore Tso 0 siblings, 1 reply; 40+ messages in thread From: Nicolas Pitre @ 2007-05-08 18:46 UTC (permalink / raw) To: Theodore Tso; +Cc: Johannes Schindelin, Junio C Hamano, Git Mailing List On Tue, 8 May 2007, Theodore Tso wrote: > After all, if we just stop and think about how the delta caching > works, and how the repacking algorithm works, it's pretty clear that > there shouldn't be any scaling issues with increasing --depth, and > that increasing --window is just going to be painful, and these should > hold true regardless of the size of the repo. The window size has absolutely no effect on the runtime pack access, except maybe for the increased number of deltas. It is really a pack time cost. The delta depth is the opposite: it has no effect on the packing time, but it has the potential to slow down runtime access. But the delta base cache is apparently working really well to mitigate that cost, as long as it is big enough of course. Nicolas ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 18:46 ` Nicolas Pitre @ 2007-05-09 13:49 ` Theodore Tso 2007-05-09 14:17 ` Johannes Schindelin 0 siblings, 1 reply; 40+ messages in thread From: Theodore Tso @ 2007-05-09 13:49 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Johannes Schindelin, Junio C Hamano, Git Mailing List On Tue, May 08, 2007 at 02:46:35PM -0400, Nicolas Pitre wrote: > The window size has absolutely no effect on the runtime pack access, > except maybe for the increased number of deltas. It is really a pack > time cost. The delta depth is the opposite: it has no effect on the > packing time, but it has the potential to slow down runtime access. But > the delta base cache is apparently working really well to mitigate that > cost, as long as it is big enough of course. Exactly, and the numbers bear out with the theory. Junio's already applied the change to up the default to 50, but for the record, here are the results using a kernel git repository, with times for doing a git-gc (with the --no-reuse-delta, although I like the suggestion to change it to just be "--hard", or maybe "--agressive"), and with a git pickaxe and a git-log with pathname restrictions. It essentially confirms that the delta base cache is doing the job just fine, up to even a depth of 100. However, there isn't much difference in pack size between a depth of 50 and 100. Increasing the window size from 10 to 30 increases the pack run time by roughly 40%, and saves an extra 5% or so on the pack size. Increasing the window beyond 30 has ever-smaller diminishing returns, while the time to repack gets bigger and bigger. The timing information is real/user/system/minor pagefaults, and as before, the results are done using a hot cache. Apologies in advance for the long lines in the benchmark results. These results were generated using a relatively recent, post-2.6.21 kernel git tree on a Thinkpad T60p laptop with a 2GHz T2500 Intel Core Duo processor. Regards, git-gc pack size git-log -S'object' git-log include/scsi drivers/scsi w=10,d=10 5:08.70/293.20/5.56/1027802 162316k 5:31.12/322.20/4.14/153298 0:01.80/1.72/0.03/19077 w=10,d=30 4:14.79/245.46/2.57/398754 149636k 5:41.56/331.21/4.86/517220 0:01.83/1.74/0.05/17880 w=10,d=50 4:31.89/257.16/3.50/576538 146608k 5:51.58/336.63/5.07/259643 0:01.88/1.80/0.04/17658 w=10,d=100 4:35.93/262.77/4.08/715195 144152k 5:58.23/341.26/6.22/624510 0:01.89/1.80/0.04/17571 w=30,d=10 7:23.06/424.08/6.35/1222640 153752k 5:32.85/323.96/2.13/213150 0:01.73/1.64/0.06/18429 w=30,d=30 6:15.76/364.27/3.53/407546 141200k 5:42.29/333.53/2.10/338301 0:01.81/1.71/0.04/17237 w=30,d=50 6:26.85/372.24/5.38/578343 139408k 5:42.21/336.65/1.30/260234 0:01.77/1.70/0.05/17108 w=30,d=100 6:34.31/381.72/5.40/744886 138040k 5:59.03/342.65/4.90/607681 0:01.93/1.79/0.06/17043 w=50,d=10 8:51.08/508.80/5.75/1050358 152168k 5:35.43/327.48/3.70/209655 0:01.70/1.64/0.04/18264 w=50,d=30 8:04.53/471.65/5.58/423755 139000k 5:42.06/335.67/1.55/335333 0:01.78/1.72/0.04/17046 w=50,d=50 8:14.87/479.93/7.09/617310 137244k 5:47.22/339.26/2.73/462412 0:01.79/1.72/0.04/16871 w=50,d=100 8:23.13/490.09/4.10/751742 136152k 5:51.10/343.89/2.29/503573 0:01.83/1.73/0.06/16920 w=100,d=10 12:00.34/702.07/6.27/1167403 150736k 5:34.42/328.88/1.09/207522 0:01.72/1.63/0.07/18024 w=100,d=30 11:56.34/702.36/2.93/436950 137240k 5:42.64/335.82/2.07/422743 0:01.77/1.67/0.05/16852 w=100,d=50 12:20.39/722.99/3.94/671655 135488k 5:47.20/339.38/2.44/468974 0:01.78/1.68/0.03/16621 w=100,d=100 12:37.69/740.63/4.12/733593 134492k 5:52.26/344.16/3.00/636120 0:01.85/1.80/0.03/16681 - Ted ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-09 13:49 ` Theodore Tso @ 2007-05-09 14:17 ` Johannes Schindelin 0 siblings, 0 replies; 40+ messages in thread From: Johannes Schindelin @ 2007-05-09 14:17 UTC (permalink / raw) To: Theodore Tso; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List Hi, On Wed, 9 May 2007, Theodore Tso wrote: > [...] here are the results using a kernel git repository [...] Thank you very much, Dscho ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 16:30 ` Theodore Tso 2007-05-08 16:49 ` Johannes Schindelin @ 2007-05-08 17:07 ` Dana How 2007-05-08 17:35 ` Nicolas Pitre 2 siblings, 0 replies; 40+ messages in thread From: Dana How @ 2007-05-08 17:07 UTC (permalink / raw) To: Theodore Tso; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List, danahow On 5/8/07, Theodore Tso <tytso@mit.edu> wrote: > The following results are on a recent git repository, using time to > record the real, user, and sys times on the two commands: "git-gc > --no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline > -S'object' > /dev/null". All of these tests were done with a hot > cache, so disk speed didn't enter into the calculations. > > ... > > So a couple of things immediately become evident. First of all, as > Junio predicted, changing --depth makes no difference to the git-gc or > git log times. The latter is thanks to the delta chaching. Secondly, > changing --depth does make a signficiant difference to the pack size. > > Finally, --window does help somewhat in reducing the pack size, but it > _significantly_ increases the time to calculate the pack. > > My conclusion given this quick benchmark is that it seems to me that > changing the defaults of --depth to 50, and keeping --window at 10, is > a reasonable thing to do. If you still have the packfiles around, the times for some non-pickaxe git-log commands would be interesting, like from git-log's man page: git log v2.6.12.. include/scsi drivers/scsi git log --since="2 weeks ago" -- gitk These operations would be more dominated by processing smaller objects. Thanks, -- Dana L. How danahow@gmail.com +1 650 804 5991 cell ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 16:30 ` Theodore Tso 2007-05-08 16:49 ` Johannes Schindelin 2007-05-08 17:07 ` Dana How @ 2007-05-08 17:35 ` Nicolas Pitre 2007-05-09 5:03 ` Junio C Hamano 2 siblings, 1 reply; 40+ messages in thread From: Nicolas Pitre @ 2007-05-08 17:35 UTC (permalink / raw) To: Theodore Tso; +Cc: Junio C Hamano, Git Mailing List On Tue, 8 May 2007, Theodore Tso wrote: > On Tue, May 08, 2007 at 11:38:46AM -0400, Nicolas Pitre wrote: > > On Tue, 8 May 2007, Theodore Ts'o wrote: > > > > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > > > > I'd prefer if tests were performed on the performance impact before > > changing the default depth. If done separately from this patch then the > > commit log could contain those results as well. > > The following results are on a recent git repository, using time to > record the real, user, and sys times on the two commands: "git-gc > --no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline > -S'object' > /dev/null". All of these tests were done with a hot > cache, so disk speed didn't enter into the calculations. [...] > My conclusion given this quick benchmark is that it seems to me that > changing the defaults of --depth to 50, and keeping --window at 10, is > a reasonable thing to do. Effectively. I'd still prefer to see the default changed in a patch of its own though. Nicolas ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 2007-05-08 17:35 ` Nicolas Pitre @ 2007-05-09 5:03 ` Junio C Hamano 0 siblings, 0 replies; 40+ messages in thread From: Junio C Hamano @ 2007-05-09 5:03 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Theodore Tso, Git Mailing List Nicolas Pitre <nico@cam.org> writes: > On Tue, 8 May 2007, Theodore Tso wrote: > >> On Tue, May 08, 2007 at 11:38:46AM -0400, Nicolas Pitre wrote: >> > On Tue, 8 May 2007, Theodore Ts'o wrote: >> > >> > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> >> > >> > I'd prefer if tests were performed on the performance impact before >> > changing the default depth. If done separately from this patch then the >> > commit log could contain those results as well. >> >> The following results are on a recent git repository, using time to >> record the real, user, and sys times on the two commands: "git-gc >> --no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline >> -S'object' > /dev/null". All of these tests were done with a hot >> cache, so disk speed didn't enter into the calculations. > [...] >> My conclusion given this quick benchmark is that it seems to me that >> changing the defaults of --depth to 50, and keeping --window at 10, is >> a reasonable thing to do. > > Effectively. > > I'd still prefer to see the default changed in a patch of its own > though. I'll split the patch into two and apply them separately. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o 2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o @ 2007-05-08 15:30 ` Nicolas Pitre 2007-05-08 21:12 ` Junio C Hamano 1 sibling, 1 reply; 40+ messages in thread From: Nicolas Pitre @ 2007-05-08 15:30 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Junio C Hamano, Git Mailing List On Tue, 8 May 2007, Theodore Ts'o wrote: > OK, here's a patch to implement pack.depth (with the default tweaked > to 50 --- is that too high?), This is most likely to affect runtime performances, but some benchmarks would be needed to find out how much. Having both pack.depth and pack.window as config options (with current defaults of 10) would certainly be a good thing. Tweaking those defaults should probably be investigated separately. > followed by a simplified and reworked > patch to git-gc that only implements --no-reuse-delta. > > I don't imagine that most users will want to use that feature most of > the time, hence the long option name, but occasionally, it might be > useful. Yes, the user could just run "git-repack -a -d -f -l" after > running git-gc, but then the "git-repack -a -d -l" in git-gc is just a > wasted disk i/o. In which case, it is git-gc that needs to get a bit smarter. Maybe something like this: ----- >* Avoid running git-repack from git-gc if there is evidently nothing to repack. Signed-off-by: Nicolas Pitre <nico@cam.org> --- diff --git a/builtin-count-objects.c b/builtin-count-objects.c index ff90ebd..8d79764 100644 --- a/builtin-count-objects.c +++ b/builtin-count-objects.c @@ -67,13 +67,40 @@ static void count_objects(DIR *d, char *path, int len, int verbose, } } -int cmd_count_objects(int ac, const char **av, const char *prefix) +static void count_loose_objects(int verbose, + unsigned long *loose, + unsigned long *loose_size, + unsigned long *packed_loose, + unsigned long *garbage) { - int i; - int verbose = 0; const char *objdir = get_object_directory(); - int len = strlen(objdir); + int i, len = strlen(objdir); char *path = xmalloc(len + 50); + memcpy(path, objdir, len); + if (len && objdir[len-1] != '/') + path[len++] = '/'; + for (i = 0; i < 256; i++) { + DIR *d; + sprintf(path + len, "%02x", i); + d = opendir(path); + if (!d) + continue; + count_objects(d, path, len, verbose, + loose, loose_size, packed_loose, garbage); + closedir(d); + } +} + +unsigned long num_loose_objects(void) +{ + unsigned long loose = 0, packed_loose = 0, garbage = 0, loose_size = 0; + count_loose_objects(0, &loose, &loose_size, &packed_loose, &garbage); + return loose; +} + +int cmd_count_objects(int ac, const char **av, const char *prefix) +{ + int i, verbose = 0; unsigned long loose = 0, packed = 0, packed_loose = 0, garbage = 0; unsigned long loose_size = 0; @@ -90,19 +117,8 @@ int cmd_count_objects(int ac, const char **av, const char *prefix) /* we do not take arguments other than flags for now */ if (i < ac) usage(count_objects_usage); - memcpy(path, objdir, len); - if (len && objdir[len-1] != '/') - path[len++] = '/'; - for (i = 0; i < 256; i++) { - DIR *d; - sprintf(path + len, "%02x", i); - d = opendir(path); - if (!d) - continue; - count_objects(d, path, len, verbose, - &loose, &loose_size, &packed_loose, &garbage); - closedir(d); - } + + count_loose_objects(verbose, &loose, &loose_size, &packed_loose, &garbage); if (verbose) { struct packed_git *p; unsigned long num_pack = 0; diff --git a/builtin-gc.c b/builtin-gc.c index 3b1f8c2..b9b3c05 100644 --- a/builtin-gc.c +++ b/builtin-gc.c @@ -40,7 +40,7 @@ static int gc_config(const char *var, const char *value) int cmd_gc(int argc, const char **argv, const char *prefix) { int i; - int prune = 0; + int prune = 0, do_repack = 0; git_config(gc_config); @@ -65,7 +65,20 @@ int cmd_gc(int argc, const char **argv, const char *prefix) if (run_command_v_opt(argv_reflog, RUN_GIT_CMD)) return error(FAILED_RUN, argv_reflog[0]); - if (run_command_v_opt(argv_repack, RUN_GIT_CMD)) + if (num_loose_objects() > 0) { + do_repack = 1; + } else { + struct packed_git *p; + unsigned long num_pack = 0; + if (!packed_git) + prepare_packed_git(); + for (p = packed_git; p; p = p->next) + if (p->pack_local) + num_pack++; + if (num_pack > 1) + do_repack = 1; + } + if (do_repack && run_command_v_opt(argv_repack, RUN_GIT_CMD)) return error(FAILED_RUN, argv_repack[0]); if (prune && run_command_v_opt(argv_prune, RUN_GIT_CMD)) diff --git a/cache.h b/cache.h index 8e76152..3a140f1 100644 --- a/cache.h +++ b/cache.h @@ -359,6 +359,8 @@ extern int legacy_loose_object(unsigned char *); extern int has_pack_file(const unsigned char *sha1); extern int has_pack_index(const unsigned char *sha1); +extern unsigned long num_loose_objects(void); + extern signed char hexval_table[256]; static inline unsigned int hexval(unsigned int c) { ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to 2007-05-08 15:30 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Nicolas Pitre @ 2007-05-08 21:12 ` Junio C Hamano 2007-05-08 23:59 ` Nicolas Pitre 0 siblings, 1 reply; 40+ messages in thread From: Junio C Hamano @ 2007-05-08 21:12 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Theodore Ts'o, Git Mailing List Nicolas Pitre <nico@cam.org> writes: > In which case, it is git-gc that needs to get a bit smarter. Maybe > something like this: I agree git-gc should be tuned to "one size fits all well enough" default, rather than getting more complicated parameters to fine tune its behaviour to satisfy power users. > @@ -65,7 +65,20 @@ int cmd_gc(int argc, const char **argv, const char *prefix) > if (run_command_v_opt(argv_reflog, RUN_GIT_CMD)) > return error(FAILED_RUN, argv_reflog[0]); > > - if (run_command_v_opt(argv_repack, RUN_GIT_CMD)) > + if (num_loose_objects() > 0) { > + do_repack = 1; > + } else { > + struct packed_git *p; > + unsigned long num_pack = 0; > + if (!packed_git) > + prepare_packed_git(); > + for (p = packed_git; p; p = p->next) > + if (p->pack_local) > + num_pack++; > + if (num_pack > 1) > + do_repack = 1; > + } > + if (do_repack && run_command_v_opt(argv_repack, RUN_GIT_CMD)) > return error(FAILED_RUN, argv_repack[0]); > > if (prune && run_command_v_opt(argv_prune, RUN_GIT_CMD)) Is this even correct? When your repository is fully packed, if you decided to discard one of your topic branches with "git branch -D", what does this code do? We see no loose objects, we see only one pack, so the unreachable objects are left in the pack? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to 2007-05-08 21:12 ` Junio C Hamano @ 2007-05-08 23:59 ` Nicolas Pitre 0 siblings, 0 replies; 40+ messages in thread From: Nicolas Pitre @ 2007-05-08 23:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Theodore Ts'o, Git Mailing List On Tue, 8 May 2007, Junio C Hamano wrote: > Nicolas Pitre <nico@cam.org> writes: > > > @@ -65,7 +65,20 @@ int cmd_gc(int argc, const char **argv, const char *prefix) > > if (run_command_v_opt(argv_reflog, RUN_GIT_CMD)) > > return error(FAILED_RUN, argv_reflog[0]); > > > > - if (run_command_v_opt(argv_repack, RUN_GIT_CMD)) > > + if (num_loose_objects() > 0) { > > + do_repack = 1; > > + } else { > > + struct packed_git *p; > > + unsigned long num_pack = 0; > > + if (!packed_git) > > + prepare_packed_git(); > > + for (p = packed_git; p; p = p->next) > > + if (p->pack_local) > > + num_pack++; > > + if (num_pack > 1) > > + do_repack = 1; > > + } > > + if (do_repack && run_command_v_opt(argv_repack, RUN_GIT_CMD)) > > return error(FAILED_RUN, argv_repack[0]); > > > > if (prune && run_command_v_opt(argv_prune, RUN_GIT_CMD)) > > Is this even correct? > > When your repository is fully packed, if you decided to discard > one of your topic branches with "git branch -D", what does this > code do? We see no loose objects, we see only one pack, so the > unreachable objects are left in the pack? Right. OK, scrap that. Nicolas ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2007-06-11 21:44 UTC | newest] Thread overview: 40+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-05-08 2:54 [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc Theodore Ts'o 2007-05-08 3:13 ` Nicolas Pitre 2007-05-08 3:21 ` Theodore Tso 2007-05-08 3:38 ` Dana How 2007-05-08 4:43 ` Junio C Hamano 2007-05-08 13:46 ` Nicolas Pitre 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o 2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o 2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o 2007-05-08 15:35 ` Nicolas Pitre 2007-05-09 5:05 ` Daniel Barkalow 2007-05-09 8:15 ` Junio C Hamano 2007-05-09 9:02 ` Steven Grimm 2007-05-09 11:35 ` Other compression?, was " Johannes Schindelin 2007-05-09 15:15 ` Junio C Hamano 2007-05-09 19:10 ` Shawn O. Pearce 2007-06-10 7:40 ` Sam Vilain 2007-06-11 1:51 ` Nicolas Pitre 2007-06-11 6:20 ` Steven Grimm 2007-06-11 6:31 ` Shawn O. Pearce 2007-06-11 10:20 ` Johannes Schindelin 2007-06-11 14:01 ` Nicolas Pitre 2007-06-11 21:40 ` Johannes Schindelin 2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso 2007-05-09 20:19 ` Junio C Hamano 2007-05-09 22:22 ` Theodore Tso 2007-05-10 7:38 ` Junio C Hamano 2007-05-08 15:38 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Nicolas Pitre 2007-05-08 16:30 ` Theodore Tso 2007-05-08 16:49 ` Johannes Schindelin 2007-05-08 18:09 ` Theodore Tso 2007-05-08 18:46 ` Nicolas Pitre 2007-05-09 13:49 ` Theodore Tso 2007-05-09 14:17 ` Johannes Schindelin 2007-05-08 17:07 ` Dana How 2007-05-08 17:35 ` Nicolas Pitre 2007-05-09 5:03 ` Junio C Hamano 2007-05-08 15:30 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Nicolas Pitre 2007-05-08 21:12 ` Junio C Hamano 2007-05-08 23:59 ` Nicolas Pitre
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).