* [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc
@ 2007-05-08 2:54 Theodore Ts'o
2007-05-08 3:13 ` Nicolas Pitre
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o
0 siblings, 2 replies; 40+ messages in thread
From: Theodore Ts'o @ 2007-05-08 2:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Sometimes users might want to use more aggressive packing options
when doing a git-gc. This allows them to do so without having
to use the low-level plumbing commands.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
Documentation/git-gc.txt | 3 ++-
Documentation/git-pack-objects.txt | 19 +------------------
Documentation/pack-options.txt | 18 ++++++++++++++++++
builtin-gc.c | 30 ++++++++++++++++++++++++++++--
4 files changed, 49 insertions(+), 21 deletions(-)
create mode 100644 Documentation/pack-options.txt
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index bc16584..3c807c2 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -8,7 +8,7 @@ git-gc - Cleanup unnecessary files and optimize the local repository
SYNOPSIS
--------
-'git-gc' [--prune]
+'git-gc' [--prune] [--no-reuse-delta] [--window=N] [--depth=N]
DESCRIPTION
-----------
@@ -35,6 +35,7 @@ OPTIONS
repository at the same time (e.g. never use this option
in a cron script).
+include::pack-options.txt[]
Configuration
-------------
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index d9e11c6..aac71f6 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -73,18 +73,6 @@ base-name::
as if all refs under `$GIT_DIR/refs` are specified to be
included.
---window=[N], --depth=[N]::
- These two options affect how the objects contained in
- the pack are stored using delta compression. The
- objects are first internally sorted by type, size and
- optionally names and compared against the other objects
- within --window to see if using delta compression saves
- space. --depth limits the maximum delta depth; making
- it too deep affects the performance on the unpacker
- side, because delta data needs to be applied that many
- times to get to the necessary object.
- The default value for both --window and --depth is 10.
-
--incremental::
This flag causes an object already in a pack ignored
even if it appears in the standard input.
@@ -120,12 +108,7 @@ base-name::
This flag makes the command not to report its progress
on the standard error stream.
---no-reuse-delta::
- When creating a packed archive in a repository that
- has existing packs, the command reuses existing deltas.
- This sometimes results in a slightly suboptimal pack.
- This flag tells the command not to reuse existing deltas
- but compute them from scratch.
+include::pack-options.txt[]
--delta-base-offset::
A packed archive can express base object of a delta as
diff --git a/Documentation/pack-options.txt b/Documentation/pack-options.txt
new file mode 100644
index 0000000..7b0ae5f
--- /dev/null
+++ b/Documentation/pack-options.txt
@@ -0,0 +1,18 @@
+--window=[N], --depth=[N]::
+ These two options affect how the objects contained in
+ the pack are stored using delta compression. The
+ objects are first internally sorted by type, size and
+ optionally names and compared against the other objects
+ within --window to see if using delta compression saves
+ space. --depth limits the maximum delta depth; making
+ it too deep affects the performance on the unpacker
+ side, because delta data needs to be applied that many
+ times to get to the necessary object.
+ The default value for both --window and --depth is 10.
+
+--no-reuse-delta::
+ When creating a packed archive in a repository that
+ has existing packs, the command reuses existing deltas.
+ This sometimes results in a slightly suboptimal pack.
+ This flag tells the command not to reuse existing deltas
+ but compute them from scratch.
diff --git a/builtin-gc.c b/builtin-gc.c
index 3b1f8c2..7e7775d 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -15,13 +15,15 @@
#define FAILED_RUN "failed to run %s"
-static const char builtin_gc_usage[] = "git-gc [--prune]";
+static const char builtin_gc_usage[] = "git-gc [--prune] [--no-reuse-delta] [--window=N] [--depth=N]";
static int pack_refs = -1;
+#define MAX_ADD 10
+
static const char *argv_pack_refs[] = {"pack-refs", "--prune", NULL};
static const char *argv_reflog[] = {"reflog", "expire", "--all", NULL};
-static const char *argv_repack[] = {"repack", "-a", "-d", "-l", NULL};
+static const char *argv_repack[MAX_ADD] = {"repack", "-a", "-d", "-l", NULL};
static const char *argv_prune[] = {"prune", NULL};
static const char *argv_rerere[] = {"rerere", "gc", NULL};
@@ -37,6 +39,21 @@ static int gc_config(const char *var, const char *value)
return git_default_config(var, value);
}
+static append_option(const char **cmd, const char *opt, int max_length)
+{
+ int i;
+
+ for (i=0; cmd[i]; i++)
+ ;
+
+ if (i+2 >= max_length) {
+ fprintf(stderr, "Too many options specified\n");
+ exit(1);
+ }
+ cmd[i++] = opt;
+ cmd[i] = 0;
+}
+
int cmd_gc(int argc, const char **argv, const char *prefix)
{
int i;
@@ -53,6 +70,15 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
prune = 1;
continue;
}
+ if (!strcmp(arg, "--no-reuse-delta")) {
+ append_option(argv_repack, "-f", MAX_ADD);
+ continue;
+ }
+ if (!strncmp(arg, "--window", 8) ||
+ !strncmp(arg, "--depth", 7)) {
+ append_option(argv_repack, arg, MAX_ADD);
+ continue;
+ }
/* perhaps other parameters later... */
break;
}
--
1.5.2.rc1.20.g86b9-dirty
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc
2007-05-08 2:54 [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc Theodore Ts'o
@ 2007-05-08 3:13 ` Nicolas Pitre
2007-05-08 3:21 ` Theodore Tso
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o
1 sibling, 1 reply; 40+ messages in thread
From: Nicolas Pitre @ 2007-05-08 3:13 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Junio C Hamano, git
On Mon, 7 May 2007, Theodore Ts'o wrote:
> Sometimes users might want to use more aggressive packing options
> when doing a git-gc. This allows them to do so without having
> to use the low-level plumbing commands.
The 'git repack' command isn't _that_ low level, is it?
git-pack-objects is plumbing for sure, but not git-repack?
Especially if you're aware and interested in those options, you won't be
afraid of 'git repack -a -f -d --window=...".
In the context of "gc", having an option that reads "window" looks a bit
strange too.
Maybe it's just me...
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc
2007-05-08 3:13 ` Nicolas Pitre
@ 2007-05-08 3:21 ` Theodore Tso
2007-05-08 3:38 ` Dana How
2007-05-08 4:43 ` Junio C Hamano
0 siblings, 2 replies; 40+ messages in thread
From: Theodore Tso @ 2007-05-08 3:21 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git
On Mon, May 07, 2007 at 11:13:58PM -0400, Nicolas Pitre wrote:
> On Mon, 7 May 2007, Theodore Ts'o wrote:
>
> > Sometimes users might want to use more aggressive packing options
> > when doing a git-gc. This allows them to do so without having
> > to use the low-level plumbing commands.
>
> The 'git repack' command isn't _that_ low level, is it?
> git-pack-objects is plumbing for sure, but not git-repack?
>
> Especially if you're aware and interested in those options, you won't be
> afraid of 'git repack -a -f -d --window=...".
>
> In the context of "gc", having an option that reads "window" looks a bit
> strange too.
I suppose, but you either need to then know all of the other commands
which git-gc runs, and do them manually, skipping git-gc altogether,
or use git-gc, and end up rewriting the pack twice, ince using the
git-repack in git-gc, and then once manually so you can give the
options that you really want to give to git-repack.
Maybe the right approach is to have a way to specify default --window
and --depth as git configuration variables? Looks like there is a
pack.window already, but not a pack.depth.
What if we add a pack.depth configuration option, and add only
--no-reuse-delta to git-gc? Would that be better?
- Ted
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc
2007-05-08 3:21 ` Theodore Tso
@ 2007-05-08 3:38 ` Dana How
2007-05-08 4:43 ` Junio C Hamano
1 sibling, 0 replies; 40+ messages in thread
From: Dana How @ 2007-05-08 3:38 UTC (permalink / raw)
To: Theodore Tso; +Cc: Nicolas Pitre, Junio C Hamano, git, danahow
On 5/7/07, Theodore Tso <tytso@mit.edu> wrote:
> On Mon, May 07, 2007 at 11:13:58PM -0400, Nicolas Pitre wrote:
> > On Mon, 7 May 2007, Theodore Ts'o wrote:
> > > Sometimes users might want to use more aggressive packing options
> > > when doing a git-gc. This allows them to do so without having
> > > to use the low-level plumbing commands.
> > In the context of "gc", having an option that reads "window" looks a bit
> > strange too.
> I suppose, but you either need to then know all of the other commands
> which git-gc runs, and do them manually, skipping git-gc altogether,
> or use git-gc, and end up rewriting the pack twice, ince using the
> git-repack in git-gc, and then once manually so you can give the
> options that you really want to give to git-repack.
>
> Maybe the right approach is to have a way to specify default --window
> and --depth as git configuration variables? Looks like there is a
> pack.window already, but not a pack.depth.
>
> What if we add a pack.depth configuration option, and add only
> --no-reuse-delta to git-gc? Would that be better?
I would use pack.depth .
Thanks,
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc
2007-05-08 3:21 ` Theodore Tso
2007-05-08 3:38 ` Dana How
@ 2007-05-08 4:43 ` Junio C Hamano
2007-05-08 13:46 ` Nicolas Pitre
1 sibling, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2007-05-08 4:43 UTC (permalink / raw)
To: Theodore Tso; +Cc: Nicolas Pitre, git
Theodore Tso <tytso@mit.edu> writes:
> On Mon, May 07, 2007 at 11:13:58PM -0400, Nicolas Pitre wrote:
>> ...
>> Especially if you're aware and interested in those options, you won't be
>> afraid of 'git repack -a -f -d --window=...".
>>
>> In the context of "gc", having an option that reads "window" looks a bit
>> strange too.
>
> I suppose, but you either need to then know all of the other commands
> which git-gc runs, and do them manually, skipping git-gc altogether,
> or use git-gc, and end up rewriting the pack twice,...
If the user really wants to tweak the parameters that much and
that often, I think what Nico says, plus your pack.depth/window
configuration variables, make more sense. git-gc is meant to be
a shorthand with reasonable "one size fits all" default, and
there is something wrong if a user has to give customization
option every time it is run. It could be that the default
parameters are grossly off for _everybody_, in which case we
should fix the default.
With the recent introduction of delta base caching code, we
might want to tweak the default pack depth to larger value for
everybody, by the way.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to
2007-05-08 2:54 [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc Theodore Ts'o
2007-05-08 3:13 ` Nicolas Pitre
@ 2007-05-08 13:28 ` Theodore Ts'o
2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o
2007-05-08 15:30 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Nicolas Pitre
1 sibling, 2 replies; 40+ messages in thread
From: Theodore Ts'o @ 2007-05-08 13:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
OK, here's a patch to implement pack.depth (with the default tweaked
to 50 --- is that too high?), followed by a simplified and reworked
patch to git-gc that only implements --no-reuse-delta.
I don't imagine that most users will want to use that feature most of
the time, hence the long option name, but occasionally, it might be
useful. Yes, the user could just run "git-repack -a -d -f -l" after
running git-gc, but then the "git-repack -a -d -l" in git-gc is just a
wasted disk i/o. I don't know if I'll manage to convince you, if not,
just drop the second patch, I guess. :-)
- Ted
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o
@ 2007-05-08 13:28 ` Theodore Ts'o
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o
2007-05-08 15:38 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Nicolas Pitre
2007-05-08 15:30 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Nicolas Pitre
1 sibling, 2 replies; 40+ messages in thread
From: Theodore Ts'o @ 2007-05-08 13:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List, Theodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
Documentation/config.txt | 6 +++++-
Documentation/git-pack-objects.txt | 2 +-
Documentation/git-repack.txt | 2 +-
builtin-pack-objects.c | 6 +++++-
4 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 24f9655..c7674c2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -544,7 +544,11 @@ merge.<driver>.recursive::
pack.window::
The size of the window used by gitlink:git-pack-objects[1] when no
- window size is given on the command line. Defaults to 10.
+ window size is given on the command line. Defaults to 10.
+
+pack.depth::
+ The maximum delta depth used by gitlink:git-pack-objects[1] when no
+ maximum depth is given on the command line. Defaults to 50.
pull.octopus::
The default merge strategy to use when pulling multiple branches
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index d9e11c6..bd3ee45 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -83,7 +83,7 @@ base-name::
it too deep affects the performance on the unpacker
side, because delta data needs to be applied that many
times to get to the necessary object.
- The default value for both --window and --depth is 10.
+ The default value for --window is 10 and --depth is 50.
--incremental::
This flag causes an object already in a pack ignored
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index d39abc1..cc3b0b2 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -63,7 +63,7 @@ OPTIONS
space. `--depth` limits the maximum delta depth; making it too deep
affects the performance on the unpacker side, because delta data needs
to be applied that many times to get to the necessary object.
- The default value for both --window and --depth is 10.
+ The default value for --window is 10 and --depth is 50.
Configuration
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 7bff8ea..966f843 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -64,6 +64,7 @@ static char tmpname[PATH_MAX];
static unsigned char pack_file_sha1[20];
static int progress = 1;
static int window = 10;
+static int depth = 50;
static int pack_to_stdout;
static int num_preferred_base;
static struct progress progress_state;
@@ -1489,6 +1490,10 @@ static int git_pack_config(const char *k, const char *v)
window = git_config_int(k, v);
return 0;
}
+ if(!strcmp(k, "pack.depth")) {
+ depth = git_config_int(k, v);
+ return 0;
+ }
return git_default_config(k, v);
}
@@ -1584,7 +1589,6 @@ static int adjust_perm(const char *path, mode_t mode)
int cmd_pack_objects(int argc, const char **argv, const char *prefix)
{
- int depth = 10;
int use_internal_rev_list = 0;
int thin = 0;
uint32_t i;
--
1.5.2.rc2.22.ga39d
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH] Add --no-reuse-delta option to git-gc
2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o
@ 2007-05-08 13:28 ` Theodore Ts'o
2007-05-08 15:35 ` Nicolas Pitre
2007-05-09 5:05 ` Daniel Barkalow
2007-05-08 15:38 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Nicolas Pitre
1 sibling, 2 replies; 40+ messages in thread
From: Theodore Ts'o @ 2007-05-08 13:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List, Theodore Ts'o
This allows the user to regenerate the deltas in packs while doing
a git-gc. The user could just run git-repack -a -d -f -l after
running git-gc, but then the first git-repack run by git-gc is
a bit of waste.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
Documentation/git-gc.txt | 7 ++++++-
builtin-gc.c | 24 ++++++++++++++++++++++--
2 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index bc16584..0493d06 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -8,7 +8,7 @@ git-gc - Cleanup unnecessary files and optimize the local repository
SYNOPSIS
--------
-'git-gc' [--prune]
+'git-gc' [--prune] [--no-reuse-delta]
DESCRIPTION
-----------
@@ -35,6 +35,11 @@ OPTIONS
repository at the same time (e.g. never use this option
in a cron script).
+--no-reuse-delta::
+ This causes deltas in existing packs to be recalculated instead
+ of reusing the existing deltas. This can save disk space at
+ the cost of taking more time to recalculate them from scratch.
+
Configuration
-------------
diff --git a/builtin-gc.c b/builtin-gc.c
index 3b1f8c2..5cb7ffd 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -15,13 +15,14 @@
#define FAILED_RUN "failed to run %s"
-static const char builtin_gc_usage[] = "git-gc [--prune]";
+static const char builtin_gc_usage[] = "git-gc [--prune] [--no-reuse-delta]";
static int pack_refs = -1;
+#define MAX_ADD 10
static const char *argv_pack_refs[] = {"pack-refs", "--prune", NULL};
static const char *argv_reflog[] = {"reflog", "expire", "--all", NULL};
-static const char *argv_repack[] = {"repack", "-a", "-d", "-l", NULL};
+static const char *argv_repack[MAX_ADD] = {"repack", "-a", "-d", "-l", NULL};
static const char *argv_prune[] = {"prune", NULL};
static const char *argv_rerere[] = {"rerere", "gc", NULL};
@@ -37,6 +38,21 @@ static int gc_config(const char *var, const char *value)
return git_default_config(var, value);
}
+static append_option(const char **cmd, const char *opt, int max_length)
+{
+ int i;
+
+ for (i=0; cmd[i]; i++)
+ ;
+
+ if (i+2 >= max_length) {
+ fprintf(stderr, "Too many options specified\n");
+ exit(1);
+ }
+ cmd[i++] = opt;
+ cmd[i] = 0;
+}
+
int cmd_gc(int argc, const char **argv, const char *prefix)
{
int i;
@@ -53,6 +69,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
prune = 1;
continue;
}
+ if (!strcmp(arg, "--no-reuse-delta")) {
+ append_option(argv_repack, "-f", MAX_ADD);
+ continue;
+ }
/* perhaps other parameters later... */
break;
}
--
1.5.2.rc2.22.ga39d
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc
2007-05-08 4:43 ` Junio C Hamano
@ 2007-05-08 13:46 ` Nicolas Pitre
0 siblings, 0 replies; 40+ messages in thread
From: Nicolas Pitre @ 2007-05-08 13:46 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Theodore Tso, git
On Mon, 7 May 2007, Junio C Hamano wrote:
> Theodore Tso <tytso@mit.edu> writes:
>
> > On Mon, May 07, 2007 at 11:13:58PM -0400, Nicolas Pitre wrote:
> >> ...
> >> Especially if you're aware and interested in those options, you won't be
> >> afraid of 'git repack -a -f -d --window=...".
> >>
> >> In the context of "gc", having an option that reads "window" looks a bit
> >> strange too.
> >
> > I suppose, but you either need to then know all of the other commands
> > which git-gc runs, and do them manually, skipping git-gc altogether,
> > or use git-gc, and end up rewriting the pack twice,...
>
> If the user really wants to tweak the parameters that much and
> that often, I think what Nico says, plus your pack.depth/window
> configuration variables, make more sense.
Also, once you've run git-repack with -f and the window/depth params you
want, those deltas will be reused as is by default afterwards even if
you use git-gc. So the big repack to tighten a large repo needs only to
be done once.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o
2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o
@ 2007-05-08 15:30 ` Nicolas Pitre
2007-05-08 21:12 ` Junio C Hamano
1 sibling, 1 reply; 40+ messages in thread
From: Nicolas Pitre @ 2007-05-08 15:30 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Junio C Hamano, Git Mailing List
On Tue, 8 May 2007, Theodore Ts'o wrote:
> OK, here's a patch to implement pack.depth (with the default tweaked
> to 50 --- is that too high?),
This is most likely to affect runtime performances, but some benchmarks
would be needed to find out how much.
Having both pack.depth and pack.window as config options (with current
defaults of 10) would certainly be a good thing. Tweaking those
defaults should probably be investigated separately.
> followed by a simplified and reworked
> patch to git-gc that only implements --no-reuse-delta.
>
> I don't imagine that most users will want to use that feature most of
> the time, hence the long option name, but occasionally, it might be
> useful. Yes, the user could just run "git-repack -a -d -f -l" after
> running git-gc, but then the "git-repack -a -d -l" in git-gc is just a
> wasted disk i/o.
In which case, it is git-gc that needs to get a bit smarter. Maybe
something like this:
----- >*
Avoid running git-repack from git-gc if there is evidently nothing to
repack.
Signed-off-by: Nicolas Pitre <nico@cam.org>
---
diff --git a/builtin-count-objects.c b/builtin-count-objects.c
index ff90ebd..8d79764 100644
--- a/builtin-count-objects.c
+++ b/builtin-count-objects.c
@@ -67,13 +67,40 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
}
}
-int cmd_count_objects(int ac, const char **av, const char *prefix)
+static void count_loose_objects(int verbose,
+ unsigned long *loose,
+ unsigned long *loose_size,
+ unsigned long *packed_loose,
+ unsigned long *garbage)
{
- int i;
- int verbose = 0;
const char *objdir = get_object_directory();
- int len = strlen(objdir);
+ int i, len = strlen(objdir);
char *path = xmalloc(len + 50);
+ memcpy(path, objdir, len);
+ if (len && objdir[len-1] != '/')
+ path[len++] = '/';
+ for (i = 0; i < 256; i++) {
+ DIR *d;
+ sprintf(path + len, "%02x", i);
+ d = opendir(path);
+ if (!d)
+ continue;
+ count_objects(d, path, len, verbose,
+ loose, loose_size, packed_loose, garbage);
+ closedir(d);
+ }
+}
+
+unsigned long num_loose_objects(void)
+{
+ unsigned long loose = 0, packed_loose = 0, garbage = 0, loose_size = 0;
+ count_loose_objects(0, &loose, &loose_size, &packed_loose, &garbage);
+ return loose;
+}
+
+int cmd_count_objects(int ac, const char **av, const char *prefix)
+{
+ int i, verbose = 0;
unsigned long loose = 0, packed = 0, packed_loose = 0, garbage = 0;
unsigned long loose_size = 0;
@@ -90,19 +117,8 @@ int cmd_count_objects(int ac, const char **av, const char *prefix)
/* we do not take arguments other than flags for now */
if (i < ac)
usage(count_objects_usage);
- memcpy(path, objdir, len);
- if (len && objdir[len-1] != '/')
- path[len++] = '/';
- for (i = 0; i < 256; i++) {
- DIR *d;
- sprintf(path + len, "%02x", i);
- d = opendir(path);
- if (!d)
- continue;
- count_objects(d, path, len, verbose,
- &loose, &loose_size, &packed_loose, &garbage);
- closedir(d);
- }
+
+ count_loose_objects(verbose, &loose, &loose_size, &packed_loose, &garbage);
if (verbose) {
struct packed_git *p;
unsigned long num_pack = 0;
diff --git a/builtin-gc.c b/builtin-gc.c
index 3b1f8c2..b9b3c05 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -40,7 +40,7 @@ static int gc_config(const char *var, const char *value)
int cmd_gc(int argc, const char **argv, const char *prefix)
{
int i;
- int prune = 0;
+ int prune = 0, do_repack = 0;
git_config(gc_config);
@@ -65,7 +65,20 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (run_command_v_opt(argv_reflog, RUN_GIT_CMD))
return error(FAILED_RUN, argv_reflog[0]);
- if (run_command_v_opt(argv_repack, RUN_GIT_CMD))
+ if (num_loose_objects() > 0) {
+ do_repack = 1;
+ } else {
+ struct packed_git *p;
+ unsigned long num_pack = 0;
+ if (!packed_git)
+ prepare_packed_git();
+ for (p = packed_git; p; p = p->next)
+ if (p->pack_local)
+ num_pack++;
+ if (num_pack > 1)
+ do_repack = 1;
+ }
+ if (do_repack && run_command_v_opt(argv_repack, RUN_GIT_CMD))
return error(FAILED_RUN, argv_repack[0]);
if (prune && run_command_v_opt(argv_prune, RUN_GIT_CMD))
diff --git a/cache.h b/cache.h
index 8e76152..3a140f1 100644
--- a/cache.h
+++ b/cache.h
@@ -359,6 +359,8 @@ extern int legacy_loose_object(unsigned char *);
extern int has_pack_file(const unsigned char *sha1);
extern int has_pack_index(const unsigned char *sha1);
+extern unsigned long num_loose_objects(void);
+
extern signed char hexval_table[256];
static inline unsigned int hexval(unsigned int c)
{
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o
@ 2007-05-08 15:35 ` Nicolas Pitre
2007-05-09 5:05 ` Daniel Barkalow
1 sibling, 0 replies; 40+ messages in thread
From: Nicolas Pitre @ 2007-05-08 15:35 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Junio C Hamano, Git Mailing List
On Tue, 8 May 2007, Theodore Ts'o wrote:
> This allows the user to regenerate the deltas in packs while doing
> a git-gc. The user could just run git-repack -a -d -f -l after
> running git-gc, but then the first git-repack run by git-gc is
> a bit of waste.
Given the patch I just posted and my previous arguments I don't think
this patch is in the spirit of git-gc. I'd prefer if git-gc remained as
simple with the least options as possible, but I don't care that much
either.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o
@ 2007-05-08 15:38 ` Nicolas Pitre
2007-05-08 16:30 ` Theodore Tso
1 sibling, 1 reply; 40+ messages in thread
From: Nicolas Pitre @ 2007-05-08 15:38 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Junio C Hamano, Git Mailing List
On Tue, 8 May 2007, Theodore Ts'o wrote:
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
I'd prefer if tests were performed on the performance impact before
changing the default depth. If done separately from this patch then the
commit log could contain those results as well.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 15:38 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Nicolas Pitre
@ 2007-05-08 16:30 ` Theodore Tso
2007-05-08 16:49 ` Johannes Schindelin
` (2 more replies)
0 siblings, 3 replies; 40+ messages in thread
From: Theodore Tso @ 2007-05-08 16:30 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, Git Mailing List
On Tue, May 08, 2007 at 11:38:46AM -0400, Nicolas Pitre wrote:
> On Tue, 8 May 2007, Theodore Ts'o wrote:
>
> > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
>
> I'd prefer if tests were performed on the performance impact before
> changing the default depth. If done separately from this patch then the
> commit log could contain those results as well.
The following results are on a recent git repository, using time to
record the real, user, and sys times on the two commands: "git-gc
--no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline
-S'object' > /dev/null". All of these tests were done with a hot
cache, so disk speed didn't enter into the calculations.
git-gc git log -S'object' pack size
w=10,d=10 27.1s/25.2s/0.3s 20.8s/20.4s/0.1s 15292k
w=10,d=30 23.8s/22.3s/0.2s 21.2s/20.9s/0.1s 12996k
w=10,d=50 24.8s/22.4s/0.4s 21.8s/21.2s/0.1s 12340k
w=100,d=100 24.1s/22.8s/0.3s 22.4s/21.8s/0.2s 11772k
w=30,d=10 45.0s/43.1s/0.4s 20.8s/20.5s/0.1s 14388k
w=30,d=30 35.8s/34.1s/0.3s 21.6s/21.1s/0.1s 11800k
w=30,d=50 34.6s/33.0s/0.3s 22.1s/21.4s/0.1s 11376k
w=30,d=100 34.0s/32.2s/0.3s 22.2s/21.6s/0.1s 11012k
w=50,d=10 56.1s/54.3s/0.4s 21.3s/20.5s/0.1s 14224k
w=50,d=30 47.2s/45.4s/0.4s 21.6s/21.0s/0.1s 11496k
w=50,d=50 44.5s/43.0s/0.3s 21.7s/21.2s/0.1s 11108k
w=50,d=100 44.3s/42.7s/0.4s 22.4s/21.7s/0.1s 10824k
So a couple of things immediately become evident. First of all, as
Junio predicted, changing --depth makes no difference to the git-gc or
git log times. The latter is thanks to the delta chaching. Secondly,
changing --depth does make a signficiant difference to the pack size.
Finally, --window does help somewhat in reducing the pack size, but it
_significantly_ increases the time to calculate the pack.
My conclusion given this quick benchmark is that it seems to me that
changing the defaults of --depth to 50, and keeping --window at 10, is
a reasonable thing to do.
Regards,
- Ted
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 16:30 ` Theodore Tso
@ 2007-05-08 16:49 ` Johannes Schindelin
2007-05-08 18:09 ` Theodore Tso
2007-05-08 17:07 ` Dana How
2007-05-08 17:35 ` Nicolas Pitre
2 siblings, 1 reply; 40+ messages in thread
From: Johannes Schindelin @ 2007-05-08 16:49 UTC (permalink / raw)
To: Theodore Tso; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List
Hi,
On Tue, 8 May 2007, Theodore Tso wrote:
> On Tue, May 08, 2007 at 11:38:46AM -0400, Nicolas Pitre wrote:
> > On Tue, 8 May 2007, Theodore Ts'o wrote:
> >
> > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> >
> > I'd prefer if tests were performed on the performance impact before
> > changing the default depth. If done separately from this patch then the
> > commit log could contain those results as well.
>
> The following results are on a recent git repository, using time to
> record the real, user, and sys times on the two commands: "git-gc
> --no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline
> -S'object' > /dev/null". All of these tests were done with a hot
> cache, so disk speed didn't enter into the calculations.
>
> git-gc git log -S'object' pack size
> w=10,d=10 27.1s/25.2s/0.3s 20.8s/20.4s/0.1s 15292k
> w=10,d=30 23.8s/22.3s/0.2s 21.2s/20.9s/0.1s 12996k
> w=10,d=50 24.8s/22.4s/0.4s 21.8s/21.2s/0.1s 12340k
> w=100,d=100 24.1s/22.8s/0.3s 22.4s/21.8s/0.2s 11772k
>
> w=30,d=10 45.0s/43.1s/0.4s 20.8s/20.5s/0.1s 14388k
> w=30,d=30 35.8s/34.1s/0.3s 21.6s/21.1s/0.1s 11800k
> w=30,d=50 34.6s/33.0s/0.3s 22.1s/21.4s/0.1s 11376k
> w=30,d=100 34.0s/32.2s/0.3s 22.2s/21.6s/0.1s 11012k
>
> w=50,d=10 56.1s/54.3s/0.4s 21.3s/20.5s/0.1s 14224k
> w=50,d=30 47.2s/45.4s/0.4s 21.6s/21.0s/0.1s 11496k
> w=50,d=50 44.5s/43.0s/0.3s 21.7s/21.2s/0.1s 11108k
> w=50,d=100 44.3s/42.7s/0.4s 22.4s/21.7s/0.1s 10824k
>
> So a couple of things immediately become evident. First of all, as
> Junio predicted, changing --depth makes no difference to the git-gc or
> git log times. The latter is thanks to the delta chaching. Secondly,
> changing --depth does make a signficiant difference to the pack size.
>
> Finally, --window does help somewhat in reducing the pack size, but it
> _significantly_ increases the time to calculate the pack.
>
> My conclusion given this quick benchmark is that it seems to me that
> changing the defaults of --depth to 50, and keeping --window at 10, is
> a reasonable thing to do.
I'd be happier if that test was done on _at least_ the kernel repo, if not
something larger, _plus_ having the numbers on page faults. Swapping can
kill performance substantially...
git.git is small.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 16:30 ` Theodore Tso
2007-05-08 16:49 ` Johannes Schindelin
@ 2007-05-08 17:07 ` Dana How
2007-05-08 17:35 ` Nicolas Pitre
2 siblings, 0 replies; 40+ messages in thread
From: Dana How @ 2007-05-08 17:07 UTC (permalink / raw)
To: Theodore Tso; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List, danahow
On 5/8/07, Theodore Tso <tytso@mit.edu> wrote:
> The following results are on a recent git repository, using time to
> record the real, user, and sys times on the two commands: "git-gc
> --no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline
> -S'object' > /dev/null". All of these tests were done with a hot
> cache, so disk speed didn't enter into the calculations.
>
> ...
>
> So a couple of things immediately become evident. First of all, as
> Junio predicted, changing --depth makes no difference to the git-gc or
> git log times. The latter is thanks to the delta chaching. Secondly,
> changing --depth does make a signficiant difference to the pack size.
>
> Finally, --window does help somewhat in reducing the pack size, but it
> _significantly_ increases the time to calculate the pack.
>
> My conclusion given this quick benchmark is that it seems to me that
> changing the defaults of --depth to 50, and keeping --window at 10, is
> a reasonable thing to do.
If you still have the packfiles around, the times for some non-pickaxe
git-log commands would be interesting, like from git-log's man page:
git log v2.6.12.. include/scsi drivers/scsi
git log --since="2 weeks ago" -- gitk
These operations would be more dominated by processing smaller objects.
Thanks,
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 16:30 ` Theodore Tso
2007-05-08 16:49 ` Johannes Schindelin
2007-05-08 17:07 ` Dana How
@ 2007-05-08 17:35 ` Nicolas Pitre
2007-05-09 5:03 ` Junio C Hamano
2 siblings, 1 reply; 40+ messages in thread
From: Nicolas Pitre @ 2007-05-08 17:35 UTC (permalink / raw)
To: Theodore Tso; +Cc: Junio C Hamano, Git Mailing List
On Tue, 8 May 2007, Theodore Tso wrote:
> On Tue, May 08, 2007 at 11:38:46AM -0400, Nicolas Pitre wrote:
> > On Tue, 8 May 2007, Theodore Ts'o wrote:
> >
> > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> >
> > I'd prefer if tests were performed on the performance impact before
> > changing the default depth. If done separately from this patch then the
> > commit log could contain those results as well.
>
> The following results are on a recent git repository, using time to
> record the real, user, and sys times on the two commands: "git-gc
> --no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline
> -S'object' > /dev/null". All of these tests were done with a hot
> cache, so disk speed didn't enter into the calculations.
[...]
> My conclusion given this quick benchmark is that it seems to me that
> changing the defaults of --depth to 50, and keeping --window at 10, is
> a reasonable thing to do.
Effectively.
I'd still prefer to see the default changed in a patch of its own
though.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 16:49 ` Johannes Schindelin
@ 2007-05-08 18:09 ` Theodore Tso
2007-05-08 18:46 ` Nicolas Pitre
0 siblings, 1 reply; 40+ messages in thread
From: Theodore Tso @ 2007-05-08 18:09 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List
On Tue, May 08, 2007 at 06:49:47PM +0200, Johannes Schindelin wrote:
> I'd be happier if that test was done on _at least_ the kernel repo, if not
> something larger, _plus_ having the numbers on page faults. Swapping can
> kill performance substantially...
Given how the delta cache works, I really don't think it's going to
matter. In any case, my laptop has 2gigs of memory, and the kernel
pack file is only 134megs, so you're not going to see any major page
faults....
In any case, here is a quick run:
git-gc git-log -S'object'
real/user/sys/min.faults real/user/sys/min.faults pack size
w=10,d=10 4:31/257.7/6.2/391711 5:53/326.9/1.7/255156 155940k
w=10,d=30 4:16/242.7/6.5/378193 5:39/331.6/2.3/437283 143144k
w=10,d=50 4:29/250.1/6.7/554493 5:43/334.5/1.9/362574 140080k
You'll note that it's the same thing; git-gc, git-log doesn't change
much, while the pack size decreases as --depth increases. We're only
seeing at 10% decrease in the pack size, compared to the 20% decrease
with the git repository, but that's probably because of the HTML and
man branches, which no doubt delta compress really, really well.
I can run a full set of benchmarks, varying both --window and --depth,
and also including a non-pickaxe git-log test as requested, but not
until tonight. I really don't think we'll see any surprises compared
to the earlier runs, though.
After all, if we just stop and think about how the delta caching
works, and how the repacking algorithm works, it's pretty clear that
there shouldn't be any scaling issues with increasing --depth, and
that increasing --window is just going to be painful, and these should
hold true regardless of the size of the repo.
- Ted
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 18:09 ` Theodore Tso
@ 2007-05-08 18:46 ` Nicolas Pitre
2007-05-09 13:49 ` Theodore Tso
0 siblings, 1 reply; 40+ messages in thread
From: Nicolas Pitre @ 2007-05-08 18:46 UTC (permalink / raw)
To: Theodore Tso; +Cc: Johannes Schindelin, Junio C Hamano, Git Mailing List
On Tue, 8 May 2007, Theodore Tso wrote:
> After all, if we just stop and think about how the delta caching
> works, and how the repacking algorithm works, it's pretty clear that
> there shouldn't be any scaling issues with increasing --depth, and
> that increasing --window is just going to be painful, and these should
> hold true regardless of the size of the repo.
The window size has absolutely no effect on the runtime pack access,
except maybe for the increased number of deltas. It is really a pack
time cost. The delta depth is the opposite: it has no effect on the
packing time, but it has the potential to slow down runtime access. But
the delta base cache is apparently working really well to mitigate that
cost, as long as it is big enough of course.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to
2007-05-08 15:30 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Nicolas Pitre
@ 2007-05-08 21:12 ` Junio C Hamano
2007-05-08 23:59 ` Nicolas Pitre
0 siblings, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2007-05-08 21:12 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Theodore Ts'o, Git Mailing List
Nicolas Pitre <nico@cam.org> writes:
> In which case, it is git-gc that needs to get a bit smarter. Maybe
> something like this:
I agree git-gc should be tuned to "one size fits all well enough"
default, rather than getting more complicated parameters to fine
tune its behaviour to satisfy power users.
> @@ -65,7 +65,20 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
> if (run_command_v_opt(argv_reflog, RUN_GIT_CMD))
> return error(FAILED_RUN, argv_reflog[0]);
>
> - if (run_command_v_opt(argv_repack, RUN_GIT_CMD))
> + if (num_loose_objects() > 0) {
> + do_repack = 1;
> + } else {
> + struct packed_git *p;
> + unsigned long num_pack = 0;
> + if (!packed_git)
> + prepare_packed_git();
> + for (p = packed_git; p; p = p->next)
> + if (p->pack_local)
> + num_pack++;
> + if (num_pack > 1)
> + do_repack = 1;
> + }
> + if (do_repack && run_command_v_opt(argv_repack, RUN_GIT_CMD))
> return error(FAILED_RUN, argv_repack[0]);
>
> if (prune && run_command_v_opt(argv_prune, RUN_GIT_CMD))
Is this even correct?
When your repository is fully packed, if you decided to discard
one of your topic branches with "git branch -D", what does this
code do? We see no loose objects, we see only one pack, so the
unreachable objects are left in the pack?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta, --window, and --depth options to
2007-05-08 21:12 ` Junio C Hamano
@ 2007-05-08 23:59 ` Nicolas Pitre
0 siblings, 0 replies; 40+ messages in thread
From: Nicolas Pitre @ 2007-05-08 23:59 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Theodore Ts'o, Git Mailing List
On Tue, 8 May 2007, Junio C Hamano wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
> > @@ -65,7 +65,20 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
> > if (run_command_v_opt(argv_reflog, RUN_GIT_CMD))
> > return error(FAILED_RUN, argv_reflog[0]);
> >
> > - if (run_command_v_opt(argv_repack, RUN_GIT_CMD))
> > + if (num_loose_objects() > 0) {
> > + do_repack = 1;
> > + } else {
> > + struct packed_git *p;
> > + unsigned long num_pack = 0;
> > + if (!packed_git)
> > + prepare_packed_git();
> > + for (p = packed_git; p; p = p->next)
> > + if (p->pack_local)
> > + num_pack++;
> > + if (num_pack > 1)
> > + do_repack = 1;
> > + }
> > + if (do_repack && run_command_v_opt(argv_repack, RUN_GIT_CMD))
> > return error(FAILED_RUN, argv_repack[0]);
> >
> > if (prune && run_command_v_opt(argv_prune, RUN_GIT_CMD))
>
> Is this even correct?
>
> When your repository is fully packed, if you decided to discard
> one of your topic branches with "git branch -D", what does this
> code do? We see no loose objects, we see only one pack, so the
> unreachable objects are left in the pack?
Right. OK, scrap that.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 17:35 ` Nicolas Pitre
@ 2007-05-09 5:03 ` Junio C Hamano
0 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-05-09 5:03 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Theodore Tso, Git Mailing List
Nicolas Pitre <nico@cam.org> writes:
> On Tue, 8 May 2007, Theodore Tso wrote:
>
>> On Tue, May 08, 2007 at 11:38:46AM -0400, Nicolas Pitre wrote:
>> > On Tue, 8 May 2007, Theodore Ts'o wrote:
>> >
>> > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
>> >
>> > I'd prefer if tests were performed on the performance impact before
>> > changing the default depth. If done separately from this patch then the
>> > commit log could contain those results as well.
>>
>> The following results are on a recent git repository, using time to
>> record the real, user, and sys times on the two commands: "git-gc
>> --no-reuse-delta --window=X --depth=Y" and "git log --pretty=oneline
>> -S'object' > /dev/null". All of these tests were done with a hot
>> cache, so disk speed didn't enter into the calculations.
> [...]
>> My conclusion given this quick benchmark is that it seems to me that
>> changing the defaults of --depth to 50, and keeping --window at 10, is
>> a reasonable thing to do.
>
> Effectively.
>
> I'd still prefer to see the default changed in a patch of its own
> though.
I'll split the patch into two and apply them separately.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o
2007-05-08 15:35 ` Nicolas Pitre
@ 2007-05-09 5:05 ` Daniel Barkalow
2007-05-09 8:15 ` Junio C Hamano
1 sibling, 1 reply; 40+ messages in thread
From: Daniel Barkalow @ 2007-05-09 5:05 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Junio C Hamano, Git Mailing List
On Tue, 8 May 2007, Theodore Ts'o wrote:
> This allows the user to regenerate the deltas in packs while doing
> a git-gc. The user could just run git-repack -a -d -f -l after
> running git-gc, but then the first git-repack run by git-gc is
> a bit of waste.
Maybe git-gc should have an option for "compress hard"? It seems to me
like a two-sizes-fit-all solution would be good here; "git gc" for daily
use, and "git gc --squeeze" for when you want to make the result as small
as possible, with compute time not being a major factor. If all you know
is that you're going to burn this repository onto a stack of CDs and mail
it to somebody (but don't know about delta reuse, window sizes, and
depths, let alone good values for these), it would be good to have an
option where it picks an appropriate different set of defaults for you.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-05-09 5:05 ` Daniel Barkalow
@ 2007-05-09 8:15 ` Junio C Hamano
2007-05-09 9:02 ` Steven Grimm
2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso
0 siblings, 2 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-05-09 8:15 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Theodore Ts'o, Git Mailing List
Daniel Barkalow <barkalow@iabervon.org> writes:
> On Tue, 8 May 2007, Theodore Ts'o wrote:
>
>> This allows the user to regenerate the deltas in packs while doing
>> a git-gc. The user could just run git-repack -a -d -f -l after
>> running git-gc, but then the first git-repack run by git-gc is
>> a bit of waste.
>
> Maybe git-gc should have an option for "compress hard"? It seems to me
> like a two-sizes-fit-all solution would be good here; "git gc" for daily
> use, and "git gc --squeeze" for when you want to make the result as small
> as possible, with compute time not being a major factor.
I think that sounds saner and more user friendly than specific
knob to tune "window", "depth" and friends which are too
technical. It has an added attraction that we can redefine what
exactly "hard" means later.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-05-09 8:15 ` Junio C Hamano
@ 2007-05-09 9:02 ` Steven Grimm
2007-05-09 11:35 ` Other compression?, was " Johannes Schindelin
` (2 more replies)
2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso
1 sibling, 3 replies; 40+ messages in thread
From: Steven Grimm @ 2007-05-09 9:02 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Daniel Barkalow, Theodore Ts'o, Git Mailing List
Junio C Hamano wrote:
> I think that sounds saner and more user friendly than specific
> knob to tune "window", "depth" and friends which are too
> technical. It has an added attraction that we can redefine what
> exactly "hard" means later.
>
On that note, has any thought been given to looking at other compression
algorithms? Gzip is a great high-speed compressor, but there are others
out there (some a bit slower, some much slower at both compression and
decompression) that produce substantially smaller output.
One could even, if one were in a particularly twisted state of mind,
envision using CPU-intensive compression for less frequently-accessed
objects and using gzip for active ones, on the theory that the best
time/space tradeoff is not uniform across all the objects in a git
repository. Presumably most of us never actually unpack the vast
majority of objects in a git repository of reasonable age, so the fact
that it'd take a little longer if we *did* want to unpack them isn't
much of a downside compared to the upside of reclaiming disk space. That
would mitigate the impact of using an algorithm that's slow at
decompression.
I think it'd be kind of neat to have my .git directory shrink by another
20+%. That's conservative; on maximumcompression.com's test of a mix of
different file types including images, gzip compresses 64% and the
best-scoring one does 80%. On English text gzip does 71% and the top
scorer does 89%. Most of the top-tier compressors are proprietary, but
there are some open-source ones that do pretty well.
Maybe not worth the added complexity, but I thought I'd toss it out
there. It probably makes more sense (if it makes any at all) after
Linus's suggestion to not unpack after cloning is in place. Once the
upstream has gone to the trouble of CPU-intensive compressing, you
certainly don't want to force clones to have to spend the time repeating
the same work.
-Steve (who suspects this is a "yes, we talked this over early in git's
history" question, but what the heck)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Other compression?, was Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-05-09 9:02 ` Steven Grimm
@ 2007-05-09 11:35 ` Johannes Schindelin
2007-05-09 15:15 ` Junio C Hamano
2007-05-09 19:10 ` Shawn O. Pearce
2 siblings, 0 replies; 40+ messages in thread
From: Johannes Schindelin @ 2007-05-09 11:35 UTC (permalink / raw)
To: Steven Grimm; +Cc: Git Mailing List
Hi,
On Wed, 9 May 2007, Steven Grimm wrote:
> On that note, has any thought been given to looking at other compression
> algorithms?
I think you could do that. But you would lose compatibility with all
existing Git clients. IOW nobody could pull from you.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-08 18:46 ` Nicolas Pitre
@ 2007-05-09 13:49 ` Theodore Tso
2007-05-09 14:17 ` Johannes Schindelin
0 siblings, 1 reply; 40+ messages in thread
From: Theodore Tso @ 2007-05-09 13:49 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Johannes Schindelin, Junio C Hamano, Git Mailing List
On Tue, May 08, 2007 at 02:46:35PM -0400, Nicolas Pitre wrote:
> The window size has absolutely no effect on the runtime pack access,
> except maybe for the increased number of deltas. It is really a pack
> time cost. The delta depth is the opposite: it has no effect on the
> packing time, but it has the potential to slow down runtime access. But
> the delta base cache is apparently working really well to mitigate that
> cost, as long as it is big enough of course.
Exactly, and the numbers bear out with the theory. Junio's already
applied the change to up the default to 50, but for the record, here
are the results using a kernel git repository, with times for doing a
git-gc (with the --no-reuse-delta, although I like the suggestion to
change it to just be "--hard", or maybe "--agressive"), and with a git
pickaxe and a git-log with pathname restrictions. It essentially
confirms that the delta base cache is doing the job just fine, up to
even a depth of 100. However, there isn't much difference in pack
size between a depth of 50 and 100.
Increasing the window size from 10 to 30 increases the pack run time
by roughly 40%, and saves an extra 5% or so on the pack size.
Increasing the window beyond 30 has ever-smaller diminishing returns,
while the time to repack gets bigger and bigger.
The timing information is real/user/system/minor pagefaults, and as
before, the results are done using a hot cache. Apologies in advance
for the long lines in the benchmark results. These results were
generated using a relatively recent, post-2.6.21 kernel git tree on a
Thinkpad T60p laptop with a 2GHz T2500 Intel Core Duo processor.
Regards,
git-gc pack size git-log -S'object' git-log include/scsi drivers/scsi
w=10,d=10 5:08.70/293.20/5.56/1027802 162316k 5:31.12/322.20/4.14/153298 0:01.80/1.72/0.03/19077
w=10,d=30 4:14.79/245.46/2.57/398754 149636k 5:41.56/331.21/4.86/517220 0:01.83/1.74/0.05/17880
w=10,d=50 4:31.89/257.16/3.50/576538 146608k 5:51.58/336.63/5.07/259643 0:01.88/1.80/0.04/17658
w=10,d=100 4:35.93/262.77/4.08/715195 144152k 5:58.23/341.26/6.22/624510 0:01.89/1.80/0.04/17571
w=30,d=10 7:23.06/424.08/6.35/1222640 153752k 5:32.85/323.96/2.13/213150 0:01.73/1.64/0.06/18429
w=30,d=30 6:15.76/364.27/3.53/407546 141200k 5:42.29/333.53/2.10/338301 0:01.81/1.71/0.04/17237
w=30,d=50 6:26.85/372.24/5.38/578343 139408k 5:42.21/336.65/1.30/260234 0:01.77/1.70/0.05/17108
w=30,d=100 6:34.31/381.72/5.40/744886 138040k 5:59.03/342.65/4.90/607681 0:01.93/1.79/0.06/17043
w=50,d=10 8:51.08/508.80/5.75/1050358 152168k 5:35.43/327.48/3.70/209655 0:01.70/1.64/0.04/18264
w=50,d=30 8:04.53/471.65/5.58/423755 139000k 5:42.06/335.67/1.55/335333 0:01.78/1.72/0.04/17046
w=50,d=50 8:14.87/479.93/7.09/617310 137244k 5:47.22/339.26/2.73/462412 0:01.79/1.72/0.04/16871
w=50,d=100 8:23.13/490.09/4.10/751742 136152k 5:51.10/343.89/2.29/503573 0:01.83/1.73/0.06/16920
w=100,d=10 12:00.34/702.07/6.27/1167403 150736k 5:34.42/328.88/1.09/207522 0:01.72/1.63/0.07/18024
w=100,d=30 11:56.34/702.36/2.93/436950 137240k 5:42.64/335.82/2.07/422743 0:01.77/1.67/0.05/16852
w=100,d=50 12:20.39/722.99/3.94/671655 135488k 5:47.20/339.38/2.44/468974 0:01.78/1.68/0.03/16621
w=100,d=100 12:37.69/740.63/4.12/733593 134492k 5:52.26/344.16/3.00/636120 0:01.85/1.80/0.03/16681
- Ted
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50
2007-05-09 13:49 ` Theodore Tso
@ 2007-05-09 14:17 ` Johannes Schindelin
0 siblings, 0 replies; 40+ messages in thread
From: Johannes Schindelin @ 2007-05-09 14:17 UTC (permalink / raw)
To: Theodore Tso; +Cc: Nicolas Pitre, Junio C Hamano, Git Mailing List
Hi,
On Wed, 9 May 2007, Theodore Tso wrote:
> [...] here are the results using a kernel git repository [...]
Thank you very much,
Dscho
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-05-09 9:02 ` Steven Grimm
2007-05-09 11:35 ` Other compression?, was " Johannes Schindelin
@ 2007-05-09 15:15 ` Junio C Hamano
2007-05-09 19:10 ` Shawn O. Pearce
2 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-05-09 15:15 UTC (permalink / raw)
To: Steven Grimm; +Cc: Daniel Barkalow, Theodore Ts'o, Git Mailing List
Steven Grimm <koreth@midwinter.com> writes:
> -Steve (who suspects this is a "yes, we talked this over early in
> git's history" question, but what the heck)
AFAICR, the answer is "heck, we have never bothered about such
low level implementation details as gzip has been good enough
for us. If you feel so inclined, go wild."
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-05-09 9:02 ` Steven Grimm
2007-05-09 11:35 ` Other compression?, was " Johannes Schindelin
2007-05-09 15:15 ` Junio C Hamano
@ 2007-05-09 19:10 ` Shawn O. Pearce
2007-06-10 7:40 ` Sam Vilain
2 siblings, 1 reply; 40+ messages in thread
From: Shawn O. Pearce @ 2007-05-09 19:10 UTC (permalink / raw)
To: Steven Grimm
Cc: Junio C Hamano, Daniel Barkalow, Theodore Ts'o,
Git Mailing List
Steven Grimm <koreth@midwinter.com> wrote:
> On that note, has any thought been given to looking at other compression
> algorithms? Gzip is a great high-speed compressor, but there are others
> out there (some a bit slower, some much slower at both compression and
> decompression) that produce substantially smaller output.
Its been discussed once before on the list, in very recent history,
but not by a whole lot. As Junio pointed out, I don't think there
ever really was any discussion of is gzip the best way to deflate the
objects. I think gzip was just chosen simply because it was readily
available in libz, stable, and has a pretty decent speed/size ratio.
> I think it'd be kind of neat to have my .git directory shrink by another
> 20+%. That's conservative; on maximumcompression.com's test of a mix of
> different file types including images, gzip compresses 64% and the
> best-scoring one does 80%. On English text gzip does 71% and the top
> scorer does 89%. Most of the top-tier compressors are proprietary, but
Yes. But in many cases we might actually be able to do even better
by going with a pack-wide dictionary. Why?
Think about source code structure. E.g.
$ git grep --cached 'struct object'| cut -d: -f1|wc -l
402
So 402 files in git.git use the term 'struct object', and that's just
the current revision I had in my index. With our current packfile
organization we are likely to store this string at least 402 times.
We'll store it once in each file's delta chain, assuming each
file's blobs largely fall into a single delta chain for that file
(reasonable assumption, but certainly not always true).
That's just one string that does appear somewhat frequently in any
file its used in. Now try 'unsigned char' (its 944 files, but an
even higher frequency-per-file).
So anyway, for the past year I've been thinking about trying to
implement a blob-level dictionary prototype to see if it helps on a
project like linux-2.6.git, but I haven't gotten to it. The pack v4
work was about applying that basic dicationary principal to trees
and commits, and I think it pays off nicely there. Just need to
get it cleaned up, rebased onto current master, and submitted to
the list for wider testing. ;-)
--
Shawn.
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH] Add --aggressive option to 'git gc'
2007-05-09 8:15 ` Junio C Hamano
2007-05-09 9:02 ` Steven Grimm
@ 2007-05-09 19:48 ` Theodore Tso
2007-05-09 20:19 ` Junio C Hamano
2007-05-10 7:38 ` Junio C Hamano
1 sibling, 2 replies; 40+ messages in thread
From: Theodore Tso @ 2007-05-09 19:48 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Daniel Barkalow, Git Mailing List
On Wed, May 09, 2007 at 01:15:07AM -0700, Junio C Hamano wrote:
> > Maybe git-gc should have an option for "compress hard"? It seems to me
> > like a two-sizes-fit-all solution would be good here; "git gc" for daily
> > use, and "git gc --squeeze" for when you want to make the result as small
> > as possible, with compute time not being a major factor.
>
> I think that sounds saner and more user friendly than specific
> knob to tune "window", "depth" and friends which are too
> technical. It has an added attraction that we can redefine what
> exactly "hard" means later.
OK, here's a patch that does exactly that. I choose git-gc
--aggressive, since I thought that was more descriptive than --hard or
--squeeze. Junio, would you be willing to apply this?
- Ted
=== Cut here ===
Add --aggressive option to 'git gc'
This option causes 'git gc' to more aggressively optimize the
repository at the cost of taking much more wall clock and CPU time.
Today this option causes git-pack-objects to use --no-use-delta
option, and it allows the --window parameter to be set via the
gc.aggressiveWindow configuration parameter.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
Documentation/config.txt | 5 +++++
Documentation/git-gc.txt | 16 +++++++++++++++-
builtin-gc.c | 35 +++++++++++++++++++++++++++++++++--
3 files changed, 53 insertions(+), 3 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index ea434af..efcf301 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -384,6 +384,11 @@ format.suffix::
`.patch`. Use this variable to change that suffix (make sure to
include the dot if you want it).
+gc.aggressiveWindow::
+ The window size parameter used in the delta compression
+ algorithm used by 'git gc --aggressive'. This defaults
+ to 10.
+
gc.packrefs::
`git gc` does not run `git pack-refs` in a bare repository by
default so that older dumb-transport clients can still fetch
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index bc16584..56575e8 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -8,7 +8,7 @@ git-gc - Cleanup unnecessary files and optimize the local repository
SYNOPSIS
--------
-'git-gc' [--prune]
+'git-gc' [--prune] [--aggressive]
DESCRIPTION
-----------
@@ -35,6 +35,13 @@ OPTIONS
repository at the same time (e.g. never use this option
in a cron script).
+--aggressive::
+ Usually 'git-gc' runs very quickly while providing good disk
+ space utilization and performance. This option will cause
+ git-gc to more aggressive optimize the repository at the expense
+ of taking much more time. The effects of this optimization are
+ persistent, so this option only needs to be sporadically; every
+ few hundred changesets or so.
Configuration
-------------
@@ -67,6 +74,13 @@ The optional configuration variable 'gc.packrefs' determines if
is not run in bare repositories by default, to allow older dumb-transport
clients fetch from the repository, but this will change in the future.
+The optional configuration variable 'gc.aggressiveWindow' controls how
+much time is spent optimizing the delta compression of the objects in
+the repository when the --aggressive option is specified. The larger
+the value, the more time is spent optimizing the delta compression. See
+the documentation for the --window' option in gitlink:git-repack[1] for
+more details. This defaults to 10.
+
See Also
--------
gitlink:git-prune[1]
diff --git a/builtin-gc.c b/builtin-gc.c
index 3b1f8c2..10f92f1 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -15,13 +15,15 @@
#define FAILED_RUN "failed to run %s"
-static const char builtin_gc_usage[] = "git-gc [--prune]";
+static const char builtin_gc_usage[] = "git-gc [--prune] [--aggressive]";
static int pack_refs = -1;
+static int aggressive_window = -1;
+#define MAX_ADD 10
static const char *argv_pack_refs[] = {"pack-refs", "--prune", NULL};
static const char *argv_reflog[] = {"reflog", "expire", "--all", NULL};
-static const char *argv_repack[] = {"repack", "-a", "-d", "-l", NULL};
+static const char *argv_repack[MAX_ADD] = {"repack", "-a", "-d", "-l", NULL};
static const char *argv_prune[] = {"prune", NULL};
static const char *argv_rerere[] = {"rerere", "gc", NULL};
@@ -34,13 +36,34 @@ static int gc_config(const char *var, const char *value)
pack_refs = git_config_bool(var, value);
return 0;
}
+ if (!strcmp(var, "gc.aggressiveWindow")) {
+ aggressive_window = git_config_int(var, value);
+ printf("aggressive_window = %d\n", aggressive_window);
+ return 0;
+ }
return git_default_config(var, value);
}
+static append_option(const char **cmd, const char *opt, int max_length)
+{
+ int i;
+
+ for (i=0; cmd[i]; i++)
+ ;
+
+ if (i+2 >= max_length) {
+ fprintf(stderr, "Too many options specified\n");
+ exit(1);
+ }
+ cmd[i++] = opt;
+ cmd[i] = 0;
+}
+
int cmd_gc(int argc, const char **argv, const char *prefix)
{
int i;
int prune = 0;
+ char buf[80];
git_config(gc_config);
@@ -53,6 +76,14 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
prune = 1;
continue;
}
+ if (!strcmp(arg, "--aggressive")) {
+ append_option(argv_repack, "-f", MAX_ADD);
+ if (aggressive_window > 0) {
+ sprintf(buf, "--window=%d", aggressive_window);
+ append_option(argv_repack, buf, MAX_ADD);
+ }
+ continue;
+ }
/* perhaps other parameters later... */
break;
}
--
1.5.2.rc2.22.ga39d
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --aggressive option to 'git gc'
2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso
@ 2007-05-09 20:19 ` Junio C Hamano
2007-05-09 22:22 ` Theodore Tso
2007-05-10 7:38 ` Junio C Hamano
1 sibling, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2007-05-09 20:19 UTC (permalink / raw)
To: Theodore Tso; +Cc: Daniel Barkalow, Git Mailing List
Theodore Tso <tytso@mit.edu> writes:
> On Wed, May 09, 2007 at 01:15:07AM -0700, Junio C Hamano wrote:
>> > Maybe git-gc should have an option for "compress hard"? It seems to me
>> > like a two-sizes-fit-all solution would be good here; "git gc" for daily
>> > use, and "git gc --squeeze" for when you want to make the result as small
>> > as possible, with compute time not being a major factor.
>>
>> I think that sounds saner and more user friendly than specific
>> knob to tune "window", "depth" and friends which are too
>> technical. It has an added attraction that we can redefine what
>> exactly "hard" means later.
>
> OK, here's a patch that does exactly that. I choose git-gc
> --aggressive, since I thought that was more descriptive than --hard or
> --squeeze. Junio, would you be willing to apply this?
Willing? Yes.
It's tricky that it defaults to 10 and still called aggressive.
When the configuration variable is left unspecified, the only
reason it is called aggressive is because it passes '-f' to
repack, right? It was not very clear at the first sight and I
was about to ask why the default is 10, not higher.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --aggressive option to 'git gc'
2007-05-09 20:19 ` Junio C Hamano
@ 2007-05-09 22:22 ` Theodore Tso
0 siblings, 0 replies; 40+ messages in thread
From: Theodore Tso @ 2007-05-09 22:22 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Daniel Barkalow, Git Mailing List
On Wed, May 09, 2007 at 01:19:54PM -0700, Junio C Hamano wrote:
> It's tricky that it defaults to 10 and still called aggressive.
> When the configuration variable is left unspecified, the only
> reason it is called aggressive is because it passes '-f' to
> repack, right? It was not very clear at the first sight and I
> was about to ask why the default is 10, not higher.
Yes, it's called aggressive because it causes git-gc to recalculate
the delta's. So that means that git-gc --aggressive will take around
9-10 times longer than git-gc. If the user changes
gc.aggressive-window to some larger value, say like 30, then git-gc
--aggressive would take around 20 times longer than git-gc.
So that's why I didn't make the default bigger, but instead left it as
something which could be configured by the user. Maybe the default
should be larger; I don't strong opposition towards making it be more
like 30, but it seemed to me that doubling the run time for a 5%
decrease in pack size wasn't worth it.
Regards,
- Ted
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --aggressive option to 'git gc'
2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso
2007-05-09 20:19 ` Junio C Hamano
@ 2007-05-10 7:38 ` Junio C Hamano
1 sibling, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-05-10 7:38 UTC (permalink / raw)
To: Theodore Tso; +Cc: Daniel Barkalow, Git Mailing List
Theodore Tso <tytso@mit.edu> writes:
> Junio, would you be willing to apply this?
Yes, but ;-).
> static int pack_refs = -1;
> +static int aggressive_window = -1;
>
> +#define MAX_ADD 10
> static const char *argv_pack_refs[] = {"pack-refs", "--prune", NULL};
> static const char *argv_reflog[] = {"reflog", "expire", "--all", NULL};
> -static const char *argv_repack[] = {"repack", "-a", "-d", "-l", NULL};
> +static const char *argv_repack[MAX_ADD] = {"repack", "-a", "-d", "-l", NULL};
> static const char *argv_prune[] = {"prune", NULL};
> static const char *argv_rerere[] = {"rerere", "gc", NULL};
>
> @@ -34,13 +36,34 @@ static int gc_config(const char *var, const char *value)
> pack_refs = git_config_bool(var, value);
> return 0;
> }
> + if (!strcmp(var, "gc.aggressiveWindow")) {
Callbacks to git_config() are called with variable names
downcased (except for the 2nd level for 3-level variables. E.g.
[REMOTE "Foo"] URL = ...; becomes var = "remote.Foo.url", val =
...).
> + aggressive_window = git_config_int(var, value);
> + printf("aggressive_window = %d\n", aggressive_window);
Did you mean to leave this in? Looks like a debug remnant...
> + return 0;
> + }
> return git_default_config(var, value);
> }
>
> +static append_option(const char **cmd, const char *opt, int max_length)
Type is "static void" I presume.
> +{
> + int i;
Funny tab here.
> +
> + for (i=0; cmd[i]; i++)
> + ;
SP around operator '='.
> +
> + if (i+2 >= max_length) {
Same for '+'
> + fprintf(stderr, "Too many options specified\n");
> + exit(1);
die("Too many...specified"); /* note the lack of \n at the end */
> + }
> + cmd[i++] = opt;
> + cmd[i] = 0;
We tend to spell out NULL, although we all are aware that C says
literal '0' is the null pointer.
All fixups are trivial so I'd take the patch and amend locally.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-05-09 19:10 ` Shawn O. Pearce
@ 2007-06-10 7:40 ` Sam Vilain
2007-06-11 1:51 ` Nicolas Pitre
0 siblings, 1 reply; 40+ messages in thread
From: Sam Vilain @ 2007-06-10 7:40 UTC (permalink / raw)
To: Steven Grimm
Cc: Shawn O. Pearce, Junio C Hamano, Daniel Barkalow,
Theodore Ts'o, Git Mailing List
Shawn O. Pearce wrote:
>> On that note, has any thought been given to looking at other compression
>> algorithms? Gzip is a great high-speed compressor, but there are others
>> out there (some a bit slower, some much slower at both compression and
>> decompression) that produce substantially smaller output.
>>
> Its been discussed once before on the list, in very recent history,
> but not by a whole lot. As Junio pointed out, I don't think there
> ever really was any discussion of is gzip the best way to deflate the
> objects. I think gzip was just chosen simply because it was readily
> available in libz, stable, and has a pretty decent speed/size ratio.
>
I think it's the right tool. I just don't see any point in changing to
anything slower for the sake of 20% space saving. Especially bzip2.
Consider this.
Compression works primarily through two things: huffman coding and
string matching. The larger the window for your string matching, the
slower the compression and the more memory you need thrashing your CPU
memory cache when decompressing.
Now I'm not an expert on compression algorithms but I think a large part
of the reason gzip is blindingly faster than bzip2 is because gzip uses
a 64k buffer and bzip2 a 900k one. Only now are CPUs getting caches
large enough to deal with that size of buffer, the rest of the time
you're waiting for your RAM. Moore's law was supposed to make bzip2 fast
one of these days but I'm still waiting.
But with git-repack the window is effectively the size of your
repository. So that blows bzip2 out of the water. Why else can git make
compressed packs smaller than a .bz2 of the raw files? This is the same
observation Shawn makes with the pack-wide dictionary, but he sounds
like he wants to apply it to the huffman coding stage as well as the
current delta/string matching stage. Now that would be interesting...
Anyway it's a free world so be my guest to implement it, I guess if this
was selectable it would only be a minor annoyance waiting a bit longer
pulling from from some repositories, and it would be interesting to see
if it did make a big difference with pack file sizes.
Sam
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-06-10 7:40 ` Sam Vilain
@ 2007-06-11 1:51 ` Nicolas Pitre
2007-06-11 6:20 ` Steven Grimm
2007-06-11 10:20 ` Johannes Schindelin
0 siblings, 2 replies; 40+ messages in thread
From: Nicolas Pitre @ 2007-06-11 1:51 UTC (permalink / raw)
To: Sam Vilain
Cc: Steven Grimm, Shawn O. Pearce, Junio C Hamano, Daniel Barkalow,
Theodore Ts'o, Git Mailing List
On Sun, 10 Jun 2007, Sam Vilain wrote:
> Now I'm not an expert on compression algorithms but I think a large part
> of the reason gzip is blindingly faster than bzip2 is because gzip uses
> a 64k buffer and bzip2 a 900k one. Only now are CPUs getting caches
> large enough to deal with that size of buffer, the rest of the time
> you're waiting for your RAM. Moore's law was supposed to make bzip2 fast
> one of these days but I'm still waiting.
>
> Anyway it's a free world so be my guest to implement it, I guess if this
> was selectable it would only be a minor annoyance waiting a bit longer
> pulling from from some repositories, and it would be interesting to see
> if it did make a big difference with pack file sizes.
It won't happen for a simple reason: to be backward compatible with
older GIT clients. If you have your repo compressed with bzip2 and an
old client pulls it then the server would have to decompress and
recompress everything with gzip. If instead your repo remains with gzip
and a new client asks for bzip2 then you have to recompress as well
(slow). So in practice it is best to remain with a single compression
method.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-06-11 1:51 ` Nicolas Pitre
@ 2007-06-11 6:20 ` Steven Grimm
2007-06-11 6:31 ` Shawn O. Pearce
2007-06-11 10:20 ` Johannes Schindelin
1 sibling, 1 reply; 40+ messages in thread
From: Steven Grimm @ 2007-06-11 6:20 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Sam Vilain, Shawn O. Pearce, Junio C Hamano, Daniel Barkalow,
Theodore Ts'o, Git Mailing List
Nicolas Pitre wrote:
>
> It won't happen for a simple reason: to be backward compatible with
> older GIT clients. If you have your repo compressed with bzip2 and an
> old client pulls it then the server would have to decompress and
> recompress everything with gzip. If instead your repo remains with gzip
> and a new client asks for bzip2 then you have to recompress as well
> (slow). So in practice it is best to remain with a single compression
> method.
>
Not that I really think this is all that important (my original question
was more out of curiosity than anything) but I don't think those are
really issues.
Even with a better compression scheme, nobody would want to remove gzip
support; if you are creating a repo that needs to be compatible with old
clients -- and of course not all repositories need that -- you'd just
configure it to use gzip (or more likely, *not* configure it to use the
other method.) There are already options whose documentation says,
"Don't enable this if you want your repo to be accessed by git clients
older than version X." In other words, git already has an established
approach for adding new non-backward-compatible features. Some projects,
e.g. internal corporate ones, can mandate that everyone upgrade their
clients, and more importantly, once any new git feature has been out for
a long enough time, even public projects can reasonably say, "You need
version X to access our repo." If alternate compression were introduced
today, in five years would anyone care that there'd be some ancient,
ancient clients that couldn't use it?
And since the hypothetical new client would have support for both
compression types, pulling from a gzip-based repo could be accomplished
totally transparently; you could, one assumes, even pull from a gzip
repo and pack locally using the other scheme if you felt like it.
-Steve
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-06-11 6:20 ` Steven Grimm
@ 2007-06-11 6:31 ` Shawn O. Pearce
0 siblings, 0 replies; 40+ messages in thread
From: Shawn O. Pearce @ 2007-06-11 6:31 UTC (permalink / raw)
To: Steven Grimm
Cc: Nicolas Pitre, Sam Vilain, Junio C Hamano, Daniel Barkalow,
Theodore Ts'o, Git Mailing List
Steven Grimm <koreth@midwinter.com> wrote:
> Nicolas Pitre wrote:
> >
> >It won't happen for a simple reason: to be backward compatible with
> >older GIT clients. If you have your repo compressed with bzip2 and an
> >old client pulls it then the server would have to decompress and
> >recompress everything with gzip. If instead your repo remains with gzip
> >and a new client asks for bzip2 then you have to recompress as well
> >(slow). So in practice it is best to remain with a single compression
> >method.
> >
>
> Not that I really think this is all that important (my original question
> was more out of curiosity than anything) but I don't think those are
> really issues.
...
> And since the hypothetical new client would have support for both
> compression types, pulling from a gzip-based repo could be accomplished
> totally transparently; you could, one assumes, even pull from a gzip
> repo and pack locally using the other scheme if you felt like it.
Right, I agree with Steven here.
Pack v4 (which is rapidly becoming vaporware it seems) uses such
a radically different encoding for its commit and tree objects
that you cannot send them as-is to a client unless that client
also understands pack v4. Consequently you need to decompress and
recompress those objects when talking to an older peer; this isn't
very different from a switch to bzip2.
That said, I think switching to a different generic compressor isn't
that interesting. We can probably do a lot better by organizing
files into clusters, where those clusters have large number of
symbols in common, and then compress everything in the same cluster
with the same pack-wide dictionary.
Specifically I'm thinking that you may be able to cluster the *.h/*.c
into one cluster, and the Makefile into another, and the .gitignore
into a third cluster, and then leverage the large commonalities
between those blobs when building a dictionary and compressing them.
No, I haven't prototyped it out yet.
--
Shawn.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-06-11 1:51 ` Nicolas Pitre
2007-06-11 6:20 ` Steven Grimm
@ 2007-06-11 10:20 ` Johannes Schindelin
2007-06-11 14:01 ` Nicolas Pitre
1 sibling, 1 reply; 40+ messages in thread
From: Johannes Schindelin @ 2007-06-11 10:20 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Sam Vilain, Steven Grimm, Shawn O. Pearce, Junio C Hamano,
Daniel Barkalow, Theodore Ts'o, Git Mailing List
Hi,
On Sun, 10 Jun 2007, Nicolas Pitre wrote:
> On Sun, 10 Jun 2007, Sam Vilain wrote:
>
> > Anyway it's a free world so be my guest to implement it, I guess if
> > this was selectable it would only be a minor annoyance waiting a bit
> > longer pulling from from some repositories, and it would be
> > interesting to see if it did make a big difference with pack file
> > sizes.
>
> It won't happen for a simple reason: to be backward compatible with
> older GIT clients. If you have your repo compressed with bzip2 and an
> old client pulls it then the server would have to decompress and
> recompress everything with gzip. If instead your repo remains with gzip
> and a new client asks for bzip2 then you have to recompress as well
> (slow). So in practice it is best to remain with a single compression
> method.
With the extension mechanism we have in place, the client can send what
kind of compression it supports, and the server can actually refuse to
send anything if it does not want to recompress.
What I am trying to say: you do not necessarily have to allow every client
to access that particular repository. I agree that mixed-compression repos
are evil, but nothing stands in the way of a flag allowing (or
disallowing) recompression in a different format when fetching.
So if you should decide someday to track data with Git (remember: Generic
Information Tracker, not just source code), that is particularly unfit for
compression with gzip, but that you _need_ to store in a different
compressed manner, you can set up a repository which will _only_ _ever_
use that compression.
Of course, you'd need to prepare Git for that, but I could imagine
something like a music library, which stores everything as ogg encoded
snippets. It might even use some perception-based hash on small chunks of
the music, and store the music as tree objects, which concatenate the
small chunks. I might even try to do this for fun, some day in the distant
future.
It's a wild world,
Dscho
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-06-11 10:20 ` Johannes Schindelin
@ 2007-06-11 14:01 ` Nicolas Pitre
2007-06-11 21:40 ` Johannes Schindelin
0 siblings, 1 reply; 40+ messages in thread
From: Nicolas Pitre @ 2007-06-11 14:01 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Sam Vilain, Steven Grimm, Shawn O. Pearce, Junio C Hamano,
Daniel Barkalow, Theodore Ts'o, Git Mailing List
On Mon, 11 Jun 2007, Johannes Schindelin wrote:
> Hi,
>
> On Sun, 10 Jun 2007, Nicolas Pitre wrote:
>
> > On Sun, 10 Jun 2007, Sam Vilain wrote:
> >
> > > Anyway it's a free world so be my guest to implement it, I guess if
> > > this was selectable it would only be a minor annoyance waiting a bit
> > > longer pulling from from some repositories, and it would be
> > > interesting to see if it did make a big difference with pack file
> > > sizes.
> >
> > It won't happen for a simple reason: to be backward compatible with
> > older GIT clients. If you have your repo compressed with bzip2 and an
> > old client pulls it then the server would have to decompress and
> > recompress everything with gzip. If instead your repo remains with gzip
> > and a new client asks for bzip2 then you have to recompress as well
> > (slow). So in practice it is best to remain with a single compression
> > method.
>
> With the extension mechanism we have in place, the client can send what
> kind of compression it supports, and the server can actually refuse to
> send anything if it does not want to recompress.
>
> What I am trying to say: you do not necessarily have to allow every client
> to access that particular repository. I agree that mixed-compression repos
> are evil, but nothing stands in the way of a flag allowing (or
> disallowing) recompression in a different format when fetching.
I know.
But is it worthwhile? I think not.
However I won't stand in the way of anyone who wants to try and provide
numbers. I just don't believe this is worthwhile and am not inclined to
do it.
OK... Well, I just performed a really quick test:
$ mkdir test-bzip2
$ mkdir test-gzip
$ cp git/*.[cho] test-bzip2
$ cp git/*.[cho] test-gzip
$ bzip2 test-bzip2/*
$ gzip test-gzip/*
$ du -s test-bzip2 test-gzip
5016 test-bzip2
4956 test-gzip
It is true that bzip2 is better with large files, but we typically have
very few of them in a Git repo, and in the presence of large files bzip2
then becomes _much_ slower than gzip. So, given that the nature of Git
objects are likely to be small in 98% of the cases due to deltas, it
appears that bzip2 won't be a gain at all but rather a waste, making a
poor case for supporting it forever afterwards.
> So if you should decide someday to track data with Git (remember: Generic
> Information Tracker, not just source code),
Bah... if you please.
> that is particularly unfit for
> compression with gzip, but that you _need_ to store in a different
> compressed manner, you can set up a repository which will _only_ _ever_
> use that compression.
Maybe. But you'd better have a concrete data set and result numbers to
convince me. Designing software for hypothetical situations before they
actually exist leads to bloatware.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Add --no-reuse-delta option to git-gc
2007-06-11 14:01 ` Nicolas Pitre
@ 2007-06-11 21:40 ` Johannes Schindelin
0 siblings, 0 replies; 40+ messages in thread
From: Johannes Schindelin @ 2007-06-11 21:40 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Sam Vilain, Steven Grimm, Shawn O. Pearce, Junio C Hamano,
Daniel Barkalow, Theodore Ts'o, Git Mailing List
Hi,
On Mon, 11 Jun 2007, Nicolas Pitre wrote:
> But you'd better have a concrete data set and result numbers to convince
> me. Designing software for hypothetical situations before they actually
> exist leads to bloatware.
Right you are. So I'll not mention it until I have it ;-)
Ciao,
Dscho
^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2007-06-11 21:44 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-08 2:54 [PATCH] Add --no-reuse-delta, --window, and --depth options to git-gc Theodore Ts'o
2007-05-08 3:13 ` Nicolas Pitre
2007-05-08 3:21 ` Theodore Tso
2007-05-08 3:38 ` Dana How
2007-05-08 4:43 ` Junio C Hamano
2007-05-08 13:46 ` Nicolas Pitre
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Theodore Ts'o
2007-05-08 13:28 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Theodore Ts'o
2007-05-08 13:28 ` [PATCH] Add --no-reuse-delta option to git-gc Theodore Ts'o
2007-05-08 15:35 ` Nicolas Pitre
2007-05-09 5:05 ` Daniel Barkalow
2007-05-09 8:15 ` Junio C Hamano
2007-05-09 9:02 ` Steven Grimm
2007-05-09 11:35 ` Other compression?, was " Johannes Schindelin
2007-05-09 15:15 ` Junio C Hamano
2007-05-09 19:10 ` Shawn O. Pearce
2007-06-10 7:40 ` Sam Vilain
2007-06-11 1:51 ` Nicolas Pitre
2007-06-11 6:20 ` Steven Grimm
2007-06-11 6:31 ` Shawn O. Pearce
2007-06-11 10:20 ` Johannes Schindelin
2007-06-11 14:01 ` Nicolas Pitre
2007-06-11 21:40 ` Johannes Schindelin
2007-05-09 19:48 ` [PATCH] Add --aggressive option to 'git gc' Theodore Tso
2007-05-09 20:19 ` Junio C Hamano
2007-05-09 22:22 ` Theodore Tso
2007-05-10 7:38 ` Junio C Hamano
2007-05-08 15:38 ` [PATCH] Add pack.depth option to git-pack-objects and change default depth to 50 Nicolas Pitre
2007-05-08 16:30 ` Theodore Tso
2007-05-08 16:49 ` Johannes Schindelin
2007-05-08 18:09 ` Theodore Tso
2007-05-08 18:46 ` Nicolas Pitre
2007-05-09 13:49 ` Theodore Tso
2007-05-09 14:17 ` Johannes Schindelin
2007-05-08 17:07 ` Dana How
2007-05-08 17:35 ` Nicolas Pitre
2007-05-09 5:03 ` Junio C Hamano
2007-05-08 15:30 ` [PATCH] Add --no-reuse-delta, --window, and --depth options to Nicolas Pitre
2007-05-08 21:12 ` Junio C Hamano
2007-05-08 23:59 ` Nicolas Pitre
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).