From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 4/4] gc --aggressive: three phase repacking
Date: Sun, 16 Mar 2014 20:35:04 +0700
Message-ID: <1394976904-15395-6-git-send-email-pclouds@gmail.com>
In-Reply-To: <1394976904-15395-1-git-send-email-pclouds@gmail.com>

As explained in the previous commit, the current aggressive settings
--depth=250 --window=250 could slow down repository access
significantly. Since people usually work on recent history only, we
could keep recent history more loosely packed, so that repo access is
fast most of the time while the pack file remains small.

Three more configuration variables are used to make that happen. The
first one, gc.aggressiveCommitLimits, covers the old history part,
which will be tightly packed. The remaining part will be packed with
gc.lessAggressiveWindow and gc.lessAggressiveDepth. If
gc.aggressiveCommitLimits is empty, everything will be tightly packed
as before.

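To illustrate how these variables interact, here is a minimal sketch in
C. It is not the code from this patch: the struct and the helper name
params_for_recent_history are made up for illustration, and the numbers
are simply the defaults proposed here.

```c
#include <stddef.h>

/* Hypothetical type: one depth/window pair for delta compression. */
struct delta_params {
	int depth;
	int window;
};

/*
 * Illustrative helper (not part of the patch): pick the settings used
 * for the part of history that is NOT selected by
 * gc.aggressiveCommitLimits. A NULL or empty limit means there is no
 * split, so everything gets the aggressive settings.
 */
static struct delta_params params_for_recent_history(const char *commit_limits)
{
	/* Defaults taken from this patch. */
	const struct delta_params aggressive = { 250, 250 };
	const struct delta_params less_aggressive = { 50, 250 };

	if (commit_limits && *commit_limits)
		return less_aggressive;
	return aggressive;
}
```

With the default limit "--before=1.year.ago" the helper returns depth 50
and window 250; with an empty limit it returns 250 and 250.
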
The repack process becomes:

 - repack -adf on old history (e.g. the default --before=1.year.ago)
   and mark that pack to be kept

 - repack a second time with the lessAggressive settings; the kept
   pack should be left untouched

 - remove the .keep file and repack a final time, reusing all deltas

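The three steps above can be sketched as plain command lines. The
following C helper only formats strings and runs nothing. format_phase
is a hypothetical name, "pack-objects --keep" is the option added by
patch 2/4 of this series (assumed here, not stock git), and the
".git/objects/pack/pack" prefix assumes a non-bare repository in the
current directory. The real code passes a few more options.

```c
#include <stdio.h>

/*
 * Illustrative only: format the command line for one of the three
 * phases. "depth" and "window" are the values for that phase
 * (aggressive for phase 1, less aggressive for phase 2, unused in
 * phase 3). Returns snprintf()'s result, or -1 for an unknown phase.
 */
static int format_phase(int phase, const char *limits,
			int depth, int window, char *buf, size_t len)
{
	switch (phase) {
	case 1:
		/* Tightly pack old history and leave a .keep file. */
		return snprintf(buf, len,
				"git rev-list --all --objects --reflog %s | "
				"git pack-objects --no-reuse-delta --keep "
				"--window=%d --depth=%d .git/objects/pack/pack",
				limits, window, depth);
	case 2:
		/* Repack the rest; the kept pack is not touched. */
		return snprintf(buf, len,
				"git repack -a -d -f --depth=%d --window=%d",
				depth, window);
	case 3:
		/* The .keep file is gone; merge packs, reusing deltas. */
		return snprintf(buf, len, "git repack -a -d");
	default:
		return -1;
	}
}
```

Calling format_phase(2, NULL, 50, 250, buf, sizeof(buf)) fills buf with
"git repack -a -d -f --depth=50 --window=250".
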
This process costs more time, but produces a more efficient pack. It is
assumed that people who do "gc --aggressive" do not do this often and
do not mind if it takes a bit longer.

git.git is not a great repo to test this on because its size is
modest, but so are my laptop's CPU and memory, so here are the timings
and pack sizes:

                size   time
  old aggr.     36MB   5m51
  new aggr.     37MB   6m13
  repack -adf   48MB   1m12

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
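
A note on how the first phase pins its pack: pack-objects prints the
new pack's SHA-1 as 40 hex digits plus a newline, and the .keep file
name is derived from it. Below is a minimal, stand-alone sketch of that
step. keep_file_name is a hypothetical helper, and the hex-digit check
is an extra that the patch itself does not perform.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Illustrative only: given the 41 bytes printed by pack-objects
 * (40 hex digits followed by '\n'), build the name of the matching
 * .keep file relative to the git directory. Returns 0 on success,
 * -1 on malformed input or if "path" is too small.
 */
static int keep_file_name(const char *out, size_t out_len,
			  char *path, size_t path_len)
{
	size_t i;
	int n;

	if (out_len != 41 || out[40] != '\n')
		return -1;
	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)out[i]))
			return -1;

	/* %.40s copies exactly the 40 hex digits, not the newline. */
	n = snprintf(path, path_len, "objects/pack/pack-%.40s.keep", out);
	if (n < 0 || (size_t)n >= path_len)
		return -1;
	return 0;
}
```
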
Documentation/config.txt | 19 +++++++++
builtin/gc.c | 109 +++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 124 insertions(+), 4 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5ce7f9a..47979dc 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1161,6 +1161,25 @@ gc.aggressiveWindow::
 	algorithm used by 'git gc --aggressive'. This defaults
 	to 250.

+gc.aggressiveCommitLimits::
+	This is a parameter passed to linkgit:git-rev-list[1] to select
+	commits that are repacked with gc.aggressiveDepth and
+	gc.aggressiveWindow, while the remaining commits are repacked
+	with gc.lessAggressiveDepth and gc.lessAggressiveWindow.
++
+If this is an empty string, everything will be repacked with
+gc.aggressiveWindow and gc.aggressiveDepth.
+
+gc.lessAggressiveDepth::
+	The depth parameter used in the delta compression
+	algorithm used by 'git gc --aggressive' when
+	gc.aggressiveCommitLimits is set. This defaults to 50.
+
+gc.lessAggressiveWindow::
+	The window size parameter used in the delta compression
+	algorithm used by 'git gc --aggressive' when
+	gc.aggressiveCommitLimits is set. This defaults to 250.
+
 gc.auto::
 	When there are approximately more than this many loose
 	objects in the repository, `git gc --auto` will pack them.
diff --git a/builtin/gc.c b/builtin/gc.c
index 72aa912..37fc603 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -28,10 +28,14 @@ static const char * const builtin_gc_usage[] = {
 static int pack_refs = 1;
 static int aggressive_depth = 250;
 static int aggressive_window = 250;
+static const char *aggressive_rev_list = "--before=1.year.ago";
+static int less_aggressive_depth = 50;
+static int less_aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int detach_auto = 1;
 static const char *prune_expire = "2.weeks.ago";
+static int delta_base_offset = 1;

 static struct argv_array pack_refs_cmd = ARGV_ARRAY_INIT;
 static struct argv_array reflog = ARGV_ARRAY_INIT;
@@ -39,10 +43,13 @@ static struct argv_array repack = ARGV_ARRAY_INIT;
 static struct argv_array prune = ARGV_ARRAY_INIT;
 static struct argv_array rerere = ARGV_ARRAY_INIT;

+static char *keep_file;
 static char *pidfile;

 static void remove_pidfile(void)
 {
+	if (keep_file)
+		unlink_or_warn(keep_file);
 	if (pidfile)
 		unlink(pidfile);
 }
@@ -54,6 +61,63 @@ static void remove_pidfile_on_signal(int signo)
 	raise(signo);
 }

+static void pack_old_history(int quiet)
+{
+	struct child_process pack_objects;
+	struct child_process rev_list;
+	struct argv_array av_po = ARGV_ARRAY_INIT;
+	struct argv_array av_rl = ARGV_ARRAY_INIT;
+	char sha1[41];
+
+	argv_array_pushl(&av_rl, "rev-list", "--all", "--objects",
+			 "--reflog", NULL);
+	argv_array_push(&av_rl, aggressive_rev_list);
+
+	memset(&rev_list, 0, sizeof(rev_list));
+	rev_list.no_stdin = 1;
+	rev_list.out = -1;
+	rev_list.git_cmd = 1;
+	rev_list.argv = av_rl.argv;
+
+	if (start_command(&rev_list))
+		die(_("gc: unable to fork git-rev-list"));
+
+	argv_array_pushl(&av_po, "pack-objects", "--keep-true-parents",
+			 "--honor-pack-keep", "--non-empty", "--no-reuse-delta",
+			 "--keep", "--local", NULL);
+	if (delta_base_offset)
+		argv_array_push(&av_po, "--delta-base-offset");
+	if (quiet)
+		argv_array_push(&av_po, "-q");
+	if (aggressive_window)
+		argv_array_pushf(&av_po, "--window=%d", aggressive_window);
+	if (aggressive_depth)
+		argv_array_pushf(&av_po, "--depth=%d", aggressive_depth);
+	argv_array_push(&av_po, git_path("objects/pack/pack"));
+
+	memset(&pack_objects, 0, sizeof(pack_objects));
+	pack_objects.in = rev_list.out;
+	pack_objects.out = -1;
+	pack_objects.git_cmd = 1;
+	pack_objects.argv = av_po.argv;
+
+	if (start_command(&pack_objects))
+		die(_("gc: unable to fork git-pack-objects"));
+
+	if (read_in_full(pack_objects.out, sha1, 41) != 41 ||
+	    sha1[40] != '\n')
+		die_errno(_("gc: pack-objects did not return the new pack's SHA-1"));
+	sha1[40] = '\0';
+	keep_file = git_pathdup("objects/pack/pack-%s.keep", sha1);
+	close(pack_objects.out);
+
+	if (finish_command(&rev_list))
+		die(_("gc: git-rev-list died with error"));
+
+	if (finish_command(&pack_objects))
+		die(_("gc: git-pack-objects died with error"));
+}
+
 static int gc_config(const char *var, const char *value, void *cb)
 {
 	if (!strcmp(var, "gc.packrefs")) {
@@ -71,6 +135,22 @@ static int gc_config(const char *var, const char *value, void *cb)
 		aggressive_depth = git_config_int(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "gc.aggressivecommitlimits")) {
+		aggressive_rev_list = value && *value ? xstrdup(value) : NULL;
+		return 0;
+	}
+	if (!strcmp(var, "gc.lessaggressivewindow")) {
+		less_aggressive_window = git_config_int(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "gc.lessaggressivedepth")) {
+		less_aggressive_depth = git_config_int(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "gc.auto")) {
 		gc_auto_threshold = git_config_int(var, value);
 		return 0;
@@ -298,11 +378,19 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_gc_usage, builtin_gc_options);

 	if (aggressive) {
+		int depth, window;
+		if (aggressive_rev_list) {
+			depth = less_aggressive_depth;
+			window = less_aggressive_window;
+		} else {
+			depth = aggressive_depth;
+			window = aggressive_window;
+		}
 		argv_array_push(&repack, "-f");
-		if (aggressive_depth > 0)
-			argv_array_pushf(&repack, "--depth=%d", aggressive_depth);
-		if (aggressive_window > 0)
-			argv_array_pushf(&repack, "--window=%d", aggressive_window);
+		if (depth > 0)
+			argv_array_pushf(&repack, "--depth=%d", depth);
+		if (window > 0)
+			argv_array_pushf(&repack, "--window=%d", window);
 	}
 	if (quiet)
 		argv_array_push(&repack, "-q");
@@ -343,9 +431,22 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (run_command_v_opt(reflog.argv, RUN_GIT_CMD))
 		return error(FAILED_RUN, reflog.argv[0]);

+	if (aggressive && aggressive_rev_list)
+		pack_old_history(quiet);
+
 	if (run_command_v_opt(repack.argv, RUN_GIT_CMD))
 		return error(FAILED_RUN, repack.argv[0]);

+	if (aggressive && aggressive_rev_list) {
+		if (keep_file)
+			unlink_or_warn(keep_file);
+		argv_array_clear(&repack);
+		argv_array_pushl(&repack, "repack", "-d", "-l", NULL);
+		add_repack_all_option();
+		if (run_command_v_opt(repack.argv, RUN_GIT_CMD))
+			return error(FAILED_RUN, repack.argv[0]);
+	}
+
 	if (prune_expire) {
 		argv_array_push(&prune, prune_expire);
 		if (quiet)
--
1.9.0.40.gaa8c3ea