* [PATCH] gc --auto: warn garbage collection happens soon
@ 2011-12-27 13:45 Nguyễn Thái Ngọc Duy
2011-12-27 21:52 ` Junio C Hamano
0 siblings, 1 reply; 8+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-12-27 13:45 UTC (permalink / raw)
To: git; +Cc: Nguyễn Thái Ngọc Duy
This gives users a chance to run gc explicitly elsewhere if they do not
want gc to run suddenly in the current terminal.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
v2 of a patch posted a few months ago. The warning limits are
percentages and are configurable. I could have set the default limits
to 100% (i.e. no warnings) to keep the current behavior. However, I
think warning is better.
May need rewording in config.txt; I'm not sure I state it clearly.
Documentation/config.txt | 12 ++++++++++++
Documentation/git-gc.txt | 4 ++++
builtin/gc.c | 41 +++++++++++++++++++++++++++++++++++++++--
3 files changed, 55 insertions(+), 2 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5a841da..c263496 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -965,12 +965,24 @@ gc.auto::
light-weight garbage collection from time to time. The
default value is 6700. Setting this to 0 disables it.
+gc.autowarn::
+ A percentage of the `gc.auto` limit. If the number of loose
+ objects exceeds this percentage of the limit, `git gc --auto`
+ warns that garbage collection will happen soon. The default
+ value is 90. Setting this to 100 disables the warning.
+
gc.autopacklimit::
When there are more than this many packs that are not
marked with `*.keep` file in the repository, `git gc
--auto` consolidates them into one larger pack. The
default value is 50. Setting this to 0 disables it.
+gc.autopackwarn::
+ A percentage of the `gc.autopacklimit` limit. If the number of
+ packs exceeds this percentage of the limit, `git gc --auto`
+ warns that garbage collection will happen soon. The default
+ value is 90. Setting this to 100 disables the warning.
+
gc.packrefs::
Running `git pack-refs` in a repository renders it
unclonable by Git versions prior to 1.5.1.2 over dumb
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 815afcb..937b3d6 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -59,6 +59,10 @@ then existing packs (except those marked with a `.keep` file)
are consolidated into a single pack by using the `-A` option of
'git repack'. Setting `gc.autopacklimit` to 0 disables
automatic consolidation of packs.
++
+`git gc --auto` will warn users when the number of loose objects or
+packs is close to the limits. See `gc.autowarn` and `gc.autopackwarn`
+for details.
--prune=<date>::
Prune loose objects older than date (default is 2 weeks ago,
diff --git a/builtin/gc.c b/builtin/gc.c
index 0498094..f3fa46d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -28,6 +28,10 @@ static int gc_auto_threshold = 6700;
static int gc_auto_pack_limit = 50;
static const char *prune_expire = "2.weeks.ago";
+/* numbers are in percent, to be converted to absolute later */
+static int gc_warn_auto_threshold = 90;
+static int gc_warn_auto_pack_limit = 90;
+
#define MAX_ADD 10
static const char *argv_pack_refs[] = {"pack-refs", "--all", "--prune", NULL};
static const char *argv_reflog[] = {"reflog", "expire", "--all", NULL};
@@ -52,10 +56,26 @@ static int gc_config(const char *var, const char *value, void *cb)
gc_auto_threshold = git_config_int(var, value);
return 0;
}
+ if (!strcmp(var, "gc.autowarn")) {
+ int percent = git_config_int(var, value);
+ if (percent <= 0 || percent > 100)
+ die(_("gc.autowarn %d%% does not make sense"),
+ percent);
+ gc_warn_auto_threshold = percent;
+ return 0;
+ }
if (!strcmp(var, "gc.autopacklimit")) {
gc_auto_pack_limit = git_config_int(var, value);
return 0;
}
+ if (!strcmp(var, "gc.autopackwarn")) {
+ int percent = git_config_int(var, value);
+ if (percent <= 0 || percent > 100)
+ die(_("gc.autopackwarn %d%% does not make sense"),
+ percent);
+ gc_warn_auto_pack_limit = percent;
+ return 0;
+ }
if (!strcmp(var, "gc.pruneexpire")) {
if (value && strcmp(value, "now")) {
unsigned long now = approxidate("now");
@@ -118,7 +138,15 @@ static int too_many_loose_objects(void)
}
}
closedir(dir);
- return needed;
+ if (needed)
+ return 1;
+
+ auto_threshold = (gc_warn_auto_threshold + 255) / 256;
+ if (num_loose >= auto_threshold)
+ warning(_("Too many loose objects (current approx. %d, limit %d).\n"
+ "\"git gc\" will soon run automatically."),
+ num_loose * 256, gc_auto_threshold);
+ return 0;
}
static int too_many_packs(void)
@@ -141,7 +169,14 @@ static int too_many_packs(void)
*/
cnt++;
}
- return gc_auto_pack_limit <= cnt;
+ if (gc_auto_pack_limit <= cnt)
+ return 1;
+
+ if (gc_warn_auto_pack_limit <= cnt)
+ warning(_("Too many packs (current %d, limit %d).\n"
+ "\"git gc\" will soon run automatically."),
+ cnt, gc_auto_pack_limit);
+ return 0;
}
static int need_to_gc(void)
@@ -193,6 +228,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
usage_with_options(builtin_gc_usage, builtin_gc_options);
git_config(gc_config, NULL);
+ gc_warn_auto_threshold = 0.01 * gc_auto_threshold * gc_warn_auto_threshold;
+ gc_warn_auto_pack_limit = 0.01 * gc_auto_pack_limit * gc_warn_auto_pack_limit;
if (pack_refs < 0)
pack_refs = !is_bare_repository();
--
1.7.8.36.g69ee2
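
The threshold arithmetic in the patch can be checked in isolation. The following is a minimal standalone sketch, not code from builtin/gc.c. It assumes, based on the `(n + 255) / 256` and `num_loose * 256` expressions in the diff, that loose objects are estimated from one of 256 object directories. The helper names `percent_to_absolute` and `per_dir_threshold` are hypothetical, and integer arithmetic is used in place of the patch's `0.01 * ...` form.

```c
#include <assert.h>

/* Hypothetical helper: turn "percent of limit" into an absolute count. */
static int percent_to_absolute(int limit, int percent)
{
	/* The patch rejects values outside 1..100 with die(); assert stands in. */
	assert(percent > 0 && percent <= 100);
	return limit * percent / 100;
}

/* Hypothetical helper: per-directory threshold, rounded up as (n + 255) / 256. */
static int per_dir_threshold(int total)
{
	return (total + 255) / 256;
}
```

With the documented defaults (gc.auto = 6700, gc.autopacklimit = 50, both warnings at 90%), this gives a warning level of 6030 loose objects, i.e. 24 objects in the sampled directory versus 27 for the gc trigger, and a warning level of 45 packs.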
* Re: [PATCH] gc --auto: warn garbage collection happens soon
2011-12-27 13:45 [PATCH] gc --auto: warn garbage collection happens soon Nguyễn Thái Ngọc Duy
@ 2011-12-27 21:52 ` Junio C Hamano
2011-12-28 18:40 ` Jeff King
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Junio C Hamano @ 2011-12-27 21:52 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> This gives users a chance to run gc explicitly elsewhere if they do not
> want gc to run suddenly in current terminal.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
As I am still in a cheery holiday mood, let's be a bit philosophical,
step back a bit and think.
After this patch gets applied, will the users start feeling bothered by
repeated "you will soon see auto-gc" messages and will want "you will soon
start seeing the you will soon see auto-gc messages" warnings?
And if the answer to that tongue-in-cheek question is no, what is the
reason why the users will not find the messages disturbing, while loathing
the auto-gc?
I suspect that is because auto-gc takes a long time, making the user wait,
compared to the new message that may be noisy but quick. Perhaps the real
cure for the disease is not to add the message but to make an auto-gc less
painful, no?
What are the things we could do to make auto-gc less painful?
Are we doing something that is not necessary in auto-gc that takes time
but that we can live without doing?
It may be a better cure for the disease to force a full gc after
operations that the users already know take a long time (e.g. a
clone, a large fetch), so that the next auto-gc does not have to do much
work.
* Re: [PATCH] gc --auto: warn garbage collection happens soon
2011-12-27 21:52 ` Junio C Hamano
@ 2011-12-28 18:40 ` Jeff King
2011-12-28 20:02 ` Junio C Hamano
2011-12-28 21:50 ` Nguyen Thai Ngoc Duy
2011-12-29 18:29 ` Mark Brown
2 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2011-12-28 18:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git
On Tue, Dec 27, 2011 at 01:52:35PM -0800, Junio C Hamano wrote:
> And if the answer to that tongue-in-cheek question is no, what is the
> reason why the users will not find the messages disturbing, while loathing
> the auto-gc?
>
> I suspect that is because auto-gc takes long time, making the user wait,
> compared to the new message that may be noisy but quick. Perhaps the real
> cure for the disease is not to add the message but to make an auto-gc less
> painful, no?
>
> What are the things we could do to make auto-gc less painful?
>
> Are we doing something that is not necessary in auto-gc that takes time
> but that we can live without doing?
I don't personally find gc all that painful (though maybe that is
because I tend to gc myself and rarely hit the auto-gc), but I have
noticed that git-prune takes by far the most time to run. If you are
just doing an incremental pack, you might be packing only a few thousand
objects and not touching old history at all (and with many cores, the
delta compression flies by). But prune requires running "git rev-list
--objects --all", which takes something like 45 seconds for linux-2.6 on
my fast-ish laptop (and about 23 seconds for git.git).
We could perhaps cut out pruning in the auto-gc case unless there are a
lot of objects left over after the packing phase. It's not worth doing a
full prune to clean up a dozen objects[1]. It probably is if you have a
thousand objects left after packing.
-Peff
[1] Actually, it's not just having objects. You may have just exploded
unreachable objects from a pack, but they are still younger than the
2 week expiration period. Therefore trying to prune them is
pointless, because even if they are unreachable, you won't delete
them. So you really want to say "how many actual candidate objects
do we have for pruning?"
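
The "how many actual candidates" check could be sketched roughly as follows. This is an illustration under stated assumptions, not git code: the helper names `count_prune_candidates` and `worth_pruning` are hypothetical, and the modification times are passed in as an array rather than read from the object directory. Age only gives an upper bound on prunable objects, since reachability is not known before the expensive walk.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: count loose objects old enough to be pruned.
 * mtimes[] holds the modification time of each leftover loose object.
 */
static size_t count_prune_candidates(const time_t *mtimes, size_t n,
				     time_t now, time_t expire_age)
{
	size_t i, candidates = 0;

	assert(mtimes != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (now - mtimes[i] > expire_age)
			candidates++;
	return candidates;
}

/* Run the expensive reachability walk only when enough candidates exist. */
static int worth_pruning(size_t candidates, size_t min_candidates)
{
	return candidates >= min_candidates;
}
```

The cutoff passed as `min_candidates` is a tuning choice; the message above only suggests that a dozen objects is too few and a thousand is probably enough.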
* Re: [PATCH] gc --auto: warn garbage collection happens soon
2011-12-28 18:40 ` Jeff King
@ 2011-12-28 20:02 ` Junio C Hamano
2011-12-28 21:30 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2011-12-28 20:02 UTC (permalink / raw)
To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git
Jeff King <peff@peff.net> writes:
> [1] Actually, it's not just having objects. You may have just exploded
> unreachable objects from a pack, but they are still younger than the
> 2 week expiration period. Therefore trying to prune them is
> pointless, because even if they are unreachable, you won't delete
> them. So you really want to say "how many actual candidate objects
> do we have for pruning?"
An obvious knee-jerk reaction is "Ugh, if we have very recently repacked,
don't we know what are reachable and what are not already, and use that
knowledge while pruning to avoid traversing everything again?"
My memory around repack, fsck and prune needs refreshing, though, to tell
if that suggestion is feasible.
* Re: [PATCH] gc --auto: warn garbage collection happens soon
2011-12-28 20:02 ` Junio C Hamano
@ 2011-12-28 21:30 ` Jeff King
2011-12-28 22:09 ` Nguyen Thai Ngoc Duy
0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2011-12-28 21:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git
On Wed, Dec 28, 2011 at 12:02:18PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > [1] Actually, it's not just having objects. You may have just exploded
> > unreachable objects from a pack, but they are still younger than the
> > 2 week expiration period. Therefore trying to prune them is
> > pointless, because even if they are unreachable, you won't delete
> > them. So you really want to say "how many actual candidate objects
> > do we have for pruning?"
>
> An obvious knee-jerk reaction is "Ugh, if we have very recently repacked,
> don't we know what are reachable and what are not already, and use that
> knowledge while pruning to avoid traversing everything again?"
Especially now that prune has learned about progress reporting, it's
easy to see in "git gc" that the "Counting objects" phase of the repack
and the connectivity search in prune are counting the same objects. It
would obviously be easy to just dump the set of sha1s in packed binary
format, and let git-prune reference that.
But it doesn't work in the general case. Running "git gc" will repack
everything, and so it looks at all reachable objects. But "git gc
--auto" will typically do an incremental pack (unless you have too many
packs), which means its counting objects phase only looks at part of
the graph. So that result can't be used for object reachability, since
many objects won't be marked[1].
So yes, it's an optimization we can do, but it only works some of the
time. And worse, it works when we care less (when we are doing a
full repack anyway, so we are already spending more time counting
objects, and more I/O rewriting existing packed objects), but not when
we want it most (doing a few seconds of incremental repack during "git
gc --auto", which balloons to a minute because of the prune time).
-Peff
[1] It's tempting to say "well, we just repacked incrementally, so if
something was referenced and not packed, we would have just packed
it, right?" But look at how incremental packing works. We do a
traversal with "--unpacked", which means we don't dig down past
commit objects that are already packed. And that's why it's fast.
But packs don't necessarily respect reachability. It's possible for
you to have object X in a pack, but X^{tree} is not (or X^, or
whatever)[2]. I believe using "git repack" would fail to actually
pack that. But that's OK, because it almost never happens, and the
worst case is that the object doesn't get packed until you do a full
repack.
But I'm not sure you would want the same level of shortcut for
git-prune, which would actually be _deleting_ the object. We want to
be very sure in that case.
[2] The obvious way to get into this situation is to give weird rev-list
parameters to pack-objects. But I think you could also do it
accidentally by having commit X loose, then fetching history
containing commit Y that builds on X. If the fetch is big enough,
we'll keep the pack that we got from the other side. So X remains
loose, but its ancestors are packed. Running an incremental repack
will stop the traversal at Y and never consider X for packing.
I didn't actually test this, but that's my reading of the code (see
the revs->unpacked check in revision.c:get_commit_action).
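
The scenario in [2] can be illustrated with a toy model. This is a deliberately simplified sketch, not the logic in revision.c: it assumes a linear first-parent history and a walk that stops at the first packed commit. The `toy_commit` type and `walk_unpacked` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct toy_commit {
	int parent;   /* index of the parent commit, or -1 for a root */
	int packed;   /* nonzero if the commit already lives in a pack */
};

/*
 * Toy model of an "--unpacked" walk: follow parents from tip and stop
 * as soon as a packed commit is reached. Returns how many loose
 * commits were seen; their indices are stored in out[].
 */
static size_t walk_unpacked(const struct toy_commit *c, int tip,
			    int *out, size_t out_cap)
{
	size_t n = 0;
	int cur = tip;

	while (cur >= 0 && !c[cur].packed) {
		assert(n < out_cap);
		out[n++] = cur;
		cur = c[cur].parent;
	}
	return n;
}
```

With X loose, Y packed on top of X, and a loose tip Z on top of Y, a walk from Z reports only Z. X is never visited, so a reachability set built from this walk would omit X even though X is reachable.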
* Re: [PATCH] gc --auto: warn garbage collection happens soon
2011-12-28 21:30 ` Jeff King
@ 2011-12-28 22:09 ` Nguyen Thai Ngoc Duy
0 siblings, 0 replies; 8+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-12-28 22:09 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
2011/12/29 Jeff King <peff@peff.net>:
> Especially now that prune has learned about progress reporting, it's
> easy to see in "git gc" that the "Counting objects" phase of the repack
> and the connectivity search in prune are counting the same objects. It
> would obviously be easy to just dump the set of sha1s in packed binary
> format, and let git-prune reference that.
>
> But it doesn't work in the general case. Running "git gc" will repack
> everything, and so it looks at all reachable objects. But "git gc
> --auto" will typically do an incremental pack (unless you have too many
> packs), which means its counting objects phase only looks at part of
> the graph. So that result can't be used for object reachability, since
> many objects won't be marked[1].
Hmm.. I was thinking of sharing this "counting objects" part when
repack is rewritten in C. I guess I can drop the idea now.
--
Duy
* Re: [PATCH] gc --auto: warn garbage collection happens soon
2011-12-27 21:52 ` Junio C Hamano
2011-12-28 18:40 ` Jeff King
@ 2011-12-28 21:50 ` Nguyen Thai Ngoc Duy
2011-12-29 18:29 ` Mark Brown
2 siblings, 0 replies; 8+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-12-28 21:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
2011/12/28 Junio C Hamano <gitster@pobox.com>:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> This gives users a chance to run gc explicitly elsewhere if they do not
>> want gc to run suddenly in current terminal.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> As I am still in a cheerly holiday mood, let's be a bit philosophical,
> step back a bit and think.
>
> After this patch gets applied, will the users start feeling bothered by
> repeated "you will soon see auto-gc" messages and will want "you will soon
> start seeing the you will soon see auto-gc messages" warnings?
They should not most of the time, given that the default setting is to
warn at 90% of the limits. If they do feel bothered, they could turn it
off or just run "gc".
> And if the answer to that tongue-in-cheek question is no, what is the
> reason why the users will not find the messages disturbing, while loathing
> the auto-gc?
>
> I suspect that is because auto-gc takes long time, making the user wait,
> compared to the new message that may be noisy but quick. Perhaps the real
> cure for the disease is not to add the message but to make an auto-gc less
> painful, no?
It has to do with the expected run time of a command. When I'm about to
run "commit", I know the command is fast and I expect the shell prompt
soon. When I run "fetch", I know it may take a bit (or a lot) of time
and I will be ready to make myself a cup of coffee while it's running.
auto-gc is an unknown factor and may break my expectations. I would
not mind if auto-gc is extremely fast, e.g. a couple of seconds
maximum. But gc time seems to be proportional to repository size.
> What are the things we could do to make auto-gc less painful?
>
> Are we doing something that is not necessary in auto-gc that takes time
> but that we can live without doing?
>
> It may be a better cure for the disease to force a full gc after
> operations that we know the users already know to take long time (e.g. a
> clone, a large fetch), so that the next auto-gc do not have to do much
> work.
git works best when everything is in one pack. So while we may be able
to skip stuff and make auto-gc fast the first few times, eventually we
need to do something like "git repack -ad" as part of auto-gc. I don't
see any way to make that part complete in a few secs regardless of repo
size (unless packv4 comes in time and speeds up revlist
significantly). So the pain will be there in the end, it's just
delayed.
There's another possibility (but not sure if it's feasible): to make
auto-gc use up to a certain amount of time. If it runs out of its
allocated time, it needs to save its state somewhere, somehow, and
resume in the next auto-gc.
--
Duy
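
The time-limited idea above could be sketched as a resumable loop. This is only an illustration under stated assumptions: the budget is counted in work items as a stand-in for wall-clock time, the `gc_job` type and `run_with_budget` function are hypothetical, and the sketch says nothing about how the state would be persisted between invocations or whether repacking can actually be split into independent steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resumable job: process items [next, total). */
struct gc_job {
	size_t next;   /* first item not yet processed; this is the saved state */
	size_t total;  /* number of items in the whole job */
};

/*
 * Process items until the job is done or the budget is used up.
 * Returns 1 when the job is complete, 0 when it must resume later.
 */
static int run_with_budget(struct gc_job *job, size_t budget)
{
	assert(job->next <= job->total);
	while (job->next < job->total && budget > 0) {
		/* real work on item job->next would happen here */
		job->next++;
		budget--;
	}
	return job->next == job->total;
}
```

A job of 10 items with a budget of 4 per run would finish on the third run, with `next` recording where each run left off.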
* Re: [PATCH] gc --auto: warn garbage collection happens soon
2011-12-27 21:52 ` Junio C Hamano
2011-12-28 18:40 ` Jeff King
2011-12-28 21:50 ` Nguyen Thai Ngoc Duy
@ 2011-12-29 18:29 ` Mark Brown
2 siblings, 0 replies; 8+ messages in thread
From: Mark Brown @ 2011-12-29 18:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git
On Tue, Dec 27, 2011 at 01:52:35PM -0800, Junio C Hamano wrote:
> And if the answer to that tongue-in-cheek question is no, what is the
> reason why the users will not find the messages disturbing, while loathing
> the auto-gc?
The main problem I've noticed with the auto gc is that git gui seems to
want to do one at a much lower threshold than the command line tools
(and far too aggressively); it seems that the logic that determines when
to do one isn't quite in agreement across all the git tools.