Git development
 help / color / mirror / Atom feed
* [PATCH v6] gc: ignore old gc.log files
From: David Turner @ 2017-02-10 21:28 UTC (permalink / raw)
  To: git; +Cc: peff, pclouds, David Turner, Junio C Hamano

A server can end up in a state where there are lots of unreferenced
loose objects (say, because many users are doing a bunch of rebasing
and pushing their rebased branches).  Running "git gc --auto" in
this state would cause a gc.log file to be created, preventing
future auto gcs, causing pack files to pile up.  Since many git
operations are O(n) in the number of pack files, this would lead to
poor performance.

Git should never get itself into a state where it refuses to do any
maintenance, just because at some point some piece of the maintenance
didn't make progress.

Teach Git to ignore gc.log files which are older than (by default)
one day old, which can be tweaked via the gc.logExpiry configuration
variable.  That way, these pack files will get cleaned up, if
necessary, at least once per day.  And operators who find a need for
more-frequent gcs can adjust gc.logExpiry to meet their needs.

There is also some cleanup: a successful manual gc, or a
warning-free auto gc with an old log file, will remove any old
gc.log files.

It might still happen that manual intervention is required
(e.g. because the repo is corrupt), but at the very least it won't
be because Git is too dumb to try again.

Signed-off-by: David Turner <dturner@twosigma.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |  6 +++++
 builtin/gc.c             | 57 ++++++++++++++++++++++++++++++++++++++++++------
 t/t6500-gc.sh            | 15 +++++++++++++
 3 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fc5a28a32..a684b7e3e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1402,6 +1402,12 @@ gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
 	if the system supports it. Default is true.
 
+gc.logExpiry::
+	If the file gc.log exists, then `git gc --auto` won't run
+	unless that file is more than 'gc.logExpiry' old.  Default is
+	"1.day".  See `gc.pruneExpire` for more ways to specify its
+	value.
+
 gc.packRefs::
 	Running `git pack-refs` in a repository renders it
 	unclonable by Git versions prior to 1.5.1.2 over dumb
diff --git a/builtin/gc.c b/builtin/gc.c
index 331f21926..a2b9e8924 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -33,6 +33,8 @@ static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int detach_auto = 1;
+static unsigned long gc_log_expire_time;
+static const char *gc_log_expire = "1.day.ago";
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 
@@ -76,10 +78,28 @@ static void git_config_date_string(const char *key, const char **output)
 static void process_log_file(void)
 {
 	struct stat st;
-	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
+	if (fstat(get_lock_file_fd(&log_lock), &st)) {
+		/*
+		 * Perhaps there was an i/o error or another
+		 * unlikely situation.  Try to make a note of
+		 * this in gc.log along with any existing
+		 * messages.
+		 */
+		int saved_errno = errno;
+		fprintf(stderr, _("Failed to fstat %s: %s"),
+			get_tempfile_path(&log_lock.tempfile),
+			strerror(saved_errno));
+		fflush(stderr);
 		commit_lock_file(&log_lock);
-	else
+		errno = saved_errno;
+	} else if (st.st_size) {
+		/* There was some error recorded in the lock file */
+		commit_lock_file(&log_lock);
+	} else {
+		/* No error, clean up any old gc.log */
+		unlink(git_path("gc.log"));
 		rollback_lock_file(&log_lock);
+	}
 }
 
 static void process_log_file_at_exit(void)
@@ -113,6 +133,8 @@ static void gc_config(void)
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_date_string("gc.pruneexpire", &prune_expire);
 	git_config_date_string("gc.worktreepruneexpire", &prune_worktrees_expire);
+	git_config_date_string("gc.logexpiry", &gc_log_expire);
+
 	git_config(git_default_config, NULL);
 }
 
@@ -290,19 +312,34 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 static int report_last_gc_error(void)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
+	int ret = 0;
+	struct stat st;
+	char *gc_log_path = git_pathdup("gc.log");
 
-	ret = strbuf_read_file(&sb, git_path("gc.log"), 0);
+	if (stat(gc_log_path, &st)) {
+		if (errno == ENOENT)
+			goto done;
+
+		ret = error_errno(_("Can't stat %s"), gc_log_path);
+		goto done;
+	}
+
+	if (st.st_mtime < gc_log_expire_time)
+		goto done;
+
+	ret = strbuf_read_file(&sb, gc_log_path, 0);
 	if (ret > 0)
-		return error(_("The last gc run reported the following. "
+		ret = error(_("The last gc run reported the following. "
 			       "Please correct the root cause\n"
 			       "and remove %s.\n"
 			       "Automatic cleanup will not be performed "
 			       "until the file is removed.\n\n"
 			       "%s"),
-			     git_path("gc.log"), sb.buf);
+			    gc_log_path, sb.buf);
 	strbuf_release(&sb);
-	return 0;
+done:
+	free(gc_log_path);
+	return ret;
 }
 
 static int gc_before_repack(void)
@@ -349,7 +386,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	argv_array_pushl(&prune_worktrees, "worktree", "prune", "--expire", NULL);
 	argv_array_pushl(&rerere, "rerere", "gc", NULL);
 
+	/* default expiry time, overwritten in gc_config */
 	gc_config();
+	if (parse_expiry_date(gc_log_expire, &gc_log_expire_time))
+		die(_("Failed to parse gc.logexpiry value %s"), gc_log_expire);
 
 	if (pack_refs < 0)
 		pack_refs = !is_bare_repository();
@@ -448,5 +488,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 
+	if (!daemonized)
+		unlink(git_path("gc.log"));
+
 	return 0;
 }
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 1762dfa6a..08de2e8ab 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,5 +67,20 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_line_count = 2 new # There is one new pack and its .idx
 '
 
+test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
+	test_commit foo &&
+	test_commit bar &&
+	git repack &&
+	test_config gc.autopacklimit 1 &&
+	test_config gc.autodetach true &&
+	echo fleem >.git/gc.log &&
+	test_must_fail git gc --auto 2>err &&
+	test_i18ngrep "^error:" err &&
+	test_config gc.logexpiry 5.days &&
+	test-chmtime =-345600 .git/gc.log &&
+	test_must_fail git gc --auto &&
+	test_config gc.logexpiry 2.days &&
+	git gc --auto
+'
 
 test_done
-- 
2.11.GIT


^ permalink raw reply related

* Re: What's cooking in git.git (Feb 2017, #02; Mon, 6)
From: Junio C Hamano @ 2017-02-10 21:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Siddharth Kannan, git
In-Reply-To: <xmqqy3xgrlvh.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> On Mon, Feb 06, 2017 at 02:34:08PM -0800, Junio C Hamano wrote:
>>
>>> * sk/parse-remote-cleanup (2017-02-06) 1 commit
>>>   (merged to 'next' on 2017-02-06 at 6ec89f72d5)
>>>  + parse-remote: remove reference to unused op_prep
>>> 
>>>  Code clean-up.
>>> 
>>>  Will merge to 'master'.
>>
>> Hrm. Are the functions in git-parse-remote.sh part of the public API?
>> That is, do we expect third-party scripts to do:
>>
>>   . "$(git rev-parse --exec)/git-parse-remote.sh
>>   error_on_missing_default_upstream "$a" "$b" "$c" "$d"
>>
>> ? If so, then they may be surprised by the change in function signature.
>>
>> I generally think of git-sh-setup as the one that external scripts would
>> use. There _is_ a manpage for git-parse-remote, but it doesn't list any
>> functions. So maybe they're all fair game for changing?
>>
>> I just didn't see any discussion of this in the original patch thread,
>> so I wanted to make sure we were making that decision consciously, and
>> not accidentally. :)
>
> Ummm, yes, I admit that this was accidental.  I didn't really think
> of parse-remote as an externally visible and supported interface,
> but users have tendency to break our expectations, so, I dunno.

After sleeping on this, I doubt that the value of this "code
clean-up" is worth the trouble of waiting to see if a third-party
who dot sources parse-remote steps up, which may never materialize
while the topic is cooking in 'next' and more importantly risking
breakage on such a third-party.

Let's drop the topic and excise it from 'next' at the next version
boundary.

^ permalink raw reply

* Re: [PATCH v5] gc: ignore old gc.log files
From: Jeff King @ 2017-02-10 21:15 UTC (permalink / raw)
  To: David Turner; +Cc: git, pclouds, Junio C Hamano
In-Reply-To: <20170210205931.5348-1-dturner@twosigma.com>

> @@ -76,10 +78,30 @@ static void git_config_date_string(const char *key, const char **output)
>  static void process_log_file(void)
>  {
>  	struct stat st;
> -	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
> +	if (fstat(get_lock_file_fd(&log_lock), &st)) {
> +		/*
> +		 * Perhaps there was an i/o error or another
> +		 * unlikely situation.  Try to make a note of
> +		 * this in gc.log along with any existing
> +		 * messages.
> +		 */
> +		FILE *fp;
> +		int saved_errno = errno;
> +		fp = fdopen(log_lock.tempfile.fd, "a");

We usually use xfdopen() to handle (unlikely) errors rather than
segfaulting.  But I think you'd actually want fdopen_lock_file(), which
attaches the fd to the tempfile for flushing and cleanup purposes.

That said, I'm not sure I understand why you're opening a new stdio
filehandle. We know that stderr already points to our logfile (that's
how content gets there in the first place). If there's a problem with
the file or the descriptor, opening a new filehandle around the same
descriptor won't help.

Speaking of stderr, I wonder if this function should be calling
fflush(stderr) before looking at the fstat result. There could be
contents buffered there that haven't been written out yet (not from
child processes, but perhaps ones written in this process itself).
Probably unlikely in practice, since stderr is typically unbuffered by
default.

-Peff

^ permalink raw reply

* Re: [PATCH v3] gc: ignore old gc.log files
From: Junio C Hamano @ 2017-02-10 21:04 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git@vger.kernel.org, pclouds@gmail.com
In-Reply-To: <20170210205138.5nnexap32pkbjjrk@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> I dunno. This seems like a lot of manual scrambling to try to overcome
> unlikely errors just to make our stderr heard (and we'd still fail in
> some cases). It seems like:
>
>   if (fstat(...)) {
> 	/* weird, fstat failed; try our best to mention it */
> 	error_errno("unable to fstat gc.log.lock");
> 	commit_lock_file(&log_lock));
>   } else if (st.st_size) {
> 	/* we have new errors to report */
> 	commit_lock_file(&log_lock);
>   } else {
> 	/* no new errors; clean up old ones */
> 	unlink(git_path("gc.log"));
> 	rollback_lock_file(&log_lock);
>   }
>
> would be sufficient.

Yeah, that should be sufficient.

^ permalink raw reply

* [PATCH v5] gc: ignore old gc.log files
From: David Turner @ 2017-02-10 20:59 UTC (permalink / raw)
  To: git; +Cc: peff, pclouds, David Turner, Junio C Hamano

A server can end up in a state where there are lots of unreferenced
loose objects (say, because many users are doing a bunch of rebasing
and pushing their rebased branches).  Running "git gc --auto" in
this state would cause a gc.log file to be created, preventing
future auto gcs, causing pack files to pile up.  Since many git
operations are O(n) in the number of pack files, this would lead to
poor performance.

Git should never get itself into a state where it refuses to do any
maintenance, just because at some point some piece of the maintenance
didn't make progress.

Teach Git to ignore gc.log files which are older than (by default)
one day old, which can be tweaked via the gc.logExpiry configuration
variable.  That way, these pack files will get cleaned up, if
necessary, at least once per day.  And operators who find a need for
more-frequent gcs can adjust gc.logExpiry to meet their needs.

There is also some cleanup: a successful manual gc, or a
warning-free auto gc with an old log file, will remove any old
gc.log files.

It might still happen that manual intervention is required
(e.g. because the repo is corrupt), but at the very least it won't
be because Git is too dumb to try again.

Signed-off-by: David Turner <dturner@twosigma.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |  6 +++++
 builtin/gc.c             | 59 ++++++++++++++++++++++++++++++++++++++++++------
 t/t6500-gc.sh            | 15 ++++++++++++
 3 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fc5a28a32..a684b7e3e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1402,6 +1402,12 @@ gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
 	if the system supports it. Default is true.
 
+gc.logExpiry::
+	If the file gc.log exists, then `git gc --auto` won't run
+	unless that file is more than 'gc.logExpiry' old.  Default is
+	"1.day".  See `gc.pruneExpire` for more ways to specify its
+	value.
+
 gc.packRefs::
 	Running `git pack-refs` in a repository renders it
 	unclonable by Git versions prior to 1.5.1.2 over dumb
diff --git a/builtin/gc.c b/builtin/gc.c
index 331f21926..8d355feb0 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -33,6 +33,8 @@ static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int detach_auto = 1;
+static unsigned long gc_log_expire_time;
+static const char *gc_log_expire = "1.day.ago";
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 
@@ -76,10 +78,30 @@ static void git_config_date_string(const char *key, const char **output)
 static void process_log_file(void)
 {
 	struct stat st;
-	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
+	if (fstat(get_lock_file_fd(&log_lock), &st)) {
+		/*
+		 * Perhaps there was an i/o error or another
+		 * unlikely situation.  Try to make a note of
+		 * this in gc.log along with any existing
+		 * messages.
+		 */
+		FILE *fp;
+		int saved_errno = errno;
+		fp = fdopen(log_lock.tempfile.fd, "a");
+		fprintf(fp, _("Failed to fstat %s: %s"),
+			get_tempfile_path(&log_lock.tempfile),
+			strerror(saved_errno));
+		fclose(fp);
 		commit_lock_file(&log_lock);
-	else
+		errno = saved_errno;
+	} else if (st.st_size) {
+		/* There was some error recorded in the lock file */
+		commit_lock_file(&log_lock);
+	} else {
+		/* No error, clean up any old gc.log */
+		unlink(git_path("gc.log"));
 		rollback_lock_file(&log_lock);
+	}
 }
 
 static void process_log_file_at_exit(void)
@@ -113,6 +135,8 @@ static void gc_config(void)
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_date_string("gc.pruneexpire", &prune_expire);
 	git_config_date_string("gc.worktreepruneexpire", &prune_worktrees_expire);
+	git_config_date_string("gc.logexpiry", &gc_log_expire);
+
 	git_config(git_default_config, NULL);
 }
 
@@ -290,19 +314,34 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 static int report_last_gc_error(void)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
+	int ret = 0;
+	struct stat st;
+	char *gc_log_path = git_pathdup("gc.log");
 
-	ret = strbuf_read_file(&sb, git_path("gc.log"), 0);
+	if (stat(gc_log_path, &st)) {
+		if (errno == ENOENT)
+			goto done;
+
+		ret = error_errno(_("Can't stat %s"), gc_log_path);
+		goto done;
+	}
+
+	if (st.st_mtime < gc_log_expire_time)
+		goto done;
+
+	ret = strbuf_read_file(&sb, gc_log_path, 0);
 	if (ret > 0)
-		return error(_("The last gc run reported the following. "
+		ret = error(_("The last gc run reported the following. "
 			       "Please correct the root cause\n"
 			       "and remove %s.\n"
 			       "Automatic cleanup will not be performed "
 			       "until the file is removed.\n\n"
 			       "%s"),
-			     git_path("gc.log"), sb.buf);
+			    gc_log_path, sb.buf);
 	strbuf_release(&sb);
-	return 0;
+done:
+	free(gc_log_path);
+	return ret;
 }
 
 static int gc_before_repack(void)
@@ -349,7 +388,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	argv_array_pushl(&prune_worktrees, "worktree", "prune", "--expire", NULL);
 	argv_array_pushl(&rerere, "rerere", "gc", NULL);
 
+	/* default expiry time, overwritten in gc_config */
 	gc_config();
+	if (parse_expiry_date(gc_log_expire, &gc_log_expire_time))
+		die(_("Failed to parse gc.logexpiry value %s"), gc_log_expire);
 
 	if (pack_refs < 0)
 		pack_refs = !is_bare_repository();
@@ -448,5 +490,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 
+	if (!daemonized)
+		unlink(git_path("gc.log"));
+
 	return 0;
 }
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 1762dfa6a..08de2e8ab 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,5 +67,20 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_line_count = 2 new # There is one new pack and its .idx
 '
 
+test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
+	test_commit foo &&
+	test_commit bar &&
+	git repack &&
+	test_config gc.autopacklimit 1 &&
+	test_config gc.autodetach true &&
+	echo fleem >.git/gc.log &&
+	test_must_fail git gc --auto 2>err &&
+	test_i18ngrep "^error:" err &&
+	test_config gc.logexpiry 5.days &&
+	test-chmtime =-345600 .git/gc.log &&
+	test_must_fail git gc --auto &&
+	test_config gc.logexpiry 2.days &&
+	git gc --auto
+'
 
 test_done
-- 
2.11.GIT


^ permalink raw reply related

* Re: fuzzy patch application
From: Jeff King @ 2017-02-10 20:57 UTC (permalink / raw)
  To: Nick Desaulniers; +Cc: git
In-Reply-To: <CAKwvOdn9j=_Ob=xq4ucN6Ar1G537zNiU9ox4iF6o1qO7kPY41A@mail.gmail.com>

On Fri, Feb 10, 2017 at 11:20:59AM -0800, Nick Desaulniers wrote:

> I frequently need to backport patches from the Linux kernel to older
> kernel versions (Android Security).  My usual workflow for simple
> patches is:
> 
> 1. try `git am patch.txt`.

This is not exactly an answer to your question, but "git am -3" is often
a better solution than trying to fuzz patches. It assumes the patches
are Git patches (and record their origin blobs), and that you have that
blob (which should be true if the patches are based on the normal kernel
history, and you just fetch that history into your repository).

I've found that this often manages to apply patches that "git apply"
will not by itself. And I also find the resulting conflicts to be much
easier to deal with than patch's ".rej" files.

-Peff

^ permalink raw reply

* RE: [PATCH v3] gc: ignore old gc.log files
From: David Turner @ 2017-02-10 20:44 UTC (permalink / raw)
  To: 'Jeff King'; +Cc: git@vger.kernel.org, pclouds@gmail.com
In-Reply-To: <20170210200838.kuwpldsgzvkjlmri@sigill.intra.peff.net>



> -----Original Message-----
> From: Jeff King [mailto:peff@peff.net]
> Sent: Friday, February 10, 2017 3:09 PM
> To: David Turner <David.Turner@twosigma.com>
> Cc: git@vger.kernel.org; pclouds@gmail.com
> Subject: Re: [PATCH v3] gc: ignore old gc.log files
> 
> On Fri, Feb 10, 2017 at 02:20:19PM -0500, David Turner wrote:
> 
> > @@ -76,10 +77,47 @@ static void git_config_date_string(const char
> > *key, const char **output)  static void process_log_file(void)  {
> >  	struct stat st;
> > -	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
> > +	if (fstat(get_lock_file_fd(&log_lock), &st)) {
> > +		if (errno == ENOENT) {
> > +			/*
> > +			 * The user has probably intentionally deleted
> > +			 * gc.log.lock (perhaps because they're blowing
> > +			 * away the whole repo), so thre's no need to
> > +			 * report anything here.  But we also won't
> > +			 * delete gc.log, because we don't know what
> > +			 * the user's intentions are.
> > +			 */
> 
> Hrm. Does fstat actually trigger ENOENT in that case? There's no pathname
> lookup happening at all. A simple test on Linux seems to show that it does not.
> Build:
> 
> 	#include <unistd.h>
> 	#include <fcntl.h>
> 	#include <sys/stat.h>
> 
> 	int main(int argc, char **argv)
> 	{
> 		struct stat st;
> 		int fd = open(argv[1], O_WRONLY|O_CREAT|O_EXCL, 0600);
> 		unlink(argv[1]);
> 		fstat(fd, &st);
> 		return 0;
> 	}
> 
> and run:
> 
>   strace ./a.out tmp
> 
> which shows:
> 
>   open("tmp", O_WRONLY|O_CREAT|O_EXCL, 056660) = 3
>   unlink("tmp")                           = 0
>   fstat(3, {st_mode=S_IFREG|S_ISUID|S_ISGID|0640, st_size=0, ...}) = 0
> 
> But maybe there are other systems emulating fstat() would trigger here.
> I dunno.

Yeah, I'm also not sure.  On the other hand, if we're going to catch fstat 
errors anyway, we might as well do something sensible with this one.

> > +		} else {
> > +			FILE *fp;
> > +			int fd;
> > +			int saved_errno = errno;
> > +			/*
> > +			 * Perhaps there was an i/o error or another
> > +			 * unlikely situation.  Try to make a note of
> > +			 * this in gc.log.  If this fails again,
> > +			 * give up and leave gc.log as it was.
> > +			 */
> > +			rollback_lock_file(&log_lock);
> > +			fd = hold_lock_file_for_update(&log_lock,
> > +						       git_path("gc.log"),
> > +						       LOCK_DIE_ON_ERROR);
> 
> If there was an i/o error, then gc.log.lock still exists. And this attempt to lock will
> therefore fail, calling die(). Which would trigger our atexit() to call process_log(),
> which would hit this code again, and so forth. I'm not sure if we'd actually
> recurse when an atexit handler calls exit(). But it seems questionable.

No, because  we call rollback_log_file first.

> I'm also not sure why we need to re-open the file in the first place. We have an
> open descriptor (and we even redirected stderr to it already).
> Why don't we just write to it?

If fstat failed, that probably indicates something bad about the old fd.  I'm not 
actually sure why fstat would ever fail, since in all likelihood, the kernel keeps 
information about inodes corresponding to open fds in-memory.  Maybe someone
forcibly unmounted the drive?  

> > @@ -113,6 +151,9 @@ static void gc_config(void)
> >  	git_config_get_bool("gc.autodetach", &detach_auto);
> >  	git_config_date_string("gc.pruneexpire", &prune_expire);
> >  	git_config_date_string("gc.worktreepruneexpire",
> > &prune_worktrees_expire);
> > +	if (!git_config_get_value("gc.logexpiry", &value))
> > +		parse_expiry_date(value, &gc_log_expire_time);
> > +
> 
> Should you be using git_config_date_string() here? It looks like it does some
> extra sanity-checking. It annoyingly just gets the string, and doesn't parse it.
> Perhaps it would be worth adding a
> git_config_date_value() helper.

That seems like a good idea, but I'm going to skip it for now and promise to
do it next time I need a date config.

> Or alternatively, save the date string here, and then parse once later on, after
> having resolved all config (and overwritten the default value).

Sure.

> > @@ -448,5 +506,8 @@ int cmd_gc(int argc, const char **argv, const char
> *prefix)
> >  		warning(_("There are too many unreachable loose objects; "
> >  			"run 'git prune' to remove them."));
> >
> > +	if (!daemonized)
> > +		unlink(git_path("gc.log"));
> > +
> 
> I think this is a good thing to do, though I'd have probably put it in a separate
> patch. I guess that's a matter of taste.

I could go either way, but since I've already gone this way, I'll stick with it.

> > +test_expect_success 'background auto gc does not run if gc.log is present
> and recent but does if it is old' '
> > +	keep=$(ls .git/objects/pack/*.pack|head -1|sed -e "s/pack$/keep/") &&
> > +	test_commit foo &&
> > +	test_commit bar &&
> > +	git repack &&
> > +	test_config gc.autopacklimit 1 &&
> > +	test_config gc.autodetach true &&
> > +	echo fleem >.git/gc.log &&
> > +	test_must_fail git gc --auto 2>err &&
> > +	test_i18ngrep "^error:" err &&
> > +	test-chmtime =-86401 .git/gc.log &&
> > +	git gc --auto
> > +'
> 
> This gives only 1 second of leeway. I wonder if we could end up getting bogus
> failures due to system clock adjustments, or even skew between the filesystem
> and OS clocks. Perhaps we should set it farther back, like a few days.
> 
> (It also relies on the 1-day default. That's probably OK, though we could also set
> an explicit default for the test, which would exercise the config code path, too).

Sure, I'll re-roll with that change.

^ permalink raw reply

* Re: [PATCH v3] gc: ignore old gc.log files
From: Jeff King @ 2017-02-10 20:51 UTC (permalink / raw)
  To: David Turner; +Cc: git@vger.kernel.org, pclouds@gmail.com
In-Reply-To: <7852bf6688ed487097d4f997ac72dcba@exmbdft7.ad.twosigma.com>

On Fri, Feb 10, 2017 at 08:44:27PM +0000, David Turner wrote:

> > But maybe there are other systems emulating fstat() would trigger here.
> > I dunno.
> 
> Yeah, I'm also not sure.  On the other hand, if we're going to catch fstat 
> errors anyway, we might as well do something sensible with this one.

I'd say it's probably not worth worrying about here. We don't think it
can happen, and it would just fall-through to the "woah, fstat failed"
code path if it does.

> > If there was an i/o error, then gc.log.lock still exists. And this attempt to lock will
> > therefore fail, calling die(). Which would trigger our atexit() to call process_log(),
> > which would hit this code again, and so forth. I'm not sure if we'd actually
> > recurse when an atexit handler calls exit(). But it seems questionable.
> 
> No, because  we call rollback_log_file first.

Ah, right, sorry I missed that.

> > I'm also not sure why we need to re-open the file in the first place. We have an
> > open descriptor (and we even redirected stderr to it already).
> > Why don't we just write to it?
> 
> If fstat failed, that probably indicates something bad about the old fd.  I'm not 
> actually sure why fstat would ever fail, since in all likelihood, the kernel keeps 
> information about inodes corresponding to open fds in-memory.  Maybe someone
> forcibly unmounted the drive?

It seems like the re-open would fail then, too. And the error message
for that would go to stderr, which goes to...the old file.

I dunno. This seems like a lot of manual scrambling to try to overcome
unlikely errors just to make our stderr heard (and we'd still fail in
some cases). It seems like:

  if (fstat(...)) {
	/* weird, fstat failed; try our best to mention it */
	error_errno("unable to fstat gc.log.lock");
	commit_lock_file(&log_lock));
  } else if (st.st_size) {
	/* we have new errors to report */
	commit_lock_file(&log_lock);
  } else {
	/* no new errors; clean up old ones */
	unlink(git_path("gc.log"));
	rollback_lock_file(&log_lock);
  }

would be sufficient.

-Peff

^ permalink raw reply

* Re: [PATCH] fixup! bisect--helper: `bisect_next_check` & bisect_voc shell function in C
From: René Scharfe @ 2017-02-10 20:47 UTC (permalink / raw)
  To: Johannes Schindelin, Pranit Bauva; +Cc: git, Junio C Hamano
In-Reply-To: <a1b9143bb29a8a5979dd733ed20161e6769b2b83.1486736391.git.johannes.schindelin@gmx.de>

Am 10.02.2017 um 15:20 schrieb Johannes Schindelin:
> It is curious that only MacOSX builds trigger an error about this, both
> GCC and Clang, but not Linux GCC nor Clang (see
> https://travis-ci.org/git/git/jobs/200182819#L1152 for details):
>
> builtin/bisect--helper.c:299:6: error: variable 'good_syn' is used
> 		uninitialized whenever 'if' condition is true
> 		[-Werror,-Wsometimes-uninitialized]
>         if (missing_good && !missing_bad && current_term &&
>             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> builtin/bisect--helper.c:350:7: note: uninitialized use occurs here
>         if (!good_syn)
>              ^~~~~~~~

The only way that good_syn could be used in the if block is by going to 
the label finish, which does the following before returning:

	if (!bad_ref)
		free(bad_ref);
	if (!good_glob)
		free(good_glob);
	if (!bad_syn)
		free(bad_syn);
	if (!good_syn)
		free(good_syn);

On Linux that code is elided completely -- freeing NULL is a no-op.  I 
guess free(3) has different attributes on OS X and compilers don't dare 
to optimize it away there.

So instead of calling free(3) only in the case when we did not allocate 
memory (which makes no sense and leaks) we should either call it in the 
opposite case, or (preferred) unconditionally, as it can handle the NULL 
case itself.  Once that's fixed initialization will be required even on 
Linux.

René

^ permalink raw reply

* Re: [PATCH v2 1/2] rev-parse tests: add tests executed from a subdirectory
From: Junio C Hamano @ 2017-02-10 20:25 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Michael Rappazzo, Nguyễn Thái Ngọc Duy
In-Reply-To: <cc23463af8096c5f8429f939ce881cf0eb5c2dcd.1486740772.git.johannes.schindelin@gmx.de>

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index 292a0720fcc..1d6e27a09d8 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -200,4 +200,21 @@ EOF
>  	test_cmp expect actual
>  '
>  
> +test_expect_failure 'rev-parse --shared-index-path' '
> +	rm -rf .git &&
> +	test_create_repo . &&

Another thing that I notice only after merging this and other topics
to 'pu' was that this piece needs to always come at the end of the
script because of this removal.  It would make the test more robust
to create a test repository for this test and work inside it.

> +	git update-index --split-index &&
> +	ls -t .git/sharedindex* | tail -n 1 >expect &&
> +	git rev-parse --shared-index-path >actual &&
> +	test_cmp expect actual &&
> +	mkdir work &&
> +	test_when_finished "rm -rf work" &&
> +	(
> +		cd work &&
> +		ls -t ../.git/sharedindex* | tail -n 1 >expect &&
> +		git rev-parse --shared-index-path >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
>  test_done

^ permalink raw reply

* Re: [PATCH v3] gc: ignore old gc.log files
From: Jeff King @ 2017-02-10 20:08 UTC (permalink / raw)
  To: David Turner; +Cc: git, pclouds
In-Reply-To: <20170210192019.13927-1-dturner@twosigma.com>

On Fri, Feb 10, 2017 at 02:20:19PM -0500, David Turner wrote:

> @@ -76,10 +77,47 @@ static void git_config_date_string(const char *key, const char **output)
>  static void process_log_file(void)
>  {
>  	struct stat st;
> -	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
> +	if (fstat(get_lock_file_fd(&log_lock), &st)) {
> +		if (errno == ENOENT) {
> +			/*
> +			 * The user has probably intentionally deleted
> +			 * gc.log.lock (perhaps because they're blowing
> +			 * away the whole repo), so thre's no need to
> +			 * report anything here.  But we also won't
> +			 * delete gc.log, because we don't know what
> +			 * the user's intentions are.
> +			 */

Hrm. Does fstat actually trigger ENOENT in that case? There's no
pathname lookup happening at all. A simple test on Linux seems to show
that it does not. Build:

	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/stat.h>
	
	int main(int argc, char **argv)
	{
		struct stat st;
		int fd = open(argv[1], O_WRONLY|O_CREAT|O_EXCL, 0600);
		unlink(argv[1]);
		fstat(fd, &st);
		return 0;
	}

and run:

  strace ./a.out tmp

which shows:

  open("tmp", O_WRONLY|O_CREAT|O_EXCL, 056660) = 3
  unlink("tmp")                           = 0
  fstat(3, {st_mode=S_IFREG|S_ISUID|S_ISGID|0640, st_size=0, ...}) = 0

But maybe there are other systems emulating fstat() would trigger here.
I dunno.

> +		} else {
> +			FILE *fp;
> +			int fd;
> +			int saved_errno = errno;
> +			/*
> +			 * Perhaps there was an i/o error or another
> +			 * unlikely situation.  Try to make a note of
> +			 * this in gc.log.  If this fails again,
> +			 * give up and leave gc.log as it was.
> +			 */
> +			rollback_lock_file(&log_lock);
> +			fd = hold_lock_file_for_update(&log_lock,
> +						       git_path("gc.log"),
> +						       LOCK_DIE_ON_ERROR);

If there was an i/o error, then gc.log.lock still exists. And this
attempt to lock will therefore fail, calling die(). Which would trigger
our atexit() to call process_log(), which would hit this code again, and
so forth. I'm not sure if we'd actually recurse when an atexit handler
calls exit(). But it seems questionable.

I'm also not sure why we need to re-open the file in the first place. We
have an open descriptor (and we even redirected stderr to it already).
Why don't we just write to it?

> @@ -113,6 +151,9 @@ static void gc_config(void)
>  	git_config_get_bool("gc.autodetach", &detach_auto);
>  	git_config_date_string("gc.pruneexpire", &prune_expire);
>  	git_config_date_string("gc.worktreepruneexpire", &prune_worktrees_expire);
> +	if (!git_config_get_value("gc.logexpiry", &value))
> +		parse_expiry_date(value, &gc_log_expire_time);
> +

Should you be using git_config_date_string() here? It looks like it does
some extra sanity-checking. It annoyingly just gets the string, and
doesn't parse it. Perhaps it would be worth adding a
git_config_date_value() helper.

Or alternatively, save the date string here, and then parse once later
on, after having resolved all config (and overwritten the default
value).

> @@ -448,5 +506,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  		warning(_("There are too many unreachable loose objects; "
>  			"run 'git prune' to remove them."));
>  
> +	if (!daemonized)
> +		unlink(git_path("gc.log"));
> +

I think this is a good thing to do, though I'd have probably put it in a
separate patch. I guess that's a matter of taste.

> +test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
> +	keep=$(ls .git/objects/pack/*.pack|head -1|sed -e "s/pack$/keep/") &&
> +	test_commit foo &&
> +	test_commit bar &&
> +	git repack &&
> +	test_config gc.autopacklimit 1 &&
> +	test_config gc.autodetach true &&
> +	echo fleem >.git/gc.log &&
> +	test_must_fail git gc --auto 2>err &&
> +	test_i18ngrep "^error:" err &&
> +	test-chmtime =-86401 .git/gc.log &&
> +	git gc --auto
> +'

This gives only 1 second of leeway. I wonder if we could end up getting
bogus failures due to system clock adjustments, or even skew between the
filesystem and OS clocks. Perhaps we should set it farther back, like a
few days.

(It also relies on the 1-day default. That's probably OK, though we
could also set an explicit default for the test, which would exercise
the config code path, too).

-Peff

^ permalink raw reply

* [PATCH 2/2] ls-files: move only kept cache entries in prune_cache()
From: René Scharfe @ 2017-02-10 20:03 UTC (permalink / raw)
  To: Git List; +Cc: Duy Nguyen, Brandon Williams, Junio C Hamano
In-Reply-To: <f480bd26-f74e-9088-844d-26cde0843baa@web.de>

prune_cache() first identifies those entries at the start of the sorted
array that can be discarded.  Then it moves the rest of the entries up.
Last it identifies the unwanted trailing entries among the moved ones
and cuts them off.

Change the order: Identify both start *and* end of the range to keep
first and then move only those entries to the top.  The resulting code
is slightly shorter and a bit more efficient.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
The performance impact is probably only measurable with a *really* big
index.

 builtin/ls-files.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 18105ec7ea..1c0f057d02 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -379,10 +379,7 @@ static void prune_cache(const char *prefix, size_t prefixlen)
 	pos = cache_name_pos(prefix, prefixlen);
 	if (pos < 0)
 		pos = -pos-1;
-	memmove(active_cache, active_cache + pos,
-		(active_nr - pos) * sizeof(struct cache_entry *));
-	active_nr -= pos;
-	first = 0;
+	first = pos;
 	last = active_nr;
 	while (last > first) {
 		int next = (last + first) >> 1;
@@ -393,7 +390,9 @@ static void prune_cache(const char *prefix, size_t prefixlen)
 		}
 		last = next;
 	}
-	active_nr = last;
+	memmove(active_cache, active_cache + pos,
+		(last - pos) * sizeof(struct cache_entry *));
+	active_nr = last - pos;
 }
 
 /*
-- 
2.11.1


^ permalink raw reply related

* [PATCH v4] gc: ignore old gc.log files
From: David Turner @ 2017-02-10 20:00 UTC (permalink / raw)
  To: git; +Cc: peff, pclouds, David Turner

Git should never get itself into a state where it refuses to do any
maintenance, just because at some point some piece of the maintenance
didn't make progress.

In this commit, git learns to ignore gc.log files which are older than
(by default) one day old.  It also learns about a config, gc.logExpiry
to manage this.  There is also some cleanup: a successful manual gc,
or a warning-free auto gc, will remove any old gc.log files.

It might still happen that manual intervention is required
(e.g. because the repo is corrupt), but at the very least it won't be
because Git is too dumb to try again.

Automatic gc was intended to make client repositories be
self-maintaining.  It would be good if automatic gc were also useful
to server operators.  A server can end up in a state whre there are
lots of unreferenced loose objects (say, because many users are doing
a bunch of rebasing and pushing their rebased branches). Before this
patch, this state would cause a gc.log file to be created, preventing
future auto gcs.  Then pack files could pile up.  Since many git
operations are O(n) in the number of pack files, this would lead to
poor performance.  Now, the pack files will get cleaned up, if
necessary, at least once per day.  And operators who find a need for
more-frequent gcs can adjust gc.logExpiry to meet their needs.

Signed-off-by: David Turner <dturner@twosigma.com>
Helped-by: Jeff King <peff@peff.net>
---
 Documentation/config.txt |  6 ++++
 builtin/gc.c             | 76 +++++++++++++++++++++++++++++++++++++++++++-----
 t/t6500-gc.sh            | 12 ++++++++
 3 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fc5a28a32..a684b7e3e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1402,6 +1402,12 @@ gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
 	if the system supports it. Default is true.
 
+gc.logExpiry::
+	If the file gc.log exists, then `git gc --auto` won't run
+	unless that file is more than 'gc.logExpiry' old.  Default is
+	"1.day".  See `gc.pruneExpire` for more ways to specify its
+	value.
+
 gc.packRefs::
 	Running `git pack-refs` in a repository renders it
 	unclonable by Git versions prior to 1.5.1.2 over dumb
diff --git a/builtin/gc.c b/builtin/gc.c
index 331f21926..55c441115 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -33,6 +33,7 @@ static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int detach_auto = 1;
+static unsigned long gc_log_expire_time;
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 
@@ -76,10 +77,48 @@ static void git_config_date_string(const char *key, const char **output)
 static void process_log_file(void)
 {
 	struct stat st;
-	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
+	if (fstat(get_lock_file_fd(&log_lock), &st)) {
+		if (errno == ENOENT) {
+			/*
+			 * The user has probably intentionally deleted
+			 * gc.log.lock (perhaps because they're blowing
+			 * away the whole repo), so thre's no need to
+			 * report anything here.  But we also won't
+			 * delete gc.log, because we don't know what
+			 * the user's intentions are.
+			 */
+		} else {
+			FILE *fp;
+			int fd;
+			int saved_errno = errno;
+			/*
+			 * Perhaps there was an i/o error or another
+			 * unlikely situation.  Try to make a note of
+			 * this in gc.log.  If this fails again,
+			 * give up and leave gc.log as it was.
+			 */
+			rollback_lock_file(&log_lock);
+			fd = hold_lock_file_for_update(&log_lock,
+						       git_path("gc.log"),
+						       LOCK_DIE_ON_ERROR);
+
+			fp = fdopen(fd, "w");
+			fprintf(fp, _("Failed to fstat %s: %s"),
+				get_tempfile_path(&log_lock.tempfile),
+				strerror(saved_errno));
+			fclose(fp);
+			commit_lock_file(&log_lock);
+			errno = saved_errno;
+		}
+
+	} else if (st.st_size) {
+		/* There was some error recorded in the lock file */
 		commit_lock_file(&log_lock);
-	else
+	} else {
+		/* No error, clean up any old gc.log */
+		unlink(git_path("gc.log"));
 		rollback_lock_file(&log_lock);
+	}
 }
 
 static void process_log_file_at_exit(void)
@@ -113,6 +152,9 @@ static void gc_config(void)
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_date_string("gc.pruneexpire", &prune_expire);
 	git_config_date_string("gc.worktreepruneexpire", &prune_worktrees_expire);
+	if (!git_config_get_value("gc.logexpiry", &value))
+		parse_expiry_date(value, &gc_log_expire_time);
+
 	git_config(git_default_config, NULL);
 }
 
@@ -290,19 +332,34 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 static int report_last_gc_error(void)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
+	int ret = 0;
+	struct stat st;
+	char *gc_log_path = git_pathdup("gc.log");
+
+	if (stat(gc_log_path, &st)) {
+		if (errno == ENOENT)
+			goto done;
+
+		ret = error_errno(_("Can't stat %s"), gc_log_path);
+		goto done;
+	}
+
+	if (st.st_mtime < gc_log_expire_time)
+		goto done;
 
-	ret = strbuf_read_file(&sb, git_path("gc.log"), 0);
+	ret = strbuf_read_file(&sb, gc_log_path, 0);
 	if (ret > 0)
-		return error(_("The last gc run reported the following. "
+		ret = error(_("The last gc run reported the following. "
 			       "Please correct the root cause\n"
 			       "and remove %s.\n"
 			       "Automatic cleanup will not be performed "
 			       "until the file is removed.\n\n"
 			       "%s"),
-			     git_path("gc.log"), sb.buf);
+			    gc_log_path, sb.buf);
 	strbuf_release(&sb);
-	return 0;
+done:
+	free(gc_log_path);
+	return ret;
 }
 
 static int gc_before_repack(void)
@@ -349,6 +406,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	argv_array_pushl(&prune_worktrees, "worktree", "prune", "--expire", NULL);
 	argv_array_pushl(&rerere, "rerere", "gc", NULL);
 
+	/* default expiry time, overwritten in gc_config */
+	parse_expiry_date("1.day", &gc_log_expire_time);
 	gc_config();
 
 	if (pack_refs < 0)
@@ -448,5 +507,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 
+	if (!daemonized)
+		unlink(git_path("gc.log"));
+
 	return 0;
 }
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 1762dfa6a..e1fb9b4d5 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,5 +67,17 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_line_count = 2 new # There is one new pack and its .idx
 '
 
+test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
+	test_commit foo &&
+	test_commit bar &&
+	git repack &&
+	test_config gc.autopacklimit 1 &&
+	test_config gc.autodetach true &&
+	echo fleem >.git/gc.log &&
+	test_must_fail git gc --auto 2>err &&
+	test_i18ngrep "^error:" err &&
+	test-chmtime =-86401 .git/gc.log &&
+	git gc --auto
+'
 
 test_done
-- 
2.11.GIT


^ permalink raw reply related

* Re: [PATCH v3] gc: ignore old gc.log files
From: Junio C Hamano @ 2017-02-10 19:56 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, pclouds
In-Reply-To: <xmqqy3xdkeuw.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> David Turner <dturner@twosigma.com> writes:
> ...
>> It might still happen that manual intervention is required
>> (e.g. because the repo is corrupt), but at the very least it won't be
>> because Git is too dumb to try again.
>
> Thanks, nicely explained.

Sorry but I spotted a typo "whre" and ended up updating the proposed
log message by reordering them, omitting redundant info, etc.  Here
is a proposed amend with the "where does saved-errno go?" and "we do
not seem to use keep" fix-ups squashed in.

-- >8 --
Subject: gc: ignore old gc.log files

A server can end up in a state where there are lots of unreferenced
loose objects (say, because many users are doing a bunch of rebasing
and pushing their rebased branches).  Running "git gc --auto" in
this state would cause a gc.log file to be created, preventing
future auto gcs, causing pack files to pile up.  Since many git
operations are O(n) in the number of pack files, this would lead to
poor performance.

Git should never get itself into a state where it refuses to do any
maintenance, just because at some point some piece of the maintenance
didn't make progress.

Teach Git to ignore gc.log files which are older than (by default)
one day old, which can be tweaked via the gc.logExpiry configuration
variable.  That way, these pack files will get cleaned up, if
necessary, at least once per day.  And operators who find a need for
more-frequent gcs can adjust gc.logExpiry to meet their needs.

There is also some cleanup: a successful manual gc, or a
warning-free auto gc with an old log file, will remove any old
gc.log files.

It might still happen that manual intervention is required
(e.g. because the repo is corrupt), but at the very least it won't
be because Git is too dumb to try again.

Signed-off-by: David Turner <dturner@twosigma.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |  6 ++++
 builtin/gc.c             | 76 +++++++++++++++++++++++++++++++++++++++++++-----
 t/t6500-gc.sh            | 12 ++++++++
 3 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1fee83ca42..d385711b70 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1397,6 +1397,12 @@ gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
 	if the system supports it. Default is true.
 
+gc.logExpiry::
+	If the file gc.log exists, then `git gc --auto` won't run
+	unless that file is more than 'gc.logExpiry' old.  Default is
+	"1.day".  See `gc.pruneExpire` for more ways to specify its
+	value.
+
 gc.packRefs::
 	Running `git pack-refs` in a repository renders it
 	unclonable by Git versions prior to 1.5.1.2 over dumb
diff --git a/builtin/gc.c b/builtin/gc.c
index 331f219260..55c441115d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -33,6 +33,7 @@ static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int detach_auto = 1;
+static unsigned long gc_log_expire_time;
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 
@@ -76,10 +77,48 @@ static void git_config_date_string(const char *key, const char **output)
 static void process_log_file(void)
 {
 	struct stat st;
-	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
+	if (fstat(get_lock_file_fd(&log_lock), &st)) {
+		if (errno == ENOENT) {
+			/*
+			 * The user has probably intentionally deleted
+			 * gc.log.lock (perhaps because they're blowing
+			 * away the whole repo), so thre's no need to
+			 * report anything here.  But we also won't
+			 * delete gc.log, because we don't know what
+			 * the user's intentions are.
+			 */
+		} else {
+			FILE *fp;
+			int fd;
+			int saved_errno = errno;
+			/*
+			 * Perhaps there was an i/o error or another
+			 * unlikely situation.  Try to make a note of
+			 * this in gc.log.  If this fails again,
+			 * give up and leave gc.log as it was.
+			 */
+			rollback_lock_file(&log_lock);
+			fd = hold_lock_file_for_update(&log_lock,
+						       git_path("gc.log"),
+						       LOCK_DIE_ON_ERROR);
+
+			fp = fdopen(fd, "w");
+			fprintf(fp, _("Failed to fstat %s: %s"),
+				get_tempfile_path(&log_lock.tempfile),
+				strerror(saved_errno));
+			fclose(fp);
+			commit_lock_file(&log_lock);
+			errno = saved_errno;
+		}
+
+	} else if (st.st_size) {
+		/* There was some error recorded in the lock file */
 		commit_lock_file(&log_lock);
-	else
+	} else {
+		/* No error, clean up any old gc.log */
+		unlink(git_path("gc.log"));
 		rollback_lock_file(&log_lock);
+	}
 }
 
 static void process_log_file_at_exit(void)
@@ -113,6 +152,9 @@ static void gc_config(void)
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_date_string("gc.pruneexpire", &prune_expire);
 	git_config_date_string("gc.worktreepruneexpire", &prune_worktrees_expire);
+	if (!git_config_get_value("gc.logexpiry", &value))
+		parse_expiry_date(value, &gc_log_expire_time);
+
 	git_config(git_default_config, NULL);
 }
 
@@ -290,19 +332,34 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 static int report_last_gc_error(void)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
+	int ret = 0;
+	struct stat st;
+	char *gc_log_path = git_pathdup("gc.log");
+
+	if (stat(gc_log_path, &st)) {
+		if (errno == ENOENT)
+			goto done;
+
+		ret = error_errno(_("Can't stat %s"), gc_log_path);
+		goto done;
+	}
+
+	if (st.st_mtime < gc_log_expire_time)
+		goto done;
 
-	ret = strbuf_read_file(&sb, git_path("gc.log"), 0);
+	ret = strbuf_read_file(&sb, gc_log_path, 0);
 	if (ret > 0)
-		return error(_("The last gc run reported the following. "
+		ret = error(_("The last gc run reported the following. "
 			       "Please correct the root cause\n"
 			       "and remove %s.\n"
 			       "Automatic cleanup will not be performed "
 			       "until the file is removed.\n\n"
 			       "%s"),
-			     git_path("gc.log"), sb.buf);
+			    gc_log_path, sb.buf);
 	strbuf_release(&sb);
-	return 0;
+done:
+	free(gc_log_path);
+	return ret;
 }
 
 static int gc_before_repack(void)
@@ -349,6 +406,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	argv_array_pushl(&prune_worktrees, "worktree", "prune", "--expire", NULL);
 	argv_array_pushl(&rerere, "rerere", "gc", NULL);
 
+	/* default expiry time, overwritten in gc_config */
+	parse_expiry_date("1.day", &gc_log_expire_time);
 	gc_config();
 
 	if (pack_refs < 0)
@@ -448,5 +507,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 
+	if (!daemonized)
+		unlink(git_path("gc.log"));
+
 	return 0;
 }
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 1762dfa6a3..e1fb9b4d5b 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,5 +67,17 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_line_count = 2 new # There is one new pack and its .idx
 '
 
+test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
+	test_commit foo &&
+	test_commit bar &&
+	git repack &&
+	test_config gc.autopacklimit 1 &&
+	test_config gc.autodetach true &&
+	echo fleem >.git/gc.log &&
+	test_must_fail git gc --auto 2>err &&
+	test_i18ngrep "^error:" err &&
+	test-chmtime =-86401 .git/gc.log &&
+	git gc --auto
+'
 
 test_done

^ permalink raw reply related

* [PATCH 1/2] ls-files: pass prefix length explicitly to prune_cache()
From: René Scharfe @ 2017-02-10 19:42 UTC (permalink / raw)
  To: Git List; +Cc: Duy Nguyen, Brandon Williams, Junio C Hamano

The function prune_cache() relies on the fact that it is only called on
max_prefix and sneakily uses the matching global variable max_prefix_len
directly.  Tighten its interface by passing both the string and its
length as parameters.  While at it move the NULL check into the function
to collect all cache-pruning related logic in one place.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
Not urgent, of course.

 builtin/ls-files.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 1592290815..18105ec7ea 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -369,11 +369,14 @@ static void show_files(struct dir_struct *dir)
 /*
  * Prune the index to only contain stuff starting with "prefix"
  */
-static void prune_cache(const char *prefix)
+static void prune_cache(const char *prefix, size_t prefixlen)
 {
-	int pos = cache_name_pos(prefix, max_prefix_len);
+	int pos;
 	unsigned int first, last;
 
+	if (!prefix)
+		return;
+	pos = cache_name_pos(prefix, prefixlen);
 	if (pos < 0)
 		pos = -pos-1;
 	memmove(active_cache, active_cache + pos,
@@ -384,7 +387,7 @@ static void prune_cache(const char *prefix)
 	while (last > first) {
 		int next = (last + first) >> 1;
 		const struct cache_entry *ce = active_cache[next];
-		if (!strncmp(ce->name, prefix, max_prefix_len)) {
+		if (!strncmp(ce->name, prefix, prefixlen)) {
 			first = next+1;
 			continue;
 		}
@@ -641,8 +644,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	      show_killed || show_modified || show_resolve_undo))
 		show_cached = 1;
 
-	if (max_prefix)
-		prune_cache(max_prefix);
+	prune_cache(max_prefix, max_prefix_len);
 	if (with_tree) {
 		/*
 		 * Basic sanity check; show-stages and show-unmerged
-- 
2.11.1


^ permalink raw reply related

* Re: [PATCH] dir: avoid allocation in fill_directory()
From: René Scharfe @ 2017-02-10 19:42 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Brandon Williams, Git List
In-Reply-To: <CACsJy8CE-cyTZHZZhvhdsNau7iSqBci1BdUqDYvxoE5odV2SBA@mail.gmail.com>

Am 08.02.2017 um 07:22 schrieb Duy Nguyen:
> On Wed, Feb 8, 2017 at 5:04 AM, René Scharfe <l.s.r@web.de> wrote:
>> Pass the match member of the first pathspec item directly to
>> read_directory() instead of using common_prefix() to duplicate it first,
>> thus avoiding memory duplication, strlen(3) and free(3).
>
> How about killing common_prefix()? There are two other callers in
> ls-files.c and commit.c and it looks safe to do (but I didn't look
> very hard).

I would like that, but it doesn't look like it's worth it.  I have a 
patch series for making overlay_tree_on_cache() take pointer+length, but 
it's surprisingly long and bloats the code.  Duplicating a small piece 
of memory once per command doesn't look so bad in comparison.

(The payoff for avoiding an allocation is higher for library functions 
like fill_directory().)

But while working on that I found two opportunities for improvement in 
prune_cache().  I'll send patches shortly.

> There's a subtle difference. Before the patch, prefix[prefix_len] is
> NUL. After the patch, it's not always true. If some code (incorrectly)
> depends on that, this patch exposes it. I had a look inside
> read_directory() though and it looks like no such code exists. So, all
> good.

Thanks for checking.

NB: The code before 966de302 (dir: convert fill_directory to use the 
pathspec struct interface, committed 2017-01-04) made the same 
assumption, i.e. that NUL is not needed.

René

^ permalink raw reply

* Re: [PATCH v3] gc: ignore old gc.log files
From: Junio C Hamano @ 2017-02-10 19:47 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, pclouds
In-Reply-To: <20170210192019.13927-1-dturner@twosigma.com>

David Turner <dturner@twosigma.com> writes:

> It would be good if automatic gc were useful to server operators.
> A server can end up in a state whre there are
> lots of unreferenced loose objects (say, because many users are doing
> a bunch of rebasing and pushing their rebased branches). Before this
> patch, this state would cause a gc.log file to be created, preventing
> future auto gcs.  Then pack files could pile up.  Since many git
> operations are O(n) in the number of pack files, this would lead to
> poor performance.  Now, the pack files will get cleaned up, if
> necessary, at least once per day.  And operators who find a need for
> more-frequent gcs can adjust gc.logExpiry to meet their needs.
>
> Git should never get itself into a state where it refuses to do any
> maintenance, just because at some point some piece of the maintenance
> didn't make progress.
>
> In this commit, git learns to ignore gc.log files which are older than
> (by default) one day old.  It also learns about a config, gc.logExpiry
> to manage this.  There is also some cleanup: a successful manual gc,
> or a warning-free auto gc with an old log file, will remove any old
> gc.log files.
>
> It might still happen that manual intervention is required
> (e.g. because the repo is corrupt), but at the very least it won't be
> because Git is too dumb to try again.

Thanks, nicely explained.

> +	if (fstat(get_lock_file_fd(&log_lock), &st)) {
> +		if (errno == ENOENT) {
> +			/*
> +			 * The user has probably intentionally deleted
> +			 * gc.log.lock (perhaps because they're blowing
> +			 * away the whole repo), so thre's no need to
> +			 * report anything here.  But we also won't
> +			 * delete gc.log, because we don't know what
> +			 * the user's intentions are.
> +			 */

OK.

> +		} else {
> +			FILE *fp;
> +			int fd;
> +			int saved_errno = errno;
> +			/*
> +			 * Perhaps there was an i/o error or another
> +			 * unlikely situation.  Try to make a note of
> +			 * this in gc.log.  If this fails again,
> +			 * give up and leave gc.log as it was.
> +			 */
> +			rollback_lock_file(&log_lock);
> +			fd = hold_lock_file_for_update(&log_lock,
> +						       git_path("gc.log"),
> +						       LOCK_DIE_ON_ERROR);
> +
> +			fp = fdopen(fd, "w");
> +			fprintf(fp, _("Failed to fstat %s: %s"),
> +				get_tempfile_path(&log_lock.tempfile),
> +				strerror(errno));

I think you meant to use saved_errno in this message and then

> +			fclose(fp);
> +			commit_lock_file(&log_lock);

possibly assign it back to errno around here?

> +		}
> +
> +	} else if (st.st_size) {
> +		/* There was some error recorded in the lock file */
>  		commit_lock_file(&log_lock);
> -	else
> +	} else {
> +		/* No error, clean up any old gc.log */
> +		unlink(git_path("gc.log"));
>  		rollback_lock_file(&log_lock);
> +	}
>  }

> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index 1762dfa6a..84ad07eb2 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -67,5 +67,18 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
>  	test_line_count = 2 new # There is one new pack and its .idx
>  '
>  
> +test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
> +	keep=$(ls .git/objects/pack/*.pack|head -1|sed -e "s/pack$/keep/") &&

You could save one process with:

    ls .git/objects/pack/*.pack | sed -e "s/pack$/keep/" -e q

but do you even need $keep?  I do not see it used below.

> +	test_commit foo &&
> +	test_commit bar &&
> +	git repack &&
> +	test_config gc.autopacklimit 1 &&
> +	test_config gc.autodetach true &&
> +	echo fleem >.git/gc.log &&
> +	test_must_fail git gc --auto 2>err &&
> +	test_i18ngrep "^error:" err &&
> +	test-chmtime =-86401 .git/gc.log &&
> +	git gc --auto
> +'
>  
>  test_done

Other than that I didn't spot anything suspicious.  I'll wait to see
what others may want to add.

Thanks.

^ permalink raw reply

* Re: fuzzy patch application
From: Junio C Hamano @ 2017-02-10 19:38 UTC (permalink / raw)
  To: Nick Desaulniers; +Cc: git
In-Reply-To: <CAKwvOdn9j=_Ob=xq4ucN6Ar1G537zNiU9ox4iF6o1qO7kPY41A@mail.gmail.com>

Nick Desaulniers <ndesaulniers@google.com> writes:

> I frequently need to backport patches from the Linux kernel to older
> kernel versions (Android Security)....
> ...
> My question is, why does `patch` seem to do a better job at applying
> patches than `git am`?  It's almost like the `git` tools don't try to fuzz
> the offsets.

You diagnosed correctly.  We do allow offsets but by default no fuzz
and that is a deliberate design decision made in very early days.
You can pass option to reduce context, but that is not the default.

^ permalink raw reply

* Re: [PATCH v2 0/9] Store submodules in a hash, not a linked list
From: Junio C Hamano @ 2017-02-10 19:23 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller,
	Johannes Schindelin, David Turner, Jeff King, git
In-Reply-To: <cover.1486724698.git.mhagger@alum.mit.edu>

Michael Haggerty <mhagger@alum.mit.edu> writes:

> This is v2 of the patch series, considerably reorganized but not that
> different codewise.

Thanks.  The way the series loses "!*submodule and !submodule are
the same thing, sometimes" is easier to follow when presented in
this order, at least to me.


^ permalink raw reply

* Re: [PATCH v2 9/9] read_loose_refs(): read refs using resolve_ref_recursively()
From: Junio C Hamano @ 2017-02-10 19:22 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller,
	Johannes Schindelin, David Turner, Jeff King, git
In-Reply-To: <d8e906d969700acbca8dc717673d0a9cdc910f62.1486724698.git.mhagger@alum.mit.edu>

Michael Haggerty <mhagger@alum.mit.edu> writes:

> There is no need to call read_ref_full() or resolve_gitlink_ref() from
> read_loose_refs(), because we already have a ref_store object in hand.
> So we can call resolve_ref_recursively() ourselves. Happily, this
> unifies the code for the submodule vs. non-submodule cases.
>
> This requires resolve_ref_recursively() to be exposed to the refs
> subsystem, though not to non-refs code.
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>  refs.c               | 50 +++++++++++++++++++++++++-------------------------
>  refs/files-backend.c | 18 ++++--------------
>  refs/refs-internal.h |  5 +++++
>  3 files changed, 34 insertions(+), 39 deletions(-)

OK, but one thing puzzles me...

> @@ -1390,27 +1390,6 @@ static struct ref_store *main_ref_store;
>  static struct hashmap submodule_ref_stores;
>  
>  /*
> - * Return the ref_store instance for the specified submodule (or the
> - * main repository if submodule is NULL). If that ref_store hasn't
> - * been initialized yet, return NULL.
> - */
> -static struct ref_store *lookup_ref_store(const char *submodule)
> -{
> -	struct submodule_hash_entry *entry;
> -
> -	if (!submodule)
> -		return main_ref_store;
> -
> -	if (!submodule_ref_stores.tablesize)
> -		/* It's initialized on demand in register_ref_store(). */
> -		return NULL;
> -
> -	entry = hashmap_get_from_hash(&submodule_ref_stores,
> -				      strhash(submodule), submodule);
> -	return entry ? entry->refs : NULL;
> -}
> -
> -/*
>   * Register the specified ref_store to be the one that should be used
>   * for submodule (or the main repository if submodule is NULL). It is
>   * a fatal error to call this function twice for the same submodule.
> @@ -1451,6 +1430,27 @@ static struct ref_store *ref_store_init(const char *submodule)
>  	return refs;
>  }
>  
> +/*
> + * Return the ref_store instance for the specified submodule (or the
> + * main repository if submodule is NULL). If that ref_store hasn't
> + * been initialized yet, return NULL.
> + */
> +static struct ref_store *lookup_ref_store(const char *submodule)
> +{
> +	struct submodule_hash_entry *entry;
> +
> +	if (!submodule)
> +		return main_ref_store;
> +
> +	if (!submodule_ref_stores.tablesize)
> +		/* It's initialized on demand in register_ref_store(). */
> +		return NULL;
> +
> +	entry = hashmap_get_from_hash(&submodule_ref_stores,
> +				      strhash(submodule), submodule);
> +	return entry ? entry->refs : NULL;
> +}
> +

I somehow thought that we had an early "reorder the code" step to
avoid hunks like these?  Am I missing some subtle changes made while
moving the function down?

^ permalink raw reply

* fuzzy patch application
From: Nick Desaulniers @ 2017-02-10 19:20 UTC (permalink / raw)
  To: git

I frequently need to backport patches from the Linux kernel to older
kernel versions (Android Security).  My usual workflow for simple
patches is:

1. try `git am patch.txt`.
2. if that fails try `git apply -v patch.txt` then add commit message
manually.
3. if that fails try `patch -p1 < patch.txt` then add commit message
manually.
4. if that fails, manually fix bug based on patch.

My question is, why does `patch` seem to do a better job at applying
patches than `git am`?  It's almost like the `git` tools don't try to fuzz
the offsets.  Is there a way to tell git to try fuzzing offsets when
applying patches?

From the output of `patch` I ran recently (for a patch that
`git apply` would not apply):

patching file <filename.c>
Hunk #1 succeeded at 113 (offset -1 lines).
Hunk #2 succeeded at 435 (offset -3 lines).
Hunk #3 succeeded at 693 (offset 2 lines).
Hunk #4 succeeded at 1535 (offset -41 lines).
Hunk #5 succeeded at 1551 (offset -41 lines).
Hunk #6 succeeded at 1574 with fuzz 2 (offset -42 lines).
Hunk #7 succeeded at 1614 (offset -42 lines).

In fact, `git apply -v` errors for hunk #6.

I would guess maybe that there's an upper limit on the offset searched?
Also, I'm not sure what the `fuzz 2` part means exactly, but it seems like
`git apply` chokes when fuzzing is needed to properly apply a patch.

http://stackoverflow.com/q/6336440/1027966

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* [PATCH v3] gc: ignore old gc.log files
From: David Turner @ 2017-02-10 19:20 UTC (permalink / raw)
  To: git; +Cc: peff, pclouds, David Turner

Git should never get itself into a state where it refuses to do any
maintenance, just because at some point some piece of the maintenance
didn't make progress.

In this commit, git learns to ignore gc.log files which are older than
(by default) one day old.  It also learns about a config, gc.logExpiry
to manage this.  There is also some cleanup: a successful manual gc,
or a warning-free auto gc with an old log file, will remove any old
gc.log files.

It might still happen that manual intervention is required
(e.g. because the repo is corrupt), but at the very least it won't be
because Git is too dumb to try again.

Automatic gc was intended to make client repositories be
self-maintaining.  It would be good if automatic gc were also useful
to server operators.  A server can end up in a state whre there are
lots of unreferenced loose objects (say, because many users are doing
a bunch of rebasing and pushing their rebased branches). Before this
patch, this state would cause a gc.log file to be created, preventing
future auto gcs.  Then pack files could pile up.  Since many git
operations are O(n) in the number of pack files, this would lead to
poor performance.  Now, the pack files will get cleaned up, if
necessary, at least once per day.  And operators who find a need for
more-frequent gcs can adjust gc.logExpiry to meet their needs.

Signed-off-by: David Turner <dturner@twosigma.com>
Helped-by: Jeff King <peff@peff.net>
---
 Documentation/config.txt |  6 ++++
 builtin/gc.c             | 75 +++++++++++++++++++++++++++++++++++++++++++-----
 t/t6500-gc.sh            | 13 +++++++++
 3 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fc5a28a32..a684b7e3e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1402,6 +1402,12 @@ gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
 	if the system supports it. Default is true.
 
+gc.logExpiry::
+	If the file gc.log exists, then `git gc --auto` won't run
+	unless that file is more than 'gc.logExpiry' old.  Default is
+	"1.day".  See `gc.pruneExpire` for more ways to specify its
+	value.
+
 gc.packRefs::
 	Running `git pack-refs` in a repository renders it
 	unclonable by Git versions prior to 1.5.1.2 over dumb
diff --git a/builtin/gc.c b/builtin/gc.c
index 331f21926..6f297aa91 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -33,6 +33,7 @@ static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int detach_auto = 1;
+static unsigned long gc_log_expire_time;
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 
@@ -76,10 +77,47 @@ static void git_config_date_string(const char *key, const char **output)
 static void process_log_file(void)
 {
 	struct stat st;
-	if (!fstat(get_lock_file_fd(&log_lock), &st) && st.st_size)
+	if (fstat(get_lock_file_fd(&log_lock), &st)) {
+		if (errno == ENOENT) {
+			/*
+			 * The user has probably intentionally deleted
+			 * gc.log.lock (perhaps because they're blowing
+			 * away the whole repo), so thre's no need to
+			 * report anything here.  But we also won't
+			 * delete gc.log, because we don't know what
+			 * the user's intentions are.
+			 */
+		} else {
+			FILE *fp;
+			int fd;
+			int saved_errno = errno;
+			/*
+			 * Perhaps there was an i/o error or another
+			 * unlikely situation.  Try to make a note of
+			 * this in gc.log.  If this fails again,
+			 * give up and leave gc.log as it was.
+			 */
+			rollback_lock_file(&log_lock);
+			fd = hold_lock_file_for_update(&log_lock,
+						       git_path("gc.log"),
+						       LOCK_DIE_ON_ERROR);
+
+			fp = fdopen(fd, "w");
+			fprintf(fp, _("Failed to fstat %s: %s"),
+				get_tempfile_path(&log_lock.tempfile),
+				strerror(errno));
+			fclose(fp);
+			commit_lock_file(&log_lock);
+		}
+
+	} else if (st.st_size) {
+		/* There was some error recorded in the lock file */
 		commit_lock_file(&log_lock);
-	else
+	} else {
+		/* No error, clean up any old gc.log */
+		unlink(git_path("gc.log"));
 		rollback_lock_file(&log_lock);
+	}
 }
 
 static void process_log_file_at_exit(void)
@@ -113,6 +151,9 @@ static void gc_config(void)
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_date_string("gc.pruneexpire", &prune_expire);
 	git_config_date_string("gc.worktreepruneexpire", &prune_worktrees_expire);
+	if (!git_config_get_value("gc.logexpiry", &value))
+		parse_expiry_date(value, &gc_log_expire_time);
+
 	git_config(git_default_config, NULL);
 }
 
@@ -290,19 +331,34 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 static int report_last_gc_error(void)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
+	int ret = 0;
+	struct stat st;
+	char *gc_log_path = git_pathdup("gc.log");
+
+	if (stat(gc_log_path, &st)) {
+		if (errno == ENOENT)
+			goto done;
+
+		ret = error_errno(_("Can't stat %s"), gc_log_path);
+		goto done;
+	}
+
+	if (st.st_mtime < gc_log_expire_time)
+		goto done;
 
-	ret = strbuf_read_file(&sb, git_path("gc.log"), 0);
+	ret = strbuf_read_file(&sb, gc_log_path, 0);
 	if (ret > 0)
-		return error(_("The last gc run reported the following. "
+		ret = error(_("The last gc run reported the following. "
 			       "Please correct the root cause\n"
 			       "and remove %s.\n"
 			       "Automatic cleanup will not be performed "
 			       "until the file is removed.\n\n"
 			       "%s"),
-			     git_path("gc.log"), sb.buf);
+			    gc_log_path, sb.buf);
 	strbuf_release(&sb);
-	return 0;
+done:
+	free(gc_log_path);
+	return ret;
 }
 
 static int gc_before_repack(void)
@@ -349,6 +405,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	argv_array_pushl(&prune_worktrees, "worktree", "prune", "--expire", NULL);
 	argv_array_pushl(&rerere, "rerere", "gc", NULL);
 
+	/* default expiry time, overwritten in gc_config */
+	parse_expiry_date("1.day", &gc_log_expire_time);
 	gc_config();
 
 	if (pack_refs < 0)
@@ -448,5 +506,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 
+	if (!daemonized)
+		unlink(git_path("gc.log"));
+
 	return 0;
 }
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 1762dfa6a..84ad07eb2 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,5 +67,18 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_line_count = 2 new # There is one new pack and its .idx
 '
 
+test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
+	keep=$(ls .git/objects/pack/*.pack|head -1|sed -e "s/pack$/keep/") &&
+	test_commit foo &&
+	test_commit bar &&
+	git repack &&
+	test_config gc.autopacklimit 1 &&
+	test_config gc.autodetach true &&
+	echo fleem >.git/gc.log &&
+	test_must_fail git gc --auto 2>err &&
+	test_i18ngrep "^error:" err &&
+	test-chmtime =-86401 .git/gc.log &&
+	git gc --auto
+'
 
 test_done
-- 
2.11.GIT


^ permalink raw reply related

* Re: [PATCH v2 1/9] refs: reorder some function definitions
From: Junio C Hamano @ 2017-02-10 19:14 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller,
	Johannes Schindelin, David Turner, Jeff King, git
In-Reply-To: <661ef87844918501e84b43d254305e1997af9b57.1486724698.git.mhagger@alum.mit.edu>

Michael Haggerty <mhagger@alum.mit.edu> writes:

> This avoids the need to add forward declarations in the next step.
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>  refs.c | 64 ++++++++++++++++++++++++++++++++--------------------------------
>  1 file changed, 32 insertions(+), 32 deletions(-)

Makes sense, but the patch itself looks like ... unreadble ;-)

>
> diff --git a/refs.c b/refs.c
> index 9bd0bc1..707092f 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1358,27 +1358,19 @@ static struct ref_store *main_ref_store;
>  /* A linked list of ref_stores for submodules: */
>  static struct ref_store *submodule_ref_stores;
>  
> -void base_ref_store_init(struct ref_store *refs,
> -			 const struct ref_storage_be *be,
> -			 const char *submodule)
> +struct ref_store *lookup_ref_store(const char *submodule)
>  {
> -	refs->be = be;
> -	if (!submodule) {
> -		if (main_ref_store)
> -			die("BUG: main_ref_store initialized twice");
> +	struct ref_store *refs;
>  
> -		refs->submodule = "";
> -		refs->next = NULL;
> -		main_ref_store = refs;
> -	} else {
> -		if (lookup_ref_store(submodule))
> -			die("BUG: ref_store for submodule '%s' initialized twice",
> -			    submodule);
> +	if (!submodule || !*submodule)
> +		return main_ref_store;
>  
> -		refs->submodule = xstrdup(submodule);
> -		refs->next = submodule_ref_stores;
> -		submodule_ref_stores = refs;
> +	for (refs = submodule_ref_stores; refs; refs = refs->next) {
> +		if (!strcmp(submodule, refs->submodule))
> +			return refs;
>  	}
> +
> +	return NULL;
>  }
>  
>  struct ref_store *ref_store_init(const char *submodule)
> @@ -1395,21 +1387,6 @@ struct ref_store *ref_store_init(const char *submodule)
>  		return be->init(submodule);
>  }
>  
> -struct ref_store *lookup_ref_store(const char *submodule)
> -{
> -	struct ref_store *refs;
> -
> -	if (!submodule || !*submodule)
> -		return main_ref_store;
> -
> -	for (refs = submodule_ref_stores; refs; refs = refs->next) {
> -		if (!strcmp(submodule, refs->submodule))
> -			return refs;
> -	}
> -
> -	return NULL;
> -}
> -
>  struct ref_store *get_ref_store(const char *submodule)
>  {
>  	struct ref_store *refs;
> @@ -1435,6 +1412,29 @@ struct ref_store *get_ref_store(const char *submodule)
>  	return refs;
>  }
>  
> +void base_ref_store_init(struct ref_store *refs,
> +			 const struct ref_storage_be *be,
> +			 const char *submodule)
> +{
> +	refs->be = be;
> +	if (!submodule) {
> +		if (main_ref_store)
> +			die("BUG: main_ref_store initialized twice");
> +
> +		refs->submodule = "";
> +		refs->next = NULL;
> +		main_ref_store = refs;
> +	} else {
> +		if (lookup_ref_store(submodule))
> +			die("BUG: ref_store for submodule '%s' initialized twice",
> +			    submodule);
> +
> +		refs->submodule = xstrdup(submodule);
> +		refs->next = submodule_ref_stores;
> +		submodule_ref_stores = refs;
> +	}
> +}
> +
>  void assert_main_repository(struct ref_store *refs, const char *caller)
>  {
>  	if (*refs->submodule)

^ permalink raw reply

* Re: Bug with fixup and autosquash
From: Philip Oakley @ 2017-02-10 19:02 UTC (permalink / raw)
  To: Johannes Schindelin, Junio C Hamano
  Cc: Ashutosh Bapat, git, Michael Haggerty, Michael J Gruber,
	Matthieu Moy
In-Reply-To: <alpine.DEB.2.20.1702101654500.3496@virtualbox>

From: "Johannes Schindelin" <Johannes.Schindelin@gmx.de>
> Hi Junio,
>
> On Thu, 9 Feb 2017, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > Almost. While I fixed the performance issues as well as the design
>> > allowed, I happened to "fix" the problem where an incomplete prefix
>> > match could be favored over an exact match.
>>
>> Hmph.  Would it require too much further work to do what you said the
>> code does:
>
> I was just being overly precise. I *did* fix the problem. But since it was
> not my intention, I quoted the verb "fix".
>
>> > The rebase--helper code (specifically, the patch moving autosquash
>> > logic into it: https://github.com/dscho/git/commit/7d0831637f) tries
>> > to match exact onelines first, and falls back to prefix matching only
>> > after that.
>>
>> If the code matches exact onlines and then falls back to prefix, I do
>> not think incomplete prefix would be mistakenly chosen over an exact
>> one, so perhaps your code already does the right thing?
>
> The code does exactly that. It does even more: as `fixup! <SHA-1>` is
> allowed (for SHA-1s that have been mentioned in previous `pick` lines), it
> tries to match that before falling back to the incomplete prefix match.
>
> Ciao,
> Johannes

Now just the doc update to do.... ;-)

I definitely think the 'fix' that allows the `fixup! <SHA-1>` as the subject 
line is a good way to go for those who mix in the use of the gui and gitk 
into their workflow (*)

--
Philip
(*) I just don't see the point of having multiple cli tty windows, and then 
not using the gui/k windows that are part of the tool set.


^ permalink raw reply

* Re: [PATCH v2 0/2] Fix bugs in rev-parse's output when run in a subdirectory
From: Junio C Hamano @ 2017-02-10 18:59 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Nguyễn Thái Ngọc Duy, Michael Rappazzo
In-Reply-To: <cover.1486740772.git.johannes.schindelin@gmx.de>

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> Johannes Schindelin (1):
>   rev-parse: fix several options when running in a subdirectory
>
> Michael Rappazzo (1):
>   rev-parse tests: add tests executed from a subdirectory
>
>  builtin/rev-parse.c      | 15 +++++++++++----
>  t/t1500-rev-parse.sh     | 28 ++++++++++++++++++++++++++++
>  t/t1700-split-index.sh   | 17 +++++++++++++++++
>  t/t2027-worktree-list.sh | 10 +++++++++-
>  4 files changed, 65 insertions(+), 5 deletions(-)
>
>
> base-commit: 6e3a7b3398559305c7a239a42e447c21a8f39ff8
> Published-As: https://github.com/dscho/git/releases/tag/git-path-in-subdir-v2
> Fetch-It-Via: git fetch https://github.com/dscho/git git-path-in-subdir-v2

Will queue as js/git-path-in-subdir topic forked at 2.12-rc0.

I still think the log message in 2/2 is making a mountain out of
molehill and showing a skewed perception on pros-and-cons in a
design decision, but I won't repeat my review.  I saw a few
correctness issues and pointed them out on the patches.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox