From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Git List" <git@vger.kernel.org>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
"Michael Haggerty" <mhagger@alum.mit.edu>,
"Stefan Beller" <stefanbeller@gmail.com>,
"Jeff King" <peff@peff.net>,
"Jonathan Nieder" <jrnieder@gmail.com>
Subject: Re: BUG: Race condition due to reflog expiry in "gc"
Date: Wed, 13 Mar 2019 11:28:39 +0100 [thread overview]
Message-ID: <87sgvrax0o.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <xmqq1s3bh7ky.fsf@gitster-ct.c.googlers.com>
On Wed, Mar 13 2019, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I'm still working on fixing a race condition I encountered in "gc"
>> recently, but am not 100% sure of the fix. So I thought I'd send a
>> braindump of what I have so far in case it jolts any memories.
>>
>> The problem is that when we "gc" we run "reflog expire --all". This
>> iterates over the reflogs, and takes a *.lock for each reference.
>>
>> It'll fail intermittendly in two ways:
>>
>> 1. If something is concurrently committing to the repo it'll fail
>> because we for a tiny amount of time hold a HEAD.lock file, so HEAD
>> can't be updated.
>>
>> 2. On a repository that's just being "git fetch"'d by some concurrent
>> process the "gc" will fail, because the ref's SHA1 has changed,
>> which we inspect as we aquire the lock.
>
> Both sounds very much expected and expectable outcome. I am not
> sure how they need to be called bugs.
Let's leave aside that I started the subject with "BUG:" and let me
rephrase.
I was under the impression that git-gc was supposed to support operating
on a repository that's concurrently being modified, as long as you don't
set the likes of gc.pruneExpire too aggressively.
Running a "gc" in a loop without "git reflog expire --all" and when
watching the repository being GC'd with:
fswatch -l 0.1 -t -r . 2>&1 | grep lock
We only create .git/MERGE_RR.lock, .git/gc.pid.lock and
git/packed-refs.lock. These are all things that would only cause another
concurrent GC to fail, not a normal git command.
So the only reason a concurrent commit (case #1) fails is because of the
refs being locked during the reflog iteration, and similarly "gc" itself
will fail due to a concurrently updating ref (case #2).
It seems that first of all we need this, I'll submit that as a separate
patch sometime soon:
diff --git a/builtin/gc.c b/builtin/gc.c
index 020f725acc..ae488646e1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -127,6 +127,12 @@ static void gc_config(void)
pack_refs = git_config_bool("gc.packrefs", value);
}
+ if (!git_config_get_value("gc.reflogexpire", &value) && value &&
+ !strcmp(value, "never") &&
+ !git_config_get_value("gc.reflogexpireunreachable", &value) && value &&
+ !strcmp(value, "never"))
+ prune_reflogs = 0;
+
git_config_get_int("gc.aggressivewindow", &aggressive_window);
git_config_get_int("gc.aggressivedepth", &aggressive_depth);
git_config_get_int("gc.auto", &gc_auto_threshold);
I.e. now even if your gc.* config says you don't want the reflogs
touched, we still pointlessly iterate over all of them. The case I'm
running into (a variant of #2) is one solved by that patch, i.e. I'm
fine "gc" just having the reflogs kept forever as a workaround in this
case.
Something like that should have been added back in 62aad1849f ("gc
--auto: do not lock refs in the background", 2014-05-25), i.e. now the
"prune_reflogs" variable is never used, it's just cargo-culted from a
copy/pasting of the "pack_refs" code.
In other "gc" phases in "pack-objects" and "prune" we also look at the
reflogs. This obviously bad patch ignores them entirely:
diff --git a/builtin/prune.c b/builtin/prune.c
index 97613eccb5..bccee7813e 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -41,7 +41,7 @@ static void perform_reachability_traversal(struct rev_info *revs)
if (show_progress)
progress = start_delayed_progress(_("Checking connectivity"), 0);
- mark_reachable_objects(revs, 1, expire, progress);
+ mark_reachable_objects(revs, 0, expire, progress);
stop_progress(&progress);
initialized = 1;
}
diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..618ffbfe0a 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -364,7 +364,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
keep_pack_list.items[i].string);
argv_array_push(&cmd.args, "--non-empty");
argv_array_push(&cmd.args, "--all");
- argv_array_push(&cmd.args, "--reflog");
argv_array_push(&cmd.args, "--indexed-objects");
if (repository_format_partial_clone)
argv_array_push(&cmd.args, "--exclude-promisor-objects");
I'm just including that as illustration that add_reflogs_to_pending() in
revision.c during "gc" already iterates over the reflogs without locking
anything, but of course it's just reading them.
So one thing that would mitigate things a lot is if
files_reflog_expire() and its call to expire_reflog_ent() via
refs_for_each_reflog_ent() would lazily aquire the lock on the ref.
Digging a bit further that's actually what we're doing now since
4ff0f01cb7 ("refs: retry acquiring reference locks for 100ms",
2017-08-21).
But this runs into the logic we've had for a long time, or since your
bda3a31cc7 ("reflog-expire: Avoid creating new files in a directory
inside readdir(3) loop", 2008-01-25) where we first loop over all the
refs in the process of finding the reflogs, and then will try to lock
those refs at those expected SHA-1s. If they've changed in the meantime
we error out don't clean up the lockfile.
So just this fixes that:
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ef053f716c..b6576f28b9 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3037,7 +3037,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
* reference itself, plus we might need to update the
* reference if --updateref was specified:
*/
- lock = lock_ref_oid_basic(refs, refname, oid,
+ lock = lock_ref_oid_basic(refs, refname, NULL,
NULL, NULL, REF_NO_DEREF,
&type, &err);
if (!lock) {
Which seems sensible to me. We'll still get the lock, we just don't
assert that the refname we use to get the lock must be at that
SHA-1. We'll still use it for the purposes of expiry.
But maybe I've missed some caveat in reflog_expiry_prepare() and friends
and we really do need the reflog at that OID, then this would suck less:
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 4d3430900d..4bb272fdc8 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -625,12 +625,16 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
free_worktrees(worktrees);
for (i = 0; i < collected.nr; i++) {
struct collected_reflog *e = collected.e[i];
+ int st;
set_reflog_expiry_param(&cb.cmd, explicit_expiry, e->reflog);
- status |= reflog_expire(e->reflog, &e->oid, flags,
- reflog_expiry_prepare,
- should_expire_reflog_ent,
- reflog_expiry_cleanup,
- &cb);
+ st = reflog_expire(e->reflog, &e->oid, flags,
+ reflog_expiry_prepare,
+ should_expire_reflog_ent,
+ reflog_expiry_cleanup,
+ &cb);
+ if (st == -2)
+ continue;
+ status |= st;
free(e);
}
free(collected.e);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ef053f716c..8b0b6b7b85 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3041,6 +3041,11 @@ static int files_reflog_expire(struct ref_store *ref_store,
NULL, NULL, REF_NO_DEREF,
&type, &err);
if (!lock) {
+ if (errno == EBUSY) {
+ warning("cannot lock ref '%s': %s. Skipping!", refname, err.buf);
+ strbuf_release(&err);
+ return -2;
+ }
error("cannot lock ref '%s': %s", refname, err.buf);
strbuf_release(&err);
return -1;
I.e. we just detect the EBUSY that verify_lock() sets if the OID doesn't
match, and don't prune that reflog. As seen above "pack-objects" and
"prune" will still iterate over the same logs later for the purposes of
reachability, so this shouldn't get us into a corrupt state due to
throwing away objects referenced in those logs, we'll just prune fewer
things than we could have.
So I think I'll use the first patch noted above as a hack to solve the
narrow problem I have now, but any comments on the above most
welcome. I'm not very familiar with the ref code in case that wasn't
obvious already.
B.t.w. the mention of f3b661f766 ("expire_reflog(): use a lock_file for
rewriting the reflog file", 2014-12-12) upthread is irrelevant. That's a
commit where we use the lockfile code to write out the *new* reflog,
which is unrelated to all of this.
next prev parent reply other threads:[~2019-03-13 10:28 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-12 23:28 BUG: Race condition due to reflog expiry in "gc" Ævar Arnfjörð Bjarmason
2019-03-13 1:43 ` Junio C Hamano
2019-03-13 10:28 ` Ævar Arnfjörð Bjarmason [this message]
2019-03-13 16:02 ` Jeff King
2019-03-13 16:22 ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54 ` [PATCH 0/5] gc: minor code cleanup + contention fixes Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 0/7] " Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 0/8] " Ævar Arnfjörð Bjarmason
2019-03-18 6:13 ` Junio C Hamano
2019-03-28 16:14 ` [PATCH v4 0/7] gc: tests and handle reflog expire config Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 5/7] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 6/7] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 7/7] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 1/8] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 2/8] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 3/8] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 4/8] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 5/8] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 6/8] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-19 6:20 ` Jeff King
2019-03-15 15:59 ` [PATCH v3 7/8] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 8/8] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
[not found] ` <b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu>
2019-03-21 14:10 ` Ævar Arnfjörð Bjarmason
2019-03-19 6:27 ` [PATCH v2 0/7] gc: minor code cleanup + contention fixes Jeff King
2019-03-14 12:34 ` [PATCH v2 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 9:51 ` Duy Nguyen
2019-03-15 10:42 ` Ævar Arnfjörð Bjarmason
2019-03-15 15:49 ` Johannes Schindelin
2019-03-14 12:34 ` [PATCH v2 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 5/7] reflog: exit early if there's no work to do Ævar Arnfjörð Bjarmason
2019-03-15 10:01 ` Duy Nguyen
2019-03-15 15:51 ` Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 6/7] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-15 10:05 ` Duy Nguyen
2019-03-15 10:24 ` Ævar Arnfjörð Bjarmason
2019-03-15 10:32 ` Duy Nguyen
2019-03-15 15:51 ` Johannes Schindelin
2019-03-18 6:04 ` Junio C Hamano
2019-03-14 12:34 ` [PATCH v2 7/7] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-15 11:10 ` Duy Nguyen
2019-03-15 15:52 ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54 ` [PATCH 1/5] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 0:25 ` Jeff King
2019-03-13 23:54 ` [PATCH 2/5] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-14 0:26 ` Jeff King
2019-03-13 23:54 ` [PATCH 3/5] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 0:30 ` Jeff King
2019-03-13 23:54 ` [PATCH 4/5] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-14 0:40 ` Jeff King
2019-03-14 4:51 ` Junio C Hamano
2019-03-14 19:26 ` Jeff King
2019-03-13 23:54 ` [PATCH 5/5] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-14 0:44 ` Jeff King
2019-03-14 0:25 ` BUG: Race condition due to reflog expiry in "gc" Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87sgvrax0o.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jrnieder@gmail.com \
--cc=mhagger@alum.mit.edu \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
--cc=stefanbeller@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.