All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Stefan Beller" <stefanbeller@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>
Subject: Re: [PATCH v3 8/8] reflog expire: don't assert the OID when locking refs
Date: Thu, 21 Mar 2019 15:10:08 +0100	[thread overview]
Message-ID: <87pnqkco8v.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu>


On Tue, Mar 19 2019, Michael Haggerty wrote:

> Thanks for your work and for your thorough explanation of the change!

Hi. Yes, thanks a lot for the feedback. Just hadn't gotten around to
looping back to this yet & digging into the issue you raised.

> On 3/15/19 4:59 PM, Ævar Arnfjörð Bjarmason wrote:
>> During reflog expiry, the cmd_reflog_expire() function first iterates
>> over all reflogs in logs/*, and then one-by-one acquires the lock for
>> each one to expire its reflog by getting a *.lock file on the
>> corresponding loose ref[1] (even if the actual ref is packed).
>>
>> This lock is needed, but what isn't needed is locking the loose ref as
>> a function of the OID we found from that first iteration. By the time
>> we get around to re-visiting the reference some of the OIDs may have
>> changed.
>
> Instead of "what isn't needed is locking the loose ref as a function of
> the OID we found from that first iteration", I suggest "what isn't
> needed is to insist that the reference still has the OID that we found
> in that first iteration".
>
>> Thus the verify_lock() function called by the lock_ref_oid_basic()
>> function being changed here would fail with e.g. "ref '%s' is at %s
>> but expected %s" if the repository was being updated concurrent to the
>> "reflog expire".
>>
>> By not passing the OID to it we'll try to lock the reference
>> regardless of it last known OID. Locking as a function of the OID
>
> s/it/its/
>
>> would make "reflog expire" exit with a non-zero exit status under such
>> contention, which in turn meant that a "gc" command (which expires
>> reflogs before forking to the background) would encounter a hard
>> error.
>
> The last sentence seems mostly redundant with the previous paragraph.
>
>> This behavior of considering the OID when locking has been here ever
>> since "reflog expire" was initially implemented in 4264dc15e1 ("git
>> reflog expire", 2006-12-19). As seen in that simpler initial version
>> of the code we subsequently use the OID to inform the expiry (and
>> still do), but never needed to use it to lock the reference associated
>> with the reflog.
>>
>> By locking the reference without considering what OID we last saw it
>> at, we won't encounter user-visible contention to the extent that
>> core.filesRefLockTimeout mitigates it. See 4ff0f01cb7 ("refs: retry
>> acquiring reference locks for 100ms", 2017-08-21).
>>
>> Unfortunately this sort of probabilistic contention is hard to turn
>> into a test. I've tested this by running the following three subshells
>> in concurrent terminals:
>>
>>     (
>>         cd /tmp &&
>>         rm -rf git &&
>>         git init git &&
>>         cd git &&
>>         while true
>>         do
>>             head -c 10 /dev/urandom | hexdump >out &&
>>             git add out &&
>>             git commit -m"out"
>>         done
>>     )
>>
>>     (
>>         cd /tmp &&
>>         rm -rf git-clone &&
>>         git clone file:///tmp/git git-clone &&
>>         cd git-clone &&
>>         while git pull
>>         do
>>             date
>>         done
>>     )
>>
>>     (
>>         cd /tmp/git-clone &&
>>         while git reflog expire --all
>>         do
>>             date
>>         done
>>     )
>>
>> Before this change the "reflog expire" would fail really quickly with
>> a "but expected" error. After this change both the "pull" and "reflog
>> expire" will run for a while, but eventually fail because I get
>> unlucky with core.filesRefLockTimeout (the "reflog expire" is in a
>> really tight loop). That can be resolved by being more generous with
>> higher values of core.filesRefLockTimeout than the 100ms default.
>>
>> As noted in the commentary being added here we also need to handle the
>> case of references being racily deleted, that can be tested by adding
>> this to the above:
>>
>>     (
>>         cd /tmp/git-clone &&
>>         while git branch topic master && git branch -D topic
>>         do
>>             date
>>         done
>>     )
>>
>> We could change lock_ref_oid_basic() to always pass down
>> RESOLVE_REF_READING to refs_resolve_ref_unsafe() and then
>> files_reflog_expire() to detect the "is it deleted?" state. But let's
>> not bother, in the event of such a race we're going to redundantly
>> create a lock on the deleted reference, and shortly afterwards handle
>> that case and others with the refs_reflog_exists() check.
>>
>> 1. https://public-inbox.org/git/54857871.5090805@alum.mit.edu/
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  refs/files-backend.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index ef053f716c3..c7ed1792b3b 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -3036,8 +3036,14 @@ static int files_reflog_expire(struct ref_store *ref_store,
>>  	 * The reflog file is locked by holding the lock on the
>>  	 * reference itself, plus we might need to update the
>>  	 * reference if --updateref was specified:
>> +	 *
>> +	 * We don't pass down the oid here because we'd like to be
>> +	 * tolerant to the OID of the ref having changed, and to
>> +	 * gracefully handle the case where it's been deleted (see oid
>> +	 * -> mustexist -> RESOLVE_REF_READING in
>> +	 * lock_ref_oid_basic()) ...
>>  	 */
>> -	lock = lock_ref_oid_basic(refs, refname, oid,
>> +	lock = lock_ref_oid_basic(refs, refname, NULL,
>>  				  NULL, NULL, REF_NO_DEREF,
>>  				  &type, &err);
>
> This seems totally reasonable. But then later, where `oid` is passed to
> `(*prepare_fn)()`, I think you must pass `&(lock->old_oid)` instead,
> since we no longer have a guarantee that `oid` reflects the correct
> state of the reference. And after that, there is no need for this
> function to take an `oid` parameter at all (which also makes sense from
> an abstract point of view). Which means that the signatures of
> `refs_reflog_expire()`, `reflog_expire()`, `packed_reflog_expire()`, and
> `reflog_expire_fn` can also be changed, along with callers.
>
> I haven't had time yet to inspect those callers to see whether they
> might actually care that the `oid` that they used to pass to
> `reflog_expire()` isn't necessarily the one that gets passed back to
> their callbacks, but following the trail that I just outlined should
> make it possible to determine that.
>
>>  	if (!lock) {
>> @@ -3045,6 +3051,13 @@ static int files_reflog_expire(struct ref_store *ref_store,
>>  		strbuf_release(&err);
>>  		return -1;
>>  	}
>> +	/*
>> +	 * When refs are deleted their reflog is deleted before the
>> +	 * loose ref is deleted. This catches that case, i.e. when
>> +	 * racing against a ref deletion lock_ref_oid_basic() will
>> +	 * have acquired a lock on the now-deleted ref, but here's
>> +	 * where we find out it has no reflog anymore.
>> +	 */
>>  	if (!refs_reflog_exists(ref_store, refname)) {
>>  		unlock_ref(lock);
>>  		return 0;
>>
>
> Cheers,
> Michael

  parent reply	other threads:[~2019-03-21 14:10 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-12 23:28 BUG: Race condition due to reflog expiry in "gc" Ævar Arnfjörð Bjarmason
2019-03-13  1:43 ` Junio C Hamano
2019-03-13 10:28   ` Ævar Arnfjörð Bjarmason
2019-03-13 16:02     ` Jeff King
2019-03-13 16:22       ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54         ` [PATCH 0/5] gc: minor code cleanup + contention fixes Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 0/7] " Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 0/8] " Ævar Arnfjörð Bjarmason
2019-03-18  6:13               ` Junio C Hamano
2019-03-28 16:14               ` [PATCH v4 0/7] gc: tests and handle reflog expire config Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 5/7] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 6/7] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 7/7] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 1/8] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 2/8] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 3/8] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 4/8] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 5/8] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 6/8] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-19  6:20               ` Jeff King
2019-03-15 15:59             ` [PATCH v3 7/8] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 8/8] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
     [not found]               ` <b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu>
2019-03-21 14:10                 ` Ævar Arnfjörð Bjarmason [this message]
2019-03-19  6:27             ` [PATCH v2 0/7] gc: minor code cleanup + contention fixes Jeff King
2019-03-14 12:34           ` [PATCH v2 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15  9:51             ` Duy Nguyen
2019-03-15 10:42               ` Ævar Arnfjörð Bjarmason
2019-03-15 15:49                 ` Johannes Schindelin
2019-03-14 12:34           ` [PATCH v2 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 5/7] reflog: exit early if there's no work to do Ævar Arnfjörð Bjarmason
2019-03-15 10:01             ` Duy Nguyen
2019-03-15 15:51               ` Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 6/7] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-15 10:05             ` Duy Nguyen
2019-03-15 10:24               ` Ævar Arnfjörð Bjarmason
2019-03-15 10:32                 ` Duy Nguyen
2019-03-15 15:51                 ` Johannes Schindelin
2019-03-18  6:04                 ` Junio C Hamano
2019-03-14 12:34           ` [PATCH v2 7/7] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-15 11:10             ` Duy Nguyen
2019-03-15 15:52               ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54         ` [PATCH 1/5] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14  0:25           ` Jeff King
2019-03-13 23:54         ` [PATCH 2/5] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-14  0:26           ` Jeff King
2019-03-13 23:54         ` [PATCH 3/5] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14  0:30           ` Jeff King
2019-03-13 23:54         ` [PATCH 4/5] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-14  0:40           ` Jeff King
2019-03-14  4:51             ` Junio C Hamano
2019-03-14 19:26               ` Jeff King
2019-03-13 23:54         ` [PATCH 5/5] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-14  0:44           ` Jeff King
2019-03-14  0:25         ` BUG: Race condition due to reflog expiry in "gc" Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pnqkco8v.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=stefanbeller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.