From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
"Jeff King" <peff@peff.net>,
"Michael Haggerty" <mhagger@alum.mit.edu>,
"Stefan Beller" <stefanbeller@gmail.com>,
"Jonathan Nieder" <jrnieder@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v3 0/8] gc: minor code cleanup + contention fixes
Date: Fri, 15 Mar 2019 16:59:51 +0100 [thread overview]
Message-ID: <20190315155959.12390-1-avarab@gmail.com> (raw)
In-Reply-To: <20190314123439.4347-1-avarab@gmail.com>
Changes since v2:
* The "let's detect that we don't need work in reflog expire" patch
is gone, as noted in a new set of tests/commit messages that was
tricker than expected, and not worth it.
* Minor rewording fixes to commit messages.
* Further paraphrased rationale in commit messages from replies to v2
feedback (thanks all!)
* The "no OID" and "no mustexist" case was confusingly conflated, now
we still do that in 8/8, but with a commit message & commit
explaining why and how that works.
Ævar Arnfjörð Bjarmason (8):
gc: remove redundant check for gc_auto_threshold
gc: convert to using the_hash_algo
gc: refactor a "call me once" pattern
reflog tests: make use of "test_config" idiom
reflog tests: test for the "points nowhere" warning
reflog tests: assert lack of early exit with expiry="never"
gc: handle & check gc.reflogExpire config
reflog expire: don't assert the OID when locking refs
builtin/gc.c | 37 +++++++++++++++++++++++++++++--------
refs/files-backend.c | 15 ++++++++++++++-
t/t1410-reflog.sh | 25 +++++++++++++++++--------
t/t6500-gc.sh | 19 +++++++++++++++++++
4 files changed, 79 insertions(+), 17 deletions(-)
Range-diff:
1: f11699d8e77 = 1: 81694c82130 gc: remove redundant check for gc_auto_threshold
2: e18433f9c6b ! 2: 4bdcf1d0be2 gc: convert to using the_hash_algo
@@ -17,20 +17,33 @@
but objects for that hash will share a directory storage with the
other hash.
- Thus we could theoretically have 1k SHA-1 loose objects, and say 1
- million SHA-256 objects, and not notice because we're currently using
- SHA-1.
+ Thus we could theoretically have e.g. 1k SHA-1 loose objects, and 1
+ million SHA-256 objects. Then not notice that we need to pack them
+ because we're currently using SHA-1, even though our FS may be
+ straining under the stress of such humongous directories.
So assuming that "gc" eventually learns to pack up both SHA-1 and
- SHA-256 objects regardless of what the current the_hash_alg is perhaps
- this check should be changed to consider all files in objects/17/
- matching [0-9a-f] 38 or 62 characters in length (i.e. both SHA-1 and
- SHA-256).
+ SHA-256 objects regardless of what the current the_hash_algo is,
+ perhaps this check should be changed to consider all files in
+ objects/17/ matching [0-9a-f] 38 or 62 characters in length (i.e. both
+ SHA-1 and SHA-256).
But none of that is something we need to worry about now, and
supporting both 38 and 62 characters depending on "the_hash_algo"
removes another case of SHA-1 hardcoding.
+ As noted in [1] I'm making no effort to somehow remove the hardcoding
+ for "2" as in "use the first two hexdigits for the directory
+ name". There's no indication that that'll ever change, and somehow
+ generalizing it here would be a drop in the ocean, so there's no point
+ in doing that. It also couldn't be done without coming up with some
+ generalized version of the magical "objects/17" directory. See [2] for
+ a discussion of that directory.
+
+ 1. https://public-inbox.org/git/874l84ber7.fsf@evledraar.gmail.com/
+
+ 2. https://public-inbox.org/git/87k1mta9x5.fsf@evledraar.gmail.com/
+
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
diff --git a/builtin/gc.c b/builtin/gc.c
3: 54e4bce91c3 = 3: 9444a1233af gc: refactor a "call me once" pattern
4: 82f87db1348 = 4: 60a06ae6185 reflog tests: make use of "test_config" idiom
-: ----------- > 5: 52838fdc449 reflog tests: test for the "points nowhere" warning
5: c79608dbbb3 ! 6: 6063429f108 reflog: exit early if there's no work to do
@@ -1,32 +1,25 @@
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
- reflog: exit early if there's no work to do
+ reflog tests: assert lack of early exit with expiry="never"
When gc.reflogExpire and gc.reflogExpireUnreachable are set to "never"
and --stale-fix isn't in effect (covered by the first part of the "if"
- statement being modified here) we can exit early without pointlessly
- looping over all the reflogs.
+ statement being modified here) we *could* exit early without
+ pointlessly looping over all the reflogs.
- Signed-off-by: Jeff King <peff@peff.net>
- Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
+ However, as an earlier change to add a test for the "points nowhere"
+ warning shows even in such a mode we might want to print out a
+ warning.
- diff --git a/builtin/reflog.c b/builtin/reflog.c
- --- a/builtin/reflog.c
- +++ b/builtin/reflog.c
-@@
- mark_reachable_objects(&cb.cmd.revs, 0, 0, NULL);
- if (flags & EXPIRE_REFLOGS_VERBOSE)
- putchar('\n');
-+ } else if (!cb.cmd.expire_total && !cb.cmd.expire_unreachable) {
-+ /*
-+ * If we're not expiring anything and not dropping
-+ * stale entries, there's no point in even opening the
-+ * reflogs, since we're guaranteed to do nothing.
-+ */
-+ return 0;
- }
-
- if (do_all) {
+ So while it's conceivable to implement this, I don't think it's worth
+ it. It's going to be too easy to inadvertently add some flag that'll
+ make the expiry happen anyway, and even with "never" we'd like to see
+ all the lines we're going to keep.
+
+ So let's assert that we're going to loop over all the references even
+ when this configuration is in effect.
+
+ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
--- a/t/t1410-reflog.sh
@@ -37,7 +30,7 @@
- git reflog expire --verbose --all &&
+ git reflog expire --verbose --all >output &&
-+ test_line_count = 0 output &&
++ test_line_count = 9 output &&
+
git reflog refs/heads/master >output &&
test_line_count = 4 output
6: c47dedab58d ! 7: 6693d1d84da gc: don't run "reflog expire" when keeping reflogs
@@ -1,22 +1,38 @@
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
- gc: don't run "reflog expire" when keeping reflogs
+ gc: handle & check gc.reflogExpire config
Don't redundantly run "git reflog expire --all" when gc.reflogExpire
- and gc.reflogExpireUnreachable are set to "never".
+ and gc.reflogExpireUnreachable are set to "never", and die immediately
+ if those configuration valuer are bad.
- An earlier change taught "git reflog expire" itself to exit early
- under this scenario, so in some sense this isn't strictly
- necessary. Reasons to also do it here:
+ As an earlier "assert lack of early exit" change to the tests for "git
+ reflog expire" shows, an early check of gc.reflogExpire{Unreachable,}
+ isn't wanted in general for "git reflog expire", but it makes sense
+ for "gc" because:
- 1) Similar to 8ab5aa4bd8 ("parseopt: handle malformed --expire
- arguments more nicely", 2018-04-21). We'll die early if the config
- variables are set to invalid values. We run "pack-refs" before
- "reflog expire", which can take a while, only to then die on an
- invalid gc.reflogExpire{Unreachable,} configuration.
+ 1) Similarly to 8ab5aa4bd8 ("parseopt: handle malformed --expire
+ arguments more nicely", 2018-04-21) we'll now die early if the
+ config variables are set to invalid values.
+
+ We run "pack-refs" before "reflog expire", which can take a while,
+ only to then die on an invalid gc.reflogExpire{Unreachable,}
+ configuration.
2) Not invoking the command at all means it won't show up in trace
- output, which makes what's going on more obvious.
+ output, which makes what's going on more obvious when the two are
+ set to "never".
+
+ 3) As a later change documents we lock the refs when looping over the
+ refs to expire, even in cases where we end up doing nothing due to
+ this config.
+
+ For the reasons noted in the earlier "assert lack of early exit"
+ change I don't think it's worth it to bend over backwards in "git
+ reflog expire" itself to carefully detect if we'll really do
+ nothing given the combination of all its possible options and skip
+ that locking, but that's easy to detect here in "gc" where we'll
+ only run "reflog expire" in a relatively simple mode.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
7: 55dd203a042 ! 8: e0814569aba reflog expire: don't assert the OID when locking refs
@@ -79,6 +79,25 @@
really tight loop). That can be resolved by being more generous with
higher values of core.filesRefLockTimeout than the 100ms default.
+ As noted in the commentary being added here we also need to handle the
+ case of references being racily deleted, that can be tested by adding
+ this to the above:
+
+ (
+ cd /tmp/git-clone &&
+ while git branch topic master && git branch -D topic
+ do
+ date
+ done
+ )
+
+ We could change lock_ref_oid_basic() to always pass down
+ RESOLVE_REF_READING to refs_resolve_ref_unsafe() and then
+ files_reflog_expire() to detect the "is it deleted?" state. But let's
+ not bother, in the event of such a race we're going to redundantly
+ create a lock on the deleted reference, and shortly afterwards handle
+ that case and others with the refs_reflog_exists() check.
+
1. https://public-inbox.org/git/54857871.5090805@alum.mit.edu/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
@@ -87,11 +106,32 @@
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@
+ * The reflog file is locked by holding the lock on the
* reference itself, plus we might need to update the
* reference if --updateref was specified:
++ *
++ * We don't pass down the oid here because we'd like to be
++ * tolerant to the OID of the ref having changed, and to
++ * gracefully handle the case where it's been deleted (see oid
++ * -> mustexist -> RESOLVE_REF_READING in
++ * lock_ref_oid_basic()) ...
*/
- lock = lock_ref_oid_basic(refs, refname, oid,
-+ lock = lock_ref_oid_basic(refs, refname, NULL /* NOT oid! */,
++ lock = lock_ref_oid_basic(refs, refname, NULL,
NULL, NULL, REF_NO_DEREF,
&type, &err);
if (!lock) {
+@@
+ strbuf_release(&err);
+ return -1;
+ }
++ /*
++ * When refs are deleted their reflog is deleted before the
++ * loose ref is deleted. This catches that case, i.e. when
++ * racing against a ref deletion lock_ref_oid_basic() will
++ * have acquired a lock on the now-deleted ref, but here's
++ * where we find out it has no reflog anymore.
++ */
+ if (!refs_reflog_exists(ref_store, refname)) {
+ unlock_ref(lock);
+ return 0;
--
2.21.0.360.g471c308f928
next prev parent reply other threads:[~2019-03-15 16:00 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-12 23:28 BUG: Race condition due to reflog expiry in "gc" Ævar Arnfjörð Bjarmason
2019-03-13 1:43 ` Junio C Hamano
2019-03-13 10:28 ` Ævar Arnfjörð Bjarmason
2019-03-13 16:02 ` Jeff King
2019-03-13 16:22 ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54 ` [PATCH 0/5] gc: minor code cleanup + contention fixes Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 0/7] " Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` Ævar Arnfjörð Bjarmason [this message]
2019-03-18 6:13 ` [PATCH v3 0/8] " Junio C Hamano
2019-03-28 16:14 ` [PATCH v4 0/7] gc: tests and handle reflog expire config Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 5/7] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 6/7] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 7/7] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 1/8] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 2/8] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 3/8] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 4/8] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 5/8] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 6/8] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-19 6:20 ` Jeff King
2019-03-15 15:59 ` [PATCH v3 7/8] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 8/8] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
[not found] ` <b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu>
2019-03-21 14:10 ` Ævar Arnfjörð Bjarmason
2019-03-19 6:27 ` [PATCH v2 0/7] gc: minor code cleanup + contention fixes Jeff King
2019-03-14 12:34 ` [PATCH v2 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 9:51 ` Duy Nguyen
2019-03-15 10:42 ` Ævar Arnfjörð Bjarmason
2019-03-15 15:49 ` Johannes Schindelin
2019-03-14 12:34 ` [PATCH v2 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 5/7] reflog: exit early if there's no work to do Ævar Arnfjörð Bjarmason
2019-03-15 10:01 ` Duy Nguyen
2019-03-15 15:51 ` Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 6/7] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-15 10:05 ` Duy Nguyen
2019-03-15 10:24 ` Ævar Arnfjörð Bjarmason
2019-03-15 10:32 ` Duy Nguyen
2019-03-15 15:51 ` Johannes Schindelin
2019-03-18 6:04 ` Junio C Hamano
2019-03-14 12:34 ` [PATCH v2 7/7] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-15 11:10 ` Duy Nguyen
2019-03-15 15:52 ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54 ` [PATCH 1/5] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 0:25 ` Jeff King
2019-03-13 23:54 ` [PATCH 2/5] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-14 0:26 ` Jeff King
2019-03-13 23:54 ` [PATCH 3/5] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 0:30 ` Jeff King
2019-03-13 23:54 ` [PATCH 4/5] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-14 0:40 ` Jeff King
2019-03-14 4:51 ` Junio C Hamano
2019-03-14 19:26 ` Jeff King
2019-03-13 23:54 ` [PATCH 5/5] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-14 0:44 ` Jeff King
2019-03-14 0:25 ` BUG: Race condition due to reflog expiry in "gc" Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190315155959.12390-1-avarab@gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jrnieder@gmail.com \
--cc=mhagger@alum.mit.edu \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
--cc=stefanbeller@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).