public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: linux-bcachefs@vger.kernel.org
Subject: [PATCH 3/5] bcachefs: refactor journal stuck checking into standalone helper
Date: Tue, 21 Mar 2023 09:20:12 -0400	[thread overview]
Message-ID: <20230321132014.1438249-4-bfoster@redhat.com> (raw)
In-Reply-To: <20230321132014.1438249-1-bfoster@redhat.com>

bcachefs checks for journal stuck conditions both in the journal
space calculation code and the journal reservation slow path. The
logic in both places is rather tricky and can result in
non-deterministic failure characteristics and debug output.

In preparation to condense journal stuck handling to a single place,
refactor the __journal_res_get() logic into a standalone helper.
Since multiple callers into the reservation code can result in
duplicate reports, use the ->err_seq field as a serialization
mechanism for the debug dump. Finally, add some comments to help
explain the logic and hopefully facilitate further improvements in
the future.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/bcachefs/journal.c | 85 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 63 insertions(+), 22 deletions(-)

diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c
index f521f733e180..3f0e6d71aa32 100644
--- a/fs/bcachefs/journal.c
+++ b/fs/bcachefs/journal.c
@@ -76,6 +76,67 @@ static void journal_pin_list_init(struct journal_entry_pin_list *p, int count)
 	p->devs.nr = 0;
 }
 
+/*
+ * Detect stuck journal conditions and trigger shutdown. Technically the journal
+ * can end up stuck for a variety of reasons, such as a blocked I/O, journal
+ * reservation lockup, etc. Since this is a fatal error with potentially
+ * unpredictable characteristics, we want to be fairly conservative before we
+ * decide to shut things down.
+ *
+ * Consider the journal stuck when it appears full with no ability to commit
+ * btree transactions, to discard journal buckets, nor acquire priority
+ * (reserved watermark) reservation.
+ */
+static inline bool
+journal_error_check_stuck(struct journal *j, int error, unsigned flags)
+{
+	struct bch_fs *c = container_of(j, struct bch_fs, journal);
+	bool stuck = false;
+	struct printbuf buf = PRINTBUF;
+
+	if (!(error == JOURNAL_ERR_journal_full ||
+	      error == JOURNAL_ERR_journal_pin_full) ||
+	    nr_unwritten_journal_entries(j) ||
+	    (flags & JOURNAL_WATERMARK_MASK) != JOURNAL_WATERMARK_reserved)
+		return stuck;
+
+	spin_lock(&j->lock);
+
+	if (j->can_discard) {
+		spin_unlock(&j->lock);
+		return stuck;
+	}
+
+	stuck = true;
+
+	/*
+	 * The journal shutdown path will set ->err_seq, but do it here first to
+	 * serialize against concurrent failures and avoid duplicate error
+	 * reports.
+	 */
+	if (j->err_seq) {
+		spin_unlock(&j->lock);
+		return stuck;
+	}
+	j->err_seq = journal_cur_seq(j);
+	spin_unlock(&j->lock);
+
+	bch_err(c, "Journal stuck! Hava a pre-reservation but journal full (error %s)",
+		bch2_journal_errors[error]);
+	bch2_journal_debug_to_text(&buf, j);
+	bch_err(c, "%s", buf.buf);
+
+	printbuf_reset(&buf);
+	bch2_journal_pins_to_text(&buf, j);
+	bch_err(c, "Journal pins:\n%s", buf.buf);
+	printbuf_exit(&buf);
+
+	bch2_fatal_error(c);
+	dump_stack();
+
+	return stuck;
+}
+
 /* journal entry close/open: */
 
 void __bch2_journal_buf_put(struct journal *j)
@@ -417,28 +478,8 @@ static int __journal_res_get(struct journal *j, struct journal_res *res,
 
 	if (!ret)
 		goto retry;
-
-	if ((ret == JOURNAL_ERR_journal_full ||
-	     ret == JOURNAL_ERR_journal_pin_full) &&
-	    !can_discard &&
-	    !nr_unwritten_journal_entries(j) &&
-	    (flags & JOURNAL_WATERMARK_MASK) == JOURNAL_WATERMARK_reserved) {
-		struct printbuf buf = PRINTBUF;
-
-		bch_err(c, "Journal stuck! Hava a pre-reservation but journal full (ret %s)",
-			bch2_journal_errors[ret]);
-
-		bch2_journal_debug_to_text(&buf, j);
-		bch_err(c, "%s", buf.buf);
-
-		printbuf_reset(&buf);
-		bch2_journal_pins_to_text(&buf, j);
-		bch_err(c, "Journal pins:\n%s", buf.buf);
-
-		printbuf_exit(&buf);
-		bch2_fatal_error(c);
-		dump_stack();
-	}
+	if (journal_error_check_stuck(j, ret, flags))
+		ret = -BCH_ERR_journal_res_get_blocked;
 
 	/*
 	 * Journal is full - can't rely on reclaim from work item due to
-- 
2.39.2


  parent reply	other threads:[~2023-03-21 13:19 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-21 13:20 [PATCH 0/5] bcachefs: journal stall fixes Brian Foster
2023-03-21 13:20 ` [PATCH 1/5] bcachefs: more aggressive fast path write buffer key flushing Brian Foster
2023-03-21 13:40   ` Brian Foster
2023-03-21 13:20 ` [PATCH 2/5] bcachefs: gracefully unwind journal res slowpath on shutdown Brian Foster
2023-03-21 13:20 ` Brian Foster [this message]
2023-03-21 13:20 ` [PATCH 4/5] bcachefs: drop unnecessary journal stuck check from space calculation Brian Foster
2023-03-21 13:20 ` [PATCH 5/5] RFC: bcachefs: use a timeout for the journal stuck condition Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230321132014.1438249-4-bfoster@redhat.com \
    --to=bfoster@redhat.com \
    --cc=linux-bcachefs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox