[PATCH v2 1/1] jbd2: gracefully abort on checkpointing state corruptions

* [PATCH v2 1/1] jbd2: gracefully abort on checkpointing state corruptions
@ 2026-03-11  4:15 Milos Nikic
  2026-03-17 12:28 ` Jan Kara
  2026-03-28  5:31 ` Theodore Ts'o
  0 siblings, 2 replies; 3+ messages in thread
From: Milos Nikic @ 2026-03-11  4:15 UTC (permalink / raw)
  To: jack
  Cc: tytso, linux-ext4, linux-kernel, Milos Nikic, Andreas Dilger,
	Zhang Yi, Baokun Li

This patch converts two internal state machine invariant checks in
checkpoint.c from assertions into recoverable errors. Both checks sit
in functions that already return integer error codes.

- In jbd2_cleanup_journal_tail(): A blocknr of 0 indicates a severely
corrupted journal superblock. Replaced the J_ASSERT with a WARN_ON_ONCE
and a graceful journal abort, returning -EFSCORRUPTED.

- In jbd2_log_do_checkpoint(): Replaced the J_ASSERT_BH that checked for
an unexpected buffer_jwrite state with a WARN_ON_ONCE. If the warning
triggers, we drop the reference just taken by get_bh(), call
__flush_batch() to release any buffers already queued in the
j_chkpt_bhs array so that their references are not leaked, then abort
the journal and return -EFSCORRUPTED.

Signed-off-by: Milos Nikic <nikic.milos@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun@linux.alibaba.com>
---
Changes in v2:

    Replaced the -EUCLEAN error code with -EFSCORRUPTED to better align
    with ext4/jbd2 semantics for on-disk metadata inconsistencies (per
    Baokun's review).

    Reordered the error path in jbd2_log_do_checkpoint() so that
    jbd2_journal_abort() is called after __flush_batch(). This ensures
    that buffers already queued in the batch are flushed before the
    journal is aborted.

    Collected Reviewed-by tags from Andreas Dilger, Zhang Yi, and
    Baokun Li.

Changes in v1:

    Initial implementation converting J_ASSERTs in
    jbd2_cleanup_journal_tail() and jbd2_log_do_checkpoint() to
    WARN_ON_ONCE and graceful journal aborts.

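The ordering of the jbd2_log_do_checkpoint() error path can be
illustrated with a small userspace model. This is only a sketch, not
kernel code: struct mock_bh, struct mock_journal, mock_flush_batch(),
mock_abort() and mock_do_checkpoint() are hypothetical stand-ins for
the buffer_head refcount, j_list_lock, __flush_batch() and
jbd2_journal_abort(), and BATCH_MAX is an arbitrary small value. It
assumes EFSCORRUPTED is an alias for EUCLEAN, as in the kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#ifndef EUCLEAN
#define EUCLEAN 117		/* fallback for non-Linux hosts */
#endif
#define EFSCORRUPTED EUCLEAN	/* assumed alias, mirroring the kernel */
#define BATCH_MAX 4		/* hypothetical batch size for the model */

struct mock_bh {
	int refcount;		/* models the buffer reference count */
	int jwrite;		/* models buffer_jwrite(bh) */
};

struct mock_journal {
	int locked;		/* models holding j_list_lock */
	int aborted_errno;	/* 0 while the journal is running */
	int flushed;		/* buffers released by mock_flush_batch() */
	struct mock_bh *chkpt_bhs[BATCH_MAX];
};

/* Release every queued buffer and reset the batch counter. */
static void mock_flush_batch(struct mock_journal *j, int *batch_count)
{
	for (int i = 0; i < *batch_count; i++) {
		j->chkpt_bhs[i]->refcount--;
		j->chkpt_bhs[i] = NULL;
		j->flushed++;
	}
	*batch_count = 0;
}

/* Record the first abort reason only. */
static void mock_abort(struct mock_journal *j, int err)
{
	if (!j->aborted_errno)
		j->aborted_errno = err;
}

static int mock_do_checkpoint(struct mock_journal *j,
			      struct mock_bh **bhs, int n)
{
	int batch_count = 0;

	j->locked = 1;
	for (int i = 0; i < n; i++) {
		struct mock_bh *bh = bhs[i];

		bh->refcount++;			/* like get_bh() */
		if (bh->jwrite) {
			fprintf(stderr, "WARN: unexpected jwrite buffer\n");
			bh->refcount--;		/* drop the ref we just took */
			j->locked = 0;		/* unlock before flushing */
			if (batch_count)
				mock_flush_batch(j, &batch_count);
			mock_abort(j, -EFSCORRUPTED);	/* abort last */
			return -EFSCORRUPTED;
		}
		j->chkpt_bhs[batch_count++] = bh;
		if (batch_count == BATCH_MAX) {
			j->locked = 0;
			mock_flush_batch(j, &batch_count);
			j->locked = 1;
		}
	}
	j->locked = 0;
	if (batch_count)
		mock_flush_batch(j, &batch_count);
	return 0;
}

/* Two good buffers are queued, then a bad one trips the check. */
int demo_error_path(void)
{
	struct mock_bh a = {1, 0}, b = {1, 0}, bad = {1, 1};
	struct mock_bh *list[] = { &a, &b, &bad };
	struct mock_journal j = {0};
	int ret = mock_do_checkpoint(&j, list, 3);

	assert(j.flushed == 2 && j.locked == 0);
	assert(a.refcount == 1 && b.refcount == 1 && bad.refcount == 1);
	assert(j.aborted_errno == -EFSCORRUPTED);
	return ret;
}
```

Calling demo_error_path() from a main() exercises the failure case: the
two queued buffers are released, the offending buffer's extra reference
is dropped, and the abort is recorded only after the batch is flushed.
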
 fs/jbd2/checkpoint.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index de89c5bef607..1508e2f54462 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -267,7 +267,15 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 			 */
 			BUFFER_TRACE(bh, "queue");
 			get_bh(bh);
-			J_ASSERT_BH(bh, !buffer_jwrite(bh));
+			if (WARN_ON_ONCE(buffer_jwrite(bh))) {
+				put_bh(bh); /* drop the ref we just took */
+				spin_unlock(&journal->j_list_lock);
+				/* Clean up any previously batched buffers */
+				if (batch_count)
+					__flush_batch(journal, &batch_count);
+				jbd2_journal_abort(journal, -EFSCORRUPTED);
+				return -EFSCORRUPTED;
+			}
 			journal->j_chkpt_bhs[batch_count++] = bh;
 			transaction->t_chp_stats.cs_written++;
 			transaction->t_checkpoint_list = jh->b_cpnext;
@@ -325,7 +333,10 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
 
 	if (!jbd2_journal_get_log_tail(journal, &first_tid, &blocknr))
 		return 1;
-	J_ASSERT(blocknr != 0);
+	if (WARN_ON_ONCE(blocknr == 0)) {
+		jbd2_journal_abort(journal, -EFSCORRUPTED);
+		return -EFSCORRUPTED;
+	}
 
 	/*
 	 * We need to make sure that any blocks that were recently written out
-- 
2.53.0

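The jbd2_cleanup_journal_tail() hunk follows a "warn once, abort,
return an error" pattern that can be modelled in userspace. The sketch
below is an assumption-laden illustration rather than kernel code:
struct mock_journal, warn_once() and mock_cleanup_tail() are
hypothetical names, and warn_once() only approximates WARN_ON_ONCE()
(the real macro keeps its "already warned" state per call site, while
this helper shares one flag). The return convention shown (1 for
nothing to do, 0 for success, negative on error) is inferred from the
diff and should be checked against the function's own comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#ifndef EUCLEAN
#define EUCLEAN 117		/* fallback for non-Linux hosts */
#endif
#define EFSCORRUPTED EUCLEAN	/* assumed alias, mirroring the kernel */

struct mock_journal {
	bool have_tail;			/* is there a new log tail to record? */
	unsigned long tail_block;	/* block number of that tail */
	int aborted_errno;		/* 0 while the journal is running */
	int tail_updates;		/* successful tail updates so far */
};

/* Print the message the first time cond is true; always return cond. */
static bool warn_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARN: %s\n", msg);
	}
	return cond;
}

static int mock_cleanup_tail(struct mock_journal *j)
{
	if (!j->have_tail)
		return 1;		/* nothing to clean up */

	/* Block 0 can never be a valid tail: report it, do not crash. */
	if (warn_once(j->tail_block == 0, "journal tail at block 0")) {
		if (!j->aborted_errno)
			j->aborted_errno = -EFSCORRUPTED;
		return -EFSCORRUPTED;
	}

	j->tail_updates++;		/* stands in for the real tail update */
	return 0;
}

/* A caller can tell the three outcomes apart by the sign of the result. */
int demo_cleanup_tail(void)
{
	struct mock_journal bad = { .have_tail = true, .tail_block = 0 };
	int ret = mock_cleanup_tail(&bad);

	assert(bad.tail_updates == 0);
	assert(bad.aborted_errno == -EFSCORRUPTED);
	return ret;
}
```

Unlike an assertion, the check keeps returning the error on every later
call while printing the warning only once, so a corrupted journal
degrades into an aborted journal instead of a kernel panic.
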

* Re: [PATCH v2 1/1] jbd2: gracefully abort on checkpointing state corruptions
  2026-03-11  4:15 [PATCH v2 1/1] jbd2: gracefully abort on checkpointing state corruptions Milos Nikic
@ 2026-03-17 12:28 ` Jan Kara
  2026-03-28  5:31 ` Theodore Ts'o
  1 sibling, 0 replies; 3+ messages in thread
From: Jan Kara @ 2026-03-17 12:28 UTC (permalink / raw)
  To: Milos Nikic
  Cc: jack, tytso, linux-ext4, linux-kernel, Andreas Dilger, Zhang Yi,
	Baokun Li

On Tue 10-03-26 21:15:48, Milos Nikic wrote:
> This patch targets two internal state machine invariants in checkpoint.c
> residing inside functions that natively return integer error codes.
> 
> - In jbd2_cleanup_journal_tail(): A blocknr of 0 indicates a severely
> corrupted journal superblock. Replaced the J_ASSERT with a WARN_ON_ONCE
> and a graceful journal abort, returning -EFSCORRUPTED.
> 
> - In jbd2_log_do_checkpoint(): Replaced the J_ASSERT_BH checking for
> an unexpected buffer_jwrite state. If the warning triggers, we
> explicitly drop the just-taken get_bh() reference and call __flush_batch()
> to safely clean up any previously queued buffers in the j_chkpt_bhs array,
> preventing a memory leak before returning -EFSCORRUPTED.
> 
> Signed-off-by: Milos Nikic <nikic.milos@gmail.com>
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v2 1/1] jbd2: gracefully abort on checkpointing state corruptions
  2026-03-11  4:15 [PATCH v2 1/1] jbd2: gracefully abort on checkpointing state corruptions Milos Nikic
  2026-03-17 12:28 ` Jan Kara
@ 2026-03-28  5:31 ` Theodore Ts'o
  1 sibling, 0 replies; 3+ messages in thread
From: Theodore Ts'o @ 2026-03-28  5:31 UTC (permalink / raw)
  To: jack, Milos Nikic
  Cc: Theodore Ts'o, linux-ext4, linux-kernel, Andreas Dilger,
	Zhang Yi, Baokun Li


On Tue, 10 Mar 2026 21:15:48 -0700, Milos Nikic wrote:
> This patch targets two internal state machine invariants in checkpoint.c
> residing inside functions that natively return integer error codes.
> 
> - In jbd2_cleanup_journal_tail(): A blocknr of 0 indicates a severely
> corrupted journal superblock. Replaced the J_ASSERT with a WARN_ON_ONCE
> and a graceful journal abort, returning -EFSCORRUPTED.
> 
> [...]

Applied, thanks!

[1/1] jbd2: gracefully abort on checkpointing state corruptions
      commit: bac3190a8e79beff6ed221975e0c9b1b5f2a21da

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>
