linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] jbd2: avoid mount failed when commit block is partial submitted
@ 2024-04-25  6:45 Ye Bin
  2024-04-29 19:14 ` Jan Kara
  2024-06-20  2:50 ` Theodore Ts'o
  0 siblings, 2 replies; 4+ messages in thread
From: Ye Bin @ 2024-04-25  6:45 UTC (permalink / raw)
  To: tytso, adilger.kernel, linux-ext4; +Cc: jack, Ye Bin

From: Ye Bin <yebin10@huawei.com>

We encountered a problem that the file system could not be mounted in
the power-off scenario. The analysis of the file system mirror shows that
only part of the data is written to the last commit block.
The valid data of the commit block is concentrated in the first sector.
However, the data of the entire block is involved in the checksum calculation.
For different hardware, the minimum atomic unit may be different.
If the checksum of a committed block is incorrect, clear the data except the
'commit_header' and then calculate the checksum. If the checkusm is correct,
it is considered that the block is partially committed. However, if there are
valid description/revoke blocks, it is considered that the data is abnormal
and the log replay is stopped.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/jbd2/recovery.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 1f7664984d6e..594bf02a709f 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -443,6 +443,27 @@ static int jbd2_commit_block_csum_verify(journal_t *j, void *buf)
 	return provided == cpu_to_be32(calculated);
 }
 
+static bool jbd2_commit_block_csum_verify_partial(journal_t *j, void *buf)
+{
+	struct commit_header *h;
+	__be32 provided;
+	__u32 calculated;
+	void *tmpbuf;
+
+	tmpbuf = kzalloc(j->j_blocksize, GFP_KERNEL);
+	if (!tmpbuf)
+		return false;
+
+	memcpy(tmpbuf, buf, sizeof(struct commit_header));
+	h = tmpbuf;
+	provided = h->h_chksum[0];
+	h->h_chksum[0] = 0;
+	calculated = jbd2_chksum(j, j->j_csum_seed, tmpbuf, j->j_blocksize);
+	kfree(tmpbuf);
+
+	return provided == cpu_to_be32(calculated);
+}
+
 static int jbd2_block_tag_csum_verify(journal_t *j, journal_block_tag_t *tag,
 				      journal_block_tag3_t *tag3,
 				      void *buf, __u32 sequence)
@@ -479,6 +500,7 @@ static int do_one_pass(journal_t *journal,
 	int			descr_csum_size = 0;
 	int			block_error = 0;
 	bool			need_check_commit_time = false;
+	bool                    has_partial_commit = false;
 	__u64			last_trans_commit_time = 0, commit_time;
 
 	/*
@@ -590,6 +612,14 @@ static int do_one_pass(journal_t *journal,
 					next_log_block);
 			}
 
+			if (pass == PASS_SCAN && has_partial_commit) {
+				pr_err("JBD2: Detect validate descriptor block %lu after incomplete commit block\n",
+				       next_log_block);
+				err = -EFSBADCRC;
+				brelse(bh);
+				goto failed;
+			}
+
 			/* If it is a valid descriptor block, replay it
 			 * in pass REPLAY; if journal_checksums enabled, then
 			 * calculate checksums in PASS_SCAN, otherwise,
@@ -810,6 +840,14 @@ static int do_one_pass(journal_t *journal,
 			if (pass == PASS_SCAN &&
 			    !jbd2_commit_block_csum_verify(journal,
 							   bh->b_data)) {
+				if (jbd2_commit_block_csum_verify_partial(
+								  journal,
+								  bh->b_data)) {
+					pr_notice("JBD2: Find incomplete commit block in transaction %u block %lu\n",
+						  next_commit_ID, next_log_block);
+					has_partial_commit = true;
+					goto chksum_ok;
+				}
 			chksum_error:
 				if (commit_time < last_trans_commit_time)
 					goto ignore_crc_mismatch;
@@ -824,6 +862,7 @@ static int do_one_pass(journal_t *journal,
 				}
 			}
 			if (pass == PASS_SCAN) {
+			chksum_ok:
 				last_trans_commit_time = commit_time;
 				head_block = next_log_block;
 			}
@@ -843,6 +882,15 @@ static int do_one_pass(journal_t *journal,
 					  next_log_block);
 				need_check_commit_time = true;
 			}
+
+			if (pass == PASS_SCAN && has_partial_commit) {
+				pr_err("JBD2: Detect validate revoke block %lu after incomplete commit block\n",
+				       next_log_block);
+				err = -EFSBADCRC;
+				brelse(bh);
+				goto failed;
+			}
+
 			/* If we aren't in the REVOKE pass, then we can
 			 * just skip over this block. */
 			if (pass != PASS_REVOKE) {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] jbd2: avoid mount failed when commit block is partial submitted
  2024-04-25  6:45 [PATCH v3] jbd2: avoid mount failed when commit block is partial submitted Ye Bin
@ 2024-04-29 19:14 ` Jan Kara
  2024-06-20  2:50 ` Theodore Ts'o
  1 sibling, 0 replies; 4+ messages in thread
From: Jan Kara @ 2024-04-29 19:14 UTC (permalink / raw)
  To: Ye Bin; +Cc: tytso, adilger.kernel, linux-ext4, jack, Ye Bin

On Thu 25-04-24 14:45:15, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
> 
> We encountered a problem that the file system could not be mounted in
> the power-off scenario. The analysis of the file system mirror shows that
> only part of the data is written to the last commit block.
> The valid data of the commit block is concentrated in the first sector.
> However, the data of the entire block is involved in the checksum calculation.
> For different hardware, the minimum atomic unit may be different.
> If the checksum of a committed block is incorrect, clear the data except the
> 'commit_header' and then calculate the checksum. If the checkusm is correct,
> it is considered that the block is partially committed. However, if there are
> valid description/revoke blocks, it is considered that the data is abnormal
> and the log replay is stopped.
> 
> Signed-off-by: Ye Bin <yebin10@huawei.com>

The patch as a fix looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

I'd just note that the maze of branches and gotos in do_one_pass() is
becoming very hard to follow so ideally we should come up with some
refactoring of the function so that it is easier to follow. But that's
definitely something for a separate patch.

								Honza

> ---
>  fs/jbd2/recovery.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index 1f7664984d6e..594bf02a709f 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -443,6 +443,27 @@ static int jbd2_commit_block_csum_verify(journal_t *j, void *buf)
>  	return provided == cpu_to_be32(calculated);
>  }
>  
> +static bool jbd2_commit_block_csum_verify_partial(journal_t *j, void *buf)
> +{
> +	struct commit_header *h;
> +	__be32 provided;
> +	__u32 calculated;
> +	void *tmpbuf;
> +
> +	tmpbuf = kzalloc(j->j_blocksize, GFP_KERNEL);
> +	if (!tmpbuf)
> +		return false;
> +
> +	memcpy(tmpbuf, buf, sizeof(struct commit_header));
> +	h = tmpbuf;
> +	provided = h->h_chksum[0];
> +	h->h_chksum[0] = 0;
> +	calculated = jbd2_chksum(j, j->j_csum_seed, tmpbuf, j->j_blocksize);
> +	kfree(tmpbuf);
> +
> +	return provided == cpu_to_be32(calculated);
> +}
> +
>  static int jbd2_block_tag_csum_verify(journal_t *j, journal_block_tag_t *tag,
>  				      journal_block_tag3_t *tag3,
>  				      void *buf, __u32 sequence)
> @@ -479,6 +500,7 @@ static int do_one_pass(journal_t *journal,
>  	int			descr_csum_size = 0;
>  	int			block_error = 0;
>  	bool			need_check_commit_time = false;
> +	bool                    has_partial_commit = false;
>  	__u64			last_trans_commit_time = 0, commit_time;
>  
>  	/*
> @@ -590,6 +612,14 @@ static int do_one_pass(journal_t *journal,
>  					next_log_block);
>  			}
>  
> +			if (pass == PASS_SCAN && has_partial_commit) {
> +				pr_err("JBD2: Detect validate descriptor block %lu after incomplete commit block\n",
> +				       next_log_block);
> +				err = -EFSBADCRC;
> +				brelse(bh);
> +				goto failed;
> +			}
> +
>  			/* If it is a valid descriptor block, replay it
>  			 * in pass REPLAY; if journal_checksums enabled, then
>  			 * calculate checksums in PASS_SCAN, otherwise,
> @@ -810,6 +840,14 @@ static int do_one_pass(journal_t *journal,
>  			if (pass == PASS_SCAN &&
>  			    !jbd2_commit_block_csum_verify(journal,
>  							   bh->b_data)) {
> +				if (jbd2_commit_block_csum_verify_partial(
> +								  journal,
> +								  bh->b_data)) {
> +					pr_notice("JBD2: Find incomplete commit block in transaction %u block %lu\n",
> +						  next_commit_ID, next_log_block);
> +					has_partial_commit = true;
> +					goto chksum_ok;
> +				}
>  			chksum_error:
>  				if (commit_time < last_trans_commit_time)
>  					goto ignore_crc_mismatch;
> @@ -824,6 +862,7 @@ static int do_one_pass(journal_t *journal,
>  				}
>  			}
>  			if (pass == PASS_SCAN) {
> +			chksum_ok:
>  				last_trans_commit_time = commit_time;
>  				head_block = next_log_block;
>  			}
> @@ -843,6 +882,15 @@ static int do_one_pass(journal_t *journal,
>  					  next_log_block);
>  				need_check_commit_time = true;
>  			}
> +
> +			if (pass == PASS_SCAN && has_partial_commit) {
> +				pr_err("JBD2: Detect validate revoke block %lu after incomplete commit block\n",
> +				       next_log_block);
> +				err = -EFSBADCRC;
> +				brelse(bh);
> +				goto failed;
> +			}
> +
>  			/* If we aren't in the REVOKE pass, then we can
>  			 * just skip over this block. */
>  			if (pass != PASS_REVOKE) {
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] jbd2: avoid mount failed when commit block is partial submitted
  2024-04-25  6:45 [PATCH v3] jbd2: avoid mount failed when commit block is partial submitted Ye Bin
  2024-04-29 19:14 ` Jan Kara
@ 2024-06-20  2:50 ` Theodore Ts'o
  2024-06-20  6:32   ` yebin (H)
  1 sibling, 1 reply; 4+ messages in thread
From: Theodore Ts'o @ 2024-06-20  2:50 UTC (permalink / raw)
  To: Ye Bin; +Cc: adilger.kernel, linux-ext4, jack, Ye Bin

Apologies for not getting back to this until now; I was focused on
finalizing changes for the merge window, and then I was on vacation
for the 3 or 4 weeks.

On Thu, Apr 25, 2024 at 02:45:15PM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
> 
> We encountered a problem that the file system could not be mounted in
> the power-off scenario. The analysis of the file system mirror shows that
> only part of the data is written to the last commit block.
> The valid data of the commit block is concentrated in the first sector.
> However, the data of the entire block is involved in the checksum calculation.
> For different hardware, the minimum atomic unit may be different.
> If the checksum of a committed block is incorrect, clear the data except the
> 'commit_header' and then calculate the checksum. If the checkusm is correct,
> it is considered that the block is partially committed.

This makes a lot of sense; thanks for changing the patch to do this.

> However, if there are valid description/revoke blocks, it is
> considered that the data is abnormal and the log replay is stopped.

I'm not sure I understand your thinking behind this part of the patch,
though.  The description/revoke blocks will have their own checksum,
and while I grant that it would be... highly unusual for the commit
block to be partially written as the result of a torn write, and then
for there to be subsequent valid descriptor or revoke blocks (which
would presumably be part of the next transaction), I wonder if the
extra complexity is worth it.

I can't think of a situation where this might happen other than say, a
bit flip in the portion of commit block where we don't care about its
contents; but in that case, after zeroing out parts of the commit
block that we don't care about, if the checksum is valid, presumably
we would have managed to luckily recover from the bit flip.  So
continuing shouldn't be risky.

What am I missing?

						- Ted

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] jbd2: avoid mount failed when commit block is partial submitted
  2024-06-20  2:50 ` Theodore Ts'o
@ 2024-06-20  6:32   ` yebin (H)
  0 siblings, 0 replies; 4+ messages in thread
From: yebin (H) @ 2024-06-20  6:32 UTC (permalink / raw)
  To: Theodore Ts'o, Ye Bin; +Cc: adilger.kernel, linux-ext4, jack



On 2024/6/20 10:50, Theodore Ts'o wrote:
> Apologies for not getting back to this until now; I was focused on
> finalizing changes for the merge window, and then I was on vacation
> for the 3 or 4 weeks.
>
> On Thu, Apr 25, 2024 at 02:45:15PM +0800, Ye Bin wrote:
>> From: Ye Bin <yebin10@huawei.com>
>>
>> We encountered a problem that the file system could not be mounted in
>> the power-off scenario. The analysis of the file system mirror shows that
>> only part of the data is written to the last commit block.
>> The valid data of the commit block is concentrated in the first sector.
>> However, the data of the entire block is involved in the checksum calculation.
>> For different hardware, the minimum atomic unit may be different.
>> If the checksum of a committed block is incorrect, clear the data except the
>> 'commit_header' and then calculate the checksum. If the checkusm is correct,
>> it is considered that the block is partially committed.
> This makes a lot of sense; thanks for changing the patch to do this.
>
>> However, if there are valid description/revoke blocks, it is
>> considered that the data is abnormal and the log replay is stopped.
> I'm not sure I understand your thinking behind this part of the patch,
> though.  The description/revoke blocks will have their own checksum,
> and while I grant that it would be... highly unusual for the commit
> block to be partially written as the result of a torn write, and then
> for there to be subsequent valid descriptor or revoke blocks (which
> would presumably be part of the next transaction), I wonder if the
> extra complexity is worth it.
>
> I can't think of a situation where this might happen other than say, a
> bit flip in the portion of commit block where we don't care about its
> contents; but in that case, after zeroing out parts of the commit
> block that we don't care about, if the checksum is valid, presumably
> we would have managed to luckily recover from the bit flip.  So
> continuing shouldn't be risky.
We cannot fundamentally solve malicious data tampering by searching for 
valid logs
through scanning. My idea in doing this is that even if the kernel knows 
there is a
problem with the data and insists on replaying logs, it is not good. We 
should let the
user decide what to do. But when i think about it, doing log recovery 
based on what
we believe to be the correct scan results is actually a presumptuous 
claim. I agree with
your point of view that the problem should not be complicated.
I will delete these judgments and send another version.
> What am I missing?
>
> 						- Ted
> .
>


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-06-20  6:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-25  6:45 [PATCH v3] jbd2: avoid mount failed when commit block is partial submitted Ye Bin
2024-04-29 19:14 ` Jan Kara
2024-06-20  2:50 ` Theodore Ts'o
2024-06-20  6:32   ` yebin (H)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).