public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: verify the tranisd of the to-be-written dirty extent buffer
@ 2022-03-02  1:10 Qu Wenruo
  2022-03-02 18:56 ` David Sterba
  2022-03-07 10:51 ` Nikolay Borisov
  0 siblings, 2 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-03-02  1:10 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Christoph Anton Mitterer

[BUG]
There is a bug report that a bitflip in the transid part of an extent
buffer makes btrfs to reject certain tree blocks:

  BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22

[CAUSE]
Note the failed transid check, hex(262166) = 0x40016, while
hex(22) = 0x16.

It's an obvious bitflip.

Furthermore, the reporter also confirmed the bitflip is from the
hardware, so it's a real hardware caused bitflip, and such problem can
not be detected by the existing tree-checker framework.

As tree-checker can only verify the content inside one tree block, while
generation of a tree block can only be verified against its parent.

So such problem remain undetected.

[FIX]
Although tree-checker can not verify it at write-time, we still have a
quick (but not the most accurate) way to catch such obvious corruption.

Function csum_one_extent_buffer() is called before we submit metadata
write.

Thus it means, all the extent buffer passed in should be dirty tree
blocks, and should be newer than last committed transaction.

Using that we can catch the above bitflip.

Although it's not a perfect solution, as if the corrupted generation is
higher than the correct value, we have no way to catch it at all.

Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6a81c39d5f4..a89aa523413b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
 	else
 		ret = btrfs_check_leaf_full(eb);
 
-	if (ret < 0) {
-		btrfs_print_tree(eb, 0);
+	if (ret < 0)
+		goto error;
+
+	/*
+	 * Also check the generation, the eb reached here must be newer than
+	 * last committed. Or something seriously wrong happened.
+	 */
+	if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) {
+		ret = -EUCLEAN;
 		btrfs_err(fs_info,
-			"block=%llu write time tree block corruption detected",
-			eb->start);
-		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
-		return ret;
+			"block=%llu bad generation, have %llu expect > %llu",
+			  eb->start, btrfs_header_generation(eb),
+			  fs_info->last_trans_committed);
+		goto error;
 	}
 	write_extent_buffer(eb, result, 0, fs_info->csum_size);
 
 	return 0;
+error:
+	btrfs_print_tree(eb, 0);
+	btrfs_err(fs_info,
+		"block=%llu write time tree block corruption detected",
+		eb->start);
+	WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+	return ret;
 }
 
 /* Checksum all dirty extent buffers in one bio_vec */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: verify the tranisd of the to-be-written dirty extent buffer
  2022-03-02  1:10 [PATCH] btrfs: verify the tranisd of the to-be-written dirty extent buffer Qu Wenruo
@ 2022-03-02 18:56 ` David Sterba
  2022-03-03  4:20   ` Christoph Anton Mitterer
  2022-03-07 10:51 ` Nikolay Borisov
  1 sibling, 1 reply; 5+ messages in thread
From: David Sterba @ 2022-03-02 18:56 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Christoph Anton Mitterer

On Wed, Mar 02, 2022 at 09:10:21AM +0800, Qu Wenruo wrote:
> [BUG]
> There is a bug report that a bitflip in the transid part of an extent
> buffer makes btrfs to reject certain tree blocks:
> 
>   BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22
> 
> [CAUSE]
> Note the failed transid check, hex(262166) = 0x40016, while
> hex(22) = 0x16.
> 
> It's an obvious bitflip.

Oh, quite. I think I've seen only bits flipped from 0 -> 1, this case is
an interesting evidence.

> Furthermore, the reporter also confirmed the bitflip is from the
> hardware, so it's a real hardware caused bitflip, and such problem can
> not be detected by the existing tree-checker framework.
> 
> As tree-checker can only verify the content inside one tree block, while
> generation of a tree block can only be verified against its parent.
> 
> So such problem remain undetected.
> 
> [FIX]
> Although tree-checker can not verify it at write-time, we still have a
> quick (but not the most accurate) way to catch such obvious corruption.
> 
> Function csum_one_extent_buffer() is called before we submit metadata
> write.
> 
> Thus it means, all the extent buffer passed in should be dirty tree
> blocks, and should be newer than last committed transaction.
> 
> Using that we can catch the above bitflip.
> 
> Although it's not a perfect solution, as if the corrupted generation is
> higher than the correct value, we have no way to catch it at all.
> 
> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
> Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next, thanks.

> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
>  	else
>  		ret = btrfs_check_leaf_full(eb);
>  
> -	if (ret < 0) {
> -		btrfs_print_tree(eb, 0);
> +	if (ret < 0)
> +		goto error;
> +
> +	/*
> +	 * Also check the generation, the eb reached here must be newer than
> +	 * last committed. Or something seriously wrong happened.
> +	 */
> +	if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) {

For any cases in regular code that leads to EUCLEAN you can use
unlikely() annotation.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: verify the tranisd of the to-be-written dirty extent buffer
  2022-03-02 18:56 ` David Sterba
@ 2022-03-03  4:20   ` Christoph Anton Mitterer
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Anton Mitterer @ 2022-03-03  4:20 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs

On Wed, 2022-03-02 at 19:56 +0100, David Sterba wrote:
> Oh, quite. I think I've seen only bits flipped from 0 -> 1, this case
> is
> an interesting evidence.

I had actually both cases in that defective RAM module...

Cheers,
Chris.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: verify the tranisd of the to-be-written dirty extent buffer
  2022-03-02  1:10 [PATCH] btrfs: verify the tranisd of the to-be-written dirty extent buffer Qu Wenruo
  2022-03-02 18:56 ` David Sterba
@ 2022-03-07 10:51 ` Nikolay Borisov
  2022-03-07 11:11   ` Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2022-03-07 10:51 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Christoph Anton Mitterer



On 2.03.22 г. 3:10 ч., Qu Wenruo wrote:
> [BUG]
> There is a bug report that a bitflip in the transid part of an extent
> buffer makes btrfs to reject certain tree blocks:
> 
>    BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22
> 
> [CAUSE]
> Note the failed transid check, hex(262166) = 0x40016, while
> hex(22) = 0x16.
> 
> It's an obvious bitflip.
> 
> Furthermore, the reporter also confirmed the bitflip is from the
> hardware, so it's a real hardware caused bitflip, and such problem can
> not be detected by the existing tree-checker framework.
> 
> As tree-checker can only verify the content inside one tree block, while
> generation of a tree block can only be verified against its parent.
> 
> So such problem remain undetected.
> 
> [FIX]
> Although tree-checker can not verify it at write-time, we still have a
> quick (but not the most accurate) way to catch such obvious corruption.
> 
> Function csum_one_extent_buffer() is called before we submit metadata
> write.
> 
> Thus it means, all the extent buffer passed in should be dirty tree
> blocks, and should be newer than last committed transaction.
> 
> Using that we can catch the above bitflip.
> 
> Although it's not a perfect solution, as if the corrupted generation is
> higher than the correct value, we have no way to catch it at all.
> 
> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
> Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/disk-io.c | 26 ++++++++++++++++++++------
>   1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b6a81c39d5f4..a89aa523413b 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
>   	else
>   		ret = btrfs_check_leaf_full(eb);
>   
> -	if (ret < 0) {
> -		btrfs_print_tree(eb, 0);
> +	if (ret < 0)
> +		goto error;
> +
> +	/*
> +	 * Also check the generation, the eb reached here must be newer than
> +	 * last committed. Or something seriously wrong happened.
> +	 */
> +	if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) {
> +		ret = -EUCLEAN;
>   		btrfs_err(fs_info,
> -			"block=%llu write time tree block corruption detected",
> -			eb->start);
> -		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> -		return ret;
> +			"block=%llu bad generation, have %llu expect > %llu",
> +			  eb->start, btrfs_header_generation(eb),
> +			  fs_info->last_trans_committed);
> +		goto error;

nit: I'd rather have this check in btrfs_check_node/check_leaf functions 
rather than having just this specific check in csum_one_extent_buffer. 
The only thing which is missing AFAICS is the fact the check function 
don't have a context whether we are checking for read or for write. It 
might make sense to extend them to get a boolean param whether the 
validation is for a write or not ?

>   	}
>   	write_extent_buffer(eb, result, 0, fs_info->csum_size);
>   
>   	return 0;
> +error:
> +	btrfs_print_tree(eb, 0);
> +	btrfs_err(fs_info,
> +		"block=%llu write time tree block corruption detected",
> +		eb->start);
> +	WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +	return ret;
>   }
>   
>   /* Checksum all dirty extent buffers in one bio_vec */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: verify the tranisd of the to-be-written dirty extent buffer
  2022-03-07 10:51 ` Nikolay Borisov
@ 2022-03-07 11:11   ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-03-07 11:11 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: Christoph Anton Mitterer



On 2022/3/7 18:51, Nikolay Borisov wrote:
> 
> 
> On 2.03.22 г. 3:10 ч., Qu Wenruo wrote:
>> [BUG]
>> There is a bug report that a bitflip in the transid part of an extent
>> buffer makes btrfs to reject certain tree blocks:
>>
>>    BTRFS error (device dm-0): parent transid verify failed on 
>> 1382301696 wanted 262166 found 22
>>
>> [CAUSE]
>> Note the failed transid check, hex(262166) = 0x40016, while
>> hex(22) = 0x16.
>>
>> It's an obvious bitflip.
>>
>> Furthermore, the reporter also confirmed the bitflip is from the
>> hardware, so it's a real hardware caused bitflip, and such problem can
>> not be detected by the existing tree-checker framework.
>>
>> As tree-checker can only verify the content inside one tree block, while
>> generation of a tree block can only be verified against its parent.
>>
>> So such problem remain undetected.
>>
>> [FIX]
>> Although tree-checker can not verify it at write-time, we still have a
>> quick (but not the most accurate) way to catch such obvious corruption.
>>
>> Function csum_one_extent_buffer() is called before we submit metadata
>> write.
>>
>> Thus it means, all the extent buffer passed in should be dirty tree
>> blocks, and should be newer than last committed transaction.
>>
>> Using that we can catch the above bitflip.
>>
>> Although it's not a perfect solution, as if the corrupted generation is
>> higher than the correct value, we have no way to catch it at all.
>>
>> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
>> Link: 
>> https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/ 
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/disk-io.c | 26 ++++++++++++++++++++------
>>   1 file changed, 20 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index b6a81c39d5f4..a89aa523413b 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -441,17 +441,31 @@ static int csum_one_extent_buffer(struct 
>> extent_buffer *eb)
>>       else
>>           ret = btrfs_check_leaf_full(eb);
>> -    if (ret < 0) {
>> -        btrfs_print_tree(eb, 0);
>> +    if (ret < 0)
>> +        goto error;
>> +
>> +    /*
>> +     * Also check the generation, the eb reached here must be newer than
>> +     * last committed. Or something seriously wrong happened.
>> +     */
>> +    if (btrfs_header_generation(eb) <= fs_info->last_trans_committed) {
>> +        ret = -EUCLEAN;
>>           btrfs_err(fs_info,
>> -            "block=%llu write time tree block corruption detected",
>> -            eb->start);
>> -        WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> -        return ret;
>> +            "block=%llu bad generation, have %llu expect > %llu",
>> +              eb->start, btrfs_header_generation(eb),
>> +              fs_info->last_trans_committed);
>> +        goto error;
> 
> nit: I'd rather have this check in btrfs_check_node/check_leaf functions 
> rather than having just this specific check in csum_one_extent_buffer. 
> The only thing which is missing AFAICS is the fact the check function 
> don't have a context whether we are checking for read or for write. It 
> might make sense to extend them to get a boolean param whether the 
> validation is for a write or not ?

Not only it lacks a bool, but also needs a u64 root_owner.


However, I really want to keep check_leaf/node() simple, just an eb, and 
all checks are based on the info from that eb, no extra ones.

Adding one u64 will not be a problem, but I doubt if we begin this 
trend, we may add more and more parameters for that simple function, and 
make it no longer that simple.

Thanks,
Qu

> 
>>       }
>>       write_extent_buffer(eb, result, 0, fs_info->csum_size);
>>       return 0;
>> +error:
>> +    btrfs_print_tree(eb, 0);
>> +    btrfs_err(fs_info,
>> +        "block=%llu write time tree block corruption detected",
>> +        eb->start);
>> +    WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +    return ret;
>>   }
>>   /* Checksum all dirty extent buffers in one bio_vec */
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-03-07 11:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-03-02  1:10 [PATCH] btrfs: verify the tranisd of the to-be-written dirty extent buffer Qu Wenruo
2022-03-02 18:56 ` David Sterba
2022-03-03  4:20   ` Christoph Anton Mitterer
2022-03-07 10:51 ` Nikolay Borisov
2022-03-07 11:11   ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox