[PATCH] btrfs: replace a wrong memset() with memzero

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] btrfs: replace a wrong memset() with memzero_page()
@ 2022-03-25  9:37 Qu Wenruo
  2022-03-25 10:10 ` Johannes Thumshirn
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Qu Wenruo @ 2022-03-25  9:37 UTC (permalink / raw)
  To: linux-btrfs

The original code is not really setting the memory to 0x00 but 0x01.

To prevent such problem from happening, use memzero_page() instead.

Since we're here, also make @len const since it's just sectorsize.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 78a5145353e1..179d479b72e9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3272,7 +3272,7 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
-	u32 len = fs_info->sectorsize;
+	const u32 len = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
 	unsigned int offset_sectors;
 	u8 *csum_expected;
@@ -3287,11 +3287,11 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 	shash->tfm = fs_info->csum_shash;
 
 	crypto_shash_digest(shash, kaddr + pgoff, len, csum);
+	kunmap_atomic(kaddr);
 
 	if (memcmp(csum, csum_expected, csum_size))
 		goto zeroit;
 
-	kunmap_atomic(kaddr);
 	return 0;
 zeroit:
 	btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
@@ -3299,9 +3299,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 	if (bbio->device)
 		btrfs_dev_stat_inc_and_print(bbio->device,
 					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
-	memset(kaddr + pgoff, 1, len);
+	memzero_page(page, pgoff, len);
 	flush_dcache_page(page);
-	kunmap_atomic(kaddr);
 	return -EIO;
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-25  9:37 [PATCH] btrfs: replace a wrong memset() with memzero_page() Qu Wenruo
@ 2022-03-25 10:10 ` Johannes Thumshirn
  2022-03-25 10:49   ` Qu Wenruo
  2022-03-25 16:38 ` Christoph Hellwig
  2022-03-28 18:51 ` David Sterba
  2 siblings, 1 reply; 23+ messages in thread
From: Johannes Thumshirn @ 2022-03-25 10:10 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs@vger.kernel.org

On 25/03/2022 10:38, Qu Wenruo wrote:
> The original code is not really setting the memory to 0x00 but 0x01.
> 
> To prevent such problem from happening, use memzero_page() instead.
> 
> Since we're here, also make @len const since it's just sectorsize.

Any idea why we're setting it to 1? It's been this way since 07157aacb1ec
("Btrfs: Add file data csums back in via hooks in the extent map code")
which landed in v2.6.29. So basically since the dawn of time.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-25 10:10 ` Johannes Thumshirn
@ 2022-03-25 10:49   ` Qu Wenruo
  2022-03-25 10:58     ` Johannes Thumshirn
  0 siblings, 1 reply; 23+ messages in thread
From: Qu Wenruo @ 2022-03-25 10:49 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo, linux-btrfs@vger.kernel.org



On 2022/3/25 18:10, Johannes Thumshirn wrote:
> On 25/03/2022 10:38, Qu Wenruo wrote:
>> The original code is not really setting the memory to 0x00 but 0x01.
>>
>> To prevent such problem from happening, use memzero_page() instead.
>>
>> Since we're here, also make @len const since it's just sectorsize.
>
> Any idea why we're setting it to 1? It's been this way since 07157aacb1ec
> ("Btrfs: Add file data csums back in via hooks in the extent map code")
> which landed in v2.6.29. So basically since the dawn of time.

No idea at all, maybe just a typo.

Thanks,
Qu
>
>
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-25 10:49   ` Qu Wenruo
@ 2022-03-25 10:58     ` Johannes Thumshirn
  0 siblings, 0 replies; 23+ messages in thread
From: Johannes Thumshirn @ 2022-03-25 10:58 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs@vger.kernel.org

On 25/03/2022 11:49, Qu Wenruo wrote:
> 
> 
> On 2022/3/25 18:10, Johannes Thumshirn wrote:
>> On 25/03/2022 10:38, Qu Wenruo wrote:
>>> The original code is not really setting the memory to 0x00 but 0x01.
>>>
>>> To prevent such problem from happening, use memzero_page() instead.
>>>
>>> Since we're here, also make @len const since it's just sectorsize.
>>
>> Any idea why we're setting it to 1? It's been this way since 07157aacb1ec
>> ("Btrfs: Add file data csums back in via hooks in the extent map code")
>> which landed in v2.6.29. So basically since the dawn of time.
> 
> No idea at all, maybe just a typo.

Anyways,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-25  9:37 [PATCH] btrfs: replace a wrong memset() with memzero_page() Qu Wenruo
  2022-03-25 10:10 ` Johannes Thumshirn
@ 2022-03-25 16:38 ` Christoph Hellwig
  2022-03-26  1:15   ` Qu Wenruo
  2022-04-05  4:58   ` Christoph Hellwig
  2022-03-28 18:51 ` David Sterba
  2 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-25 16:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> -	u32 len = fs_info->sectorsize;
> +	const u32 len = fs_info->sectorsize;

Why the spurious and rather pointless const annotation that is unrelated
to the rest of the patch?

>  					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
> -	memset(kaddr + pgoff, 1, len);
> +	memzero_page(page, pgoff, len);
>  	flush_dcache_page(page);

memzero_page already takes care of the flush_dcache_page.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-25 16:38 ` Christoph Hellwig
@ 2022-03-26  1:15   ` Qu Wenruo
  2022-04-05  4:58   ` Christoph Hellwig
  1 sibling, 0 replies; 23+ messages in thread
From: Qu Wenruo @ 2022-03-26  1:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs



On 2022/3/26 00:38, Christoph Hellwig wrote:
> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
>> -	u32 len = fs_info->sectorsize;
>> +	const u32 len = fs_info->sectorsize;
> 
> Why the spurious and rather pointless const annotation that is unrelated
> to the rest of the patch?

Because we prefer const for constant variables in btrfs code.

As we have experienced some bugs in the past related to reusing some 
variables.

> 
>>   					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
>> -	memset(kaddr + pgoff, 1, len);
>> +	memzero_page(page, pgoff, len);
>>   	flush_dcache_page(page);
> 
> memzero_page already takes care of the flush_dcache_page.
> 

Oh, indeed.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-25 16:38 ` Christoph Hellwig
  2022-03-26  1:15   ` Qu Wenruo
@ 2022-04-05  4:58   ` Christoph Hellwig
  2022-04-05  5:08     ` Qu Wenruo
  2022-04-05 13:59     ` David Sterba
  1 sibling, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-04-05  4:58 UTC (permalink / raw)
  To: Qu Wenruo, David Sterba; +Cc: linux-btrfs

On Fri, Mar 25, 2022 at 09:38:16AM -0700, Christoph Hellwig wrote:
> > -	memset(kaddr + pgoff, 1, len);
> > +	memzero_page(page, pgoff, len);
> >  	flush_dcache_page(page);
> 
> memzero_page already takes care of the flush_dcache_page.

David, you've picked this up with the stay flush_dcache_page left in
place.  Plase try to fix it up instead of spreading the weird cargo cult
flush_dcache_page calls.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-04-05  4:58   ` Christoph Hellwig
@ 2022-04-05  5:08     ` Qu Wenruo
  2022-04-05 14:04       ` David Sterba
  2022-04-05 13:59     ` David Sterba
  1 sibling, 1 reply; 23+ messages in thread
From: Qu Wenruo @ 2022-04-05  5:08 UTC (permalink / raw)
  To: Christoph Hellwig, David Sterba; +Cc: linux-btrfs



On 2022/4/5 12:58, Christoph Hellwig wrote:
> On Fri, Mar 25, 2022 at 09:38:16AM -0700, Christoph Hellwig wrote:
>>> -	memset(kaddr + pgoff, 1, len);
>>> +	memzero_page(page, pgoff, len);
>>>   	flush_dcache_page(page);
>>
>> memzero_page already takes care of the flush_dcache_page.
>
> David, you've picked this up with the stay flush_dcache_page left in
> place.  Plase try to fix it up instead of spreading the weird cargo cult
> flush_dcache_page calls.
>

Please drop this patch, as discussed with Filipe, the memset() value
0x01 could be a poison value to distinguish from plain 0x00.

Furthermore, we are not sure if we even want to zeroing out/poison the
content.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-04-05  5:08     ` Qu Wenruo
@ 2022-04-05 14:04       ` David Sterba
  0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2022-04-05 14:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, David Sterba, linux-btrfs

On Tue, Apr 05, 2022 at 01:08:41PM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/4/5 12:58, Christoph Hellwig wrote:
> > On Fri, Mar 25, 2022 at 09:38:16AM -0700, Christoph Hellwig wrote:
> >>> -	memset(kaddr + pgoff, 1, len);
> >>> +	memzero_page(page, pgoff, len);
> >>>   	flush_dcache_page(page);
> >>
> >> memzero_page already takes care of the flush_dcache_page.
> >
> > David, you've picked this up with the stay flush_dcache_page left in
> > place.  Plase try to fix it up instead of spreading the weird cargo cult
> > flush_dcache_page calls.
> 
> Please drop this patch, as discussed with Filipe, the memset() value
> 0x01 could be a poison value to distinguish from plain 0x00.

> Furthermore, we are not sure if we even want to zeroing out/poison the
> content.

I've read the discussion and have been thinking what should we do here,
I'd be for doing memset to 0 because 0x1 is quite arbitrary and does not
follow any established pattern for poison values, if there's anything
like that when returning data from kernel to userspace. I find zeros
OK here. Arguably "It's been 0x1 forever so we should not change it" for
backward compatibility reasons could apply, but this is not an interface
that applications use, the error code is the interface.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-04-05  4:58   ` Christoph Hellwig
  2022-04-05  5:08     ` Qu Wenruo
@ 2022-04-05 13:59     ` David Sterba
  1 sibling, 0 replies; 23+ messages in thread
From: David Sterba @ 2022-04-05 13:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, David Sterba, linux-btrfs

On Mon, Apr 04, 2022 at 09:58:26PM -0700, Christoph Hellwig wrote:
> On Fri, Mar 25, 2022 at 09:38:16AM -0700, Christoph Hellwig wrote:
> > > -	memset(kaddr + pgoff, 1, len);
> > > +	memzero_page(page, pgoff, len);
> > >  	flush_dcache_page(page);
> > 
> > memzero_page already takes care of the flush_dcache_page.
> 
> David, you've picked this up with the stay flush_dcache_page left in
> place.  Plase try to fix it up instead of spreading the weird cargo cult
> flush_dcache_page calls.

Good point, flush_dcache_page is in memzero_page so it should have been
removed within the patch. I've grepped and there are still 14 calls,
some of them next to memzero_page so that'll go away too.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-25  9:37 [PATCH] btrfs: replace a wrong memset() with memzero_page() Qu Wenruo
  2022-03-25 10:10 ` Johannes Thumshirn
  2022-03-25 16:38 ` Christoph Hellwig
@ 2022-03-28 18:51 ` David Sterba
  2022-03-28 18:58   ` David Sterba
  2022-03-29  9:57   ` Filipe Manana
  2 siblings, 2 replies; 23+ messages in thread
From: David Sterba @ 2022-03-28 18:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> The original code is not really setting the memory to 0x00 but 0x01.
> 
> To prevent such problem from happening, use memzero_page() instead.

This should at least mention we think that setting it to 0 is right, as
you call it wrong but give no hint why it's thought to be wrong.

> Since we're here, also make @len const since it's just sectorsize.

Please don't do that, adding const is fine when the line gets touched
but otherwise adding it to an unrelated fix is not what I want to
encourage.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-28 18:51 ` David Sterba
@ 2022-03-28 18:58   ` David Sterba
  2022-03-29  9:57   ` Filipe Manana
  1 sibling, 0 replies; 23+ messages in thread
From: David Sterba @ 2022-03-28 18:58 UTC (permalink / raw)
  To: David Sterba; +Cc: Qu Wenruo, linux-btrfs

On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> > The original code is not really setting the memory to 0x00 but 0x01.
> > 
> > To prevent such problem from happening, use memzero_page() instead.
> 
> This should at least mention we think that setting it to 0 is right, as
> you call it wrong but give no hint why it's thought to be wrong.
> 
> > Since we're here, also make @len const since it's just sectorsize.
> 
> Please don't do that, adding const is fine when the line gets touched
> but otherwise adding it to an unrelated fix is not what I want to
> encourage.

Applied with that hunk deleted and with an updated changelog.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-28 18:51 ` David Sterba
  2022-03-28 18:58   ` David Sterba
@ 2022-03-29  9:57   ` Filipe Manana
  2022-03-29 10:49     ` Qu Wenruo
  1 sibling, 1 reply; 23+ messages in thread
From: Filipe Manana @ 2022-03-29  9:57 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> > The original code is not really setting the memory to 0x00 but 0x01.
> > 
> > To prevent such problem from happening, use memzero_page() instead.
> 
> This should at least mention we think that setting it to 0 is right, as
> you call it wrong but give no hint why it's thought to be wrong.

My guess is that something different from zero makes it easier to spot
the problem in user space, as 0 is not uncommon (holes, prealloced extents)
and may get unnoticed by applications/users.

I don't see a good reason to change this behaviour. Maybe it's just the
label name 'zeroit' that makes it confusing.

> 
> > Since we're here, also make @len const since it's just sectorsize.
> 
> Please don't do that, adding const is fine when the line gets touched
> but otherwise adding it to an unrelated fix is not what I want to
> encourage.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-29  9:57   ` Filipe Manana
@ 2022-03-29 10:49     ` Qu Wenruo
  2022-03-29 11:39       ` Filipe Manana
  0 siblings, 1 reply; 23+ messages in thread
From: Qu Wenruo @ 2022-03-29 10:49 UTC (permalink / raw)
  To: Filipe Manana, dsterba, linux-btrfs



On 2022/3/29 17:57, Filipe Manana wrote:
> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
>>> The original code is not really setting the memory to 0x00 but 0x01.
>>>
>>> To prevent such problem from happening, use memzero_page() instead.
>>
>> This should at least mention we think that setting it to 0 is right, as
>> you call it wrong but give no hint why it's thought to be wrong.
>
> My guess is that something different from zero makes it easier to spot
> the problem in user space, as 0 is not uncommon (holes, prealloced extents)
> and may get unnoticed by applications/users.

OK, that makes some sense.

But shouldn't user space tool get an -EIO directly?

As the corrupted range won't have PageUptodate set anyway.

Thanks,
Qu
>
> I don't see a good reason to change this behaviour. Maybe it's just the
> label name 'zeroit' that makes it confusing. >
>>
>>> Since we're here, also make @len const since it's just sectorsize.
>>
>> Please don't do that, adding const is fine when the line gets touched
>> but otherwise adding it to an unrelated fix is not what I want to
>> encourage.
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-29 10:49     ` Qu Wenruo
@ 2022-03-29 11:39       ` Filipe Manana
  2022-03-29 23:52         ` Qu Wenruo
  0 siblings, 1 reply; 23+ messages in thread
From: Filipe Manana @ 2022-03-29 11:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/3/29 17:57, Filipe Manana wrote:
> > On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
> > > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> > > > The original code is not really setting the memory to 0x00 but 0x01.
> > > > 
> > > > To prevent such problem from happening, use memzero_page() instead.
> > > 
> > > This should at least mention we think that setting it to 0 is right, as
> > > you call it wrong but give no hint why it's thought to be wrong.
> > 
> > My guess is that something different from zero makes it easier to spot
> > the problem in user space, as 0 is not uncommon (holes, prealloced extents)
> > and may get unnoticed by applications/users.
> 
> OK, that makes some sense.
> 
> But shouldn't user space tool get an -EIO directly?

It should.

But even if applications get -EIO, they may often ignore return values.
It's their fault, but if we can make it less likely that errors are not noticed,
the better. I think we all did often, ignore all or just some return values
from read(), write(), open(), etc.

One recent example is the MariaDB case with io-uring. They were reporting
corruption to the users, but the problem is that didn't properly check
return values, ignoring partial reads and treating them as success:

https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582

The data was fine, not corrupted, they just didn't deal with partial reads
and then read the remaining data when a read returns less data than expected.



> 
> As the corrupted range won't have PageUptodate set anyway.
> 
> Thanks,
> Qu
> > 
> > I don't see a good reason to change this behaviour. Maybe it's just the
> > label name 'zeroit' that makes it confusing. >
> > > 
> > > > Since we're here, also make @len const since it's just sectorsize.
> > > 
> > > Please don't do that, adding const is fine when the line gets touched
> > > but otherwise adding it to an unrelated fix is not what I want to
> > > encourage.
> > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-29 11:39       ` Filipe Manana
@ 2022-03-29 23:52         ` Qu Wenruo
  2022-03-30  9:27           ` Filipe Manana
  0 siblings, 1 reply; 23+ messages in thread
From: Qu Wenruo @ 2022-03-29 23:52 UTC (permalink / raw)
  To: Filipe Manana; +Cc: dsterba, linux-btrfs



On 2022/3/29 19:39, Filipe Manana wrote:
> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/3/29 17:57, Filipe Manana wrote:
>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
>>>>> The original code is not really setting the memory to 0x00 but 0x01.
>>>>>
>>>>> To prevent such problem from happening, use memzero_page() instead.
>>>>
>>>> This should at least mention we think that setting it to 0 is right, as
>>>> you call it wrong but give no hint why it's thought to be wrong.
>>>
>>> My guess is that something different from zero makes it easier to spot
>>> the problem in user space, as 0 is not uncommon (holes, prealloced extents)
>>> and may get unnoticed by applications/users.
>>
>> OK, that makes some sense.
>>
>> But shouldn't user space tool get an -EIO directly?
>
> It should.
>
> But even if applications get -EIO, they may often ignore return values.
> It's their fault, but if we can make it less likely that errors are not noticed,
> the better. I think we all did often, ignore all or just some return values
> from read(), write(), open(), etc.
>
> One recent example is the MariaDB case with io-uring. They were reporting
> corruption to the users, but the problem is that didn't properly check
> return values, ignoring partial reads and treating them as success:
>
> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
>
> The data was fine, not corrupted, they just didn't deal with partial reads
> and then read the remaining data when a read returns less data than expected.

Then can we slightly improve the filling pattern?

Instead of 0x01, introduce some poison value?

And of course, change the lable "zeroit" to something like "poise_it"?

Thanks,
Qu
>
>
>
>>
>> As the corrupted range won't have PageUptodate set anyway.
>>
>> Thanks,
>> Qu
>>>
>>> I don't see a good reason to change this behaviour. Maybe it's just the
>>> label name 'zeroit' that makes it confusing. >
>>>>
>>>>> Since we're here, also make @len const since it's just sectorsize.
>>>>
>>>> Please don't do that, adding const is fine when the line gets touched
>>>> but otherwise adding it to an unrelated fix is not what I want to
>>>> encourage.
>>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-29 23:52         ` Qu Wenruo
@ 2022-03-30  9:27           ` Filipe Manana
  2022-03-30 10:34             ` Filipe Manana
  0 siblings, 1 reply; 23+ messages in thread
From: Filipe Manana @ 2022-03-30  9:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/3/29 19:39, Filipe Manana wrote:
> > On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
> > > 
> > > 
> > > On 2022/3/29 17:57, Filipe Manana wrote:
> > > > On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
> > > > > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> > > > > > The original code is not really setting the memory to 0x00 but 0x01.
> > > > > > 
> > > > > > To prevent such problem from happening, use memzero_page() instead.
> > > > > 
> > > > > This should at least mention we think that setting it to 0 is right, as
> > > > > you call it wrong but give no hint why it's thought to be wrong.
> > > > 
> > > > My guess is that something different from zero makes it easier to spot
> > > > the problem in user space, as 0 is not uncommon (holes, prealloced extents)
> > > > and may get unnoticed by applications/users.
> > > 
> > > OK, that makes some sense.
> > > 
> > > But shouldn't user space tool get an -EIO directly?
> > 
> > It should.
> > 
> > But even if applications get -EIO, they may often ignore return values.
> > It's their fault, but if we can make it less likely that errors are not noticed,
> > the better. I think we all did often, ignore all or just some return values
> > from read(), write(), open(), etc.
> > 
> > One recent example is the MariaDB case with io-uring. They were reporting
> > corruption to the users, but the problem is that didn't properly check
> > return values, ignoring partial reads and treating them as success:
> > 
> > https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
> > 
> > The data was fine, not corrupted, they just didn't deal with partial reads
> > and then read the remaining data when a read returns less data than expected.
> 
> Then can we slightly improve the filling pattern?
> 
> Instead of 0x01, introduce some poison value?

Why isn't 0x01 a good enough value?

A long range of 0x01 values is certainly unexpected in text files, and very likely
on binary formats as well. Or do you think there's some case where long sequences
of 0x01 are common and not unexpected?

> 
> And of course, change the lable "zeroit" to something like "poise_it"?

"poison_it", poise has a very different and unrelated meaning in English.

> 
> Thanks,
> Qu
> > 
> > 
> > 
> > > 
> > > As the corrupted range won't have PageUptodate set anyway.
> > > 
> > > Thanks,
> > > Qu
> > > > 
> > > > I don't see a good reason to change this behaviour. Maybe it's just the
> > > > label name 'zeroit' that makes it confusing. >
> > > > > 
> > > > > > Since we're here, also make @len const since it's just sectorsize.
> > > > > 
> > > > > Please don't do that, adding const is fine when the line gets touched
> > > > > but otherwise adding it to an unrelated fix is not what I want to
> > > > > encourage.
> > > > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-30  9:27           ` Filipe Manana
@ 2022-03-30 10:34             ` Filipe Manana
  2022-03-30 10:42               ` Qu Wenruo
  0 siblings, 1 reply; 23+ messages in thread
From: Filipe Manana @ 2022-03-30 10:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote:
> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote:
> > 
> > 
> > On 2022/3/29 19:39, Filipe Manana wrote:
> > > On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
> > > > 
> > > > 
> > > > On 2022/3/29 17:57, Filipe Manana wrote:
> > > > > On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
> > > > > > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> > > > > > > The original code is not really setting the memory to 0x00 but 0x01.
> > > > > > > 
> > > > > > > To prevent such problem from happening, use memzero_page() instead.
> > > > > > 
> > > > > > This should at least mention we think that setting it to 0 is right, as
> > > > > > you call it wrong but give no hint why it's thought to be wrong.
> > > > > 
> > > > > My guess is that something different from zero makes it easier to spot
> > > > > the problem in user space, as 0 is not uncommon (holes, prealloced extents)
> > > > > and may get unnoticed by applications/users.
> > > > 
> > > > OK, that makes some sense.
> > > > 
> > > > But shouldn't user space tool get an -EIO directly?
> > > 
> > > It should.
> > > 
> > > But even if applications get -EIO, they may often ignore return values.
> > > It's their fault, but if we can make it less likely that errors are not noticed,
> > > the better. I think we all did often, ignore all or just some return values
> > > from read(), write(), open(), etc.
> > > 
> > > One recent example is the MariaDB case with io-uring. They were reporting
> > > corruption to the users, but the problem is that didn't properly check
> > > return values, ignoring partial reads and treating them as success:
> > > 
> > > https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
> > > 
> > > The data was fine, not corrupted, they just didn't deal with partial reads
> > > and then read the remaining data when a read returns less data than expected.
> > 
> > Then can we slightly improve the filling pattern?
> > 
> > Instead of 0x01, introduce some poison value?
> 
> Why isn't 0x01 a good enough value?
> 
> A long range of 0x01 values is certainly unexpected in text files, and very likely
> on binary formats as well. Or do you think there's some case where long sequences
> of 0x01 are common and not unexpected?
> 
> > 
> > And of course, change the lable "zeroit" to something like "poise_it"?
> 
> "poison_it", poise has a very different and unrelated meaning in English.

It's also worth considering if we should change the page content at all.

Adding a poison value makes it easier to detect the corruption, both for
developers and for sloppy applications that don't check error values.

However people often want to still have access to the corrupted data, they
can tolerate a few corrupted bytes and find the remaining useful. This has
been requested a few times in the past.

Thanks.

> 
> > 
> > Thanks,
> > Qu
> > > 
> > > 
> > > 
> > > > 
> > > > As the corrupted range won't have PageUptodate set anyway.
> > > > 
> > > > Thanks,
> > > > Qu
> > > > > 
> > > > > I don't see a good reason to change this behaviour. Maybe it's just the
> > > > > label name 'zeroit' that makes it confusing. >
> > > > > > 
> > > > > > > Since we're here, also make @len const since it's just sectorsize.
> > > > > > 
> > > > > > Please don't do that, adding const is fine when the line gets touched
> > > > > > but otherwise adding it to an unrelated fix is not what I want to
> > > > > > encourage.
> > > > > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-30 10:34             ` Filipe Manana
@ 2022-03-30 10:42               ` Qu Wenruo
  2022-03-30 11:02                 ` Filipe Manana
  2022-03-30 11:03                 ` Graham Cobb
  0 siblings, 2 replies; 23+ messages in thread
From: Qu Wenruo @ 2022-03-30 10:42 UTC (permalink / raw)
  To: Filipe Manana; +Cc: dsterba, linux-btrfs



On 2022/3/30 18:34, Filipe Manana wrote:
> On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote:
>> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2022/3/29 19:39, Filipe Manana wrote:
>>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2022/3/29 17:57, Filipe Manana wrote:
>>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
>>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
>>>>>>>> The original code is not really setting the memory to 0x00 but 0x01.
>>>>>>>>
>>>>>>>> To prevent such problem from happening, use memzero_page() instead.
>>>>>>>
>>>>>>> This should at least mention we think that setting it to 0 is right, as
>>>>>>> you call it wrong but give no hint why it's thought to be wrong.
>>>>>>
>>>>>> My guess is that something different from zero makes it easier to spot
>>>>>> the problem in user space, as 0 is not uncommon (holes, prealloced extents)
>>>>>> and may get unnoticed by applications/users.
>>>>>
>>>>> OK, that makes some sense.
>>>>>
>>>>> But shouldn't user space tool get an -EIO directly?
>>>>
>>>> It should.
>>>>
>>>> But even if applications get -EIO, they may often ignore return values.
>>>> It's their fault, but if we can make it less likely that errors are not noticed,
>>>> the better. I think we all did often, ignore all or just some return values
>>>> from read(), write(), open(), etc.
>>>>
>>>> One recent example is the MariaDB case with io-uring. They were reporting
>>>> corruption to the users, but the problem is that didn't properly check
>>>> return values, ignoring partial reads and treating them as success:
>>>>
>>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
>>>>
>>>> The data was fine, not corrupted, they just didn't deal with partial reads
>>>> and then read the remaining data when a read returns less data than expected.
>>>
>>> Then can we slightly improve the filling pattern?
>>>
>>> Instead of 0x01, introduce some poison value?
>>
>> Why isn't 0x01 a good enough value?
>>
>> A long range of 0x01 values is certainly unexpected in text files, and very likely
>> on binary formats as well. Or do you think there's some case where long sequences
>> of 0x01 are common and not unexpected?
>>
>>>
>>> And of course, change the lable "zeroit" to something like "poise_it"?
>>
>> "poison_it", poise has a very different and unrelated meaning in English.
>
> It's also worth considering if we should change the page content at all.
>
> Adding a poison value makes it easier to detect the corruption, both for
> developers and for sloppy applications that don't check error values.
>
> However people often want to still have access to the corrupted data, they
> can tolerate a few corrupted bytes and find the remaining useful. This has
> been requested a few times in the past.

This looks more favorable.

And I didn't think the change would break anything.

For proper user space programs checking the error, they know it's an
-EIO and will error out.

For bad programs not checking the error, it will just read the corrupted
data, and may even pass its internal sanity checks (if the corrupted
bytes are not in some critical part).

Or is there something we haven't taken into consideration?

Thanks,
Qu

>
> Thanks.
>
>>
>>>
>>> Thanks,
>>> Qu
>>>>
>>>>
>>>>
>>>>>
>>>>> As the corrupted range won't have PageUptodate set anyway.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>>
>>>>>> I don't see a good reason to change this behaviour. Maybe it's just the
>>>>>> label name 'zeroit' that makes it confusing. >
>>>>>>>
>>>>>>>> Since we're here, also make @len const since it's just sectorsize.
>>>>>>>
>>>>>>> Please don't do that, adding const is fine when the line gets touched
>>>>>>> but otherwise adding it to an unrelated fix is not what I want to
>>>>>>> encourage.
>>>>>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-30 10:42               ` Qu Wenruo
@ 2022-03-30 11:02                 ` Filipe Manana
  2022-03-30 11:03                 ` Graham Cobb
  1 sibling, 0 replies; 23+ messages in thread
From: Filipe Manana @ 2022-03-30 11:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Wed, Mar 30, 2022 at 06:42:39PM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/3/30 18:34, Filipe Manana wrote:
> > On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote:
> > > On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote:
> > > > 
> > > > 
> > > > On 2022/3/29 19:39, Filipe Manana wrote:
> > > > > On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
> > > > > > 
> > > > > > 
> > > > > > On 2022/3/29 17:57, Filipe Manana wrote:
> > > > > > > On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
> > > > > > > > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> > > > > > > > > The original code is not really setting the memory to 0x00 but 0x01.
> > > > > > > > > 
> > > > > > > > > To prevent such problem from happening, use memzero_page() instead.
> > > > > > > > 
> > > > > > > > This should at least mention we think that setting it to 0 is right, as
> > > > > > > > you call it wrong but give no hint why it's thought to be wrong.
> > > > > > > 
> > > > > > > My guess is that something different from zero makes it easier to spot
> > > > > > > the problem in user space, as 0 is not uncommon (holes, prealloced extents)
> > > > > > > and may get unnoticed by applications/users.
> > > > > > 
> > > > > > OK, that makes some sense.
> > > > > > 
> > > > > > But shouldn't user space tool get an -EIO directly?
> > > > > 
> > > > > It should.
> > > > > 
> > > > > But even if applications get -EIO, they may often ignore return values.
> > > > > It's their fault, but if we can make it less likely that errors are not noticed,
> > > > > the better. I think we all did often, ignore all or just some return values
> > > > > from read(), write(), open(), etc.
> > > > > 
> > > > > One recent example is the MariaDB case with io-uring. They were reporting
> > > > > corruption to the users, but the problem is that didn't properly check
> > > > > return values, ignoring partial reads and treating them as success:
> > > > > 
> > > > > https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
> > > > > 
> > > > > The data was fine, not corrupted, they just didn't deal with partial reads
> > > > > and then read the remaining data when a read returns less data than expected.
> > > > 
> > > > Then can we slightly improve the filling pattern?
> > > > 
> > > > Instead of 0x01, introduce some poison value?
> > > 
> > > Why isn't 0x01 a good enough value?
> > > 
> > > A long range of 0x01 values is certainly unexpected in text files, and very likely
> > > on binary formats as well. Or do you think there's some case where long sequences
> > > of 0x01 are common and not unexpected?
> > > 
> > > > 
> > > > And of course, change the lable "zeroit" to something like "poise_it"?
> > > 
> > > "poison_it", poise has a very different and unrelated meaning in English.
> > 
> > It's also worth considering if we should change the page content at all.
> > 
> > Adding a poison value makes it easier to detect the corruption, both for
> > developers and for sloppy applications that don't check error values.
> > 
> > However people often want to still have access to the corrupted data, they
> > can tolerate a few corrupted bytes and find the remaining useful. This has
> > been requested a few times in the past.
> 
> This looks more favorable.
> 
> And I didn't think the change would break anything.
> 
> For proper user space programs checking the error, they know it's an
> -EIO and will error out.
> 
> For bad programs not checking the error, it will just read the corrupted
> data, and may even pass its internal sanity checks (if the corrupted
> bytes are not in some critical part).
> 
> Or is there something we haven't taken into consideration?

Apart from that, applications ignoring error code returned from read()/pread()/etc,
I can't think about anything else to worry about (at least for now).

I believe the poison value was intended only for the early days of development, or
at least it's something I would do myself because it makes sense and I've seen it
elsewhere too. But there are no comments in the code or change logs to validate
that assumption.

Certainly zeroing out the conent is not useful, so either a poison value or don't
change the page data at all. We should really have a test for this - for example
use a single data profile, corrupt 1 byte of data, mount the fs, read the data,
check we get -EIO and verify that the page content is either the data we have on
disk or the poison value.

> 
> Thanks,
> Qu
> 
> > 
> > Thanks.
> > 
> > > 
> > > > 
> > > > Thanks,
> > > > Qu
> > > > > 
> > > > > 
> > > > > 
> > > > > > 
> > > > > > As the corrupted range won't have PageUptodate set anyway.
> > > > > > 
> > > > > > Thanks,
> > > > > > Qu
> > > > > > > 
> > > > > > > I don't see a good reason to change this behaviour. Maybe it's just the
> > > > > > > label name 'zeroit' that makes it confusing. >
> > > > > > > > 
> > > > > > > > > Since we're here, also make @len const since it's just sectorsize.
> > > > > > > > 
> > > > > > > > Please don't do that, adding const is fine when the line gets touched
> > > > > > > > but otherwise adding it to an unrelated fix is not what I want to
> > > > > > > > encourage.
> > > > > > > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-30 10:42               ` Qu Wenruo
  2022-03-30 11:02                 ` Filipe Manana
@ 2022-03-30 11:03                 ` Graham Cobb
  2022-03-30 21:34                   ` Filipe Manana
  1 sibling, 1 reply; 23+ messages in thread
From: Graham Cobb @ 2022-03-30 11:03 UTC (permalink / raw)
  To: Qu Wenruo, Filipe Manana; +Cc: dsterba, linux-btrfs


On 30/03/2022 11:42, Qu Wenruo wrote:
> 
> 
> On 2022/3/30 18:34, Filipe Manana wrote:
>> On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote:
>>> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2022/3/29 19:39, Filipe Manana wrote:
>>>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>> On 2022/3/29 17:57, Filipe Manana wrote:
>>>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
>>>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
>>>>>>>>> The original code is not really setting the memory to 0x00 but
>>>>>>>>> 0x01.
>>>>>>>>>
>>>>>>>>> To prevent such problem from happening, use memzero_page()
>>>>>>>>> instead.
>>>>>>>>
>>>>>>>> This should at least mention we think that setting it to 0 is
>>>>>>>> right, as
>>>>>>>> you call it wrong but give no hint why it's thought to be wrong.
>>>>>>>
>>>>>>> My guess is that something different from zero makes it easier to
>>>>>>> spot
>>>>>>> the problem in user space, as 0 is not uncommon (holes,
>>>>>>> prealloced extents)
>>>>>>> and may get unnoticed by applications/users.
>>>>>>
>>>>>> OK, that makes some sense.
>>>>>>
>>>>>> But shouldn't user space tool get an -EIO directly?
>>>>>
>>>>> It should.
>>>>>
>>>>> But even if applications get -EIO, they may often ignore return
>>>>> values.
>>>>> It's their fault, but if we can make it less likely that errors are
>>>>> not noticed,
>>>>> the better. I think we all did often, ignore all or just some
>>>>> return values
>>>>> from read(), write(), open(), etc.
>>>>>
>>>>> One recent example is the MariaDB case with io-uring. They were
>>>>> reporting
>>>>> corruption to the users, but the problem is that didn't properly check
>>>>> return values, ignoring partial reads and treating them as success:
>>>>>
>>>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
>>>>>
>>>>>
>>>>> The data was fine, not corrupted, they just didn't deal with
>>>>> partial reads
>>>>> and then read the remaining data when a read returns less data than
>>>>> expected.
>>>>
>>>> Then can we slightly improve the filling pattern?
>>>>
>>>> Instead of 0x01, introduce some poison value?
>>>
>>> Why isn't 0x01 a good enough value?
>>>
>>> A long range of 0x01 values is certainly unexpected in text files,
>>> and very likely
>>> on binary formats as well. Or do you think there's some case where
>>> long sequences
>>> of 0x01 are common and not unexpected?
>>>
>>>>
>>>> And of course, change the lable "zeroit" to something like "poise_it"?
>>>
>>> "poison_it", poise has a very different and unrelated meaning in
>>> English.
>>
>> It's also worth considering if we should change the page content at all.
>>
>> Adding a poison value makes it easier to detect the corruption, both for
>> developers and for sloppy applications that don't check error values.
>>
>> However people often want to still have access to the corrupted data,
>> they
>> can tolerate a few corrupted bytes and find the remaining useful. This
>> has
>> been requested a few times in the past.
> 
> This looks more favorable.
> 
> And I didn't think the change would break anything.
> 
> For proper user space programs checking the error, they know it's an
> -EIO and will error out.
> 
> For bad programs not checking the error, it will just read the corrupted
> data, and may even pass its internal sanity checks (if the corrupted
> bytes are not in some critical part).
> 
> Or is there something we haven't taken into consideration?

Well.... If an error occurred something has gone wrong. It could be a
simple bit flip in the storage itself, or it could be something else.
The something else could be a software or firmware bug, or a memory
corruption or memory controller power fluctuation blip, or .... I guess
it might even be possible in some architectures that the problem could
result in page table updates validating a page that actually was never
written to at all and could still contain some previous contents.

In a time when we worry about things like Spectre, Meltdown, Rowhammer,
stale cache leakage, etc it may be a good security principle that data
left over from failed operations should never be made visible to user mode.

Possibly unnecessary worrying, but then who thought that jump prediction
would create a side-channel for exposing data that can actually be
exploited in real life.

Graham

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-30 11:03                 ` Graham Cobb
@ 2022-03-30 21:34                   ` Filipe Manana
  2022-03-30 22:29                     ` Graham Cobb
  0 siblings, 1 reply; 23+ messages in thread
From: Filipe Manana @ 2022-03-30 21:34 UTC (permalink / raw)
  To: Graham Cobb; +Cc: Qu Wenruo, dsterba, linux-btrfs

On Wed, Mar 30, 2022 at 12:03:36PM +0100, Graham Cobb wrote:
> 
> On 30/03/2022 11:42, Qu Wenruo wrote:
> > 
> > 
> > On 2022/3/30 18:34, Filipe Manana wrote:
> >> On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote:
> >>> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote:
> >>>>
> >>>>
> >>>> On 2022/3/29 19:39, Filipe Manana wrote:
> >>>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 2022/3/29 17:57, Filipe Manana wrote:
> >>>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
> >>>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> >>>>>>>>> The original code is not really setting the memory to 0x00 but
> >>>>>>>>> 0x01.
> >>>>>>>>>
> >>>>>>>>> To prevent such problem from happening, use memzero_page()
> >>>>>>>>> instead.
> >>>>>>>>
> >>>>>>>> This should at least mention we think that setting it to 0 is
> >>>>>>>> right, as
> >>>>>>>> you call it wrong but give no hint why it's thought to be wrong.
> >>>>>>>
> >>>>>>> My guess is that something different from zero makes it easier to
> >>>>>>> spot
> >>>>>>> the problem in user space, as 0 is not uncommon (holes,
> >>>>>>> prealloced extents)
> >>>>>>> and may get unnoticed by applications/users.
> >>>>>>
> >>>>>> OK, that makes some sense.
> >>>>>>
> >>>>>> But shouldn't user space tool get an -EIO directly?
> >>>>>
> >>>>> It should.
> >>>>>
> >>>>> But even if applications get -EIO, they may often ignore return
> >>>>> values.
> >>>>> It's their fault, but if we can make it less likely that errors are
> >>>>> not noticed,
> >>>>> the better. I think we all did often, ignore all or just some
> >>>>> return values
> >>>>> from read(), write(), open(), etc.
> >>>>>
> >>>>> One recent example is the MariaDB case with io-uring. They were
> >>>>> reporting
> >>>>> corruption to the users, but the problem is that didn't properly check
> >>>>> return values, ignoring partial reads and treating them as success:
> >>>>>
> >>>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
> >>>>>
> >>>>>
> >>>>> The data was fine, not corrupted, they just didn't deal with
> >>>>> partial reads
> >>>>> and then read the remaining data when a read returns less data than
> >>>>> expected.
> >>>>
> >>>> Then can we slightly improve the filling pattern?
> >>>>
> >>>> Instead of 0x01, introduce some poison value?
> >>>
> >>> Why isn't 0x01 a good enough value?
> >>>
> >>> A long range of 0x01 values is certainly unexpected in text files,
> >>> and very likely
> >>> on binary formats as well. Or do you think there's some case where
> >>> long sequences
> >>> of 0x01 are common and not unexpected?
> >>>
> >>>>
> >>>> And of course, change the lable "zeroit" to something like "poise_it"?
> >>>
> >>> "poison_it", poise has a very different and unrelated meaning in
> >>> English.
> >>
> >> It's also worth considering if we should change the page content at all.
> >>
> >> Adding a poison value makes it easier to detect the corruption, both for
> >> developers and for sloppy applications that don't check error values.
> >>
> >> However people often want to still have access to the corrupted data,
> >> they
> >> can tolerate a few corrupted bytes and find the remaining useful. This
> >> has
> >> been requested a few times in the past.
> > 
> > This looks more favorable.
> > 
> > And I didn't think the change would break anything.
> > 
> > For proper user space programs checking the error, they know it's an
> > -EIO and will error out.
> > 
> > For bad programs not checking the error, it will just read the corrupted
> > data, and may even pass its internal sanity checks (if the corrupted
> > bytes are not in some critical part).
> > 
> > Or is there something we haven't taken into consideration?
> 
> Well.... If an error occurred something has gone wrong. It could be a
> simple bit flip in the storage itself, or it could be something else.
> The something else could be a software or firmware bug, or a memory
> corruption or memory controller power fluctuation blip, or .... I guess
> it might even be possible in some architectures that the problem could
> result in page table updates validating a page that actually was never
> written to at all and could still contain some previous contents.
> 
> In a time when we worry about things like Spectre, Meltdown, Rowhammer,
> stale cache leakage, etc it may be a good security principle that data
> left over from failed operations should never be made visible to user mode.
> 
> Possibly unnecessary worrying, but then who thought that jump prediction
> would create a side-channel for exposing data that can actually be
> exploited in real life.

Hum, then a filesystem that doesn't have data checksums (the vast majority of
them), like ext4, xfs, etc, has a serious security flaw, no? You ask to read an
extent, and it returns whatever is on disk, even if it has suffered some corruption.

Shall we open a CVE, and force all filesystems to implement data checksums? ;)
We would also have to drop support for nodatacow and nodatasum from btrfs.


> 
> Graham

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
  2022-03-30 21:34                   ` Filipe Manana
@ 2022-03-30 22:29                     ` Graham Cobb
  0 siblings, 0 replies; 23+ messages in thread
From: Graham Cobb @ 2022-03-30 22:29 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Qu Wenruo, dsterba, linux-btrfs

On 30/03/2022 22:34, Filipe Manana wrote:
> On Wed, Mar 30, 2022 at 12:03:36PM +0100, Graham Cobb wrote:
>>
>> On 30/03/2022 11:42, Qu Wenruo wrote:
>>>
>>>
>>> On 2022/3/30 18:34, Filipe Manana wrote:
>>>> On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote:
>>>>> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>> On 2022/3/29 19:39, Filipe Manana wrote:
>>>>>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2022/3/29 17:57, Filipe Manana wrote:
>>>>>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
>>>>>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
>>>>>>>>>>> The original code is not really setting the memory to 0x00 but
>>>>>>>>>>> 0x01.
>>>>>>>>>>>
>>>>>>>>>>> To prevent such problem from happening, use memzero_page()
>>>>>>>>>>> instead.
>>>>>>>>>>
>>>>>>>>>> This should at least mention we think that setting it to 0 is
>>>>>>>>>> right, as
>>>>>>>>>> you call it wrong but give no hint why it's thought to be wrong.
>>>>>>>>>
>>>>>>>>> My guess is that something different from zero makes it easier to
>>>>>>>>> spot
>>>>>>>>> the problem in user space, as 0 is not uncommon (holes,
>>>>>>>>> prealloced extents)
>>>>>>>>> and may get unnoticed by applications/users.
>>>>>>>>
>>>>>>>> OK, that makes some sense.
>>>>>>>>
>>>>>>>> But shouldn't user space tool get an -EIO directly?
>>>>>>>
>>>>>>> It should.
>>>>>>>
>>>>>>> But even if applications get -EIO, they may often ignore return
>>>>>>> values.
>>>>>>> It's their fault, but if we can make it less likely that errors are
>>>>>>> not noticed,
>>>>>>> the better. I think we all did often, ignore all or just some
>>>>>>> return values
>>>>>>> from read(), write(), open(), etc.
>>>>>>>
>>>>>>> One recent example is the MariaDB case with io-uring. They were
>>>>>>> reporting
>>>>>>> corruption to the users, but the problem is that didn't properly check
>>>>>>> return values, ignoring partial reads and treating them as success:
>>>>>>>
>>>>>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
>>>>>>>
>>>>>>>
>>>>>>> The data was fine, not corrupted, they just didn't deal with
>>>>>>> partial reads
>>>>>>> and then read the remaining data when a read returns less data than
>>>>>>> expected.
>>>>>>
>>>>>> Then can we slightly improve the filling pattern?
>>>>>>
>>>>>> Instead of 0x01, introduce some poison value?
>>>>>
>>>>> Why isn't 0x01 a good enough value?
>>>>>
>>>>> A long range of 0x01 values is certainly unexpected in text files,
>>>>> and very likely
>>>>> on binary formats as well. Or do you think there's some case where
>>>>> long sequences
>>>>> of 0x01 are common and not unexpected?
>>>>>
>>>>>>
>>>>>> And of course, change the lable "zeroit" to something like "poise_it"?
>>>>>
>>>>> "poison_it", poise has a very different and unrelated meaning in
>>>>> English.
>>>>
>>>> It's also worth considering if we should change the page content at all.
>>>>
>>>> Adding a poison value makes it easier to detect the corruption, both for
>>>> developers and for sloppy applications that don't check error values.
>>>>
>>>> However people often want to still have access to the corrupted data,
>>>> they
>>>> can tolerate a few corrupted bytes and find the remaining useful. This
>>>> has
>>>> been requested a few times in the past.
>>>
>>> This looks more favorable.
>>>
>>> And I didn't think the change would break anything.
>>>
>>> For proper user space programs checking the error, they know it's an
>>> -EIO and will error out.
>>>
>>> For bad programs not checking the error, it will just read the corrupted
>>> data, and may even pass its internal sanity checks (if the corrupted
>>> bytes are not in some critical part).
>>>
>>> Or is there something we haven't taken into consideration?
>>
>> Well.... If an error occurred something has gone wrong. It could be a
>> simple bit flip in the storage itself, or it could be something else.
>> The something else could be a software or firmware bug, or a memory
>> corruption or memory controller power fluctuation blip, or .... I guess
>> it might even be possible in some architectures that the problem could
>> result in page table updates validating a page that actually was never
>> written to at all and could still contain some previous contents.
>>
>> In a time when we worry about things like Spectre, Meltdown, Rowhammer,
>> stale cache leakage, etc it may be a good security principle that data
>> left over from failed operations should never be made visible to user mode.
>>
>> Possibly unnecessary worrying, but then who thought that jump prediction
>> would create a side-channel for exposing data that can actually be
>> exploited in real life.
> 
> Hum, then a filesystem that doesn't have data checksums (the vast majority of
> them), like ext4, xfs, etc, has a serious security flaw, no? You ask to read an
> extent, and it returns whatever is on disk, even if it has suffered some corruption.
> 
> Shall we open a CVE, and force all filesystems to implement data checksums? ;)
> We would also have to drop support for nodatacow and nodatasum from btrfs.

No. There is a clear difference between cases where you don't know
whether the data is valid and cases where there has been an error
detected (a device-reported read error, a checksum error, a logic error,
or whatever).

You asked if there was anything you hadn't taken into consideration. It
is entirely up to you and the team to consider it - in cases where you
*know* an error has occurred should you refuse to return the data? I
don't know - but the decision should be deliberate and deserves a comment.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-04-05 21:15 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-03-25  9:37 [PATCH] btrfs: replace a wrong memset() with memzero_page() Qu Wenruo
2022-03-25 10:10 ` Johannes Thumshirn
2022-03-25 10:49   ` Qu Wenruo
2022-03-25 10:58     ` Johannes Thumshirn
2022-03-25 16:38 ` Christoph Hellwig
2022-03-26  1:15   ` Qu Wenruo
2022-04-05  4:58   ` Christoph Hellwig
2022-04-05  5:08     ` Qu Wenruo
2022-04-05 14:04       ` David Sterba
2022-04-05 13:59     ` David Sterba
2022-03-28 18:51 ` David Sterba
2022-03-28 18:58   ` David Sterba
2022-03-29  9:57   ` Filipe Manana
2022-03-29 10:49     ` Qu Wenruo
2022-03-29 11:39       ` Filipe Manana
2022-03-29 23:52         ` Qu Wenruo
2022-03-30  9:27           ` Filipe Manana
2022-03-30 10:34             ` Filipe Manana
2022-03-30 10:42               ` Qu Wenruo
2022-03-30 11:02                 ` Filipe Manana
2022-03-30 11:03                 ` Graham Cobb
2022-03-30 21:34                   ` Filipe Manana
2022-03-30 22:29                     ` Graham Cobb

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox