* [PATCH] btrfs: replace a wrong memset() with memzero_page()
@ 2022-03-25 9:37 Qu Wenruo
2022-03-25 10:10 ` Johannes Thumshirn
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Qu Wenruo @ 2022-03-25 9:37 UTC (permalink / raw)
To: linux-btrfs
The original code is not really setting the memory to 0x00 but 0x01.
To prevent such problem from happening, use memzero_page() instead.
Since we're here, also make @len const since it's just sectorsize.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 78a5145353e1..179d479b72e9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3272,7 +3272,7 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
char *kaddr;
- u32 len = fs_info->sectorsize;
+ const u32 len = fs_info->sectorsize;
const u32 csum_size = fs_info->csum_size;
unsigned int offset_sectors;
u8 *csum_expected;
@@ -3287,11 +3287,11 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
shash->tfm = fs_info->csum_shash;
crypto_shash_digest(shash, kaddr + pgoff, len, csum);
+ kunmap_atomic(kaddr);
if (memcmp(csum, csum_expected, csum_size))
goto zeroit;
- kunmap_atomic(kaddr);
return 0;
zeroit:
btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
@@ -3299,9 +3299,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
if (bbio->device)
btrfs_dev_stat_inc_and_print(bbio->device,
BTRFS_DEV_STAT_CORRUPTION_ERRS);
- memset(kaddr + pgoff, 1, len);
+ memzero_page(page, pgoff, len);
flush_dcache_page(page);
- kunmap_atomic(kaddr);
return -EIO;
}
--
2.35.1
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-25 9:37 [PATCH] btrfs: replace a wrong memset() with memzero_page() Qu Wenruo @ 2022-03-25 10:10 ` Johannes Thumshirn 2022-03-25 10:49 ` Qu Wenruo 2022-03-25 16:38 ` Christoph Hellwig 2022-03-28 18:51 ` David Sterba 2 siblings, 1 reply; 23+ messages in thread From: Johannes Thumshirn @ 2022-03-25 10:10 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org On 25/03/2022 10:38, Qu Wenruo wrote: > The original code is not really setting the memory to 0x00 but 0x01. > > To prevent such problem from happening, use memzero_page() instead. > > Since we're here, also make @len const since it's just sectorsize. Any idea why we're setting it to 1? It's been this way since 07157aacb1ec ("Btrfs: Add file data csums back in via hooks in the extent map code") which landed in v2.6.29. So basically since the dawn of time. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-25 10:10 ` Johannes Thumshirn @ 2022-03-25 10:49 ` Qu Wenruo 2022-03-25 10:58 ` Johannes Thumshirn 0 siblings, 1 reply; 23+ messages in thread From: Qu Wenruo @ 2022-03-25 10:49 UTC (permalink / raw) To: Johannes Thumshirn, Qu Wenruo, linux-btrfs@vger.kernel.org On 2022/3/25 18:10, Johannes Thumshirn wrote: > On 25/03/2022 10:38, Qu Wenruo wrote: >> The original code is not really setting the memory to 0x00 but 0x01. >> >> To prevent such problem from happening, use memzero_page() instead. >> >> Since we're here, also make @len const since it's just sectorsize. > > Any idea why we're setting it to 1? It's been this way since 07157aacb1ec > ("Btrfs: Add file data csums back in via hooks in the extent map code") > which landed in v2.6.29. So basically since the dawn of time. No idea at all, maybe just a typo. Thanks, Qu > > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-25 10:49 ` Qu Wenruo @ 2022-03-25 10:58 ` Johannes Thumshirn 0 siblings, 0 replies; 23+ messages in thread From: Johannes Thumshirn @ 2022-03-25 10:58 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo, linux-btrfs@vger.kernel.org On 25/03/2022 11:49, Qu Wenruo wrote: > > > On 2022/3/25 18:10, Johannes Thumshirn wrote: >> On 25/03/2022 10:38, Qu Wenruo wrote: >>> The original code is not really setting the memory to 0x00 but 0x01. >>> >>> To prevent such problem from happening, use memzero_page() instead. >>> >>> Since we're here, also make @len const since it's just sectorsize. >> >> Any idea why we're setting it to 1? It's been this way since 07157aacb1ec >> ("Btrfs: Add file data csums back in via hooks in the extent map code") >> which landed in v2.6.29. So basically since the dawn of time. > > No idea at all, maybe just a typo. Anyways, Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-25 9:37 [PATCH] btrfs: replace a wrong memset() with memzero_page() Qu Wenruo 2022-03-25 10:10 ` Johannes Thumshirn @ 2022-03-25 16:38 ` Christoph Hellwig 2022-03-26 1:15 ` Qu Wenruo 2022-04-05 4:58 ` Christoph Hellwig 2022-03-28 18:51 ` David Sterba 2 siblings, 2 replies; 23+ messages in thread From: Christoph Hellwig @ 2022-03-25 16:38 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > - u32 len = fs_info->sectorsize; > + const u32 len = fs_info->sectorsize; Why the spurious and rather pointless const annotation that is unrelated to the rest of the patch? > BTRFS_DEV_STAT_CORRUPTION_ERRS); > - memset(kaddr + pgoff, 1, len); > + memzero_page(page, pgoff, len); > flush_dcache_page(page); memzero_page already takes care of the flush_dcache_page. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-25 16:38 ` Christoph Hellwig @ 2022-03-26 1:15 ` Qu Wenruo 2022-04-05 4:58 ` Christoph Hellwig 1 sibling, 0 replies; 23+ messages in thread From: Qu Wenruo @ 2022-03-26 1:15 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-btrfs On 2022/3/26 00:38, Christoph Hellwig wrote: > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: >> - u32 len = fs_info->sectorsize; >> + const u32 len = fs_info->sectorsize; > > Why the spurious and rather pointless const annotation that is unrelated > to the rest of the patch? Because we prefer const for constant variables in btrfs code. As we have experienced some bugs in the past related to reusing some variables. > >> BTRFS_DEV_STAT_CORRUPTION_ERRS); >> - memset(kaddr + pgoff, 1, len); >> + memzero_page(page, pgoff, len); >> flush_dcache_page(page); > > memzero_page already takes care of the flush_dcache_page. > Oh, indeed. Thanks, Qu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-25 16:38 ` Christoph Hellwig 2022-03-26 1:15 ` Qu Wenruo @ 2022-04-05 4:58 ` Christoph Hellwig 2022-04-05 5:08 ` Qu Wenruo 2022-04-05 13:59 ` David Sterba 1 sibling, 2 replies; 23+ messages in thread From: Christoph Hellwig @ 2022-04-05 4:58 UTC (permalink / raw) To: Qu Wenruo, David Sterba; +Cc: linux-btrfs On Fri, Mar 25, 2022 at 09:38:16AM -0700, Christoph Hellwig wrote: > > - memset(kaddr + pgoff, 1, len); > > + memzero_page(page, pgoff, len); > > flush_dcache_page(page); > > memzero_page already takes care of the flush_dcache_page. David, you've picked this up with the stay flush_dcache_page left in place. Plase try to fix it up instead of spreading the weird cargo cult flush_dcache_page calls. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-04-05 4:58 ` Christoph Hellwig @ 2022-04-05 5:08 ` Qu Wenruo 2022-04-05 14:04 ` David Sterba 2022-04-05 13:59 ` David Sterba 1 sibling, 1 reply; 23+ messages in thread From: Qu Wenruo @ 2022-04-05 5:08 UTC (permalink / raw) To: Christoph Hellwig, David Sterba; +Cc: linux-btrfs On 2022/4/5 12:58, Christoph Hellwig wrote: > On Fri, Mar 25, 2022 at 09:38:16AM -0700, Christoph Hellwig wrote: >>> - memset(kaddr + pgoff, 1, len); >>> + memzero_page(page, pgoff, len); >>> flush_dcache_page(page); >> >> memzero_page already takes care of the flush_dcache_page. > > David, you've picked this up with the stay flush_dcache_page left in > place. Plase try to fix it up instead of spreading the weird cargo cult > flush_dcache_page calls. > Please drop this patch, as discussed with Filipe, the memset() value 0x01 could be a poison value to distinguish from plain 0x00. Furthermore, we are not sure if we even want to zeroing out/poison the content. Thanks, Qu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-04-05 5:08 ` Qu Wenruo @ 2022-04-05 14:04 ` David Sterba 0 siblings, 0 replies; 23+ messages in thread From: David Sterba @ 2022-04-05 14:04 UTC (permalink / raw) To: Qu Wenruo; +Cc: Christoph Hellwig, David Sterba, linux-btrfs On Tue, Apr 05, 2022 at 01:08:41PM +0800, Qu Wenruo wrote: > > > On 2022/4/5 12:58, Christoph Hellwig wrote: > > On Fri, Mar 25, 2022 at 09:38:16AM -0700, Christoph Hellwig wrote: > >>> - memset(kaddr + pgoff, 1, len); > >>> + memzero_page(page, pgoff, len); > >>> flush_dcache_page(page); > >> > >> memzero_page already takes care of the flush_dcache_page. > > > > David, you've picked this up with the stay flush_dcache_page left in > > place. Plase try to fix it up instead of spreading the weird cargo cult > > flush_dcache_page calls. > > Please drop this patch, as discussed with Filipe, the memset() value > 0x01 could be a poison value to distinguish from plain 0x00. > Furthermore, we are not sure if we even want to zeroing out/poison the > content. I've read the discussion and have been thinking what should we do here, I'd be for doing memset to 0 because 0x1 is quite arbitrary and does not follow any established pattern for poison values, if there's anything like that when returning data from kernel to userspace. I find zeros OK here. Arguably "It's been 0x1 forever so we should not change it" for backward compatibility reasons could apply, but this is not an interface that applications use, the error code is the interface. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-04-05 4:58 ` Christoph Hellwig 2022-04-05 5:08 ` Qu Wenruo @ 2022-04-05 13:59 ` David Sterba 1 sibling, 0 replies; 23+ messages in thread From: David Sterba @ 2022-04-05 13:59 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Qu Wenruo, David Sterba, linux-btrfs On Mon, Apr 04, 2022 at 09:58:26PM -0700, Christoph Hellwig wrote: > On Fri, Mar 25, 2022 at 09:38:16AM -0700, Christoph Hellwig wrote: > > > - memset(kaddr + pgoff, 1, len); > > > + memzero_page(page, pgoff, len); > > > flush_dcache_page(page); > > > > memzero_page already takes care of the flush_dcache_page. > > David, you've picked this up with the stay flush_dcache_page left in > place. Plase try to fix it up instead of spreading the weird cargo cult > flush_dcache_page calls. Good point, flush_dcache_page is in memzero_page so it should have been removed within the patch. I've grepped and there are still 14 calls, some of them next to memzero_page so that'll go away too. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-25 9:37 [PATCH] btrfs: replace a wrong memset() with memzero_page() Qu Wenruo 2022-03-25 10:10 ` Johannes Thumshirn 2022-03-25 16:38 ` Christoph Hellwig @ 2022-03-28 18:51 ` David Sterba 2022-03-28 18:58 ` David Sterba 2022-03-29 9:57 ` Filipe Manana 2 siblings, 2 replies; 23+ messages in thread From: David Sterba @ 2022-03-28 18:51 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > The original code is not really setting the memory to 0x00 but 0x01. > > To prevent such problem from happening, use memzero_page() instead. This should at least mention we think that setting it to 0 is right, as you call it wrong but give no hint why it's thought to be wrong. > Since we're here, also make @len const since it's just sectorsize. Please don't do that, adding const is fine when the line gets touched but otherwise adding it to an unrelated fix is not what I want to encourage. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-28 18:51 ` David Sterba @ 2022-03-28 18:58 ` David Sterba 2022-03-29 9:57 ` Filipe Manana 1 sibling, 0 replies; 23+ messages in thread From: David Sterba @ 2022-03-28 18:58 UTC (permalink / raw) To: David Sterba; +Cc: Qu Wenruo, linux-btrfs On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > > The original code is not really setting the memory to 0x00 but 0x01. > > > > To prevent such problem from happening, use memzero_page() instead. > > This should at least mention we think that setting it to 0 is right, as > you call it wrong but give no hint why it's thought to be wrong. > > > Since we're here, also make @len const since it's just sectorsize. > > Please don't do that, adding const is fine when the line gets touched > but otherwise adding it to an unrelated fix is not what I want to > encourage. Applied with that hunk deleted and with an updated changelog. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-28 18:51 ` David Sterba 2022-03-28 18:58 ` David Sterba @ 2022-03-29 9:57 ` Filipe Manana 2022-03-29 10:49 ` Qu Wenruo 1 sibling, 1 reply; 23+ messages in thread From: Filipe Manana @ 2022-03-29 9:57 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > > The original code is not really setting the memory to 0x00 but 0x01. > > > > To prevent such problem from happening, use memzero_page() instead. > > This should at least mention we think that setting it to 0 is right, as > you call it wrong but give no hint why it's thought to be wrong. My guess is that something different from zero makes it easier to spot the problem in user space, as 0 is not uncommon (holes, prealloced extents) and may get unnoticed by applications/users. I don't see a good reason to change this behaviour. Maybe it's just the label name 'zeroit' that makes it confusing. > > > Since we're here, also make @len const since it's just sectorsize. > > Please don't do that, adding const is fine when the line gets touched > but otherwise adding it to an unrelated fix is not what I want to > encourage. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-29 9:57 ` Filipe Manana @ 2022-03-29 10:49 ` Qu Wenruo 2022-03-29 11:39 ` Filipe Manana 0 siblings, 1 reply; 23+ messages in thread From: Qu Wenruo @ 2022-03-29 10:49 UTC (permalink / raw) To: Filipe Manana, dsterba, linux-btrfs On 2022/3/29 17:57, Filipe Manana wrote: > On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: >> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: >>> The original code is not really setting the memory to 0x00 but 0x01. >>> >>> To prevent such problem from happening, use memzero_page() instead. >> >> This should at least mention we think that setting it to 0 is right, as >> you call it wrong but give no hint why it's thought to be wrong. > > My guess is that something different from zero makes it easier to spot > the problem in user space, as 0 is not uncommon (holes, prealloced extents) > and may get unnoticed by applications/users. OK, that makes some sense. But shouldn't user space tool get an -EIO directly? As the corrupted range won't have PageUptodate set anyway. Thanks, Qu > > I don't see a good reason to change this behaviour. Maybe it's just the > label name 'zeroit' that makes it confusing. > >> >>> Since we're here, also make @len const since it's just sectorsize. >> >> Please don't do that, adding const is fine when the line gets touched >> but otherwise adding it to an unrelated fix is not what I want to >> encourage. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-29 10:49 ` Qu Wenruo @ 2022-03-29 11:39 ` Filipe Manana 2022-03-29 23:52 ` Qu Wenruo 0 siblings, 1 reply; 23+ messages in thread From: Filipe Manana @ 2022-03-29 11:39 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, linux-btrfs On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: > > > On 2022/3/29 17:57, Filipe Manana wrote: > > On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: > > > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > > > > The original code is not really setting the memory to 0x00 but 0x01. > > > > > > > > To prevent such problem from happening, use memzero_page() instead. > > > > > > This should at least mention we think that setting it to 0 is right, as > > > you call it wrong but give no hint why it's thought to be wrong. > > > > My guess is that something different from zero makes it easier to spot > > the problem in user space, as 0 is not uncommon (holes, prealloced extents) > > and may get unnoticed by applications/users. > > OK, that makes some sense. > > But shouldn't user space tool get an -EIO directly? It should. But even if applications get -EIO, they may often ignore return values. It's their fault, but if we can make it less likely that errors are not noticed, the better. I think we all did often, ignore all or just some return values from read(), write(), open(), etc. One recent example is the MariaDB case with io-uring. They were reporting corruption to the users, but the problem is that didn't properly check return values, ignoring partial reads and treating them as success: https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 The data was fine, not corrupted, they just didn't deal with partial reads and then read the remaining data when a read returns less data than expected. > > As the corrupted range won't have PageUptodate set anyway. > > Thanks, > Qu > > > > I don't see a good reason to change this behaviour. Maybe it's just the > > label name 'zeroit' that makes it confusing. > > > > > > > > Since we're here, also make @len const since it's just sectorsize. > > > > > > Please don't do that, adding const is fine when the line gets touched > > > but otherwise adding it to an unrelated fix is not what I want to > > > encourage. > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-29 11:39 ` Filipe Manana @ 2022-03-29 23:52 ` Qu Wenruo 2022-03-30 9:27 ` Filipe Manana 0 siblings, 1 reply; 23+ messages in thread From: Qu Wenruo @ 2022-03-29 23:52 UTC (permalink / raw) To: Filipe Manana; +Cc: dsterba, linux-btrfs On 2022/3/29 19:39, Filipe Manana wrote: > On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: >> >> >> On 2022/3/29 17:57, Filipe Manana wrote: >>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: >>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: >>>>> The original code is not really setting the memory to 0x00 but 0x01. >>>>> >>>>> To prevent such problem from happening, use memzero_page() instead. >>>> >>>> This should at least mention we think that setting it to 0 is right, as >>>> you call it wrong but give no hint why it's thought to be wrong. >>> >>> My guess is that something different from zero makes it easier to spot >>> the problem in user space, as 0 is not uncommon (holes, prealloced extents) >>> and may get unnoticed by applications/users. >> >> OK, that makes some sense. >> >> But shouldn't user space tool get an -EIO directly? > > It should. > > But even if applications get -EIO, they may often ignore return values. > It's their fault, but if we can make it less likely that errors are not noticed, > the better. I think we all did often, ignore all or just some return values > from read(), write(), open(), etc. > > One recent example is the MariaDB case with io-uring. They were reporting > corruption to the users, but the problem is that didn't properly check > return values, ignoring partial reads and treating them as success: > > https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 > > The data was fine, not corrupted, they just didn't deal with partial reads > and then read the remaining data when a read returns less data than expected. Then can we slightly improve the filling pattern? Instead of 0x01, introduce some poison value? And of course, change the lable "zeroit" to something like "poise_it"? Thanks, Qu > > > >> >> As the corrupted range won't have PageUptodate set anyway. >> >> Thanks, >> Qu >>> >>> I don't see a good reason to change this behaviour. Maybe it's just the >>> label name 'zeroit' that makes it confusing. > >>>> >>>>> Since we're here, also make @len const since it's just sectorsize. >>>> >>>> Please don't do that, adding const is fine when the line gets touched >>>> but otherwise adding it to an unrelated fix is not what I want to >>>> encourage. >>> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-29 23:52 ` Qu Wenruo @ 2022-03-30 9:27 ` Filipe Manana 2022-03-30 10:34 ` Filipe Manana 0 siblings, 1 reply; 23+ messages in thread From: Filipe Manana @ 2022-03-30 9:27 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, linux-btrfs On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote: > > > On 2022/3/29 19:39, Filipe Manana wrote: > > On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: > > > > > > > > > On 2022/3/29 17:57, Filipe Manana wrote: > > > > On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: > > > > > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > > > > > > The original code is not really setting the memory to 0x00 but 0x01. > > > > > > > > > > > > To prevent such problem from happening, use memzero_page() instead. > > > > > > > > > > This should at least mention we think that setting it to 0 is right, as > > > > > you call it wrong but give no hint why it's thought to be wrong. > > > > > > > > My guess is that something different from zero makes it easier to spot > > > > the problem in user space, as 0 is not uncommon (holes, prealloced extents) > > > > and may get unnoticed by applications/users. > > > > > > OK, that makes some sense. > > > > > > But shouldn't user space tool get an -EIO directly? > > > > It should. > > > > But even if applications get -EIO, they may often ignore return values. > > It's their fault, but if we can make it less likely that errors are not noticed, > > the better. I think we all did often, ignore all or just some return values > > from read(), write(), open(), etc. > > > > One recent example is the MariaDB case with io-uring. They were reporting > > corruption to the users, but the problem is that didn't properly check > > return values, ignoring partial reads and treating them as success: > > > > https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 > > > > The data was fine, not corrupted, they just didn't deal with partial reads > > and then read the remaining data when a read returns less data than expected. > > Then can we slightly improve the filling pattern? > > Instead of 0x01, introduce some poison value? Why isn't 0x01 a good enough value? A long range of 0x01 values is certainly unexpected in text files, and very likely on binary formats as well. Or do you think there's some case where long sequences of 0x01 are common and not unexpected? > > And of course, change the lable "zeroit" to something like "poise_it"? "poison_it", poise has a very different and unrelated meaning in English. > > Thanks, > Qu > > > > > > > > > > > > As the corrupted range won't have PageUptodate set anyway. > > > > > > Thanks, > > > Qu > > > > > > > > I don't see a good reason to change this behaviour. Maybe it's just the > > > > label name 'zeroit' that makes it confusing. > > > > > > > > > > > > Since we're here, also make @len const since it's just sectorsize. > > > > > > > > > > Please don't do that, adding const is fine when the line gets touched > > > > > but otherwise adding it to an unrelated fix is not what I want to > > > > > encourage. > > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-30 9:27 ` Filipe Manana @ 2022-03-30 10:34 ` Filipe Manana 2022-03-30 10:42 ` Qu Wenruo 0 siblings, 1 reply; 23+ messages in thread From: Filipe Manana @ 2022-03-30 10:34 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, linux-btrfs On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote: > On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote: > > > > > > On 2022/3/29 19:39, Filipe Manana wrote: > > > On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: > > > > > > > > > > > > On 2022/3/29 17:57, Filipe Manana wrote: > > > > > On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: > > > > > > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > > > > > > > The original code is not really setting the memory to 0x00 but 0x01. > > > > > > > > > > > > > > To prevent such problem from happening, use memzero_page() instead. > > > > > > > > > > > > This should at least mention we think that setting it to 0 is right, as > > > > > > you call it wrong but give no hint why it's thought to be wrong. > > > > > > > > > > My guess is that something different from zero makes it easier to spot > > > > > the problem in user space, as 0 is not uncommon (holes, prealloced extents) > > > > > and may get unnoticed by applications/users. > > > > > > > > OK, that makes some sense. > > > > > > > > But shouldn't user space tool get an -EIO directly? > > > > > > It should. > > > > > > But even if applications get -EIO, they may often ignore return values. > > > It's their fault, but if we can make it less likely that errors are not noticed, > > > the better. I think we all did often, ignore all or just some return values > > > from read(), write(), open(), etc. > > > > > > One recent example is the MariaDB case with io-uring. They were reporting > > > corruption to the users, but the problem is that didn't properly check > > > return values, ignoring partial reads and treating them as success: > > > > > > https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 > > > > > > The data was fine, not corrupted, they just didn't deal with partial reads > > > and then read the remaining data when a read returns less data than expected. > > > > Then can we slightly improve the filling pattern? > > > > Instead of 0x01, introduce some poison value? > > Why isn't 0x01 a good enough value? > > A long range of 0x01 values is certainly unexpected in text files, and very likely > on binary formats as well. Or do you think there's some case where long sequences > of 0x01 are common and not unexpected? > > > > > And of course, change the lable "zeroit" to something like "poise_it"? > > "poison_it", poise has a very different and unrelated meaning in English. It's also worth considering if we should change the page content at all. Adding a poison value makes it easier to detect the corruption, both for developers and for sloppy applications that don't check error values. However people often want to still have access to the corrupted data, they can tolerate a few corrupted bytes and find the remaining useful. This has been requested a few times in the past. Thanks. > > > > > Thanks, > > Qu > > > > > > > > > > > > > > > > > As the corrupted range won't have PageUptodate set anyway. > > > > > > > > Thanks, > > > > Qu > > > > > > > > > > I don't see a good reason to change this behaviour. Maybe it's just the > > > > > label name 'zeroit' that makes it confusing. > > > > > > > > > > > > > > Since we're here, also make @len const since it's just sectorsize. > > > > > > > > > > > > Please don't do that, adding const is fine when the line gets touched > > > > > > but otherwise adding it to an unrelated fix is not what I want to > > > > > > encourage. > > > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-30 10:34 ` Filipe Manana @ 2022-03-30 10:42 ` Qu Wenruo 2022-03-30 11:02 ` Filipe Manana 2022-03-30 11:03 ` Graham Cobb 0 siblings, 2 replies; 23+ messages in thread From: Qu Wenruo @ 2022-03-30 10:42 UTC (permalink / raw) To: Filipe Manana; +Cc: dsterba, linux-btrfs On 2022/3/30 18:34, Filipe Manana wrote: > On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote: >> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote: >>> >>> >>> On 2022/3/29 19:39, Filipe Manana wrote: >>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: >>>>> >>>>> >>>>> On 2022/3/29 17:57, Filipe Manana wrote: >>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: >>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: >>>>>>>> The original code is not really setting the memory to 0x00 but 0x01. >>>>>>>> >>>>>>>> To prevent such problem from happening, use memzero_page() instead. >>>>>>> >>>>>>> This should at least mention we think that setting it to 0 is right, as >>>>>>> you call it wrong but give no hint why it's thought to be wrong. >>>>>> >>>>>> My guess is that something different from zero makes it easier to spot >>>>>> the problem in user space, as 0 is not uncommon (holes, prealloced extents) >>>>>> and may get unnoticed by applications/users. >>>>> >>>>> OK, that makes some sense. >>>>> >>>>> But shouldn't user space tool get an -EIO directly? >>>> >>>> It should. >>>> >>>> But even if applications get -EIO, they may often ignore return values. >>>> It's their fault, but if we can make it less likely that errors are not noticed, >>>> the better. I think we all did often, ignore all or just some return values >>>> from read(), write(), open(), etc. >>>> >>>> One recent example is the MariaDB case with io-uring. They were reporting >>>> corruption to the users, but the problem is that didn't properly check >>>> return values, ignoring partial reads and treating them as success: >>>> >>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 >>>> >>>> The data was fine, not corrupted, they just didn't deal with partial reads >>>> and then read the remaining data when a read returns less data than expected. >>> >>> Then can we slightly improve the filling pattern? >>> >>> Instead of 0x01, introduce some poison value? >> >> Why isn't 0x01 a good enough value? >> >> A long range of 0x01 values is certainly unexpected in text files, and very likely >> on binary formats as well. Or do you think there's some case where long sequences >> of 0x01 are common and not unexpected? >> >>> >>> And of course, change the lable "zeroit" to something like "poise_it"? >> >> "poison_it", poise has a very different and unrelated meaning in English. > > It's also worth considering if we should change the page content at all. > > Adding a poison value makes it easier to detect the corruption, both for > developers and for sloppy applications that don't check error values. > > However people often want to still have access to the corrupted data, they > can tolerate a few corrupted bytes and find the remaining useful. This has > been requested a few times in the past. This looks more favorable. And I didn't think the change would break anything. For proper user space programs checking the error, they know it's an -EIO and will error out. For bad programs not checking the error, it will just read the corrupted data, and may even pass its internal sanity checks (if the corrupted bytes are not in some critical part). Or is there something we haven't taken into consideration? Thanks, Qu > > Thanks. > >> >>> >>> Thanks, >>> Qu >>>> >>>> >>>> >>>>> >>>>> As the corrupted range won't have PageUptodate set anyway. >>>>> >>>>> Thanks, >>>>> Qu >>>>>> >>>>>> I don't see a good reason to change this behaviour. Maybe it's just the >>>>>> label name 'zeroit' that makes it confusing. > >>>>>>> >>>>>>>> Since we're here, also make @len const since it's just sectorsize. >>>>>>> >>>>>>> Please don't do that, adding const is fine when the line gets touched >>>>>>> but otherwise adding it to an unrelated fix is not what I want to >>>>>>> encourage. >>>>>> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-30 10:42 ` Qu Wenruo @ 2022-03-30 11:02 ` Filipe Manana 2022-03-30 11:03 ` Graham Cobb 1 sibling, 0 replies; 23+ messages in thread From: Filipe Manana @ 2022-03-30 11:02 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, linux-btrfs On Wed, Mar 30, 2022 at 06:42:39PM +0800, Qu Wenruo wrote: > > > On 2022/3/30 18:34, Filipe Manana wrote: > > On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote: > > > On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote: > > > > > > > > > > > > On 2022/3/29 19:39, Filipe Manana wrote: > > > > > On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: > > > > > > > > > > > > > > > > > > On 2022/3/29 17:57, Filipe Manana wrote: > > > > > > > On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: > > > > > > > > On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > > > > > > > > > The original code is not really setting the memory to 0x00 but 0x01. > > > > > > > > > > > > > > > > > > To prevent such problem from happening, use memzero_page() instead. > > > > > > > > > > > > > > > > This should at least mention we think that setting it to 0 is right, as > > > > > > > > you call it wrong but give no hint why it's thought to be wrong. > > > > > > > > > > > > > > My guess is that something different from zero makes it easier to spot > > > > > > > the problem in user space, as 0 is not uncommon (holes, prealloced extents) > > > > > > > and may get unnoticed by applications/users. > > > > > > > > > > > > OK, that makes some sense. > > > > > > > > > > > > But shouldn't user space tool get an -EIO directly? > > > > > > > > > > It should. > > > > > > > > > > But even if applications get -EIO, they may often ignore return values. > > > > > It's their fault, but if we can make it less likely that errors are not noticed, > > > > > the better. I think we all did often, ignore all or just some return values > > > > > from read(), write(), open(), etc. > > > > > > > > > > One recent example is the MariaDB case with io-uring. They were reporting > > > > > corruption to the users, but the problem is that didn't properly check > > > > > return values, ignoring partial reads and treating them as success: > > > > > > > > > > https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 > > > > > > > > > > The data was fine, not corrupted, they just didn't deal with partial reads > > > > > and then read the remaining data when a read returns less data than expected. > > > > > > > > Then can we slightly improve the filling pattern? > > > > > > > > Instead of 0x01, introduce some poison value? > > > > > > Why isn't 0x01 a good enough value? > > > > > > A long range of 0x01 values is certainly unexpected in text files, and very likely > > > on binary formats as well. Or do you think there's some case where long sequences > > > of 0x01 are common and not unexpected? > > > > > > > > > > > And of course, change the lable "zeroit" to something like "poise_it"? > > > > > > "poison_it", poise has a very different and unrelated meaning in English. > > > > It's also worth considering if we should change the page content at all. > > > > Adding a poison value makes it easier to detect the corruption, both for > > developers and for sloppy applications that don't check error values. > > > > However people often want to still have access to the corrupted data, they > > can tolerate a few corrupted bytes and find the remaining useful. This has > > been requested a few times in the past. > > This looks more favorable. > > And I didn't think the change would break anything. > > For proper user space programs checking the error, they know it's an > -EIO and will error out. > > For bad programs not checking the error, it will just read the corrupted > data, and may even pass its internal sanity checks (if the corrupted > bytes are not in some critical part). > > Or is there something we haven't taken into consideration? Apart from that, applications ignoring error code returned from read()/pread()/etc, I can't think about anything else to worry about (at least for now). I believe the poison value was intended only for the early days of development, or at least it's something I would do myself because it makes sense and I've seen it elsewhere too. But there are no comments in the code or change logs to validate that assumption. Certainly zeroing out the conent is not useful, so either a poison value or don't change the page data at all. We should really have a test for this - for example use a single data profile, corrupt 1 byte of data, mount the fs, read the data, check we get -EIO and verify that the page content is either the data we have on disk or the poison value. > > Thanks, > Qu > > > > > Thanks. > > > > > > > > > > > > > Thanks, > > > > Qu > > > > > > > > > > > > > > > > > > > > > > > > > > > As the corrupted range won't have PageUptodate set anyway. > > > > > > > > > > > > Thanks, > > > > > > Qu > > > > > > > > > > > > > > I don't see a good reason to change this behaviour. Maybe it's just the > > > > > > > label name 'zeroit' that makes it confusing. > > > > > > > > > > > > > > > > > > Since we're here, also make @len const since it's just sectorsize. > > > > > > > > > > > > > > > > Please don't do that, adding const is fine when the line gets touched > > > > > > > > but otherwise adding it to an unrelated fix is not what I want to > > > > > > > > encourage. > > > > > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-30 10:42 ` Qu Wenruo 2022-03-30 11:02 ` Filipe Manana @ 2022-03-30 11:03 ` Graham Cobb 2022-03-30 21:34 ` Filipe Manana 1 sibling, 1 reply; 23+ messages in thread From: Graham Cobb @ 2022-03-30 11:03 UTC (permalink / raw) To: Qu Wenruo, Filipe Manana; +Cc: dsterba, linux-btrfs On 30/03/2022 11:42, Qu Wenruo wrote: > > > On 2022/3/30 18:34, Filipe Manana wrote: >> On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote: >>> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote: >>>> >>>> >>>> On 2022/3/29 19:39, Filipe Manana wrote: >>>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: >>>>>> >>>>>> >>>>>> On 2022/3/29 17:57, Filipe Manana wrote: >>>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: >>>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: >>>>>>>>> The original code is not really setting the memory to 0x00 but >>>>>>>>> 0x01. >>>>>>>>> >>>>>>>>> To prevent such problem from happening, use memzero_page() >>>>>>>>> instead. >>>>>>>> >>>>>>>> This should at least mention we think that setting it to 0 is >>>>>>>> right, as >>>>>>>> you call it wrong but give no hint why it's thought to be wrong. >>>>>>> >>>>>>> My guess is that something different from zero makes it easier to >>>>>>> spot >>>>>>> the problem in user space, as 0 is not uncommon (holes, >>>>>>> prealloced extents) >>>>>>> and may get unnoticed by applications/users. >>>>>> >>>>>> OK, that makes some sense. >>>>>> >>>>>> But shouldn't user space tool get an -EIO directly? >>>>> >>>>> It should. >>>>> >>>>> But even if applications get -EIO, they may often ignore return >>>>> values. >>>>> It's their fault, but if we can make it less likely that errors are >>>>> not noticed, >>>>> the better. I think we all did often, ignore all or just some >>>>> return values >>>>> from read(), write(), open(), etc. >>>>> >>>>> One recent example is the MariaDB case with io-uring. They were >>>>> reporting >>>>> corruption to the users, but the problem is that didn't properly check >>>>> return values, ignoring partial reads and treating them as success: >>>>> >>>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 >>>>> >>>>> >>>>> The data was fine, not corrupted, they just didn't deal with >>>>> partial reads >>>>> and then read the remaining data when a read returns less data than >>>>> expected. >>>> >>>> Then can we slightly improve the filling pattern? >>>> >>>> Instead of 0x01, introduce some poison value? >>> >>> Why isn't 0x01 a good enough value? >>> >>> A long range of 0x01 values is certainly unexpected in text files, >>> and very likely >>> on binary formats as well. Or do you think there's some case where >>> long sequences >>> of 0x01 are common and not unexpected? >>> >>>> >>>> And of course, change the lable "zeroit" to something like "poise_it"? >>> >>> "poison_it", poise has a very different and unrelated meaning in >>> English. >> >> It's also worth considering if we should change the page content at all. >> >> Adding a poison value makes it easier to detect the corruption, both for >> developers and for sloppy applications that don't check error values. >> >> However people often want to still have access to the corrupted data, >> they >> can tolerate a few corrupted bytes and find the remaining useful. This >> has >> been requested a few times in the past. > > This looks more favorable. > > And I didn't think the change would break anything. > > For proper user space programs checking the error, they know it's an > -EIO and will error out. > > For bad programs not checking the error, it will just read the corrupted > data, and may even pass its internal sanity checks (if the corrupted > bytes are not in some critical part). > > Or is there something we haven't taken into consideration? Well.... If an error occurred something has gone wrong. It could be a simple bit flip in the storage itself, or it could be something else. The something else could be a software or firmware bug, or a memory corruption or memory controller power fluctuation blip, or .... I guess it might even be possible in some architectures that the problem could result in page table updates validating a page that actually was never written to at all and could still contain some previous contents. In a time when we worry about things like Spectre, Meltdown, Rowhammer, stale cache leakage, etc it may be a good security principle that data left over from failed operations should never be made visible to user mode. Possibly unnecessary worrying, but then who thought that jump prediction would create a side-channel for exposing data that can actually be exploited in real life. Graham ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-30 11:03 ` Graham Cobb @ 2022-03-30 21:34 ` Filipe Manana 2022-03-30 22:29 ` Graham Cobb 0 siblings, 1 reply; 23+ messages in thread From: Filipe Manana @ 2022-03-30 21:34 UTC (permalink / raw) To: Graham Cobb; +Cc: Qu Wenruo, dsterba, linux-btrfs On Wed, Mar 30, 2022 at 12:03:36PM +0100, Graham Cobb wrote: > > On 30/03/2022 11:42, Qu Wenruo wrote: > > > > > > On 2022/3/30 18:34, Filipe Manana wrote: > >> On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote: > >>> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote: > >>>> > >>>> > >>>> On 2022/3/29 19:39, Filipe Manana wrote: > >>>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: > >>>>>> > >>>>>> > >>>>>> On 2022/3/29 17:57, Filipe Manana wrote: > >>>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: > >>>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > >>>>>>>>> The original code is not really setting the memory to 0x00 but > >>>>>>>>> 0x01. > >>>>>>>>> > >>>>>>>>> To prevent such problem from happening, use memzero_page() > >>>>>>>>> instead. > >>>>>>>> > >>>>>>>> This should at least mention we think that setting it to 0 is > >>>>>>>> right, as > >>>>>>>> you call it wrong but give no hint why it's thought to be wrong. > >>>>>>> > >>>>>>> My guess is that something different from zero makes it easier to > >>>>>>> spot > >>>>>>> the problem in user space, as 0 is not uncommon (holes, > >>>>>>> prealloced extents) > >>>>>>> and may get unnoticed by applications/users. > >>>>>> > >>>>>> OK, that makes some sense. > >>>>>> > >>>>>> But shouldn't user space tool get an -EIO directly? > >>>>> > >>>>> It should. > >>>>> > >>>>> But even if applications get -EIO, they may often ignore return > >>>>> values. > >>>>> It's their fault, but if we can make it less likely that errors are > >>>>> not noticed, > >>>>> the better. I think we all did often, ignore all or just some > >>>>> return values > >>>>> from read(), write(), open(), etc. > >>>>> > >>>>> One recent example is the MariaDB case with io-uring. They were > >>>>> reporting > >>>>> corruption to the users, but the problem is that didn't properly check > >>>>> return values, ignoring partial reads and treating them as success: > >>>>> > >>>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 > >>>>> > >>>>> > >>>>> The data was fine, not corrupted, they just didn't deal with > >>>>> partial reads > >>>>> and then read the remaining data when a read returns less data than > >>>>> expected. > >>>> > >>>> Then can we slightly improve the filling pattern? > >>>> > >>>> Instead of 0x01, introduce some poison value? > >>> > >>> Why isn't 0x01 a good enough value? > >>> > >>> A long range of 0x01 values is certainly unexpected in text files, > >>> and very likely > >>> on binary formats as well. Or do you think there's some case where > >>> long sequences > >>> of 0x01 are common and not unexpected? > >>> > >>>> > >>>> And of course, change the lable "zeroit" to something like "poise_it"? > >>> > >>> "poison_it", poise has a very different and unrelated meaning in > >>> English. > >> > >> It's also worth considering if we should change the page content at all. > >> > >> Adding a poison value makes it easier to detect the corruption, both for > >> developers and for sloppy applications that don't check error values. > >> > >> However people often want to still have access to the corrupted data, > >> they > >> can tolerate a few corrupted bytes and find the remaining useful. This > >> has > >> been requested a few times in the past. > > > > This looks more favorable. > > > > And I didn't think the change would break anything. > > > > For proper user space programs checking the error, they know it's an > > -EIO and will error out. > > > > For bad programs not checking the error, it will just read the corrupted > > data, and may even pass its internal sanity checks (if the corrupted > > bytes are not in some critical part). > > > > Or is there something we haven't taken into consideration? > > Well.... If an error occurred something has gone wrong. It could be a > simple bit flip in the storage itself, or it could be something else. > The something else could be a software or firmware bug, or a memory > corruption or memory controller power fluctuation blip, or .... I guess > it might even be possible in some architectures that the problem could > result in page table updates validating a page that actually was never > written to at all and could still contain some previous contents. > > In a time when we worry about things like Spectre, Meltdown, Rowhammer, > stale cache leakage, etc it may be a good security principle that data > left over from failed operations should never be made visible to user mode. > > Possibly unnecessary worrying, but then who thought that jump prediction > would create a side-channel for exposing data that can actually be > exploited in real life. Hum, then a filesystem that doesn't have data checksums (the vast majority of them), like ext4, xfs, etc, has a serious security flaw, no? You ask to read an extent, and it returns whatever is on disk, even if it has suffered some corruption. Shall we open a CVE, and force all filesystems to implement data checksums? ;) We would also have to drop support for nodatacow and nodatasum from btrfs. > > Graham ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() 2022-03-30 21:34 ` Filipe Manana @ 2022-03-30 22:29 ` Graham Cobb 0 siblings, 0 replies; 23+ messages in thread From: Graham Cobb @ 2022-03-30 22:29 UTC (permalink / raw) To: Filipe Manana; +Cc: Qu Wenruo, dsterba, linux-btrfs On 30/03/2022 22:34, Filipe Manana wrote: > On Wed, Mar 30, 2022 at 12:03:36PM +0100, Graham Cobb wrote: >> >> On 30/03/2022 11:42, Qu Wenruo wrote: >>> >>> >>> On 2022/3/30 18:34, Filipe Manana wrote: >>>> On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote: >>>>> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote: >>>>>> >>>>>> >>>>>> On 2022/3/29 19:39, Filipe Manana wrote: >>>>>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 2022/3/29 17:57, Filipe Manana wrote: >>>>>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: >>>>>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: >>>>>>>>>>> The original code is not really setting the memory to 0x00 but >>>>>>>>>>> 0x01. >>>>>>>>>>> >>>>>>>>>>> To prevent such problem from happening, use memzero_page() >>>>>>>>>>> instead. >>>>>>>>>> >>>>>>>>>> This should at least mention we think that setting it to 0 is >>>>>>>>>> right, as >>>>>>>>>> you call it wrong but give no hint why it's thought to be wrong. >>>>>>>>> >>>>>>>>> My guess is that something different from zero makes it easier to >>>>>>>>> spot >>>>>>>>> the problem in user space, as 0 is not uncommon (holes, >>>>>>>>> prealloced extents) >>>>>>>>> and may get unnoticed by applications/users. >>>>>>>> >>>>>>>> OK, that makes some sense. >>>>>>>> >>>>>>>> But shouldn't user space tool get an -EIO directly? >>>>>>> >>>>>>> It should. >>>>>>> >>>>>>> But even if applications get -EIO, they may often ignore return >>>>>>> values. >>>>>>> It's their fault, but if we can make it less likely that errors are >>>>>>> not noticed, >>>>>>> the better. I think we all did often, ignore all or just some >>>>>>> return values >>>>>>> from read(), write(), open(), etc. >>>>>>> >>>>>>> One recent example is the MariaDB case with io-uring. They were >>>>>>> reporting >>>>>>> corruption to the users, but the problem is that didn't properly check >>>>>>> return values, ignoring partial reads and treating them as success: >>>>>>> >>>>>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 >>>>>>> >>>>>>> >>>>>>> The data was fine, not corrupted, they just didn't deal with >>>>>>> partial reads >>>>>>> and then read the remaining data when a read returns less data than >>>>>>> expected. >>>>>> >>>>>> Then can we slightly improve the filling pattern? >>>>>> >>>>>> Instead of 0x01, introduce some poison value? >>>>> >>>>> Why isn't 0x01 a good enough value? >>>>> >>>>> A long range of 0x01 values is certainly unexpected in text files, >>>>> and very likely >>>>> on binary formats as well. Or do you think there's some case where >>>>> long sequences >>>>> of 0x01 are common and not unexpected? >>>>> >>>>>> >>>>>> And of course, change the lable "zeroit" to something like "poise_it"? >>>>> >>>>> "poison_it", poise has a very different and unrelated meaning in >>>>> English. >>>> >>>> It's also worth considering if we should change the page content at all. >>>> >>>> Adding a poison value makes it easier to detect the corruption, both for >>>> developers and for sloppy applications that don't check error values. >>>> >>>> However people often want to still have access to the corrupted data, >>>> they >>>> can tolerate a few corrupted bytes and find the remaining useful. This >>>> has >>>> been requested a few times in the past. >>> >>> This looks more favorable. >>> >>> And I didn't think the change would break anything. >>> >>> For proper user space programs checking the error, they know it's an >>> -EIO and will error out. >>> >>> For bad programs not checking the error, it will just read the corrupted >>> data, and may even pass its internal sanity checks (if the corrupted >>> bytes are not in some critical part). >>> >>> Or is there something we haven't taken into consideration? >> >> Well.... If an error occurred something has gone wrong. It could be a >> simple bit flip in the storage itself, or it could be something else. >> The something else could be a software or firmware bug, or a memory >> corruption or memory controller power fluctuation blip, or .... I guess >> it might even be possible in some architectures that the problem could >> result in page table updates validating a page that actually was never >> written to at all and could still contain some previous contents. >> >> In a time when we worry about things like Spectre, Meltdown, Rowhammer, >> stale cache leakage, etc it may be a good security principle that data >> left over from failed operations should never be made visible to user mode. >> >> Possibly unnecessary worrying, but then who thought that jump prediction >> would create a side-channel for exposing data that can actually be >> exploited in real life. > > Hum, then a filesystem that doesn't have data checksums (the vast majority of > them), like ext4, xfs, etc, has a serious security flaw, no? You ask to read an > extent, and it returns whatever is on disk, even if it has suffered some corruption. > > Shall we open a CVE, and force all filesystems to implement data checksums? ;) > We would also have to drop support for nodatacow and nodatasum from btrfs. No. There is a clear difference between cases where you don't know whether the data is valid and cases where there has been an error detected (a device-reported read error, a checksum error, a logic error, or whatever). You asked if there was anything you hadn't taken into consideration. It is entirely up to you and the team to consider it - in cases where you *know* an error has occurred should you refuse to return the data? I don't know - but the decision should be deliberate and deserves a comment. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2022-04-05 21:15 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-03-25 9:37 [PATCH] btrfs: replace a wrong memset() with memzero_page() Qu Wenruo 2022-03-25 10:10 ` Johannes Thumshirn 2022-03-25 10:49 ` Qu Wenruo 2022-03-25 10:58 ` Johannes Thumshirn 2022-03-25 16:38 ` Christoph Hellwig 2022-03-26 1:15 ` Qu Wenruo 2022-04-05 4:58 ` Christoph Hellwig 2022-04-05 5:08 ` Qu Wenruo 2022-04-05 14:04 ` David Sterba 2022-04-05 13:59 ` David Sterba 2022-03-28 18:51 ` David Sterba 2022-03-28 18:58 ` David Sterba 2022-03-29 9:57 ` Filipe Manana 2022-03-29 10:49 ` Qu Wenruo 2022-03-29 11:39 ` Filipe Manana 2022-03-29 23:52 ` Qu Wenruo 2022-03-30 9:27 ` Filipe Manana 2022-03-30 10:34 ` Filipe Manana 2022-03-30 10:42 ` Qu Wenruo 2022-03-30 11:02 ` Filipe Manana 2022-03-30 11:03 ` Graham Cobb 2022-03-30 21:34 ` Filipe Manana 2022-03-30 22:29 ` Graham Cobb
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox