From: Filipe Manana <fdmanana@kernel.org>
To: Graham Cobb <g.btrfs@cobb.uk.net>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
dsterba@suse.cz, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: replace a wrong memset() with memzero_page()
Date: Wed, 30 Mar 2022 22:34:15 +0100 [thread overview]
Message-ID: <YkTM1yX2G02JyhHj@debian9.Home> (raw)
In-Reply-To: <bc1c01bd-7f98-a2a3-8a2e-2a1c1d31e853@cobb.uk.net>
On Wed, Mar 30, 2022 at 12:03:36PM +0100, Graham Cobb wrote:
>
> On 30/03/2022 11:42, Qu Wenruo wrote:
> >
> >
> > On 2022/3/30 18:34, Filipe Manana wrote:
> >> On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote:
> >>> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote:
> >>>>
> >>>>
> >>>> On 2022/3/29 19:39, Filipe Manana wrote:
> >>>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 2022/3/29 17:57, Filipe Manana wrote:
> >>>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote:
> >>>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote:
> >>>>>>>>> The original code is not really setting the memory to 0x00 but
> >>>>>>>>> 0x01.
> >>>>>>>>>
> >>>>>>>>> To prevent such problem from happening, use memzero_page()
> >>>>>>>>> instead.
> >>>>>>>>
> >>>>>>>> This should at least mention we think that setting it to 0 is
> >>>>>>>> right, as
> >>>>>>>> you call it wrong but give no hint why it's thought to be wrong.
> >>>>>>>
> >>>>>>> My guess is that something different from zero makes it easier to
> >>>>>>> spot
> >>>>>>> the problem in user space, as 0 is not uncommon (holes,
> >>>>>>> prealloced extents)
> >>>>>>> and may get unnoticed by applications/users.
> >>>>>>
> >>>>>> OK, that makes some sense.
> >>>>>>
> >>>>>> But shouldn't user space tool get an -EIO directly?
> >>>>>
> >>>>> It should.
> >>>>>
> >>>>> But even if applications get -EIO, they may often ignore return
> >>>>> values.
> >>>>> It's their fault, but if we can make it less likely that errors are
> >>>>> not noticed,
> >>>>> the better. I think we all did often, ignore all or just some
> >>>>> return values
> >>>>> from read(), write(), open(), etc.
> >>>>>
> >>>>> One recent example is the MariaDB case with io-uring. They were
> >>>>> reporting
> >>>>> corruption to the users, but the problem is that didn't properly check
> >>>>> return values, ignoring partial reads and treating them as success:
> >>>>>
> >>>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
> >>>>>
> >>>>>
> >>>>> The data was fine, not corrupted, they just didn't deal with
> >>>>> partial reads
> >>>>> and then read the remaining data when a read returns less data than
> >>>>> expected.
> >>>>
> >>>> Then can we slightly improve the filling pattern?
> >>>>
> >>>> Instead of 0x01, introduce some poison value?
> >>>
> >>> Why isn't 0x01 a good enough value?
> >>>
> >>> A long range of 0x01 values is certainly unexpected in text files,
> >>> and very likely
> >>> on binary formats as well. Or do you think there's some case where
> >>> long sequences
> >>> of 0x01 are common and not unexpected?
> >>>
> >>>>
> >>>> And of course, change the lable "zeroit" to something like "poise_it"?
> >>>
> >>> "poison_it", poise has a very different and unrelated meaning in
> >>> English.
> >>
> >> It's also worth considering if we should change the page content at all.
> >>
> >> Adding a poison value makes it easier to detect the corruption, both for
> >> developers and for sloppy applications that don't check error values.
> >>
> >> However people often want to still have access to the corrupted data,
> >> they
> >> can tolerate a few corrupted bytes and find the remaining useful. This
> >> has
> >> been requested a few times in the past.
> >
> > This looks more favorable.
> >
> > And I didn't think the change would break anything.
> >
> > For proper user space programs checking the error, they know it's an
> > -EIO and will error out.
> >
> > For bad programs not checking the error, it will just read the corrupted
> > data, and may even pass its internal sanity checks (if the corrupted
> > bytes are not in some critical part).
> >
> > Or is there something we haven't taken into consideration?
>
> Well.... If an error occurred something has gone wrong. It could be a
> simple bit flip in the storage itself, or it could be something else.
> The something else could be a software or firmware bug, or a memory
> corruption or memory controller power fluctuation blip, or .... I guess
> it might even be possible in some architectures that the problem could
> result in page table updates validating a page that actually was never
> written to at all and could still contain some previous contents.
>
> In a time when we worry about things like Spectre, Meltdown, Rowhammer,
> stale cache leakage, etc it may be a good security principle that data
> left over from failed operations should never be made visible to user mode.
>
> Possibly unnecessary worrying, but then who thought that jump prediction
> would create a side-channel for exposing data that can actually be
> exploited in real life.
Hum, then a filesystem that doesn't have data checksums (the vast majority of
them), like ext4, xfs, etc, has a serious security flaw, no? You ask to read an
extent, and it returns whatever is on disk, even if it has suffered some corruption.
Shall we open a CVE, and force all filesystems to implement data checksums? ;)
We would also have to drop support for nodatacow and nodatasum from btrfs.
>
> Graham
next prev parent reply other threads:[~2022-03-30 21:34 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-25 9:37 [PATCH] btrfs: replace a wrong memset() with memzero_page() Qu Wenruo
2022-03-25 10:10 ` Johannes Thumshirn
2022-03-25 10:49 ` Qu Wenruo
2022-03-25 10:58 ` Johannes Thumshirn
2022-03-25 16:38 ` Christoph Hellwig
2022-03-26 1:15 ` Qu Wenruo
2022-04-05 4:58 ` Christoph Hellwig
2022-04-05 5:08 ` Qu Wenruo
2022-04-05 14:04 ` David Sterba
2022-04-05 13:59 ` David Sterba
2022-03-28 18:51 ` David Sterba
2022-03-28 18:58 ` David Sterba
2022-03-29 9:57 ` Filipe Manana
2022-03-29 10:49 ` Qu Wenruo
2022-03-29 11:39 ` Filipe Manana
2022-03-29 23:52 ` Qu Wenruo
2022-03-30 9:27 ` Filipe Manana
2022-03-30 10:34 ` Filipe Manana
2022-03-30 10:42 ` Qu Wenruo
2022-03-30 11:02 ` Filipe Manana
2022-03-30 11:03 ` Graham Cobb
2022-03-30 21:34 ` Filipe Manana [this message]
2022-03-30 22:29 ` Graham Cobb
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YkTM1yX2G02JyhHj@debian9.Home \
--to=fdmanana@kernel.org \
--cc=dsterba@suse.cz \
--cc=g.btrfs@cobb.uk.net \
--cc=linux-btrfs@vger.kernel.org \
--cc=quwenruo.btrfs@gmx.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox