From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21550C433EF for ; Wed, 30 Mar 2022 21:34:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238147AbiC3VgH (ORCPT ); Wed, 30 Mar 2022 17:36:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233375AbiC3VgG (ORCPT ); Wed, 30 Mar 2022 17:36:06 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 35A7CBC0B for ; Wed, 30 Mar 2022 14:34:20 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 85D386170B for ; Wed, 30 Mar 2022 21:34:19 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 52A6EC340EE; Wed, 30 Mar 2022 21:34:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1648676058; bh=X0kq7ccEavaSsnVfXy9qfEIwziwuKGhsXs1JAhPNGbw=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=uDgB9bK8ngv5UHhObswlMRIJSDMjY9Pm0wrwwHoO0enKad8NZwyBlYJ2FwSmbZv6e ZF7YTXXHo17Wb33RsshhxUVY+swMIUDO4xHLI/6MNj8ibHAPaFudzjmO5YORPEaIp3 hxNEs4+voO+7ynPDrHYeJAxMwi5bFVKXe2zYXmaAPD0T9HmBv9ySUxmM4Iq76QfiSO tUHRMYNXEsFfp58ni3cmO/YKMTYtByXqFjvjCeNz0TKHpzcUUIR49su4m9CzwqMfd6 c2KJXsmmHmbX8xp2uMCJsi8iB4wnAnu+4EW9AJwWkJmLql29JLiSro+xtcW8oVg/Ta EzfFUIJuVoLeg== Date: Wed, 30 Mar 2022 22:34:15 +0100 From: Filipe Manana To: Graham Cobb Cc: Qu Wenruo , dsterba@suse.cz, linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs: replace a wrong memset() with memzero_page() Message-ID: References: <8d6f911a0282eba76f9e05d6f1e7c6f091d4870b.1648199349.git.wqu@suse.com> <20220328185121.GQ2237@twin.jikos.cz> <6e5e4112-8b9f-fcb4-1a04-c68db2fa9880@gmx.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On Wed, Mar 30, 2022 at 12:03:36PM +0100, Graham Cobb wrote: > > On 30/03/2022 11:42, Qu Wenruo wrote: > > > > > > On 2022/3/30 18:34, Filipe Manana wrote: > >> On Wed, Mar 30, 2022 at 10:27:53AM +0100, Filipe Manana wrote: > >>> On Wed, Mar 30, 2022 at 07:52:04AM +0800, Qu Wenruo wrote: > >>>> > >>>> > >>>> On 2022/3/29 19:39, Filipe Manana wrote: > >>>>> On Tue, Mar 29, 2022 at 06:49:10PM +0800, Qu Wenruo wrote: > >>>>>> > >>>>>> > >>>>>> On 2022/3/29 17:57, Filipe Manana wrote: > >>>>>>> On Mon, Mar 28, 2022 at 08:51:21PM +0200, David Sterba wrote: > >>>>>>>> On Fri, Mar 25, 2022 at 05:37:59PM +0800, Qu Wenruo wrote: > >>>>>>>>> The original code is not really setting the memory to 0x00 but > >>>>>>>>> 0x01. > >>>>>>>>> > >>>>>>>>> To prevent such problem from happening, use memzero_page() > >>>>>>>>> instead. > >>>>>>>> > >>>>>>>> This should at least mention we think that setting it to 0 is > >>>>>>>> right, as > >>>>>>>> you call it wrong but give no hint why it's thought to be wrong. > >>>>>>> > >>>>>>> My guess is that something different from zero makes it easier to > >>>>>>> spot > >>>>>>> the problem in user space, as 0 is not uncommon (holes, > >>>>>>> prealloced extents) > >>>>>>> and may get unnoticed by applications/users. > >>>>>> > >>>>>> OK, that makes some sense. > >>>>>> > >>>>>> But shouldn't user space tool get an -EIO directly? > >>>>> > >>>>> It should. > >>>>> > >>>>> But even if applications get -EIO, they may often ignore return > >>>>> values. > >>>>> It's their fault, but if we can make it less likely that errors are > >>>>> not noticed, > >>>>> the better. I think we all did often, ignore all or just some > >>>>> return values > >>>>> from read(), write(), open(), etc. > >>>>> > >>>>> One recent example is the MariaDB case with io-uring. They were > >>>>> reporting > >>>>> corruption to the users, but the problem is that didn't properly check > >>>>> return values, ignoring partial reads and treating them as success: > >>>>> > >>>>> https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 > >>>>> > >>>>> > >>>>> The data was fine, not corrupted, they just didn't deal with > >>>>> partial reads > >>>>> and then read the remaining data when a read returns less data than > >>>>> expected. > >>>> > >>>> Then can we slightly improve the filling pattern? > >>>> > >>>> Instead of 0x01, introduce some poison value? > >>> > >>> Why isn't 0x01 a good enough value? > >>> > >>> A long range of 0x01 values is certainly unexpected in text files, > >>> and very likely > >>> on binary formats as well. Or do you think there's some case where > >>> long sequences > >>> of 0x01 are common and not unexpected? > >>> > >>>> > >>>> And of course, change the lable "zeroit" to something like "poise_it"? > >>> > >>> "poison_it", poise has a very different and unrelated meaning in > >>> English. > >> > >> It's also worth considering if we should change the page content at all. > >> > >> Adding a poison value makes it easier to detect the corruption, both for > >> developers and for sloppy applications that don't check error values. > >> > >> However people often want to still have access to the corrupted data, > >> they > >> can tolerate a few corrupted bytes and find the remaining useful. This > >> has > >> been requested a few times in the past. > > > > This looks more favorable. > > > > And I didn't think the change would break anything. > > > > For proper user space programs checking the error, they know it's an > > -EIO and will error out. > > > > For bad programs not checking the error, it will just read the corrupted > > data, and may even pass its internal sanity checks (if the corrupted > > bytes are not in some critical part). > > > > Or is there something we haven't taken into consideration? > > Well.... If an error occurred something has gone wrong. It could be a > simple bit flip in the storage itself, or it could be something else. > The something else could be a software or firmware bug, or a memory > corruption or memory controller power fluctuation blip, or .... I guess > it might even be possible in some architectures that the problem could > result in page table updates validating a page that actually was never > written to at all and could still contain some previous contents. > > In a time when we worry about things like Spectre, Meltdown, Rowhammer, > stale cache leakage, etc it may be a good security principle that data > left over from failed operations should never be made visible to user mode. > > Possibly unnecessary worrying, but then who thought that jump prediction > would create a side-channel for exposing data that can actually be > exploited in real life. Hum, then a filesystem that doesn't have data checksums (the vast majority of them), like ext4, xfs, etc, has a serious security flaw, no? You ask to read an extent, and it returns whatever is on disk, even if it has suffered some corruption. Shall we open a CVE, and force all filesystems to implement data checksums? ;) We would also have to drop support for nodatacow and nodatasum from btrfs. > > Graham