From: Greg KH <greg@kroah.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, stable@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Filipe Manana <fdmanana@suse.com>,
David Sterba <dsterba@suse.com>
Subject: Re: [PATCH 6.12.y] btrfs: always fallback to buffered write if the inode requires checksum
Date: Mon, 12 May 2025 16:00:05 +0200 [thread overview]
Message-ID: <2025051245-engine-overthrow-8820@gregkh> (raw)
In-Reply-To: <968f19c5b1b7d5595423b0ac0020cc18dfed8cb5.1746665263.git.wqu@suse.com>
On Thu, May 08, 2025 at 10:31:25AM +0930, Qu Wenruo wrote:
> commit 968f19c5b1b7d5595423b0ac0020cc18dfed8cb5 upstream.
>
> [BUG]
> It is a long known bug that VM image on btrfs can lead to data csum
> mismatch, if the qemu is using direct-io for the image (this is commonly
> known as cache mode 'none').
>
> [CAUSE]
> Inside the VM, if the fs is EXT4 or XFS, or even NTFS from Windows, the
> fs is allowed to dirty/modify the folio even if the folio is under
> writeback (as long as the address space doesn't have AS_STABLE_WRITES
> flag inherited from the block device).
>
> This is a valid optimization to improve the concurrency, and since these
> filesystems have no extra checksum on data, the content change is not a
> problem at all.
>
> Bu the final write into the image file is handled by btrfs, which needs
> the content not to be modified during writeback, or the checksum will
> not match the data (checksum is calculated before submitting the bio).
>
> So EXT4/XFS/NTRFS assume they can modify the folio under writeback, but
> btrfs requires no modification, this leads to the false csum mismatch.
>
> This is only a controlled example, there are even cases where
> multi-thread programs can submit a direct IO write, then another thread
> modifies the direct IO buffer for whatever reason.
>
> For such cases, btrfs has no sane way to detect such cases and leads to
> false data csum mismatch.
>
> [FIX]
> I have considered the following ideas to solve the problem:
>
> - Make direct IO to always skip data checksum
> This not only requires a new incompatible flag, as it breaks the
> current per-inode NODATASUM flag.
> But also requires extra handling for no csum found cases.
>
> And this also reduces our checksum protection.
>
> - Let hardware handle all the checksum
> AKA, just nodatasum mount option.
> That requires trust for hardware (which is not that trustful in a lot
> of cases), and it's not generic at all.
>
> - Always fallback to buffered write if the inode requires checksum
> This was suggested by Christoph, and is the solution utilized by this
> patch.
>
> The cost is obvious, the extra buffer copying into page cache, thus it
> reduces the performance.
> But at least it's still user configurable, if the end user still wants
> the zero-copy performance, just set NODATASUM flag for the inode
> (which is a common practice for VM images on btrfs).
>
> Since we cannot trust user space programs to keep the buffer
> consistent during direct IO, we have no choice but always falling back
> to buffered IO. At least by this, we avoid the more deadly false data
> checksum mismatch error.
>
> CC: stable@vger.kernel.org # 6.12+
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
> fs/btrfs/direct-io.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
What about 6.14.y? It is needed there too, right? We can't take a
patch only for an older branch and not a newer one.
thanks,
greg k-h
prev parent reply other threads:[~2025-05-12 14:00 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-08 1:01 [PATCH 6.12.y] btrfs: always fallback to buffered write if the inode requires checksum Qu Wenruo
2025-05-09 1:52 ` Sasha Levin
2025-05-12 14:00 ` Greg KH [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2025051245-engine-overthrow-8820@gregkh \
--to=greg@kroah.com \
--cc=dsterba@suse.com \
--cc=fdmanana@suse.com \
--cc=hch@infradead.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox