public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <greg@kroah.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, stable@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>
Subject: Re: [PATCH 6.12.y] btrfs: always fallback to buffered write if the inode requires checksum
Date: Mon, 12 May 2025 16:00:05 +0200	[thread overview]
Message-ID: <2025051245-engine-overthrow-8820@gregkh> (raw)
In-Reply-To: <968f19c5b1b7d5595423b0ac0020cc18dfed8cb5.1746665263.git.wqu@suse.com>

On Thu, May 08, 2025 at 10:31:25AM +0930, Qu Wenruo wrote:
> commit 968f19c5b1b7d5595423b0ac0020cc18dfed8cb5 upstream.
> 
> [BUG]
> It is a long known bug that VM image on btrfs can lead to data csum
> mismatch, if the qemu is using direct-io for the image (this is commonly
> known as cache mode 'none').
> 
> [CAUSE]
> Inside the VM, if the fs is EXT4 or XFS, or even NTFS from Windows, the
> fs is allowed to dirty/modify the folio even if the folio is under
> writeback (as long as the address space doesn't have AS_STABLE_WRITES
> flag inherited from the block device).
> 
> This is a valid optimization to improve the concurrency, and since these
> filesystems have no extra checksum on data, the content change is not a
> problem at all.
> 
> Bu the final write into the image file is handled by btrfs, which needs
> the content not to be modified during writeback, or the checksum will
> not match the data (checksum is calculated before submitting the bio).
> 
> So EXT4/XFS/NTRFS assume they can modify the folio under writeback, but
> btrfs requires no modification, this leads to the false csum mismatch.
> 
> This is only a controlled example, there are even cases where
> multi-thread programs can submit a direct IO write, then another thread
> modifies the direct IO buffer for whatever reason.
> 
> For such cases, btrfs has no sane way to detect such cases and leads to
> false data csum mismatch.
> 
> [FIX]
> I have considered the following ideas to solve the problem:
> 
> - Make direct IO to always skip data checksum
>   This not only requires a new incompatible flag, as it breaks the
>   current per-inode NODATASUM flag.
>   But also requires extra handling for no csum found cases.
> 
>   And this also reduces our checksum protection.
> 
> - Let hardware handle all the checksum
>   AKA, just nodatasum mount option.
>   That requires trust for hardware (which is not that trustful in a lot
>   of cases), and it's not generic at all.
> 
> - Always fallback to buffered write if the inode requires checksum
>   This was suggested by Christoph, and is the solution utilized by this
>   patch.
> 
>   The cost is obvious, the extra buffer copying into page cache, thus it
>   reduces the performance.
>   But at least it's still user configurable, if the end user still wants
>   the zero-copy performance, just set NODATASUM flag for the inode
>   (which is a common practice for VM images on btrfs).
> 
>   Since we cannot trust user space programs to keep the buffer
>   consistent during direct IO, we have no choice but always falling back
>   to buffered IO.  At least by this, we avoid the more deadly false data
>   checksum mismatch error.
> 
> CC: stable@vger.kernel.org # 6.12+
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/direct-io.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)

What about 6.14.y?  It is needed there too, right?  We can't take a
patch only for an older branch and not a newer one.

thanks,

greg k-h

      parent reply	other threads:[~2025-05-12 14:00 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-08  1:01 [PATCH 6.12.y] btrfs: always fallback to buffered write if the inode requires checksum Qu Wenruo
2025-05-09  1:52 ` Sasha Levin
2025-05-12 14:00 ` Greg KH [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2025051245-engine-overthrow-8820@gregkh \
    --to=greg@kroah.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=hch@infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox