public inbox for linux-nvme@lists.infradead.org
From: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
To: Kanchan Joshi <joshi.k@samsung.com>, Theodore Ts'o <tytso@mit.edu>
Cc: "lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"josef@toxicpanda.com" <josef@toxicpanda.com>
Subject: Re: [LSF/MM/BPF TOPIC] File system checksum offload
Date: Mon, 3 Feb 2025 07:47:53 +0000	[thread overview]
Message-ID: <b8790a76-fd4e-49b6-bc08-44e5c3bf348a@wdc.com>
In-Reply-To: <97f402bc-4029-48d4-bd03-80af5b799d04@samsung.com>

On 31.01.25 14:11, Kanchan Joshi wrote:
> On 1/30/2025 7:58 PM, Theodore Ts'o wrote:
>> On Thu, Jan 30, 2025 at 02:45:45PM +0530, Kanchan Joshi wrote:
>>> I would like to propose a discussion on employing checksum offload in
>>> filesystems.
>>> It would be good to co-locate this with the storage track, as the
>>> finer details lie in the block layer and NVMe driver.
>>
>> I wouldn't call this "file system offload".  Enabling the data
>> integrity feature or whatever you want to call it is really a block
>> layer issue.  The file system doesn't need to get involved at all.
>> Indeed, looking at the patch, the only reason why the file system is
>> getting involved is because (a) you've added a mount option, and (b)
>> the mount option flips a bit in the bio that gets sent to the block
>> layer.
> 
> The mount option was only for the RFC. If everything else gets sorted
> out, it would be about choosing whatever Btrfs prefers.
> 
>> But this could also be done by adding a queue specific flag, at which
>> point the file system doesn't need to be involved at all.  Why would
>> you want to enable the data integrity feature on a per block I/O
>> basis, if the device supports it?
> 
> Because I thought users (filesystems) would prefer flexibility. Per-I/O
> control makes it possible to choose different policies for, say, data
> and metadata. Let me outline the differences.

But data and metadata checksums are handled differently in btrfs. For 
data we checksum each block and write it into the checksum tree. For 
metadata the checksum is part of the metadata (see 'struct btrfs_header').

> Block-layer auto integrity
> - always attaches an integrity payload for each I/O.
> - computes the checksum/reftag for each I/O at the block layer. This
> part does not do justice to the label 'offload'.
> 
> The patches make auto-integrity
> - attach the integrity buffer only if the device configuration demands it.
> - never compute the checksum/reftag at the block layer.
> - keep the offload choice at the per-I/O level.
> 
> The Btrfs checksum tree is created only for data blocks, so the patches
> apply the flag (REQ_INTEGRITY_OFFLOAD) to data I/O. Metadata blocks,
> which may be more important, continue to get checksummed at two levels
> (block and device).

The thing I don't like about the current RFC patchset is that it breaks 
scrub, repair and device error statistics. That's nothing that can't be 
solved, but as of now it just doesn't make sense to me. At the very 
least we need the FS to look at the BLK_STS_PROTECTION return value and 
handle it accordingly in scrub, read repair and statistics.

And that's only for feature parity. I'd also like to see some 
performance numbers, and numbers on the reduced WAF, to judge whether 
this is really worth the hassle.

Thanks,
	Johannes


Thread overview: 27+ messages
     [not found] <CGME20250130092400epcas5p1a3a9d899583e9502ed45fe500ae8a824@epcas5p1.samsung.com>
2025-01-30  9:15 ` [LSF/MM/BPF TOPIC] File system checksum offload Kanchan Joshi
2025-01-30 14:28   ` Theodore Ts'o
2025-01-30 20:39     ` [Lsf-pc] " Martin K. Petersen
2025-01-31  4:40       ` Theodore Ts'o
2025-01-31  7:07         ` Christoph Hellwig
2025-01-31 13:11     ` Kanchan Joshi
2025-02-03  7:47       ` Johannes Thumshirn [this message]
2025-02-03  7:56         ` Christoph Hellwig
2025-02-03  8:04           ` Johannes Thumshirn
2025-02-03  8:06             ` hch
2025-02-03  8:16             ` Qu Wenruo
2025-02-03  8:26               ` Matthew Wilcox
2025-02-03  8:30                 ` hch
2025-02-03  8:36                   ` Qu Wenruo
2025-02-03  8:40                     ` hch
2025-02-03  8:51                       ` Qu Wenruo
2025-02-03  8:57                         ` hch
2025-02-03  8:26               ` hch
2025-02-03 13:27               ` Kanchan Joshi
2025-02-03 23:17                 ` Qu Wenruo
2025-02-04  5:48                   ` hch
2025-02-04  5:16                 ` hch
2025-03-18  7:06                   ` Kanchan Joshi
2025-03-18  8:07                     ` hch
2025-03-19 18:06                       ` Kanchan Joshi
2025-03-20  5:48                         ` hch
2025-02-03 13:32         ` Kanchan Joshi
