From: Kanchan Joshi <joshi.k@samsung.com>
To: "hch@infradead.org" <hch@infradead.org>
Cc: Qu Wenruo <wqu@suse.com>,
Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
Theodore Ts'o <tytso@mit.edu>,
"lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"josef@toxicpanda.com" <josef@toxicpanda.com>
Subject: Re: [LSF/MM/BPF TOPIC] File system checksum offload
Date: Tue, 18 Mar 2025 12:36:44 +0530
Message-ID: <edde46e9-403b-4ddf-bd73-abe95446590c@samsung.com>
In-Reply-To: <Z6GivxxFWFZhN7jD@infradead.org>
On 2/4/2025 10:46 AM, hch@infradead.org wrote:
> On Mon, Feb 03, 2025 at 06:57:13PM +0530, Kanchan Joshi wrote:
>> But, patches do exactly that i.e., hardware csum support. And posted
>> numbers [*] are also when hardware is checksumming the data blocks.
>
> I'm still not sure why you think the series implements hardware
> csum support.
The series ensures that (a) the host does not compute the csum, and (b)
the device computes it.
Not sure if you were doubting the HW instead, but I checked that part
with a user-space nvme-passthrough program which
- [During write] does not send the checksum and sets PRACT to 1.
- [During read] sends a metadata buffer and keeps PRACT as 0.
It reads back the correct data checksum, which the host never computed
(but the device did at the time of write).
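The test program itself is not part of this mail. As a rough sketch of the
two commands it issues, assuming the "meta-size == pi-size" format and the
NVM command set layout (SLBA in CDW10/11, zero-based NLB in CDW12 bits 15:0,
PRACT in CDW12 bit 29), the setup could look like the following. The struct
and helper names are hypothetical stand-ins, not the kernel's uapi types, and
nothing is submitted to a device here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fields of an NVMe I/O passthrough command. */
struct pt_cmd {
	uint8_t  opcode;
	uint32_t nsid;
	uint64_t addr;         /* data buffer address */
	uint32_t data_len;
	uint64_t metadata;     /* metadata buffer address, 0 if none */
	uint32_t metadata_len;
	uint32_t cdw10, cdw11, cdw12;
};

#define OPC_WRITE   0x01
#define OPC_READ    0x02
#define CDW12_PRACT (1u << 29)  /* PRINFO.PRACT */

/* Fill the fields shared by read and write; nr_blocks is 1-based here. */
static void pt_fill_common(struct pt_cmd *c, uint8_t opc, uint32_t nsid,
			   uint64_t slba, uint32_t nr_blocks,
			   void *data, uint32_t data_len)
{
	assert(nr_blocks >= 1 && nr_blocks <= 65536);
	memset(c, 0, sizeof(*c));
	c->opcode = opc;
	c->nsid = nsid;
	c->addr = (uint64_t)(uintptr_t)data;
	c->data_len = data_len;
	c->cdw10 = (uint32_t)slba;
	c->cdw11 = (uint32_t)(slba >> 32);
	c->cdw12 = nr_blocks - 1;  /* NLB is zero-based */
}

/* Write: no metadata buffer, PRACT=1, so the device generates the PI. */
static void pt_build_write_pract(struct pt_cmd *c, uint32_t nsid,
				 uint64_t slba, uint32_t nr_blocks,
				 void *data, uint32_t data_len)
{
	pt_fill_common(c, OPC_WRITE, nsid, slba, nr_blocks, data, data_len);
	c->cdw12 |= CDW12_PRACT;
}

/* Read: metadata buffer supplied, PRACT=0, so the stored PI is returned. */
static void pt_build_read_meta(struct pt_cmd *c, uint32_t nsid,
			       uint64_t slba, uint32_t nr_blocks,
			       void *data, uint32_t data_len,
			       void *meta, uint32_t meta_len)
{
	pt_fill_common(c, OPC_READ, nsid, slba, nr_blocks, data, data_len);
	c->metadata = (uint64_t)(uintptr_t)meta;
	c->metadata_len = meta_len;
}
```

Comparing the guard tag that comes back in the read's metadata buffer against
a CRC computed in user space is what would confirm the device did the work.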
> The buf mode is just a duplicate implementation of the block layer
> automatic PI. The no buf means PRACT which lets the device auto
> generate and strip PI.
Regardless of buf or no buf, it applies PRACT and only the device computes
the checksum. The two modes are taking shape only because of the way
PRACT works for two different device configurations:
#1: when meta-size == pi-size, we don't need to send a meta-buffer.
#2: when meta-size > pi-size, we need to.
Automatic PI helps for #2, as split handling of the meta-buffer comes for
free if the I/O is split. But overall, this is also about abstracting the
PRACT details so that each filesystem does not have to bother.
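To make the two configurations concrete, here is a minimal sketch of how
much metadata buffer the host has to attach when PRACT is set. The function
name is hypothetical, and it assumes meta-size is never smaller than pi-size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of metadata buffer the host must attach to a PRACT=1 I/O.
 * Hypothetical helper; assumes meta_size >= pi_size.
 */
static uint32_t pract_meta_buf_len(uint32_t nr_blocks, uint16_t meta_size,
				   uint16_t pi_size)
{
	assert(meta_size >= pi_size);
	if (meta_size == pi_size)
		return 0;                  /* #1: device inserts/strips the PI */
	return nr_blocks * meta_size;      /* #2: metadata travels with the I/O */
}
```

For example, 8 blocks on an 8-byte-metadata format need no buffer, while the
same 8 blocks on a 64-byte-metadata format need 512 bytes, and that buffer
has to be split whenever the I/O is split.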
And the changes needed to keep this abstracted in Auto-PI/NVMe are small:
 block/bio-integrity.c    | 42 ++++++++++++++++++++++++++++++++++++++-
 block/t10-pi.c           |  7 +++++++
 drivers/nvme/host/core.c | 24 ++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  1 +
> Especially the latter one (which is the
> one that was benchmarked) literally provides no additional protection
> over what the device would already do. It's the "trust me, bro" of
> data integrity :) Which to be fair will work pretty well as devices
> that support PI are the creme de la creme of storage devices and
> will have very good internal data protection internally. But the
> point of data checksums is to not trust the storage device and
> not trust layers between the checksum generation and the storage
> device.
Right, I'm not saying that protection is getting better. Just that any
offload is about trusting someone else with the job. We have other
instances like atomic-writes, copy, write-zeroes, write-same etc.
> IFF using PRACT is an acceptable level of protection just running
> NODATASUM and disabling PI generation/verification in the block
> layer using the current sysfs attributes (or an in-kernel interface
> for that) to force the driver to set PRACT will do exactly the same
> thing.
I had considered that, but it can't work because:
- the sysfs attributes operate at the block-device level, for all read or
all write operations. That's not flexible enough for policies such as "do
something for some writes/reads but not for others", which can translate
to "do checksum offload for FS data, but keep things as is for FS meta"
or other combinations.
- If the I/O goes down to driver with , driver will start failing
(rather than setting PRACT) if the configuration is "meta-size >
pi-size". This part in nvme_setup_rw:
	if (!blk_integrity_rq(req)) {
		if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
			return BLK_STS_NOTSUPP;
		control |= NVME_RW_PRINFO_PRACT;
	}