* [PATCHv2] block: always allocate integrity buffer
@ 2025-05-07 19:14 Keith Busch
2025-05-07 22:31 ` Martin K. Petersen
2025-05-08 5:12 ` Christoph Hellwig
0 siblings, 2 replies; 6+ messages in thread
From: Keith Busch @ 2025-05-07 19:14 UTC (permalink / raw)
To: linux-block; +Cc: hch, linux-nvme, Keith Busch, Martin K. Petersen, Jens Axboe
From: Keith Busch <kbusch@kernel.org>
The integrity buffer, whether or not you want it generated or verified, is
mandatory for nvme formats that have metadata. The block integrity attributes
read_verify and write_generate had been stopping the metadata buffer from being
allocated and attached to the bio entirely. We only want to suppress the
protection checks on the device and host, but we still need the buffer.
Otherwise, reads and writes will just get IO errors and this nvme warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 371 at drivers/nvme/host/core.c:1036 nvme_setup_rw+0x122/0x210
...
RIP: 0010:nvme_setup_rw+0x122/0x210
...
Call Trace:
<TASK>
nvme_setup_cmd+0x1b4/0x280
nvme_queue_rqs+0xc4/0x1f0 [nvme]
blk_mq_dispatch_queue_requests+0x24a/0x430
blk_mq_flush_plug_list+0x50/0x140
__blk_flush_plug+0xc1/0x100
__submit_bio+0x1c1/0x360
? submit_bio_noacct_nocheck+0x2d6/0x3c0
submit_bio_noacct_nocheck+0x2d6/0x3c0
? submit_bio_noacct+0x47/0x4c0
submit_bio_wait+0x48/0xa0
__blkdev_direct_IO_simple+0xee/0x210
? current_time+0x1d/0x100
? current_time+0x1d/0x100
? __bio_clone+0xb0/0xb0
blkdev_read_iter+0xbb/0x140
vfs_read+0x239/0x310
ksys_read+0x58/0xc0
do_syscall_64+0x6c/0x180
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2:
The bip_flags are initialized based on the the bi->flags and don't
change for the lifetime of the bio. Check this instead to avoid any
races with bi->flags changing by a user modifying the read_verify and
write_generate attributes.
Check if we can skip the verify step before scheduling the deferred
completion work.
block/bio-integrity-auto.c | 37 ++++++++++++++++++++++---------------
1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/block/bio-integrity-auto.c b/block/bio-integrity-auto.c
index e524c609be506..2c43e27b332ca 100644
--- a/block/bio-integrity-auto.c
+++ b/block/bio-integrity-auto.c
@@ -54,12 +54,12 @@ static void bio_integrity_verify_fn(struct work_struct *work)
*/
bool __bio_integrity_endio(struct bio *bio)
{
- struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_integrity_data *bid =
container_of(bip, struct bio_integrity_data, bip);
- if (bio_op(bio) == REQ_OP_READ && !bio->bi_status && bi->csum_type) {
+ if (bio_op(bio) == REQ_OP_READ && !bio->bi_status &&
+ bip->bip_flags & BIP_CHECK_GUARD) {
INIT_WORK(&bid->work, bio_integrity_verify_fn);
queue_work(kintegrityd_wq, &bid->work);
return false;
@@ -69,6 +69,16 @@ bool __bio_integrity_endio(struct bio *bio)
return true;
}
+static inline void bio_set_bip_flags(struct blk_integrity *bi, u16 *bip_flags)
+{
+ if (bi->csum_type == BLK_INTEGRITY_CSUM_IP)
+ *bip_flags |= BIP_IP_CHECKSUM;
+ if (bi->csum_type)
+ *bip_flags |= BIP_CHECK_GUARD;
+ if (bi->flags & BLK_INTEGRITY_REF_TAG)
+ *bip_flags |= BIP_CHECK_REFTAG;
+}
+
/**
* bio_integrity_prep - Prepare bio for integrity I/O
* @bio: bio to prepare
@@ -83,6 +93,7 @@ bool __bio_integrity_endio(struct bio *bio)
bool bio_integrity_prep(struct bio *bio)
{
struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
+ unsigned short bip_flags = BIP_BLOCK_INTEGRITY;
struct bio_integrity_data *bid;
gfp_t gfp = GFP_NOIO;
unsigned int len;
@@ -101,19 +112,22 @@ bool bio_integrity_prep(struct bio *bio)
switch (bio_op(bio)) {
case REQ_OP_READ:
if (bi->flags & BLK_INTEGRITY_NOVERIFY)
- return true;
+ break;
+ bio_set_bip_flags(bi, &bip_flags);
break;
case REQ_OP_WRITE:
- if (bi->flags & BLK_INTEGRITY_NOGENERATE)
- return true;
-
/*
* Zero the memory allocated to not leak uninitialized kernel
* memory to disk for non-integrity metadata where nothing else
* initializes the memory.
*/
+ if (bi->flags & BLK_INTEGRITY_NOGENERATE) {
+ gfp |= __GFP_ZERO;
+ break;
+ }
if (bi->csum_type == BLK_INTEGRITY_CSUM_NONE)
gfp |= __GFP_ZERO;
+ bio_set_bip_flags(bi, &bip_flags);
break;
default:
return true;
@@ -134,22 +148,15 @@ bool bio_integrity_prep(struct bio *bio)
bid->bio = bio;
- bid->bip.bip_flags |= BIP_BLOCK_INTEGRITY;
+ bid->bip.bip_flags = bip_flags;
bip_set_seed(&bid->bip, bio->bi_iter.bi_sector);
- if (bi->csum_type == BLK_INTEGRITY_CSUM_IP)
- bid->bip.bip_flags |= BIP_IP_CHECKSUM;
- if (bi->csum_type)
- bid->bip.bip_flags |= BIP_CHECK_GUARD;
- if (bi->flags & BLK_INTEGRITY_REF_TAG)
- bid->bip.bip_flags |= BIP_CHECK_REFTAG;
-
if (bio_integrity_add_page(bio, virt_to_page(buf), len,
offset_in_page(buf)) < len)
goto err_end_io;
/* Auto-generate integrity metadata if this is a write */
- if (bio_data_dir(bio) == WRITE)
+ if (bio_data_dir(bio) == WRITE && bid->bip.bip_flags & BIP_CHECK_GUARD)
blk_integrity_generate(bio);
else
bid->saved_bio_iter = bio->bi_iter;
--
2.47.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCHv2] block: always allocate integrity buffer
2025-05-07 19:14 [PATCHv2] block: always allocate integrity buffer Keith Busch
@ 2025-05-07 22:31 ` Martin K. Petersen
2025-05-08 5:15 ` Christoph Hellwig
2025-05-08 5:12 ` Christoph Hellwig
1 sibling, 1 reply; 6+ messages in thread
From: Martin K. Petersen @ 2025-05-07 22:31 UTC (permalink / raw)
To: Keith Busch
Cc: linux-block, hch, linux-nvme, Keith Busch, Martin K. Petersen,
Jens Axboe
Keith,
> + if (bio_op(bio) == REQ_OP_READ && !bio->bi_status &&
> + bip->bip_flags & BIP_CHECK_GUARD) {
> - if (bio_data_dir(bio) == WRITE)
> + if (bio_data_dir(bio) == WRITE && bid->bip.bip_flags & BIP_CHECK_GUARD)
> blk_integrity_generate(bio);
I know that we can't have one without the other currently but there's
some cognitive PI dissonance wrt. keying off BIP_CHECK_GUARD only.
Maybe worth considering:
#define BIP_CHECK_FLAGS (BIP_CHECK_GUARD | BIP_CHECK_REFTAG | BIP_CHECK_APPTAG)
and validating against that? Or a bip_should_check() wrapper.
Not a biggie, it just trips me up when we encode implementation-specific
assumptions.
Anyway. It's probably OK.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv2] block: always allocate integrity buffer
2025-05-07 19:14 [PATCHv2] block: always allocate integrity buffer Keith Busch
2025-05-07 22:31 ` Martin K. Petersen
@ 2025-05-08 5:12 ` Christoph Hellwig
2025-05-08 16:14 ` Keith Busch
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2025-05-08 5:12 UTC (permalink / raw)
To: Keith Busch
Cc: linux-block, hch, linux-nvme, Keith Busch, Martin K. Petersen,
Jens Axboe
On Wed, May 07, 2025 at 12:14:24PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> The integrity buffer, whether or not you want it generated or verified, is
> mandatory for nvme formats that have metadata. The block integrity attributes
> read_verify and write_generate had been stopping the metadata buffer from being
This commit log exceeds the 73 characters allocated to it, please reformat
it.
> allocated and attached to the bio entirely. We only want to suppress the
> protection checks on the device and host, but we still need the buffer.
>
> Otherwise, reads and writes will just get IO errors and this nvme warning:
But to a point - the metadata buffer is only required for non-PI
metadata. I think from looking at the code that is exactly what
this patch does, but the commit log sounds different.
Also this should probably have a fixes tag.
> - if (bio_op(bio) == REQ_OP_READ && !bio->bi_status && bi->csum_type) {
> + if (bio_op(bio) == REQ_OP_READ && !bio->bi_status &&
> + bip->bip_flags & BIP_CHECK_GUARD) {
While Martin correctly points out we currently always do both guard
and reftag checking, we really should check for either, especially as
some code below is written in a way to allow for formats that only
have one of them.
> +static inline void bio_set_bip_flags(struct blk_integrity *bi, u16 *bip_flags)
> +{
> + if (bi->csum_type == BLK_INTEGRITY_CSUM_IP)
> + *bip_flags |= BIP_IP_CHECKSUM;
> + if (bi->csum_type)
> + *bip_flags |= BIP_CHECK_GUARD;
> + if (bi->flags & BLK_INTEGRITY_REF_TAG)
> + *bip_flags |= BIP_CHECK_REFTAG;
> +
Just return the flags here instead of the somewhat odd output by
pointer return?
> + break;
> + bio_set_bip_flags(bi, &bip_flags);
> break;
> case REQ_OP_WRITE:
...
> + bio_set_bip_flags(bi, &bip_flags);
> break;
> default:
> return true;
> @@ -134,22 +148,15 @@ bool bio_integrity_prep(struct bio *bio)
Just move this after the switch to have a single callsite. And maybe
don't even bother with the helper then?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv2] block: always allocate integrity buffer
2025-05-07 22:31 ` Martin K. Petersen
@ 2025-05-08 5:15 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2025-05-08 5:15 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Keith Busch, linux-block, hch, linux-nvme, Keith Busch,
Jens Axboe
On Wed, May 07, 2025 at 06:31:33PM -0400, Martin K. Petersen wrote:
> I know that we can't have one without the other currently but there's
> some cognitive PI dissonance wrt. keying off BIP_CHECK_GUARD only.
>
> Maybe worth considering:
>
> #define BIP_CHECK_FLAGS (BIP_CHECK_GUARD | BIP_CHECK_REFTAG | BIP_CHECK_APPTAG)
>
> and validating against that? Or a bip_should_check() wrapper.
I'd vote for the helper.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv2] block: always allocate integrity buffer
2025-05-08 5:12 ` Christoph Hellwig
@ 2025-05-08 16:14 ` Keith Busch
2025-05-08 16:19 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2025-05-08 16:14 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, linux-block, linux-nvme, Martin K. Petersen,
Jens Axboe
On Thu, May 08, 2025 at 07:12:33AM +0200, Christoph Hellwig wrote:
> > allocated and attached to the bio entirely. We only want to suppress the
> > protection checks on the device and host, but we still need the buffer.
> >
> > Otherwise, reads and writes will just get IO errors and this nvme warning:
>
> But to a point - the metadata buffer is only required for non-PI
> metadata. I think from looking at the code that is exactly what
> this patch does, but the commit log sounds different.
Not exactly. If anyone's intention was to use this specifically to force
nvme pract for those special PI formats, then this patch would change
that. The driver would just get an unchecked metadata buffer instead.
This is more about just preventing the kernel from sending a request
that the driver is going to reject. Like if someone mistakenly messes
with these attributes on formats where PRACT doesn't work.
One use case is like when the kernel started supporting PI offsets
(~6.8?), and that broke the upgrade path since previously written data
wouldn't pass the newly enforced checksum on reads, so we need ways to
disable the checks.
But since you mention it, maybe someone does want to force PRACT on the
generic read/write path? I considered it a fallback when the kernel
doesn't have CONFIG_BLK_DEV_INTEGRITY, but we could control it at
runtime through these attributes too. It would just need a new flag in
the blk_integrity profile to say if format supports controller-side
strip/generation and use that to decide if we need to attach an
unchecked integrity payload or not.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCHv2] block: always allocate integrity buffer
2025-05-08 16:14 ` Keith Busch
@ 2025-05-08 16:19 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2025-05-08 16:19 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Keith Busch, linux-block, linux-nvme,
Martin K. Petersen, Jens Axboe
On Thu, May 08, 2025 at 10:14:13AM -0600, Keith Busch wrote:
> But since you mention it, maybe someone does want to force PRACT on the
> generic read/write path? I considered it a fallback when the kernel
> doesn't have CONFIG_BLK_DEV_INTEGRITY, but we could control it at
> runtime through these attributes too. It would just need a new flag in
> the blk_integrity profile to say if format supports controller-side
> strip/generation and use that to decide if we need to attach an
> unchecked integrity payload or not.
The generate/verify attributes always were my way to force insert/strip
behavior on both SCSI and NVMe for testing.
But yeah, I never tested the PI + other metadata features where this
might or might not work, and we really do need to fix the attribute
for those.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-05-08 16:19 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-07 19:14 [PATCHv2] block: always allocate integrity buffer Keith Busch
2025-05-07 22:31 ` Martin K. Petersen
2025-05-08 5:15 ` Christoph Hellwig
2025-05-08 5:12 ` Christoph Hellwig
2025-05-08 16:14 ` Keith Busch
2025-05-08 16:19 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).