* [PATCHv2] blk-integrity: support arbitrary buffer alignment
From: Keith Busch @ 2025-11-07 4:34 UTC
To: linux-block; +Cc: hch, axboe, martin.petersen, Keith Busch
From: Keith Busch <kbusch@kernel.org>
A bio segment might have partial block data with the rest continuing
into the next segments.
At the same time, the protection information may also be split across
multiple segments. The most likely way that can happen is if two
requests merge, or if we're directly using the io_uring user metadata.
Further, the protection fields may be unaligned within the user space
buffer, or there may be odd additional opaque bytes in front of or
behind the protection information in the metadata region.
Change the iteration to allow spanning multiple segments. This patch
is mostly a rewrite of the protection information handling to allow
arbitrary alignment, so it's probably easier to review the end result
rather than the diff.
Note, this strives to be a very general solution that should work in
scenarios that I think are unlikely to ever be encountered in real life.
This was tested using recently proposed io_uring metadata test case
here:
https://lore.kernel.org/io-uring/20251107042953.3393507-1-kbusch@meta.com/
The test purposefully constructs metadata with offsets that make the data
integrity field straddle pages. As long as they're not physically
contiguous, that will split the field across multiple segments and test
those conditions, which will either get a copy buffer if the device
doesn't support multiple integrity segments, or get a temporary data
integrity field copy during the reftag remapping.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2:
Fixed up the new "union" type and added a comment to explain what it's
for.
Fixed up sparse warnings.
I think I fixed the IP checksum type based on the existing
implementation, but can't say for sure as I've never encountered a
device that supports it.
Fixed up remapping for partial completions.
Various common code cleanups.
block/blk-settings.c | 10 -
block/t10-pi.c | 853 +++++++++++++++++++++++++------------------
2 files changed, 499 insertions(+), 364 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 78dfef1176231..e0d0b035f39d2 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -197,16 +197,6 @@ static int blk_validate_integrity_limits(struct queue_limits *lim)
if (!bi->interval_exp)
bi->interval_exp = ilog2(lim->logical_block_size);
- /*
- * The PI generation / validation helpers do not expect intervals to
- * straddle multiple bio_vecs. Enforce alignment so that those are
- * never generated, and that each buffer is aligned as expected.
- */
- if (bi->csum_type) {
- lim->dma_alignment = max(lim->dma_alignment,
- (1U << bi->interval_exp) - 1);
- }
-
/*
* The block layer automatically adds integrity data for bios that don't
* already have it. Limit the I/O size so that a single maximum size
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 0c4ed97021460..dd0986b272bb9 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -12,230 +12,175 @@
#include <linux/unaligned.h>
#include "blk.h"
+#define APP_TAG_ESCAPE 0xffff
+#define REF_TAG_ESCAPE 0xffffffff
+
+/**
+ * This union is used for on-stack allocations when the pi field is split
+ * across segments. blk_validate_integrity_limits() guarantees pi_tuple_size
+ * matches the size of one of these two types.
+ */
+union pi_tuple {
+ struct crc64_pi_tuple crc64_pi;
+ struct t10_pi_tuple t10_pi;
+};
+
struct blk_integrity_iter {
- void *prot_buf;
- void *data_buf;
- sector_t seed;
- unsigned int data_size;
- unsigned short interval;
- const char *disk_name;
+ struct bio *bio;
+ struct bio_integrity_payload *bip;
+ struct blk_integrity *bi;
+ struct bvec_iter data_iter;
+ struct bvec_iter prot_iter;
+ unsigned int interval_remaining;
+ u64 seed;
+ u64 crc;
};
-static __be16 t10_pi_csum(__be16 csum, void *data, unsigned int len,
- unsigned char csum_type)
+static void blk_crc(struct blk_integrity_iter *iter, void *data,
+ unsigned int len)
{
- if (csum_type == BLK_INTEGRITY_CSUM_IP)
- return (__force __be16)ip_compute_csum(data, len);
- return cpu_to_be16(crc_t10dif_update(be16_to_cpu(csum), data, len));
+ switch (iter->bi->csum_type) {
+ case BLK_INTEGRITY_CSUM_CRC64:
+ iter->crc = crc64_nvme(iter->crc, data, len);
+ break;
+ case BLK_INTEGRITY_CSUM_CRC:
+ iter->crc = crc_t10dif_update(iter->crc, data, len);
+ break;
+ case BLK_INTEGRITY_CSUM_IP:
+ iter->crc = (__force u32)csum_partial(data, len,
+ (__force __wsum)iter->crc);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ iter->crc = U64_MAX;
+ break;
+ }
}
-/*
- * Type 1 and Type 2 protection use the same format: 16 bit guard tag,
- * 16 bit app tag, 32 bit reference tag. Type 3 does not define the ref
- * tag.
+/**
+ * blk_integrity_crc_offset - update the crc for formats that have metadata
+ * padding in front of the data integrity field
*/
-static void t10_pi_generate(struct blk_integrity_iter *iter,
- struct blk_integrity *bi)
+static void blk_integrity_crc_offset(struct blk_integrity_iter *iter)
{
- u8 offset = bi->pi_offset;
- unsigned int i;
-
- for (i = 0 ; i < iter->data_size ; i += iter->interval) {
- struct t10_pi_tuple *pi = iter->prot_buf + offset;
-
- pi->guard_tag = t10_pi_csum(0, iter->data_buf, iter->interval,
- bi->csum_type);
- if (offset)
- pi->guard_tag = t10_pi_csum(pi->guard_tag,
- iter->prot_buf, offset, bi->csum_type);
- pi->app_tag = 0;
-
- if (bi->flags & BLK_INTEGRITY_REF_TAG)
- pi->ref_tag = cpu_to_be32(lower_32_bits(iter->seed));
- else
- pi->ref_tag = 0;
-
- iter->data_buf += iter->interval;
- iter->prot_buf += bi->metadata_size;
- iter->seed++;
+ unsigned int offset = iter->bi->pi_offset;
+ struct bio_vec *bvec = iter->bip->bip_vec;
+
+ while (offset > 0) {
+ struct bio_vec pbv = mp_bvec_iter_bvec(bvec, iter->prot_iter);
+ unsigned int len = min(pbv.bv_len, offset);
+ void *prot_buf = bvec_kmap_local(&pbv);
+
+ blk_crc(iter, prot_buf, len);
+ kunmap_local(prot_buf);
+ offset -= len;
+ bvec_iter_advance_single(bvec, &iter->prot_iter, len);
}
}
-static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
- struct blk_integrity *bi)
+/**
+ * __blk_integrity_copy_from_tuple() - copy from @tuple to @iter
+ */
+static void __blk_integrity_copy_from_tuple(struct bio_integrity_payload *bip,
+ struct bvec_iter *iter, void *tuple,
+ unsigned int tuple_size)
{
- u8 offset = bi->pi_offset;
- unsigned int i;
-
- for (i = 0 ; i < iter->data_size ; i += iter->interval) {
- struct t10_pi_tuple *pi = iter->prot_buf + offset;
- __be16 csum;
-
- if (bi->flags & BLK_INTEGRITY_REF_TAG) {
- if (pi->app_tag == T10_PI_APP_ESCAPE)
- goto next;
-
- if (be32_to_cpu(pi->ref_tag) !=
- lower_32_bits(iter->seed)) {
- pr_err("%s: ref tag error at location %llu " \
- "(rcvd %u)\n", iter->disk_name,
- (unsigned long long)
- iter->seed, be32_to_cpu(pi->ref_tag));
- return BLK_STS_PROTECTION;
- }
- } else {
- if (pi->app_tag == T10_PI_APP_ESCAPE &&
- pi->ref_tag == T10_PI_REF_ESCAPE)
- goto next;
- }
+ void *prot_buf;
- csum = t10_pi_csum(0, iter->data_buf, iter->interval,
- bi->csum_type);
- if (offset)
- csum = t10_pi_csum(csum, iter->prot_buf, offset,
- bi->csum_type);
-
- if (pi->guard_tag != csum) {
- pr_err("%s: guard tag error at sector %llu " \
- "(rcvd %04x, want %04x)\n", iter->disk_name,
- (unsigned long long)iter->seed,
- be16_to_cpu(pi->guard_tag), be16_to_cpu(csum));
- return BLK_STS_PROTECTION;
- }
+ while (tuple_size) {
+ struct bio_vec pbv = mp_bvec_iter_bvec(bip->bip_vec, *iter);
+ unsigned int len = min(tuple_size, pbv.bv_len);
+
+ prot_buf = bvec_kmap_local(&pbv);
+ memcpy(prot_buf, tuple, len);
+ kunmap_local(prot_buf);
-next:
- iter->data_buf += iter->interval;
- iter->prot_buf += bi->metadata_size;
- iter->seed++;
+ bvec_iter_advance_single(bip->bip_vec, iter, len);
+ tuple_size -= len;
+ tuple += len;
}
+}
- return BLK_STS_OK;
+static void blk_integrity_copy_from_tuple(struct blk_integrity_iter *iter,
+ void *tuple)
+{
+ __blk_integrity_copy_from_tuple(iter->bip, &iter->prot_iter,
+ tuple, iter->bi->pi_tuple_size);
}
/**
- * t10_pi_type1_prepare - prepare PI prior submitting request to device
- * @rq: request with PI that should be prepared
- *
- * For Type 1/Type 2, the virtual start sector is the one that was
- * originally submitted by the block layer for the ref_tag usage. Due to
- * partitioning, MD/DM cloning, etc. the actual physical start sector is
- * likely to be different. Remap protection information to match the
- * physical LBA.
+ * __blk_integrity_copy_to_tuple() - copy to &tuple from @iter
*/
-static void t10_pi_type1_prepare(struct request *rq)
+static void __blk_integrity_copy_to_tuple(struct bio_integrity_payload *bip,
+ struct bvec_iter *iter, void *tuple,
+ unsigned int tuple_size)
{
- struct blk_integrity *bi = &rq->q->limits.integrity;
- const int tuple_sz = bi->metadata_size;
- u32 ref_tag = t10_pi_ref_tag(rq);
- u8 offset = bi->pi_offset;
- struct bio *bio;
+ void *prot_buf;
- __rq_for_each_bio(bio, rq) {
- struct bio_integrity_payload *bip = bio_integrity(bio);
- u32 virt = bip_get_seed(bip) & 0xffffffff;
- struct bio_vec iv;
- struct bvec_iter iter;
+ while (tuple_size) {
+ struct bio_vec pbv = mp_bvec_iter_bvec(bip->bip_vec, *iter);
+ unsigned int len = min(tuple_size, pbv.bv_len);
- /* Already remapped? */
- if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
- break;
+ prot_buf = bvec_kmap_local(&pbv);
+ memcpy(tuple, prot_buf, len);
+ kunmap_local(prot_buf);
- bip_for_each_vec(iv, bip, iter) {
- unsigned int j;
- void *p;
-
- p = bvec_kmap_local(&iv);
- for (j = 0; j < iv.bv_len; j += tuple_sz) {
- struct t10_pi_tuple *pi = p + offset;
-
- if (be32_to_cpu(pi->ref_tag) == virt)
- pi->ref_tag = cpu_to_be32(ref_tag);
- virt++;
- ref_tag++;
- p += tuple_sz;
- }
- kunmap_local(p);
- }
-
- bip->bip_flags |= BIP_MAPPED_INTEGRITY;
+ bvec_iter_advance_single(bip->bip_vec, iter, len);
+ tuple_size -= len;
+ tuple += len;
}
}
-/**
- * t10_pi_type1_complete - prepare PI prior returning request to the blk layer
- * @rq: request with PI that should be prepared
- * @nr_bytes: total bytes to prepare
- *
- * For Type 1/Type 2, the virtual start sector is the one that was
- * originally submitted by the block layer for the ref_tag usage. Due to
- * partitioning, MD/DM cloning, etc. the actual physical start sector is
- * likely to be different. Since the physical start sector was submitted
- * to the device, we should remap it back to virtual values expected by the
- * block layer.
- */
-static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
+static void blk_integrity_copy_to_tuple(struct blk_integrity_iter *iter,
+ void *tuple)
{
- struct blk_integrity *bi = &rq->q->limits.integrity;
- unsigned intervals = nr_bytes >> bi->interval_exp;
- const int tuple_sz = bi->metadata_size;
- u32 ref_tag = t10_pi_ref_tag(rq);
- u8 offset = bi->pi_offset;
- struct bio *bio;
+ __blk_integrity_copy_to_tuple(iter->bip, &iter->prot_iter,
+ tuple, iter->bi->pi_tuple_size);
+}
- __rq_for_each_bio(bio, rq) {
- struct bio_integrity_payload *bip = bio_integrity(bio);
- u32 virt = bip_get_seed(bip) & 0xffffffff;
- struct bio_vec iv;
- struct bvec_iter iter;
-
- bip_for_each_vec(iv, bip, iter) {
- unsigned int j;
- void *p;
-
- p = bvec_kmap_local(&iv);
- for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
- struct t10_pi_tuple *pi = p + offset;
-
- if (be32_to_cpu(pi->ref_tag) == ref_tag)
- pi->ref_tag = cpu_to_be32(virt);
- virt++;
- ref_tag++;
- intervals--;
- p += tuple_sz;
- }
- kunmap_local(p);
- }
+static void blk_set_ext_pi(void *prot_buf, struct blk_integrity_iter *iter)
+{
+ struct crc64_pi_tuple *pi = prot_buf;
+
+ if (unlikely((unsigned long)prot_buf & (sizeof(*pi) - 1))) {
+ put_unaligned_be64(iter->crc, &pi->guard_tag);
+ put_unaligned_be16(0, &pi->app_tag);
+ put_unaligned_be48(iter->seed, &pi->ref_tag);
+ } else {
+ pi->guard_tag = cpu_to_be64(iter->crc);
+ pi->app_tag = 0;
+ put_unaligned_be48(iter->seed, &pi->ref_tag);
}
}
-static __be64 ext_pi_crc64(u64 crc, void *data, unsigned int len)
+static void blk_set_t10_pi(void *prot_buf, struct blk_integrity_iter *iter)
{
- return cpu_to_be64(crc64_nvme(crc, data, len));
+ struct t10_pi_tuple *pi = prot_buf;
+
+ if (unlikely((unsigned long)prot_buf & (sizeof(*pi) - 1))) {
+ put_unaligned_be16(iter->crc, &pi->guard_tag);
+ put_unaligned_be16(0, &pi->app_tag);
+ put_unaligned_be32(iter->seed, &pi->ref_tag);
+ } else {
+ pi->guard_tag = cpu_to_be16(iter->crc);
+ pi->app_tag = 0;
+ pi->ref_tag = cpu_to_be32(iter->seed);
+ }
}
-static void ext_pi_crc64_generate(struct blk_integrity_iter *iter,
- struct blk_integrity *bi)
+static void blk_set_ip_pi(void *prot_buf, struct blk_integrity_iter *iter)
{
- u8 offset = bi->pi_offset;
- unsigned int i;
-
- for (i = 0 ; i < iter->data_size ; i += iter->interval) {
- struct crc64_pi_tuple *pi = iter->prot_buf + offset;
-
- pi->guard_tag = ext_pi_crc64(0, iter->data_buf, iter->interval);
- if (offset)
- pi->guard_tag = ext_pi_crc64(be64_to_cpu(pi->guard_tag),
- iter->prot_buf, offset);
+ __be16 csum = (__force __be16)~(lower_16_bits(iter->crc));
+ struct t10_pi_tuple *pi = prot_buf;
+
+ if (unlikely((unsigned long)prot_buf & (sizeof(*pi) - 1))) {
+ __put_unaligned_t(__be16, csum, &pi->guard_tag);
+ put_unaligned_be16(0, &pi->app_tag);
+ put_unaligned_be32(iter->seed, &pi->ref_tag);
+ } else {
+ pi->guard_tag = csum;
pi->app_tag = 0;
-
- if (bi->flags & BLK_INTEGRITY_REF_TAG)
- put_unaligned_be48(iter->seed, pi->ref_tag);
- else
- put_unaligned_be48(0ULL, pi->ref_tag);
-
- iter->data_buf += iter->interval;
- iter->prot_buf += bi->metadata_size;
- iter->seed++;
+ pi->ref_tag = cpu_to_be32(iter->seed);
}
}
@@ -247,227 +192,427 @@ static bool ext_pi_ref_escape(const u8 ref_tag[6])
}
static blk_status_t ext_pi_crc64_verify(struct blk_integrity_iter *iter,
- struct blk_integrity *bi)
+ struct crc64_pi_tuple *pi)
{
- u8 offset = bi->pi_offset;
- unsigned int i;
-
- for (i = 0; i < iter->data_size; i += iter->interval) {
- struct crc64_pi_tuple *pi = iter->prot_buf + offset;
- u64 ref, seed;
- __be64 csum;
-
- if (bi->flags & BLK_INTEGRITY_REF_TAG) {
- if (pi->app_tag == T10_PI_APP_ESCAPE)
- goto next;
-
- ref = get_unaligned_be48(pi->ref_tag);
- seed = lower_48_bits(iter->seed);
- if (ref != seed) {
- pr_err("%s: ref tag error at location %llu (rcvd %llu)\n",
- iter->disk_name, seed, ref);
- return BLK_STS_PROTECTION;
- }
- } else {
- if (pi->app_tag == T10_PI_APP_ESCAPE &&
- ext_pi_ref_escape(pi->ref_tag))
- goto next;
- }
-
- csum = ext_pi_crc64(0, iter->data_buf, iter->interval);
- if (offset)
- csum = ext_pi_crc64(be64_to_cpu(csum), iter->prot_buf,
- offset);
+ u64 guard, ref, seed = lower_48_bits(iter->seed);
+ u16 app;
+
+ if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
+ guard = get_unaligned_be64(&pi->guard_tag);
+ app = get_unaligned_be16(&pi->app_tag);
+ ref = get_unaligned_be48(pi->ref_tag);
+ } else {
+ guard = be64_to_cpu(pi->guard_tag);
+ app = be16_to_cpu(pi->app_tag);
+ ref = get_unaligned_be48(pi->ref_tag);
+ }
- if (pi->guard_tag != csum) {
- pr_err("%s: guard tag error at sector %llu " \
- "(rcvd %016llx, want %016llx)\n",
- iter->disk_name, (unsigned long long)iter->seed,
- be64_to_cpu(pi->guard_tag), be64_to_cpu(csum));
+ if (iter->bi->flags & BLK_INTEGRITY_REF_TAG) {
+ if (app == APP_TAG_ESCAPE)
+ return BLK_STS_OK;
+ if (ref != seed) {
+ pr_err("%s: ref tag error at location %llu (rcvd %llu)\n",
+ iter->bio->bi_bdev->bd_disk->disk_name, seed,
+ ref);
return BLK_STS_PROTECTION;
}
+ } else if (app == APP_TAG_ESCAPE && ext_pi_ref_escape(pi->ref_tag)) {
+ return BLK_STS_OK;
+ }
-next:
- iter->data_buf += iter->interval;
- iter->prot_buf += bi->metadata_size;
- iter->seed++;
+ if (guard != iter->crc) {
+ pr_err("%s: guard tag error at sector %llu (rcvd %016llx, want %016llx)\n",
+ iter->bio->bi_bdev->bd_disk->disk_name, iter->seed,
+ guard, iter->crc);
+ return BLK_STS_PROTECTION;
}
return BLK_STS_OK;
}
-static void ext_pi_type1_prepare(struct request *rq)
+static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
+ struct t10_pi_tuple *pi)
{
- struct blk_integrity *bi = &rq->q->limits.integrity;
- const int tuple_sz = bi->metadata_size;
- u64 ref_tag = ext_pi_ref_tag(rq);
- u8 offset = bi->pi_offset;
- struct bio *bio;
+ u32 ref, seed = lower_32_bits(iter->seed);
+ u16 guard;
+ u16 app;
+
+ if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
+ guard = get_unaligned_be16(&pi->guard_tag);
+ app = get_unaligned_be16(&pi->app_tag);
+ ref = get_unaligned_be32(&pi->ref_tag);
+ } else {
+ guard = be16_to_cpu(pi->guard_tag);
+ app = be16_to_cpu(pi->app_tag);
+ ref = be32_to_cpu(pi->ref_tag);
+ }
- __rq_for_each_bio(bio, rq) {
- struct bio_integrity_payload *bip = bio_integrity(bio);
- u64 virt = lower_48_bits(bip_get_seed(bip));
- struct bio_vec iv;
- struct bvec_iter iter;
+ if (iter->bi->flags & BLK_INTEGRITY_REF_TAG) {
+ if (app == APP_TAG_ESCAPE)
+ return BLK_STS_OK;
+ if (ref != seed) {
+ pr_err("%s: ref tag error at location %u (rcvd %u)\n",
+ iter->bio->bi_bdev->bd_disk->disk_name, seed,
+ ref);
+ return BLK_STS_PROTECTION;
+ }
+ } else if (app == APP_TAG_ESCAPE && ref == REF_TAG_ESCAPE) {
+ return BLK_STS_OK;
+ }
- /* Already remapped? */
- if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
- break;
+ if (guard != (u16)iter->crc) {
+ pr_err("%s: guard tag error at sector %llu (rcvd %04x, want %04x)\n",
+ iter->bio->bi_bdev->bd_disk->disk_name, iter->seed,
+ guard, (u16)iter->crc);
+ return BLK_STS_PROTECTION;
+ }
- bip_for_each_vec(iv, bip, iter) {
- unsigned int j;
- void *p;
-
- p = bvec_kmap_local(&iv);
- for (j = 0; j < iv.bv_len; j += tuple_sz) {
- struct crc64_pi_tuple *pi = p + offset;
- u64 ref = get_unaligned_be48(pi->ref_tag);
-
- if (ref == virt)
- put_unaligned_be48(ref_tag, pi->ref_tag);
- virt++;
- ref_tag++;
- p += tuple_sz;
- }
- kunmap_local(p);
- }
+ return BLK_STS_OK;
+}
- bip->bip_flags |= BIP_MAPPED_INTEGRITY;
+static blk_status_t blk_integrity_verify(struct blk_integrity_iter *iter,
+ void *tuple)
+{
+ switch (iter->bi->csum_type) {
+ case BLK_INTEGRITY_CSUM_CRC64:
+ return ext_pi_crc64_verify(iter, tuple);
+ case BLK_INTEGRITY_CSUM_CRC:
+ case BLK_INTEGRITY_CSUM_IP:
+ return t10_pi_verify(iter, tuple);
+ default:
+ return BLK_STS_OK;
}
}
-static void ext_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
+static void blk_integrity_set(struct blk_integrity_iter *iter,
+ void *tuple)
{
- struct blk_integrity *bi = &rq->q->limits.integrity;
- unsigned intervals = nr_bytes >> bi->interval_exp;
- const int tuple_sz = bi->metadata_size;
- u64 ref_tag = ext_pi_ref_tag(rq);
- u8 offset = bi->pi_offset;
- struct bio *bio;
+ switch (iter->bi->csum_type) {
+ case BLK_INTEGRITY_CSUM_CRC64:
+ return blk_set_ext_pi(tuple, iter);
+ case BLK_INTEGRITY_CSUM_CRC:
+ return blk_set_t10_pi(tuple, iter);
+ case BLK_INTEGRITY_CSUM_IP:
+ return blk_set_ip_pi(tuple, iter);
+ default:
+ WARN_ON_ONCE(1);
+ return;
+ }
+}
- __rq_for_each_bio(bio, rq) {
- struct bio_integrity_payload *bip = bio_integrity(bio);
- u64 virt = lower_48_bits(bip_get_seed(bip));
- struct bio_vec iv;
- struct bvec_iter iter;
-
- bip_for_each_vec(iv, bip, iter) {
- unsigned int j;
- void *p;
-
- p = bvec_kmap_local(&iv);
- for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
- struct crc64_pi_tuple *pi = p + offset;
- u64 ref = get_unaligned_be48(pi->ref_tag);
-
- if (ref == ref_tag)
- put_unaligned_be48(virt, pi->ref_tag);
- virt++;
- ref_tag++;
- intervals--;
- p += tuple_sz;
- }
- kunmap_local(p);
- }
+static blk_status_t blk_integrity_interval(struct blk_integrity_iter *iter,
+ bool verify)
+{
+ blk_status_t ret = BLK_STS_OK;
+ union pi_tuple tuple;
+ void *ptuple = &tuple;
+ struct bio_vec pbv;
+
+ blk_integrity_crc_offset(iter);
+ pbv = mp_bvec_iter_bvec(iter->bip->bip_vec, iter->prot_iter);
+ if (pbv.bv_len >= iter->bi->pi_tuple_size) {
+ ptuple = bvec_kmap_local(&pbv);
+ bvec_iter_advance_single(iter->bip->bip_vec, &iter->prot_iter,
+ iter->bi->metadata_size - iter->bi->pi_offset);
+ } else if (verify) {
+ blk_integrity_copy_to_tuple(iter, ptuple);
}
+
+ if (verify)
+ ret = blk_integrity_verify(iter, ptuple);
+ else
+ blk_integrity_set(iter, ptuple);
+
+ if (likely(ptuple != &tuple))
+ kunmap_local(ptuple);
+ else if (!verify)
+ blk_integrity_copy_from_tuple(iter, ptuple);
+
+ iter->interval_remaining = 1 << iter->bi->interval_exp;
+ iter->crc = 0;
+ iter->seed++;
+
+ return ret;
}
-void blk_integrity_generate(struct bio *bio)
+static void blk_integrity_iterate(struct bio *bio, struct bvec_iter *data_iter,
+ bool verify)
{
struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
struct bio_integrity_payload *bip = bio_integrity(bio);
- struct blk_integrity_iter iter;
- struct bvec_iter bviter;
- struct bio_vec bv;
-
- iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
- iter.interval = 1 << bi->interval_exp;
- iter.seed = bio->bi_iter.bi_sector;
- iter.prot_buf = bvec_virt(bip->bip_vec);
- bio_for_each_segment(bv, bio, bviter) {
+ struct blk_integrity_iter iter = {
+ .bio = bio,
+ .bip = bip,
+ .bi = bi,
+ .data_iter = *data_iter,
+ .prot_iter = bip->bip_iter,
+ .interval_remaining = 1 << bi->interval_exp,
+ .seed = data_iter->bi_sector,
+ .crc = 0,
+ };
+ blk_status_t ret = BLK_STS_OK;
+
+ while (iter.data_iter.bi_size && ret == BLK_STS_OK) {
+ struct bio_vec bv = mp_bvec_iter_bvec(iter.bio->bi_io_vec,
+ iter.data_iter);
void *kaddr = bvec_kmap_local(&bv);
-
- iter.data_buf = kaddr;
- iter.data_size = bv.bv_len;
- switch (bi->csum_type) {
- case BLK_INTEGRITY_CSUM_CRC64:
- ext_pi_crc64_generate(&iter, bi);
- break;
- case BLK_INTEGRITY_CSUM_CRC:
- case BLK_INTEGRITY_CSUM_IP:
- t10_pi_generate(&iter, bi);
- break;
- default:
- break;
+ void *data = kaddr;
+ unsigned int len;
+
+ bvec_iter_advance_single(iter.bio->bi_io_vec, &iter.data_iter,
+ bv.bv_len);
+ while (bv.bv_len && ret == BLK_STS_OK) {
+ len = min(iter.interval_remaining, bv.bv_len);
+ blk_crc(&iter, data, len);
+ bv.bv_len -= len;
+ data += len;
+ iter.interval_remaining -= len;
+ if (!iter.interval_remaining)
+ ret = blk_integrity_interval(&iter, verify);
}
kunmap_local(kaddr);
}
+
+ if (ret)
+ bio->bi_status = ret;
+}
+
+void blk_integrity_generate(struct bio *bio)
+{
+ struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
+
+ switch (bi->csum_type) {
+ case BLK_INTEGRITY_CSUM_CRC64:
+ case BLK_INTEGRITY_CSUM_CRC:
+ case BLK_INTEGRITY_CSUM_IP:
+ blk_integrity_iterate(bio, &bio->bi_iter, false);
+ break;
+ default:
+ break;
+ }
}
void blk_integrity_verify_iter(struct bio *bio, struct bvec_iter *saved_iter)
{
struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
- struct bio_integrity_payload *bip = bio_integrity(bio);
- struct blk_integrity_iter iter;
- struct bvec_iter bviter;
- struct bio_vec bv;
+
+ switch (bi->csum_type) {
+ case BLK_INTEGRITY_CSUM_CRC64:
+ case BLK_INTEGRITY_CSUM_CRC:
+ case BLK_INTEGRITY_CSUM_IP:
+ blk_integrity_iterate(bio, saved_iter, true);
+ break;
+ default:
+ break;
+ }
+}
+
+/**
+ * blk_pi_advance_offset - advance @iter past the protection offset
+ *
+ * For protection formats that contain front padding on the metadata region.
+ */
+static void blk_pi_advance_offset(struct blk_integrity *bi,
+ struct bio_integrity_payload *bip,
+ struct bvec_iter *iter)
+{
+ unsigned int offset = bi->pi_offset;
+
+ while (offset > 0) {
+ struct bio_vec bv = mp_bvec_iter_bvec(bip->bip_vec, *iter);
+ unsigned int len = min(bv.bv_len, offset);
+
+ bvec_iter_advance_single(bip->bip_vec, iter, len);
+ offset -= len;
+ }
+}
+
+static void *blk_tuple_remap_begin(union pi_tuple *tuple,
+ struct blk_integrity *bi,
+ struct bio_integrity_payload *bip,
+ struct bvec_iter *iter)
+{
+ struct bvec_iter titer;
+ struct bio_vec pbv;
+
+ blk_pi_advance_offset(bi, bip, iter);
+ pbv = mp_bvec_iter_bvec(bip->bip_vec, *iter);
+ if (likely(pbv.bv_len >= bi->pi_tuple_size))
+ return bvec_kmap_local(&pbv);
/*
- * At the moment verify is called bi_iter has been advanced during split
- * and completion, so use the copy created during submission here.
+ * We need to preserve the state of the original iter for the
+ * copy_from_tuple at the end, so make a temp iter for here.
*/
- iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
- iter.interval = 1 << bi->interval_exp;
- iter.seed = saved_iter->bi_sector;
- iter.prot_buf = bvec_virt(bip->bip_vec);
- __bio_for_each_segment(bv, bio, bviter, *saved_iter) {
- void *kaddr = bvec_kmap_local(&bv);
- blk_status_t ret = BLK_STS_OK;
+ titer = *iter;
+ __blk_integrity_copy_to_tuple(bip, &titer, tuple, bi->pi_tuple_size);
+ return tuple;
+}
- iter.data_buf = kaddr;
- iter.data_size = bv.bv_len;
- switch (bi->csum_type) {
- case BLK_INTEGRITY_CSUM_CRC64:
- ret = ext_pi_crc64_verify(&iter, bi);
- break;
- case BLK_INTEGRITY_CSUM_CRC:
- case BLK_INTEGRITY_CSUM_IP:
- ret = t10_pi_verify(&iter, bi);
- break;
- default:
- break;
- }
- kunmap_local(kaddr);
+static void blk_tuple_remap_end(union pi_tuple *tuple, void *ptuple,
+ struct blk_integrity *bi,
+ struct bio_integrity_payload *bip,
+ struct bvec_iter *iter)
+{
+ unsigned int len = bi->metadata_size - bi->pi_offset;
+
+ if (likely(ptuple != tuple)) {
+ kunmap_local(ptuple);
+ } else {
+ __blk_integrity_copy_from_tuple(bip, iter, ptuple,
+ bi->pi_tuple_size);
+ len -= bi->pi_tuple_size;
+ }
- if (ret) {
- bio->bi_status = ret;
- return;
- }
+ bvec_iter_advance(bip->bip_vec, iter, len);
+}
+
+static void blk_set_ext_unmap_ref(void *prot_buf, u64 virt, u64 ref_tag)
+{
+ struct crc64_pi_tuple *pi = prot_buf;
+ u64 ref = get_unaligned_be48(&pi->ref_tag);
+
+ if (ref == lower_48_bits(virt) && ref != virt)
+ put_unaligned_be48(virt, pi->ref_tag);
+}
+
+static void blk_set_t10_unmap_ref(void *prot_buf, u32 virt, u32 ref_tag)
+{
+ struct t10_pi_tuple *pi = prot_buf;
+ u32 ref;
+
+ if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
+ ref = get_unaligned_be32(&pi->ref_tag);
+ else
+ ref = be32_to_cpu(pi->ref_tag);
+
+ if (ref != ref_tag || ref == virt)
+ return;
+
+ if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
+ put_unaligned_be32(virt, &pi->ref_tag);
+ else
+ pi->ref_tag = cpu_to_be32(virt);
+}
+
+static void blk_reftag_remap_complete(struct blk_integrity *bi, void *tuple,
+ u64 virt, u64 ref)
+{
+ switch (bi->csum_type) {
+ case BLK_INTEGRITY_CSUM_CRC64:
+ blk_set_ext_unmap_ref(tuple, virt, ref);
+ break;
+ case BLK_INTEGRITY_CSUM_CRC:
+ case BLK_INTEGRITY_CSUM_IP:
+ blk_set_t10_unmap_ref(tuple, virt, ref);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ break;
}
}
-void blk_integrity_prepare(struct request *rq)
+static void blk_set_ext_map_ref(void *prot_buf, u64 virt, u64 ref_tag)
{
- struct blk_integrity *bi = &rq->q->limits.integrity;
+ struct crc64_pi_tuple *pi = prot_buf;
+ u64 ref = get_unaligned_be48(&pi->ref_tag);
- if (!(bi->flags & BLK_INTEGRITY_REF_TAG))
+ if (ref == lower_48_bits(virt) && ref != ref_tag)
+ put_unaligned_be48(ref_tag, pi->ref_tag);
+}
+
+static void blk_set_t10_map_ref(void *prot_buf, u32 virt, u32 ref_tag)
+{
+ struct t10_pi_tuple *pi = prot_buf;
+ u32 ref;
+
+ if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
+ ref = get_unaligned_be32(&pi->ref_tag);
+ else
+ ref = be32_to_cpu(pi->ref_tag);
+
+ if (ref != virt || ref == ref_tag)
return;
- if (bi->csum_type == BLK_INTEGRITY_CSUM_CRC64)
- ext_pi_type1_prepare(rq);
+ if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
+ put_unaligned_be32(ref_tag, &pi->ref_tag);
else
- t10_pi_type1_prepare(rq);
+ pi->ref_tag = cpu_to_be32(ref_tag);
}
-void blk_integrity_complete(struct request *rq, unsigned int nr_bytes)
+static void blk_reftag_remap_prepare(struct blk_integrity *bi, void *tuple,
+ u64 virt, u64 ref)
+{
+ switch (bi->csum_type) {
+ case BLK_INTEGRITY_CSUM_CRC64:
+ blk_set_ext_map_ref(tuple, virt, ref);
+ break;
+ case BLK_INTEGRITY_CSUM_CRC:
+ case BLK_INTEGRITY_CSUM_IP:
+ blk_set_t10_map_ref(tuple, virt, ref);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ break;
+ }
+}
+
+static void __blk_reftag_remap(struct bio *bio, struct blk_integrity *bi,
+ unsigned *intervals, u64 *ref, bool prep)
+{
+ struct bio_integrity_payload *bip = bio_integrity(bio);
+ struct bvec_iter iter = bip->bip_iter;
+ u64 virt = bip_get_seed(bip);
+ union pi_tuple tuple;
+ void *ptuple;
+
+ if (prep && bip->bip_flags & BIP_MAPPED_INTEGRITY) {
+ *ref += bio->bi_iter.bi_size >> bi->interval_exp;
+ return;
+ }
+
+ while (iter.bi_size && *intervals) {
+ ptuple = blk_tuple_remap_begin(&tuple, bi, bip, &iter);
+
+ if (prep)
+ blk_reftag_remap_prepare(bi, ptuple, virt, *ref);
+ else
+ blk_reftag_remap_complete(bi, ptuple, virt, *ref);
+
+ blk_tuple_remap_end(&tuple, ptuple, bi, bip, &iter);
+ (*intervals)--;
+ (*ref)++;
+ virt++;
+ }
+
+ if (prep)
+ bip->bip_flags |= BIP_MAPPED_INTEGRITY;
+}
+
+static void blk_integrity_remap(struct request *rq, unsigned int nr_bytes,
+ bool prep)
{
struct blk_integrity *bi = &rq->q->limits.integrity;
+ u64 ref = blk_rq_pos(rq) >> (bi->interval_exp - SECTOR_SHIFT);
+ unsigned intervals = nr_bytes >> bi->interval_exp;
+ struct bio *bio;
if (!(bi->flags & BLK_INTEGRITY_REF_TAG))
return;
- if (bi->csum_type == BLK_INTEGRITY_CSUM_CRC64)
- ext_pi_type1_complete(rq, nr_bytes);
- else
- t10_pi_type1_complete(rq, nr_bytes);
+ __rq_for_each_bio(bio, rq) {
+ __blk_reftag_remap(bio, bi, &intervals, &ref, prep);
+ if (!intervals)
+ break;
+ }
+}
+
+void blk_integrity_prepare(struct request *rq)
+{
+ blk_integrity_remap(rq, blk_rq_bytes(rq), true);
+}
+
+void blk_integrity_complete(struct request *rq, unsigned int nr_bytes)
+{
+ blk_integrity_remap(rq, nr_bytes, false);
}
--
2.47.3
* Re: [PATCHv2] blk-integrity: support arbitrary buffer alignment
From: Christoph Hellwig @ 2025-11-07 13:15 UTC
To: Keith Busch; +Cc: linux-block, hch, axboe, martin.petersen, Keith Busch
[-- Attachment #1: Type: text/plain, Size: 1420 bytes --]
On Thu, Nov 06, 2025 at 08:34:47PM -0800, Keith Busch wrote:
> This was tested using recently proposed io_uring metadata test case
> here:
>
> https://lore.kernel.org/io-uring/20251107042953.3393507-1-kbusch@meta.com/
>
> The test purposefully constructs metadata with offsets that make the data
> integrity field straddle pages. As long as they're not physically
> contiguous, that will split the field across multiple segments and test
> those conditions, which will either get a copy buffer if the device
> doesn't support multiple integrity segments, or get a temporary data
> integrity field copy during the reftag remapping.
Any chance we could get this test or something like it into blktests?
That way it would get regularly run as part of block layer validation.
The changes look sensible to me, and pass very basic sanity testing
using my PI setup. I have a few cleanups we should get in (attached).
1: pass the union type down instead of casting to the t10/crc tuples
to improve type safety
2: fix W=1 warnings due to not quite kerneldoc comments
3: cleanup the copy wrappers that are only used once each
4: just always use the unaligned handlers. I guess this might be a
bit contentious, but at least for x86 it actually generates better
code. Alternatively we could require dword (4 byte) alignment for
PI, and only the guard tag in the crc64 format would require any
unaligned handling at all.
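A minimal userspace sketch of the observation behind point 4 (the helper name
below is illustrative and only mimics what the kernel's put_unaligned_be32()
does; it is not taken from the patch):

#include <endian.h>
#include <stdint.h>
#include <string.h>

/*
 * Big-endian 32-bit store to a possibly unaligned address, in the spirit of
 * the kernel's put_unaligned_be32().  A fixed-size memcpy() lets the compiler
 * emit a single store on architectures with efficient unaligned access
 * (e.g. x86), so it costs no more than an aligned cpu_to_be32() assignment.
 */
static inline void store_be32_unaligned(uint32_t val, void *p)
{
	uint32_t be = htobe32(val);

	memcpy(p, &be, sizeof(be));
}

Compiled with -O2 on x86-64 this is expected to produce the same bswap plus
store as the aligned variant, which is why always using the unaligned helpers
should cost nothing there.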
[-- Attachment #2: 0001-pass-union-pi_tuple-down.patch --]
[-- Type: text/x-patch, Size: 7242 bytes --]
From 0160966a9f74c9d60f3a8092a56ad5b429ae8bcb Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Fri, 7 Nov 2025 07:27:18 -0500
Subject: pass union pi_tuple down
Provide some extra type safety.
---
block/t10-pi.c | 70 ++++++++++++++++++++++++--------------------------
1 file changed, 34 insertions(+), 36 deletions(-)
diff --git a/block/t10-pi.c b/block/t10-pi.c
index dd0986b272bb..2702fe97e4fd 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -102,7 +102,7 @@ static void __blk_integrity_copy_from_tuple(struct bio_integrity_payload *bip,
}
static void blk_integrity_copy_from_tuple(struct blk_integrity_iter *iter,
- void *tuple)
+ union pi_tuple *tuple)
{
__blk_integrity_copy_from_tuple(iter->bip, &iter->prot_iter,
tuple, iter->bi->pi_tuple_size);
@@ -132,17 +132,16 @@ static void __blk_integrity_copy_to_tuple(struct bio_integrity_payload *bip,
}
static void blk_integrity_copy_to_tuple(struct blk_integrity_iter *iter,
- void *tuple)
+ union pi_tuple *tuple)
{
__blk_integrity_copy_to_tuple(iter->bip, &iter->prot_iter,
tuple, iter->bi->pi_tuple_size);
}
-static void blk_set_ext_pi(void *prot_buf, struct blk_integrity_iter *iter)
+static void blk_set_ext_pi(struct crc64_pi_tuple *pi,
+ struct blk_integrity_iter *iter)
{
- struct crc64_pi_tuple *pi = prot_buf;
-
- if (unlikely((unsigned long)prot_buf & (sizeof(*pi) - 1))) {
+ if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
put_unaligned_be64(iter->crc, &pi->guard_tag);
put_unaligned_be16(0, &pi->app_tag);
put_unaligned_be48(iter->seed, &pi->ref_tag);
@@ -153,11 +152,10 @@ static void blk_set_ext_pi(void *prot_buf, struct blk_integrity_iter *iter)
}
}
-static void blk_set_t10_pi(void *prot_buf, struct blk_integrity_iter *iter)
+static void blk_set_t10_pi(struct t10_pi_tuple *pi,
+ struct blk_integrity_iter *iter)
{
- struct t10_pi_tuple *pi = prot_buf;
-
- if (unlikely((unsigned long)prot_buf & (sizeof(*pi) - 1))) {
+ if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
put_unaligned_be16(iter->crc, &pi->guard_tag);
put_unaligned_be16(0, &pi->app_tag);
put_unaligned_be32(iter->seed, &pi->ref_tag);
@@ -168,12 +166,12 @@ static void blk_set_t10_pi(void *prot_buf, struct blk_integrity_iter *iter)
}
}
-static void blk_set_ip_pi(void *prot_buf, struct blk_integrity_iter *iter)
+static void blk_set_ip_pi(struct t10_pi_tuple *pi,
+ struct blk_integrity_iter *iter)
{
__be16 csum = (__force __be16)~(lower_16_bits(iter->crc));
- struct t10_pi_tuple *pi = prot_buf;
- if (unlikely((unsigned long)prot_buf & (sizeof(*pi) - 1))) {
+ if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
__put_unaligned_t(__be16, csum, &pi->guard_tag);
put_unaligned_be16(0, &pi->app_tag);
put_unaligned_be32(iter->seed, &pi->ref_tag);
@@ -271,29 +269,29 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
}
static blk_status_t blk_integrity_verify(struct blk_integrity_iter *iter,
- void *tuple)
+ union pi_tuple *tuple)
{
switch (iter->bi->csum_type) {
case BLK_INTEGRITY_CSUM_CRC64:
- return ext_pi_crc64_verify(iter, tuple);
+ return ext_pi_crc64_verify(iter, &tuple->crc64_pi);
case BLK_INTEGRITY_CSUM_CRC:
case BLK_INTEGRITY_CSUM_IP:
- return t10_pi_verify(iter, tuple);
+ return t10_pi_verify(iter, &tuple->t10_pi);
default:
return BLK_STS_OK;
}
}
static void blk_integrity_set(struct blk_integrity_iter *iter,
- void *tuple)
+ union pi_tuple *tuple)
{
switch (iter->bi->csum_type) {
case BLK_INTEGRITY_CSUM_CRC64:
- return blk_set_ext_pi(tuple, iter);
+ return blk_set_ext_pi(&tuple->crc64_pi, iter);
case BLK_INTEGRITY_CSUM_CRC:
- return blk_set_t10_pi(tuple, iter);
+ return blk_set_t10_pi(&tuple->t10_pi, iter);
case BLK_INTEGRITY_CSUM_IP:
- return blk_set_ip_pi(tuple, iter);
+ return blk_set_ip_pi(&tuple->t10_pi, iter);
default:
WARN_ON_ONCE(1);
return;
@@ -467,18 +465,18 @@ static void blk_tuple_remap_end(union pi_tuple *tuple, void *ptuple,
bvec_iter_advance(bip->bip_vec, iter, len);
}
-static void blk_set_ext_unmap_ref(void *prot_buf, u64 virt, u64 ref_tag)
+static void blk_set_ext_unmap_ref(struct crc64_pi_tuple *pi, u64 virt,
+ u64 ref_tag)
{
- struct crc64_pi_tuple *pi = prot_buf;
u64 ref = get_unaligned_be48(&pi->ref_tag);
if (ref == lower_48_bits(virt) && ref != virt)
put_unaligned_be48(virt, pi->ref_tag);
}
-static void blk_set_t10_unmap_ref(void *prot_buf, u32 virt, u32 ref_tag)
+static void blk_set_t10_unmap_ref(struct t10_pi_tuple *pi, u32 virt,
+ u32 ref_tag)
{
- struct t10_pi_tuple *pi = prot_buf;
u32 ref;
if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
@@ -495,16 +493,16 @@ static void blk_set_t10_unmap_ref(void *prot_buf, u32 virt, u32 ref_tag)
pi->ref_tag = cpu_to_be32(virt);
}
-static void blk_reftag_remap_complete(struct blk_integrity *bi, void *tuple,
- u64 virt, u64 ref)
+static void blk_reftag_remap_complete(struct blk_integrity *bi,
+ union pi_tuple *tuple, u64 virt, u64 ref)
{
switch (bi->csum_type) {
case BLK_INTEGRITY_CSUM_CRC64:
- blk_set_ext_unmap_ref(tuple, virt, ref);
+ blk_set_ext_unmap_ref(&tuple->crc64_pi, virt, ref);
break;
case BLK_INTEGRITY_CSUM_CRC:
case BLK_INTEGRITY_CSUM_IP:
- blk_set_t10_unmap_ref(tuple, virt, ref);
+ blk_set_t10_unmap_ref(&tuple->t10_pi, virt, ref);
break;
default:
WARN_ON_ONCE(1);
@@ -512,18 +510,17 @@ static void blk_reftag_remap_complete(struct blk_integrity *bi, void *tuple,
}
}
-static void blk_set_ext_map_ref(void *prot_buf, u64 virt, u64 ref_tag)
+static void blk_set_ext_map_ref(struct crc64_pi_tuple *pi, u64 virt,
+ u64 ref_tag)
{
- struct crc64_pi_tuple *pi = prot_buf;
u64 ref = get_unaligned_be48(&pi->ref_tag);
if (ref == lower_48_bits(virt) && ref != ref_tag)
put_unaligned_be48(ref_tag, pi->ref_tag);
}
-static void blk_set_t10_map_ref(void *prot_buf, u32 virt, u32 ref_tag)
+static void blk_set_t10_map_ref(struct t10_pi_tuple *pi, u32 virt, u32 ref_tag)
{
- struct t10_pi_tuple *pi = prot_buf;
u32 ref;
if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
@@ -540,16 +537,17 @@ static void blk_set_t10_map_ref(void *prot_buf, u32 virt, u32 ref_tag)
pi->ref_tag = cpu_to_be32(ref_tag);
}
-static void blk_reftag_remap_prepare(struct blk_integrity *bi, void *tuple,
+static void blk_reftag_remap_prepare(struct blk_integrity *bi,
+ union pi_tuple *tuple,
u64 virt, u64 ref)
{
switch (bi->csum_type) {
case BLK_INTEGRITY_CSUM_CRC64:
- blk_set_ext_map_ref(tuple, virt, ref);
+ blk_set_ext_map_ref(&tuple->crc64_pi, virt, ref);
break;
case BLK_INTEGRITY_CSUM_CRC:
case BLK_INTEGRITY_CSUM_IP:
- blk_set_t10_map_ref(tuple, virt, ref);
+ blk_set_t10_map_ref(&tuple->t10_pi, virt, ref);
break;
default:
WARN_ON_ONCE(1);
@@ -564,7 +562,7 @@ static void __blk_reftag_remap(struct bio *bio, struct blk_integrity *bi,
struct bvec_iter iter = bip->bip_iter;
u64 virt = bip_get_seed(bip);
union pi_tuple tuple;
- void *ptuple;
+ union pi_tuple *ptuple;
if (prep && bip->bip_flags & BIP_MAPPED_INTEGRITY) {
*ref += bio->bi_iter.bi_size >> bi->interval_exp;
--
2.47.3
[-- Attachment #3: 0002-un-kerneldoc.patch --]
[-- Type: text/x-patch, Size: 2172 bytes --]
From 8e15595cf69757fd82623caaf7937cf49349c56d Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Fri, 7 Nov 2025 08:06:02 -0500
Subject: un-kerneldoc
Otherwise make W=1 complains. For the copy helpers I dropped the
comments entirely as they don't seem to provide any value over just
the function names.
---
block/t10-pi.c | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 2702fe97e4fd..203598f26596 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -57,9 +57,9 @@ static void blk_crc(struct blk_integrity_iter *iter, void *data,
}
}
-/**
- * blk_integrity_crc_offset - update the crc for formats that have metadata
- * padding in front of the data integrity field
+/*
+ * Update the crc for formats that have metadata padding in front of the data
+ * integrity field
*/
static void blk_integrity_crc_offset(struct blk_integrity_iter *iter)
{
@@ -78,9 +78,6 @@ static void blk_integrity_crc_offset(struct blk_integrity_iter *iter)
}
}
-/**
- * __blk_integrity_copy_from_tuple() - copy from @tuple to @iter
- */
static void __blk_integrity_copy_from_tuple(struct bio_integrity_payload *bip,
struct bvec_iter *iter, void *tuple,
unsigned int tuple_size)
@@ -108,9 +105,6 @@ static void blk_integrity_copy_from_tuple(struct blk_integrity_iter *iter,
tuple, iter->bi->pi_tuple_size);
}
-/**
- * __blk_integrity_copy_to_tuple() - copy to &tuple from @iter
- */
static void __blk_integrity_copy_to_tuple(struct bio_integrity_payload *bip,
struct bvec_iter *iter, void *tuple,
unsigned int tuple_size)
@@ -405,10 +399,9 @@ void blk_integrity_verify_iter(struct bio *bio, struct bvec_iter *saved_iter)
}
}
-/**
- * blk_pi_advance_offset - advance @iter past the protection offset
- *
- * For protection formats that contain front padding on the metadata region.
+/*
+ * Advance @iter past the protection offset for protection formats that
+ * contain front padding on the metadata region.
*/
static void blk_pi_advance_offset(struct blk_integrity *bi,
struct bio_integrity_payload *bip,
--
2.47.3
[-- Attachment #4: 0003-simplify-copy-to-helpers.patch --]
[-- Type: text/x-patch, Size: 3687 bytes --]
From 6b14968fcbcc6e0a5c9c13b02f467d04a12d8bb6 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Fri, 7 Nov 2025 08:08:39 -0500
Subject: simplify copy/to helpers
No real need for the wrappers that have a single caller. Also make the
trivial comments non-kerneldoc as they don't document all parameters
anyway.
---
block/t10-pi.c | 44 +++++++++++++++++---------------------------
1 file changed, 17 insertions(+), 27 deletions(-)
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 203598f26596..a3da845f03d9 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -78,9 +78,9 @@ static void blk_integrity_crc_offset(struct blk_integrity_iter *iter)
}
}
-static void __blk_integrity_copy_from_tuple(struct bio_integrity_payload *bip,
- struct bvec_iter *iter, void *tuple,
- unsigned int tuple_size)
+static void blk_integrity_copy_from_tuple(struct bio_integrity_payload *bip,
+ struct bvec_iter *iter, void *tuple,
+ unsigned int tuple_size)
{
void *prot_buf;
@@ -98,16 +98,9 @@ static void __blk_integrity_copy_from_tuple(struct bio_integrity_payload *bip,
}
}
-static void blk_integrity_copy_from_tuple(struct blk_integrity_iter *iter,
- union pi_tuple *tuple)
-{
- __blk_integrity_copy_from_tuple(iter->bip, &iter->prot_iter,
- tuple, iter->bi->pi_tuple_size);
-}
-
-static void __blk_integrity_copy_to_tuple(struct bio_integrity_payload *bip,
- struct bvec_iter *iter, void *tuple,
- unsigned int tuple_size)
+static void blk_integrity_copy_to_tuple(struct bio_integrity_payload *bip,
+ struct bvec_iter *iter, void *tuple,
+ unsigned int tuple_size)
{
void *prot_buf;
@@ -125,13 +118,6 @@ static void __blk_integrity_copy_to_tuple(struct bio_integrity_payload *bip,
}
}
-static void blk_integrity_copy_to_tuple(struct blk_integrity_iter *iter,
- union pi_tuple *tuple)
-{
- __blk_integrity_copy_to_tuple(iter->bip, &iter->prot_iter,
- tuple, iter->bi->pi_tuple_size);
-}
-
static void blk_set_ext_pi(struct crc64_pi_tuple *pi,
struct blk_integrity_iter *iter)
{
@@ -307,7 +293,8 @@ static blk_status_t blk_integrity_interval(struct blk_integrity_iter *iter,
bvec_iter_advance_single(iter->bip->bip_vec, &iter->prot_iter,
iter->bi->metadata_size - iter->bi->pi_offset);
} else if (verify) {
- blk_integrity_copy_to_tuple(iter, ptuple);
+ blk_integrity_copy_to_tuple(iter->bip, &iter->prot_iter,
+ ptuple, iter->bi->pi_tuple_size);
}
if (verify)
@@ -315,10 +302,13 @@ static blk_status_t blk_integrity_interval(struct blk_integrity_iter *iter,
else
blk_integrity_set(iter, ptuple);
- if (likely(ptuple != &tuple))
+ if (likely(ptuple != &tuple)) {
kunmap_local(ptuple);
- else if (!verify)
- blk_integrity_copy_from_tuple(iter, ptuple);
+ } else if (!verify) {
+ blk_integrity_copy_from_tuple(iter->bip, &iter->prot_iter,
+ ptuple, iter->bi->pi_tuple_size);
+ }
+
iter->interval_remaining = 1 << iter->bi->interval_exp;
iter->crc = 0;
@@ -436,7 +426,7 @@ static void *blk_tuple_remap_begin(union pi_tuple *tuple,
* copy_from_tuple at the end, so make a temp iter for here.
*/
titer = *iter;
- __blk_integrity_copy_to_tuple(bip, &titer, tuple, bi->pi_tuple_size);
+ blk_integrity_copy_to_tuple(bip, &titer, tuple, bi->pi_tuple_size);
return tuple;
}
@@ -450,8 +440,8 @@ static void blk_tuple_remap_end(union pi_tuple *tuple, void *ptuple,
if (likely(ptuple != tuple)) {
kunmap_local(ptuple);
} else {
- __blk_integrity_copy_from_tuple(bip, iter, ptuple,
- bi->pi_tuple_size);
+ blk_integrity_copy_from_tuple(bip, iter, ptuple,
+ bi->pi_tuple_size);
len -= bi->pi_tuple_size;
}
--
2.47.3
[-- Attachment #5: 0004-unaligned.patch --]
[-- Type: text/x-patch, Size: 5574 bytes --]
From c73943c50ce29d8c453e9339a84258ea16340266 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Fri, 7 Nov 2025 07:42:38 -0500
Subject: unaligned
The unaligned handling isn't any more expensive on "sane" architectures.
In fact on x86 it compiles down to the same code as the aligned version,
just using slightly different register allocation.
---
block/t10-pi.c | 97 +++++++++++---------------------------------------
1 file changed, 21 insertions(+), 76 deletions(-)
diff --git a/block/t10-pi.c b/block/t10-pi.c
index a3da845f03d9..8225f4cc972d 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -121,29 +121,17 @@ static void blk_integrity_copy_to_tuple(struct bio_integrity_payload *bip,
static void blk_set_ext_pi(struct crc64_pi_tuple *pi,
struct blk_integrity_iter *iter)
{
- if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
- put_unaligned_be64(iter->crc, &pi->guard_tag);
- put_unaligned_be16(0, &pi->app_tag);
- put_unaligned_be48(iter->seed, &pi->ref_tag);
- } else {
- pi->guard_tag = cpu_to_be64(iter->crc);
- pi->app_tag = 0;
- put_unaligned_be48(iter->seed, &pi->ref_tag);
- }
+ put_unaligned_be64(iter->crc, &pi->guard_tag);
+ put_unaligned((__be16)0, &pi->app_tag);
+ put_unaligned_be48(iter->seed, &pi->ref_tag);
}
static void blk_set_t10_pi(struct t10_pi_tuple *pi,
struct blk_integrity_iter *iter)
{
- if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
- put_unaligned_be16(iter->crc, &pi->guard_tag);
- put_unaligned_be16(0, &pi->app_tag);
- put_unaligned_be32(iter->seed, &pi->ref_tag);
- } else {
- pi->guard_tag = cpu_to_be16(iter->crc);
- pi->app_tag = 0;
- pi->ref_tag = cpu_to_be32(iter->seed);
- }
+ put_unaligned_be16(iter->crc, &pi->guard_tag);
+ put_unaligned((__be16)0, &pi->app_tag);
+ put_unaligned_be32(iter->seed, &pi->ref_tag);
}
static void blk_set_ip_pi(struct t10_pi_tuple *pi,
@@ -151,15 +139,9 @@ static void blk_set_ip_pi(struct t10_pi_tuple *pi,
{
__be16 csum = (__force __be16)~(lower_16_bits(iter->crc));
- if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
- __put_unaligned_t(__be16, csum, &pi->guard_tag);
- put_unaligned_be16(0, &pi->app_tag);
- put_unaligned_be32(iter->seed, &pi->ref_tag);
- } else {
- pi->guard_tag = csum;
- pi->app_tag = 0;
- pi->ref_tag = cpu_to_be32(iter->seed);
- }
+ __put_unaligned_t(__be16, csum, &pi->guard_tag);
+ put_unaligned_be16(0, &pi->app_tag);
+ put_unaligned_be32(iter->seed, &pi->ref_tag);
}
static bool ext_pi_ref_escape(const u8 ref_tag[6])
@@ -172,18 +154,10 @@ static bool ext_pi_ref_escape(const u8 ref_tag[6])
static blk_status_t ext_pi_crc64_verify(struct blk_integrity_iter *iter,
struct crc64_pi_tuple *pi)
{
- u64 guard, ref, seed = lower_48_bits(iter->seed);
- u16 app;
-
- if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
- guard = get_unaligned_be64(&pi->guard_tag);
- app = get_unaligned_be16(&pi->app_tag);
- ref = get_unaligned_be48(pi->ref_tag);
- } else {
- guard = be64_to_cpu(pi->guard_tag);
- app = be16_to_cpu(pi->app_tag);
- ref = get_unaligned_be48(pi->ref_tag);
- }
+ u64 seed = lower_48_bits(iter->seed);
+ u64 guard = get_unaligned_be64(&pi->guard_tag);
+ u64 ref = get_unaligned_be48(pi->ref_tag);
+ u16 app = get_unaligned_be16(&pi->app_tag);
if (iter->bi->flags & BLK_INTEGRITY_REF_TAG) {
if (app == APP_TAG_ESCAPE)
@@ -211,19 +185,10 @@ static blk_status_t ext_pi_crc64_verify(struct blk_integrity_iter *iter,
static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
struct t10_pi_tuple *pi)
{
- u32 ref, seed = lower_32_bits(iter->seed);
- u16 guard;
- u16 app;
-
- if (unlikely((unsigned long)pi & (sizeof(*pi) - 1))) {
- guard = get_unaligned_be16(&pi->guard_tag);
- app = get_unaligned_be16(&pi->app_tag);
- ref = get_unaligned_be32(&pi->ref_tag);
- } else {
- guard = be16_to_cpu(pi->guard_tag);
- app = be16_to_cpu(pi->app_tag);
- ref = be32_to_cpu(pi->ref_tag);
- }
+ u32 seed = lower_32_bits(iter->seed);
+ u32 ref = get_unaligned_be32(&pi->ref_tag);
+ u16 guard = get_unaligned_be16(&pi->guard_tag);
+ u16 app = get_unaligned_be16(&pi->app_tag);
if (iter->bi->flags & BLK_INTEGRITY_REF_TAG) {
if (app == APP_TAG_ESCAPE)
@@ -460,20 +425,10 @@ static void blk_set_ext_unmap_ref(struct crc64_pi_tuple *pi, u64 virt,
static void blk_set_t10_unmap_ref(struct t10_pi_tuple *pi, u32 virt,
u32 ref_tag)
{
- u32 ref;
+ u32 ref = get_unaligned_be32(&pi->ref_tag);
- if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
- ref = get_unaligned_be32(&pi->ref_tag);
- else
- ref = be32_to_cpu(pi->ref_tag);
-
- if (ref != ref_tag || ref == virt)
- return;
-
- if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
+ if (ref == ref_tag && ref != virt)
put_unaligned_be32(virt, &pi->ref_tag);
- else
- pi->ref_tag = cpu_to_be32(virt);
}
static void blk_reftag_remap_complete(struct blk_integrity *bi,
@@ -504,20 +459,10 @@ static void blk_set_ext_map_ref(struct crc64_pi_tuple *pi, u64 virt,
static void blk_set_t10_map_ref(struct t10_pi_tuple *pi, u32 virt, u32 ref_tag)
{
- u32 ref;
+ u32 ref = get_unaligned_be32(&pi->ref_tag);
- if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
- ref = get_unaligned_be32(&pi->ref_tag);
- else
- ref = be32_to_cpu(pi->ref_tag);
-
- if (ref != virt || ref == ref_tag)
- return;
-
- if (unlikely((unsigned long)pi & (sizeof(*pi) - 1)))
+ if (ref == virt && ref != ref_tag)
put_unaligned_be32(ref_tag, &pi->ref_tag);
- else
- pi->ref_tag = cpu_to_be32(ref_tag);
}
static void blk_reftag_remap_prepare(struct blk_integrity *bi,
--
2.47.3
* Re: [PATCHv2] blk-integrity: support arbitrary buffer alignment
From: Keith Busch @ 2025-11-07 15:00 UTC
To: Christoph Hellwig; +Cc: Keith Busch, linux-block, axboe, martin.petersen
On Fri, Nov 07, 2025 at 02:15:19PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 06, 2025 at 08:34:47PM -0800, Keith Busch wrote:
> > This was tested using recently proposed io_uring metadata test case
> > here:
> >
> > https://lore.kernel.org/io-uring/20251107042953.3393507-1-kbusch@meta.com/
>
> Any chance we could get this test or something like it into blktests?
>
> That way it would get regularly run as part of block layer validation.
I'll give it a shot. The only problem is that liburing currently doesn't
define the SQE fields or PI attributes for this feature, so we'd need to
have blktests conditionally redefine things depending on which liburing
version the system is using.
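Roughly, the shim would just be a guarded fallback definition along these
lines; the flag name and structure layout are assumptions made for
illustration and would need to be checked against the kernel's
include/uapi/linux/io_uring.h rather than copied from this sketch:

#include <linux/types.h>

/*
 * Fallback so the test still builds against a liburing that does not yet
 * carry the PI attribute definitions.  Names and layout are assumed for
 * illustration; the authoritative definitions live in the kernel uapi
 * header.
 */
#ifndef IORING_RW_ATTR_FLAG_PI
#define IORING_RW_ATTR_FLAG_PI	(1U << 0)

struct io_uring_attr_pi {
	__u16	flags;
	__u16	app_tag;
	__u32	len;
	__u64	addr;
	__u64	seed;
	__u64	rsvd;
};
#endif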
> The changes look sensible to me, and pass very basic sanity testing
> using my PI setup. I have a few cleanups we should get in (attached).
Thanks! These all look good to me. If it's okay with you, I'll fold
these in and tag you for Co-developed-by.
* Re: [PATCHv2] blk-integrity: support arbitrary buffer alignment
From: Christoph Hellwig @ 2025-11-07 15:35 UTC
To: Keith Busch
Cc: Christoph Hellwig, Keith Busch, linux-block, axboe,
martin.petersen
On Fri, Nov 07, 2025 at 08:00:26AM -0700, Keith Busch wrote:
> Thanks! These all look good to me. If it's okay with you, I'll fold
> these in and tag you for Co-developed-by.
Please fold them, yes. I don't think I need a Co-developed-by or any
attribution for these trivial cleanups.