* [PATCH v2 0/3] Block integrity with flexible-offset PI
[not found] <CGME20240201130828epcas5p10bd98bcb6b8e9444603e347c2a910c44@epcas5p1.samsung.com>
@ 2024-02-01 13:01 ` Kanchan Joshi
2024-02-01 13:01 ` [PATCH v2 1/3] block: refactor guard helpers Kanchan Joshi
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-02-01 13:01 UTC (permalink / raw)
To: axboe, kbusch, hch, martin.petersen, sagi
Cc: linux-nvme, linux-block, gost.dev, Kanchan Joshi
The block integrity subsystem can only work with PI placed in the first
bytes of the metadata buffer.
The series makes block-integrity support flexible placement of PI, and
changes the NVMe driver to make use of the new capability.
This helps to
(i) enable the more common case for NVMe (PI in last bytes is the norm)
(ii) reduce nop profile users (tried by Jens recently [1]).
/* For NS 4K+16b, 8b PI, last bytes */
Before:
# cat /sys/block/nvme0n1/integrity/format
nop
After:
# cat /sys/block/nvme0n1/integrity/format
T10-DIF-TYPE1-CRC
Changes since v1:
- Reworded commit message in patch 2 and 3 (hch)
- Variable initialization order change (hch)
- Collect reviewed-by
[1] https://lore.kernel.org/linux-block/20240111160226.1936351-1-axboe@kernel.dk/
Kanchan Joshi (3):
block: refactor guard helpers
block: support PI at non-zero offset within metadata
nvme: allow integrity when PI is not in first bytes
block/bio-integrity.c | 1 +
block/blk-integrity.c | 1 +
block/t10-pi.c | 72 +++++++++++++++++++++++------------
drivers/nvme/host/core.c | 8 +++-
drivers/nvme/host/nvme.h | 1 +
include/linux/blk-integrity.h | 1 +
include/linux/blkdev.h | 1 +
7 files changed, 60 insertions(+), 25 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/3] block: refactor guard helpers
2024-02-01 13:01 ` [PATCH v2 0/3] Block integrity with flexible-offset PI Kanchan Joshi
@ 2024-02-01 13:01 ` Kanchan Joshi
2024-02-01 13:01 ` [PATCH v2 2/3] block: support PI at non-zero offset within metadata Kanchan Joshi
` (3 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-02-01 13:01 UTC (permalink / raw)
To: axboe, kbusch, hch, martin.petersen, sagi
Cc: linux-nvme, linux-block, gost.dev, Kanchan Joshi
Allow computation using the existing guard value.
This is a prep patch.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/t10-pi.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 914d8cddd43a..251a7b188963 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -12,14 +12,14 @@
#include <net/checksum.h>
#include <asm/unaligned.h>
-typedef __be16 (csum_fn) (void *, unsigned int);
+typedef __be16 (csum_fn) (__be16, void *, unsigned int);
-static __be16 t10_pi_crc_fn(void *data, unsigned int len)
+static __be16 t10_pi_crc_fn(__be16 crc, void *data, unsigned int len)
{
- return cpu_to_be16(crc_t10dif(data, len));
+ return cpu_to_be16(crc_t10dif_update(be16_to_cpu(crc), data, len));
}
-static __be16 t10_pi_ip_fn(void *data, unsigned int len)
+static __be16 t10_pi_ip_fn(__be16 csum, void *data, unsigned int len)
{
return (__force __be16)ip_compute_csum(data, len);
}
@@ -37,7 +37,7 @@ static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
for (i = 0 ; i < iter->data_size ; i += iter->interval) {
struct t10_pi_tuple *pi = iter->prot_buf;
- pi->guard_tag = fn(iter->data_buf, iter->interval);
+ pi->guard_tag = fn(0, iter->data_buf, iter->interval);
pi->app_tag = 0;
if (type == T10_PI_TYPE1_PROTECTION)
@@ -83,7 +83,7 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
goto next;
}
- csum = fn(iter->data_buf, iter->interval);
+ csum = fn(0, iter->data_buf, iter->interval);
if (pi->guard_tag != csum) {
pr_err("%s: guard tag error at sector %llu " \
@@ -280,9 +280,9 @@ const struct blk_integrity_profile t10_pi_type3_ip = {
};
EXPORT_SYMBOL(t10_pi_type3_ip);
-static __be64 ext_pi_crc64(void *data, unsigned int len)
+static __be64 ext_pi_crc64(u64 crc, void *data, unsigned int len)
{
- return cpu_to_be64(crc64_rocksoft(data, len));
+ return cpu_to_be64(crc64_rocksoft_update(crc, data, len));
}
static blk_status_t ext_pi_crc64_generate(struct blk_integrity_iter *iter,
@@ -293,7 +293,7 @@ static blk_status_t ext_pi_crc64_generate(struct blk_integrity_iter *iter,
for (i = 0 ; i < iter->data_size ; i += iter->interval) {
struct crc64_pi_tuple *pi = iter->prot_buf;
- pi->guard_tag = ext_pi_crc64(iter->data_buf, iter->interval);
+ pi->guard_tag = ext_pi_crc64(0, iter->data_buf, iter->interval);
pi->app_tag = 0;
if (type == T10_PI_TYPE1_PROTECTION)
@@ -343,7 +343,7 @@ static blk_status_t ext_pi_crc64_verify(struct blk_integrity_iter *iter,
goto next;
}
- csum = ext_pi_crc64(iter->data_buf, iter->interval);
+ csum = ext_pi_crc64(0, iter->data_buf, iter->interval);
if (pi->guard_tag != csum) {
pr_err("%s: guard tag error at sector %llu " \
"(rcvd %016llx, want %016llx)\n",
--
2.25.1
* [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-02-01 13:01 ` [PATCH v2 0/3] Block integrity with flexible-offset PI Kanchan Joshi
2024-02-01 13:01 ` [PATCH v2 1/3] block: refactor guard helpers Kanchan Joshi
@ 2024-02-01 13:01 ` Kanchan Joshi
2024-09-26 15:07 ` Keith Busch
2024-02-01 13:01 ` [PATCH v2 3/3] nvme: allow integrity when PI is not in first bytes Kanchan Joshi
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Kanchan Joshi @ 2024-02-01 13:01 UTC (permalink / raw)
To: axboe, kbusch, hch, martin.petersen, sagi
Cc: linux-nvme, linux-block, gost.dev, Kanchan Joshi, Chinmay Gameti
Block layer integrity processing assumes that protection information
(PI) is placed in the first bytes of each metadata block.
Remove this limitation and include the metadata before the PI in the
calculation of the guard tag.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Chinmay Gameti <c.gameti@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/bio-integrity.c | 1 +
block/blk-integrity.c | 1 +
block/t10-pi.c | 52 +++++++++++++++++++++++++----------
include/linux/blk-integrity.h | 1 +
include/linux/blkdev.h | 1 +
5 files changed, 42 insertions(+), 14 deletions(-)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index c9a16fba58b9..2e3e8e04961e 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -395,6 +395,7 @@ static blk_status_t bio_integrity_process(struct bio *bio,
iter.tuple_size = bi->tuple_size;
iter.seed = proc_iter->bi_sector;
iter.prot_buf = bvec_virt(bip->bip_vec);
+ iter.pi_offset = bi->pi_offset;
__bio_for_each_segment(bv, bio, bviter, *proc_iter) {
void *kaddr = bvec_kmap_local(&bv);
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index d4e9b4556d14..ccbeb6dfa87a 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -370,6 +370,7 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
bi->profile = template->profile ? template->profile : &nop_profile;
bi->tuple_size = template->tuple_size;
bi->tag_size = template->tag_size;
+ bi->pi_offset = template->pi_offset;
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 251a7b188963..d90892fd6f2a 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -32,12 +32,16 @@ static __be16 t10_pi_ip_fn(__be16 csum, void *data, unsigned int len)
static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
csum_fn *fn, enum t10_dif_type type)
{
+ u8 offset = iter->pi_offset;
unsigned int i;
for (i = 0 ; i < iter->data_size ; i += iter->interval) {
- struct t10_pi_tuple *pi = iter->prot_buf;
+ struct t10_pi_tuple *pi = iter->prot_buf + offset;
pi->guard_tag = fn(0, iter->data_buf, iter->interval);
+ if (offset)
+ pi->guard_tag = fn(pi->guard_tag, iter->prot_buf,
+ offset);
pi->app_tag = 0;
if (type == T10_PI_TYPE1_PROTECTION)
@@ -56,12 +60,13 @@ static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
csum_fn *fn, enum t10_dif_type type)
{
+ u8 offset = iter->pi_offset;
unsigned int i;
BUG_ON(type == T10_PI_TYPE0_PROTECTION);
for (i = 0 ; i < iter->data_size ; i += iter->interval) {
- struct t10_pi_tuple *pi = iter->prot_buf;
+ struct t10_pi_tuple *pi = iter->prot_buf + offset;
__be16 csum;
if (type == T10_PI_TYPE1_PROTECTION ||
@@ -84,6 +89,8 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
}
csum = fn(0, iter->data_buf, iter->interval);
+ if (offset)
+ csum = fn(csum, iter->prot_buf, offset);
if (pi->guard_tag != csum) {
pr_err("%s: guard tag error at sector %llu " \
@@ -134,8 +141,10 @@ static blk_status_t t10_pi_type1_verify_ip(struct blk_integrity_iter *iter)
*/
static void t10_pi_type1_prepare(struct request *rq)
{
- const int tuple_sz = rq->q->integrity.tuple_size;
+ struct blk_integrity *bi = &rq->q->integrity;
+ const int tuple_sz = bi->tuple_size;
u32 ref_tag = t10_pi_ref_tag(rq);
+ u8 offset = bi->pi_offset;
struct bio *bio;
__rq_for_each_bio(bio, rq) {
@@ -154,7 +163,7 @@ static void t10_pi_type1_prepare(struct request *rq)
p = bvec_kmap_local(&iv);
for (j = 0; j < iv.bv_len; j += tuple_sz) {
- struct t10_pi_tuple *pi = p;
+ struct t10_pi_tuple *pi = p + offset;
if (be32_to_cpu(pi->ref_tag) == virt)
pi->ref_tag = cpu_to_be32(ref_tag);
@@ -183,9 +192,11 @@ static void t10_pi_type1_prepare(struct request *rq)
*/
static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
{
- unsigned intervals = nr_bytes >> rq->q->integrity.interval_exp;
- const int tuple_sz = rq->q->integrity.tuple_size;
+ struct blk_integrity *bi = &rq->q->integrity;
+ unsigned intervals = nr_bytes >> bi->interval_exp;
+ const int tuple_sz = bi->tuple_size;
u32 ref_tag = t10_pi_ref_tag(rq);
+ u8 offset = bi->pi_offset;
struct bio *bio;
__rq_for_each_bio(bio, rq) {
@@ -200,7 +211,7 @@ static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
p = bvec_kmap_local(&iv);
for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
- struct t10_pi_tuple *pi = p;
+ struct t10_pi_tuple *pi = p + offset;
if (be32_to_cpu(pi->ref_tag) == ref_tag)
pi->ref_tag = cpu_to_be32(virt);
@@ -288,12 +299,16 @@ static __be64 ext_pi_crc64(u64 crc, void *data, unsigned int len)
static blk_status_t ext_pi_crc64_generate(struct blk_integrity_iter *iter,
enum t10_dif_type type)
{
+ u8 offset = iter->pi_offset;
unsigned int i;
for (i = 0 ; i < iter->data_size ; i += iter->interval) {
- struct crc64_pi_tuple *pi = iter->prot_buf;
+ struct crc64_pi_tuple *pi = iter->prot_buf + offset;
pi->guard_tag = ext_pi_crc64(0, iter->data_buf, iter->interval);
+ if (offset)
+ pi->guard_tag = ext_pi_crc64(be64_to_cpu(pi->guard_tag),
+ iter->prot_buf, offset);
pi->app_tag = 0;
if (type == T10_PI_TYPE1_PROTECTION)
@@ -319,10 +334,11 @@ static bool ext_pi_ref_escape(u8 *ref_tag)
static blk_status_t ext_pi_crc64_verify(struct blk_integrity_iter *iter,
enum t10_dif_type type)
{
+ u8 offset = iter->pi_offset;
unsigned int i;
for (i = 0; i < iter->data_size; i += iter->interval) {
- struct crc64_pi_tuple *pi = iter->prot_buf;
+ struct crc64_pi_tuple *pi = iter->prot_buf + offset;
u64 ref, seed;
__be64 csum;
@@ -344,6 +360,10 @@ static blk_status_t ext_pi_crc64_verify(struct blk_integrity_iter *iter,
}
csum = ext_pi_crc64(0, iter->data_buf, iter->interval);
+ if (offset)
+ csum = ext_pi_crc64(be64_to_cpu(csum), iter->prot_buf,
+ offset);
+
if (pi->guard_tag != csum) {
pr_err("%s: guard tag error at sector %llu " \
"(rcvd %016llx, want %016llx)\n",
@@ -373,8 +393,10 @@ static blk_status_t ext_pi_type1_generate_crc64(struct blk_integrity_iter *iter)
static void ext_pi_type1_prepare(struct request *rq)
{
- const int tuple_sz = rq->q->integrity.tuple_size;
+ struct blk_integrity *bi = &rq->q->integrity;
+ const int tuple_sz = bi->tuple_size;
u64 ref_tag = ext_pi_ref_tag(rq);
+ u8 offset = bi->pi_offset;
struct bio *bio;
__rq_for_each_bio(bio, rq) {
@@ -393,7 +415,7 @@ static void ext_pi_type1_prepare(struct request *rq)
p = bvec_kmap_local(&iv);
for (j = 0; j < iv.bv_len; j += tuple_sz) {
- struct crc64_pi_tuple *pi = p;
+ struct crc64_pi_tuple *pi = p + offset;
u64 ref = get_unaligned_be48(pi->ref_tag);
if (ref == virt)
@@ -411,9 +433,11 @@ static void ext_pi_type1_prepare(struct request *rq)
static void ext_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
{
- unsigned intervals = nr_bytes >> rq->q->integrity.interval_exp;
- const int tuple_sz = rq->q->integrity.tuple_size;
+ struct blk_integrity *bi = &rq->q->integrity;
+ unsigned intervals = nr_bytes >> bi->interval_exp;
+ const int tuple_sz = bi->tuple_size;
u64 ref_tag = ext_pi_ref_tag(rq);
+ u8 offset = bi->pi_offset;
struct bio *bio;
__rq_for_each_bio(bio, rq) {
@@ -428,7 +452,7 @@ static void ext_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
p = bvec_kmap_local(&iv);
for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
- struct crc64_pi_tuple *pi = p;
+ struct crc64_pi_tuple *pi = p + offset;
u64 ref = get_unaligned_be48(pi->ref_tag);
if (ref == ref_tag)
diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h
index 378b2459efe2..e253e7bd0d17 100644
--- a/include/linux/blk-integrity.h
+++ b/include/linux/blk-integrity.h
@@ -20,6 +20,7 @@ struct blk_integrity_iter {
unsigned int data_size;
unsigned short interval;
unsigned char tuple_size;
+ unsigned char pi_offset;
const char *disk_name;
};
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d7cac3de65b3..bb4d811fee46 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -108,6 +108,7 @@ struct blk_integrity {
const struct blk_integrity_profile *profile;
unsigned char flags;
unsigned char tuple_size;
+ unsigned char pi_offset;
unsigned char interval_exp;
unsigned char tag_size;
};
--
2.25.1
* [PATCH v2 3/3] nvme: allow integrity when PI is not in first bytes
2024-02-01 13:01 ` [PATCH v2 0/3] Block integrity with flexible-offset PI Kanchan Joshi
2024-02-01 13:01 ` [PATCH v2 1/3] block: refactor guard helpers Kanchan Joshi
2024-02-01 13:01 ` [PATCH v2 2/3] block: support PI at non-zero offset within metadata Kanchan Joshi
@ 2024-02-01 13:01 ` Kanchan Joshi
2024-02-12 15:47 ` [PATCH v2 0/3] Block integrity with flexible-offset PI Kanchan Joshi
2024-02-12 15:57 ` Jens Axboe
4 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-02-01 13:01 UTC (permalink / raw)
To: axboe, kbusch, hch, martin.petersen, sagi
Cc: linux-nvme, linux-block, gost.dev, Kanchan Joshi
NVM command set 1.0 (or later) mandates PI to be in the last bytes of
the metadata. But this was not supported by the block layer, so the
driver registered a nop profile.
Since block-integrity can now handle flexible PI offset, change the
driver to support this configuration.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/nvme/host/core.c | 8 +++++++-
drivers/nvme/host/nvme.h | 1 +
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 85ab0fcf9e88..4945d6259a13 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1726,6 +1726,7 @@ static void nvme_init_integrity(struct gendisk *disk,
}
integrity.tuple_size = head->ms;
+ integrity.pi_offset = head->pi_offset;
blk_integrity_register(disk, &integrity);
blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
}
@@ -1835,11 +1836,16 @@ static int nvme_init_ms(struct nvme_ctrl *ctrl, struct nvme_ns_head *head,
free_data:
kfree(nvm);
set_pi:
- if (head->pi_size && (first || head->ms == head->pi_size))
+ if (head->pi_size && head->ms >= head->pi_size)
head->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
else
head->pi_type = 0;
+ if (first)
+ head->pi_offset = 0;
+ else
+ head->pi_offset = head->ms - head->pi_size;
+
return ret;
}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 030c80818240..281657320c3a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -462,6 +462,7 @@ struct nvme_ns_head {
u16 ms;
u16 pi_size;
u8 pi_type;
+ u8 pi_offset;
u8 guard_type;
u16 sgs;
u32 sws;
--
2.25.1
* Re: [PATCH v2 0/3] Block integrity with flexible-offset PI
2024-02-01 13:01 ` [PATCH v2 0/3] Block integrity with flexible-offset PI Kanchan Joshi
` (2 preceding siblings ...)
2024-02-01 13:01 ` [PATCH v2 3/3] nvme: allow integrity when PI is not in first bytes Kanchan Joshi
@ 2024-02-12 15:47 ` Kanchan Joshi
2024-02-12 15:57 ` Jens Axboe
4 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-02-12 15:47 UTC (permalink / raw)
To: Kanchan Joshi
Cc: axboe, kbusch, hch, martin.petersen, sagi, linux-nvme,
linux-block, gost.dev
On Thu, Feb 1, 2024 at 6:39 PM Kanchan Joshi <joshi.k@samsung.com> wrote:
>
> The block integrity subsystem can only work with PI placed in the first
> bytes of the metadata buffer.
>
> The series makes block-integrity support flexible placement of PI, and
> changes the NVMe driver to make use of the new capability.
Hi Jens,
This has got the necessary reviews. Can this be picked up?
Or do you see anything missing?
* Re: [PATCH v2 0/3] Block integrity with flexible-offset PI
2024-02-01 13:01 ` [PATCH v2 0/3] Block integrity with flexible-offset PI Kanchan Joshi
` (3 preceding siblings ...)
2024-02-12 15:47 ` [PATCH v2 0/3] Block integrity with flexible-offset PI Kanchan Joshi
@ 2024-02-12 15:57 ` Jens Axboe
4 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2024-02-12 15:57 UTC (permalink / raw)
To: kbusch, hch, martin.petersen, sagi, Kanchan Joshi
Cc: linux-nvme, linux-block, gost.dev
On Thu, 01 Feb 2024 18:31:23 +0530, Kanchan Joshi wrote:
> The block integrity subsystem can only work with PI placed in the first
> bytes of the metadata buffer.
>
> The series makes block-integrity support flexible placement of PI, and
> changes the NVMe driver to make use of the new capability.
>
> This helps to
> (i) enable the more common case for NVMe (PI in last bytes is the norm)
> (ii) reduce nop profile users (tried by Jens recently [1]).
>
> [...]
Applied, thanks!
[1/3] block: refactor guard helpers
commit: 6b5c132a3f0d3b7c024ae98f0ace07c04d32cf73
[2/3] block: support PI at non-zero offset within metadata
commit: 60d21aac52e26531affdadb7543fe5b93f58b450
[3/3] nvme: allow integrity when PI is not in first bytes
commit: 921e81db524d17db683cc29aed7ff02f06ea3f96
Best regards,
--
Jens Axboe
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-02-01 13:01 ` [PATCH v2 2/3] block: support PI at non-zero offset within metadata Kanchan Joshi
@ 2024-09-26 15:07 ` Keith Busch
2024-09-26 16:38 ` Kanchan Joshi
0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2024-09-26 15:07 UTC (permalink / raw)
To: Kanchan Joshi
Cc: axboe, hch, martin.petersen, sagi, linux-nvme, linux-block,
gost.dev, Chinmay Gameti
On Thu, Feb 01, 2024 at 06:31:25PM +0530, Kanchan Joshi wrote:
> Block layer integrity processing assumes that protection information
> (PI) is placed in the first bytes of each metadata block.
>
> Remove this limitation and include the metadata before the PI in the
> calculation of the guard tag.
Very late reply, but I am just now discovering the consequences of this
patch.
We have drives with this format, 64b metadata with PI at the end. With
previous kernels, we had written data to these drives. Those kernel
versions disabled the GUARD generation, so the metadata was written
without it, and everything was fine.
Now we upgrade to 6.9+, and this kernel enables the GUARD check. All the
data previously written to this drive is unreadable because the GUARD is
invalid.
Not sure exactly what to do about this, but it is a broken kernel
upgrade path, so wanted to throw that information out there.
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-09-26 15:07 ` Keith Busch
@ 2024-09-26 16:38 ` Kanchan Joshi
2024-09-26 16:55 ` Keith Busch
0 siblings, 1 reply; 16+ messages in thread
From: Kanchan Joshi @ 2024-09-26 16:38 UTC (permalink / raw)
To: Keith Busch
Cc: axboe, hch, martin.petersen, sagi, linux-nvme, linux-block,
gost.dev, Chinmay Gameti
On 9/26/2024 8:37 PM, Keith Busch wrote:
> On Thu, Feb 01, 2024 at 06:31:25PM +0530, Kanchan Joshi wrote:
>> Block layer integrity processing assumes that protection information
>> (PI) is placed in the first bytes of each metadata block.
>>
>> Remove this limitation and include the metadata before the PI in the
>> calculation of the guard tag.
>
> Very late reply, but I am just now discovering the consequences of this
> patch.
>
> We have drives with this format, 64b metadata with PI at the end. With
> previous kernels, we had written data to these drives. Those kernel
> versions disabled the GUARD generation, so the metadata was written
> without it, and everything was fine.
>
> Now we upgrade to 6.9+, and this kernel enables the GUARD check. All the
> data previously written to this drive is unreadable because the GUARD is
> invalid.
>
> Not sure exactly what to do about this, but it is a broken kernel
> upgrade path, so wanted to throw that information out there.
>
Ah, so the data was written to the disk without any guard, but is now
being read with the guard check enabled!
I wish there was a way to format the NVMe namespace just to reset its PI
type (from non-zero to 0).
But there are kernel knobs too. Hopefully you can get to the same state
(as the nop profile) by clearing write_generate and read_verify:
echo 0 > /sys/block/nvme0n1/integrity/write_generate
echo 0 > /sys/block/nvme0n1/integrity/read_verify
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-09-26 16:38 ` Kanchan Joshi
@ 2024-09-26 16:55 ` Keith Busch
2024-09-27 16:07 ` Kanchan Joshi
0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2024-09-26 16:55 UTC (permalink / raw)
To: Kanchan Joshi
Cc: axboe, hch, martin.petersen, sagi, linux-nvme, linux-block,
gost.dev, Chinmay Gameti
On Thu, Sep 26, 2024 at 10:08:09PM +0530, Kanchan Joshi wrote:
> But there are kernel knobs too. Hope you are able to get to the same
> state (as nop profile) by clearing write_generate and read_verify:
> echo 0 > /sys/block/nvme0n1/integrity/read_verify
It's not the kernel's verify causing the failure; it's the end device.
For nvme, it'll return status 0x282, End-to-end Guard Check Error, so
the kernel doesn't have a chance to check the data. We'd need to turn
off the command's NVME_RW_PRINFO_PRCHK_GUARD flag, but there's currently
no knob to toggle that.
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-09-26 16:55 ` Keith Busch
@ 2024-09-27 16:07 ` Kanchan Joshi
2024-09-30 17:57 ` Martin K. Petersen
0 siblings, 1 reply; 16+ messages in thread
From: Kanchan Joshi @ 2024-09-27 16:07 UTC (permalink / raw)
To: Keith Busch
Cc: axboe, hch, martin.petersen, sagi, linux-nvme, linux-block,
gost.dev, Chinmay Gameti
On 9/26/2024 10:25 PM, Keith Busch wrote:
> On Thu, Sep 26, 2024 at 10:08:09PM +0530, Kanchan Joshi wrote:
>> But there are kernel knobs too. Hope you are able to get to the same
>> state (as nop profile) by clearing write_generate and read_verify:
>> echo 0 > /sys/block/nvme0n1/integrity/read_verify
>
> It's not the kernel's verify causing the failure; it's the end device.
> For nvme, it'll return status 0x282, End-to-end Guard Check Error, so
> the kernel doesn't have a chance to check the data. We'd need to turn
> off the command's NVME_RW_PRINFO_PRCHK_GUARD flag, but there's currently
> no knob to toggle that.
Indeed.
I spent a good deal of time on this today. I was thinking of connecting
the block read_verify/write_generate knobs to influence things at the
nvme level (those PRCHK flags). But that will not be enough, because
with those knobs the block layer will not attach the meta-buffer, which
is still needed.
The data was written at a time when the nvme driver set the pi_type to
0 (even though at the device level it was non-zero) during integrity
registration.
Wondering whether it would make sense to have a knob at the block-layer
level to do something like that, i.e., override the set
integrity profile with nop.
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-09-27 16:07 ` Kanchan Joshi
@ 2024-09-30 17:57 ` Martin K. Petersen
2024-10-01 7:27 ` Javier González
0 siblings, 1 reply; 16+ messages in thread
From: Martin K. Petersen @ 2024-09-30 17:57 UTC (permalink / raw)
To: Kanchan Joshi
Cc: Keith Busch, axboe, hch, martin.petersen, sagi, linux-nvme,
linux-block, gost.dev, Chinmay Gameti
Kanchan,
> I spent a good deal of time on this today. I was thinking to connect
> block read_verify/write_generate knobs to influence things at nvme level
> (those PRCHK flags). But that will not be enough. Because with those
> knobs block-layer will not attach meta-buffer, which is still needed.
>
> The data was written under the condition when nvme driver set the
> pi_type to 0 (even though at device level it was non-zero) during
> integrity registration.
>
> Thinking whether it will make sense to have a knob at the block-layer
> level to do something like that i.e., override the set
> integrity-profile with nop.
SCSI went to great lengths to ensure that invalid protection information
would never be written during normal operation, regardless of whether
the host sent PI or not. And thus the only time one would anticipate a
PI error was if the data had actually been corrupted.
--
Martin K. Petersen Oracle Linux Engineering
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-09-30 17:57 ` Martin K. Petersen
@ 2024-10-01 7:27 ` Javier González
2024-10-01 15:37 ` Keith Busch
0 siblings, 1 reply; 16+ messages in thread
From: Javier González @ 2024-10-01 7:27 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Kanchan Joshi, Keith Busch, axboe, hch, sagi, linux-nvme,
linux-block, gost.dev, Chinmay Gameti
On 30.09.2024 13:57, Martin K. Petersen wrote:
>
>Kanchan,
>
>> I spent a good deal of time on this today. I was thinking to connect
>> block read_verify/write_generate knobs to influence things at nvme level
>> (those PRCHK flags). But that will not be enough. Because with those
>> knobs block-layer will not attach meta-buffer, which is still needed.
>>
>> The data was written under the condition when nvme driver set the
>> pi_type to 0 (even though at device level it was non-zero) during
>> integrity registration.
>>
>> Thinking whether it will make sense to have a knob at the block-layer
>> level to do something like that i.e., override the set
>> integrity-profile with nop.
>
>SCSI went to great lengths to ensure that invalid protection information
>would never be written during normal operation, regardless of whether
>the host sent PI or not. And thus the only time one would anticipate a
>PI error was if the data had actually been corrupted.
>
Is this something we should work on bringing to the NVMe TWG?
Javier
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-10-01 7:27 ` Javier González
@ 2024-10-01 15:37 ` Keith Busch
2024-10-02 10:29 ` Javier González
2024-10-02 16:18 ` Martin K. Petersen
0 siblings, 2 replies; 16+ messages in thread
From: Keith Busch @ 2024-10-01 15:37 UTC (permalink / raw)
To: Javier González
Cc: Martin K. Petersen, Kanchan Joshi, axboe, hch, sagi, linux-nvme,
linux-block, gost.dev, Chinmay Gameti
On Tue, Oct 01, 2024 at 09:27:08AM +0200, Javier González wrote:
> On 30.09.2024 13:57, Martin K. Petersen wrote:
> >
> > Kanchan,
> >
> > > I spent a good deal of time on this today. I was thinking to connect
> > > block read_verify/write_generate knobs to influence things at nvme level
> > > (those PRCHK flags). But that will not be enough. Because with those
> > > knobs block-layer will not attach meta-buffer, which is still needed.
> > >
> > > The data was written under the condition when nvme driver set the
> > > pi_type to 0 (even though at device level it was non-zero) during
> > > integrity registration.
> > >
> > > Thinking whether it will make sense to have a knob at the block-layer
> > > level to do something like that i.e., override the set
> > > integrity-profile with nop.
> >
> > SCSI went to great lengths to ensure that invalid protection information
> > would never be written during normal operation, regardless of whether
> > the host sent PI or not. And thus the only time one would anticipate a
> > PI error was if the data had actually been corrupted.
> >
>
> Is this something we should work on bringin to the NVMe TWG?
Maybe. It looks like they did the spec this way on purpose, with the
ability to toggle guard tags per IO.
Just some more background on this because it may sound odd to use a data
protection namespace format that the kernel didn't support:
In this use case, writes to the device primarily come from the
passthrough interface, which could always use the guard tags for
end-to-end protection. The kernel block IO was the only path that had
the limitation.
Besides the passthrough interface, though, the setup uses kernel block
layer to write the partition tables. Upgrading from 6.8 -> 6.9 won't be
able to read the partition table on these devices. I'm still not sure
the best way to handle this, though.
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-10-01 15:37 ` Keith Busch
@ 2024-10-02 10:29 ` Javier González
2024-10-02 16:18 ` Martin K. Petersen
1 sibling, 0 replies; 16+ messages in thread
From: Javier González @ 2024-10-02 10:29 UTC (permalink / raw)
To: Keith Busch
Cc: Martin K. Petersen, Kanchan Joshi, axboe, hch, sagi, linux-nvme,
linux-block, gost.dev, Chinmay Gameti
On 01.10.2024 09:37, Keith Busch wrote:
>On Tue, Oct 01, 2024 at 09:27:08AM +0200, Javier González wrote:
>> On 30.09.2024 13:57, Martin K. Petersen wrote:
>> >
>> > Kanchan,
>> >
>> > > I spent a good deal of time on this today. I was thinking to connect
>> > > block read_verify/write_generate knobs to influence things at nvme level
>> > > (those PRCHK flags). But that will not be enough. Because with those
>> > > knobs block-layer will not attach meta-buffer, which is still needed.
>> > >
>> > > The data was written under the condition when nvme driver set the
>> > > pi_type to 0 (even though at device level it was non-zero) during
>> > > integrity registration.
>> > >
>> > > Thinking whether it will make sense to have a knob at the block-layer
>> > > level to do something like that i.e., override the set
>> > > integrity-profile with nop.
>> >
>> > SCSI went to great lengths to ensure that invalid protection information
>> > would never be written during normal operation, regardless of whether
>> > the host sent PI or not. And thus the only time one would anticipate a
>> > PI error was if the data had actually been corrupted.
>> >
>>
>> Is this something we should work on bringin to the NVMe TWG?
>
>Maybe. It looks like they did the spec this way one purpose with the
>ability to toggle guard tags per IO.
>
>Just some more background on this because it may sound odd to use a data
>protection namespace format that the kernel didn't support:
>
>In this use case, writes to the device primarily come from the
>passthrough interface, which could always use the guard tags for
>end-to-end protection. The kernel block IO was the only path that had
>the limitation.
>
>Besides the passthrough interface, though, the setup uses kernel block
>layer to write the partition tables. Upgrading from 6.8 -> 6.9 won't be
>able to read the partition table on these devices. I'm still not sure
>the best way to handle this, though.
Mmmm. Need to think more about this.
Seems this is by design in NVMe. Will try to come back to Mike and Judy
to understand the background for this at the time...
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-10-01 15:37 ` Keith Busch
2024-10-02 10:29 ` Javier González
@ 2024-10-02 16:18 ` Martin K. Petersen
2024-10-02 19:03 ` Keith Busch
1 sibling, 1 reply; 16+ messages in thread
From: Martin K. Petersen @ 2024-10-02 16:18 UTC (permalink / raw)
To: Keith Busch
Cc: Javier González, Martin K. Petersen, Kanchan Joshi, axboe,
hch, sagi, linux-nvme, linux-block, gost.dev, Chinmay Gameti
Keith,
> Besides the passthrough interface, though, the setup uses kernel block
> layer to write the partition tables. Upgrading from 6.8 -> 6.9 won't
> be able to read the partition table on these devices. I'm still not
> sure the best way to handle this, though.
That's why, despite Type 2 supporting offset ref tags, the kernel always
uses the LBA. This is to ensure that any application which doesn't have
prior knowledge of the contents of each block's PI can perform a read
without failing. Including the kernel itself when reading the partition
table. But it's not just a kernel problem, the BIOS also needs to be
able to read arbitrary blocks, at least at the beginning and end of the
device.
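The point Martin makes here can be sketched concretely. For Type 1/2 protection, the 8-byte DIF tuple carries a 2-byte guard CRC, a 2-byte application tag, and a 4-byte reference tag, and the expected reference tag is derived from the target LBA (its lower 32 bits), so any reader — the kernel, the BIOS, a backup tool — can compute the expected value without knowing what was written. A minimal illustrative sketch (not kernel code; the helper names are made up for illustration):

```python
import struct

def expected_ref_tag(lba: int) -> int:
    # Type 1 semantics: the expected reference tag is simply the lower
    # 32 bits of the target LBA, so it is predictable by any reader.
    return lba & 0xFFFFFFFF

def dif_tuple(guard: int, app_tag: int, lba: int) -> bytes:
    """Pack an 8-byte T10 DIF tuple: 2-byte guard CRC, 2-byte app tag,
    4-byte reference tag, all big-endian on the wire."""
    return struct.pack(">HHI", guard, app_tag, expected_ref_tag(lba))
```

With Type 2 offset reference tags (or opaque per-block tags chosen by a single application), that predictability is lost, which is exactly why an arbitrary reader such as the partition scanner can fail.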
Many of the features of the original DIF spec were squarely aimed at use
inside storage arrays where there was a single entity, the array
firmware, performing all reads and writes.
With DIX, since the interface needed to be used by array firmware and
general purpose operating systems alike, we needed to add all the
required features to the interface to accommodate every use case. And
NVMe inherited this. But NVMe did not really capture which things are
suitable only in the "single application" scenario.
I'm not sure there's a good fix. Other than that, as a passthrough user,
the burden is on you to ensure that things can be read by somebody else.
Could be backup applications, virus scanners, system firmware, etc.
--
Martin K. Petersen Oracle Linux Engineering
* Re: [PATCH v2 2/3] block: support PI at non-zero offset within metadata
2024-10-02 16:18 ` Martin K. Petersen
@ 2024-10-02 19:03 ` Keith Busch
0 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2024-10-02 19:03 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Javier González, Kanchan Joshi, axboe, hch, sagi, linux-nvme,
linux-block, gost.dev, Chinmay Gameti
On Wed, Oct 02, 2024 at 12:18:37PM -0400, Martin K. Petersen wrote:
> I'm not sure there's a good fix. Other than that, as a passthrough
> user, the burden is on you to ensure that things can be read by
> somebody else. Could be backup applications, virus scanners, system
> firmware, etc.
Maybe we should disable device-side guard checks on read commands. The
NVMe spec allows us to do that. This way the host would always be able
to read data, and it can optionally validate the guard on its side if
it wants to.
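For context on what "disable device side guard checks" means mechanically: NVMe Read/Write commands carry a PRINFO field in command dword 12 (bits 29:26), where bit 29 is PRACT and bits 28:26 are the per-command PRCHK bits for guard, application tag, and reference tag respectively. Leaving the guard PRCHK bit clear on reads tells the controller not to verify the stored guard. A sketch of building such a CDW12 (bit positions as I recall them from the NVMe base spec; verify against the revision you target before relying on this):

```python
# PRINFO field of NVMe I/O command CDW12 (bits 29:26):
#   bit 29: PRACT (protection information action)
#   bit 28: PRCHK guard, bit 27: PRCHK app tag, bit 26: PRCHK ref tag
PRACT       = 1 << 29
PRCHK_GUARD = 1 << 28
PRCHK_APP   = 1 << 27
PRCHK_REF   = 1 << 26

def read_cdw12(nlb: int, *, check_guard=False, check_app=False,
               check_ref=True) -> int:
    """Build CDW12 for an NVMe Read; NLB is 0's-based in bits 15:0.
    Guard checking defaults to off, so reads succeed regardless of the
    stored guard, while ref tag checking stays on."""
    cdw12 = (nlb - 1) & 0xFFFF
    if check_guard:
        cdw12 |= PRCHK_GUARD
    if check_app:
        cdw12 |= PRCHK_APP
    if check_ref:
        cdw12 |= PRCHK_REF
    return cdw12
```

The host can then recompute and compare the guard CRC itself after the transfer if it wants the end-to-end check anyway.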
end of thread, other threads:[~2024-10-02 19:03 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CGME20240201130828epcas5p10bd98bcb6b8e9444603e347c2a910c44@epcas5p1.samsung.com>
2024-02-01 13:01 ` [PATCH v2 0/3] Block integrity with flexible-offset PI Kanchan Joshi
2024-02-01 13:01 ` [PATCH v2 1/3] block: refactor guard helpers Kanchan Joshi
2024-02-01 13:01 ` [PATCH v2 2/3] block: support PI at non-zero offset within metadata Kanchan Joshi
2024-09-26 15:07 ` Keith Busch
2024-09-26 16:38 ` Kanchan Joshi
2024-09-26 16:55 ` Keith Busch
2024-09-27 16:07 ` Kanchan Joshi
2024-09-30 17:57 ` Martin K. Petersen
2024-10-01 7:27 ` Javier González
2024-10-01 15:37 ` Keith Busch
2024-10-02 10:29 ` Javier González
2024-10-02 16:18 ` Martin K. Petersen
2024-10-02 19:03 ` Keith Busch
2024-02-01 13:01 ` [PATCH v2 3/3] nvme: allow integrity when PI is not in first bytes Kanchan Joshi
2024-02-12 15:47 ` [PATCH v2 0/3] Block integrity with flexible-offset PI Kanchan Joshi
2024-02-12 15:57 ` Jens Axboe