* [RFC PATCH 0/2] dm-crypt support for per-sector NVMe metadata
@ 2024-05-15 13:27 Mikulas Patocka
2024-05-15 13:28 ` [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator Mikulas Patocka
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Mikulas Patocka @ 2024-05-15 13:27 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
Mike Snitzer, Milan Broz
Cc: linux-block, dm-devel, linux-nvme
Hi
Some NVMe devices may be formatted with extra 64 bytes of metadata per
sector.
Here I'm submitting for review dm-crypt patches that make it possible to
use per-sector metadata for authenticated encryption. With these patches,
dm-crypt can run directly on top of an NVMe device, without using
dm-integrity. These patches roughly double write throughput, because there
is no longer a write to the dm-integrity journal.
An example of how to use it (so far, there is no support in the userspace
cryptsetup tool):
# nvme format /dev/nvme1 -n 1 -lbaf=4
# dmsetup create cr --table '0 1048576 crypt
capi:authenc(hmac(sha256),cbc(aes))-essiv:sha256
01b11af6b55f76424fd53fb66667c301466b2eeaf0f39fd36d26e7fc4f52ade2de4228e996f5ae2fe817ce178e77079d28e4baaebffbcd3e16ae4f36ef217298
0 /dev/nvme1n1 0 2 integrity:32:aead sector_size:4096'
Please review it - I'd like to know whether detecting the presence of
per-sector metadata in crypt_integrity_ctr is correct, or whether it should
be done differently.
Mikulas
^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator
  2024-05-15 13:27 [RFC PATCH 0/2] dm-crypt support for per-sector NVMe metadata Mikulas Patocka
@ 2024-05-15 13:28 ` Mikulas Patocka
  2024-05-16  2:30   ` Jens Axboe
  2024-05-16  8:14   ` [RFC PATCH 1/2] " Ming Lei
  2024-05-15 13:30 ` [RFC PATCH 2/2] dm-crypt: support per-sector NVMe metadata Mikulas Patocka
  2024-05-27 22:12 ` [RFC PATCH 0/2] dm-crypt support for " Eric Wheeler
  2 siblings, 2 replies; 18+ messages in thread
From: Mikulas Patocka @ 2024-05-15 13:28 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Mike Snitzer, Milan Broz
Cc: linux-block, dm-devel, linux-nvme

If we allocate a bio that is larger than NVMe maximum request size, attach
integrity metadata to it and send it to the NVMe subsystem, the integrity
metadata will be corrupted.

Splitting the bio works correctly. The function bio_split will clone the
bio, trim the iterator of the first bio and advance the iterator of the
second bio.

However, the function rq_integrity_vec has a bug - it returns the first
vector of the bio's metadata and completely disregards the metadata
iterator that was advanced when the bio was split. Thus, the second bio
uses the same metadata as the first bio and this leads to metadata
corruption.

This commit changes rq_integrity_vec, so that it calls mp_bvec_iter_bvec
instead of returning the first vector. mp_bvec_iter_bvec reads the
iterator and advances the vector by the iterator.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/nvme/host/pci.c       |  6 +++---
 include/linux/blk-integrity.h | 12 ++++++------
 2 files changed, 9 insertions(+), 9 deletions(-)

Index: linux-2.6/drivers/nvme/host/pci.c
===================================================================
--- linux-2.6.orig/drivers/nvme/host/pci.c
+++ linux-2.6/drivers/nvme/host/pci.c
@@ -825,9 +825,9 @@ static blk_status_t nvme_map_metadata(st
 		struct nvme_command *cmnd)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct bio_vec bv = rq_integrity_vec(req);
 
-	iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
-			rq_dma_dir(req), 0);
+	iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
 	if (dma_mapping_error(dev->dev, iod->meta_dma))
 		return BLK_STS_IOERR;
 	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
@@ -966,7 +966,7 @@ static __always_inline void nvme_pci_unm
 		struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
 		dma_unmap_page(dev->dev, iod->meta_dma,
-			       rq_integrity_vec(req)->bv_len, rq_dma_dir(req));
+			       rq_integrity_vec(req).bv_len, rq_dma_dir(req));
 	}
 
 	if (blk_rq_nr_phys_segments(req))

Index: linux-2.6/include/linux/blk-integrity.h
===================================================================
--- linux-2.6.orig/include/linux/blk-integrity.h
+++ linux-2.6/include/linux/blk-integrity.h
@@ -109,11 +109,11 @@ static inline bool blk_integrity_rq(stru
  * Return the first bvec that contains integrity data. Only drivers that are
  * limited to a single integrity segment should use this helper.
  */
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
 {
-	if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1))
-		return NULL;
-	return rq->bio->bi_integrity->bip_vec;
+	WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1);
+	return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec,
+				 rq->bio->bi_integrity->bip_iter);
 }
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 static inline int blk_rq_count_integrity_sg(struct request_queue *q,
@@ -177,9 +177,9 @@ static inline int blk_integrity_rq(struc
 	return 0;
 }
 
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
 {
-	return NULL;
+	BUG();
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 #endif /* _LINUX_BLK_INTEGRITY_H */

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator 2024-05-15 13:28 ` [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator Mikulas Patocka @ 2024-05-16 2:30 ` Jens Axboe 2024-05-20 12:53 ` Mikulas Patocka 2024-05-23 14:58 ` [PATCH v2] " Mikulas Patocka 2024-05-16 8:14 ` [RFC PATCH 1/2] " Ming Lei 1 sibling, 2 replies; 18+ messages in thread From: Jens Axboe @ 2024-05-16 2:30 UTC (permalink / raw) To: Mikulas Patocka, Keith Busch, Christoph Hellwig, Sagi Grimberg, Mike Snitzer, Milan Broz Cc: linux-block, dm-devel, linux-nvme On 5/15/24 7:28 AM, Mikulas Patocka wrote: > @@ -177,9 +177,9 @@ static inline int blk_integrity_rq(struc > return 0; > } > > -static inline struct bio_vec *rq_integrity_vec(struct request *rq) > +static inline struct bio_vec rq_integrity_vec(struct request *rq) > { > - return NULL; > + BUG(); > } > #endif /* CONFIG_BLK_DEV_INTEGRITY */ > #endif /* _LINUX_BLK_INTEGRITY_H */ Let's please not do that. If it's not used outside of CONFIG_BLK_DEV_INTEGRITY, it should just go away. -- Jens Axboe ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator 2024-05-16 2:30 ` Jens Axboe @ 2024-05-20 12:53 ` Mikulas Patocka 2024-05-23 14:58 ` [PATCH v2] " Mikulas Patocka 1 sibling, 0 replies; 18+ messages in thread From: Mikulas Patocka @ 2024-05-20 12:53 UTC (permalink / raw) To: Jens Axboe Cc: Keith Busch, Christoph Hellwig, Sagi Grimberg, Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme On Wed, 15 May 2024, Jens Axboe wrote: > On 5/15/24 7:28 AM, Mikulas Patocka wrote: > > @@ -177,9 +177,9 @@ static inline int blk_integrity_rq(struc > > return 0; > > } > > > > -static inline struct bio_vec *rq_integrity_vec(struct request *rq) > > +static inline struct bio_vec rq_integrity_vec(struct request *rq) > > { > > - return NULL; > > + BUG(); > > } > > #endif /* CONFIG_BLK_DEV_INTEGRITY */ > > #endif /* _LINUX_BLK_INTEGRITY_H */ > > Let's please not do that. If it's not used outside of > CONFIG_BLK_DEV_INTEGRITY, it should just go away. > > -- > Jens Axboe It can't go away - it is guarded with blk_integrity_rq (which always returns 0 if compiled without CONFIG_BLK_DEV_INTEGRITY), so the compiler will optimize-out the calls to rq_integrity_vec. But we can't delete rq_integrity_vec, because the source code references it. Should rq_integrity_vec return empty 'struct bio_vec' instead? Or should we add more CONFIG_BLK_DEV_INTEGRITY tests to disable the call locations? Mikulas ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2] block: change rq_integrity_vec to respect the iterator 2024-05-16 2:30 ` Jens Axboe 2024-05-20 12:53 ` Mikulas Patocka @ 2024-05-23 14:58 ` Mikulas Patocka 2024-05-23 15:01 ` Jens Axboe 1 sibling, 1 reply; 18+ messages in thread From: Mikulas Patocka @ 2024-05-23 14:58 UTC (permalink / raw) To: Jens Axboe Cc: Keith Busch, Christoph Hellwig, Sagi Grimberg, Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme On Wed, 15 May 2024, Jens Axboe wrote: > On 5/15/24 7:28 AM, Mikulas Patocka wrote: > > @@ -177,9 +177,9 @@ static inline int blk_integrity_rq(struc > > return 0; > > } > > > > -static inline struct bio_vec *rq_integrity_vec(struct request *rq) > > +static inline struct bio_vec rq_integrity_vec(struct request *rq) > > { > > - return NULL; > > + BUG(); > > } > > #endif /* CONFIG_BLK_DEV_INTEGRITY */ > > #endif /* _LINUX_BLK_INTEGRITY_H */ > > Let's please not do that. If it's not used outside of > CONFIG_BLK_DEV_INTEGRITY, it should just go away. > > -- > Jens Axboe Here I'm resending the patch with the function rq_integrity_vec removed if CONFIG_BLK_DEV_INTEGRITY is not defined. Mikulas From: Mikulas Patocka <mpatocka@redhat.com> If we allocate a bio that is larger than NVMe maximum request size, attach integrity metadata to it and send it to the NVMe subsystem, the integrity metadata will be corrupted. Splitting the bio works correctly. The function bio_split will clone the bio, trim the iterator of the first bio and advance the iterator of the second bio. However, the function rq_integrity_vec has a bug - it returns the first vector of the bio's metadata and completely disregards the metadata iterator that was advanced when the bio was split. Thus, the second bio uses the same metadata as the first bio and this leads to metadata corruption. This commit changes rq_integrity_vec, so that it calls mp_bvec_iter_bvec instead of returning the first vector. mp_bvec_iter_bvec reads the iterator and advances the vector by the iterator. 
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/nvme/host/pci.c | 14 +++++++++++--- include/linux/blk-integrity.h | 12 ++++-------- 2 files changed, 15 insertions(+), 11 deletions(-) Index: linux-2.6/drivers/nvme/host/pci.c =================================================================== --- linux-2.6.orig/drivers/nvme/host/pci.c +++ linux-2.6/drivers/nvme/host/pci.c @@ -821,18 +821,20 @@ out_free_sg: return ret; } +#ifdef CONFIG_BLK_DEV_INTEGRITY static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req, struct nvme_command *cmnd) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); + struct bio_vec bv = rq_integrity_vec(req); - iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req), - rq_dma_dir(req), 0); + iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0); if (dma_mapping_error(dev->dev, iod->meta_dma)) return BLK_STS_IOERR; cmnd->rw.metadata = cpu_to_le64(iod->meta_dma); return BLK_STS_OK; } +#endif static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req) { @@ -853,16 +855,20 @@ static blk_status_t nvme_prep_rq(struct goto out_free_cmd; } +#ifdef CONFIG_BLK_DEV_INTEGRITY if (blk_integrity_rq(req)) { ret = nvme_map_metadata(dev, req, &iod->cmd); if (ret) goto out_unmap_data; } +#endif nvme_start_request(req); return BLK_STS_OK; +#ifdef CONFIG_BLK_DEV_INTEGRITY out_unmap_data: nvme_unmap_data(dev, req); +#endif out_free_cmd: nvme_cleanup_cmd(req); return ret; @@ -962,12 +968,14 @@ static __always_inline void nvme_pci_unm struct nvme_queue *nvmeq = req->mq_hctx->driver_data; struct nvme_dev *dev = nvmeq->dev; +#ifdef CONFIG_BLK_DEV_INTEGRITY if (blk_integrity_rq(req)) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); dma_unmap_page(dev->dev, iod->meta_dma, - rq_integrity_vec(req)->bv_len, rq_dma_dir(req)); + rq_integrity_vec(req).bv_len, rq_dma_dir(req)); } +#endif if (blk_rq_nr_phys_segments(req)) nvme_unmap_data(dev, req); Index: linux-2.6/include/linux/blk-integrity.h 
=================================================================== --- linux-2.6.orig/include/linux/blk-integrity.h +++ linux-2.6/include/linux/blk-integrity.h @@ -109,11 +109,11 @@ static inline bool blk_integrity_rq(stru * Return the first bvec that contains integrity data. Only drivers that are * limited to a single integrity segment should use this helper. */ -static inline struct bio_vec *rq_integrity_vec(struct request *rq) +static inline struct bio_vec rq_integrity_vec(struct request *rq) { - if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1)) - return NULL; - return rq->bio->bi_integrity->bip_vec; + WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1); + return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec, + rq->bio->bi_integrity->bip_iter); } #else /* CONFIG_BLK_DEV_INTEGRITY */ static inline int blk_rq_count_integrity_sg(struct request_queue *q, @@ -177,9 +177,5 @@ static inline int blk_integrity_rq(struc return 0; } -static inline struct bio_vec *rq_integrity_vec(struct request *rq) -{ - return NULL; -} #endif /* CONFIG_BLK_DEV_INTEGRITY */ #endif /* _LINUX_BLK_INTEGRITY_H */ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] block: change rq_integrity_vec to respect the iterator 2024-05-23 14:58 ` [PATCH v2] " Mikulas Patocka @ 2024-05-23 15:01 ` Jens Axboe 2024-05-23 15:11 ` Mikulas Patocka 0 siblings, 1 reply; 18+ messages in thread From: Jens Axboe @ 2024-05-23 15:01 UTC (permalink / raw) To: Mikulas Patocka Cc: Keith Busch, Christoph Hellwig, Sagi Grimberg, Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme On 5/23/24 8:58 AM, Mikulas Patocka wrote: > > > On Wed, 15 May 2024, Jens Axboe wrote: > >> On 5/15/24 7:28 AM, Mikulas Patocka wrote: >>> @@ -177,9 +177,9 @@ static inline int blk_integrity_rq(struc >>> return 0; >>> } >>> >>> -static inline struct bio_vec *rq_integrity_vec(struct request *rq) >>> +static inline struct bio_vec rq_integrity_vec(struct request *rq) >>> { >>> - return NULL; >>> + BUG(); >>> } >>> #endif /* CONFIG_BLK_DEV_INTEGRITY */ >>> #endif /* _LINUX_BLK_INTEGRITY_H */ >> >> Let's please not do that. If it's not used outside of >> CONFIG_BLK_DEV_INTEGRITY, it should just go away. >> >> -- >> Jens Axboe > > Here I'm resending the patch with the function rq_integrity_vec removed if > CONFIG_BLK_DEV_INTEGRITY is not defined. That looks better - but can you please just post a full new series, that's a lot easier to deal with and look at than adding a v2 of one patch in the thread. > @@ -853,16 +855,20 @@ static blk_status_t nvme_prep_rq(struct > goto out_free_cmd; > } > > +#ifdef CONFIG_BLK_DEV_INTEGRITY > if (blk_integrity_rq(req)) { > ret = nvme_map_metadata(dev, req, &iod->cmd); > if (ret) > goto out_unmap_data; > } > +#endif if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && blk_integrity_rq(req)) { ? 
> @@ -962,12 +968,14 @@ static __always_inline void nvme_pci_unm > struct nvme_queue *nvmeq = req->mq_hctx->driver_data; > struct nvme_dev *dev = nvmeq->dev; > > +#ifdef CONFIG_BLK_DEV_INTEGRITY > if (blk_integrity_rq(req)) { > struct nvme_iod *iod = blk_mq_rq_to_pdu(req); Ditto > Index: linux-2.6/include/linux/blk-integrity.h > =================================================================== > --- linux-2.6.orig/include/linux/blk-integrity.h > +++ linux-2.6/include/linux/blk-integrity.h > @@ -109,11 +109,11 @@ static inline bool blk_integrity_rq(stru > * Return the first bvec that contains integrity data. Only drivers that are > * limited to a single integrity segment should use this helper. > */ > -static inline struct bio_vec *rq_integrity_vec(struct request *rq) > +static inline struct bio_vec rq_integrity_vec(struct request *rq) > { > - if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1)) > - return NULL; > - return rq->bio->bi_integrity->bip_vec; > + WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1); > + return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec, > + rq->bio->bi_integrity->bip_iter); > } Not clear why the return on integrity segments > 1 is removed? -- Jens Axboe ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] block: change rq_integrity_vec to respect the iterator 2024-05-23 15:01 ` Jens Axboe @ 2024-05-23 15:11 ` Mikulas Patocka 2024-05-23 15:22 ` Anuj gupta 2024-05-23 15:33 ` Jens Axboe 0 siblings, 2 replies; 18+ messages in thread From: Mikulas Patocka @ 2024-05-23 15:11 UTC (permalink / raw) To: Jens Axboe Cc: Keith Busch, Christoph Hellwig, Sagi Grimberg, Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme On Thu, 23 May 2024, Jens Axboe wrote: > On 5/23/24 8:58 AM, Mikulas Patocka wrote: > > Here I'm resending the patch with the function rq_integrity_vec removed if > > CONFIG_BLK_DEV_INTEGRITY is not defined. > > That looks better - but can you please just post a full new series, > that's a lot easier to deal with and look at than adding a v2 of one > patch in the thread. OK, I'll post both patches. > > @@ -853,16 +855,20 @@ static blk_status_t nvme_prep_rq(struct > > goto out_free_cmd; > > } > > > > +#ifdef CONFIG_BLK_DEV_INTEGRITY > > if (blk_integrity_rq(req)) { > > ret = nvme_map_metadata(dev, req, &iod->cmd); > > if (ret) > > goto out_unmap_data; > > } > > +#endif > > if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && blk_integrity_rq(req)) { > > ? That wouldn't work, because the calls to rq_integrity_vec need to be eliminated by the preprocessor. Should I change rq_integrity_vec to this? Then, we could get rid of the ifdefs and let the optimizer remove all calls to rq_integrity_vec. 
static inline struct bio_vec rq_integrity_vec(struct request *rq) { struct bio_vec bv = { }; return bv; } > > @@ -962,12 +968,14 @@ static __always_inline void nvme_pci_unm > > struct nvme_queue *nvmeq = req->mq_hctx->driver_data; > > struct nvme_dev *dev = nvmeq->dev; > > > > +#ifdef CONFIG_BLK_DEV_INTEGRITY > > if (blk_integrity_rq(req)) { > > struct nvme_iod *iod = blk_mq_rq_to_pdu(req); > > Ditto > > > Index: linux-2.6/include/linux/blk-integrity.h > > =================================================================== > > --- linux-2.6.orig/include/linux/blk-integrity.h > > +++ linux-2.6/include/linux/blk-integrity.h > > @@ -109,11 +109,11 @@ static inline bool blk_integrity_rq(stru > > * Return the first bvec that contains integrity data. Only drivers that are > > * limited to a single integrity segment should use this helper. > > */ > > -static inline struct bio_vec *rq_integrity_vec(struct request *rq) > > +static inline struct bio_vec rq_integrity_vec(struct request *rq) > > { > > - if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1)) > > - return NULL; > > - return rq->bio->bi_integrity->bip_vec; > > + WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1); > > + return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec, > > + rq->bio->bi_integrity->bip_iter); > > } > > Not clear why the return on integrity segments > 1 is removed? Because we can't return NULL. But I can leave it there, print a warning and return the first vector. Mikulas > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] block: change rq_integrity_vec to respect the iterator 2024-05-23 15:11 ` Mikulas Patocka @ 2024-05-23 15:22 ` Anuj gupta 2024-05-23 15:33 ` Jens Axboe 1 sibling, 0 replies; 18+ messages in thread From: Anuj gupta @ 2024-05-23 15:22 UTC (permalink / raw) To: Mikulas Patocka Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg, Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme On Thu, May 23, 2024 at 8:41 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > On Thu, 23 May 2024, Jens Axboe wrote: > > > On 5/23/24 8:58 AM, Mikulas Patocka wrote: > > > > Here I'm resending the patch with the function rq_integrity_vec removed if > > > CONFIG_BLK_DEV_INTEGRITY is not defined. > > > > That looks better - but can you please just post a full new series, > > that's a lot easier to deal with and look at than adding a v2 of one > > patch in the thread. > > OK, I'll post both patches. > > > > @@ -853,16 +855,20 @@ static blk_status_t nvme_prep_rq(struct > > > goto out_free_cmd; > > > } > > > > > > +#ifdef CONFIG_BLK_DEV_INTEGRITY > > > if (blk_integrity_rq(req)) { > > > ret = nvme_map_metadata(dev, req, &iod->cmd); > > > if (ret) > > > goto out_unmap_data; > > > } > > > +#endif > > > > if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && blk_integrity_rq(req)) { > > > > ? > > That wouldn't work, because the calls to rq_integrity_vec need to be > eliminated by the preprocessor. > > Should I change rq_integrity_vec to this? Then, we could get rid of the > ifdefs and let the optimizer remove all calls to rq_integrity_vec. 
> static inline struct bio_vec rq_integrity_vec(struct request *rq) > { > struct bio_vec bv = { }; > return bv; > } This can be reduced to return (struct bio_vec){ }; > > > > @@ -962,12 +968,14 @@ static __always_inline void nvme_pci_unm > > > struct nvme_queue *nvmeq = req->mq_hctx->driver_data; > > > struct nvme_dev *dev = nvmeq->dev; > > > > > > +#ifdef CONFIG_BLK_DEV_INTEGRITY > > > if (blk_integrity_rq(req)) { > > > struct nvme_iod *iod = blk_mq_rq_to_pdu(req); > > > > Ditto > > > > > Index: linux-2.6/include/linux/blk-integrity.h > > > =================================================================== > > > --- linux-2.6.orig/include/linux/blk-integrity.h > > > +++ linux-2.6/include/linux/blk-integrity.h > > > @@ -109,11 +109,11 @@ static inline bool blk_integrity_rq(stru > > > * Return the first bvec that contains integrity data. Only drivers that are > > > * limited to a single integrity segment should use this helper. > > > */ > > > -static inline struct bio_vec *rq_integrity_vec(struct request *rq) > > > +static inline struct bio_vec rq_integrity_vec(struct request *rq) > > > { > > > - if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1)) > > > - return NULL; > > > - return rq->bio->bi_integrity->bip_vec; > > > + WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1); > > > + return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec, > > > + rq->bio->bi_integrity->bip_iter); > > > } > > > > Not clear why the return on integrity segments > 1 is removed? > > Because we can't return NULL. But I can leave it there, print a warning > and return the first vector. > > Mikulas > > > -- > > Jens Axboe > > > > -- Anuj Gupta ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] block: change rq_integrity_vec to respect the iterator 2024-05-23 15:11 ` Mikulas Patocka 2024-05-23 15:22 ` Anuj gupta @ 2024-05-23 15:33 ` Jens Axboe 2024-05-23 15:48 ` Mikulas Patocka 1 sibling, 1 reply; 18+ messages in thread From: Jens Axboe @ 2024-05-23 15:33 UTC (permalink / raw) To: Mikulas Patocka Cc: Keith Busch, Christoph Hellwig, Sagi Grimberg, Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme On 5/23/24 9:11 AM, Mikulas Patocka wrote: >>> @@ -853,16 +855,20 @@ static blk_status_t nvme_prep_rq(struct >>> goto out_free_cmd; >>> } >>> >>> +#ifdef CONFIG_BLK_DEV_INTEGRITY >>> if (blk_integrity_rq(req)) { >>> ret = nvme_map_metadata(dev, req, &iod->cmd); >>> if (ret) >>> goto out_unmap_data; >>> } >>> +#endif >> >> if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && blk_integrity_rq(req)) { >> >> ? > > That wouldn't work, because the calls to rq_integrity_vec need to be > eliminated by the preprocessor. Why not just do this incremental? Cleans up the ifdef mess too, leaving only the one actually using rq_integrity_vec in place. 
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5f857cbc95c8..bd56416a7fa8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -821,10 +821,10 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	return ret;
 }
 
-#ifdef CONFIG_BLK_DEV_INTEGRITY
 static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
+#ifdef CONFIG_BLK_DEV_INTEGRITY
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct bio_vec bv = rq_integrity_vec(req);
 
@@ -832,9 +832,9 @@ static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
 	if (dma_mapping_error(dev->dev, iod->meta_dma))
 		return BLK_STS_IOERR;
 	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
+#endif
 	return BLK_STS_OK;
 }
-#endif
 
 static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 {
@@ -855,20 +855,16 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 		goto out_free_cmd;
 	}
 
-#ifdef CONFIG_BLK_DEV_INTEGRITY
-	if (blk_integrity_rq(req)) {
+	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && blk_integrity_rq(req)) {
 		ret = nvme_map_metadata(dev, req, &iod->cmd);
 		if (ret)
 			goto out_unmap_data;
 	}
-#endif
 
 	nvme_start_request(req);
 	return BLK_STS_OK;
-#ifdef CONFIG_BLK_DEV_INTEGRITY
 out_unmap_data:
 	nvme_unmap_data(dev, req);
-#endif
 out_free_cmd:
 	nvme_cleanup_cmd(req);
 	return ret;

> Should I change rq_integrity_vec to this? Then, we could get rid of the
> ifdefs and let the optimizer remove all calls to rq_integrity_vec.

> static inline struct bio_vec rq_integrity_vec(struct request *rq)
> {
> 	struct bio_vec bv = { };
> 	return bv;
> }

Only if that eliminates runtime checking for !INTEGRITY, which I don't
think it will.

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [PATCH v2] block: change rq_integrity_vec to respect the iterator 2024-05-23 15:33 ` Jens Axboe @ 2024-05-23 15:48 ` Mikulas Patocka 0 siblings, 0 replies; 18+ messages in thread From: Mikulas Patocka @ 2024-05-23 15:48 UTC (permalink / raw) To: Jens Axboe Cc: Keith Busch, Christoph Hellwig, Sagi Grimberg, Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme On Thu, 23 May 2024, Jens Axboe wrote: > On 5/23/24 9:11 AM, Mikulas Patocka wrote: > >>> @@ -853,16 +855,20 @@ static blk_status_t nvme_prep_rq(struct > >>> goto out_free_cmd; > >>> } > >>> > >>> +#ifdef CONFIG_BLK_DEV_INTEGRITY > >>> if (blk_integrity_rq(req)) { > >>> ret = nvme_map_metadata(dev, req, &iod->cmd); > >>> if (ret) > >>> goto out_unmap_data; > >>> } > >>> +#endif > >> > >> if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && blk_integrity_rq(req)) { > >> > >> ? > > > > That wouldn't work, because the calls to rq_integrity_vec need to be > > eliminated by the preprocessor. > > Why not just do this incremental? Cleans up the ifdef mess too, leaving > only the one actually using rq_integrity_vec in place. 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index 5f857cbc95c8..bd56416a7fa8 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -821,10 +821,10 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req, > return ret; > } > > -#ifdef CONFIG_BLK_DEV_INTEGRITY > static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req, > struct nvme_command *cmnd) > { > +#ifdef CONFIG_BLK_DEV_INTEGRITY > struct nvme_iod *iod = blk_mq_rq_to_pdu(req); > struct bio_vec bv = rq_integrity_vec(req); > > @@ -832,9 +832,9 @@ static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req, > if (dma_mapping_error(dev->dev, iod->meta_dma)) > return BLK_STS_IOERR; > cmnd->rw.metadata = cpu_to_le64(iod->meta_dma); > +#endif > return BLK_STS_OK; > } > -#endif > > static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req) > { > @@ -855,20 +855,16 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req) > goto out_free_cmd; > } > > -#ifdef CONFIG_BLK_DEV_INTEGRITY > - if (blk_integrity_rq(req)) { > + if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && blk_integrity_rq(req)) { > ret = nvme_map_metadata(dev, req, &iod->cmd); > if (ret) > goto out_unmap_data; > } > -#endif > > nvme_start_request(req); > return BLK_STS_OK; > -#ifdef CONFIG_BLK_DEV_INTEGRITY > out_unmap_data: > nvme_unmap_data(dev, req); > -#endif > out_free_cmd: > nvme_cleanup_cmd(req); > return ret; > > > Should I change rq_integrity_vec to this? Then, we could get rid of the > > ifdefs and let the optimizer remove all calls to rq_integrity_vec. > > static inline struct bio_vec rq_integrity_vec(struct request *rq) > > { > > struct bio_vec bv = { }; > > return bv; > > } > > Only if that eliminates runtime checking for !INTEGRITY, which I don't > thin it will. It will eliminate the ifdefs. If we are compiling without CONFIG_BLK_DEV_INTEGRITY, blk_integrity_rq(req) is inline and it always returns 0. 
So the optimizer will remove anything guarded with "if (blk_integrity_rq(req))" - including the calls to nvme_map_metadata and rq_integrity_vec. But we need to provide dummy rq_integrity_vec for the compiler front-end. The front-end doesn't know that blk_integrity_rq always returns zero. So, the patch will be smaller if we get rid of the ifdefs and provide a dummy rq_integrity_vec. Mikulas > > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator 2024-05-15 13:28 ` [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator Mikulas Patocka 2024-05-16 2:30 ` Jens Axboe @ 2024-05-16 8:14 ` Ming Lei 2024-05-20 12:42 ` Mikulas Patocka 1 sibling, 1 reply; 18+ messages in thread From: Ming Lei @ 2024-05-16 8:14 UTC (permalink / raw) To: Mikulas Patocka Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg, Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme On Wed, May 15, 2024 at 03:28:11PM +0200, Mikulas Patocka wrote: > If we allocate a bio that is larger than NVMe maximum request size, attach > integrity metadata to it and send it to the NVMe subsystem, the integrity > metadata will be corrupted. > > Splitting the bio works correctly. The function bio_split will clone the > bio, trim the iterator of the first bio and advance the iterator of the > second bio. > > However, the function rq_integrity_vec has a bug - it returns the first > vector of the bio's metadata and completely disregards the metadata > iterator that was advanced when the bio was split. Thus, the second bio > uses the same metadata as the first bio and this leads to metadata > corruption. Wrt. NVMe, inside blk_mq_submit_bio(), bio_integrity_prep() is called after bio is split, ->bi_integrity is actually allocated for every split bio, so I am not sure if the issue is related with bio splitting. Or is it related with DM over NVMe? However, rq_integrity_vec() may not work correctly in case of bio merge. Thanks, Ming ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator
  2024-05-16  8:14 ` [RFC PATCH 1/2] " Ming Lei
@ 2024-05-20 12:42 ` Mikulas Patocka
  2024-05-20 13:19   ` Ming Lei
  0 siblings, 1 reply; 18+ messages in thread
From: Mikulas Patocka @ 2024-05-20 12:42 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme

On Thu, 16 May 2024, Ming Lei wrote:

> On Wed, May 15, 2024 at 03:28:11PM +0200, Mikulas Patocka wrote:
> > If we allocate a bio that is larger than NVMe maximum request size, attach
> > integrity metadata to it and send it to the NVMe subsystem, the integrity
> > metadata will be corrupted.
> >
> > Splitting the bio works correctly. The function bio_split will clone the
> > bio, trim the iterator of the first bio and advance the iterator of the
> > second bio.
> >
> > However, the function rq_integrity_vec has a bug - it returns the first
> > vector of the bio's metadata and completely disregards the metadata
> > iterator that was advanced when the bio was split. Thus, the second bio
> > uses the same metadata as the first bio and this leads to metadata
> > corruption.
>
> Wrt. NVMe, inside blk_mq_submit_bio(), bio_integrity_prep() is called after
> bio is split, ->bi_integrity is actually allocated for every split bio, so I
> am not sure if the issue is related with bio splitting. Or is it related
> with DM over NVMe?

I created a dm-crypt patch that stores authenticated data in the bio
integrity field:
https://patchwork.kernel.org/project/linux-block/patch/703ffbcf-2fa8-56aa-2219-10254af26ba5@redhat.com/

And that patch needs this bugfix.

Mikulas

> However, rq_integrity_vec() may not work correctly in case of bio merge.
>
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator
  2024-05-20 12:42 ` Mikulas Patocka
@ 2024-05-20 13:19 ` Ming Lei
  0 siblings, 0 replies; 18+ messages in thread

From: Ming Lei @ 2024-05-20 13:19 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme, ming.lei

On Mon, May 20, 2024 at 02:42:34PM +0200, Mikulas Patocka wrote:
>
> On Thu, 16 May 2024, Ming Lei wrote:
>
> > On Wed, May 15, 2024 at 03:28:11PM +0200, Mikulas Patocka wrote:
> > > If we allocate a bio that is larger than NVMe maximum request size, attach
> > > integrity metadata to it and send it to the NVMe subsystem, the integrity
> > > metadata will be corrupted.
> > >
> > > Splitting the bio works correctly. The function bio_split will clone the
> > > bio, trim the iterator of the first bio and advance the iterator of the
> > > second bio.
> > >
> > > However, the function rq_integrity_vec has a bug - it returns the first
> > > vector of the bio's metadata and completely disregards the metadata
> > > iterator that was advanced when the bio was split. Thus, the second bio
> > > uses the same metadata as the first bio and this leads to metadata
> > > corruption.
> >
> > Wrt. NVMe, inside blk_mq_submit_bio(), bio_integrity_prep() is called after
> > bio is split, ->bi_integrity is actually allocated for every split bio, so I
> > am not sure if the issue is related with bio splitting. Or is it related
> > with DM over NVMe?
>
> I created a dm-crypt patch that stores authenticated data in the bio
> integrity field:
> https://patchwork.kernel.org/project/linux-block/patch/703ffbcf-2fa8-56aa-2219-10254af26ba5@redhat.com/
>
> And that patch needs this bugfix.

OK, then please update the commit log with the dm-crypt use case, given there
isn't such an issue on plain NVMe.

BTW, a bio won't be merged in case of plain NVMe since there is a gap between
two NVMe bios' meta buffers, both allocated from kmalloc(). However, is it
possible for the split bios from dm-crypt to be merged in blk-mq code, because
dm-crypt may have its own queue limits? If yes, I guess this patch may not be
enough. Otherwise, I think this patch is good.

Thanks,
Ming

^ permalink raw reply [flat|nested] 18+ messages in thread
* [RFC PATCH 2/2] dm-crypt: support per-sector NVMe metadata
  2024-05-15 13:27 [RFC PATCH 0/2] dm-crypt support for per-sector NVMe metadata Mikulas Patocka
  2024-05-15 13:28 ` [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator Mikulas Patocka
@ 2024-05-15 13:30 ` Mikulas Patocka
  2024-05-27 22:12 ` [RFC PATCH 0/2] dm-crypt support for " Eric Wheeler
  2 siblings, 0 replies; 18+ messages in thread

From: Mikulas Patocka @ 2024-05-15 13:30 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Mike Snitzer, Milan Broz
Cc: linux-block, dm-devel, linux-nvme

Support per-sector NVMe metadata in dm-crypt.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-crypt.c | 53 ++++++++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 23 deletions(-)

Index: linux-2.6/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-crypt.c
+++ linux-2.6/drivers/md/dm-crypt.c
@@ -211,7 +211,8 @@ struct crypt_config {
 	unsigned int integrity_tag_size;
 	unsigned int integrity_iv_size;
-	unsigned int on_disk_tag_size;
+	unsigned int used_tag_size;
+	unsigned int tuple_size;
 
 	/*
 	 * pool for per bio private data, crypto requests,
@@ -1148,14 +1149,14 @@ static int dm_crypt_integrity_io_alloc(s
 	unsigned int tag_len;
 	int ret;
 
-	if (!bio_sectors(bio) || !io->cc->on_disk_tag_size)
+	if (!bio_sectors(bio) || !io->cc->tuple_size)
 		return 0;
 
 	bip = bio_integrity_alloc(bio, GFP_NOIO, 1);
 	if (IS_ERR(bip))
 		return PTR_ERR(bip);
 
-	tag_len = io->cc->on_disk_tag_size * (bio_sectors(bio) >> io->cc->sector_shift);
+	tag_len = io->cc->tuple_size * (bio_sectors(bio) >> io->cc->sector_shift);
 
 	bip->bip_iter.bi_sector = io->cc->start + io->sector;
@@ -1173,24 +1174,30 @@ static int crypt_integrity_ctr(struct cr
 	struct blk_integrity *bi = blk_get_integrity(cc->dev->bdev->bd_disk);
 	struct mapped_device *md = dm_table_get_md(ti->table);
 
+	if (!bi) {
+		ti->error = "No integrity profile.";
+		return -EINVAL;
+	}
+
 	/* From now we require underlying device with our integrity profile */
-	if (!bi || strcasecmp(bi->profile->name, "DM-DIF-EXT-TAG")) {
+	if (strcasecmp(bi->profile->name, "DM-DIF-EXT-TAG") &&
+	    strcasecmp(bi->profile->name, "nop")) {
 		ti->error = "Integrity profile not supported.";
 		return -EINVAL;
 	}
 
-	if (bi->tag_size != cc->on_disk_tag_size ||
-	    bi->tuple_size != cc->on_disk_tag_size) {
+	if (bi->tuple_size < cc->used_tag_size) {
 		ti->error = "Integrity profile tag size mismatch.";
 		return -EINVAL;
 	}
+	cc->tuple_size = bi->tuple_size;
 	if (1 << bi->interval_exp != cc->sector_size) {
 		ti->error = "Integrity profile sector size mismatch.";
 		return -EINVAL;
 	}
 
 	if (crypt_integrity_aead(cc)) {
-		cc->integrity_tag_size = cc->on_disk_tag_size - cc->integrity_iv_size;
+		cc->integrity_tag_size = cc->used_tag_size - cc->integrity_iv_size;
 		DMDEBUG("%s: Integrity AEAD, tag size %u, IV size %u.",
 			dm_device_name(md), cc->integrity_tag_size,
 			cc->integrity_iv_size);
@@ -1202,7 +1209,7 @@ static int crypt_integrity_ctr(struct cr
 		DMDEBUG("%s: Additional per-sector space %u bytes for IV.",
 			dm_device_name(md), cc->integrity_iv_size);
 
-	if ((cc->integrity_tag_size + cc->integrity_iv_size) != bi->tag_size) {
+	if ((cc->integrity_tag_size + cc->integrity_iv_size) > cc->tuple_size) {
 		ti->error = "Not enough space for integrity tag in the profile.";
 		return -EINVAL;
 	}
@@ -1281,7 +1288,7 @@ static void *tag_from_dmreq(struct crypt
 	struct dm_crypt_io *io = container_of(ctx, struct dm_crypt_io, ctx);
 
 	return &io->integrity_metadata[*org_tag_of_dmreq(cc, dmreq) *
-		cc->on_disk_tag_size];
+		cc->tuple_size];
 }
 
 static void *iv_tag_from_dmreq(struct crypt_config *cc,
@@ -1362,9 +1369,9 @@ static int crypt_convert_block_aead(stru
 		aead_request_set_crypt(req, dmreq->sg_in, dmreq->sg_out,
 				       cc->sector_size, iv);
 		r = crypto_aead_encrypt(req);
-		if (cc->integrity_tag_size + cc->integrity_iv_size != cc->on_disk_tag_size)
+		if (cc->integrity_tag_size + cc->integrity_iv_size != cc->tuple_size)
 			memset(tag + cc->integrity_tag_size + cc->integrity_iv_size, 0,
-			       cc->on_disk_tag_size - (cc->integrity_tag_size + cc->integrity_iv_size));
+			       cc->tuple_size - (cc->integrity_tag_size + cc->integrity_iv_size));
 	} else {
 		aead_request_set_crypt(req, dmreq->sg_in, dmreq->sg_out,
 				       cc->sector_size + cc->integrity_tag_size, iv);
@@ -1794,7 +1801,7 @@ static void crypt_dec_pending(struct dm_
 		return;
 
 	if (likely(!io->ctx.aead_recheck) && unlikely(io->ctx.aead_failed) &&
-	    cc->on_disk_tag_size && bio_data_dir(base_bio) == READ) {
+	    cc->used_tag_size && bio_data_dir(base_bio) == READ) {
 		io->ctx.aead_recheck = true;
 		io->ctx.aead_failed = false;
 		io->error = 0;
@@ -3173,7 +3180,7 @@ static int crypt_ctr_optional(struct dm_
 				ti->error = "Invalid integrity arguments";
 				return -EINVAL;
 			}
-			cc->on_disk_tag_size = val;
+			cc->used_tag_size = val;
 			sval = strchr(opt_string + strlen("integrity:"), ':') + 1;
 			if (!strcasecmp(sval, "aead")) {
 				set_bit(CRYPT_MODE_INTEGRITY_AEAD, &cc->cipher_flags);
@@ -3384,12 +3391,12 @@ static int crypt_ctr(struct dm_target *t
 		if (ret)
 			goto bad;
 
-		cc->tag_pool_max_sectors = POOL_ENTRY_SIZE / cc->on_disk_tag_size;
+		cc->tag_pool_max_sectors = POOL_ENTRY_SIZE / cc->tuple_size;
 		if (!cc->tag_pool_max_sectors)
 			cc->tag_pool_max_sectors = 1;
 
 		ret = mempool_init_kmalloc_pool(&cc->tag_pool, MIN_IOS,
-			cc->tag_pool_max_sectors * cc->on_disk_tag_size);
+			cc->tag_pool_max_sectors * cc->tuple_size);
 		if (ret) {
 			ti->error = "Cannot allocate integrity tags mempool";
 			goto bad;
@@ -3464,7 +3471,7 @@ static int crypt_map(struct dm_target *t
 	 * Check if bio is too large, split as needed.
 	 */
 	if (unlikely(bio->bi_iter.bi_size > (BIO_MAX_VECS << PAGE_SHIFT)) &&
-	    (bio_data_dir(bio) == WRITE || cc->on_disk_tag_size))
+	    (bio_data_dir(bio) == WRITE || cc->used_tag_size))
 		dm_accept_partial_bio(bio, ((BIO_MAX_VECS << PAGE_SHIFT) >> SECTOR_SHIFT));
 
 	/*
@@ -3480,8 +3487,8 @@ static int crypt_map(struct dm_target *t
 	io = dm_per_bio_data(bio, cc->per_bio_data_size);
 	crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
 
-	if (cc->on_disk_tag_size) {
-		unsigned int tag_len = cc->on_disk_tag_size * (bio_sectors(bio) >> cc->sector_shift);
+	if (cc->tuple_size) {
+		unsigned int tag_len = cc->tuple_size * (bio_sectors(bio) >> cc->sector_shift);
 
 		if (unlikely(tag_len > KMALLOC_MAX_SIZE))
 			io->integrity_metadata = NULL;
@@ -3552,7 +3559,7 @@ static void crypt_status(struct dm_targe
 		num_feature_args += test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
 		num_feature_args += cc->sector_size != (1 << SECTOR_SHIFT);
 		num_feature_args += test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags);
-		if (cc->on_disk_tag_size)
+		if (cc->used_tag_size)
 			num_feature_args++;
 		if (num_feature_args) {
 			DMEMIT(" %d", num_feature_args);
@@ -3566,8 +3573,8 @@ static void crypt_status(struct dm_targe
 			DMEMIT(" no_read_workqueue");
 		if (test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags))
 			DMEMIT(" no_write_workqueue");
-		if (cc->on_disk_tag_size)
-			DMEMIT(" integrity:%u:%s", cc->on_disk_tag_size, cc->cipher_auth);
+		if (cc->used_tag_size)
+			DMEMIT(" integrity:%u:%s", cc->used_tag_size, cc->cipher_auth);
		if (cc->sector_size != (1 << SECTOR_SHIFT))
			DMEMIT(" sector_size:%d", cc->sector_size);
		if (test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags))
@@ -3588,9 +3595,9 @@ static void crypt_status(struct dm_targe
 		DMEMIT(",iv_large_sectors=%c", test_bit(CRYPT_IV_LARGE_SECTORS,
 		       &cc->cipher_flags) ? 'y' : 'n');
 
-		if (cc->on_disk_tag_size)
+		if (cc->used_tag_size)
 			DMEMIT(",integrity_tag_size=%u,cipher_auth=%s",
-			       cc->on_disk_tag_size, cc->cipher_auth);
+			       cc->used_tag_size, cc->cipher_auth);
 		if (cc->sector_size != (1 << SECTOR_SHIFT))
 			DMEMIT(",sector_size=%d", cc->sector_size);
 		if (cc->cipher_string)

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/2] dm-crypt support for per-sector NVMe metadata
  2024-05-15 13:27 [RFC PATCH 0/2] dm-crypt support for per-sector NVMe metadata Mikulas Patocka
  2024-05-15 13:28 ` [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator Mikulas Patocka
  2024-05-15 13:30 ` [RFC PATCH 2/2] dm-crypt: support per-sector NVMe metadata Mikulas Patocka
@ 2024-05-27 22:12 ` Eric Wheeler
  2024-05-28  7:25 ` Milan Broz
  2024-05-28 11:16 ` Mikulas Patocka
  2 siblings, 2 replies; 18+ messages in thread

From: Eric Wheeler @ 2024-05-27 22:12 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme

On Wed, 15 May 2024, Mikulas Patocka wrote:
> Hi
>
> Some NVMe devices may be formatted with extra 64 bytes of metadata per
> sector.
>
> Here I'm submitting for review dm-crypt patches that make it possible to
> use per-sector metadata for authenticated encryption. With these patches,
> dm-crypt can run directly on the top of a NVMe device, without using
> dm-integrity. These patches increase write throughput twice, because there
> is no write to the dm-integrity journal.
>
> An example how to use it (so far, there is no support in the userspace
> cryptsetup tool):
>
> # nvme format /dev/nvme1 -n 1 -lbaf=4
> # dmsetup create cr --table '0 1048576 crypt
> capi:authenc(hmac(sha256),cbc(aes))-essiv:sha256
> 01b11af6b55f76424fd53fb66667c301466b2eeaf0f39fd36d26e7fc4f52ade2de4228e996f5ae2fe817ce178e77079d28e4baaebffbcd3e16ae4f36ef217298
> 0 /dev/nvme1n1 0 2 integrity:32:aead sector_size:4096'

That's really an amazing feature, and I think your implementation is simple
and elegant. Somehow reminds me of the 520/528-byte sectors that big
commercial filers use, but in a way Linux could use.

Questions:

- I see you are using 32 bytes of AEAD data (out of 64). Is AEAD always
  32 bytes, or can it vary by crypto mechanism?

- What drive are you using? I am curious what your `nvme id-ns` output
  looks like. Do you have 64 in the `ms` value?

  # nvme id-ns /dev/nvme0n1 | grep lbaf
  nlbaf   : 0
  nulbaf  : 0
  lbaf  0 : ms:0   lbads:9  rp:0 (in use)
              ^       ^512b

--
Eric Wheeler

> Please review it - I'd like to know whether detecting the presence of
> per-sector metadata in crypt_integrity_ctr is correct whether it should be
> done differently.
>
> Mikulas

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/2] dm-crypt support for per-sector NVMe metadata
  2024-05-27 22:12 ` [RFC PATCH 0/2] dm-crypt support for " Eric Wheeler
@ 2024-05-28  7:25 ` Milan Broz
  2024-05-28 23:55 ` Eric Wheeler
  1 sibling, 1 reply; 18+ messages in thread

From: Milan Broz @ 2024-05-28 7:25 UTC (permalink / raw)
To: Eric Wheeler, Mikulas Patocka
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Mike Snitzer, linux-block, dm-devel, linux-nvme

On 5/28/24 12:12 AM, Eric Wheeler wrote:
> On Wed, 15 May 2024, Mikulas Patocka wrote:
>> Hi
>>
>> Some NVMe devices may be formatted with extra 64 bytes of metadata per
>> sector.
>>
>> Here I'm submitting for review dm-crypt patches that make it possible to
>> use per-sector metadata for authenticated encryption. With these patches,
>> dm-crypt can run directly on the top of a NVMe device, without using
>> dm-integrity. These patches increase write throughput twice, because there
>> is no write to the dm-integrity journal.
>>
>> An example how to use it (so far, there is no support in the userspace
>> cryptsetup tool):
>>
>> # nvme format /dev/nvme1 -n 1 -lbaf=4
>> # dmsetup create cr --table '0 1048576 crypt
>> capi:authenc(hmac(sha256),cbc(aes))-essiv:sha256
>> 01b11af6b55f76424fd53fb66667c301466b2eeaf0f39fd36d26e7fc4f52ade2de4228e996f5ae2fe817ce178e77079d28e4baaebffbcd3e16ae4f36ef217298
>> 0 /dev/nvme1n1 0 2 integrity:32:aead sector_size:4096'
>
> That's really an amazing feature, and I think your implementation is simple
> and elegant. Somehow reminds me of the 520/528-byte sectors that big
> commercial filers use, but in a way Linux could use.
>
> Questions:
>
> - I see you are using 32 bytes of AEAD data (out of 64). Is AEAD always
>   32 bytes, or can it vary by crypto mechanism?

Hi Eric,

I'll try to answer this question as this is where we headed with
dm-integrity+dm-crypt since the beginning - replace it with HW and
atomic sector+metadata handling once suitable HW becomes available.

Currently, dm-integrity allocates exact space for any AEAD you want to
construct (cipher-xts/hctr2 + hmac) or for native AEAD (my favourite is
AEGIS here).

So it depends on configuration, the only difference to dm-integrity is
that HW allocates fixed 64 bytes so that crypto can use up to this
space, but it should be completely configurable in dm-crypt. IOW real
used space can vary by crypto mechanism.

Definitely, it is now enough for real AEAD compared to legacy 512+8 DIF :)

Also, it opens a way to store something more (sector context) in metadata,
but that's an idea for the future (usable in fs encryption as well, I guess).

> - What drive are you using? I am curious what your `nvme id-ns` output
>   looks like. Do you have 64 in the `ms` value?
>
>   # nvme id-ns /dev/nvme0n1 | grep lbaf
>   nlbaf   : 0
>   nulbaf  : 0
>   lbaf  0 : ms:0   lbads:9  rp:0 (in use)
>               ^       ^512b

This is the major issue still - I think there are only enterprisey NVMe drives
that can do this.

Milan

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/2] dm-crypt support for per-sector NVMe metadata
  2024-05-28  7:25 ` Milan Broz
@ 2024-05-28 23:55 ` Eric Wheeler
  0 siblings, 0 replies; 18+ messages in thread

From: Eric Wheeler @ 2024-05-28 23:55 UTC (permalink / raw)
To: Milan Broz
Cc: Mikulas Patocka, Jens Axboe, Keith Busch, Christoph Hellwig,
    Sagi Grimberg, Mike Snitzer, linux-block, dm-devel, linux-nvme

On Tue, 28 May 2024, Milan Broz wrote:
> On 5/28/24 12:12 AM, Eric Wheeler wrote:
> > On Wed, 15 May 2024, Mikulas Patocka wrote:
> >> Hi
> >>
> >> Some NVMe devices may be formatted with extra 64 bytes of metadata per
> >> sector.
> >>
> >> Here I'm submitting for review dm-crypt patches that make it possible to
> >> use per-sector metadata for authenticated encryption. With these patches,
> >> dm-crypt can run directly on the top of a NVMe device, without using
> >> dm-integrity. These patches increase write throughput twice, because there
> >> is no write to the dm-integrity journal.
> >>
> >> An example how to use it (so far, there is no support in the userspace
> >> cryptsetup tool):
> >>
> >> # nvme format /dev/nvme1 -n 1 -lbaf=4
> >> # dmsetup create cr --table '0 1048576 crypt
> >> capi:authenc(hmac(sha256),cbc(aes))-essiv:sha256
> >> 01b11af6b55f76424fd53fb66667c301466b2eeaf0f39fd36d26e7fc4f52ade2de4228e996f5ae2fe817ce178e77079d28e4baaebffbcd3e16ae4f36ef217298
> >> 0 /dev/nvme1n1 0 2 integrity:32:aead sector_size:4096'
> >
> > That's really an amazing feature, and I think your implementation is simple
> > and elegant. Somehow reminds me of the 520/528-byte sectors that big
> > commercial filers use, but in a way Linux could use.
> >
> > Questions:
> >
> > - I see you are using 32 bytes of AEAD data (out of 64). Is AEAD always
> >   32 bytes, or can it vary by crypto mechanism?
>
> Hi Eric,
>
> I'll try to answer this question as this is where we headed with
> dm-integrity+dm-crypt since the beginning - replace it with HW and
> atomic sector+metadata handling once suitable HW becomes available.
>
> Currently, dm-integrity allocates exact space for any AEAD you want to
> construct (cipher-xts/hctr2 + hmac) or for native AEAD (my favourite is
> AEGIS here).

Awesome.

> So it depends on configuration, the only difference to dm-integrity is
> that HW allocates fixed 64 bytes so that crypto can use up to this
> space, but it should be completely configurable in dm-crypt. IOW real
> used space can vary by crypto mechanism.
>
> Definitely, it is now enough for real AEAD compared to legacy 512+8 DIF :)
>
> Also, it opens a way to store something more (sector context) in metadata,
> but that's an idea for the future (usable in fs encryption as well, I guess).

Good idea: you could modify the SCSI layer (similarly to the NVMe metadata
case) so that bio integrity payload data could be packed at the end of a
sector for drives that have DIF room. This would make it possible to use
520/528-byte and 4160/4224-byte-sectored SAS drives in Linux, for the first
time ever.

Newer SAS drives like the Xeos X22 can support the following sector sizes:

   512 +   0
   512 +   8
   512 +  16
  4096 +   0
  4096 +  64
  4096 + 128

> > - What drive are you using? I am curious what your `nvme id-ns` output
> >   looks like. Do you have 64 in the `ms` value?
> >
> >   # nvme id-ns /dev/nvme0n1 | grep lbaf
> >   nlbaf   : 0
> >   nulbaf  : 0
> >   lbaf  0 : ms:0   lbads:9  rp:0 (in use)
> >               ^       ^512b
>
> This is the major issue still - I think there are only enterprisey NVMe drives
> that can do this.

For now that is fine, we only use enterprise drives anyway, and it will be
great to use the integrity that the drives support natively. Coupled with
MDRAID, this solves hidden bitrot quite nicely at the block layer ... and it
may trickle down into desktop drives eventually.

-Eric

> Milan

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/2] dm-crypt support for per-sector NVMe metadata
  2024-05-27 22:12 ` [RFC PATCH 0/2] dm-crypt support for " Eric Wheeler
  2024-05-28  7:25 ` Milan Broz
@ 2024-05-28 11:16 ` Mikulas Patocka
  1 sibling, 0 replies; 18+ messages in thread

From: Mikulas Patocka @ 2024-05-28 11:16 UTC (permalink / raw)
To: Eric Wheeler
Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
    Mike Snitzer, Milan Broz, linux-block, dm-devel, linux-nvme

On Mon, 27 May 2024, Eric Wheeler wrote:
> On Wed, 15 May 2024, Mikulas Patocka wrote:
> > Hi
> >
> > Some NVMe devices may be formatted with extra 64 bytes of metadata per
> > sector.
> >
> > Here I'm submitting for review dm-crypt patches that make it possible to
> > use per-sector metadata for authenticated encryption. With these patches,
> > dm-crypt can run directly on the top of a NVMe device, without using
> > dm-integrity. These patches increase write throughput twice, because there
> > is no write to the dm-integrity journal.
> >
> > An example how to use it (so far, there is no support in the userspace
> > cryptsetup tool):
> >
> > # nvme format /dev/nvme1 -n 1 -lbaf=4
> > # dmsetup create cr --table '0 1048576 crypt
> > capi:authenc(hmac(sha256),cbc(aes))-essiv:sha256
> > 01b11af6b55f76424fd53fb66667c301466b2eeaf0f39fd36d26e7fc4f52ade2de4228e996f5ae2fe817ce178e77079d28e4baaebffbcd3e16ae4f36ef217298
> > 0 /dev/nvme1n1 0 2 integrity:32:aead sector_size:4096'
>
> That's really an amazing feature, and I think your implementation is simple
> and elegant. Somehow reminds me of the 520/528-byte sectors that big
> commercial filers use, but in a way Linux could use.
>
> Questions:
>
> - I see you are using 32 bytes of AEAD data (out of 64). Is AEAD always
>   32 bytes, or can it vary by crypto mechanism?

It varies. E.g., if you use hmac(sha512), the full 64 bytes will be used.

> - What drive are you using?

Western Digital SN840 WUS4BA119DSP3X3

> I am curious what your `nvme id-ns` output
> looks like. Do you have 64 in the `ms` value?
>
> # nvme id-ns /dev/nvme0n1 | grep lbaf
> nlbaf   : 0
> nulbaf  : 0
> lbaf  0 : ms:0   lbads:9  rp:0 (in use)
>             ^       ^512b

Yes, I have this:

lbaf  0 : ms:0   lbads:9  rp:0
lbaf  1 : ms:8   lbads:9  rp:0
lbaf  2 : ms:0   lbads:12 rp:0
lbaf  3 : ms:8   lbads:12 rp:0
lbaf  4 : ms:64  lbads:12 rp:0 (in use)

Mikulas

> --
> Eric Wheeler

^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads: [~2024-05-28 23:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2024-05-15 13:27 [RFC PATCH 0/2] dm-crypt support for per-sector NVMe metadata Mikulas Patocka
2024-05-15 13:28 ` [RFC PATCH 1/2] block: change rq_integrity_vec to respect the iterator Mikulas Patocka
2024-05-16  2:30 ` Jens Axboe
2024-05-20 12:53 ` Mikulas Patocka
2024-05-23 14:58 ` [PATCH v2] " Mikulas Patocka
2024-05-23 15:01 ` Jens Axboe
2024-05-23 15:11 ` Mikulas Patocka
2024-05-23 15:22 ` Anuj gupta
2024-05-23 15:33 ` Jens Axboe
2024-05-23 15:48 ` Mikulas Patocka
2024-05-16  8:14 ` [RFC PATCH 1/2] " Ming Lei
2024-05-20 12:42 ` Mikulas Patocka
2024-05-20 13:19 ` Ming Lei
2024-05-15 13:30 ` [RFC PATCH 2/2] dm-crypt: support per-sector NVMe metadata Mikulas Patocka
2024-05-27 22:12 ` [RFC PATCH 0/2] dm-crypt support for " Eric Wheeler
2024-05-28  7:25 ` Milan Broz
2024-05-28 23:55 ` Eric Wheeler
2024-05-28 11:16 ` Mikulas Patocka