* [PATCH] block: change rq_integrity_vec to respect the iterator
@ 2024-04-29 18:37 ` Mikulas Patocka
2024-04-30 4:49 ` Anuj gupta
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Mikulas Patocka @ 2024-04-29 18:37 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Mike Snitzer; +Cc: linux-block, dm-devel, linux-nvme
Hi
I am changing dm-crypt, so that it can store the autenticated encryption
tag directly into the NVMe metadata (without using dm-integrity). This
will improve performance significantly, because we can avoid journaling
done by dm-integrity. I've got it working, but I've found this bug, so I'm
sending a patch for it.
Mikulas
From: Mikulas Patocka <mpatocka@redhat.com>
If we allocate a bio that is larger than NVMe maximum request size, attach
integrity metadata to it and send it to the NVMe subsystem, the integrity
metadata will be corrupted.
Splitting the bio works correctly. The function bio_split will clone the
bio, trim the size of the first bio and advance the iterator of the second
bio.
However, the function rq_integrity_vec has a bug - it returns the first
vector of the bio's metadata and completely disregards the metadata
iterator that was advanced when the bio was split. Thus, the second bio
uses the same metadata as the first bio and this leads to metadata
corruption.
This commit changes rq_integrity_vec, so that it calls mp_bvec_iter_bvec
instead of returning the first vector. mp_bvec_iter_bvec reads the
iterator and advances the vector by the iterator.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
drivers/nvme/host/pci.c | 6 +++---
include/linux/blk-integrity.h | 12 ++++++------
2 files changed, 9 insertions(+), 9 deletions(-)
Index: linux-2.6/drivers/nvme/host/pci.c
===================================================================
--- linux-2.6.orig/drivers/nvme/host/pci.c
+++ linux-2.6/drivers/nvme/host/pci.c
@@ -825,9 +825,9 @@ static blk_status_t nvme_map_metadata(st
struct nvme_command *cmnd)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+ struct bio_vec bv = rq_integrity_vec(req);
- iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
- rq_dma_dir(req), 0);
+ iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
if (dma_mapping_error(dev->dev, iod->meta_dma))
return BLK_STS_IOERR;
cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
@@ -966,7 +966,7 @@ static __always_inline void nvme_pci_unm
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
dma_unmap_page(dev->dev, iod->meta_dma,
- rq_integrity_vec(req)->bv_len, rq_dma_dir(req));
+ rq_integrity_vec(req).bv_len, rq_dma_dir(req));
}
if (blk_rq_nr_phys_segments(req))
Index: linux-2.6/include/linux/blk-integrity.h
===================================================================
--- linux-2.6.orig/include/linux/blk-integrity.h
+++ linux-2.6/include/linux/blk-integrity.h
@@ -109,11 +109,11 @@ static inline bool blk_integrity_rq(stru
* Return the first bvec that contains integrity data. Only drivers that are
* limited to a single integrity segment should use this helper.
*/
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
{
- if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1))
- return NULL;
- return rq->bio->bi_integrity->bip_vec;
+ WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1);
+ return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec,
+ rq->bio->bi_integrity->bip_iter);
}
#else /* CONFIG_BLK_DEV_INTEGRITY */
static inline int blk_rq_count_integrity_sg(struct request_queue *q,
@@ -177,9 +177,9 @@ static inline int blk_integrity_rq(struc
return 0;
}
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
{
- return NULL;
+ BUG();
}
#endif /* CONFIG_BLK_DEV_INTEGRITY */
#endif /* _LINUX_BLK_INTEGRITY_H */
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: change rq_integrity_vec to respect the iterator
2024-04-29 18:37 ` [PATCH] block: change rq_integrity_vec to respect the iterator Mikulas Patocka
@ 2024-04-30 4:49 ` Anuj gupta
2024-04-30 8:16 ` Keith Busch
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Anuj gupta @ 2024-04-30 4:49 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Jens Axboe, Keith Busch, Mike Snitzer, linux-block, dm-devel,
linux-nvme, Kanchan Joshi
On Tue, Apr 30, 2024 at 12:07 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> Hi
>
> I am changing dm-crypt, so that it can store the autenticated encryption
> tag directly into the NVMe metadata (without using dm-integrity). This
> will improve performance significantly, because we can avoid journaling
> done by dm-integrity. I've got it working, but I've found this bug, so I'm
> sending a patch for it.
>
> Mikulas
>
>
> From: Mikulas Patocka <mpatocka@redhat.com>
>
> If we allocate a bio that is larger than NVMe maximum request size, attach
> integrity metadata to it and send it to the NVMe subsystem, the integrity
> metadata will be corrupted.
>
> Splitting the bio works correctly. The function bio_split will clone the
> bio, trim the size of the first bio and advance the iterator of the second
> bio.
>
> However, the function rq_integrity_vec has a bug - it returns the first
> vector of the bio's metadata and completely disregards the metadata
> iterator that was advanced when the bio was split. Thus, the second bio
> uses the same metadata as the first bio and this leads to metadata
> corruption.
>
> This commit changes rq_integrity_vec, so that it calls mp_bvec_iter_bvec
> instead of returning the first vector. mp_bvec_iter_bvec reads the
> iterator and advances the vector by the iterator.
We also encountered this issue with split and had a similar solution [1]
This needs a fixes tag too.
fixes <2a876f5e25e8e> ("block: add a rq_integrity_vec helper")
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
>
> ---
> drivers/nvme/host/pci.c | 6 +++---
> include/linux/blk-integrity.h | 12 ++++++------
> 2 files changed, 9 insertions(+), 9 deletions(-)
>
> Index: linux-2.6/drivers/nvme/host/pci.c
> ===================================================================
> --- linux-2.6.orig/drivers/nvme/host/pci.c
> +++ linux-2.6/drivers/nvme/host/pci.c
> @@ -825,9 +825,9 @@ static blk_status_t nvme_map_metadata(st
> struct nvme_command *cmnd)
> {
> struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> + struct bio_vec bv = rq_integrity_vec(req);
>
> - iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
> - rq_dma_dir(req), 0);
> + iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
> if (dma_mapping_error(dev->dev, iod->meta_dma))
> return BLK_STS_IOERR;
> cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
> @@ -966,7 +966,7 @@ static __always_inline void nvme_pci_unm
> struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>
> dma_unmap_page(dev->dev, iod->meta_dma,
> - rq_integrity_vec(req)->bv_len, rq_dma_dir(req));
> + rq_integrity_vec(req).bv_len, rq_dma_dir(req));
> }
>
> if (blk_rq_nr_phys_segments(req))
> Index: linux-2.6/include/linux/blk-integrity.h
> ===================================================================
> --- linux-2.6.orig/include/linux/blk-integrity.h
> +++ linux-2.6/include/linux/blk-integrity.h
> @@ -109,11 +109,11 @@ static inline bool blk_integrity_rq(stru
> * Return the first bvec that contains integrity data. Only drivers that are
> * limited to a single integrity segment should use this helper.
> */
The comment here needs to be updated. rq_integrity_vec now becomes an
iter based operation that can be used by drivers with multiple
integrity segments.
> -static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> +static inline struct bio_vec rq_integrity_vec(struct request *rq)
> {
> - if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1))
> - return NULL;
> - return rq->bio->bi_integrity->bip_vec;
> + WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1);
> + return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec,
> + rq->bio->bi_integrity->bip_iter);
> }
> #else /* CONFIG_BLK_DEV_INTEGRITY */
> static inline int blk_rq_count_integrity_sg(struct request_queue *q,
> @@ -177,9 +177,9 @@ static inline int blk_integrity_rq(struc
> return 0;
> }
>
> -static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> +static inline struct bio_vec rq_integrity_vec(struct request *rq)
> {
> - return NULL;
> + BUG();
> }
> #endif /* CONFIG_BLK_DEV_INTEGRITY */
> #endif /* _LINUX_BLK_INTEGRITY_H */
>
>
[1] https://lore.kernel.org/linux-block/20240425183943.6319-6-joshi.k@samsung.com/
--
Anuj Gupta
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: change rq_integrity_vec to respect the iterator
2024-04-29 18:37 ` [PATCH] block: change rq_integrity_vec to respect the iterator Mikulas Patocka
2024-04-30 4:49 ` Anuj gupta
@ 2024-04-30 8:16 ` Keith Busch
2024-05-01 7:49 ` Christoph Hellwig
2024-05-02 11:25 ` Kanchan Joshi
3 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2024-04-30 8:16 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Jens Axboe, Mike Snitzer, linux-block, dm-devel, linux-nvme
On Mon, Apr 29, 2024 at 08:37:26PM +0200, Mikulas Patocka wrote:
> I am changing dm-crypt, so that it can store the autenticated encryption
> tag directly into the NVMe metadata (without using dm-integrity). This
> will improve performance significantly, because we can avoid journaling
> done by dm-integrity. I've got it working, but I've found this bug, so I'm
> sending a patch for it.
Patch looks fine, but Kanchan sent nearly the same one last week:
https://lore.kernel.org/linux-block/20240425183943.6319-6-joshi.k@samsung.com/
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: change rq_integrity_vec to respect the iterator
2024-04-29 18:37 ` [PATCH] block: change rq_integrity_vec to respect the iterator Mikulas Patocka
2024-04-30 4:49 ` Anuj gupta
2024-04-30 8:16 ` Keith Busch
@ 2024-05-01 7:49 ` Christoph Hellwig
2024-05-02 11:25 ` Kanchan Joshi
3 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2024-05-01 7:49 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Jens Axboe, Keith Busch, Mike Snitzer, linux-block, dm-devel,
linux-nvme
Please work with Anuj and Kanchan on a single coherent set of patches
to make integrity splitting work properly. It would probably also
help if you posted your code using this somewhere.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: change rq_integrity_vec to respect the iterator
2024-04-29 18:37 ` [PATCH] block: change rq_integrity_vec to respect the iterator Mikulas Patocka
` (2 preceding siblings ...)
2024-05-01 7:49 ` Christoph Hellwig
@ 2024-05-02 11:25 ` Kanchan Joshi
3 siblings, 0 replies; 5+ messages in thread
From: Kanchan Joshi @ 2024-05-02 11:25 UTC (permalink / raw)
To: Mikulas Patocka, Jens Axboe, Keith Busch, Mike Snitzer
Cc: linux-block, dm-devel, linux-nvme
Hi Mikulas
On 4/30/2024 12:07 AM, Mikulas Patocka wrote:
> Hi
>
> I am changing dm-crypt, so that it can store the autenticated encryption
> tag directly into the NVMe metadata (without using dm-integrity). This
> will improve performance significantly, because we can avoid journaling
> done by dm-integrity. I've got it working, but I've found this bug, so I'm
> sending a patch for it.
As noted by others, there are couple of options to have this settled:
- I can borrow stuff from your commit description to patch #5 [1] and
send that out separately
- Or you can add the changes that Anuj pointed and roll out v2
- Or you can take the route suggested by Christoph
Please choose.
[1]
https://lore.kernel.org/linux-nvme/20240425183943.6319-6-joshi.k@samsung.com/
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-05-02 11:25 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CGME20240429183739epcas5p1181a2946641784305b188f320033c637@epcas5p1.samsung.com>
2024-04-29 18:37 ` [PATCH] block: change rq_integrity_vec to respect the iterator Mikulas Patocka
2024-04-30 4:49 ` Anuj gupta
2024-04-30 8:16 ` Keith Busch
2024-05-01 7:49 ` Christoph Hellwig
2024-05-02 11:25 ` Kanchan Joshi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox