* [PATCHv2 0/2] Using SGLs for userspace commands
@ 2024-11-12 21:06 Keith Busch
2024-11-12 21:06 ` [PATCHv2 1/2] nvme-pci: add support for sgl metadata Keith Busch
From: Keith Busch @ 2024-11-12 21:06 UTC
To: linux-nvme; +Cc: hch, Keith Busch
From: Keith Busch <kbusch@kernel.org>
Changes from previous version:
Merged up to linux 6.12, which had some necessary blk-integrity
patches
Dropped all the initial "cleanup" patches. This is instead the minimum
patchset to satisfy using the feature.
Fixed setting the queue limit
Fixed the SGLS MPTR bit
Keith Busch (2):
nvme-pci: add support for sgl metadata
nvme-pci: use sgls for all user requests if possible
drivers/nvme/host/core.c | 3 +
drivers/nvme/host/nvme.h | 5 ++
drivers/nvme/host/pci.c | 139 ++++++++++++++++++++++++++++++++++-----
include/linux/nvme.h | 1 +
4 files changed, 131 insertions(+), 17 deletions(-)
--
2.43.5
* [PATCHv2 1/2] nvme-pci: add support for sgl metadata
From: Keith Busch @ 2024-11-12 21:06 UTC
To: linux-nvme; +Cc: hch, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Supporting this mode allows merging requests with metadata that wouldn't
be possible otherwise, and creating user space requests that straddle
physically discontiguous pages.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c |   3 +
 drivers/nvme/host/nvme.h |   5 ++
 drivers/nvme/host/pci.c  | 136 ++++++++++++++++++++++++++++++++++-----
 include/linux/nvme.h     |   1 +
 4 files changed, 129 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e119ba0f8ab8b..12e5064b9cba0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1985,6 +1985,9 @@ static void nvme_set_ctrl_limits(struct nvme_ctrl *ctrl,
 	lim->max_segments = min_t(u32, USHRT_MAX,
 		min_not_zero(nvme_max_drv_segments(ctrl), ctrl->max_segments));
 	lim->max_integrity_segments = ctrl->max_integrity_segments;
+	if (lim->max_integrity_segments > 1 &&
+	    !nvme_ctrl_meta_sgl_supported(ctrl))
+		lim->max_integrity_segments = 1;
 	lim->virt_boundary_mask = NVME_CTRL_PAGE_SIZE - 1;
 	lim->max_segment_size = UINT_MAX;
 	lim->dma_alignment = 3;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 900719c4c70c1..1709fbd30c6dc 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1126,6 +1126,11 @@ static inline bool nvme_ctrl_sgl_supported(struct nvme_ctrl *ctrl)
 	return ctrl->sgls & ((1 << 0) | (1 << 1));
 }
 
+static inline bool nvme_ctrl_meta_sgl_supported(struct nvme_ctrl *ctrl)
+{
+	return ctrl->sgls & NVME_CTRL_SGLS_MPTR;
+}
+
 #ifdef CONFIG_NVME_HOST_AUTH
 int __init nvme_init_auth(void);
 void __exit nvme_exit_auth(void);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0aa26a33f2315..919f173043d18 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -43,6 +43,7 @@
  */
 #define NVME_MAX_KB_SZ	8192
 #define NVME_MAX_SEGS	128
+#define NVME_MAX_META_SEGS 15
 #define NVME_MAX_NR_ALLOCATIONS	5
 
 static int use_threaded_interrupts;
@@ -144,6 +145,7 @@ struct nvme_dev {
 	struct sg_table *hmb_sgt;
 
 	mempool_t *iod_mempool;
+	mempool_t *iod_meta_mempool;
 
 	/* shadow doorbell buffer support: */
 	__le32 *dbbuf_dbs;
@@ -239,6 +241,8 @@ struct nvme_iod {
 	dma_addr_t first_dma;
 	dma_addr_t meta_dma;
 	struct sg_table sgt;
+	struct sg_table meta_sgt;
+	union nvme_descriptor meta_list;
 	union nvme_descriptor list[NVME_MAX_NR_ALLOCATIONS];
 };
 
@@ -506,6 +510,14 @@ static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
 	spin_unlock(&nvmeq->sq_lock);
 }
 
+static inline bool nvme_pci_metadata_use_sgls(struct nvme_dev *dev,
+					      struct request *req)
+{
+	if (!nvme_ctrl_meta_sgl_supported(&dev->ctrl))
+		return false;
+	return req->nr_integrity_segments > 1;
+}
+
 static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
 				     int nseg)
 {
@@ -518,6 +530,8 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
 		return false;
 	if (!nvmeq->qid)
 		return false;
+	if (nvme_pci_metadata_use_sgls(dev, req))
+		return true;
 	if (!sgl_threshold || avg_seg_size < sgl_threshold)
 		return false;
 	return true;
@@ -780,7 +794,8 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		struct bio_vec bv = req_bvec(req);
 
 		if (!is_pci_p2pdma_page(bv.bv_page)) {
-			if ((bv.bv_offset & (NVME_CTRL_PAGE_SIZE - 1)) +
+			if (!nvme_pci_metadata_use_sgls(dev, req) &&
+			    (bv.bv_offset & (NVME_CTRL_PAGE_SIZE - 1)) +
 			    bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2)
 				return nvme_setup_prp_simple(dev, req,
 							     &cmnd->rw, &bv);
@@ -824,11 +839,69 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	return ret;
 }
 
-static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
-		struct nvme_command *cmnd)
+static blk_status_t nvme_pci_setup_meta_sgls(struct nvme_dev *dev,
+					     struct request *req)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_rw_command *cmnd = &iod->cmd.rw;
+	struct nvme_sgl_desc *sg_list;
+	struct scatterlist *sgl, *sg;
+	unsigned int entries;
+	dma_addr_t sgl_dma;
+	int rc, i;
+
+	iod->meta_sgt.sgl = mempool_alloc(dev->iod_meta_mempool, GFP_ATOMIC);
+	if (!iod->meta_sgt.sgl)
+		return BLK_STS_RESOURCE;
+
+	sg_init_table(iod->meta_sgt.sgl, req->nr_integrity_segments);
+	iod->meta_sgt.orig_nents = blk_rq_map_integrity_sg(req,
+							   iod->meta_sgt.sgl);
+	if (!iod->meta_sgt.orig_nents)
+		goto out_free_sg;
+
+	rc = dma_map_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req),
+			     DMA_ATTR_NO_WARN);
+	if (rc)
+		goto out_free_sg;
+
+	sg_list = dma_pool_alloc(dev->prp_small_pool, GFP_ATOMIC, &sgl_dma);
+	if (!sg_list)
+		goto out_unmap_sg;
+
+	entries = iod->meta_sgt.nents;
+	iod->meta_list.sg_list = sg_list;
+	iod->meta_dma = sgl_dma;
+
+	cmnd->flags = NVME_CMD_SGL_METASEG;
+	cmnd->metadata = cpu_to_le64(sgl_dma);
+
+	sgl = iod->meta_sgt.sgl;
+	if (entries == 1) {
+		nvme_pci_sgl_set_data(sg_list, sgl);
+		return BLK_STS_OK;
+	}
+
+	sgl_dma += sizeof(*sg_list);
+	nvme_pci_sgl_set_seg(sg_list, sgl_dma, entries);
+	for_each_sg(sgl, sg, entries, i)
+		nvme_pci_sgl_set_data(&sg_list[i + 1], sg);
+
+	return BLK_STS_OK;
+
+out_unmap_sg:
+	dma_unmap_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req), 0);
+out_free_sg:
+	mempool_free(iod->meta_sgt.sgl, dev->iod_meta_mempool);
+	return BLK_STS_RESOURCE;
+}
+
+static blk_status_t nvme_pci_setup_meta_mptr(struct nvme_dev *dev,
+					     struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct bio_vec bv = rq_integrity_vec(req);
+	struct nvme_command *cmnd = &iod->cmd;
 
 	iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
 	if (dma_mapping_error(dev->dev, iod->meta_dma))
@@ -837,6 +910,13 @@ static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
 	return BLK_STS_OK;
 }
 
+static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req)
+{
+	if (nvme_pci_metadata_use_sgls(dev, req))
+		return nvme_pci_setup_meta_sgls(dev, req);
+	return nvme_pci_setup_meta_mptr(dev, req);
+}
+
 static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -845,6 +925,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 	iod->aborted = false;
 	iod->nr_allocations = -1;
 	iod->sgt.nents = 0;
+	iod->meta_sgt.nents = 0;
 
 	ret = nvme_setup_cmd(req->q->queuedata, req);
 	if (ret)
@@ -857,7 +938,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 	}
 
 	if (blk_integrity_rq(req)) {
-		ret = nvme_map_metadata(dev, req, &iod->cmd);
+		ret = nvme_map_metadata(dev, req);
 		if (ret)
 			goto out_unmap_data;
 	}
@@ -961,17 +1042,31 @@ static void nvme_queue_rqs(struct request **rqlist)
 	*rqlist = requeue_list;
 }
 
+static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev,
+						struct request *req)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+	if (!iod->meta_sgt.nents) {
+		dma_unmap_page(dev->dev, iod->meta_dma,
+			       rq_integrity_vec(req).bv_len,
+			       rq_dma_dir(req));
+		return;
+	}
+
+	dma_pool_free(dev->prp_small_pool, iod->meta_list.sg_list,
+		      iod->meta_dma);
+	dma_unmap_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req), 0);
+	mempool_free(iod->meta_sgt.sgl, dev->iod_meta_mempool);
+}
+
 static __always_inline void nvme_pci_unmap_rq(struct request *req)
 {
 	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
 	struct nvme_dev *dev = nvmeq->dev;
 
-	if (blk_integrity_rq(req)) {
-		struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
-		dma_unmap_page(dev->dev, iod->meta_dma,
-			       rq_integrity_vec(req).bv_len, rq_dma_dir(req));
-	}
+	if (blk_integrity_rq(req))
+		nvme_unmap_metadata(dev, req);
 
 	if (blk_rq_nr_phys_segments(req))
 		nvme_unmap_data(dev, req);
@@ -2767,6 +2862,7 @@ static void nvme_release_prp_pools(struct nvme_dev *dev)
 
 static int nvme_pci_alloc_iod_mempool(struct nvme_dev *dev)
 {
+	size_t meta_size = sizeof(struct scatterlist) * (NVME_MAX_META_SEGS + 1);
 	size_t alloc_size = sizeof(struct scatterlist) * NVME_MAX_SEGS;
 
 	dev->iod_mempool = mempool_create_node(1,
@@ -2775,7 +2871,18 @@ static int nvme_pci_alloc_iod_mempool(struct nvme_dev *dev)
 			dev_to_node(dev->dev));
 	if (!dev->iod_mempool)
 		return -ENOMEM;
+
+	dev->iod_meta_mempool = mempool_create_node(1,
+			mempool_kmalloc, mempool_kfree,
+			(void *)meta_size, GFP_KERNEL,
+			dev_to_node(dev->dev));
+	if (!dev->iod_meta_mempool)
+		goto free;
+
 	return 0;
+free:
+	mempool_destroy(dev->iod_mempool);
+	return -ENOMEM;
 }
 
 static void nvme_free_tagset(struct nvme_dev *dev)
@@ -3107,12 +3214,7 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
 	dev->ctrl.max_hw_sectors = min_t(u32, NVME_MAX_KB_SZ << 1,
 					dma_opt_mapping_size(&pdev->dev) >> 9);
 	dev->ctrl.max_segments = NVME_MAX_SEGS;
-
-	/*
-	 * There is no support for SGLs for metadata (yet), so we are limited to
-	 * a single integrity segment for the separate metadata pointer.
-	 */
-	dev->ctrl.max_integrity_segments = 1;
+	dev->ctrl.max_integrity_segments = NVME_MAX_META_SEGS;
 
 	return dev;
 out_put_device:
@@ -3216,6 +3318,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	nvme_free_queues(dev, 0);
 out_release_iod_mempool:
 	mempool_destroy(dev->iod_mempool);
+	mempool_destroy(dev->iod_meta_mempool);
 out_release_prp_pools:
 	nvme_release_prp_pools(dev);
 out_dev_unmap:
@@ -3281,6 +3384,7 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_dbbuf_dma_free(dev);
 	nvme_free_queues(dev, 0);
 	mempool_destroy(dev->iod_mempool);
+	mempool_destroy(dev->iod_meta_mempool);
 	nvme_release_prp_pools(dev);
 	nvme_dev_unmap(dev);
 	nvme_uninit_ctrl(&dev->ctrl);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 0a6e22038ce36..44d7f9e21527c 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -389,6 +389,7 @@ enum {
 	NVME_CTRL_CTRATT_PREDICTABLE_LAT	= 1 << 5,
 	NVME_CTRL_CTRATT_NAMESPACE_GRANULARITY	= 1 << 7,
 	NVME_CTRL_CTRATT_UUID_LIST		= 1 << 9,
+	NVME_CTRL_SGLS_MPTR			= 1 << 19,
};

struct nvme_lbaf {
-- 
2.43.5
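[Editor's sketch, not part of the thread: the descriptor arithmetic in
nvme_pci_setup_meta_sgls() above can be modeled in plain user-space C. The
struct below is a simplified stand-in for the kernel's struct nvme_sgl_desc
(16 bytes per the NVMe spec), and meta_sgl_bytes() is a hypothetical helper:
one data descriptor suffices for a single segment, while multiple segments
need a leading segment descriptor followed by one data descriptor per entry,
with the data descriptors starting one descriptor past the segment
descriptor.]

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct nvme_sgl_desc: an NVMe SGL
 * descriptor is 16 bytes (8-byte address, 4-byte length, 3 reserved bytes,
 * 1 type byte). */
struct sgl_desc {
	uint64_t addr;
	uint32_t length;
	uint8_t  rsvd[3];
	uint8_t  type;
};

/* Hypothetical helper: bytes of descriptor-pool memory a metadata SGL
 * needs for a mapping with `entries` scatterlist entries. A single entry
 * takes one data descriptor; more take a segment descriptor plus one data
 * descriptor per entry. */
static size_t meta_sgl_bytes(unsigned int entries)
{
	if (entries <= 1)
		return sizeof(struct sgl_desc);
	return (1u + entries) * sizeof(struct sgl_desc);
}
```

[With NVME_MAX_META_SEGS at 15, the worst case is 16 descriptors, i.e. 256
bytes, which is presumably why the patch can serve the list from the
driver's small descriptor pool.]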
* Re: [PATCHv2 1/2] nvme-pci: add support for sgl metadata
From: Christoph Hellwig @ 2024-11-13  4:58 UTC
To: Keith Busch; +Cc: linux-nvme, hch, Keith Busch

On Tue, Nov 12, 2024 at 01:06:19PM -0800, Keith Busch wrote:
> Supporting this mode allows merging requests with metadata that wouldn't
> be possible otherwise, and creating user space requests that straddle
> physically discontiguous pages.

Not just merging, but also creating :)  I.e. it allows to transfer
multiple non-contiguous metadata segments.

> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e119ba0f8ab8b..12e5064b9cba0 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1985,6 +1985,9 @@ static void nvme_set_ctrl_limits(struct nvme_ctrl *ctrl,
> 	lim->max_segments = min_t(u32, USHRT_MAX,
> 		min_not_zero(nvme_max_drv_segments(ctrl), ctrl->max_segments));
> 	lim->max_integrity_segments = ctrl->max_integrity_segments;
> +	if (lim->max_integrity_segments > 1 &&
> +	    !nvme_ctrl_meta_sgl_supported(ctrl))
> +		lim->max_integrity_segments = 1;

Despite the general mess about the SGLS field in the nvme spec, the meta
sgls bit really is a PCIe-only feature, and this will break metadata
support on RDMA.  The nvme_ctrl_meta_sgl_supported needs to be in pci.c
to set ctrl->max_integrity_segments based on it.

> +static inline bool nvme_ctrl_meta_sgl_supported(struct nvme_ctrl *ctrl)
> +{
> +	return ctrl->sgls & NVME_CTRL_SGLS_MPTR;
> +}

.. and thus I'd move this to pci.c (or just drop the helper).

> +	NVME_CTRL_SGLS_MPTR			= 1 << 19,

Maybe give the other fields in SGLS a name as well?
* Re: [PATCHv2 1/2] nvme-pci: add support for sgl metadata
From: Keith Busch @ 2024-11-13 15:48 UTC
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On Wed, Nov 13, 2024 at 05:58:12AM +0100, Christoph Hellwig wrote:
> On Tue, Nov 12, 2024 at 01:06:19PM -0800, Keith Busch wrote:
> > Supporting this mode allows merging requests with metadata that wouldn't
> > be possible otherwise, and creating user space requests that straddle
> > physically discontiguous pages.
>
> Not just merging, but also creating :)  I.e. it allows to transfer
> multiple non-contiguous metadata segments.
>
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index e119ba0f8ab8b..12e5064b9cba0 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -1985,6 +1985,9 @@ static void nvme_set_ctrl_limits(struct nvme_ctrl *ctrl,
> > 	lim->max_segments = min_t(u32, USHRT_MAX,
> > 		min_not_zero(nvme_max_drv_segments(ctrl), ctrl->max_segments));
> > 	lim->max_integrity_segments = ctrl->max_integrity_segments;
> > +	if (lim->max_integrity_segments > 1 &&
> > +	    !nvme_ctrl_meta_sgl_supported(ctrl))
> > +		lim->max_integrity_segments = 1;
>
> Despite the general mess about the SGLS field in the nvme spec, the meta
> sgls bit really is a PCIe-only feature, and this will break metadata
> support on RDMA.  The nvme_ctrl_meta_sgl_supported needs to be in pci.c
> to set ctrl->max_integrity_segments based on it.
>
> > +static inline bool nvme_ctrl_meta_sgl_supported(struct nvme_ctrl *ctrl)
> > +{
> > +	return ctrl->sgls & NVME_CTRL_SGLS_MPTR;
> > +}
>
> .. and thus I'd move this to pci.c (or just drop the helper).
>
> > +	NVME_CTRL_SGLS_MPTR			= 1 << 19,
>
> Maybe give the other fields in SGLS a name as well?

Ack on all suggestions.
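[Editor's sketch, not part of the thread: the SGLS field discussed above
comes from the Identify Controller data structure. A minimal user-space
model of the two checks the driver performs might look like this; the macro
names are illustrative, only bit 19's meaning (metadata-pointer SGL
support) and bits 1:0 (general SGL support) are taken from the patch and
the existing nvme_ctrl_sgl_supported() helper.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names for two pieces of the Identify Controller SGLS
 * field: bits 1:0 report whether the controller supports SGLs at all,
 * bit 19 reports SGL support for the Metadata Pointer. */
#define SGLS_SUPPORTED_MASK	((1u << 0) | (1u << 1))
#define SGLS_MPTR_SGL		(1u << 19)

/* Mirrors nvme_ctrl_sgl_supported(): any data SGL support at all. */
static bool sgl_supported(uint32_t sgls)
{
	return sgls & SGLS_SUPPORTED_MASK;
}

/* Mirrors the patch's nvme_ctrl_meta_sgl_supported(). */
static bool meta_sgl_supported(uint32_t sgls)
{
	return sgls & SGLS_MPTR_SGL;
}
```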
* [PATCHv2 2/2] nvme-pci: use sgls for all user requests if possible
From: Keith Busch @ 2024-11-12 21:06 UTC
To: linux-nvme; +Cc: hch, Keith Busch

From: Keith Busch <kbusch@kernel.org>

If the device supports SGLs, use these for all user requests. This
format encodes the expected transfer length so it can catch short buffer
errors in a user command, whether it occurred accidentally or maliciously.

For controllers that support SGL data mode, this is a viable mitigation
to CVE-2023-6238.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/pci.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 919f173043d18..8f59ff3be8cea 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -515,7 +515,8 @@ static inline bool nvme_pci_metadata_use_sgls(struct nvme_dev *dev,
 {
 	if (!nvme_ctrl_meta_sgl_supported(&dev->ctrl))
 		return false;
-	return req->nr_integrity_segments > 1;
+	return req->nr_integrity_segments > 1 ||
+	       nvme_req(req)->flags & NVME_REQ_USERCMD;
 }
 
 static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
@@ -533,7 +534,7 @@ static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
 	if (nvme_pci_metadata_use_sgls(dev, req))
 		return true;
 	if (!sgl_threshold || avg_seg_size < sgl_threshold)
-		return false;
+		return nvme_req(req)->flags & NVME_REQ_USERCMD;
 	return true;
 }
-- 
2.43.5
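[Editor's sketch, not part of the thread: the decision the hunk above
changes can be restated as a small pure function. Parameter names are
invented for the sketch; the behavioral point is that below the
sgl_threshold cutoff, kernel-generated I/O keeps using PRPs, while user
passthrough commands now take the SGL path so the controller can enforce
the stated transfer length.]

```c
#include <stdbool.h>

/* Hypothetical restatement of nvme_pci_use_sgls() after this patch,
 * reduced to the inputs that matter for the change: controller SGL
 * support, whether the request is a user passthrough command, and the
 * average segment size versus the sgl_threshold module parameter. */
static bool use_sgls(bool ctrl_supports_sgls, bool user_cmd,
		     unsigned int avg_seg_size, unsigned int sgl_threshold)
{
	if (!ctrl_supports_sgls)
		return false;
	/* Below the threshold (or with SGLs disabled by threshold 0),
	 * only user commands switch to SGLs; everything else stays on
	 * PRPs as before. */
	if (!sgl_threshold || avg_seg_size < sgl_threshold)
		return user_cmd;
	return true;
}
```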
* Re: [PATCHv2 2/2] nvme-pci: use sgls for all user requests if possible
From: Christoph Hellwig @ 2024-11-13  4:58 UTC
To: Keith Busch; +Cc: linux-nvme, hch, Keith Busch

On Tue, Nov 12, 2024 at 01:06:20PM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> If the device supports SGLs, use these for all user requests. This
> format encodes the expected transfer length so it can catch short buffer
> errors in a user command, whether it occurred accidentally or maliciously.
>
> For controllers that support SGL data mode, this is a viable mitigation
> to CVE-2023-6238.

The patch itself looks fine, but instead of the handwavy mitigation,
maybe just disable passthrough without SGL support by default to actually
fix and not just mitigate the CVE?
* Re: [PATCHv2 2/2] nvme-pci: use sgls for all user requests if possible
From: Keith Busch @ 2024-11-13 15:48 UTC
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On Wed, Nov 13, 2024 at 05:58:59AM +0100, Christoph Hellwig wrote:
> On Tue, Nov 12, 2024 at 01:06:20PM -0800, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> >
> > If the device supports SGLs, use these for all user requests. This
> > format encodes the expected transfer length so it can catch short buffer
> > errors in a user command, whether it occurred accidentally or maliciously.
> >
> > For controllers that support SGL data mode, this is a viable mitigation
> > to CVE-2023-6238.
>
> The patch itself looks fine, but instead of the handwavy mitigation,
> maybe just disable passthrough without SGL support by default to actually
> fix and not just mitigate the CVE?

SGL is an optional feature that many devices don't implement. Even fewer
do it for metadata. Disabling it entirely is "breaking userspace" for
users I need to support.
* Re: [PATCHv2 2/2] nvme-pci: use sgls for all user requests if possible
From: Christoph Hellwig @ 2024-11-14  5:56 UTC
To: Keith Busch; +Cc: Christoph Hellwig, Keith Busch, linux-nvme

On Wed, Nov 13, 2024 at 08:48:09AM -0700, Keith Busch wrote:
> > > For controllers that support SGL data mode, this is a viable mitigation
> > > to CVE-2023-6238.
> >
> > The patch itself looks fine, but instead of the handwavy mitigation,
> > maybe just disable passthrough without SGL support by default to actually
> > fix and not just mitigate the CVE?
>
> SGL is an optional feature that many devices don't implement. Even fewer
> do it for metadata. Disabling it entirely is "breaking userspace" for
> users I need to support.

Well, if that usage creates exploitable behavior we'll need to fix it
and not just paper over it.  Although this probably only really matters
for the non-privileged passthrough.
* Re: [PATCHv2 2/2] nvme-pci: use sgls for all user requests if possible
From: Keith Busch @ 2024-11-14 15:53 UTC
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On Thu, Nov 14, 2024 at 06:56:43AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 13, 2024 at 08:48:09AM -0700, Keith Busch wrote:
> > > > For controllers that support SGL data mode, this is a viable mitigation
> > > > to CVE-2023-6238.
> > >
> > > The patch itself looks fine, but instead of the handwavy mitigation,
> > > maybe just disable passthrough without SGL support by default to actually
> > > fix and not just mitigate the CVE?
> >
> > SGL is an optional feature that many devices don't implement. Even fewer
> > do it for metadata. Disabling it entirely is "breaking userspace" for
> > users I need to support.
>
> Well, if that usage creates exploitable behavior we'll need to fix it
> and not just paper over it.  Although this probably only really matters
> for the non-privileged passthrough.

Only admin users can access this path by default. You have to opt-in for
it, so it's not exploitable unless you ask for it. I can't see disabling
the interface entirely. In a previous version of this patch, I had the
kernel tainted if you tried to do passthrough without SGL support. Would
that be a fair compromise if I reintroduce that behavior?
* Re: [PATCHv2 2/2] nvme-pci: use sgls for all user requests if possible
From: Christoph Hellwig @ 2024-11-14 16:42 UTC
To: Keith Busch; +Cc: Christoph Hellwig, Keith Busch, linux-nvme

On Thu, Nov 14, 2024 at 08:53:49AM -0700, Keith Busch wrote:
> Only admin users can access this path by default. You have to opt-in for
> it, so it's not exploitable unless you ask for it.

We do allow non-privileged users to send I/O commands and a small
whitelist of admin commands by default, limited by the CSE effects.

> I can't see disabling
> the interface entirely. In a previous version of this patch, I had the
> kernel tainted if you tried to do passthrough without SGL support. Would
> that be a fair compromise if I reintroduce that behavior?

Taint might be a bit too strong, but a one-time message in the log might
be useful.
* Re: [PATCHv2 2/2] nvme-pci: use sgls for all user requests if possible
From: Keith Busch @ 2024-11-14 17:29 UTC
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On Thu, Nov 14, 2024 at 05:42:45PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 14, 2024 at 08:53:49AM -0700, Keith Busch wrote:
> > Only admin users can access this path by default. You have to opt-in for
> > it, so it's not exploitable unless you ask for it.
>
> We do allow non-privileged users to send I/O commands and a small
> whitelist of admin commands by default, limited by the CSE effects.

Not if you can't open the device. The default access permissions don't
allow non-privileged users.

> > I can't see disabling
> > the interface entirely. In a previous version of this patch, I had the
> > kernel tainted if you tried to do passthrough without SGL support. Would
> > that be a fair compromise if I reintroduce that behavior?
>
> Taint might be a bit too strong, but a one-time message in the log might
> be useful.

Sounds good to me.
* Re: [PATCHv2 0/2] Using SGLs for userspace commands
From: Christoph Hellwig @ 2024-11-13  4:50 UTC
To: Keith Busch; +Cc: linux-nvme, hch, Keith Busch

On Tue, Nov 12, 2024 at 01:06:18PM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> Changes from previous version:
>
>   Merged up to linux 6.12, which had some necessary blk-integrity
>   patches
>
>   Dropped all the initial "cleanup" patches. This is instead the minimum
>   patchset to satisfy using the feature.

Refresh my brain, but where was the previous version?  A quick
grep for " Using SGLs for userspace commands" on my linux-nvme mailbox
and an internet search don't find results.

Are the previous cleanups the fixes to the block layer metadata/integrity
code to actually properly deal with splitting and merging?
* Re: [PATCHv2 0/2] Using SGLs for userspace commands
From: Keith Busch @ 2024-11-13 15:50 UTC
To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On Wed, Nov 13, 2024 at 05:50:12AM +0100, Christoph Hellwig wrote:
> On Tue, Nov 12, 2024 at 01:06:18PM -0800, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> >
> > Changes from previous version:
> >
> >   Merged up to linux 6.12, which had some necessary blk-integrity
> >   patches
> >
> >   Dropped all the initial "cleanup" patches. This is instead the minimum
> >   patchset to satisfy using the feature.
>
> Refresh my brain, but where was the previous version?  A quick
> grep for " Using SGLs for userspace commands" on my linux-nvme mailbox
> and an internet search don't find results.
>
> Are the previous cleanups the fixes to the block layer metadata/integrity
> code to actually properly deal with splitting and merging?

It's here:

  https://lore.kernel.org/linux-nvme/20240904183818.713941-1-kbusch@meta.com/

This series follows on the block integrity series that fixed up the
merging and segment accounting. Much of it is not necessary to get SGL
metadata support. It was just cleanups in this path I did along the way.