* [RFC 0/2] Set steering-tag directly for PCIe P2P memory access
@ 2025-11-13 21:37 Zhiping Zhang
2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Zhiping Zhang @ 2025-11-13 21:37 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas
Cc: Zhiping Zhang
Currently, the steering tag can be used for a CPU on the motherboard; an
ACPI check is in place to query and obtain the supported steering tag. The
same check is not possible for accelerator devices because they are
designed to be plug-and-play and their ownership cannot always be confirmed.
We intend to use the steering tag to improve RDMA NIC memory access on a GPU
or accelerator device via PCIe peer-to-peer. An application can construct a
DMA handle (DMAH) with the device memory type and a direct steering-tag
value, and this DMAH can be used to register an RDMA memory region with DMABUF
for the RDMA NIC to access the device memory. The steering tag carries
additional instructions or hints to the GPU or accelerator device for
advanced memory operations, such as read cache selection.
Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
Zhiping Zhang (2):
PCIe: Add a memory type for P2P memory access
RDMA: Set steering-tag value directly for P2P memory access
.../infiniband/core/uverbs_std_types_dmah.c | 28 +++++++++++++++++++
drivers/infiniband/core/uverbs_std_types_mr.c | 3 ++
drivers/infiniband/hw/mlx5/dmah.c | 5 ++--
.../net/ethernet/mellanox/mlx5/core/lib/st.c | 12 +++++---
drivers/pci/tph.c | 4 +++
include/linux/mlx5/driver.h | 4 +--
include/linux/pci-tph.h | 4 ++-
include/rdma/ib_verbs.h | 2 ++
include/uapi/rdma/ib_user_ioctl_cmds.h | 1 +
9 files changed, 53 insertions(+), 10 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
2025-11-13 21:37 [RFC 0/2] Set steering-tag directly for PCIe P2P memory access Zhiping Zhang
@ 2025-11-13 21:37 ` Zhiping Zhang
2025-11-14 13:12 ` Jonathan Cameron
2025-11-24 21:27 ` Bjorn Helgaas
2025-11-13 21:37 ` [RFC 2/2] " Zhiping Zhang
2026-01-03 5:38 ` [RFC 2/2] [fix] mlx5: modifications for use cases other than CPU Zhiping Zhang
2 siblings, 2 replies; 14+ messages in thread
From: Zhiping Zhang @ 2025-11-13 21:37 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas
Cc: Zhiping Zhang

PCIe: Add a memory type for P2P memory access

The current tph memory type definition applies for CPU use cases. For device
memory accessed in the peer-to-peer (P2P) manner, we need another memory
type.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
---
 drivers/pci/tph.c       | 4 ++++
 include/linux/pci-tph.h | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index cc64f93709a4..d983c9778c72 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
 			if (info->pm_st_valid)
 				return info->pm_st;
 			break;
+		default:
+			return 0;
 		}
 		break;
 	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
@@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
 			if (info->pm_xst_valid)
 				return info->pm_xst;
 			break;
+		default:
+			return 0;
 		}
 		break;
 	default:
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 9e4e331b1603..b989302b6755 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -14,10 +14,12 @@
  * depending on the memory type: Volatile Memory or Persistent Memory. When a
  * caller query about a target's Steering Tag, it must provide the target's
  * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
+ * Add a new tph type for PCI peer-to-peer access use case.
  */
 enum tph_mem_type {
 	TPH_MEM_TYPE_VM,	/* volatile memory */
-	TPH_MEM_TYPE_PM		/* persistent memory */
+	TPH_MEM_TYPE_PM,	/* persistent memory */
+	TPH_MEM_TYPE_P2P	/* peer-to-peer accessable memory */
 };

 #ifdef CONFIG_PCIE_TPH
--
2.47.3
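For readers following the patch above, the effect of the two new "default:"
branches can be sketched in plain userspace C. This is a simplified stand-in
and not the kernel code: struct st_info_sketch, the REQ_* constants and
extract_tag_sketch() are hypothetical names chosen for illustration, and the
real tph_extract_tag() operates on a packed union. The point it tries to show
is that TPH_MEM_TYPE_P2P has no firmware-provided tag, so this lookup path
yields 0 for it.

```c
#include <stdint.h>

enum tph_mem_type {
	TPH_MEM_TYPE_VM,	/* volatile memory */
	TPH_MEM_TYPE_PM,	/* persistent memory */
	TPH_MEM_TYPE_P2P,	/* peer-to-peer memory (added by the patch) */
};

/* Hypothetical stand-ins for PCI_TPH_REQ_TPH_ONLY / PCI_TPH_REQ_EXT_TPH. */
enum req_type_sketch { REQ_TPH_ONLY, REQ_EXT_TPH };

/* Simplified, assumed view of the per-CPU steering-tag information. */
struct st_info_sketch {
	unsigned int vm_st_valid : 1, pm_st_valid : 1;
	unsigned int vm_xst_valid : 1, pm_xst_valid : 1;
	uint8_t vm_st, pm_st;		/* 8-bit tags */
	uint16_t vm_xst, pm_xst;	/* 16-bit (extended) tags */
};

static uint16_t extract_tag_sketch(enum tph_mem_type mem_type,
				   enum req_type_sketch req_type,
				   const struct st_info_sketch *info)
{
	switch (req_type) {
	case REQ_TPH_ONLY: /* 8-bit tag */
		switch (mem_type) {
		case TPH_MEM_TYPE_VM:
			if (info->vm_st_valid)
				return info->vm_st;
			break;
		case TPH_MEM_TYPE_PM:
			if (info->pm_st_valid)
				return info->pm_st;
			break;
		default: /* P2P: nothing to look up */
			return 0;
		}
		break;
	case REQ_EXT_TPH: /* 16-bit tag */
		switch (mem_type) {
		case TPH_MEM_TYPE_VM:
			if (info->vm_xst_valid)
				return info->vm_xst;
			break;
		case TPH_MEM_TYPE_PM:
			if (info->pm_xst_valid)
				return info->pm_xst;
			break;
		default: /* P2P: nothing to look up */
			return 0;
		}
		break;
	}
	return 0; /* no valid tag for this combination */
}
```

In this sketch a P2P query returns 0 for both request types, which is why the
series needs a separate path to supply the tag value for P2P memory.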
* Re: [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
@ 2025-11-14 13:12 ` Jonathan Cameron
2025-11-18 0:50 ` zhipingz
2025-11-24 21:27 ` Bjorn Helgaas
1 sibling, 1 reply; 14+ messages in thread
From: Jonathan Cameron @ 2025-11-14 13:12 UTC (permalink / raw)
To: Zhiping Zhang
Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas

On Thu, 13 Nov 2025 13:37:11 -0800
Zhiping Zhang <zhipingz@meta.com> wrote:

> PCIe: Add a memory type for P2P memory access
>
> The current tph memory type definition applies for CPU use cases. For device
> memory accessed in the peer-to-peer (P2P) manner, we need another memory
> type.
>
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  drivers/pci/tph.c       | 4 ++++
>  include/linux/pci-tph.h | 4 +++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index cc64f93709a4..d983c9778c72 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
>  			if (info->pm_st_valid)
>  				return info->pm_st;
>  			break;
> +		default:
> +			return 0;
>  		}
>  		break;
>  	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
> @@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
>  			if (info->pm_xst_valid)
>  				return info->pm_xst;
>  			break;
> +		default:
> +			return 0;
>  		}
>  		break;
>  	default:
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index 9e4e331b1603..b989302b6755 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
> @@ -14,10 +14,12 @@
>   * depending on the memory type: Volatile Memory or Persistent Memory. When a
>   * caller query about a target's Steering Tag, it must provide the target's
>   * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
> + * Add a new tph type for PCI peer-to-peer access use case.
>   */
>  enum tph_mem_type {
>  	TPH_MEM_TYPE_VM,	/* volatile memory */
> -	TPH_MEM_TYPE_PM		/* persistent memory */
> +	TPH_MEM_TYPE_PM,	/* persistent memory */
> +	TPH_MEM_TYPE_P2P	/* peer-to-peer accessable memory */

Trivial but this time definitely add the trailing comma! Maybe there will never
be any more in here but maybe there will and we can avoid a line of
churn next time.

>  };
>
>  #ifdef CONFIG_PCIE_TPH
* Re: [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
2025-11-14 13:12 ` Jonathan Cameron
@ 2025-11-18 0:50 ` zhipingz
0 siblings, 0 replies; 14+ messages in thread
From: zhipingz @ 2025-11-18 0:50 UTC (permalink / raw)
To: Jonathan Cameron
Cc: jgg, leon, bhelgaas, linux-rdma, linux-pci, netdev, kbusch,
yochai, yishaih

> From: Jonathan Cameron @ 2025-11-14 13:12 UTC (permalink / raw)
> To: Zhiping Zhang
> Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
> linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas
>
> On Thu, 13 Nov 2025 13:37:11 -0800
> Zhiping Zhang <zhipingz@meta.com> wrote:
>
> > PCIe: Add a memory type for P2P memory access
> >
> > The current tph memory type definition applies for CPU use cases. For device
> > memory accessed in the peer-to-peer (P2P) manner, we need another memory
> > type.
> >
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > ---
> >  drivers/pci/tph.c       | 4 ++++
> >  include/linux/pci-tph.h | 4 +++-
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> > index cc64f93709a4..d983c9778c72 100644
> > --- a/drivers/pci/tph.c
> > +++ b/drivers/pci/tph.c
> > @@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
> >  			if (info->pm_st_valid)
> >  				return info->pm_st;
> >  			break;
> > +		default:
> > +			return 0;
> >  		}
> >  		break;
> >  	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
> > @@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
> >  			if (info->pm_xst_valid)
> >  				return info->pm_xst;
> >  			break;
> > +		default:
> > +			return 0;
> >  		}
> >  		break;
> >  	default:
> > diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> > index 9e4e331b1603..b989302b6755 100644
> > --- a/include/linux/pci-tph.h
> > +++ b/include/linux/pci-tph.h
> > @@ -14,10 +14,12 @@
> >   * depending on the memory type: Volatile Memory or Persistent Memory. When a
> >   * caller query about a target's Steering Tag, it must provide the target's
> >   * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
> > + * Add a new tph type for PCI peer-to-peer access use case.
> >   */
> >  enum tph_mem_type {
> >  	TPH_MEM_TYPE_VM,	/* volatile memory */
> > -	TPH_MEM_TYPE_PM		/* persistent memory */
> > +	TPH_MEM_TYPE_PM,	/* persistent memory */
> > +	TPH_MEM_TYPE_P2P	/* peer-to-peer accessable memory */
>
> Trivial but this time definitely add the trailing comma! Maybe there will never
> be any more in here but maybe there will and we can avoid a line of
> churn next time.
>

Thanks for catching that! I’ll add the trailing comma to the enum in the patch.

> >  };
> >
> >  #ifdef CONFIG_PCIE_TPH
* Re: [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
2025-11-14 13:12 ` Jonathan Cameron
@ 2025-11-24 21:27 ` Bjorn Helgaas
2025-12-01 17:43 ` Zhiping Zhang
1 sibling, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2025-11-24 21:27 UTC (permalink / raw)
To: Zhiping Zhang
Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas

On Thu, Nov 13, 2025 at 01:37:11PM -0800, Zhiping Zhang wrote:
> PCIe: Add a memory type for P2P memory access

This should be in the Subject: line.

It should also start with "PCI/TPH: ..." (not "PCIe") to match
previous history.

> The current tph memory type definition applies for CPU use cases. For device
> memory accessed in the peer-to-peer (P2P) manner, we need another memory
> type.

s/tph/TPH/

Make this say what the patch does (not just that we *need* another
memory type, that we actually *add* one).

The subject line should also say what the patch does. I don't think
this patch actually changes the *setting* of the steering tag (I could
be wrong, I haven't looked carefully).

> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  drivers/pci/tph.c       | 4 ++++
>  include/linux/pci-tph.h | 4 +++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index cc64f93709a4..d983c9778c72 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
>  			if (info->pm_st_valid)
>  				return info->pm_st;
>  			break;
> +		default:
> +			return 0;
>  		}
>  		break;
>  	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
> @@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
>  			if (info->pm_xst_valid)
>  				return info->pm_xst;
>  			break;
> +		default:
> +			return 0;
>  		}
>  		break;
>  	default:
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index 9e4e331b1603..b989302b6755 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
> @@ -14,10 +14,12 @@
>   * depending on the memory type: Volatile Memory or Persistent Memory. When a
>   * caller query about a target's Steering Tag, it must provide the target's
>   * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
> + * Add a new tph type for PCI peer-to-peer access use case.
>   */
>  enum tph_mem_type {
>  	TPH_MEM_TYPE_VM,	/* volatile memory */
> -	TPH_MEM_TYPE_PM		/* persistent memory */
> +	TPH_MEM_TYPE_PM,	/* persistent memory */
> +	TPH_MEM_TYPE_P2P	/* peer-to-peer accessable memory */
>  };
>
>  #ifdef CONFIG_PCIE_TPH
> --
> 2.47.3
>
* Re: [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
2025-11-24 21:27 ` Bjorn Helgaas
@ 2025-12-01 17:43 ` Zhiping Zhang
0 siblings, 0 replies; 14+ messages in thread
From: Zhiping Zhang @ 2025-12-01 17:43 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas

> On Mon, 24 Nov 2025 15:27:53 -0600, Bjorn Helgaas wrote:
> > PCIe: Add a memory type for P2P memory access
> This should be in the Subject: line.
> It should also start with "PCI/TPH: ..." (not "PCIe") to match
> previous history.

Thanks, ack! I will update the subject line.

> > The current tph memory type definition applies for CPU use cases. For device
> > memory accessed in the peer-to-peer (P2P) manner, we need another memory
> > type.
> s/tph/TPH/
> Make this say what the patch does (not just that we *need* another
> memory type, that we actually *add* one).
> The subject line should also say what the patch does. I don't think
> this patch actually changes the *setting* of the steering tag (I could
> be wrong, I haven't looked carefully).

Sure, I’ll correct and revise the commit message to clearly state what the
patch does.

> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > ---
> >  drivers/pci/tph.c       | 4 ++++
> >  include/linux/pci-tph.h | 4 +++-
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> > index cc64f93709a4..d983c9778c72 100644
> > --- a/drivers/pci/tph.c
> > +++ b/drivers/pci/tph.c
> > @@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
> >  			if (info->pm_st_valid)
> >  				return info->pm_st;
> >  			break;
> > +		default:
> > +			return 0;
> >  		}
> >  		break;
> >  	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
> > @@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
> >  			if (info->pm_xst_valid)
> >  				return info->pm_xst;
> >  			break;
> > +		default:
> > +			return 0;
> >  		}
> >  		break;
> >  	default:
> > diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> > index 9e4e331b1603..b989302b6755 100644
> > --- a/include/linux/pci-tph.h
> > +++ b/include/linux/pci-tph.h
> > @@ -14,10 +14,12 @@
> >   * depending on the memory type: Volatile Memory or Persistent Memory. When a
> >   * caller query about a target's Steering Tag, it must provide the target's
> >   * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
> > + * Add a new tph type for PCI peer-to-peer access use case.
> >   */
> >  enum tph_mem_type {
> >  	TPH_MEM_TYPE_VM,	/* volatile memory */
> > -	TPH_MEM_TYPE_PM		/* persistent memory */
> > +	TPH_MEM_TYPE_PM,	/* persistent memory */
> > +	TPH_MEM_TYPE_P2P	/* peer-to-peer accessable memory */
> >  };
> >
> >  #ifdef CONFIG_PCIE_TPH
> > --
> > 2.47.3
> >
* [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
2025-11-13 21:37 [RFC 0/2] Set steering-tag directly for PCIe P2P memory access Zhiping Zhang
2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
@ 2025-11-13 21:37 ` Zhiping Zhang
2025-11-17 16:00 ` Jason Gunthorpe
2026-01-03 5:38 ` [RFC 2/2] [fix] mlx5: modifications for use cases other than CPU Zhiping Zhang
2 siblings, 1 reply; 14+ messages in thread
From: Zhiping Zhang @ 2025-11-13 21:37 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas
Cc: Zhiping Zhang

RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR

This patch enables construction of a dma handler (DMAH) with the P2P memory type
and a direct steering-tag value. It can be used to register a RDMA memory
region with DMABUF for the RDMA NIC to access the other device's memory via P2P.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
---
 .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
 drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
 drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
 .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
 include/linux/mlx5/driver.h                   |  4 +--
 include/rdma/ib_verbs.h                       |  2 ++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
 7 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
index 453ce656c6f2..1ef400f96965 100644
--- a/drivers/infiniband/core/uverbs_std_types_dmah.c
+++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
@@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
 		dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
 	}

+	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
+		ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
+				       UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
+		if (ret)
+			goto err;
+
+		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS)) {
+			ret = -EINVAL;
+			goto err;
+		}
+		if ((dmah->valid_fields & BIT(IB_DMAH_MEM_TYPE_EXISTS)) == 0) {
+			ret = -EINVAL;
+			goto err;
+		}
+		if (dmah->mem_type != TPH_MEM_TYPE_P2P) {
+			ret = -EINVAL;
+			goto err;
+		}
+		dmah->valid_fields |= BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS);
+	}
+
 	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_PH)) {
 		ret = uverbs_copy_from(&dmah->ph, attrs,
 				       UVERBS_ATTR_ALLOC_DMAH_PH);
@@ -107,6 +128,10 @@ static const struct uverbs_attr_spec uverbs_dmah_mem_type[] = {
 		.type = UVERBS_ATTR_TYPE_PTR_IN,
 		UVERBS_ATTR_NO_DATA(),
 	},
+	[TPH_MEM_TYPE_P2P] = {
+		.type = UVERBS_ATTR_TYPE_PTR_IN,
+		UVERBS_ATTR_NO_DATA(),
+	},
 };

 DECLARE_UVERBS_NAMED_METHOD(
@@ -123,6 +148,9 @@ DECLARE_UVERBS_NAMED_METHOD(
 			   UA_OPTIONAL),
 	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_ALLOC_DMAH_PH,
 			   UVERBS_ATTR_TYPE(u8),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL,
+			   UVERBS_ATTR_TYPE(u16),
 			   UA_OPTIONAL));

 DECLARE_UVERBS_NAMED_METHOD_DESTROY(
diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 570b9656801d..10e47934898e 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -346,6 +346,9 @@ static int UVERBS_HANDLER(UVERBS_METHOD_REG_MR)(
 					   UVERBS_ATTR_REG_MR_DMA_HANDLE);
 		if (IS_ERR(dmah))
 			return PTR_ERR(dmah);
+		if (dmah->mem_type == TPH_MEM_TYPE_P2P && has_fd == false) {
+			return -EINVAL;
+		}
 	}

 	ret = uverbs_get_flags32(&access_flags, attrs,
diff --git a/drivers/infiniband/hw/mlx5/dmah.c b/drivers/infiniband/hw/mlx5/dmah.c
index 362a88992ffa..98c8d3313653 100644
--- a/drivers/infiniband/hw/mlx5/dmah.c
+++ b/drivers/infiniband/hw/mlx5/dmah.c
@@ -15,8 +15,7 @@ static int mlx5_ib_alloc_dmah(struct ib_dmah *ibdmah,
 {
 	struct mlx5_core_dev *mdev = to_mdev(ibdmah->device)->mdev;
 	struct mlx5_ib_dmah *dmah = to_mdmah(ibdmah);
-	u16 st_bits = BIT(IB_DMAH_CPU_ID_EXISTS) |
-		      BIT(IB_DMAH_MEM_TYPE_EXISTS);
+	u16 st_bits = BIT(IB_DMAH_MEM_TYPE_EXISTS);
 	int err;

 	/* PH is a must for TPH following PCIe spec 6.2-1.0 */
@@ -28,7 +27,7 @@ static int mlx5_ib_alloc_dmah(struct ib_dmah *ibdmah,
 		if ((ibdmah->valid_fields & st_bits) != st_bits)
 			return -EINVAL;
 		err = mlx5_st_alloc_index(mdev, ibdmah->mem_type,
-					  ibdmah->cpu_id, &dmah->st_index);
+					  ibdmah->cpu_id, &dmah->st_index, ibdmah->direct_st_val);
 		if (err)
 			return err;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
index 47fe215f66bf..690ad8536128 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
@@ -80,7 +80,7 @@ void mlx5_st_destroy(struct mlx5_core_dev *dev)
 }

 int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
-			unsigned int cpu_uid, u16 *st_index)
+			unsigned int cpu_uid, u16 *st_index, u16 direct_st_val)
 {
 	struct mlx5_st_idx_data *idx_data;
 	struct mlx5_st *st = dev->st;
@@ -92,9 +92,13 @@ int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
 	if (!st)
 		return -EOPNOTSUPP;

-	ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag);
-	if (ret)
-		return ret;
+	if (mem_type == TPH_MEM_TYPE_P2P)
+		tag = direct_st_val;
+	else {
+		ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag);
+		if (ret)
+			return ret;
+	}

 	mutex_lock(&st->lock);

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 1c8ba601e760..a58be1f2844b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1170,12 +1170,12 @@ int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type

 #ifdef CONFIG_PCIE_TPH
 int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
-			unsigned int cpu_uid, u16 *st_index);
+			unsigned int cpu_uid, u16 *st_index, u16 direct_st_val);
 int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index);
 #else
 static inline int mlx5_st_alloc_index(struct mlx5_core_dev *dev,
 				      enum tph_mem_type mem_type,
-				      unsigned int cpu_uid, u16 *st_index)
+				      unsigned int cpu_uid, u16 *st_index, u16 direct_st_val)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 465b73d94f33..30a26b524f03 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1852,6 +1852,7 @@ enum {
 	IB_DMAH_CPU_ID_EXISTS,
 	IB_DMAH_MEM_TYPE_EXISTS,
 	IB_DMAH_PH_EXISTS,
+	IB_DMAH_DIRECT_ST_VAL_EXISTS,
 };

 struct ib_dmah {
@@ -1866,6 +1867,7 @@ struct ib_dmah {
 	atomic_t usecnt;
 	u8 ph;
 	u8 valid_fields; /* use IB_DMAH_XXX_EXISTS */
+	u16 direct_st_val;
 };

 struct ib_mr {
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 17f963014eca..42b3892b6761 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -242,6 +242,7 @@ enum uverbs_attrs_alloc_dmah_cmd_attr_ids {
 	UVERBS_ATTR_ALLOC_DMAH_CPU_ID,
 	UVERBS_ATTR_ALLOC_DMAH_TPH_MEM_TYPE,
 	UVERBS_ATTR_ALLOC_DMAH_PH,
+	UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL,
 };

 enum uverbs_attrs_free_dmah_cmd_attr_ids {
--
2.47.3
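The two pieces of logic in the patch above (the attribute checks when a direct
steering-tag value is supplied, and the choice between the direct value and
the CPU lookup) may be easier to follow as a standalone userspace sketch. All
names here (struct dmah_sketch, dmah_set_direct_st(), dmah_pick_tag(),
cpu_st_lookup_stub()) are hypothetical and only mirror the structure of the
diff; the stub stands in for pcie_tph_get_cpu_st() and returns a made-up
value.

```c
#include <errno.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Mirrors the IB_DMAH_*_EXISTS bit positions from the patch. */
enum { CPU_ID_EXISTS, MEM_TYPE_EXISTS, PH_EXISTS, DIRECT_ST_VAL_EXISTS };

enum mem_type_sketch { MEM_VM, MEM_PM, MEM_P2P };

/* Hypothetical, reduced version of struct ib_dmah. */
struct dmah_sketch {
	unsigned int cpu_id;
	enum mem_type_sketch mem_type;
	uint16_t direct_st_val;
	uint8_t valid_fields;
};

/*
 * A direct value is accepted only when no CPU id was given, a memory type
 * was given, and that memory type is P2P.
 */
static int dmah_set_direct_st(struct dmah_sketch *d, uint16_t st_val)
{
	if (d->valid_fields & BIT(CPU_ID_EXISTS))
		return -EINVAL;
	if (!(d->valid_fields & BIT(MEM_TYPE_EXISTS)))
		return -EINVAL;
	if (d->mem_type != MEM_P2P)
		return -EINVAL;

	d->direct_st_val = st_val;
	d->valid_fields |= BIT(DIRECT_ST_VAL_EXISTS);
	return 0;
}

/* Placeholder for the firmware lookup; the returned tag is invented. */
static int cpu_st_lookup_stub(unsigned int cpu_id, uint16_t *tag)
{
	*tag = (uint16_t)(0x100 + cpu_id);
	return 0;
}

/* P2P uses the caller-supplied value; everything else is looked up. */
static int dmah_pick_tag(const struct dmah_sketch *d, uint16_t *tag)
{
	if (d->mem_type == MEM_P2P) {
		*tag = d->direct_st_val;
		return 0;
	}
	return cpu_st_lookup_stub(d->cpu_id, tag);
}
```

Note that in this sketch, as in the diff, the value written to the steering
tag for P2P memory is whatever the caller passed in, which is the behaviour
the review comments later in the thread focus on.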
* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
2025-11-13 21:37 ` [RFC 2/2] " Zhiping Zhang
@ 2025-11-17 16:00 ` Jason Gunthorpe
2025-11-20 7:24 ` Zhiping Zhang
0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2025-11-17 16:00 UTC (permalink / raw)
To: Zhiping Zhang
Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
Keith Busch, Yochai Cohen, Yishai Hadas

On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
>
> This patch enables construction of a dma handler (DMAH) with the P2P memory type
> and a direct steering-tag value. It can be used to register a RDMA memory
> region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
>
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
>  drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
>  drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
>  .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
>  include/linux/mlx5/driver.h                   |  4 +--
>  include/rdma/ib_verbs.h                       |  2 ++
>  include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
>  7 files changed, 46 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> index 453ce656c6f2..1ef400f96965 100644
> --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
>  		dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
>  	}
>
> +	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> +		ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> +				       UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> +		if (ret)
> +			goto err;

This should not come from userspace, the dmabuf exporter should
provide any TPH hints as part of the attachment process.

We are trying not to allow userspace raw access to the TPH values, so
this is not a desirable UAPI here.

Jason
* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
2025-11-17 16:00 ` Jason Gunthorpe
@ 2025-11-20 7:24 ` Zhiping Zhang
2025-11-20 13:11 ` Jason Gunthorpe
0 siblings, 1 reply; 14+ messages in thread
From: Zhiping Zhang @ 2025-11-20 7:24 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
Keith Busch, Yochai Cohen, Yishai Hadas

On Monday, November 17, 2025 at 8:00 AM, Jason Gunthorpe wrote:
> Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
>
> On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> > RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> >
> > This patch enables construction of a dma handler (DMAH) with the P2P memory type
> > and a direct steering-tag value. It can be used to register a RDMA memory
> > region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> >
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > ---
> >  .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
> >  drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
> >  drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
> >  .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
> >  include/linux/mlx5/driver.h                   |  4 +--
> >  include/rdma/ib_verbs.h                       |  2 ++
> >  include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
> >  7 files changed, 46 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > index 453ce656c6f2..1ef400f96965 100644
> > --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> > +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
> >  		dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
> >  	}
> >
> > +	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> > +		ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> > +				       UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> > +		if (ret)
> > +			goto err;
>
> This should not come from userspace, the dmabuf exporter should
> provide any TPH hints as part of the attachment process.
>
> We are trying not to allow userspace raw access to the TPH values, so
> this is not a desirable UAPI here.
>
> Jason

Thanks for your feedback!

I understand the concern about not exposing raw TPH values to
userspace. To clarify, would it be acceptable to use an index-based
mapping table, where userspace provides an index and the kernel
translates it to the appropriate TPH value? Given that the PCIe spec
allows up to 16-bit TPH values, this could require a mapping table
of up to 128KB. Do you see this as a reasonable approach, or is
there a preferred alternative?

Additionally, in cases where the dmabuf exporter device can handle all possible 16-bit
TPH values (i.e., it has its own internal mapping logic or table), should this still be
entirely abstracted away from userspace?

Zhiping
* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
2025-11-20 7:24 ` Zhiping Zhang
@ 2025-11-20 13:11 ` Jason Gunthorpe
2025-12-04 8:10 ` Zhiping Zhang
0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2025-11-20 13:11 UTC (permalink / raw)
To: Zhiping Zhang
Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
Keith Busch, Yochai Cohen, Yishai Hadas

On Wed, Nov 19, 2025 at 11:24:40PM -0800, Zhiping Zhang wrote:
> On Monday, November 17, 2025 at 8:00 AM, Jason Gunthorpe wrote:
> > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> >
> > On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> > > RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> > >
> > > This patch enables construction of a dma handler (DMAH) with the P2P memory type
> > > and a direct steering-tag value. It can be used to register a RDMA memory
> > > region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> > >
> > > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > > ---
> > >  .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
> > >  drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
> > >  drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
> > >  .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
> > >  include/linux/mlx5/driver.h                   |  4 +--
> > >  include/rdma/ib_verbs.h                       |  2 ++
> > >  include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
> > >  7 files changed, 46 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > index 453ce656c6f2..1ef400f96965 100644
> > > --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
> > >  		dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
> > >  	}
> > >
> > > +	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> > > +		ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> > > +				       UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> > > +		if (ret)
> > > +			goto err;
> >
> > This should not come from userspace, the dmabuf exporter should
> > provide any TPH hints as part of the attachment process.
> >
> > We are trying not to allow userspace raw access to the TPH values, so
> > this is not a desirable UAPI here.
> Thanks for your feedback!
>
> I understand the concern about not exposing raw TPH values to
> userspace. To clarify, would it be acceptable to use an index-based
> mapping table, where userspace provides an index and the kernel
> translates it to the appropriate TPH value? Given that the PCIe spec
> allows up to 16-bit TPH values, this could require a mapping table
> of up to 128KB. Do you see this as a reasonable approach, or is
> there a preferred alternative?

?

The issue here is to secure the TPH. The kernel driver that owns the
exporting device should control what TPH values an importing driver
will use.

I don't see how an indirection table helps anything, you need to add
an API to DMABUF to retrieve the tph.

> Additionally, in cases where the dmabuf exporter device can handle all possible 16-bit
> TPH values (i.e., it has its own internal mapping logic or table), should this still be
> entirely abstracted away from userspace?

I imagine the exporting device provides the raw on the wire TPH value
it wants the importing device to use and the importing device is
responsible to program it using whatever scheme it has.

Jason
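One possible reading of the suggestion above, where the exporter rather than
userspace supplies the steering tag, can be sketched in userspace C. Nothing
below is an existing DMABUF interface: struct exporter_ops_sketch, its
get_tph() hook, struct attachment_sketch and importer_query_tph() are all
hypothetical names invented for illustration, and the thread does not specify
what the eventual API would look like.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook an exporting driver could implement. */
struct exporter_ops_sketch {
	/* Fill in the on-the-wire steering tag and processing hint. */
	int (*get_tph)(void *exporter_priv, uint16_t *st, uint8_t *ph);
};

/* Hypothetical, reduced stand-in for an importer's view of an attachment. */
struct attachment_sketch {
	const struct exporter_ops_sketch *ops;
	void *exporter_priv;
};

/*
 * Importer side: ask the exporter which TPH values to use. The importer
 * would then program the returned tag with whatever scheme its hardware has.
 */
static int importer_query_tph(const struct attachment_sketch *att,
			      uint16_t *st, uint8_t *ph)
{
	if (!att->ops || !att->ops->get_tph)
		return -EOPNOTSUPP; /* exporter offers no TPH hints */
	return att->ops->get_tph(att->exporter_priv, st, ph);
}

/* Example exporter: a made-up accelerator with one fixed tag. */
struct gpu_sketch {
	uint16_t read_cache_st;
};

static int gpu_get_tph(void *exporter_priv, uint16_t *st, uint8_t *ph)
{
	const struct gpu_sketch *gpu = exporter_priv;

	*st = gpu->read_cache_st;
	*ph = 0;
	return 0;
}

static const struct exporter_ops_sketch gpu_ops = {
	.get_tph = gpu_get_tph,
};
```

The design point in this sketch is that the tag value originates in the
exporting driver and userspace never handles it; how a process would select
among several tags the exporter supports is left open here, as it is in the
thread.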
* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
2025-11-20 13:11 ` Jason Gunthorpe
@ 2025-12-04 8:10 ` Zhiping Zhang
2025-12-27 19:22 ` Zhiping Zhang
0 siblings, 1 reply; 14+ messages in thread
From: Zhiping Zhang @ 2025-12-04 8:10 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
Keith Busch, Yochai Cohen, Yishai Hadas

On Thursday 2025-11-20 13:11 UTC, Jason Gunthorpe wrote:
>
> Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
>
> On Wed, Nov 19, 2025 at 11:24:40PM -0800, Zhiping Zhang wrote:
> > On Monday, November 17, 2025 at 8:00 AM, Jason Gunthorpe wrote:
> > > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> > >
> > > On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> > > > RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> > > >
> > > > This patch enables construction of a dma handler (DMAH) with the P2P memory type
> > > > and a direct steering-tag value. It can be used to register a RDMA memory
> > > > region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> > > >
> > > > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > > > ---
> > > >  .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
> > > >  drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
> > > >  drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
> > > >  .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
> > > >  include/linux/mlx5/driver.h                   |  4 +--
> > > >  include/rdma/ib_verbs.h                       |  2 ++
> > > >  include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
> > > >  7 files changed, 46 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > index 453ce656c6f2..1ef400f96965 100644
> > > > --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
> > > >  		dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
> > > >  	}
> > > >
> > > > +	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> > > > +		ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> > > > +				       UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> > > > +		if (ret)
> > > > +			goto err;
> > >
> > > This should not come from userspace, the dmabuf exporter should
> > > provide any TPH hints as part of the attachment process.
> > >
> > > We are trying not to allow userspace raw access to the TPH values, so
> > > this is not a desirable UAPI here.
> >
>
> > Thanks for your feedback!
> >
> > I understand the concern about not exposing raw TPH values to
> > userspace. To clarify, would it be acceptable to use an index-based
> > mapping table, where userspace provides an index and the kernel
> > translates it to the appropriate TPH value? Given that the PCIe spec
> > allows up to 16-bit TPH values, this could require a mapping table
> > of up to 128KB. Do you see this as a reasonable approach, or is
> > there a preferred alternative?
>
> ?
>
> The issue here is to secure the TPH. The kernel driver that owns the
> exporting device should control what TPH values an importing driver
> will use.
>
> I don't see how an indirection table helps anything, you need to add
> an API to DMABUF to retrieve the tph.
I see, thanks for the clarification. Yes, we can add and use new
API(s) for this purpose.
Sorry for the delay: I was waiting for the final version of Leon's
vfio-dmabuf patch series and plan to follow that for implementing the new
API(s) needed.
(https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.com/).

>
> > Additionally, in cases where the dmabuf exporter device can handle all possible 16-bit
> > TPH values (i.e., it has its own internal mapping logic or table), should this still be
> > entirely abstracted away from userspace?
>
> I imagine the exporting device provides the raw on the wire TPH value
> it wants the importing device to use and the importing device is
> responsible to program it using whatever scheme it has.
>
> Jason

Can you suggest or elaborate a bit on the schemes you see as feasible?

When the exporting device supports all or multiple TPH values, it is
desirable to have userspace processes select which TPH values to use
for the dmabuf at runtime. Actually that is the main use case of this
patch: the user can select the TPH values to associate the desired P2P
operations with the dmabuf. The difficulty is how we can provide this
flexibility while still aligning with kernel and security best
practices.

Zhiping
* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
2025-12-04 8:10 ` Zhiping Zhang
@ 2025-12-27 19:22 ` Zhiping Zhang
2026-01-06 0:57 ` Jason Gunthorpe
0 siblings, 1 reply; 14+ messages in thread
From: Zhiping Zhang @ 2025-12-27 19:22 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang

On Thur 2025-12-04 8:10 UTC Zhiping Zhang wrote:
> On Thursday 2025-11-20 13:11 UTC, Jason Gunthorpe wrote:
> >
> > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> >
> > On Wed, Nov 19, 2025 at 11:24:40PM -0800, Zhiping Zhang wrote:
> > > On Monday, November 17, 2025 at 8:00 AM, Jason Gunthorpe wrote:
> > > > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> > > >
> > > > On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> > > > > RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> > > > >
> > > > > This patch enables construction of a dma handler (DMAH) with the P2P memory type
> > > > > and a direct steering-tag value. It can be used to register a RDMA memory
> > > > > region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> > > > >
> > > > > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > > > > ---
> > > > >  .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
> > > > >  drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
> > > > >  drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
> > > > >  .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
> > > > >  include/linux/mlx5/driver.h                   |  4 +--
> > > > >  include/rdma/ib_verbs.h                       |  2 ++
> > > > >  include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
> > > > >  7 files changed, 46 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > > index 453ce656c6f2..1ef400f96965 100644
> > > > > --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > > +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > > @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
> > > > >  		dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
> > > > >  	}
> > > > >
> > > > > +	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> > > > > +		ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> > > > > +				       UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> > > > > +		if (ret)
> > > > > +			goto err;
> > > >
> > > > This should not come from userspace, the dmabuf exporter should
> > > > provide any TPH hints as part of the attachment process.
> > > >
> > > > We are trying not to allow userspace raw access to the TPH values, so
> > > > this is not a desirable UAPI here.
> > >
> >
> > > Thanks for your feedback!
> > >
> > > I understand the concern about not exposing raw TPH values to
> > > userspace. To clarify, would it be acceptable to use an index-based
> > > mapping table, where userspace provides an index and the kernel
> > > translates it to the appropriate TPH value? Given that the PCIe spec
> > > allows up to 16-bit TPH values, this could require a mapping table
> > > of up to 128KB. Do you see this as a reasonable approach, or is
> > > there a preferred alternative?
> >
> > ?
> >
> > The issue here is to secure the TPH. The kernel driver that owns the
> > exporting device should control what TPH values an importing driver
> > will use.
> >
> > I don't see how an indirection table helps anything, you need to add
> > an API to DMABUF to retrieve the tph.
> I see, thanks for the clarification. Yes, we can add and use new
> API(s) for this purpose.
> Sorry for the delay: I was waiting for the final version of Leon's
> vfio-dmabuf patch series and plan to follow that for implementing the new
> API(s) needed.
> (https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.com/).
>
> >
> > > Additionally, in cases where the dmabuf exporter device can handle all possible 16-bit
> > > TPH values (i.e., it has its own internal mapping logic or table), should this still be
> > > entirely abstracted away from userspace?
> >
> > I imagine the exporting device provides the raw on the wire TPH value
> > it wants the importing device to use and the importing device is
> > responsible to program it using whatever scheme it has.
> >
> > Jason
>
> Can you suggest or elaborate a bit on the schemes you see as feasible?
>
> When the exporting device supports all or multiple TPH values, it is
> desirable to have userspace processes select which TPH values to use
> for the dmabuf at runtime. Actually that is the main use case of this
> patch: the user can select the TPH values to associate the desired P2P
> operations with the dmabuf. The difficulty is how we can provide this
> flexibility while still aligning with kernel and security best
> practices.
>
> Zhiping

Happy holidays! I went through the vfio-dmabuf patch series and Jason's
comments once more. I think I have a proposal that addresses the concerns.

For p2p or dmabuf use cases, we pass in an ID or fd similar to CPU_ID when
allocating a dmah, and make a callback to the dmabuf exporter to get the
TPH value associated with the fd. That involves adding a new dmabuf operation
for the callback to get the associated TPH/tag value.

I can start with vfio-dmabuf and add the new dmabuf op/ABI there based on
Leon's patch.

Please let me know if you have any concerns or suggestions.

Zhiping
* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
2025-12-27 19:22 ` Zhiping Zhang
@ 2026-01-06 0:57 ` Jason Gunthorpe
0 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2026-01-06 0:57 UTC (permalink / raw)
To: Zhiping Zhang
Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
Keith Busch, Yochai Cohen, Yishai Hadas

On Sat, Dec 27, 2025 at 11:22:54AM -0800, Zhiping Zhang wrote:

> For p2p or dmabuf use cases, we pass in an ID or fd similar to CPU_ID when
> allocating a dmah, and make a callback to the dmabuf exporter to get the
> TPH value associated with the fd. That involves adding a new dmabuf operation
> for the callback to get the TPH/tag value associated.

Ah, hum, that approach seems problematic since the dmah could be used
with something that is not the exporting device's MMIO, and this would
allow userspace to substitute in a wrong TPH, which I think we should
consider a security problem.

I think you need to have the reg_mr_dmabuf itself enforce a TPH if the
exporting DMABUF requests it; that way we know the TPH and the MMIO
addresses are properly linked together.

Jason
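The point about keeping the TPH and the MMIO addresses linked can be sketched in plain userspace C. This is a simplified model under stated assumptions, not the mlx5 or RDMA core implementation: `struct mock_export`, `struct mock_mr`, and `reg_mr_dmabuf_model()` are hypothetical names. The registration path takes both the address range and the tag from the same exported object, and there is deliberately no parameter through which a caller could supply a different tag.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical view of what an exporter hands over with a dmabuf. */
struct mock_export {
	uint64_t mmio_base;	/* start of the exported MMIO window */
	uint64_t len;		/* size of the window in bytes */
	int tph_required;	/* exporter requests that a tag be enforced */
	uint16_t st;		/* tag chosen by the exporting driver */
};

/* Hypothetical memory region as the importer would record it. */
struct mock_mr {
	uint64_t base;
	uint64_t len;
	int tph_valid;
	uint16_t st;
};

/*
 * Register a region inside the export. The tag is copied from the same
 * object that supplies the addresses, so the two cannot be mixed up.
 */
static int reg_mr_dmabuf_model(const struct mock_export *exp, uint64_t offset,
			       uint64_t len, struct mock_mr *mr)
{
	/* Reject ranges that fall outside the exported window. */
	if (len == 0 || offset > exp->len || len > exp->len - offset)
		return -EINVAL;

	mr->base = exp->mmio_base + offset;
	mr->len = len;
	mr->tph_valid = exp->tph_required;
	mr->st = exp->tph_required ? exp->st : 0;
	return 0;
}
```

Compared with a separately allocated dmah, this shape leaves no window in which a tag obtained for one device's memory could be attached to a region that points somewhere else.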
* [RFC 2/2] [fix] mlx5: modifications for use cases other than CPU
2025-11-13 21:37 [RFC 0/2] Set steering-tag directly for PCIe P2P memory access Zhiping Zhang
2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
2025-11-13 21:37 ` [RFC 2/2] " Zhiping Zhang
@ 2026-01-03 5:38 ` Zhiping Zhang
2 siblings, 0 replies; 14+ messages in thread
From: Zhiping Zhang @ 2026-01-03 5:38 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang

In order to set the tag value properly besides the CPU use case, we need to
also fix and modify the few checks on CPU_ID in mlx5 RDMA code.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>

> [RFC 2/2] RDMA: Set steering-tag value directly for P2P memory access
>
> Currently, the steering tag can be used for a CPU on the motherboard; the
> ACPI check is in place to query and obtain the supported steering tag. This
> same check is not possible for the accelerator devices because they are
> designed to be plug-and-play to and ownership can not be always confirmed.
>
> We intend to use the steering tag to improve RDMA NIC memory access on a GPU
> or accelerator device via PCIe peer-to-peer. An application can construct a
> dma handler (DMAH) with the device memory type and a direct steering-tag
> value, and this DMAH can be used to register a RDMA memory region with DMABUF
> for the RDMA NIC to access the device memory. The steering tag contains
> additional instructions or hints to the GPU or accelerator device for
> advanced memory operations, such as, read cache selection.
>
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
---
 drivers/infiniband/hw/mlx5/dmah.c | 3 ++-
 drivers/infiniband/hw/mlx5/mr.c   | 6 ++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/dmah.c b/drivers/infiniband/hw/mlx5/dmah.c
index 98c8d3313653..c0d8532f94ac 100644
--- a/drivers/infiniband/hw/mlx5/dmah.c
+++ b/drivers/infiniband/hw/mlx5/dmah.c
@@ -41,7 +41,8 @@ static int mlx5_ib_dealloc_dmah(struct ib_dmah *ibdmah,
 	struct mlx5_ib_dmah *dmah = to_mdmah(ibdmah);
 	struct mlx5_core_dev *mdev = to_mdev(ibdmah->device)->mdev;
 
-	if (ibdmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS))
+	if (ibdmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS) ||
+	    ibdmah->valid_fields & BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS))
 		return mlx5_st_dealloc_index(mdev, dmah->st_index);
 
 	return 0;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index d4917d5c2efa..fb0e0c5826c2 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1470,7 +1470,8 @@ static struct ib_mr *create_real_mr(struct ib_pd *pd, struct ib_umem *umem,
 		struct mlx5_ib_dmah *mdmah = to_mdmah(dmah);
 
 		ph = dmah->ph;
-		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS))
+		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS) ||
+		    dmah->valid_fields & BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS))
 			st_index = mdmah->st_index;
 	}
 
@@ -1660,7 +1661,8 @@ reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device,
 		struct mlx5_ib_dmah *mdmah = to_mdmah(dmah);
 
 		ph = dmah->ph;
-		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS))
+		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS) ||
+		    dmah->valid_fields & BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS))
 			st_index = mdmah->st_index;
 	}
 
-- 
2.47.3
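The same two-flag test appears in all three hunks above. As a possible simplification, and only as a userspace sketch, the condition can be folded into one combined-mask helper. `dmah_has_st_index()` is a hypothetical name, the `BIT()` macro is redefined locally, and the bit positions below are placeholders rather than the real enum values from include/rdma/ib_verbs.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1UL << (n))

/* Placeholder bit positions; the real values are defined by the RDMA headers. */
enum {
	IB_DMAH_CPU_ID_EXISTS = 0,
	IB_DMAH_MEM_TYPE_EXISTS = 1,
	IB_DMAH_DIRECT_ST_VAL_EXISTS = 2,
};

/*
 * True when the handle carries a steering-tag index, whether it came from a
 * CPU ID lookup or from a direct steering-tag value.
 */
static bool dmah_has_st_index(uint32_t valid_fields)
{
	return valid_fields & (BIT(IB_DMAH_CPU_ID_EXISTS) |
			       BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS));
}
```

A helper of this shape would keep the dealloc path and both registration paths in agreement if another tag source is added later.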