netdev.vger.kernel.org archive mirror
* [RFC 0/2] Set steering-tag directly for PCIe P2P memory access
@ 2025-11-13 21:37 Zhiping Zhang
  2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Zhiping Zhang @ 2025-11-13 21:37 UTC
  To: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas
  Cc: Zhiping Zhang

Currently, the steering tag can only be used for a CPU on the motherboard, where
an ACPI check queries and obtains the supported steering tag. The same check is
not possible for accelerator devices, because they are designed to be
plug-and-play and their ownership cannot always be confirmed.

We intend to use the steering tag to improve RDMA NIC memory access to a GPU
or accelerator device via PCIe peer-to-peer (P2P). An application can construct
a DMA handle (DMAH) with the device memory type and a direct steering-tag
value, and use this DMAH to register an RDMA memory region with DMABUF so that
the RDMA NIC can access the device memory. The steering tag carries
additional instructions or hints to the GPU or accelerator device for
advanced memory operations, such as read cache selection.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>

Zhiping Zhang (2):
  PCIe: Add a memory type for P2P memory access
  RDMA: Set steering-tag value directly for P2P memory access

 .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
 drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
 drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
 .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
 drivers/pci/tph.c                             |  4 +++
 include/linux/mlx5/driver.h                   |  4 +--
 include/linux/pci-tph.h                       |  4 ++-
 include/rdma/ib_verbs.h                       |  2 ++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
 9 files changed, 53 insertions(+), 10 deletions(-)

-- 
2.47.3
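A minimal C model of the flow described in the cover letter, assuming the attribute checks that RFC 2/2 adds to DMAH allocation and MR registration. The struct and function names (`model_dmah_attrs`, `model_dmah_alloc_check()`, `model_reg_mr_check()`) are hypothetical; this is not the uverbs or rdma-core API, only a sketch of which attribute combinations the series would accept.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the DMAH attributes; not the uverbs structures. */
enum model_mem_type { MODEL_MEM_VM, MODEL_MEM_PM, MODEL_MEM_P2P };

struct model_dmah_attrs {
	bool has_cpu_id;
	bool has_mem_type;
	enum model_mem_type mem_type;
	bool has_direct_st;
	uint16_t direct_st_val;
};

/*
 * Mirrors the checks RFC 2/2 adds to UVERBS_METHOD_DMAH_ALLOC: a direct
 * steering tag is accepted only with the P2P memory type and no CPU ID.
 */
static int model_dmah_alloc_check(const struct model_dmah_attrs *a)
{
	if (!a->has_direct_st)
		return 0;
	if (a->has_cpu_id)
		return -EINVAL;
	if (!a->has_mem_type || a->mem_type != MODEL_MEM_P2P)
		return -EINVAL;
	return 0;
}

/*
 * Mirrors the REG_MR check: a P2P DMAH is only usable when the MR is
 * backed by a DMABUF file descriptor.
 */
static int model_reg_mr_check(const struct model_dmah_attrs *a, bool has_fd)
{
	if (a->has_mem_type && a->mem_type == MODEL_MEM_P2P && !has_fd)
		return -EINVAL;
	return 0;
}
```

Rejecting a CPU ID alongside the direct value keeps the two tag sources (ACPI lookup by CPU versus the caller-supplied value) mutually exclusive.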



* [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-13 21:37 [RFC 0/2] Set steering-tag directly for PCIe P2P memory access Zhiping Zhang
@ 2025-11-13 21:37 ` Zhiping Zhang
  2025-11-14 13:12   ` Jonathan Cameron
  2025-11-24 21:27   ` Bjorn Helgaas
  2025-11-13 21:37 ` [RFC 2/2] " Zhiping Zhang
  2026-01-03  5:38 ` [RFC 2/2] [fix] mlx5: modifications for use cases other than CPU Zhiping Zhang
  2 siblings, 2 replies; 13+ messages in thread
From: Zhiping Zhang @ 2025-11-13 21:37 UTC
  To: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas
  Cc: Zhiping Zhang

PCIe: Add a memory type for P2P memory access

The current tph memory type definition applies for CPU use cases. For device
memory accessed in the peer-to-peer (P2P) manner, we need another memory
type.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
---
 drivers/pci/tph.c       | 4 ++++
 include/linux/pci-tph.h | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
index cc64f93709a4..d983c9778c72 100644
--- a/drivers/pci/tph.c
+++ b/drivers/pci/tph.c
@@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
 			if (info->pm_st_valid)
 				return info->pm_st;
 			break;
+		default:
+			return 0;
 		}
 		break;
 	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
@@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
 			if (info->pm_xst_valid)
 				return info->pm_xst;
 			break;
+		default:
+			return 0;
 		}
 		break;
 	default:
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index 9e4e331b1603..b989302b6755 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -14,10 +14,12 @@
  * depending on the memory type: Volatile Memory or Persistent Memory. When a
  * caller query about a target's Steering Tag, it must provide the target's
  * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
+ * Add a new tph type for PCI peer-to-peer access use case.
  */
 enum tph_mem_type {
 	TPH_MEM_TYPE_VM,	/* volatile memory */
-	TPH_MEM_TYPE_PM		/* persistent memory */
+	TPH_MEM_TYPE_PM,	/* persistent memory */
+	TPH_MEM_TYPE_P2P	/* peer-to-peer accessable memory */
 };
 
 #ifdef CONFIG_PCIE_TPH
-- 
2.47.3
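A userspace sketch of `tph_extract_tag()` after this patch, using a simplified, hypothetical `model_st_info` in place of the kernel's `union st_info`. Assuming, as in the mainline function, that an unmatched case already ends in `return 0`, the new `default:` arms do not change the returned value; they mainly give `TPH_MEM_TYPE_P2P` an explicit arm so that `-Wswitch` does not flag it. In this model the P2P type always yields 0, since ACPI supplies no tag for device memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same ordering as enum tph_mem_type after RFC 1/2, with the trailing comma. */
enum tph_mem_type {
	TPH_MEM_TYPE_VM,	/* volatile memory */
	TPH_MEM_TYPE_PM,	/* persistent memory */
	TPH_MEM_TYPE_P2P,	/* peer-to-peer accessible memory */
};

/* Hypothetical, simplified stand-in for the ACPI-provided tag info. */
struct model_st_info {
	bool vm_st_valid, pm_st_valid, vm_xst_valid, pm_xst_valid;
	uint8_t vm_st, pm_st;		/* 8-bit tags */
	uint16_t vm_xst, pm_xst;	/* 16-bit (extended) tags */
};

enum model_req_type { MODEL_REQ_TPH_ONLY, MODEL_REQ_EXT_TPH };

static uint16_t model_extract_tag(enum tph_mem_type mem_type,
				  enum model_req_type req_type,
				  const struct model_st_info *info)
{
	switch (req_type) {
	case MODEL_REQ_TPH_ONLY:	/* 8-bit tag */
		switch (mem_type) {
		case TPH_MEM_TYPE_VM:
			if (info->vm_st_valid)
				return info->vm_st;
			break;
		case TPH_MEM_TYPE_PM:
			if (info->pm_st_valid)
				return info->pm_st;
			break;
		default:	/* P2P: no ACPI tag for device memory */
			return 0;
		}
		break;
	case MODEL_REQ_EXT_TPH:		/* 16-bit tag */
		switch (mem_type) {
		case TPH_MEM_TYPE_VM:
			if (info->vm_xst_valid)
				return info->vm_xst;
			break;
		case TPH_MEM_TYPE_PM:
			if (info->pm_xst_valid)
				return info->pm_xst;
			break;
		default:
			return 0;
		}
		break;
	}
	return 0;	/* no valid tag found */
}
```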



* [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-13 21:37 [RFC 0/2] Set steering-tag directly for PCIe P2P memory access Zhiping Zhang
  2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
@ 2025-11-13 21:37 ` Zhiping Zhang
  2025-11-17 16:00   ` Jason Gunthorpe
  2026-01-03  5:38 ` [RFC 2/2] [fix] mlx5: modifications for use cases other than CPU Zhiping Zhang
  2 siblings, 1 reply; 13+ messages in thread
From: Zhiping Zhang @ 2025-11-13 21:37 UTC
  To: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas
  Cc: Zhiping Zhang

RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR

This patch enables construction of a dma handler (DMAH) with the P2P memory type
and a direct steering-tag value. It can be used to register a RDMA memory
region with DMABUF for the RDMA NIC to access the other device's memory via P2P.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
---
 .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
 drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
 drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
 .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
 include/linux/mlx5/driver.h                   |  4 +--
 include/rdma/ib_verbs.h                       |  2 ++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
 7 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
index 453ce656c6f2..1ef400f96965 100644
--- a/drivers/infiniband/core/uverbs_std_types_dmah.c
+++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
@@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
 		dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
 	}
 
+	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
+		ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
+				       UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
+		if (ret)
+			goto err;
+
+		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS)) {
+			ret = -EINVAL;
+			goto err;
+		}
+		if ((dmah->valid_fields & BIT(IB_DMAH_MEM_TYPE_EXISTS)) == 0) {
+			ret = -EINVAL;
+			goto err;
+		}
+		if (dmah->mem_type != TPH_MEM_TYPE_P2P) {
+			ret = -EINVAL;
+			goto err;
+		}
+		dmah->valid_fields |= BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS);
+	}
+
 	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_PH)) {
 		ret = uverbs_copy_from(&dmah->ph, attrs,
 				       UVERBS_ATTR_ALLOC_DMAH_PH);
@@ -107,6 +128,10 @@ static const struct uverbs_attr_spec uverbs_dmah_mem_type[] = {
 		.type = UVERBS_ATTR_TYPE_PTR_IN,
 		UVERBS_ATTR_NO_DATA(),
 	},
+	[TPH_MEM_TYPE_P2P] = {
+		.type = UVERBS_ATTR_TYPE_PTR_IN,
+		UVERBS_ATTR_NO_DATA(),
+	},
 };
 
 DECLARE_UVERBS_NAMED_METHOD(
@@ -123,6 +148,9 @@ DECLARE_UVERBS_NAMED_METHOD(
 			    UA_OPTIONAL),
 	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_ALLOC_DMAH_PH,
 			   UVERBS_ATTR_TYPE(u8),
+			   UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL,
+			   UVERBS_ATTR_TYPE(u16),
 			   UA_OPTIONAL));
 
 DECLARE_UVERBS_NAMED_METHOD_DESTROY(
diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 570b9656801d..10e47934898e 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -346,6 +346,9 @@ static int UVERBS_HANDLER(UVERBS_METHOD_REG_MR)(
 					   UVERBS_ATTR_REG_MR_DMA_HANDLE);
 		if (IS_ERR(dmah))
 			return PTR_ERR(dmah);
+		if (dmah->mem_type == TPH_MEM_TYPE_P2P && has_fd == false) {
+			return -EINVAL;
+		}
 	}
 
 	ret = uverbs_get_flags32(&access_flags, attrs,
diff --git a/drivers/infiniband/hw/mlx5/dmah.c b/drivers/infiniband/hw/mlx5/dmah.c
index 362a88992ffa..98c8d3313653 100644
--- a/drivers/infiniband/hw/mlx5/dmah.c
+++ b/drivers/infiniband/hw/mlx5/dmah.c
@@ -15,8 +15,7 @@ static int mlx5_ib_alloc_dmah(struct ib_dmah *ibdmah,
 {
 	struct mlx5_core_dev *mdev = to_mdev(ibdmah->device)->mdev;
 	struct mlx5_ib_dmah *dmah = to_mdmah(ibdmah);
-	u16 st_bits = BIT(IB_DMAH_CPU_ID_EXISTS) |
-		      BIT(IB_DMAH_MEM_TYPE_EXISTS);
+	u16 st_bits = BIT(IB_DMAH_MEM_TYPE_EXISTS);
 	int err;
 
 	/* PH is a must for TPH following PCIe spec 6.2-1.0 */
@@ -28,7 +27,7 @@ static int mlx5_ib_alloc_dmah(struct ib_dmah *ibdmah,
 		if ((ibdmah->valid_fields & st_bits) != st_bits)
 			return -EINVAL;
 		err = mlx5_st_alloc_index(mdev, ibdmah->mem_type,
-					  ibdmah->cpu_id, &dmah->st_index);
+					  ibdmah->cpu_id, &dmah->st_index, ibdmah->direct_st_val);
 		if (err)
 			return err;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
index 47fe215f66bf..690ad8536128 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
@@ -80,7 +80,7 @@ void mlx5_st_destroy(struct mlx5_core_dev *dev)
 }
 
 int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
-			unsigned int cpu_uid, u16 *st_index)
+			unsigned int cpu_uid, u16 *st_index, u16 direct_st_val)
 {
 	struct mlx5_st_idx_data *idx_data;
 	struct mlx5_st *st = dev->st;
@@ -92,9 +92,13 @@ int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
 	if (!st)
 		return -EOPNOTSUPP;
 
-	ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag);
-	if (ret)
-		return ret;
+	if (mem_type == TPH_MEM_TYPE_P2P)
+		tag = direct_st_val;
+	else {
+		ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag);
+		if (ret)
+			return ret;
+	}
 
 	mutex_lock(&st->lock);
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 1c8ba601e760..a58be1f2844b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1170,12 +1170,12 @@ int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type
 
 #ifdef CONFIG_PCIE_TPH
 int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
-			unsigned int cpu_uid, u16 *st_index);
+			unsigned int cpu_uid, u16 *st_index, u16 direct_st_val);
 int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index);
 #else
 static inline int mlx5_st_alloc_index(struct mlx5_core_dev *dev,
 				      enum tph_mem_type mem_type,
-				      unsigned int cpu_uid, u16 *st_index)
+				      unsigned int cpu_uid, u16 *st_index, u16 direct_st_val)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 465b73d94f33..30a26b524f03 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1852,6 +1852,7 @@ enum {
 	IB_DMAH_CPU_ID_EXISTS,
 	IB_DMAH_MEM_TYPE_EXISTS,
 	IB_DMAH_PH_EXISTS,
+	IB_DMAH_DIRECT_ST_VAL_EXISTS,
 };
 
 struct ib_dmah {
@@ -1866,6 +1867,7 @@ struct ib_dmah {
 	atomic_t usecnt;
 	u8 ph;
 	u8 valid_fields; /* use IB_DMAH_XXX_EXISTS */
+	u16 direct_st_val;
 };
 
 struct ib_mr {
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 17f963014eca..42b3892b6761 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -242,6 +242,7 @@ enum uverbs_attrs_alloc_dmah_cmd_attr_ids {
 	UVERBS_ATTR_ALLOC_DMAH_CPU_ID,
 	UVERBS_ATTR_ALLOC_DMAH_TPH_MEM_TYPE,
 	UVERBS_ATTR_ALLOC_DMAH_PH,
+	UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL,
 };
 
 enum uverbs_attrs_free_dmah_cmd_attr_ids {
-- 
2.47.3
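The tag-selection change in `mlx5_st_alloc_index()` can be modelled the same way. `model_get_cpu_st()` is a hypothetical stub standing in for `pcie_tph_get_cpu_st()` (it returns a made-up tag per CPU); the branch itself mirrors the diff: for `TPH_MEM_TYPE_P2P` the userspace-supplied `direct_st_val` is used verbatim and no lookup happens, which is the point Jason Gunthorpe objects to later in the thread.

```c
#include <assert.h>
#include <stdint.h>

enum tph_mem_type {
	TPH_MEM_TYPE_VM,
	TPH_MEM_TYPE_PM,
	TPH_MEM_TYPE_P2P,
};

/* Hypothetical stub for pcie_tph_get_cpu_st(): fake tag 0x100 + cpu_uid. */
static int model_get_cpu_st(enum tph_mem_type mem_type, unsigned int cpu_uid,
			    uint16_t *tag)
{
	(void)mem_type;
	*tag = (uint16_t)(0x100 + cpu_uid);
	return 0;
}

/* Mirrors the branch RFC 2/2 adds before the ST table is updated. */
static int model_select_tag(enum tph_mem_type mem_type, unsigned int cpu_uid,
			    uint16_t direct_st_val, uint16_t *tag)
{
	if (mem_type == TPH_MEM_TYPE_P2P) {
		/* Used verbatim: nothing checks where this value came from. */
		*tag = direct_st_val;
		return 0;
	}
	return model_get_cpu_st(mem_type, cpu_uid, tag);
}
```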



* Re: [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
@ 2025-11-14 13:12   ` Jonathan Cameron
  2025-11-18  0:50     ` zhipingz
  2025-11-24 21:27   ` Bjorn Helgaas
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Cameron @ 2025-11-14 13:12 UTC
  To: Zhiping Zhang
  Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas

On Thu, 13 Nov 2025 13:37:11 -0800
Zhiping Zhang <zhipingz@meta.com> wrote:

> PCIe: Add a memory type for P2P memory access
> 
> The current tph memory type definition applies for CPU use cases. For device
> memory accessed in the peer-to-peer (P2P) manner, we need another memory
> type.
> 
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  drivers/pci/tph.c       | 4 ++++
>  include/linux/pci-tph.h | 4 +++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index cc64f93709a4..d983c9778c72 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
>  			if (info->pm_st_valid)
>  				return info->pm_st;
>  			break;
> +		default:
> +			return 0;
>  		}
>  		break;
>  	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
> @@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
>  			if (info->pm_xst_valid)
>  				return info->pm_xst;
>  			break;
> +		default:
> +			return 0;
>  		}
>  		break;
>  	default:
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index 9e4e331b1603..b989302b6755 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
> @@ -14,10 +14,12 @@
>   * depending on the memory type: Volatile Memory or Persistent Memory. When a
>   * caller query about a target's Steering Tag, it must provide the target's
>   * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
> + * Add a new tph type for PCI peer-to-peer access use case.
>   */
>  enum tph_mem_type {
>  	TPH_MEM_TYPE_VM,	/* volatile memory */
> -	TPH_MEM_TYPE_PM		/* persistent memory */
> +	TPH_MEM_TYPE_PM,	/* persistent memory */
> +	TPH_MEM_TYPE_P2P	/* peer-to-peer accessable memory */

Trivial but this time definitely add the trailing comma!  Maybe there will never
be any more in here but maybe there will and we can avoid a line of
churn next time.

>  };
>  
>  #ifdef CONFIG_PCIE_TPH



* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-13 21:37 ` [RFC 2/2] " Zhiping Zhang
@ 2025-11-17 16:00   ` Jason Gunthorpe
  2025-11-20  7:24     ` Zhiping Zhang
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2025-11-17 16:00 UTC
  To: Zhiping Zhang
  Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	Keith Busch, Yochai Cohen, Yishai Hadas

On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> 
> This patch enables construction of a dma handler (DMAH) with the P2P memory type
> and a direct steering-tag value. It can be used to register a RDMA memory
> region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> 
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
>  drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
>  drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
>  .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
>  include/linux/mlx5/driver.h                   |  4 +--
>  include/rdma/ib_verbs.h                       |  2 ++
>  include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
>  7 files changed, 46 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> index 453ce656c6f2..1ef400f96965 100644
> --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
>  		dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
>  	}
>  
> +	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> +		ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> +				       UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> +		if (ret)
> +			goto err;

This should not come from userspace, the dmabuf exporter should
provide any TPH hints as part of the attachment process.

We are trying not to allow userspace raw access to the TPH values, so
this is not a desirable UAPI here.

Jason


* Re: [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-14 13:12   ` Jonathan Cameron
@ 2025-11-18  0:50     ` zhipingz
  0 siblings, 0 replies; 13+ messages in thread
From: zhipingz @ 2025-11-18  0:50 UTC
  To: Jonathan Cameron
  Cc: jgg, leon, bhelgaas, linux-rdma, linux-pci, netdev, kbusch,
	yochai, yishaih

> From: Jonathan Cameron @ 2025-11-14 13:12 UTC
>  To: Zhiping Zhang
>  Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
>	linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas
>
> On Thu, 13 Nov 2025 13:37:11 -0800
> Zhiping Zhang <zhipingz@meta.com> wrote:
>
> > PCIe: Add a memory type for P2P memory access
> > 
> > The current tph memory type definition applies for CPU use cases. For device
> > memory accessed in the peer-to-peer (P2P) manner, we need another memory
> > type.
> > 
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > ---
> >  drivers/pci/tph.c       | 4 ++++
> >  include/linux/pci-tph.h | 4 +++-
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> > index cc64f93709a4..d983c9778c72 100644
> > --- a/drivers/pci/tph.c
> > +++ b/drivers/pci/tph.c
> > @@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
> > 			if (info->pm_st_valid)
> > 				return info->pm_st;
> > 			break;
> > +		default:
> > +			return 0;
> > 		}
> > 		break;
> > 	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
> > @@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
> > 			if (info->pm_xst_valid)
> > 				return info->pm_xst;
> > 			break;
> > +		default:
> > +			return 0;
> > 		}
> > 		break;
> > 	default:
> > diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> > index 9e4e331b1603..b989302b6755 100644
> > --- a/include/linux/pci-tph.h
> > +++ b/include/linux/pci-tph.h
> > @@ -14,10 +14,12 @@
> >   * depending on the memory type: Volatile Memory or Persistent Memory. When a
> >   * caller query about a target's Steering Tag, it must provide the target's
> >   * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
> > + * Add a new tph type for PCI peer-to-peer access use case.
> >   */
> >  enum tph_mem_type {
> >  	TPH_MEM_TYPE_VM,	/* volatile memory */
> > -	TPH_MEM_TYPE_PM		/* persistent memory */
> > +	TPH_MEM_TYPE_PM,	/* persistent memory */
> > +	TPH_MEM_TYPE_P2P	/* peer-to-peer accessable memory */
>
> Trivial but this time definitely add the trailing comma!  Maybe there will never
> be any more in here but maybe there will and we can avoid a line of
> churn next time.
>

Thanks for catching that! I’ll add the trailing comma to the enum in the patch.

> >  };
> >  
> >  #ifdef CONFIG_PCIE_TPH



* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-17 16:00   ` Jason Gunthorpe
@ 2025-11-20  7:24     ` Zhiping Zhang
  2025-11-20 13:11       ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Zhiping Zhang @ 2025-11-20  7:24 UTC
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	Keith Busch, Yochai Cohen, Yishai Hadas

On Monday, November 17, 2025 at 8:00 AM, Jason Gunthorpe wrote:
> Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
>
> On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> > RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> >
> > This patch enables construction of a dma handler (DMAH) with the P2P memory type
> > and a direct steering-tag value. It can be used to register a RDMA memory
> > region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> >
> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > ---
> > .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
> > drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
> > drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
> > .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
> > include/linux/mlx5/driver.h                   |  4 +--
> > include/rdma/ib_verbs.h                       |  2 ++
> > include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
> > 7 files changed, 46 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > index 453ce656c6f2..1ef400f96965 100644
> > --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> > +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
> >               dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
> >       }
> >
> > +     if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> > +             ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> > +                                    UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> > +             if (ret)
> > +                     goto err;
>
> This should not come from userspace, the dmabuf exporter should
> provide any TPH hints as part of the attachment process.
> 
> We are trying not to allow userspace raw access to the TPH values, so
> this is not a desirable UAPI here.
>
> Jason

Thanks for your feedback!

I understand the concern about not exposing raw TPH values to userspace.
To clarify, would it be acceptable to use an index-based mapping table, 
where userspace provides an index and the kernel translates it to the 
appropriate TPH value? Given that the PCIe spec allows up to 16-bit TPH values,
this could require a mapping table of up to 128KB. Do you see this as a reasonable
approach, or is there a preferred alternative?

Additionally, in cases where the dmabuf exporter device can handle all possible 16-bit
TPH values (i.e., it has its own internal mapping logic or table), should this still be
entirely abstracted away from userspace?

Zhiping
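The 128KB estimate above follows from 2^16 possible 16-bit tags at 2 bytes each (65536 × 2 = 131072 bytes = 128 KiB). A minimal sketch of such an index-to-tag table, with hypothetical names (`st_index_table`, `st_index_lookup()`), checks that arithmetic at compile time:

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct values a 16-bit (extended) steering tag can take. */
#define ST_16BIT_VALUES (1u << 16)

/*
 * Hypothetical index -> tag table: one u16 tag per possible index, so
 * userspace would pass only an index and the kernel would look up the tag.
 */
struct st_index_table {
	uint16_t tag[ST_16BIT_VALUES];
};

static_assert(sizeof(struct st_index_table) == 128 * 1024,
	      "65536 entries * 2 bytes = 128 KiB");

static uint16_t st_index_lookup(const struct st_index_table *t, uint16_t index)
{
	return t->tag[index];
}
```

As Jason points out in his reply, such an indirection alone does not decide who is allowed to choose the tag values stored in the table.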


* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-20  7:24     ` Zhiping Zhang
@ 2025-11-20 13:11       ` Jason Gunthorpe
  2025-12-04  8:10         ` Zhiping Zhang
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2025-11-20 13:11 UTC
  To: Zhiping Zhang
  Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	Keith Busch, Yochai Cohen, Yishai Hadas

On Wed, Nov 19, 2025 at 11:24:40PM -0800, Zhiping Zhang wrote:
> On Monday, November 17, 2025 at 8:00 AM, Jason Gunthorpe wrote:
> > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> >
> > On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> > > RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> > >
> > > This patch enables construction of a dma handler (DMAH) with the P2P memory type
> > > and a direct steering-tag value. It can be used to register a RDMA memory
> > > region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> > >
> > > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > > ---
> > > .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
> > > drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
> > > drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
> > > .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
> > > include/linux/mlx5/driver.h                   |  4 +--
> > > include/rdma/ib_verbs.h                       |  2 ++
> > > include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
> > > 7 files changed, 46 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > index 453ce656c6f2..1ef400f96965 100644
> > > --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
> > >               dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
> > >       }
> > >
> > > +     if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> > > +             ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> > > +                                    UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> > > +             if (ret)
> > > +                     goto err;
> >
> > This should not come from userspace, the dmabuf exporter should
> > provide any TPH hints as part of the attachment process.
> > 
> > We are trying not to allow userspace raw access to the TPH values, so
> > this is not a desirable UAPI here.
> 
> Thanks for your feedback!
> 
> I understand the concern about not exposing raw TPH values to
> userspace.  To clarify, would it be acceptable to use an index-based
> mapping table, where userspace provides an index and the kernel
> translates it to the appropriate TPH value? Given that the PCIe spec
> allows up to 16-bit TPH values, this could require a mapping table
> of up to 128KB. Do you see this as a reasonable approach, or is
> there a preferred alternative?

?

The issue here is to secure the TPH. The kernel driver that owns the
exporting device should control what TPH values an importing driver
will use.

I don't see how an indirection table helps anything; you need to add
an API to DMABUF to retrieve the TPH.

> Additionally, in cases where the dmabuf exporter device can handle all possible 16-bit
> TPH values  (i.e., it has its own internal mapping logic or table), should this still be
> entirely abstracted away from userspace?

I imagine the exporting device provides the raw on-the-wire TPH value
it wants the importing device to use, and the importing device is
responsible for programming it using whatever scheme it has.

Jason
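One reading of the model described above, sketched in plain C with entirely hypothetical names: the `get_tph` callback does not exist (Jason's reply implies DMABUF has no such API yet). The exporter reports the raw on-the-wire tag through a callback it owns, and the importer stores that tag in its own steering-tag table (here a small reuse-or-allocate array, not the mlx5 implementation) and uses the resulting index, so userspace never supplies the tag.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: what an exporter might report at attach time. */
struct model_tph_hint {
	uint16_t st;	/* raw on-the-wire steering tag chosen by the exporter */
	uint8_t ph;	/* processing hint */
};

struct model_exporter {
	/* Hypothetical callback owned by the exporting driver. */
	int (*get_tph)(const struct model_exporter *exp,
		       struct model_tph_hint *hint);
	uint16_t st;
	uint8_t ph;
};

#define MODEL_ST_TABLE_SIZE 8

/* Importer-side steering-tag table with per-entry reference counts. */
struct model_importer {
	uint16_t st[MODEL_ST_TABLE_SIZE];
	unsigned int refcnt[MODEL_ST_TABLE_SIZE];
};

static int model_exporter_get_tph(const struct model_exporter *exp,
				  struct model_tph_hint *hint)
{
	hint->st = exp->st;
	hint->ph = exp->ph;
	return 0;
}

/* Program the exporter-provided tag into the importer's table; return its index. */
static int model_import_tph(struct model_importer *imp,
			    const struct model_exporter *exp, uint16_t *index)
{
	struct model_tph_hint hint;
	int free_slot = -1;
	int ret = exp->get_tph(exp, &hint);

	if (ret)
		return ret;
	for (int i = 0; i < MODEL_ST_TABLE_SIZE; i++) {
		if (imp->refcnt[i] && imp->st[i] == hint.st) {
			imp->refcnt[i]++;	/* reuse an existing entry */
			*index = (uint16_t)i;
			return 0;
		}
		if (!imp->refcnt[i] && free_slot < 0)
			free_slot = i;	/* remember the first unused slot */
	}
	if (free_slot < 0)
		return -ENOSPC;
	imp->st[free_slot] = hint.st;
	imp->refcnt[free_slot] = 1;
	*index = (uint16_t)free_slot;
	return 0;
}
```

Keeping the tag inside the kernel means the exporting driver decides which values an importer may use, which is the property Jason asks for.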


* Re: [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
  2025-11-14 13:12   ` Jonathan Cameron
@ 2025-11-24 21:27   ` Bjorn Helgaas
  2025-12-01 17:43     ` Zhiping Zhang
  1 sibling, 1 reply; 13+ messages in thread
From: Bjorn Helgaas @ 2025-11-24 21:27 UTC
  To: Zhiping Zhang
  Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas

On Thu, Nov 13, 2025 at 01:37:11PM -0800, Zhiping Zhang wrote:
> PCIe: Add a memory type for P2P memory access

This should be in the Subject: line.

It should also start with "PCI/TPH: ..." (not "PCIe") to match
previous history.

> The current tph memory type definition applies for CPU use cases. For device
> memory accessed in the peer-to-peer (P2P) manner, we need another memory
> type.

s/tph/TPH/

Make this say what the patch does (not just that we *need* another
memory type, that we actually *add* one).

The subject line should also say what the patch does.  I don't think
this patch actually changes the *setting* of the steering tag (I could
be wrong, I haven't looked carefully).

> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>  drivers/pci/tph.c       | 4 ++++
>  include/linux/pci-tph.h | 4 +++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> index cc64f93709a4..d983c9778c72 100644
> --- a/drivers/pci/tph.c
> +++ b/drivers/pci/tph.c
> @@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
>  			if (info->pm_st_valid)
>  				return info->pm_st;
>  			break;
> +		default:
> +			return 0;
>  		}
>  		break;
>  	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
> @@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
>  			if (info->pm_xst_valid)
>  				return info->pm_xst;
>  			break;
> +		default:
> +			return 0;
>  		}
>  		break;
>  	default:
> diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> index 9e4e331b1603..b989302b6755 100644
> --- a/include/linux/pci-tph.h
> +++ b/include/linux/pci-tph.h
> @@ -14,10 +14,12 @@
>   * depending on the memory type: Volatile Memory or Persistent Memory. When a
>   * caller query about a target's Steering Tag, it must provide the target's
>   * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
> + * Add a new tph type for PCI peer-to-peer access use case.
>   */
>  enum tph_mem_type {
>  	TPH_MEM_TYPE_VM,	/* volatile memory */
> -	TPH_MEM_TYPE_PM		/* persistent memory */
> +	TPH_MEM_TYPE_PM,	/* persistent memory */
> +	TPH_MEM_TYPE_P2P	/* peer-to-peer accessable memory */
>  };
>  
>  #ifdef CONFIG_PCIE_TPH
> -- 
> 2.47.3
> 


* Re: [RFC 1/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-24 21:27   ` Bjorn Helgaas
@ 2025-12-01 17:43     ` Zhiping Zhang
  0 siblings, 0 replies; 13+ messages in thread
From: Zhiping Zhang @ 2025-12-01 17:43 UTC
  To: Bjorn Helgaas
  Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, Keith Busch, Yochai Cohen, Yishai Hadas

> On Mon, 24 Nov 2025 15:27:53 -0600, Bjorn Helgaas wrote:
> > PCIe: Add a memory type for P2P memory access

> This should be in the Subject: line.

> It should also start with "PCI/TPH: ..." (not "PCIe") to match
> previous history.

Thanks, ack! I will update the subject line.

> > The current tph memory type definition applies for CPU use cases. For device
> > memory accessed in the peer-to-peer (P2P) manner, we need another memory
> > type.

> s/tph/TPH/

> Make this say what the patch does (not just that we *need* another
> memory type, that we actually *add* one).

> The subject line should also say what the patch does.  I don't think
> this patch actually changes the *setting* of the steering tag (I could
> be wrong, I haven't looked carefully).

Sure, I’ll correct and revise the commit message to clearly state what the
patch does.

> > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > ---
> >  drivers/pci/tph.c       | 4 ++++
> >  include/linux/pci-tph.h | 4 +++-
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c
> > index cc64f93709a4..d983c9778c72 100644
> > --- a/drivers/pci/tph.c
> > +++ b/drivers/pci/tph.c
> > @@ -67,6 +67,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
> > 			if (info->pm_st_valid)
> > 				return info->pm_st;
> > 			break;
> > +		default:
> > +			return 0;
> > 		}
> > 		break;
> > 	case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */
> > @@ -79,6 +81,8 @@ static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type,
> > 			if (info->pm_xst_valid)
> > 				return info->pm_xst;
> > 			break;
> > +		default:
> > +			return 0;
> > 		}
> >  		break;
> >  	default:
> > diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
> > index 9e4e331b1603..b989302b6755 100644
> > --- a/include/linux/pci-tph.h
> > +++ b/include/linux/pci-tph.h
> > @@ -14,10 +14,12 @@
> >   * depending on the memory type: Volatile Memory or Persistent Memory. When a
> >   * caller query about a target's Steering Tag, it must provide the target's
> >   * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470.
> > + * Add a new tph type for PCI peer-to-peer access use case.
> >   */
> >  enum tph_mem_type {
> >  	TPH_MEM_TYPE_VM,	/* volatile memory */
> > -	TPH_MEM_TYPE_PM		/* persistent memory */
> > +	TPH_MEM_TYPE_PM,	/* persistent memory */
> > +	TPH_MEM_TYPE_P2P	/* peer-to-peer accessible memory */
> >  };
> >  
> >  #ifdef CONFIG_PCIE_TPH
> > -- 
> > 2.47.3
> > 

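The new default case can be illustrated with a simplified userspace model. This is not the real drivers/pci/tph.c: struct st_info_model is a hypothetical stand-in for the ACPI _DSM Steering Tag data, and only the 8-bit branch is modelled. Asking for TPH_MEM_TYPE_P2P yields 0 ("no tag") because the firmware has no tag to report for peer device memory:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for enum tph_mem_type as extended by this patch. */
enum tph_mem_type {
	TPH_MEM_TYPE_VM,	/* volatile memory */
	TPH_MEM_TYPE_PM,	/* persistent memory */
	TPH_MEM_TYPE_P2P	/* peer-to-peer accessible memory */
};

/* Hypothetical, reduced model of the per-CPU ST info from ACPI _DSM. */
struct st_info_model {
	bool vm_st_valid;
	uint8_t vm_st;
	bool pm_st_valid;
	uint8_t pm_st;
};

/* Mirrors the 8-bit branch of tph_extract_tag() with the new default. */
static uint16_t extract_tag_model(enum tph_mem_type mem_type,
				  const struct st_info_model *info)
{
	switch (mem_type) {
	case TPH_MEM_TYPE_VM:
		if (info->vm_st_valid)
			return info->vm_st;
		break;
	case TPH_MEM_TYPE_PM:
		if (info->pm_st_valid)
			return info->pm_st;
		break;
	default:
		/* No firmware-provided tag exists for P2P memory. */
		return 0;
	}
	return 0;	/* requested tag type not marked valid */
}
```

The explicit default presumably also keeps -Wswitch from warning about the new enumerator that the switch does not name.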
^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
  2025-11-20 13:11       ` Jason Gunthorpe
@ 2025-12-04  8:10         ` Zhiping Zhang
  2025-12-27 19:22           ` Zhiping Zhang
  0 siblings, 1 reply; 13+ messages in thread
From: Zhiping Zhang @ 2025-12-04  8:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	Keith Busch, Yochai Cohen, Yishai Hadas

On Thursday 2025-11-20 13:11 UTC, Jason Gunthorpe wrote:
>
> Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
>
> On Wed, Nov 19, 2025 at 11:24:40PM -0800, Zhiping Zhang wrote:
> > On Monday, November 17, 2025 at 8:00 AM, Jason Gunthorpe wrote:
> > > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> > >
> > > On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> > > > RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> > > >
> > > > This patch enables construction of a dma handler (DMAH) with the P2P memory type
> > > > and a direct steering-tag value. It can be used to register a RDMA memory
> > > > region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> > > >
> > > > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > > > ---
> > > > .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
> > > > drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
> > > > drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
> > > > .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
> > > > include/linux/mlx5/driver.h                   |  4 +--
> > > > include/rdma/ib_verbs.h                       |  2 ++
> > > > include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
> > > > 7 files changed, 46 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > index 453ce656c6f2..1ef400f96965 100644
> > > > --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
> > > >               dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
> > > >       }
> > > >
> > > > +     if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> > > > +             ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> > > > +                                    UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> > > > +             if (ret)
> > > > +                     goto err;
> > >
> > > This should not come from userspace, the dmabuf exporter should
> > > provide any TPH hints as part of the attachment process.
> > > 
> > > We are trying not to allow userspace raw access to the TPH values, so
> > > this is not a desirable UAPI here.
> > > 
> > Thanks for your feedback!
> > 
> > I understand the concern about not exposing raw TPH values to
> > userspace.  To clarify, would it be acceptable to use an index-based
> > mapping table, where userspace provides an index and the kernel
> > translates it to the appropriate TPH value? Given that the PCIe spec
> > allows up to 16-bit TPH values, this could require a mapping table
> > of up to 128KB. Do you see this as a reasonable approach, or is
> > there a preferred alternative?
>
> ?
>
> The issue here is to secure the TPH. The kernel driver that owns the
> exporting device should control what TPH values an importing driver
> will use.
>
> I don't see how an indirection table helps anything, you need to add
> an API to DMABUF to retrieve the tph.

I see, thanks for the clarification. Yes, we can add new DMABUF API(s)
for this purpose.

Sorry for the delay: I was waiting for the final version of Leon's
vfio-dmabuf patch series and plan to follow that for implementing the new
API(s) needed.
(https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.com/).

>
> > Additionally, in cases where the dmabuf exporter device can handle all possible 16-bit
> > TPH values  (i.e., it has its own internal mapping logic or table), should this still be
> > entirely abstracted away from userspace?
>
> I imagine the exporting device provides the raw on the wire TPH value
> it wants the importing device to use and the importing device is
> responsible to program it using whatever scheme it has.
>
> Jason

Can you suggest or elaborate a bit on the schemes you see as feasible?

When the exporting device supports all or multiple TPH values, it is
desirable to let userspace processes select which TPH values to use for
the dmabuf at runtime. In fact, that is the main use case of this patch:
the user selects the TPH values associated with the desired P2P
operations on the dmabuf. The difficulty is how to provide this
flexibility while still following kernel and security best practices.
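To make the question concrete, one hypothetical shape for "userspace selects, exporter owns" is for the exporting driver to publish the tags it permits and for the kernel to accept a userspace choice only from that set. Everything below (struct exporter_tph_policy, select_tph()) is an illustrative assumption, not an existing kernel or dma-buf interface:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: tags the exporting driver lets importers use. */
struct exporter_tph_policy {
	const uint16_t *allowed;
	size_t nr_allowed;
};

/*
 * Hypothetical kernel-side check: userspace picks a tag, but only
 * values the exporter published are accepted.
 */
static int select_tph(const struct exporter_tph_policy *policy,
		      uint16_t requested, uint16_t *out)
{
	for (size_t i = 0; i < policy->nr_allowed; i++) {
		if (policy->allowed[i] == requested) {
			*out = requested;
			return 0;
		}
	}
	return -EINVAL;	/* not offered by the exporter */
}
```

Unlike the index table floated earlier, which would need 2^16 two-byte entries (128KB) to cover every 16-bit tag, the exporter here lists only the tags it actually supports, and any value it did not offer is rejected with -EINVAL.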

Zhiping




* Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
  2025-12-04  8:10         ` Zhiping Zhang
@ 2025-12-27 19:22           ` Zhiping Zhang
  0 siblings, 0 replies; 13+ messages in thread
From: Zhiping Zhang @ 2025-12-27 19:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang

On Thu 2025-12-04 8:10 UTC, Zhiping Zhang wrote:

> On Thursday 2025-11-20 13:11 UTC, Jason Gunthorpe wrote:
> >
> > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> >
> > On Wed, Nov 19, 2025 at 11:24:40PM -0800, Zhiping Zhang wrote:
> > > On Monday, November 17, 2025 at 8:00 AM, Jason Gunthorpe wrote:
> > > > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> > > >
> > > > On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> > > > > RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> > > > >
> > > > > This patch enables construction of a dma handler (DMAH) with the P2P memory type
> > > > > and a direct steering-tag value. It can be used to register a RDMA memory
> > > > > region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> > > > >
> > > > > Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> > > > > ---
> > > > > .../infiniband/core/uverbs_std_types_dmah.c   | 28 +++++++++++++++++++
> > > > > drivers/infiniband/core/uverbs_std_types_mr.c |  3 ++
> > > > > drivers/infiniband/hw/mlx5/dmah.c             |  5 ++--
> > > > > .../net/ethernet/mellanox/mlx5/core/lib/st.c  | 12 +++++---
> > > > > include/linux/mlx5/driver.h                   |  4 +--
> > > > > include/rdma/ib_verbs.h                       |  2 ++
> > > > > include/uapi/rdma/ib_user_ioctl_cmds.h        |  1 +
> > > > > 7 files changed, 46 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > > index 453ce656c6f2..1ef400f96965 100644
> > > > > --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > > +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > > @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
> > > > >              dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
> > > > >      }
> > > > >
> > > > > +     if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> > > > > +             ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> > > > > +                                    UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> > > > > +             if (ret)
> > > > > +                     goto err;
> > > >
> > > > This should not come from userspace, the dmabuf exporter should
> > > > provide any TPH hints as part of the attachment process.
> > > >
> > > > We are trying not to allow userspace raw access to the TPH values, so
> > > > this is not a desirable UAPI here.
> > > >
> > > Thanks for your feedback!
> > >
> > > I understand the concern about not exposing raw TPH values to
> > > userspace.  To clarify, would it be acceptable to use an index-based
> > > mapping table, where userspace provides an index and the kernel
> > > translates it to the appropriate TPH value? Given that the PCIe spec
> > > allows up to 16-bit TPH values, this could require a mapping table
> > > of up to 128KB. Do you see this as a reasonable approach, or is
> > > there a preferred alternative?
> >
> > ?
> >
> > The issue here is to secure the TPH. The kernel driver that owns the
> > exporting device should control what TPH values an importing driver
> > will use.
> >
> > I don't see how an indirection table helps anything, you need to add
> > an API to DMABUF to retrieve the tph.

> I see, thanks for the clarification. Yes we can add and use another new
> API(s) for this purpose.

> Sorry for the delay: I was waiting for the final version of Leon's
> vfio-dmabuf patch series and plan to follow that for implementing the new
> API(s) needed.
> (https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.com/).
>
> >
> > > Additionally, in cases where the dmabuf exporter device can handle all possible 16-bit
> > > TPH values  (i.e., it has its own internal mapping logic or table), should this still be
> > > entirely abstracted away from userspace?
> >
> > I imagine the exporting device provides the raw on the wire TPH value
> > it wants the importing device to use and the importing device is
> > responsible to program it using whatever scheme it has.
> >
> > Jason
>
> Can you suggest or elaborate a bit on the schemes you see as feasible?
>
> When the exporting device supports all or multiple TPH values, it is
> desirable to have userspace processes select which TPH values to use
> for the dmabuf at runtime. Actually that is the main use case of this
> patch: the user can select the TPH values to associate desired P2P
> operations on the dmabuf. The difficulty is how we can provide this
> flexibility while still aligning with kernel and security best
> practices.
>
> Zhiping

Happy holidays! I went through the vfio-dmabuf patch series and Jason's
comments once more. I think I have a proposal that addresses the concerns.

For P2P or dmabuf use cases, userspace would pass in an ID or fd (similar
to CPU_ID) when allocating a DMAH, and the kernel would call back into the
dmabuf exporter to get the TPH value associated with that fd. This
requires adding a new dmabuf operation for that callback to return the
associated TPH/tag value.

I can start with vfio-dmabuf and add the new dmabuf op/ABI there, based on
Leon's patch. Please let me know if you have any concerns or suggestions.
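A minimal userspace model of the proposed flow follows. It assumes a new exporter callback, named get_tph here purely for illustration (dma_buf_ops has no such member today); the *_model structs are hypothetical stand-ins for struct dma_buf and struct dma_buf_ops. In the kernel the importer would first resolve the fd with dma_buf_get(); that step and any locking are omitted:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct dma_buf_model;

/* Hypothetical new exporter operation; not part of dma_buf_ops today. */
struct dma_buf_ops_model {
	int (*get_tph)(struct dma_buf_model *dmabuf, uint16_t *st_val);
};

struct dma_buf_model {
	const struct dma_buf_ops_model *ops;
	void *priv;	/* exporter-private data */
};

/* Importer side: called while allocating a DMAH from a dmabuf. */
static int dmah_get_st_from_dmabuf(struct dma_buf_model *dmabuf,
				   uint16_t *st_val)
{
	if (!dmabuf->ops || !dmabuf->ops->get_tph)
		return -EOPNOTSUPP;	/* exporter offers no tag */
	return dmabuf->ops->get_tph(dmabuf, st_val);
}

/* Example exporter (e.g. vfio-like) that owns the device's tag. */
struct example_exporter_priv {
	uint16_t st_val;
};

static int example_get_tph(struct dma_buf_model *dmabuf, uint16_t *st_val)
{
	const struct example_exporter_priv *priv = dmabuf->priv;

	*st_val = priv->st_val;	/* value chosen by the exporting driver */
	return 0;
}

static const struct dma_buf_ops_model example_ops = {
	.get_tph = example_get_tph,
};
```

Returning -EOPNOTSUPP when the callback is absent keeps exporters that know nothing about TPH working unchanged; the DMAH allocation could then fail or proceed without a steering tag.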

Zhiping


* [RFC 2/2] [fix] mlx5: modifications for use cases other than CPU
  2025-11-13 21:37 [RFC 0/2] Set steering-tag directly for PCIe P2P memory access Zhiping Zhang
  2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
  2025-11-13 21:37 ` [RFC 2/2] " Zhiping Zhang
@ 2026-01-03  5:38 ` Zhiping Zhang
  2 siblings, 0 replies; 13+ messages in thread
From: Zhiping Zhang @ 2026-01-03  5:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang

To set the tag value correctly for use cases other than CPU, the checks
on IB_DMAH_CPU_ID_EXISTS in the mlx5 RDMA code must also accept
IB_DMAH_DIRECT_ST_VAL_EXISTS.

Signed-off-by: Zhiping Zhang <zhipingz@meta.com>

> [RFC 2/2] RDMA: Set steering-tag value directly for P2P memory access
>
> Currently, the steering tag can be used for a CPU on the motherboard; the
> ACPI check is in place to query and obtain the supported steering tag. This
> same check is not possible for accelerator devices because they are
> designed to be plug-and-play, and their ownership cannot always be confirmed.
>
> We intend to use the steering tag to improve RDMA NIC memory access on a GPU
> or accelerator device via PCIe peer-to-peer. An application can construct a
> dma handler (DMAH) with the device memory type and a direct steering-tag
> value, and this DMAH can be used to register a RDMA memory region with DMABUF
> for the RDMA NIC to access the device memory. The steering tag contains
> additional instructions or hints to the GPU or accelerator device for
> advanced memory operations, such as, read cache selection.
>
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
---
 drivers/infiniband/hw/mlx5/dmah.c | 3 ++-
 drivers/infiniband/hw/mlx5/mr.c   | 6 ++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/dmah.c b/drivers/infiniband/hw/mlx5/dmah.c
index 98c8d3313653..c0d8532f94ac 100644
--- a/drivers/infiniband/hw/mlx5/dmah.c
+++ b/drivers/infiniband/hw/mlx5/dmah.c
@@ -41,7 +41,8 @@ static int mlx5_ib_dealloc_dmah(struct ib_dmah *ibdmah,
 	struct mlx5_ib_dmah *dmah = to_mdmah(ibdmah);
 	struct mlx5_core_dev *mdev = to_mdev(ibdmah->device)->mdev;
 
-	if (ibdmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS))
+	if (ibdmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS) ||
+	    ibdmah->valid_fields & BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS))
 		return mlx5_st_dealloc_index(mdev, dmah->st_index);
 
 	return 0;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index d4917d5c2efa..fb0e0c5826c2 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1470,7 +1470,8 @@ static struct ib_mr *create_real_mr(struct ib_pd *pd, struct ib_umem *umem,
 		struct mlx5_ib_dmah *mdmah = to_mdmah(dmah);
 
 		ph = dmah->ph;
-		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS))
+		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS) ||
+		    dmah->valid_fields & BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS))
 			st_index = mdmah->st_index;
 	}
 
@@ -1660,7 +1661,8 @@ reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device,
 		struct mlx5_ib_dmah *mdmah = to_mdmah(dmah);
 
 		ph = dmah->ph;
-		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS))
+		if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS) ||
+		    dmah->valid_fields & BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS))
 			st_index = mdmah->st_index;
 	}
 
-- 
2.47.3
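The same two-flag test now appears in mlx5_ib_dealloc_dmah(), create_real_mr() and reg_user_mr_dmabuf(). A sketch of folding it into one mask test follows. The helper name dmah_has_st_index() is hypothetical, and the enum positions are illustrative only (the real IB_DMAH_* values live in include/rdma/ib_verbs.h, and IB_DMAH_DIRECT_ST_VAL_EXISTS comes from this series):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Illustrative bit positions, not the kernel's actual values. */
enum {
	IB_DMAH_CPU_ID_EXISTS,
	IB_DMAH_MEM_TYPE_EXISTS,
	IB_DMAH_PH_EXISTS,
	IB_DMAH_DIRECT_ST_VAL_EXISTS,
};

/* Hypothetical helper: true when mlx5 allocated an ST index for the DMAH. */
static bool dmah_has_st_index(unsigned long valid_fields)
{
	return valid_fields & (BIT(IB_DMAH_CPU_ID_EXISTS) |
			       BIT(IB_DMAH_DIRECT_ST_VAL_EXISTS));
}
```

Since & binds more tightly than ||, the two-line conditions in the patch are already correct; the helper would only remove the repetition.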




Thread overview: 13+ messages
2025-11-13 21:37 [RFC 0/2] Set steering-tag directly for PCIe P2P memory access Zhiping Zhang
2025-11-13 21:37 ` [RFC 1/2] " Zhiping Zhang
2025-11-14 13:12   ` Jonathan Cameron
2025-11-18  0:50     ` zhipingz
2025-11-24 21:27   ` Bjorn Helgaas
2025-12-01 17:43     ` Zhiping Zhang
2025-11-13 21:37 ` [RFC 2/2] " Zhiping Zhang
2025-11-17 16:00   ` Jason Gunthorpe
2025-11-20  7:24     ` Zhiping Zhang
2025-11-20 13:11       ` Jason Gunthorpe
2025-12-04  8:10         ` Zhiping Zhang
2025-12-27 19:22           ` Zhiping Zhang
2026-01-03  5:38 ` [RFC 2/2] [fix] mlx5: modifications for use cases other than CPU Zhiping Zhang
