Linux RDMA and InfiniBand development
* [PATCH for-next] RDMA/efa: Expose device P2P DMA support via device query
@ 2026-05-03 15:02 Yonatan Nachum
  2026-05-04  7:34 ` Jason Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Yonatan Nachum @ 2026-05-03 15:02 UTC (permalink / raw)
  To: jgg, leon, linux-rdma
  Cc: mrgolin, sleybo, matua, gal.pressman, Yonatan Nachum,
	Yehuda Yitschak

Expose device P2P DMA support using the query device verbs.
If the device supports P2P DMA, it can DMA directly to and from a peer
PCIe device.

Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
---
 drivers/infiniband/hw/efa/efa_admin_cmds_defs.h | 10 +++++++++-
 drivers/infiniband/hw/efa/efa_verbs.c           |  3 +++
 include/uapi/rdma/efa-abi.h                     |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h b/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
index ad34ea5da6b0..097b3303f3e9 100644
--- a/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
+++ b/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
@@ -725,7 +725,11 @@ struct efa_admin_feature_device_attr_desc {
 	 *    on TX queues
 	 * 4 : unsolicited_write_recv - If set, unsolicited
 	 *    write with imm. receive is supported
-	 * 31:5 : reserved - MBZ
+	 * 5 : event_counters - If set, event counters are
+	 *    supported
+	 * 6 : p2p_dma - If set the device can DMA directly
+	 *    to and from a peer PCIe device
+	 * 31:7 : reserved - MBZ
 	 */
 	u32 device_caps;
 
@@ -1132,6 +1136,10 @@ struct efa_admin_host_info {
 #define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_DATA_POLLING_128_MASK BIT(2)
 #define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_RDMA_WRITE_MASK  BIT(3)
 #define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_UNSOLICITED_WRITE_RECV_MASK BIT(4)
+#define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_EVENT_COUNTERS_SHIFT 5
+#define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_EVENT_COUNTERS_MASK BIT(5)
+#define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_P2P_DMA_SHIFT    6
+#define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_P2P_DMA_MASK     BIT(6)
 
 /* create_eq_cmd */
 #define EFA_ADMIN_CREATE_EQ_CMD_ENTRY_SIZE_WORDS_MASK       GENMASK(4, 0)
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 7bd0838ebc99..b16f470f7d30 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -270,6 +270,9 @@ int efa_query_device(struct ib_device *ibdev,
 		if (EFA_DEV_CAP(dev, UNSOLICITED_WRITE_RECV))
 			resp.device_caps |= EFA_QUERY_DEVICE_CAPS_UNSOLICITED_WRITE_RECV;
 
+		if (EFA_DEV_CAP(dev, P2P_DMA))
+			resp.device_caps |= EFA_QUERY_DEVICE_CAPS_P2P_DMA;
+
 		if (dev->neqs)
 			resp.device_caps |= EFA_QUERY_DEVICE_CAPS_CQ_NOTIFICATIONS;
 
diff --git a/include/uapi/rdma/efa-abi.h b/include/uapi/rdma/efa-abi.h
index d5c18f8de182..d19cb59d822d 100644
--- a/include/uapi/rdma/efa-abi.h
+++ b/include/uapi/rdma/efa-abi.h
@@ -133,6 +133,7 @@ enum {
 	EFA_QUERY_DEVICE_CAPS_RDMA_WRITE = 1 << 5,
 	EFA_QUERY_DEVICE_CAPS_UNSOLICITED_WRITE_RECV = 1 << 6,
 	EFA_QUERY_DEVICE_CAPS_CQ_WITH_EXT_MEM = 1 << 7,
+	EFA_QUERY_DEVICE_CAPS_P2P_DMA = 1 << 8,
 };
 
 struct efa_ibv_ex_query_device_resp {
-- 
2.50.1
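
The capability plumbing in the patch above can be sketched in isolation as follows. This is a minimal standalone illustration, not the driver code: the two bit values are copied from the diff (admin bit 6, uapi bit 8), while the macro names are shortened and both helper functions are hypothetical stand-ins for the new branch in efa_query_device() and for a userspace consumer of resp.device_caps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the patch; the short names are illustrative. */
#define SKETCH_ADMIN_CAPS_P2P_DMA_MASK (1u << 6) /* efa_admin_cmds_defs.h: BIT(6) */
#define SKETCH_QUERY_CAPS_P2P_DMA      (1u << 8) /* efa-abi.h: 1 << 8 */

/*
 * Hypothetical helper mirroring the added branch in efa_query_device():
 * if the device-reported caps carry the P2P DMA bit, set the
 * corresponding uapi bit in the response caps.
 */
uint32_t sketch_translate_p2p_cap(uint32_t admin_caps, uint32_t resp_caps)
{
	if (admin_caps & SKETCH_ADMIN_CAPS_P2P_DMA_MASK)
		resp_caps |= SKETCH_QUERY_CAPS_P2P_DMA;
	return resp_caps;
}

/* Hypothetical userspace-side check on the caps returned by query device. */
bool sketch_user_sees_p2p(uint32_t resp_caps)
{
	return (resp_caps & SKETCH_QUERY_CAPS_P2P_DMA) != 0;
}
```

Note that the admin and uapi bit positions differ (6 versus 8), so the bit has to be translated rather than copied through unchanged.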



* Re: [PATCH for-next] RDMA/efa: Expose device P2P DMA support via device query
  2026-05-03 15:02 [PATCH for-next] RDMA/efa: Expose device P2P DMA support via device query Yonatan Nachum
@ 2026-05-04  7:34 ` Jason Gunthorpe
  2026-05-05  8:15   ` Yonatan Nachum
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Gunthorpe @ 2026-05-04  7:34 UTC (permalink / raw)
  To: Yonatan Nachum
  Cc: leon, linux-rdma, mrgolin, sleybo, matua, gal.pressman,
	Yehuda Yitschak

On Sun, May 03, 2026 at 03:02:46PM +0000, Yonatan Nachum wrote:
> Expose device P2P DMA support using the query device verbs.
> If the device supports P2P DMA, it can DMA directly to and from a peer
> PCIe device

This doesn't seem right. This should be policed by failing to
establish p2p mappings and by failing to map dmabufs, not with random
user space bits like this.

There are lots of things in our system that need this feedback to go
down that path.

Jason


* Re: [PATCH for-next] RDMA/efa: Expose device P2P DMA support via device query
  2026-05-04  7:34 ` Jason Gunthorpe
@ 2026-05-05  8:15   ` Yonatan Nachum
  2026-05-05 15:40     ` Jason Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Yonatan Nachum @ 2026-05-05  8:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: leon, linux-rdma, mrgolin, sleybo, matua, gal.pressman,
	Yehuda Yitschak

On Mon, May 04, 2026 at 04:34:12AM -0300, Jason Gunthorpe wrote:
> On Sun, May 03, 2026 at 03:02:46PM +0000, Yonatan Nachum wrote:
> > Expose device P2P DMA support using the query device verbs.
> > If the device supports P2P DMA, it can DMA directly to and from a peer
> > PCIe device
> 
> This doesn't seem right, this should be policed by failing to
> establish p2p mappings and to fail mapping dmabufs not with random
> user space bits like this.

The motivation here is to avoid requiring userspace to speculatively
attempt registering accelerator MRs just to discover whether the device
supports P2P DMA. Beyond the API awkwardness, this also has a real
performance impact: initializing an accelerator context can take
seconds, and by advertising this as a device capability, userspace can
know upfront whether it's worth going down that path.

If I understand your suggestion correctly, you'd prefer to enforce this
in the reg MR path itself rather than exposing a capability bit. I see
two issues with that approach:
1. Userspace would still need to speculatively attempt a reg MR to
discover P2P support.

2. When we get a dmabuf, I can't tell what the backing memory is (for
example, DRAM or accelerator memory), and we can't just blindly fail
all cases.

Can you please clarify how you imagine this capability being used?

> There are lots of things in our system that need this feedback to go
> down that path.

Can you please clarify what you mean here?


* Re: [PATCH for-next] RDMA/efa: Expose device P2P DMA support via device query
  2026-05-05  8:15   ` Yonatan Nachum
@ 2026-05-05 15:40     ` Jason Gunthorpe
  0 siblings, 0 replies; 4+ messages in thread
From: Jason Gunthorpe @ 2026-05-05 15:40 UTC (permalink / raw)
  To: Yonatan Nachum
  Cc: leon, linux-rdma, mrgolin, sleybo, matua, gal.pressman,
	Yehuda Yitschak

On Tue, May 05, 2026 at 08:15:14AM +0000, Yonatan Nachum wrote:
> On Mon, May 04, 2026 at 04:34:12AM -0300, Jason Gunthorpe wrote:
> > On Sun, May 03, 2026 at 03:02:46PM +0000, Yonatan Nachum wrote:
> > > Expose device P2P DMA support using the query device verbs.
> > > If the device supports P2P DMA, it can DMA directly to and from a peer
> > > PCIe device
> > 
> > This doesn't seem right, this should be policed by failing to
> > establish p2p mappings and to fail mapping dmabufs not with random
> > user space bits like this.
> 
> The motivation here is to avoid requiring userspace to speculatively
> attempt registering accelerator MRs just to discover whether the device
> supports P2P DMA.

This isn't how the kernel works. There is no "this device can do p2p"
property; the question is always answered for pairs of devices. There
is no way for a single device to know it doesn't do p2p with any other
device in the system.
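
To illustrate why the answer is per pair rather than per device, here is a deliberately simplified userspace model. It is not the kernel's p2pdma code: the toy_* names, the parent-index topology, and the rule "usable only when the closest common upstream node is below the host bridge" are all assumptions made for the sketch (the real subsystem applies further rules that are not modelled here).

```c
#include <stddef.h>

/* Hypothetical toy topology: each node stores the index of its upstream
 * node; -1 marks the root, standing in for the host bridge. */
struct toy_node {
	int parent;
};

/* Example tree: 0 = host bridge, 1 = switch under 0,
 * 2 = NIC under the switch, 3 = accelerator under the switch,
 * 4 = accelerator attached directly to the host bridge. */
const struct toy_node toy_sample[] = {
	{ -1 }, { 0 }, { 1 }, { 1 }, { 0 },
};

/* Number of hops from node n up to the root. */
static int toy_depth(const struct toy_node *t, int n)
{
	int d = 0;

	while (t[n].parent >= 0) {
		n = t[n].parent;
		d++;
	}
	return d;
}

/*
 * Hop count between a and b through their closest common upstream node,
 * or -1 when that common node is the root. The result depends on both
 * arguments: the same NIC can get different answers for different peers.
 */
int toy_p2p_distance(const struct toy_node *t, int a, int b)
{
	int da = toy_depth(t, a);
	int db = toy_depth(t, b);
	int hops = 0;

	/* Bring the deeper node up to the depth of the shallower one. */
	while (da > db) { a = t[a].parent; da--; hops++; }
	while (db > da) { b = t[b].parent; db--; hops++; }

	/* Walk both up in lockstep until they meet. */
	while (a != b) {
		a = t[a].parent;
		b = t[b].parent;
		hops += 2;
	}
	return t[a].parent < 0 ? -1 : hops;
}
```

In the sample tree, the NIC (node 2) gets a usable distance to the accelerator behind the same switch (node 3) but not to the one attached at the host bridge (node 4), so no single per-NIC bit could describe both cases.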

> Beyond the API awkwardness, this also has a real
> performance impact: initializing an accelerator context can take
> seconds, and by advertising this as a device capability, userspace can
> know upfront whether it's worth going down that path.

This seems like a misconfiguration. Since so much of this topological
information is not discoverable at run time, systems usually have an
external data file explaining how to best use the system. I wouldn't
expect the runtime to be trying to probe this.

If you *really* want probing then it has to be dmabuf based because
you must probe with pairs of PCI devices.

> 2. When we get a dmabuf, I can't distinguish what is the backing memory,
> for example DRAM or accelerator memory and we can't just blindly fail
> all cases.

You must rely on the p2p subsystem to make this determination and the
dmabuf exporter has to call into it. This will deal with those
problems.

Jason
