* [PATCH for-next v4 0/5] Introduce Completion Counters
@ 2026-05-11 22:37 Michael Margolin
2026-05-11 22:37 ` [PATCH for-next v4 1/5] RDMA/core: Add Completion Counters support Michael Margolin
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Michael Margolin @ 2026-05-11 22:37 UTC
To: jgg, leon, linux-rdma; +Cc: sleybo, matua, gal.pressman
Add core infrastructure for Completion Counters, a lightweight
alternative to polling a CQ for tracking operation completions. The
related rdma-core interface proposal is linked in [1].
Define the UVERBS_OBJECT_COMP_CNTR ioctl object with create, destroy,
modify and read methods for both success and error counters. Add a QP
attach method on the QP object to associate a completion counter with a
queue pair.
Add EFA Completion Counters support as the first implementer.
[1] https://github.com/linux-rdma/rdma-core/pull/1701
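For illustration only (not part of the series): a minimal userspace sketch
of the counter semantics described above, with one object holding a success
entry and an error entry, each changed through a single modify operation
(set or inc) and read back individually. All names here (comp_cntr_model,
cc_model_*) are hypothetical and are not the kernel or rdma-core API.
Rejecting values above the per-entry maximum with -EINVAL is an assumption
of this sketch; the cover letter does not say how a device handles it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model types; they only mirror the shape of the uAPI enums. */
enum cc_entry { CC_ENTRY_COMP, CC_ENTRY_ERR };
enum cc_modify_op { CC_MODIFY_OP_SET, CC_MODIFY_OP_INC };

struct comp_cntr_model {
	uint64_t value[2];     /* indexed by enum cc_entry */
	uint64_t max_value[2]; /* per-entry limit, as reported at create time */
};

void cc_model_init(struct comp_cntr_model *cc, uint64_t comp_max,
		   uint64_t err_max)
{
	assert(cc);
	cc->value[CC_ENTRY_COMP] = 0;
	cc->value[CC_ENTRY_ERR] = 0;
	cc->max_value[CC_ENTRY_COMP] = comp_max;
	cc->max_value[CC_ENTRY_ERR] = err_max;
}

/* One entry point for both "set" and "inc", selected by op. */
int cc_model_modify(struct comp_cntr_model *cc, enum cc_entry entry,
		    enum cc_modify_op op, uint64_t value)
{
	if ((unsigned int)entry > CC_ENTRY_ERR)
		return -EINVAL;

	switch (op) {
	case CC_MODIFY_OP_SET:
		/* Assumption: values above the maximum are rejected. */
		if (value > cc->max_value[entry])
			return -EINVAL;
		cc->value[entry] = value;
		return 0;
	case CC_MODIFY_OP_INC:
		/* Assumption: an increment past the maximum is rejected. */
		if (value > cc->max_value[entry] - cc->value[entry])
			return -EINVAL;
		cc->value[entry] += value;
		return 0;
	default:
		return -EINVAL;
	}
}

int cc_model_read(const struct comp_cntr_model *cc, enum cc_entry entry,
		  uint64_t *out)
{
	if ((unsigned int)entry > CC_ENTRY_ERR || !out)
		return -EINVAL;
	*out = cc->value[entry];
	return 0;
}
```

A caller would initialize the model with the two maximum values, then issue
modify and read calls per entry; the success and error entries never affect
each other.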
---
Changes in v4:
- Replaced the inc and set commands with a single modify command
- Changed to passing buffers as EFA-specific attributes using a desc
struct, in line with the suggested common method of passing and
consuming umem in RDMA drivers
- Link to v2: https://lore.kernel.org/all/20260416212327.18191-1-mrgolin@amazon.com/
Changes in v3:
- Skipped this version because of a wrong patch list
Changes in v2:
- Unified the set, inc and read flows for success and error completion
counters
- Added comp_cntr usage count
- Minor cleanups
- Link to v1: https://lore.kernel.org/all/20260407115424.13359-1-mrgolin@amazon.com/
Michael Margolin (5):
RDMA/core: Add Completion Counters support
RDMA/core: Prevent destroying in-use completion counters
RDMA/core: Add Completion Counters to resource tracking
RDMA/efa: Update device interface
RDMA/efa: Add Completion Counters support
drivers/infiniband/core/Makefile | 1 +
drivers/infiniband/core/device.c | 6 +
drivers/infiniband/core/nldev.c | 1 +
drivers/infiniband/core/rdma_core.h | 1 +
drivers/infiniband/core/restrack.c | 2 +
drivers/infiniband/core/uverbs_cmd.c | 1 +
.../core/uverbs_std_types_comp_cntr.c | 183 ++++++++++++++
drivers/infiniband/core/uverbs_std_types_qp.c | 65 ++++-
drivers/infiniband/core/uverbs_uapi.c | 1 +
drivers/infiniband/core/verbs.c | 1 +
drivers/infiniband/hw/efa/efa.h | 17 +-
.../infiniband/hw/efa/efa_admin_cmds_defs.h | 187 +++++++++++++-
drivers/infiniband/hw/efa/efa_com_cmd.c | 106 ++++++++
drivers/infiniband/hw/efa/efa_com_cmd.h | 36 +++
drivers/infiniband/hw/efa/efa_io_defs.h | 64 ++++-
drivers/infiniband/hw/efa/efa_main.c | 7 +-
drivers/infiniband/hw/efa/efa_verbs.c | 229 ++++++++++++++++++
include/rdma/ib_verbs.h | 44 ++++
include/rdma/restrack.h | 4 +
include/uapi/rdma/efa-abi.h | 19 ++
include/uapi/rdma/ib_user_ioctl_cmds.h | 38 +++
include/uapi/rdma/ib_user_ioctl_verbs.h | 19 ++
include/uapi/rdma/ib_user_verbs.h | 2 +-
23 files changed, 1024 insertions(+), 10 deletions(-)
create mode 100644 drivers/infiniband/core/uverbs_std_types_comp_cntr.c
--
2.47.3
* [PATCH for-next v4 1/5] RDMA/core: Add Completion Counters support
From: Michael Margolin @ 2026-05-11 22:37 UTC
To: jgg, leon, linux-rdma; +Cc: sleybo, matua, gal.pressman, Yonatan Nachum
Add core infrastructure for Completion Counters, a lightweight
alternative to polling a CQ for tracking operation completions.
Define the UVERBS_OBJECT_COMP_CNTR ioctl object with create, destroy,
modify and read methods for both success and error counters. Add a QP
attach method on the QP object to associate a completion counter with a
queue pair.
Add ib_comp_cntr struct, ib_comp_cntr_attach_attr, device ops, and
DECLARE_RDMA_OBJ_SIZE for driver object allocation.
Only userspace Completion Counters are supported at this stage.
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
---
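For illustration only (not part of the patch): a small userspace sketch of
the op-mask check that the QP attach method relies on. The bit values are
copied from enum ib_uverbs_comp_cntr_attach_op in this patch; the helper
name check_attach_op_mask and the ATTACH_OP_* names are hypothetical, and
returning -EINVAL for unknown bits is an assumption of the sketch rather
than a statement about what uverbs_get_flags32() returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit layout copied from enum ib_uverbs_comp_cntr_attach_op. */
enum attach_op {
	ATTACH_OP_SEND              = 1 << 0,
	ATTACH_OP_RECV              = 1 << 1,
	ATTACH_OP_RDMA_READ         = 1 << 2,
	ATTACH_OP_REMOTE_RDMA_READ  = 1 << 3,
	ATTACH_OP_RDMA_WRITE        = 1 << 4,
	ATTACH_OP_REMOTE_RDMA_WRITE = 1 << 5,
};

#define ATTACH_OP_ALL                                              \
	(ATTACH_OP_SEND | ATTACH_OP_RECV | ATTACH_OP_RDMA_READ |   \
	 ATTACH_OP_REMOTE_RDMA_READ | ATTACH_OP_RDMA_WRITE |       \
	 ATTACH_OP_REMOTE_RDMA_WRITE)

/*
 * Hypothetical helper: accept a user-supplied mask only if every set bit
 * is a known attach op. On failure *op_mask is left untouched.
 */
int check_attach_op_mask(uint64_t requested, uint32_t *op_mask)
{
	assert(op_mask);
	if (requested & ~(uint64_t)ATTACH_OP_ALL)
		return -EINVAL; /* assumption: unknown bits are rejected */
	*op_mask = (uint32_t)requested;
	return 0;
}
```

Since the OP_MASK attribute is declared UA_OPTIONAL, an empty mask is a
possible input; the sketch accepts it and leaves the interpretation to the
driver.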
drivers/infiniband/core/Makefile | 1 +
drivers/infiniband/core/device.c | 6 +
drivers/infiniband/core/rdma_core.h | 1 +
drivers/infiniband/core/uverbs_cmd.c | 1 +
.../core/uverbs_std_types_comp_cntr.c | 174 ++++++++++++++++++
drivers/infiniband/core/uverbs_std_types_qp.c | 45 ++++-
drivers/infiniband/core/uverbs_uapi.c | 1 +
include/rdma/ib_verbs.h | 40 ++++
include/uapi/rdma/ib_user_ioctl_cmds.h | 38 ++++
include/uapi/rdma/ib_user_ioctl_verbs.h | 19 ++
include/uapi/rdma/ib_user_verbs.h | 2 +-
11 files changed, 326 insertions(+), 2 deletions(-)
create mode 100644 drivers/infiniband/core/uverbs_std_types_comp_cntr.c
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index dce798d8cfe6..4767339608a1 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -35,6 +35,7 @@ ib_umad-y := user_mad.o
ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
rdma_core.o uverbs_std_types.o uverbs_ioctl.o \
uverbs_std_types_cq.o \
+ uverbs_std_types_comp_cntr.o \
uverbs_std_types_dmabuf.o \
uverbs_std_types_dmah.o \
uverbs_std_types_flow_action.o uverbs_std_types_dm.o \
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index b89efaaa81ec..18d809e59afa 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2734,6 +2734,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, create_ah);
SET_DEVICE_OP(dev_ops, create_counters);
SET_DEVICE_OP(dev_ops, create_cq);
+ SET_DEVICE_OP(dev_ops, create_comp_cntr);
SET_DEVICE_OP(dev_ops, create_user_cq);
SET_DEVICE_OP(dev_ops, create_flow);
SET_DEVICE_OP(dev_ops, create_qp);
@@ -2754,6 +2755,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, destroy_ah);
SET_DEVICE_OP(dev_ops, destroy_counters);
SET_DEVICE_OP(dev_ops, destroy_cq);
+ SET_DEVICE_OP(dev_ops, destroy_comp_cntr);
SET_DEVICE_OP(dev_ops, destroy_flow);
SET_DEVICE_OP(dev_ops, destroy_flow_action);
SET_DEVICE_OP(dev_ops, destroy_qp);
@@ -2805,6 +2807,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, modify_hw_stat);
SET_DEVICE_OP(dev_ops, modify_port);
SET_DEVICE_OP(dev_ops, modify_qp);
+ SET_DEVICE_OP(dev_ops, qp_attach_comp_cntr);
SET_DEVICE_OP(dev_ops, modify_srq);
SET_DEVICE_OP(dev_ops, modify_wq);
SET_DEVICE_OP(dev_ops, peek_cq);
@@ -2828,12 +2831,14 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, query_ucontext);
SET_DEVICE_OP(dev_ops, rdma_netdev_get_params);
SET_DEVICE_OP(dev_ops, read_counters);
+ SET_DEVICE_OP(dev_ops, read_comp_cntr);
SET_DEVICE_OP(dev_ops, reg_dm_mr);
SET_DEVICE_OP(dev_ops, reg_user_mr);
SET_DEVICE_OP(dev_ops, reg_user_mr_dmabuf);
SET_DEVICE_OP(dev_ops, req_notify_cq);
SET_DEVICE_OP(dev_ops, rereg_user_mr);
SET_DEVICE_OP(dev_ops, resize_user_cq);
+ SET_DEVICE_OP(dev_ops, modify_comp_cntr);
SET_DEVICE_OP(dev_ops, set_vf_guid);
SET_DEVICE_OP(dev_ops, set_vf_link_state);
SET_DEVICE_OP(dev_ops, ufile_hw_cleanup);
@@ -2842,6 +2847,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_OBJ_SIZE(dev_ops, ib_ah);
SET_OBJ_SIZE(dev_ops, ib_counters);
SET_OBJ_SIZE(dev_ops, ib_cq);
+ SET_OBJ_SIZE(dev_ops, ib_comp_cntr);
SET_OBJ_SIZE(dev_ops, ib_dmah);
SET_OBJ_SIZE(dev_ops, ib_mw);
SET_OBJ_SIZE(dev_ops, ib_pd);
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
index 269b393799ab..2569550e4c6d 100644
--- a/drivers/infiniband/core/rdma_core.h
+++ b/drivers/infiniband/core/rdma_core.h
@@ -156,6 +156,7 @@ uverbs_api_ioctl_handler_fn uverbs_get_handler_fn(struct ib_udata *udata);
extern const struct uapi_definition uverbs_def_obj_async_fd[];
extern const struct uapi_definition uverbs_def_obj_counters[];
+extern const struct uapi_definition uverbs_def_obj_comp_cntr[];
extern const struct uapi_definition uverbs_def_obj_cq[];
extern const struct uapi_definition uverbs_def_obj_device[];
extern const struct uapi_definition uverbs_def_obj_dm[];
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a768436ba468..4bc493b3b624 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3673,6 +3673,7 @@ static int ib_uverbs_ex_query_device(struct uverbs_attr_bundle *attrs)
resp.cq_moderation_caps.max_cq_moderation_period =
attr.cq_caps.max_cq_moderation_period;
resp.max_dm_size = attr.max_dm_size;
+ resp.max_comp_cntr = attr.max_comp_cntr;
resp.response_length = uverbs_response_length(attrs, sizeof(resp));
return uverbs_response(attrs, &resp, sizeof(resp));
diff --git a/drivers/infiniband/core/uverbs_std_types_comp_cntr.c b/drivers/infiniband/core/uverbs_std_types_comp_cntr.c
new file mode 100644
index 000000000000..c1cf0f59d483
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_std_types_comp_cntr.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright Amazon.com, Inc. or its affiliates. All rights reserved.
+ */
+
+#include <rdma/uverbs_std_types.h>
+#include "rdma_core.h"
+#include "uverbs.h"
+
+static int uverbs_free_comp_cntr(struct ib_uobject *uobject, enum rdma_remove_reason why,
+ struct uverbs_attr_bundle *attrs)
+{
+ struct ib_comp_cntr *cc = uobject->object;
+ int ret;
+
+ ret = cc->device->ops.destroy_comp_cntr(cc);
+ if (ret)
+ return ret;
+
+ kfree(cc);
+ return 0;
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_COMP_CNTR_CREATE)(struct uverbs_attr_bundle *attrs)
+{
+ struct ib_uobject *uobj = uverbs_attr_get_uobject(attrs,
+ UVERBS_ATTR_CREATE_COMP_CNTR_HANDLE);
+ struct ib_device *ib_dev = attrs->context->device;
+ struct ib_comp_cntr *cc;
+ int ret;
+
+ if (!ib_dev->ops.create_comp_cntr ||
+ !ib_dev->ops.destroy_comp_cntr ||
+ !ib_dev->ops.qp_attach_comp_cntr)
+ return -EOPNOTSUPP;
+
+ cc = rdma_zalloc_drv_obj(ib_dev, ib_comp_cntr);
+ if (!cc)
+ return -ENOMEM;
+
+ cc->device = ib_dev;
+ cc->uobject = uobj;
+
+ ret = ib_dev->ops.create_comp_cntr(cc, attrs);
+ if (ret)
+ goto err_free;
+
+ uobj->object = cc;
+ uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_CREATE_COMP_CNTR_HANDLE);
+
+ ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_COMP_CNTR_RESP_COUNT_MAX_VALUE,
+ &cc->comp_count_max_value, sizeof(cc->comp_count_max_value));
+ if (ret)
+ return ret;
+
+ ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_COMP_CNTR_RESP_ERR_COUNT_MAX_VALUE,
+ &cc->err_count_max_value, sizeof(cc->err_count_max_value));
+ return ret;
+
+err_free:
+ kfree(cc);
+ return ret;
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_COMP_CNTR_MODIFY)(struct uverbs_attr_bundle *attrs)
+{
+ struct ib_comp_cntr *cc = uverbs_attr_get_obj(attrs, UVERBS_ATTR_MODIFY_COMP_CNTR_HANDLE);
+ enum ib_comp_cntr_modify_op op;
+ enum ib_comp_cntr_entry entry;
+ u64 value;
+ int ret;
+
+ if (!cc->device->ops.modify_comp_cntr)
+ return -EOPNOTSUPP;
+
+ ret = uverbs_get_const(&entry, attrs, UVERBS_ATTR_MODIFY_COMP_CNTR_ENTRY);
+ if (ret)
+ return ret;
+
+ ret = uverbs_get_const(&op, attrs, UVERBS_ATTR_MODIFY_COMP_CNTR_OP);
+ if (ret)
+ return ret;
+
+ ret = uverbs_copy_from(&value, attrs, UVERBS_ATTR_MODIFY_COMP_CNTR_VALUE);
+ if (ret)
+ return ret;
+
+ return cc->device->ops.modify_comp_cntr(cc, entry, op, value);
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_COMP_CNTR_READ)(struct uverbs_attr_bundle *attrs)
+{
+ struct ib_comp_cntr *cc = uverbs_attr_get_obj(attrs, UVERBS_ATTR_READ_COMP_CNTR_HANDLE);
+ enum ib_comp_cntr_entry entry;
+ u64 value;
+ int ret;
+
+ if (!cc->device->ops.read_comp_cntr)
+ return -EOPNOTSUPP;
+
+ ret = uverbs_get_const(&entry, attrs, UVERBS_ATTR_READ_COMP_CNTR_ENTRY);
+ if (ret)
+ return ret;
+
+ ret = cc->device->ops.read_comp_cntr(cc, entry, &value);
+ if (ret)
+ return ret;
+
+ return uverbs_copy_to(attrs, UVERBS_ATTR_READ_COMP_CNTR_RESP_VALUE, &value, sizeof(value));
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+ UVERBS_METHOD_COMP_CNTR_CREATE,
+ UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_COMP_CNTR_HANDLE,
+ UVERBS_OBJECT_COMP_CNTR,
+ UVERBS_ACCESS_NEW,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_COMP_CNTR_RESP_COUNT_MAX_VALUE,
+ UVERBS_ATTR_TYPE(u64),
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_COMP_CNTR_RESP_ERR_COUNT_MAX_VALUE,
+ UVERBS_ATTR_TYPE(u64),
+ UA_MANDATORY),
+ UVERBS_ATTR_UHW());
+
+DECLARE_UVERBS_NAMED_METHOD_DESTROY(
+ UVERBS_METHOD_COMP_CNTR_DESTROY,
+ UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_COMP_CNTR_HANDLE,
+ UVERBS_OBJECT_COMP_CNTR,
+ UVERBS_ACCESS_DESTROY,
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD(
+ UVERBS_METHOD_COMP_CNTR_MODIFY,
+ UVERBS_ATTR_IDR(UVERBS_ATTR_MODIFY_COMP_CNTR_HANDLE,
+ UVERBS_OBJECT_COMP_CNTR,
+ UVERBS_ACCESS_WRITE,
+ UA_MANDATORY),
+ UVERBS_ATTR_CONST_IN(UVERBS_ATTR_MODIFY_COMP_CNTR_ENTRY,
+ enum ib_uverbs_comp_cntr_entry,
+ UA_MANDATORY),
+ UVERBS_ATTR_CONST_IN(UVERBS_ATTR_MODIFY_COMP_CNTR_OP,
+ enum ib_uverbs_comp_cntr_modify_op,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_IN(UVERBS_ATTR_MODIFY_COMP_CNTR_VALUE,
+ UVERBS_ATTR_TYPE(u64),
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD(
+ UVERBS_METHOD_COMP_CNTR_READ,
+ UVERBS_ATTR_IDR(UVERBS_ATTR_READ_COMP_CNTR_HANDLE,
+ UVERBS_OBJECT_COMP_CNTR,
+ UVERBS_ACCESS_READ,
+ UA_MANDATORY),
+ UVERBS_ATTR_CONST_IN(UVERBS_ATTR_READ_COMP_CNTR_ENTRY,
+ enum ib_uverbs_comp_cntr_entry,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_READ_COMP_CNTR_RESP_VALUE,
+ UVERBS_ATTR_TYPE(u64),
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_OBJECT(
+ UVERBS_OBJECT_COMP_CNTR,
+ UVERBS_TYPE_ALLOC_IDR(uverbs_free_comp_cntr),
+ &UVERBS_METHOD(UVERBS_METHOD_COMP_CNTR_CREATE),
+ &UVERBS_METHOD(UVERBS_METHOD_COMP_CNTR_DESTROY),
+ &UVERBS_METHOD(UVERBS_METHOD_COMP_CNTR_MODIFY),
+ &UVERBS_METHOD(UVERBS_METHOD_COMP_CNTR_READ));
+
+const struct uapi_definition uverbs_def_obj_comp_cntr[] = {
+ UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_COMP_CNTR,
+ UAPI_DEF_OBJ_NEEDS_FN(destroy_comp_cntr)),
+ {}
+};
diff --git a/drivers/infiniband/core/uverbs_std_types_qp.c b/drivers/infiniband/core/uverbs_std_types_qp.c
index be0730e8509e..dec4c0ebb41c 100644
--- a/drivers/infiniband/core/uverbs_std_types_qp.c
+++ b/drivers/infiniband/core/uverbs_std_types_qp.c
@@ -367,11 +367,54 @@ DECLARE_UVERBS_NAMED_METHOD(
UVERBS_ATTR_TYPE(struct ib_uverbs_destroy_qp_resp),
UA_MANDATORY));
+static int UVERBS_HANDLER(UVERBS_METHOD_QP_ATTACH_COMP_CNTR)(
+ struct uverbs_attr_bundle *attrs)
+{
+ struct ib_uobject *qp_uobj = uverbs_attr_get_uobject(
+ attrs, UVERBS_ATTR_QP_ATTACH_COMP_CNTR_HANDLE);
+ struct ib_comp_cntr *cc = uverbs_attr_get_obj(
+ attrs, UVERBS_ATTR_QP_ATTACH_COMP_CNTR_CNTR_HANDLE);
+ struct ib_comp_cntr_attach_attr attr = {};
+ struct ib_qp *qp = qp_uobj->object;
+ int ret;
+
+ if (!cc->device->ops.qp_attach_comp_cntr)
+ return -EOPNOTSUPP;
+
+ ret = uverbs_get_flags32(&attr.op_mask, attrs,
+ UVERBS_ATTR_QP_ATTACH_COMP_CNTR_OP_MASK,
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_SEND |
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_RECV |
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_RDMA_READ |
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_READ |
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_RDMA_WRITE |
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_WRITE);
+ if (ret)
+ return ret;
+
+ return qp->device->ops.qp_attach_comp_cntr(qp, cc, &attr);
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+ UVERBS_METHOD_QP_ATTACH_COMP_CNTR,
+ UVERBS_ATTR_IDR(UVERBS_ATTR_QP_ATTACH_COMP_CNTR_HANDLE,
+ UVERBS_OBJECT_QP,
+ UVERBS_ACCESS_WRITE,
+ UA_MANDATORY),
+ UVERBS_ATTR_IDR(UVERBS_ATTR_QP_ATTACH_COMP_CNTR_CNTR_HANDLE,
+ UVERBS_OBJECT_COMP_CNTR,
+ UVERBS_ACCESS_READ,
+ UA_MANDATORY),
+ UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_QP_ATTACH_COMP_CNTR_OP_MASK,
+ enum ib_uverbs_comp_cntr_attach_op,
+ UA_OPTIONAL));
+
DECLARE_UVERBS_NAMED_OBJECT(
UVERBS_OBJECT_QP,
UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uqp_object), uverbs_free_qp),
&UVERBS_METHOD(UVERBS_METHOD_QP_CREATE),
- &UVERBS_METHOD(UVERBS_METHOD_QP_DESTROY));
+ &UVERBS_METHOD(UVERBS_METHOD_QP_DESTROY),
+ &UVERBS_METHOD(UVERBS_METHOD_QP_ATTACH_COMP_CNTR));
const struct uapi_definition uverbs_def_obj_qp[] = {
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_QP,
diff --git a/drivers/infiniband/core/uverbs_uapi.c b/drivers/infiniband/core/uverbs_uapi.c
index 31b248295854..a3f42a50a14f 100644
--- a/drivers/infiniband/core/uverbs_uapi.c
+++ b/drivers/infiniband/core/uverbs_uapi.c
@@ -628,6 +628,7 @@ void uverbs_destroy_api(struct uverbs_api *uapi)
static const struct uapi_definition uverbs_core_api[] = {
UAPI_DEF_CHAIN(uverbs_def_obj_async_fd),
UAPI_DEF_CHAIN(uverbs_def_obj_counters),
+ UAPI_DEF_CHAIN(uverbs_def_obj_comp_cntr),
UAPI_DEF_CHAIN(uverbs_def_obj_cq),
UAPI_DEF_CHAIN(uverbs_def_obj_device),
UAPI_DEF_CHAIN(uverbs_def_obj_dm),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9dd76f489a0b..f36a6d48790a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -453,6 +453,7 @@ struct ib_device_attr {
u64 max_dm_size;
/* Max entries for sgl for optimized performance per READ */
u32 max_sgl_rd;
+ u32 max_comp_cntr;
};
enum ib_mtu {
@@ -1746,6 +1747,36 @@ struct ib_cq {
struct rdma_restrack_entry res;
};
+struct ib_comp_cntr {
+ struct ib_device *device;
+ struct ib_uobject *uobject;
+ u64 comp_count_max_value;
+ u64 err_count_max_value;
+};
+
+enum ib_comp_cntr_entry {
+ IB_COMP_CNTR_ENTRY_COMP = IB_UVERBS_COMP_CNTR_ENTRY_COMP,
+ IB_COMP_CNTR_ENTRY_ERR = IB_UVERBS_COMP_CNTR_ENTRY_ERR,
+};
+
+enum ib_comp_cntr_modify_op {
+ IB_COMP_CNTR_MODIFY_OP_SET = IB_UVERBS_COMP_CNTR_MODIFY_OP_SET,
+ IB_COMP_CNTR_MODIFY_OP_INC = IB_UVERBS_COMP_CNTR_MODIFY_OP_INC,
+};
+
+enum ib_comp_cntr_attach_op {
+ IB_COMP_CNTR_ATTACH_OP_SEND = IB_UVERBS_COMP_CNTR_ATTACH_OP_SEND,
+ IB_COMP_CNTR_ATTACH_OP_RECV = IB_UVERBS_COMP_CNTR_ATTACH_OP_RECV,
+ IB_COMP_CNTR_ATTACH_OP_RDMA_READ = IB_UVERBS_COMP_CNTR_ATTACH_OP_RDMA_READ,
+ IB_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_READ = IB_UVERBS_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_READ,
+ IB_COMP_CNTR_ATTACH_OP_RDMA_WRITE = IB_UVERBS_COMP_CNTR_ATTACH_OP_RDMA_WRITE,
+ IB_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_WRITE = IB_UVERBS_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_WRITE,
+};
+
+struct ib_comp_cntr_attach_attr {
+ u32 op_mask;
+};
+
struct ib_srq {
struct ib_device *device;
struct ib_pd *pd;
@@ -2624,6 +2655,8 @@ struct ib_device_ops {
struct ib_udata *udata);
int (*modify_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
int qp_attr_mask, struct ib_udata *udata);
+ int (*qp_attach_comp_cntr)(struct ib_qp *qp, struct ib_comp_cntr *cc,
+ struct ib_comp_cntr_attach_attr *attr);
int (*query_qp)(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr);
int (*destroy_qp)(struct ib_qp *qp, struct ib_udata *udata);
@@ -2645,6 +2678,12 @@ struct ib_device_ops {
* post_destroy_cq - Free all kernel resources
*/
void (*post_destroy_cq)(struct ib_cq *cq);
+ int (*create_comp_cntr)(struct ib_comp_cntr *cc,
+ struct uverbs_attr_bundle *attrs);
+ int (*destroy_comp_cntr)(struct ib_comp_cntr *cc);
+ int (*modify_comp_cntr)(struct ib_comp_cntr *cc, enum ib_comp_cntr_entry entry,
+ enum ib_comp_cntr_modify_op op, u64 value);
+ int (*read_comp_cntr)(struct ib_comp_cntr *cc, enum ib_comp_cntr_entry entry, u64 *value);
struct ib_mr *(*get_dma_mr)(struct ib_pd *pd, int mr_access_flags);
struct ib_mr *(*reg_user_mr)(struct ib_pd *pd, u64 start, u64 length,
u64 virt_addr, int mr_access_flags,
@@ -2878,6 +2917,7 @@ struct ib_device_ops {
DECLARE_RDMA_OBJ_SIZE(ib_ah);
DECLARE_RDMA_OBJ_SIZE(ib_counters);
DECLARE_RDMA_OBJ_SIZE(ib_cq);
+ DECLARE_RDMA_OBJ_SIZE(ib_comp_cntr);
DECLARE_RDMA_OBJ_SIZE(ib_dmah);
DECLARE_RDMA_OBJ_SIZE(ib_mw);
DECLARE_RDMA_OBJ_SIZE(ib_pd);
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 72041c1b0ea5..352d808c315d 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -57,6 +57,7 @@ enum uverbs_default_objects {
UVERBS_OBJECT_ASYNC_EVENT,
UVERBS_OBJECT_DMAH,
UVERBS_OBJECT_DMABUF,
+ UVERBS_OBJECT_COMP_CNTR,
};
enum {
@@ -165,9 +166,16 @@ enum uverbs_attrs_destroy_qp_cmd_attr_ids {
UVERBS_ATTR_DESTROY_QP_RESP,
};
+enum uverbs_attrs_qp_attach_comp_cntr_cmd_attr_ids {
+ UVERBS_ATTR_QP_ATTACH_COMP_CNTR_HANDLE,
+ UVERBS_ATTR_QP_ATTACH_COMP_CNTR_CNTR_HANDLE,
+ UVERBS_ATTR_QP_ATTACH_COMP_CNTR_OP_MASK,
+};
+
enum uverbs_methods_qp {
UVERBS_METHOD_QP_CREATE,
UVERBS_METHOD_QP_DESTROY,
+ UVERBS_METHOD_QP_ATTACH_COMP_CNTR,
};
enum uverbs_attrs_create_srq_cmd_attr_ids {
@@ -434,4 +442,34 @@ enum uverbs_attrs_query_gid_entry_cmd_attr_ids {
UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY,
};
+enum uverbs_methods_comp_cntr {
+ UVERBS_METHOD_COMP_CNTR_CREATE,
+ UVERBS_METHOD_COMP_CNTR_DESTROY,
+ UVERBS_METHOD_COMP_CNTR_MODIFY,
+ UVERBS_METHOD_COMP_CNTR_READ,
+};
+
+enum uverbs_attrs_create_comp_cntr_cmd_attr_ids {
+ UVERBS_ATTR_CREATE_COMP_CNTR_HANDLE,
+ UVERBS_ATTR_CREATE_COMP_CNTR_RESP_COUNT_MAX_VALUE,
+ UVERBS_ATTR_CREATE_COMP_CNTR_RESP_ERR_COUNT_MAX_VALUE,
+};
+
+enum uverbs_attrs_destroy_comp_cntr_cmd_attr_ids {
+ UVERBS_ATTR_DESTROY_COMP_CNTR_HANDLE,
+};
+
+enum uverbs_attrs_modify_comp_cntr_cmd_attr_ids {
+ UVERBS_ATTR_MODIFY_COMP_CNTR_HANDLE,
+ UVERBS_ATTR_MODIFY_COMP_CNTR_ENTRY,
+ UVERBS_ATTR_MODIFY_COMP_CNTR_OP,
+ UVERBS_ATTR_MODIFY_COMP_CNTR_VALUE,
+};
+
+enum uverbs_attrs_read_comp_cntr_cmd_attr_ids {
+ UVERBS_ATTR_READ_COMP_CNTR_HANDLE,
+ UVERBS_ATTR_READ_COMP_CNTR_ENTRY,
+ UVERBS_ATTR_READ_COMP_CNTR_RESP_VALUE,
+};
+
#endif
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 90c5cd8e7753..70b53d5daa0c 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -273,4 +273,23 @@ struct ib_uverbs_gid_entry {
__u32 netdev_ifindex; /* It is 0 if there is no netdev associated with it */
};
+enum ib_uverbs_comp_cntr_entry {
+ IB_UVERBS_COMP_CNTR_ENTRY_COMP,
+ IB_UVERBS_COMP_CNTR_ENTRY_ERR,
+};
+
+enum ib_uverbs_comp_cntr_modify_op {
+ IB_UVERBS_COMP_CNTR_MODIFY_OP_SET,
+ IB_UVERBS_COMP_CNTR_MODIFY_OP_INC,
+};
+
+enum ib_uverbs_comp_cntr_attach_op {
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_SEND = 1 << 0,
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_RECV = 1 << 1,
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_RDMA_READ = 1 << 2,
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_READ = 1 << 3,
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_RDMA_WRITE = 1 << 4,
+ IB_UVERBS_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_WRITE = 1 << 5,
+};
+
#endif
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 3b7bd99813e9..45d142f4a7f8 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -299,7 +299,7 @@ struct ib_uverbs_ex_query_device_resp {
struct ib_uverbs_cq_moderation_caps cq_moderation_caps;
__aligned_u64 max_dm_size;
__u32 xrc_odp_caps;
- __u32 reserved;
+ __u32 max_comp_cntr;
};
struct ib_uverbs_query_port {
--
2.47.3
* [PATCH for-next v4 2/5] RDMA/core: Prevent destroying in-use completion counters
From: Michael Margolin @ 2026-05-11 22:37 UTC
To: jgg, leon, linux-rdma; +Cc: sleybo, matua, gal.pressman, Yonatan Nachum
Reject comp_cntr destroy while it is attached to any QP. Track
attachments using an xarray in ib_qp keyed by the attach op_mask.
Use op bitmask to reject overlapping attaches early.
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
---
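For illustration only (not part of the patch): a userspace sketch of the
bookkeeping described in the commit message. A QP remembers which ops
already have a counter attached and rejects overlapping attaches, each
attach bumps the counter's use count, and a counter cannot be destroyed
while that count is non-zero. The names qp_model, cntr_model and the
fixed-size array are hypothetical stand-ins; the patch itself uses an
xarray in ib_qp and an atomic_t in ib_comp_cntr.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ATTACH 8 /* hypothetical bound; the patch uses an xarray */

struct cntr_model {
	int usecnt; /* number of QP attachments referencing this counter */
};

struct qp_model {
	uint32_t attached_mask;               /* union of attached op masks */
	struct cntr_model *attached[MAX_ATTACH];
	int nr_attached;
};

int qp_model_attach(struct qp_model *qp, struct cntr_model *cc,
		    uint32_t op_mask)
{
	assert(qp && cc);
	/* An op may be counted by at most one counter per QP. */
	if (op_mask & qp->attached_mask)
		return -EBUSY;
	if (qp->nr_attached == MAX_ATTACH)
		return -ENOMEM;

	qp->attached[qp->nr_attached++] = cc;
	cc->usecnt++;
	qp->attached_mask |= op_mask;
	return 0;
}

/* Destroy is refused while any QP still references the counter. */
int cntr_model_destroy(struct cntr_model *cc)
{
	return cc->usecnt ? -EBUSY : 0;
}

/* Destroying the QP drops every reference it took. */
void qp_model_destroy(struct qp_model *qp)
{
	for (int i = 0; i < qp->nr_attached; i++)
		qp->attached[i]->usecnt--;
	qp->nr_attached = 0;
	qp->attached_mask = 0;
}
```

The ordering mirrors the intent of the patch: the overlap check happens
before any state changes, and references are released only from the QP
destroy path.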
.../core/uverbs_std_types_comp_cntr.c | 3 +++
drivers/infiniband/core/uverbs_std_types_qp.c | 22 ++++++++++++++++++-
drivers/infiniband/core/verbs.c | 1 +
include/rdma/ib_verbs.h | 3 +++
4 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/uverbs_std_types_comp_cntr.c b/drivers/infiniband/core/uverbs_std_types_comp_cntr.c
index c1cf0f59d483..d64ec4c296dd 100644
--- a/drivers/infiniband/core/uverbs_std_types_comp_cntr.c
+++ b/drivers/infiniband/core/uverbs_std_types_comp_cntr.c
@@ -13,6 +13,9 @@ static int uverbs_free_comp_cntr(struct ib_uobject *uobject, enum rdma_remove_re
struct ib_comp_cntr *cc = uobject->object;
int ret;
+ if (atomic_read(&cc->usecnt))
+ return -EBUSY;
+
ret = cc->device->ops.destroy_comp_cntr(cc);
if (ret)
return ret;
diff --git a/drivers/infiniband/core/uverbs_std_types_qp.c b/drivers/infiniband/core/uverbs_std_types_qp.c
index dec4c0ebb41c..51a4639ef053 100644
--- a/drivers/infiniband/core/uverbs_std_types_qp.c
+++ b/drivers/infiniband/core/uverbs_std_types_qp.c
@@ -15,6 +15,8 @@ static int uverbs_free_qp(struct ib_uobject *uobject,
struct ib_qp *qp = uobject->object;
struct ib_uqp_object *uqp =
container_of(uobject, struct ib_uqp_object, uevent.uobject);
+ struct ib_comp_cntr *cc;
+ unsigned long index;
int ret;
/*
@@ -35,6 +37,10 @@ static int uverbs_free_qp(struct ib_uobject *uobject,
if (ret)
return ret;
+ xa_for_each(&qp->comp_cntrs, index, cc)
+ atomic_dec(&cc->usecnt);
+ xa_destroy(&qp->comp_cntrs);
+
if (uqp->uxrcd)
atomic_dec(&uqp->uxrcd->refcnt);
@@ -392,7 +398,21 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QP_ATTACH_COMP_CNTR)(
if (ret)
return ret;
- return qp->device->ops.qp_attach_comp_cntr(qp, cc, &attr);
+ if (attr.op_mask & qp->comp_cntr_op_mask)
+ return -EBUSY;
+
+ ret = qp->device->ops.qp_attach_comp_cntr(qp, cc, &attr);
+ if (ret)
+ return ret;
+
+ ret = xa_err(xa_store(&qp->comp_cntrs, attr.op_mask, cc, GFP_KERNEL));
+ if (ret)
+ return ret;
+
+ atomic_inc(&cc->usecnt);
+ qp->comp_cntr_op_mask |= attr.op_mask;
+
+ return 0;
}
DECLARE_UVERBS_NAMED_METHOD(
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac87de9cc67..df9a1bb9ece4 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1293,6 +1293,7 @@ static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
qp->qp_context = attr->qp_context;
spin_lock_init(&qp->mr_lock);
+ xa_init(&qp->comp_cntrs);
INIT_LIST_HEAD(&qp->rdma_mrs);
INIT_LIST_HEAD(&qp->sig_mrs);
init_completion(&qp->srq_completion);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f36a6d48790a..270b49a7d174 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1752,6 +1752,7 @@ struct ib_comp_cntr {
struct ib_uobject *uobject;
u64 comp_count_max_value;
u64 err_count_max_value;
+ atomic_t usecnt;
};
enum ib_comp_cntr_entry {
@@ -1947,6 +1948,8 @@ struct ib_qp {
struct completion srq_completion;
struct ib_xrcd *xrcd; /* XRC TGT QPs only */
struct list_head xrcd_list;
+ struct xarray comp_cntrs; /* op_mask -> comp_cntr */
+ u32 comp_cntr_op_mask;
/* count times opened, mcast attaches, flow attaches */
atomic_t usecnt;
--
2.47.3
* [PATCH for-next v4 3/5] RDMA/core: Add Completion Counters to resource tracking
From: Michael Margolin @ 2026-05-11 22:37 UTC
To: jgg, leon, linux-rdma; +Cc: sleybo, matua, gal.pressman, Yonatan Nachum
Track completion counter objects in the resource tracking database so
they are visible through the rdma netlink interface. The rdma tool
displays the comp_cntr count in the resource summary.
Add RDMA_RESTRACK_COMP_CNTR type, embed rdma_restrack_entry in
ib_comp_cntr, and add the res_to_dev mapping. Register the resource
on create and remove it on destroy.
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
---
drivers/infiniband/core/nldev.c | 1 +
drivers/infiniband/core/restrack.c | 2 ++
drivers/infiniband/core/uverbs_std_types_comp_cntr.c | 6 ++++++
include/rdma/ib_verbs.h | 1 +
include/rdma/restrack.h | 4 ++++
5 files changed, 14 insertions(+)
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 96c745d5bac4..155954fef3e2 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -446,6 +446,7 @@ static int fill_res_info(struct sk_buff *msg, struct ib_device *device,
[RDMA_RESTRACK_MR] = "mr",
[RDMA_RESTRACK_CTX] = "ctx",
[RDMA_RESTRACK_SRQ] = "srq",
+ [RDMA_RESTRACK_COMP_CNTR] = "comp_cntr",
};
struct nlattr *table_attr;
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index ac3688952cab..d152cc5f042b 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -102,6 +102,8 @@ static struct ib_device *res_to_dev(struct rdma_restrack_entry *res)
return container_of(res, struct ib_srq, res)->device;
case RDMA_RESTRACK_DMAH:
return container_of(res, struct ib_dmah, res)->device;
+ case RDMA_RESTRACK_COMP_CNTR:
+ return container_of(res, struct ib_comp_cntr, res)->device;
default:
WARN_ONCE(true, "Wrong resource tracking type %u\n", res->type);
return NULL;
diff --git a/drivers/infiniband/core/uverbs_std_types_comp_cntr.c b/drivers/infiniband/core/uverbs_std_types_comp_cntr.c
index d64ec4c296dd..cfdbd712ea34 100644
--- a/drivers/infiniband/core/uverbs_std_types_comp_cntr.c
+++ b/drivers/infiniband/core/uverbs_std_types_comp_cntr.c
@@ -6,6 +6,7 @@
#include <rdma/uverbs_std_types.h>
#include "rdma_core.h"
#include "uverbs.h"
+#include "restrack.h"
static int uverbs_free_comp_cntr(struct ib_uobject *uobject, enum rdma_remove_reason why,
struct uverbs_attr_bundle *attrs)
@@ -20,6 +21,7 @@ static int uverbs_free_comp_cntr(struct ib_uobject *uobject, enum rdma_remove_re
if (ret)
return ret;
+ rdma_restrack_del(&cc->res);
kfree(cc);
return 0;
}
@@ -48,7 +50,11 @@ static int UVERBS_HANDLER(UVERBS_METHOD_COMP_CNTR_CREATE)(struct uverbs_attr_bun
if (ret)
goto err_free;
+ rdma_restrack_new(&cc->res, RDMA_RESTRACK_COMP_CNTR);
+ rdma_restrack_set_name(&cc->res, NULL);
+
uobj->object = cc;
+ rdma_restrack_add(&cc->res);
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_CREATE_COMP_CNTR_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_COMP_CNTR_RESP_COUNT_MAX_VALUE,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 270b49a7d174..b644a1d8bb90 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1753,6 +1753,7 @@ struct ib_comp_cntr {
u64 comp_count_max_value;
u64 err_count_max_value;
atomic_t usecnt;
+ struct rdma_restrack_entry res;
};
enum ib_comp_cntr_entry {
diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index 451f99e3717d..4ab72bc6d8c7 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -60,6 +60,10 @@ enum rdma_restrack_type {
* @RDMA_RESTRACK_DMAH: DMA handle
*/
RDMA_RESTRACK_DMAH,
+ /**
+ * @RDMA_RESTRACK_COMP_CNTR: Completion Counter
+ */
+ RDMA_RESTRACK_COMP_CNTR,
/**
* @RDMA_RESTRACK_MAX: Last entry, used for array dclarations
*/
--
2.47.3
* [PATCH for-next v4 4/5] RDMA/efa: Update device interface
From: Michael Margolin @ 2026-05-11 22:37 UTC (permalink / raw)
To: jgg, leon, linux-rdma
Cc: sleybo, matua, gal.pressman, Daniel Kinsbursky, Yonatan Nachum
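Update the device interface definitions: add the event counter admin
commands, capability bit and queue attributes, the 128-byte Tx WQE
layout, Tx processing hints and a new completion status.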
Reviewed-by: Daniel Kinsbursky <dkinsb@amazon.com>
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
---
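Illustration only, not part of the patch: a minimal sketch of how a
requested QP event mask could be checked against the
supported_counter_qp_events field defined below. The bit positions are
copied from the layout in this patch; the names qp_event,
QP_EVENT_VALID_MASK and qp_events_ok are hypothetical, and nothing in
this series shows the driver performing this check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the counter QP events layout in this patch. */
enum qp_event {
	QP_EVENT_SEND_COMP         = 1u << 0,
	QP_EVENT_SEND_COMP_ERR     = 1u << 1,
	QP_EVENT_RECV_COMP         = 1u << 2,
	QP_EVENT_RECV_COMP_ERR     = 1u << 3,
	QP_EVENT_READ_COMP         = 1u << 4,
	QP_EVENT_READ_COMP_ERR     = 1u << 5,
	QP_EVENT_WRITE_COMP        = 1u << 6,
	QP_EVENT_WRITE_COMP_ERR    = 1u << 7,
	QP_EVENT_REMOTE_READ_COMP  = 1u << 8,
	QP_EVENT_REMOTE_WRITE_COMP = 1u << 9,
};

/* Bits 9:0 are defined; bits 31:10 are reserved and must be zero. */
#define QP_EVENT_VALID_MASK \
	((uint32_t)((QP_EVENT_REMOTE_WRITE_COMP << 1) - 1))

static_assert(QP_EVENT_VALID_MASK == 0x3ffu, "ten event bits expected");

/*
 * Hypothetical helper: true when every requested event is a defined bit
 * and is also reported as supported by the device.
 */
static bool qp_events_ok(uint32_t supported, uint32_t requested)
{
	if (requested & ~QP_EVENT_VALID_MASK)
		return false;

	return (requested & ~supported) == 0;
}
```

Success events sit on even bits and their error counterparts on the
following odd bits; the two remote events have no error counterpart.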
.../infiniband/hw/efa/efa_admin_cmds_defs.h | 185 +++++++++++++++++-
drivers/infiniband/hw/efa/efa_io_defs.h | 63 +++++-
2 files changed, 242 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h b/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
index ad34ea5da6b0..2d75edabeefa 100644
--- a/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
+++ b/drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
@@ -31,7 +31,12 @@ enum efa_admin_aq_opcode {
EFA_ADMIN_CREATE_EQ = 18,
EFA_ADMIN_DESTROY_EQ = 19,
EFA_ADMIN_ALLOC_MR = 20,
- EFA_ADMIN_MAX_OPCODE = 20,
+ EFA_ADMIN_SERVICE = 21,
+ EFA_ADMIN_CREATE_COUNTER = 25,
+ EFA_ADMIN_DESTROY_COUNTER = 26,
+ EFA_ADMIN_ATTACH_COUNTER = 27,
+ EFA_ADMIN_MODIFY_COUNTER = 28,
+ EFA_ADMIN_MAX_OPCODE = 28,
};
enum efa_admin_aq_feature_id {
@@ -725,7 +730,9 @@ struct efa_admin_feature_device_attr_desc {
* on TX queues
* 4 : unsolicited_write_recv - If set, unsolicited
* write with imm. receive is supported
- * 31:5 : reserved - MBZ
+ * 5 : event_counters - If set, event counters are
+ * supported
+ * 31:6 : reserved - MBZ
*/
u32 device_caps;
@@ -814,6 +821,34 @@ struct efa_admin_feature_queue_attr_desc_1 {
struct efa_admin_feature_queue_attr_desc_2 {
/* Maximum size of data that can be sent inline in a Send WQE */
u16 inline_buf_size_ex;
+
+ /* MBZ */
+ u8 reserved[6];
+
+ /*
+ * Supported counter QP events
+ * 0 : send_comp
+ * 1 : send_comp_err
+ * 2 : recv_comp
+ * 3 : recv_comp_err
+ * 4 : read_comp
+ * 5 : read_comp_err
+ * 6 : write_comp
+ * 7 : write_comp_err
+ * 8 : remote_read_comp
+ * 9 : remote_write_comp
+ * 31:10 : reserved - MBZ
+ */
+ u32 supported_counter_qp_events;
+
+ /* Maximum number of counters */
+ u32 max_event_counters;
+
+ /*
+ * Maximum counter value, counter wraps around to 0 after reaching
+ * this value
+ */
+ u64 event_counter_max_val;
};
struct efa_admin_event_queue_attr_desc {
@@ -1092,6 +1127,127 @@ struct efa_admin_host_info {
u32 flags;
};
+struct efa_admin_service_cmd {
+ struct efa_admin_aq_common_desc aq_common_descriptor;
+
+ u8 buffer[60];
+};
+
+struct efa_admin_service_resp {
+ struct efa_admin_acq_common_desc acq_common_desc;
+
+ u8 buffer[56];
+};
+
+/* Create Counter command */
+struct efa_admin_create_counter_cmd {
+ struct efa_admin_aq_common_desc aq_common_descriptor;
+
+ /* UAR number */
+ u16 uar;
+
+ /* MBZ */
+ u16 reserved;
+
+ /* Counter physical address */
+ u64 paddr;
+};
+
+struct efa_admin_create_counter_resp {
+ struct efa_admin_acq_common_desc acq_common_desc;
+
+ /* Counter handle */
+ u32 cntr_handle;
+
+ /* MBZ */
+ u32 reserved;
+};
+
+struct efa_admin_destroy_counter_cmd {
+ struct efa_admin_aq_common_desc aq_common_descriptor;
+
+ /* Counter handle */
+ u32 cntr_handle;
+};
+
+struct efa_admin_destroy_counter_resp {
+ struct efa_admin_acq_common_desc acq_common_desc;
+};
+
+enum efa_admin_counter_attach_type {
+ EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS = 0,
+};
+
+struct efa_admin_counter_attach_qp_events {
+ /* QP handle */
+ u32 qp_handle;
+
+ /*
+ * Bitmask of counter QP events
+ * 0 : send_comp
+ * 1 : send_comp_err
+ * 2 : recv_comp
+ * 3 : recv_comp_err
+ * 4 : read_comp
+ * 5 : read_comp_err
+ * 6 : write_comp
+ * 7 : write_comp_err
+ * 8 : remote_read_comp
+ * 9 : remote_write_comp
+ * 31:10 : reserved - MBZ
+ */
+ u32 events;
+};
+
+struct efa_admin_attach_counter_cmd {
+ struct efa_admin_aq_common_desc aq_common_descriptor;
+
+ /* Counter handle */
+ u32 cntr_handle;
+
+ /* efa_admin_counter_attach_type */
+ u8 attach_type;
+
+ /* MBZ */
+ u8 reserved[3];
+
+ union {
+ struct efa_admin_counter_attach_qp_events qp_events;
+ } u;
+};
+
+struct efa_admin_attach_counter_resp {
+ struct efa_admin_acq_common_desc acq_common_desc;
+};
+
+/* Counter modify operations */
+enum efa_admin_counter_modify_ops {
+ /* Set counter value */
+ EFA_ADMIN_COUNTER_MODIFY_SET = 0,
+ /* Add to counter value */
+ EFA_ADMIN_COUNTER_MODIFY_ADD = 1,
+};
+
+struct efa_admin_modify_counter_cmd {
+ struct efa_admin_aq_common_desc aq_common_descriptor;
+
+ /* Counter handle */
+ u32 cntr_handle;
+
+ /* Counter operation type (efa_admin_counter_modify_ops) */
+ u8 operation;
+
+ /* MBZ */
+ u8 reserved[7];
+
+ /* Value for SET or ADD */
+ u64 value;
+};
+
+struct efa_admin_modify_counter_resp {
+ struct efa_admin_acq_common_desc acq_common_desc;
+};
+
/* create_qp_cmd */
#define EFA_ADMIN_CREATE_QP_CMD_SQ_VIRT_MASK BIT(0)
#define EFA_ADMIN_CREATE_QP_CMD_RQ_VIRT_MASK BIT(1)
@@ -1132,6 +1288,19 @@ struct efa_admin_host_info {
#define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_DATA_POLLING_128_MASK BIT(2)
#define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_RDMA_WRITE_MASK BIT(3)
#define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_UNSOLICITED_WRITE_RECV_MASK BIT(4)
+#define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_EVENT_COUNTERS_MASK BIT(5)
+
+/* feature_queue_attr_desc_2 */
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_SEND_COMP_MASK BIT(0)
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_SEND_COMP_ERR_MASK BIT(1)
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_RECV_COMP_MASK BIT(2)
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_RECV_COMP_ERR_MASK BIT(3)
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_READ_COMP_MASK BIT(4)
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_READ_COMP_ERR_MASK BIT(5)
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_WRITE_COMP_MASK BIT(6)
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_WRITE_COMP_ERR_MASK BIT(7)
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_REMOTE_READ_COMP_MASK BIT(8)
+#define EFA_ADMIN_FEATURE_QUEUE_ATTR_DESC_2_REMOTE_WRITE_COMP_MASK BIT(9)
/* create_eq_cmd */
#define EFA_ADMIN_CREATE_EQ_CMD_ENTRY_SIZE_WORDS_MASK GENMASK(4, 0)
@@ -1150,4 +1319,16 @@ struct efa_admin_host_info {
#define EFA_ADMIN_HOST_INFO_INTREE_MASK BIT(0)
#define EFA_ADMIN_HOST_INFO_GDR_MASK BIT(1)
+/* counter_attach_qp_events */
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_SEND_COMP_MASK BIT(0)
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_SEND_COMP_ERR_MASK BIT(1)
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_RECV_COMP_MASK BIT(2)
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_RECV_COMP_ERR_MASK BIT(3)
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_READ_COMP_MASK BIT(4)
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_READ_COMP_ERR_MASK BIT(5)
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_WRITE_COMP_MASK BIT(6)
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_WRITE_COMP_ERR_MASK BIT(7)
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_REMOTE_READ_COMP_MASK BIT(8)
+#define EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_REMOTE_WRITE_COMP_MASK BIT(9)
+
#endif /* _EFA_ADMIN_CMDS_H_ */
diff --git a/drivers/infiniband/hw/efa/efa_io_defs.h b/drivers/infiniband/hw/efa/efa_io_defs.h
index a4c9fd33da38..ede4b27eb951 100644
--- a/drivers/infiniband/hw/efa/efa_io_defs.h
+++ b/drivers/infiniband/hw/efa/efa_io_defs.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
/*
- * Copyright 2018-2024 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2026 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#ifndef _EFA_IO_H_
@@ -9,6 +9,7 @@
#define EFA_IO_TX_DESC_NUM_BUFS 2
#define EFA_IO_TX_DESC_NUM_RDMA_BUFS 1
#define EFA_IO_TX_DESC_INLINE_MAX_SIZE 32
+#define EFA_IO_TX_DESC_INLINE_MAX_SIZE_128 80
#define EFA_IO_TX_DESC_IMM_DATA_SIZE 4
#define EFA_IO_TX_DESC_INLINE_PBL_SIZE 1
@@ -65,6 +66,8 @@ enum efa_io_comp_status {
EFA_IO_COMP_STATUS_REMOTE_ERROR_UNKNOWN_PEER = 14,
/* Unreachable remote - never received a response */
EFA_IO_COMP_STATUS_LOCAL_ERROR_UNREACH_REMOTE = 15,
+ /* Remote feature mismatch */
+ EFA_IO_COMP_STATUS_REMOTE_ERROR_FEATURE_MISMATCH = 18,
};
enum efa_io_frwr_pbl_mode {
@@ -72,6 +75,11 @@ enum efa_io_frwr_pbl_mode {
EFA_IO_FRWR_DIRECT_PBL = 1,
};
+enum efa_io_processing_hint {
+ /* Optimize for throughput */
+ EFA_IO_PROCESSING_HINT_BURST_PPS_SENSITIVE = 1 << 0,
+};
+
struct efa_io_tx_meta_desc {
/* Verbs-generated Request ID */
u16 req_id;
@@ -121,7 +129,15 @@ struct efa_io_tx_meta_desc {
u16 ah;
- u16 reserved;
+ /*
+ * control flags
+ * 1:0 : processing_hints - Bitmask of enum
+ * efa_io_processing_hint
+ * 7:2 : reserved - MBZ
+ */
+ u8 ctrl3;
+
+ u8 reserved;
/* Queue key */
u32 qkey;
@@ -172,6 +188,19 @@ struct efa_io_rdma_req {
struct efa_io_tx_buf_desc local_mem[1];
};
+struct efa_io_rdma_req_128 {
+ /* Remote memory address */
+ struct efa_io_remote_mem_addr remote_mem;
+
+ union {
+ /* Local memory address */
+ struct efa_io_tx_buf_desc local_mem[1];
+
+ /* inline data for RDMA */
+ u8 inline_data[80];
+ };
+};
+
struct efa_io_fast_mr_reg_req {
/* Updated local key of the MR after lkey/rkey increment */
u32 lkey;
@@ -230,8 +259,8 @@ struct efa_io_fast_mr_inv_req {
};
/*
- * Tx WQE, composed of tx meta descriptors followed by either tx buffer
- * descriptors or inline data
+ * 64-byte Tx WQE, composed of tx meta descriptors followed by either tx
+ * buffer descriptors or inline data
*/
struct efa_io_tx_wqe {
/* TX meta */
@@ -254,6 +283,31 @@ struct efa_io_tx_wqe {
} data;
};
+/*
+ * 128-byte Tx WQE, composed of tx meta descriptors followed by either tx
+ * buffer descriptors or inline data
+ */
+struct efa_io_tx_wqe_128 {
+ /* TX meta */
+ struct efa_io_tx_meta_desc meta;
+
+ union {
+ /* Send buffer descriptors */
+ struct efa_io_tx_buf_desc sgl[2];
+
+ u8 inline_data[80];
+
+ /* RDMA local and remote memory addresses */
+ struct efa_io_rdma_req_128 rdma_req;
+
+ /* Fast registration */
+ struct efa_io_fast_mr_reg_req reg_mr_req;
+
+ /* Fast invalidation */
+ struct efa_io_fast_mr_inv_req inv_mr_req;
+ } data;
+};
+
/*
* Rx buffer descriptor; RX WQE is composed of one or more RX buffer
* descriptors.
@@ -365,6 +419,7 @@ struct efa_io_rx_cdesc_ex {
#define EFA_IO_TX_META_DESC_FIRST_MASK BIT(2)
#define EFA_IO_TX_META_DESC_LAST_MASK BIT(3)
#define EFA_IO_TX_META_DESC_COMP_REQ_MASK BIT(4)
+#define EFA_IO_TX_META_DESC_PROCESSING_HINTS_MASK GENMASK(1, 0)
/* tx_buf_desc */
#define EFA_IO_TX_BUF_DESC_LKEY_MASK GENMASK(23, 0)
--
2.47.3
* [PATCH for-next v4 5/5] RDMA/efa: Add Completion Counters support
2026-05-11 22:37 [PATCH for-next v4 0/5] Introduce Completion Counters Michael Margolin
` (3 preceding siblings ...)
2026-05-11 22:37 ` [PATCH for-next v4 4/5] RDMA/efa: Update device interface Michael Margolin
@ 2026-05-11 22:37 ` Michael Margolin
4 siblings, 0 replies; 6+ messages in thread
From: Michael Margolin @ 2026-05-11 22:37 UTC (permalink / raw)
To: jgg, leon, linux-rdma; +Cc: sleybo, matua, gal.pressman
Implement completion counters for the EFA device. Each completion
counter is backed by two EFA event counters, one for success
completions and one for error completions.
The driver creates the umem for each counter from a private descriptor
ioctl attribute (efa_uverbs_buffer_desc). Umem creation can later be
replaced by a core utility that is under development.
Read operations are not implemented as the counter values are accessed
directly from userspace through the mapped memory.
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
---
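Illustration only, not part of the patch: a minimal userspace-side
sketch of filling a buffer descriptor for a counter backed by ordinary
virtual memory. The struct is a local mirror of efa_uverbs_buffer_desc
from this patch using stdint types; buffer_desc, desc_init_va and
desc_validate are hypothetical names, and setting fd to -1 for the VA
case is an assumption. The validation mirrors the checks visible in the
driver (reserved words zero, known type, room for one u64).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from enum efa_uverbs_buffer_type in this patch. */
enum buf_type {
	BUF_TYPE_DMABUF = 0,
	BUF_TYPE_VA = 1,
};

/* Local mirror of struct efa_uverbs_buffer_desc (illustrative). */
struct buffer_desc {
	int32_t fd;
	uint32_t type;
	uint32_t reserved[2];
	uint64_t addr;
	uint64_t length;
};

/* Describe a counter cell that lives in process virtual memory. */
static void desc_init_va(struct buffer_desc *desc, const void *buf,
			 uint64_t length)
{
	memset(desc, 0, sizeof(*desc));	/* reserved words must be zero */
	desc->fd = -1;			/* assumption: unused for VA */
	desc->type = BUF_TYPE_VA;
	desc->addr = (uint64_t)(uintptr_t)buf;
	desc->length = length;
}

/* Returns 0 when the descriptor would pass the checks shown in the patch. */
static int desc_validate(const struct buffer_desc *desc)
{
	if (desc->reserved[0] || desc->reserved[1])
		return -1;
	if (desc->type != BUF_TYPE_VA && desc->type != BUF_TYPE_DMABUF)
		return -1;
	if (desc->length < sizeof(uint64_t))
		return -1;
	return 0;
}
```

One descriptor is passed for the completion buffer and one for the
error buffer; both attributes are mandatory on create.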
drivers/infiniband/hw/efa/efa.h | 17 +-
drivers/infiniband/hw/efa/efa_com_cmd.c | 106 +++++++++++
drivers/infiniband/hw/efa/efa_com_cmd.h | 36 ++++
drivers/infiniband/hw/efa/efa_main.c | 7 +-
drivers/infiniband/hw/efa/efa_verbs.c | 229 ++++++++++++++++++++++++
include/uapi/rdma/efa-abi.h | 19 ++
6 files changed, 412 insertions(+), 2 deletions(-)
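Illustration only, not part of the patch: since counter values are read
directly from the mapped memory, a consumer has to handle wrap-around
itself. A minimal sketch, assuming the counter takes values 0..max_val
and returns to 0 on the next event (per the event_counter_max_val
description in patch 4), and that at most one wrap happens between two
samples. cntr_read and cntr_delta are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Sample the 64-bit counter cell that the device updates in memory. */
static uint64_t cntr_read(const uint64_t *cell)
{
	/* volatile access so the compiler re-reads memory on every call */
	return *(const volatile uint64_t *)cell;
}

/*
 * Number of events between two samples of a counter that wraps to 0
 * after reaching max_val. Assumes at most one wrap between samples.
 */
static uint64_t cntr_delta(uint64_t prev, uint64_t cur, uint64_t max_val)
{
	assert(prev <= max_val && cur <= max_val);

	if (cur >= prev)
		return cur - prev;

	/* (max_val - prev) steps up to max_val, one step to 0, then cur */
	return (max_val - prev) + 1 + cur;
}
```

The max_val argument corresponds to the maximum value reported to
userspace when the counter is created.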
diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 00b19f2ba3da..eebe4172b8f7 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
/*
- * Copyright 2018-2025 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2026 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#ifndef _EFA_H_
@@ -110,6 +110,14 @@ struct efa_cq {
struct ib_umem *umem;
};
+struct efa_comp_cntr {
+ struct ib_comp_cntr ibcc;
+ struct ib_umem *comp_umem;
+ struct ib_umem *err_umem;
+ u32 comp_handle;
+ u32 err_handle;
+};
+
struct efa_qp {
struct ib_qp ibqp;
dma_addr_t rq_dma_addr;
@@ -163,6 +171,13 @@ int efa_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init_attr,
int efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
int efa_create_user_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
struct uverbs_attr_bundle *attrs);
+int efa_create_comp_cntr(struct ib_comp_cntr *ibcc,
+ struct uverbs_attr_bundle *attrs);
+int efa_destroy_comp_cntr(struct ib_comp_cntr *ibcc);
+int efa_modify_comp_cntr(struct ib_comp_cntr *ibcc, enum ib_comp_cntr_entry entry,
+ enum ib_comp_cntr_modify_op op, u64 value);
+int efa_qp_attach_comp_cntr(struct ib_qp *ibqp, struct ib_comp_cntr *ibcc,
+ struct ib_comp_cntr_attach_attr *attr);
struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
u64 virt_addr, int access_flags,
struct ib_dmah *dmah,
diff --git a/drivers/infiniband/hw/efa/efa_com_cmd.c b/drivers/infiniband/hw/efa/efa_com_cmd.c
index 63c7f07806a8..e91c405e57d2 100644
--- a/drivers/infiniband/hw/efa/efa_com_cmd.c
+++ b/drivers/infiniband/hw/efa/efa_com_cmd.c
@@ -516,6 +516,8 @@ int efa_com_get_device_attr(struct efa_com_dev *edev,
}
result->inline_buf_size_ex = resp.u.queue_attr_2.inline_buf_size_ex;
+ result->max_event_counters = resp.u.queue_attr_2.max_event_counters;
+ result->event_counter_max_val = resp.u.queue_attr_2.event_counter_max_val;
} else {
result->inline_buf_size_ex = result->inline_buf_size;
}
@@ -851,3 +853,107 @@ int efa_com_get_stats(struct efa_com_dev *edev,
return 0;
}
+
+int efa_com_create_counter(struct efa_com_dev *edev,
+ struct efa_com_create_counter_params *params,
+ struct efa_com_create_counter_result *result)
+{
+ struct efa_admin_create_counter_cmd cmd = {};
+ struct efa_com_admin_queue *aq = &edev->aq;
+ struct efa_admin_create_counter_resp resp;
+ int err;
+
+ cmd.aq_common_descriptor.opcode = EFA_ADMIN_CREATE_COUNTER;
+ cmd.uar = params->uarn;
+ cmd.paddr = params->dma_addr;
+
+ err = efa_com_cmd_exec(aq, (struct efa_admin_aq_entry *)&cmd,
+ sizeof(cmd),
+ (struct efa_admin_acq_entry *)&resp,
+ sizeof(resp));
+ if (err) {
+ ibdev_err_ratelimited(edev->efa_dev,
+ "Failed to create counter [%d]\n", err);
+ return err;
+ }
+
+ result->cntr_handle = resp.cntr_handle;
+ return 0;
+}
+
+int efa_com_destroy_counter(struct efa_com_dev *edev,
+ struct efa_com_destroy_counter_params *params)
+{
+ struct efa_admin_destroy_counter_cmd cmd = {};
+ struct efa_admin_destroy_counter_resp resp;
+ struct efa_com_admin_queue *aq = &edev->aq;
+ int err;
+
+ cmd.aq_common_descriptor.opcode = EFA_ADMIN_DESTROY_COUNTER;
+ cmd.cntr_handle = params->cntr_handle;
+
+ err = efa_com_cmd_exec(aq, (struct efa_admin_aq_entry *)&cmd,
+ sizeof(cmd),
+ (struct efa_admin_acq_entry *)&resp,
+ sizeof(resp));
+ if (err) {
+ ibdev_err_ratelimited(edev->efa_dev,
+ "Failed to destroy counter [%d]\n", err);
+ return err;
+ }
+
+ return 0;
+}
+
+int efa_com_attach_counter(struct efa_com_dev *edev,
+ struct efa_com_attach_counter_params *params)
+{
+ struct efa_admin_attach_counter_cmd cmd = {};
+ struct efa_com_admin_queue *aq = &edev->aq;
+ struct efa_admin_attach_counter_resp resp;
+ int err;
+
+ cmd.aq_common_descriptor.opcode = EFA_ADMIN_ATTACH_COUNTER;
+ cmd.cntr_handle = params->cntr_handle;
+ cmd.attach_type = EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS;
+ cmd.u.qp_events.qp_handle = params->qp_handle;
+ cmd.u.qp_events.events = params->events;
+
+ err = efa_com_cmd_exec(aq, (struct efa_admin_aq_entry *)&cmd,
+ sizeof(cmd),
+ (struct efa_admin_acq_entry *)&resp,
+ sizeof(resp));
+ if (err) {
+ ibdev_err_ratelimited(edev->efa_dev,
+ "Failed to attach counter [%d]\n", err);
+ return err;
+ }
+
+ return 0;
+}
+
+int efa_com_modify_counter(struct efa_com_dev *edev,
+ struct efa_com_modify_counter_params *params)
+{
+ struct efa_admin_modify_counter_cmd cmd = {};
+ struct efa_com_admin_queue *aq = &edev->aq;
+ struct efa_admin_modify_counter_resp resp;
+ int err;
+
+ cmd.aq_common_descriptor.opcode = EFA_ADMIN_MODIFY_COUNTER;
+ cmd.cntr_handle = params->cntr_handle;
+ cmd.operation = params->operation;
+ cmd.value = params->value;
+
+ err = efa_com_cmd_exec(aq, (struct efa_admin_aq_entry *)&cmd,
+ sizeof(cmd),
+ (struct efa_admin_acq_entry *)&resp,
+ sizeof(resp));
+ if (err) {
+ ibdev_err_ratelimited(edev->efa_dev,
+ "Failed to modify counter [%d]\n", err);
+ return err;
+ }
+
+ return 0;
+}
diff --git a/drivers/infiniband/hw/efa/efa_com_cmd.h b/drivers/infiniband/hw/efa/efa_com_cmd.h
index ef15b3c38429..9bce27d585d5 100644
--- a/drivers/infiniband/hw/efa/efa_com_cmd.h
+++ b/drivers/infiniband/hw/efa/efa_com_cmd.h
@@ -145,6 +145,8 @@ struct efa_com_get_device_attr_result {
u16 min_sq_depth;
u16 max_link_speed_gbps;
u8 db_bar;
+ u32 max_event_counters;
+ u64 event_counter_max_val;
};
struct efa_com_get_hw_hints_result {
@@ -300,6 +302,31 @@ union efa_com_get_stats_result {
struct efa_com_network_stats network_stats;
};
+struct efa_com_create_counter_params {
+ dma_addr_t dma_addr;
+ u16 uarn;
+};
+
+struct efa_com_create_counter_result {
+ u32 cntr_handle;
+};
+
+struct efa_com_destroy_counter_params {
+ u32 cntr_handle;
+};
+
+struct efa_com_attach_counter_params {
+ u32 cntr_handle;
+ u32 qp_handle;
+ u32 events;
+};
+
+struct efa_com_modify_counter_params {
+ u32 cntr_handle;
+ u8 operation;
+ u64 value;
+};
+
int efa_com_create_qp(struct efa_com_dev *edev,
struct efa_com_create_qp_params *params,
struct efa_com_create_qp_result *res);
@@ -350,5 +377,14 @@ int efa_com_dealloc_uar(struct efa_com_dev *edev,
int efa_com_get_stats(struct efa_com_dev *edev,
struct efa_com_get_stats_params *params,
union efa_com_get_stats_result *result);
+int efa_com_create_counter(struct efa_com_dev *edev,
+ struct efa_com_create_counter_params *params,
+ struct efa_com_create_counter_result *result);
+int efa_com_destroy_counter(struct efa_com_dev *edev,
+ struct efa_com_destroy_counter_params *params);
+int efa_com_attach_counter(struct efa_com_dev *edev,
+ struct efa_com_attach_counter_params *params);
+int efa_com_modify_counter(struct efa_com_dev *edev,
+ struct efa_com_modify_counter_params *params);
#endif /* _EFA_COM_CMD_H_ */
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c
index 03c237c8c81e..7aa6b401787f 100644
--- a/drivers/infiniband/hw/efa/efa_main.c
+++ b/drivers/infiniband/hw/efa/efa_main.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
/*
- * Copyright 2018-2025 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2026 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#include <linux/module.h>
@@ -372,20 +372,24 @@ static const struct ib_device_ops efa_dev_ops = {
.alloc_pd = efa_alloc_pd,
.alloc_ucontext = efa_alloc_ucontext,
.create_user_cq = efa_create_user_cq,
+ .create_comp_cntr = efa_create_comp_cntr,
.create_qp = efa_create_qp,
.create_user_ah = efa_create_ah,
.dealloc_pd = efa_dealloc_pd,
.dealloc_ucontext = efa_dealloc_ucontext,
.dereg_mr = efa_dereg_mr,
.destroy_ah = efa_destroy_ah,
+ .destroy_comp_cntr = efa_destroy_comp_cntr,
.destroy_cq = efa_destroy_cq,
.destroy_qp = efa_destroy_qp,
.get_hw_stats = efa_get_hw_stats,
.get_link_layer = efa_port_link_layer,
.get_port_immutable = efa_get_port_immutable,
+ .modify_comp_cntr = efa_modify_comp_cntr,
.mmap = efa_mmap,
.mmap_free = efa_mmap_free,
.modify_qp = efa_modify_qp,
+ .qp_attach_comp_cntr = efa_qp_attach_comp_cntr,
.query_device = efa_query_device,
.query_gid = efa_query_gid,
.query_pkey = efa_query_pkey,
@@ -396,6 +400,7 @@ static const struct ib_device_ops efa_dev_ops = {
INIT_RDMA_OBJ_SIZE(ib_ah, efa_ah, ibah),
INIT_RDMA_OBJ_SIZE(ib_cq, efa_cq, ibcq),
+ INIT_RDMA_OBJ_SIZE(ib_comp_cntr, efa_comp_cntr, ibcc),
INIT_RDMA_OBJ_SIZE(ib_pd, efa_pd, ibpd),
INIT_RDMA_OBJ_SIZE(ib_qp, efa_qp, ibqp),
INIT_RDMA_OBJ_SIZE(ib_ucontext, efa_ucontext, ibucontext),
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 7bd0838ebc99..8bf817bb8ef2 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -169,6 +169,11 @@ static inline struct efa_ah *to_eah(struct ib_ah *ibah)
return container_of(ibah, struct efa_ah, ibah);
}
+static inline struct efa_comp_cntr *to_ecc(struct ib_comp_cntr *ibcc)
+{
+ return container_of(ibcc, struct efa_comp_cntr, ibcc);
+}
+
static inline struct efa_user_mmap_entry *
to_emmap(struct rdma_user_mmap_entry *rdma_entry)
{
@@ -245,6 +250,7 @@ int efa_query_device(struct ib_device *ibdev,
props->max_recv_sge = dev_attr->max_rq_sge;
props->max_sge_rd = dev_attr->max_wr_rdma_sge;
props->max_pkeys = 1;
+ props->max_comp_cntr = dev_attr->max_event_counters / 2;
if (udata && udata->outlen) {
resp.max_sq_sge = dev_attr->max_sq_sge;
@@ -270,6 +276,9 @@ int efa_query_device(struct ib_device *ibdev,
if (EFA_DEV_CAP(dev, UNSOLICITED_WRITE_RECV))
resp.device_caps |= EFA_QUERY_DEVICE_CAPS_UNSOLICITED_WRITE_RECV;
+ if (EFA_DEV_CAP(dev, EVENT_COUNTERS))
+ resp.device_caps |= EFA_QUERY_DEVICE_CAPS_COMP_CNTR;
+
if (dev->neqs)
resp.device_caps |= EFA_QUERY_DEVICE_CAPS_CQ_NOTIFICATIONS;
@@ -2268,6 +2277,211 @@ enum rdma_link_layer efa_port_link_layer(struct ib_device *ibdev,
return IB_LINK_LAYER_UNSPECIFIED;
}
+static int efa_create_event_counter(struct efa_dev *dev, struct ib_umem *umem,
+ u16 uarn, u32 *handle)
+{
+ struct efa_com_create_counter_params params = {};
+ struct efa_com_create_counter_result result;
+ int err;
+
+ params.uarn = uarn;
+ params.dma_addr = ib_umem_start_dma_addr(umem);
+
+ err = efa_com_create_counter(&dev->edev, &params, &result);
+ if (err)
+ return err;
+
+ *handle = result.cntr_handle;
+ return 0;
+}
+
+static int efa_destroy_event_counter(struct efa_dev *dev, u32 handle)
+{
+ struct efa_com_destroy_counter_params params = {
+ .cntr_handle = handle,
+ };
+
+ return efa_com_destroy_counter(&dev->edev, &params);
+}
+
+static struct ib_umem *efa_comp_cntr_get_umem(struct ib_device *ib_dev,
+ struct uverbs_attr_bundle *attrs, int attr)
+{
+ struct efa_uverbs_buffer_desc desc;
+ struct ib_umem_dmabuf *umem_dmabuf;
+ int ret;
+
+ ret = uverbs_copy_from(&desc, attrs, attr);
+ if (ret)
+ return ERR_PTR(ret);
+
+ if (desc.reserved[0] || desc.reserved[1])
+ return ERR_PTR(-EINVAL);
+
+ switch (desc.type) {
+ case EFA_UVERBS_BUFFER_TYPE_VA:
+ return ib_umem_get(ib_dev, desc.addr, desc.length, IB_ACCESS_LOCAL_WRITE);
+ case EFA_UVERBS_BUFFER_TYPE_DMABUF:
+ umem_dmabuf = ib_umem_dmabuf_get_pinned(ib_dev, desc.addr, desc.length, desc.fd,
+ IB_ACCESS_LOCAL_WRITE);
+ if (IS_ERR(umem_dmabuf))
+ return ERR_CAST(umem_dmabuf);
+ return &umem_dmabuf->umem;
+ default:
+ return ERR_PTR(-EINVAL);
+ }
+}
+
+int efa_create_comp_cntr(struct ib_comp_cntr *ibcc, struct uverbs_attr_bundle *attrs)
+{
+ struct efa_dev *dev = to_edev(ibcc->device);
+ struct efa_comp_cntr *cc = to_ecc(ibcc);
+ struct efa_ucontext *ucontext;
+ struct ib_umem *comp_umem;
+ struct ib_umem *err_umem;
+ int err;
+
+ ucontext = rdma_udata_to_drv_context(&attrs->driver_udata, struct efa_ucontext,
+ ibucontext);
+
+ comp_umem = efa_comp_cntr_get_umem(ibcc->device, attrs,
+ EFA_IB_ATTR_CREATE_COMP_CNTR_COMP_BUFFER);
+ if (IS_ERR(comp_umem))
+ return PTR_ERR(comp_umem);
+
+ err_umem = efa_comp_cntr_get_umem(ibcc->device, attrs,
+ EFA_IB_ATTR_CREATE_COMP_CNTR_ERR_BUFFER);
+ if (IS_ERR(err_umem)) {
+ err = PTR_ERR(err_umem);
+ goto err_comp_umem;
+ }
+
+ if (comp_umem->length < sizeof(u64) || err_umem->length < sizeof(u64)) {
+ ibdev_dbg(&dev->ibdev, "Completion Counter memory too small\n");
+ err = -EINVAL;
+ goto err_err_umem;
+ }
+
+ err = efa_create_event_counter(dev, comp_umem, ucontext->uarn, &cc->comp_handle);
+ if (err) {
+ ibdev_dbg(&dev->ibdev, "Failed to create comp event counter [%d]\n", err);
+ goto err_err_umem;
+ }
+
+ err = efa_create_event_counter(dev, err_umem, ucontext->uarn, &cc->err_handle);
+ if (err) {
+ ibdev_dbg(&dev->ibdev, "Failed to create err event counter [%d]\n", err);
+ goto err_destroy_comp_event_cntr;
+ }
+
+ cc->comp_umem = comp_umem;
+ cc->err_umem = err_umem;
+ ibcc->comp_count_max_value = dev->dev_attr.event_counter_max_val;
+ ibcc->err_count_max_value = dev->dev_attr.event_counter_max_val;
+
+ return 0;
+
+err_destroy_comp_event_cntr:
+ efa_destroy_event_counter(dev, cc->comp_handle);
+err_err_umem:
+ ib_umem_release(err_umem);
+err_comp_umem:
+ ib_umem_release(comp_umem);
+ return err;
+}
+
+int efa_destroy_comp_cntr(struct ib_comp_cntr *ibcc)
+{
+ struct efa_dev *dev = to_edev(ibcc->device);
+ struct efa_comp_cntr *cc = to_ecc(ibcc);
+ int err;
+
+ err = efa_destroy_event_counter(dev, cc->comp_handle);
+ if (err)
+ return err;
+
+ err = efa_destroy_event_counter(dev, cc->err_handle);
+ if (err)
+ return err;
+
+ ib_umem_release(cc->comp_umem);
+ ib_umem_release(cc->err_umem);
+ return 0;
+}
+
+int efa_modify_comp_cntr(struct ib_comp_cntr *ibcc, enum ib_comp_cntr_entry entry,
+ enum ib_comp_cntr_modify_op op, u64 value)
+{
+ struct efa_com_modify_counter_params params = {};
+ struct efa_comp_cntr *cc = to_ecc(ibcc);
+
+ params.cntr_handle = entry == IB_COMP_CNTR_ENTRY_ERR ? cc->err_handle : cc->comp_handle;
+ params.operation = op == IB_COMP_CNTR_MODIFY_OP_SET ?
+ EFA_ADMIN_COUNTER_MODIFY_SET : EFA_ADMIN_COUNTER_MODIFY_ADD;
+ params.value = value;
+
+ return efa_com_modify_counter(&to_edev(ibcc->device)->edev, &params);
+}
+
+static u32 efa_comp_cntr_op_to_comp_events(u32 op_mask)
+{
+ u32 events = 0;
+
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_SEND)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_SEND_COMP, 1);
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_RECV)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_RECV_COMP, 1);
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_RDMA_READ)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_READ_COMP, 1);
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_READ)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_REMOTE_READ_COMP, 1);
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_RDMA_WRITE)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_WRITE_COMP, 1);
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_REMOTE_RDMA_WRITE)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_REMOTE_WRITE_COMP, 1);
+
+ return events;
+}
+
+static u32 efa_comp_cntr_op_to_err_events(u32 op_mask)
+{
+ u32 events = 0;
+
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_SEND)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_SEND_COMP_ERR, 1);
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_RECV)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_RECV_COMP_ERR, 1);
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_RDMA_READ)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_READ_COMP_ERR, 1);
+ if (op_mask & IB_COMP_CNTR_ATTACH_OP_RDMA_WRITE)
+ EFA_SET(&events, EFA_ADMIN_COUNTER_ATTACH_QP_EVENTS_WRITE_COMP_ERR, 1);
+
+ return events;
+}
+
+int efa_qp_attach_comp_cntr(struct ib_qp *ibqp, struct ib_comp_cntr *ibcc,
+ struct ib_comp_cntr_attach_attr *attr)
+{
+ struct efa_com_attach_counter_params params;
+ struct efa_dev *dev = to_edev(ibqp->device);
+ struct efa_comp_cntr *cc = to_ecc(ibcc);
+ struct efa_qp *qp = to_eqp(ibqp);
+ int err;
+
+ params.cntr_handle = cc->comp_handle;
+ params.qp_handle = qp->qp_handle;
+ params.events = efa_comp_cntr_op_to_comp_events(attr->op_mask);
+
+ err = efa_com_attach_counter(&dev->edev, &params);
+ if (err)
+ return err;
+
+ params.cntr_handle = cc->err_handle;
+ params.events = efa_comp_cntr_op_to_err_events(attr->op_mask);
+
+ return efa_com_attach_counter(&dev->edev, &params);
+}
+
DECLARE_UVERBS_NAMED_METHOD(EFA_IB_METHOD_MR_QUERY,
UVERBS_ATTR_IDR(EFA_IB_ATTR_QUERY_MR_HANDLE,
UVERBS_OBJECT_MR,
@@ -2290,8 +2504,23 @@ ADD_UVERBS_METHODS(efa_mr,
UVERBS_OBJECT_MR,
&UVERBS_METHOD(EFA_IB_METHOD_MR_QUERY));
+ADD_UVERBS_ATTRIBUTES_SIMPLE(
+ efa_comp_cntr_create,
+ UVERBS_OBJECT_COMP_CNTR,
+ UVERBS_METHOD_COMP_CNTR_CREATE,
+ UVERBS_ATTR_PTR_IN(
+ EFA_IB_ATTR_CREATE_COMP_CNTR_COMP_BUFFER,
+ UVERBS_ATTR_STRUCT(struct efa_uverbs_buffer_desc, length),
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_IN(
+ EFA_IB_ATTR_CREATE_COMP_CNTR_ERR_BUFFER,
+ UVERBS_ATTR_STRUCT(struct efa_uverbs_buffer_desc, length),
+ UA_MANDATORY));
+
const struct uapi_definition efa_uapi_defs[] = {
UAPI_DEF_CHAIN_OBJ_TREE(UVERBS_OBJECT_MR,
&efa_mr),
+ UAPI_DEF_CHAIN_OBJ_TREE(UVERBS_OBJECT_COMP_CNTR,
+ &efa_comp_cntr_create),
{},
};
diff --git a/include/uapi/rdma/efa-abi.h b/include/uapi/rdma/efa-abi.h
index d5c18f8de182..a8a2cc09d964 100644
--- a/include/uapi/rdma/efa-abi.h
+++ b/include/uapi/rdma/efa-abi.h
@@ -133,6 +133,7 @@ enum {
EFA_QUERY_DEVICE_CAPS_RDMA_WRITE = 1 << 5,
EFA_QUERY_DEVICE_CAPS_UNSOLICITED_WRITE_RECV = 1 << 6,
EFA_QUERY_DEVICE_CAPS_CQ_WITH_EXT_MEM = 1 << 7,
+ EFA_QUERY_DEVICE_CAPS_COMP_CNTR = 1 << 8,
};
struct efa_ibv_ex_query_device_resp {
@@ -163,4 +164,22 @@ enum efa_mr_methods {
EFA_IB_METHOD_MR_QUERY = (1U << UVERBS_ID_NS_SHIFT),
};
+enum efa_uverbs_buffer_type {
+ EFA_UVERBS_BUFFER_TYPE_DMABUF = 0,
+ EFA_UVERBS_BUFFER_TYPE_VA = 1,
+};
+
+struct efa_uverbs_buffer_desc {
+ __s32 fd;
+ __u32 type;
+ __u32 reserved[2];
+ __aligned_u64 addr;
+ __aligned_u64 length;
+};
+
+enum efa_comp_cntr_create_attrs {
+ EFA_IB_ATTR_CREATE_COMP_CNTR_COMP_BUFFER = (1U << UVERBS_ID_NS_SHIFT),
+ EFA_IB_ATTR_CREATE_COMP_CNTR_ERR_BUFFER,
+};
+
#endif /* EFA_ABI_USER_H */
--
2.47.3