* [PATCH v4 0/4] Support PERF MGMT for RXE
@ 2026-04-06 13:28 zhenwei pi
  2026-04-06 13:28 ` [PATCH v4 1/4] RDMA/core: Fix memory free for GID table zhenwei pi
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
  To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi

v4:
- drop rxe_ib_device_get_netdev() and RXE_PORT, use the literal port
  number 1 instead
- avoid a use-after-free (UAF) when reading the skb length
- remove the one-line wrapper rxe_counter_get(), use atomic64_read() instead
- fix memory free for the GID table; this is a new patch in this series

v3:
- merge 'RDMA/rxe: use rxe_counter_get' into previous commit
- zero the *out* MAD memory
- return success with an error status rather than failure, to avoid an
  upper-layer hang

v2:
- Fix overflow for PMA counter *link_downed_counter*
- Use *rxe_counter_get* instead of *atomic64_read* for hw-counters

v1:
Support PERF MGMT for RXE, add sent/received byte counters to RXE,
and improve coding style.

zhenwei pi (4):
  RDMA/core: Fix memory free for GID table
  RDMA/rxe: remove rxe_ib_device_get_netdev() and RXE_PORT
  RDMA/rxe: add SENT/RCVD bytes
  RDMA/rxe: support perf mgmt GET method

 drivers/infiniband/core/cache.c             |   1 -
 drivers/infiniband/sw/rxe/Makefile          |   1 +
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |   2 +
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |   2 +
 drivers/infiniband/sw/rxe/rxe_loc.h         |   6 ++
 drivers/infiniband/sw/rxe/rxe_mad.c         | 101 ++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_mcast.c       |   4 +-
 drivers/infiniband/sw/rxe/rxe_net.c         |   9 +-
 drivers/infiniband/sw/rxe/rxe_recv.c        |   6 ++
 drivers/infiniband/sw/rxe/rxe_verbs.c       |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h       |  10 +-
 11 files changed, 133 insertions(+), 14 deletions(-)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_mad.c

-- 
2.43.0



* [PATCH v4 1/4] RDMA/core: Fix memory free for GID table
  2026-04-06 13:28 [PATCH v4 0/4] Support PERF MGMT for RXE zhenwei pi
@ 2026-04-06 13:28 ` zhenwei pi
  2026-04-07 14:51   ` Jason Gunthorpe
  2026-04-06 13:28 ` [PATCH v4 2/4] RDMA/rxe: remove rxe_ib_device_get_netdev() and RXE_PORT zhenwei pi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
  To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi

When an RXE device is removed, the kernel reports:
RIP: 0010:free_large_kmalloc+0xf6/0x140
Code: 75 28 0f 0b 44 0f b6 2d a5 d6 d1 01 41 80 fd 01 0f 87 7c d1 ad ff 41 83 e5 01 74 3d 41 bc 00 f0 ff ff 45 31 ed e9 61 ff ff ff <0f> 0b 48 c7 c6 af b1 70 83 48 89 df e8 79 0a fa ff 5b 41 5c 41 5d
RSP: 0018:ffffd038c18074d8 EFLAGS: 00010293
RAX: 0017ffffc0000000 RBX: fffff86984219d00 RCX: 0000000000000000
RDX: 00000000000000f0 RSI: ffff899b88674000 RDI: fffff86984219d00
RBP: ffffd038c18074f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff899b88674000
R13: 0000000000000001 R14: ffff899b88674000 R15: ffff899b86180000
FS:  00007b163c71c740(0000) GS:ffff899c378bf000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007b163c730200 CR3: 0000000106a1d000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 kfree+0x163/0x3a0
 gid_table_release_one+0xaf/0xf0 [ib_core]
 ib_cache_release_one+0x66/0x80 [ib_core]
 ib_device_release+0x48/0xb0 [ib_core]
 device_release+0x44/0xa0
 kobject_put+0x9b/0x250
 put_device+0x13/0x30
 ib_unregister_device_and_put+0x40/0x60 [ib_core]
 nldev_dellink+0xd3/0x140 [ib_core]
 rdma_nl_rcv_msg+0x11d/0x300 [ib_core]
 ? netlink_bind+0x141/0x3a0
 rdma_nl_rcv_skb.constprop.0.isra.0+0xba/0x110 [ib_core]
 rdma_nl_rcv+0xe/0x20 [ib_core]
 netlink_unicast+0x28d/0x3e0
 netlink_sendmsg+0x214/0x470
 __sys_sendto+0x21f/0x230
 __x64_sys_sendto+0x24/0x40
 x64_sys_call+0x1888/0x26e0
 do_syscall_64+0xcb/0x14d0
 ? _copy_from_user+0x27/0x70
 ? do_sock_setsockopt+0xbd/0x190
 ? __sys_setsockopt+0x72/0xd0
 ? __x64_sys_setsockopt+0x1f/0x40
 ? x64_sys_call+0x221b/0x26e0
 ? do_syscall_64+0x109/0x14d0
 ? exc_page_fault+0x92/0x1c0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The GID table is now allocated with *kzalloc_flex* rather than a plain
*kzalloc_obj*, so *data_vec* is part of the table allocation and must
not be freed separately; a single kfree(table) releases both.

Fixes: 74e2711bb2af ("RDMA/core: Use kzalloc_flex for GID table")
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
 drivers/infiniband/core/cache.c | 1 -
 1 file changed, 1 deletion(-)
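
A minimal userspace sketch of the ownership rule behind this fix, assuming
data_vec is a flexible array member at the end of the table struct (which is
what the kzalloc_flex conversion implies). struct demo_table and the demo_*
helpers are hypothetical stand-ins, and calloc()/free() stand in for the
kernel allocator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the GID table; not the kernel's struct. */
struct demo_table {
	size_t sz;
	int data_vec[];	/* flexible array member, part of the same allocation */
};

/* One allocation covers both the header and the trailing array. */
static struct demo_table *demo_table_alloc(size_t n)
{
	struct demo_table *t;

	t = calloc(1, sizeof(*t) + n * sizeof(t->data_vec[0]));
	if (!t)
		return NULL;
	t->sz = n;
	return t;
}

/* One free releases both; data_vec must not be freed on its own. */
static void demo_table_free(struct demo_table *t)
{
	free(t);
}

/* Byte offset of the trailing array from the start of the allocation. */
static size_t demo_table_vec_offset(const struct demo_table *t)
{
	assert(t);
	return (size_t)((const char *)t->data_vec - (const char *)t);
}
```

Because the array shares the allocation with its header, passing t->data_vec
to free() would hand the allocator a pointer it never returned, which appears
to be the misuse the trace above reports.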

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 896486fa6185..647a547e2d7f 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -801,7 +801,6 @@ static void release_gid_table(struct ib_device *device,
 	}
 
 	mutex_destroy(&table->lock);
-	kfree(table->data_vec);
 	kfree(table);
 }
 
-- 
2.43.0



* [PATCH v4 2/4] RDMA/rxe: remove rxe_ib_device_get_netdev() and RXE_PORT
  2026-04-06 13:28 [PATCH v4 0/4] Support PERF MGMT for RXE zhenwei pi
  2026-04-06 13:28 ` [PATCH v4 1/4] RDMA/core: Fix memory free for GID table zhenwei pi
@ 2026-04-06 13:28 ` zhenwei pi
  2026-04-06 13:28 ` [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes zhenwei pi
  2026-04-06 13:28 ` [PATCH v4 4/4] RDMA/rxe: support perf mgmt GET method zhenwei pi
  3 siblings, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
  To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi

As suggested by Leon, remove the rxe_ib_device_get_netdev() wrapper and
the RXE_PORT definition. Neither improves readability, and RXE has only
ever had a single port.

Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 4 ++--
 drivers/infiniband/sw/rxe/rxe_net.c   | 7 +++----
 drivers/infiniband/sw/rxe/rxe_verbs.c | 4 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.h | 6 ------
 4 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 5cad72073eca..acd03bd87794 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -34,7 +34,7 @@ static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
 	struct net_device *ndev;
 	int ret;
 
-	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+	ndev = ib_device_get_netdev(&rxe->ib_dev, 1);
 	if (!ndev)
 		return -ENODEV;
 
@@ -59,7 +59,7 @@ static int rxe_mcast_del(struct rxe_dev *rxe, union ib_gid *mgid)
 	struct net_device *ndev;
 	int ret;
 
-	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+	ndev = ib_device_get_netdev(&rxe->ib_dev, 1);
 	if (!ndev)
 		return -ENODEV;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 211bd3000acc..6621d01ac32d 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -602,7 +602,7 @@ const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num)
 	struct net_device *ndev;
 	char *ndev_name;
 
-	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+	ndev = ib_device_get_netdev(&rxe->ib_dev, 1);
 	if (!ndev)
 		return NULL;
 	ndev_name = ndev->name;
@@ -646,12 +646,11 @@ static void rxe_sock_put(struct sock *sk,
 
 void rxe_net_del(struct ib_device *dev)
 {
-	struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
 	struct net_device *ndev;
 	struct sock *sk;
 	struct net *net;
 
-	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+	ndev = ib_device_get_netdev(dev, 1);
 	if (!ndev)
 		return;
 
@@ -699,7 +698,7 @@ void rxe_set_port_state(struct rxe_dev *rxe)
 {
 	struct net_device *ndev;
 
-	ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+	ndev = ib_device_get_netdev(&rxe->ib_dev, 1);
 	if (!ndev)
 		return;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 4e5c429aea37..d3b2d610ca37 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -50,7 +50,7 @@ static int rxe_query_port(struct ib_device *ibdev,
 		goto err_out;
 	}
 
-	ndev = rxe_ib_device_get_netdev(ibdev);
+	ndev = ib_device_get_netdev(ibdev, 1);
 	if (!ndev) {
 		err = -ENODEV;
 		goto err_out;
@@ -1450,7 +1450,7 @@ static int rxe_enable_driver(struct ib_device *ib_dev)
 	struct rxe_dev *rxe = container_of(ib_dev, struct rxe_dev, ib_dev);
 	struct net_device *ndev;
 
-	ndev = rxe_ib_device_get_netdev(ib_dev);
+	ndev = ib_device_get_netdev(ib_dev, 1);
 	if (!ndev)
 		return -ENODEV;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index d92f80d16f78..e800545d1046 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -415,7 +415,6 @@ struct rxe_port {
 	u32			qp_gsi_index;
 };
 
-#define	RXE_PORT	1
 struct rxe_dev {
 	struct ib_device	ib_dev;
 	struct ib_device_attr	attr;
@@ -451,11 +450,6 @@ struct rxe_dev {
 	struct rxe_port		port;
 };
 
-static inline struct net_device *rxe_ib_device_get_netdev(struct ib_device *dev)
-{
-	return ib_device_get_netdev(dev, RXE_PORT);
-}
-
 static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
 {
 	atomic64_inc(&rxe->stats_counters[index]);
-- 
2.43.0



* [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes
  2026-04-06 13:28 [PATCH v4 0/4] Support PERF MGMT for RXE zhenwei pi
  2026-04-06 13:28 ` [PATCH v4 1/4] RDMA/core: Fix memory free for GID table zhenwei pi
  2026-04-06 13:28 ` [PATCH v4 2/4] RDMA/rxe: remove rxe_ib_device_get_netdev() and RXE_PORT zhenwei pi
@ 2026-04-06 13:28 ` zhenwei pi
  2026-04-06 14:55   ` Zhu Yanjun
  2026-04-06 13:28 ` [PATCH v4 4/4] RDMA/rxe: support perf mgmt GET method zhenwei pi
  3 siblings, 1 reply; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
  To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi

RXE has no counters for sent/received bytes; add them.

Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_hw_counters.c | 2 ++
 drivers/infiniband/sw/rxe/rxe_hw_counters.h | 2 ++
 drivers/infiniband/sw/rxe/rxe_net.c         | 2 ++
 drivers/infiniband/sw/rxe/rxe_recv.c        | 6 ++++++
 drivers/infiniband/sw/rxe/rxe_verbs.h       | 6 ++++++
 5 files changed, 18 insertions(+)
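
A hedged userspace sketch of the accounting pattern used in rxe_xmit_packet():
the length is read into a local before the buffer is handed off, because the
buffer may be gone by the time the counters are updated. All demo_* names are
hypothetical, C11 atomics stand in for atomic64_t, and demo_send() freeing the
buffer is an assumption made to model the handoff:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

enum demo_counter { DEMO_SENT_PKTS, DEMO_SENT_BYTES, DEMO_NUM_COUNTERS };

struct demo_dev {
	_Atomic int64_t stats[DEMO_NUM_COUNTERS];
};

struct demo_buf {
	unsigned int len;
};

static void demo_dev_init(struct demo_dev *dev)
{
	for (int i = 0; i < DEMO_NUM_COUNTERS; i++)
		atomic_init(&dev->stats[i], 0);
}

/* Assumed to consume the buffer: the caller must not touch it afterwards. */
static int demo_send(struct demo_buf *buf)
{
	free(buf);
	return 0;
}

static int demo_xmit(struct demo_dev *dev, struct demo_buf *buf)
{
	unsigned int len;

	assert(dev && buf);
	len = buf->len;		/* read while the buffer is still owned here */

	if (demo_send(buf))
		return -1;

	/* buf may already be freed; only the saved length is used below. */
	atomic_fetch_add(&dev->stats[DEMO_SENT_PKTS], 1);
	atomic_fetch_add(&dev->stats[DEMO_SENT_BYTES], (int64_t)len);
	return 0;
}
```

Reading buf->len after demo_send() returns would be the use-after-free that
the v4 changelog mentions avoiding.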

diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index 437917a7d8f2..17edaa9a9b9b 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -22,6 +22,8 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
 	[RXE_CNT_LINK_DOWNED].name         =  "link_downed",
 	[RXE_CNT_RDMA_SEND].name           =  "rdma_sends",
 	[RXE_CNT_RDMA_RECV].name           =  "rdma_recvs",
+	[RXE_CNT_SENT_BYTES].name          =  "sent_bytes",
+	[RXE_CNT_RCVD_BYTES].name          =  "rcvd_bytes",
 };
 
 int rxe_ib_get_hw_stats(struct ib_device *ibdev,
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 051f9e1c3852..01b355103cbc 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -26,6 +26,8 @@ enum rxe_counters {
 	RXE_CNT_LINK_DOWNED,
 	RXE_CNT_RDMA_SEND,
 	RXE_CNT_RDMA_RECV,
+	RXE_CNT_SENT_BYTES,
+	RXE_CNT_RCVD_BYTES,
 	RXE_NUM_OF_COUNTERS
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 6621d01ac32d..86660031ffa2 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -503,6 +503,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	int err;
 	int is_request = pkt->mask & RXE_REQ_MASK;
 	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	unsigned int skblen = skb->len;
 	unsigned long flags;
 
 	spin_lock_irqsave(&qp->state_lock, flags);
@@ -526,6 +527,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	}
 
 	rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
+	rxe_counter_add(rxe, RXE_CNT_SENT_BYTES, skblen);
 	goto done;
 
 drop:
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 5861e4244049..0d9112e95eae 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -318,6 +318,7 @@ void rxe_rcv(struct sk_buff *skb)
 	int err;
 	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
 	struct rxe_dev *rxe = pkt->rxe;
+	unsigned int skblen = skb->len + sizeof(struct udphdr);
 
 	if (unlikely(skb->len < RXE_BTH_BYTES))
 		goto drop;
@@ -341,6 +342,11 @@ void rxe_rcv(struct sk_buff *skb)
 	if (unlikely(err))
 		goto drop;
 
+	if (skb->protocol == htons(ETH_P_IP))
+		skblen += sizeof(struct iphdr);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		skblen += sizeof(struct ipv6hdr);
+	rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);
 	rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
 
 	if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index e800545d1046..0f5ffd94643f 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -455,6 +455,12 @@ static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
 	atomic64_inc(&rxe->stats_counters[index]);
 }
 
+static inline void rxe_counter_add(struct rxe_dev *rxe, enum rxe_counters index,
+				   s64 val)
+{
+	atomic64_add(val, &rxe->stats_counters[index]);
+}
+
 static inline struct rxe_dev *to_rdev(struct ib_device *dev)
 {
 	return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
-- 
2.43.0



* [PATCH v4 4/4] RDMA/rxe: support perf mgmt GET method
  2026-04-06 13:28 [PATCH v4 0/4] Support PERF MGMT for RXE zhenwei pi
                   ` (2 preceding siblings ...)
  2026-04-06 13:28 ` [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes zhenwei pi
@ 2026-04-06 13:28 ` zhenwei pi
  3 siblings, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
  To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi

RXE already supports hardware counters, but not in a standardized
manner. For instance, user-space monitoring tools such as atop only read
from the *counters* directory. Add perf management (PMA) support to RXE
so that these tools can read its counters.

The counters are read with atomic64_read() directly.

Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
 drivers/infiniband/sw/rxe/Makefile    |   1 +
 drivers/infiniband/sw/rxe/rxe_loc.h   |   6 ++
 drivers/infiniband/sw/rxe/rxe_mad.c   | 101 ++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_verbs.c |   1 +
 4 files changed, 109 insertions(+)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_mad.c
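
A small userspace sketch of the two conversions done in rxe_mad.c, under the
assumption that the legacy 8-bit counter must saturate rather than wrap and
that the extended data counters are reported in 32-bit words (hence the >> 2)
in big-endian byte order. The demo_* helpers are hypothetical and not part of
the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit software counter into an 8-bit field: stop at all ones. */
static uint8_t demo_saturate_u8(int64_t val)
{
	if (val < 0)
		return 0;
	return val > UINT8_MAX ? UINT8_MAX : (uint8_t)val;
}

/* Convert a byte count into units of four octets. */
static uint64_t demo_octets_to_dwords(uint64_t octets)
{
	return octets >> 2;
}

/* Store a 64-bit value most significant byte first (big-endian). */
static void demo_put_be64(uint8_t out[8], uint64_t v)
{
	assert(out);
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (56 - 8 * i));
}
```

For example, 300 link-down events would be reported as 255, and 4096 sent
bytes as 1024 data words.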

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 3977f4f13258..e097c1ca1874 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -23,6 +23,7 @@ rdma_rxe-y := \
 	rxe_task.o \
 	rxe_net.o \
 	rxe_hw_counters.o \
+	rxe_mad.o \
 	rxe_ns.o
 
 rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index e095c12699cb..64d636bf80fd 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -242,4 +242,10 @@ static inline int rxe_ib_advise_mr(struct ib_pd *pd,
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
+/* rxe_mad.c */
+int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32 port_num,
+		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+		    const struct ib_mad *in, struct ib_mad *out,
+		    size_t *out_mad_size, u16 *out_mad_pkey_index);
+
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mad.c b/drivers/infiniband/sw/rxe/rxe_mad.c
new file mode 100644
index 000000000000..7cf6d94e636e
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_mad.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2026 zhenwei pi <zhenwei.pi@linux.dev>
+ */
+
+#include <rdma/ib_pma.h>
+#include "rxe.h"
+#include "rxe_hw_counters.h"
+
+static int rxe_get_pma_info(struct ib_mad *out)
+{
+	struct ib_class_port_info cpi = {};
+
+	cpi.capability_mask = IB_PMA_CLASS_CAP_EXT_WIDTH;
+	memcpy((out->data + 40), &cpi, sizeof(cpi));
+
+	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static int rxe_get_pma_counters(struct rxe_dev *rxe, struct ib_mad *out)
+{
+	struct ib_pma_portcounters *pma_cnt = (struct ib_pma_portcounters *)(out->data + 40);
+	s64 val;
+
+	/* IBA release 1.8, 16.1.3.5: During operation, instead of overflowing, they shall stop
+	 * at all ones.
+	 */
+	val = atomic64_read(&rxe->stats_counters[RXE_CNT_LINK_DOWNED]);
+	if (val > U8_MAX)
+		pma_cnt->link_downed_counter = U8_MAX;
+	else
+		pma_cnt->link_downed_counter = (u8)val;
+
+	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static int rxe_get_pma_counters_ext(struct rxe_dev *rxe, struct ib_mad *out)
+{
+	struct ib_pma_portcounters_ext *pma_cnt_ext =
+		(struct ib_pma_portcounters_ext *)(out->data + 40);
+	s64 val;
+
+	val = atomic64_read(&rxe->stats_counters[RXE_CNT_SENT_BYTES]);
+	pma_cnt_ext->port_xmit_data = cpu_to_be64(val >> 2);
+
+	val = atomic64_read(&rxe->stats_counters[RXE_CNT_RCVD_BYTES]);
+	pma_cnt_ext->port_rcv_data = cpu_to_be64(val >> 2);
+
+	val = atomic64_read(&rxe->stats_counters[RXE_CNT_SENT_PKTS]);
+	pma_cnt_ext->port_xmit_packets = cpu_to_be64(val);
+
+	val = atomic64_read(&rxe->stats_counters[RXE_CNT_RCVD_PKTS]);
+	pma_cnt_ext->port_rcv_packets = cpu_to_be64(val);
+
+	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static int rxe_get_perf_mgmt(struct rxe_dev *rxe, const struct ib_mad *in, struct ib_mad *out)
+{
+	switch (in->mad_hdr.attr_id) {
+	case IB_PMA_CLASS_PORT_INFO:
+		return rxe_get_pma_info(out);
+
+	case IB_PMA_PORT_COUNTERS:
+		return rxe_get_pma_counters(rxe, out);
+
+	case IB_PMA_PORT_COUNTERS_EXT:
+		return rxe_get_pma_counters_ext(rxe, out);
+
+	default:
+		out->mad_hdr.status = cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD_ATTRIB);
+		return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+	}
+}
+
+int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32 port_num,
+		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+		    const struct ib_mad *in, struct ib_mad *out,
+		    size_t *out_mad_size, u16 *out_mad_pkey_index)
+{
+	struct rxe_dev *rxe = to_rdev(ibdev);
+	u8 mgmt_class = in->mad_hdr.mgmt_class;
+	u8 method = in->mad_hdr.method;
+
+	if (port_num != 1)
+		return IB_MAD_RESULT_FAILURE;
+
+	memset(out, 0, sizeof(*out));
+	switch (mgmt_class) {
+	case IB_MGMT_CLASS_PERF_MGMT:
+		if (method == IB_MGMT_METHOD_GET)
+			return rxe_get_perf_mgmt(rxe, in, out);
+		break;
+
+	default:
+		break;
+	}
+
+	out->mad_hdr.status = cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD);
+	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index d3b2d610ca37..1ef5cddf620a 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1505,6 +1505,7 @@ static const struct ib_device_ops rxe_dev_ops = {
 	.post_recv = rxe_post_recv,
 	.post_send = rxe_post_send,
 	.post_srq_recv = rxe_post_srq_recv,
+	.process_mad = rxe_process_mad,
 	.query_ah = rxe_query_ah,
 	.query_device = rxe_query_device,
 	.query_pkey = rxe_query_pkey,
-- 
2.43.0



* Re: [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes
  2026-04-06 13:28 ` [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes zhenwei pi
@ 2026-04-06 14:55   ` Zhu Yanjun
  2026-04-07  0:58     ` zhenwei pi
  0 siblings, 1 reply; 8+ messages in thread
From: Zhu Yanjun @ 2026-04-06 14:55 UTC (permalink / raw)
  To: zhenwei pi, linux-kernel, linux-rdma, yanjun.zhu@linux.dev
  Cc: zyjzyj2000, jgg, leon

On 2026/4/6 6:28, zhenwei pi wrote:
> There is a lack of sent/received counter in bytes.
> 
> Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
> ---
>   drivers/infiniband/sw/rxe/rxe_hw_counters.c | 2 ++
>   drivers/infiniband/sw/rxe/rxe_hw_counters.h | 2 ++
>   drivers/infiniband/sw/rxe/rxe_net.c         | 2 ++
>   drivers/infiniband/sw/rxe/rxe_recv.c        | 6 ++++++
>   drivers/infiniband/sw/rxe/rxe_verbs.h       | 6 ++++++
>   5 files changed, 18 insertions(+)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
> index 437917a7d8f2..17edaa9a9b9b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
> @@ -22,6 +22,8 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
>   	[RXE_CNT_LINK_DOWNED].name         =  "link_downed",
>   	[RXE_CNT_RDMA_SEND].name           =  "rdma_sends",
>   	[RXE_CNT_RDMA_RECV].name           =  "rdma_recvs",
> +	[RXE_CNT_SENT_BYTES].name          =  "sent_bytes",
> +	[RXE_CNT_RCVD_BYTES].name          =  "rcvd_bytes",
>   };
>   
>   int rxe_ib_get_hw_stats(struct ib_device *ibdev,
> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
> index 051f9e1c3852..01b355103cbc 100644
> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
> @@ -26,6 +26,8 @@ enum rxe_counters {
>   	RXE_CNT_LINK_DOWNED,
>   	RXE_CNT_RDMA_SEND,
>   	RXE_CNT_RDMA_RECV,
> +	RXE_CNT_SENT_BYTES,
> +	RXE_CNT_RCVD_BYTES,
>   	RXE_NUM_OF_COUNTERS
>   };
>   
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 6621d01ac32d..86660031ffa2 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -503,6 +503,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
>   	int err;
>   	int is_request = pkt->mask & RXE_REQ_MASK;
>   	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
> +	unsigned int skblen = skb->len;
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&qp->state_lock, flags);
> @@ -526,6 +527,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
>   	}
>   
>   	rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
> +	rxe_counter_add(rxe, RXE_CNT_SENT_BYTES, skblen);
>   	goto done;
>   
>   drop:
> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 5861e4244049..0d9112e95eae 100644
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -318,6 +318,7 @@ void rxe_rcv(struct sk_buff *skb)
>   	int err;
>   	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
>   	struct rxe_dev *rxe = pkt->rxe;
> +	unsigned int skblen = skb->len + sizeof(struct udphdr);
>   
>   	if (unlikely(skb->len < RXE_BTH_BYTES))
>   		goto drop;
> @@ -341,6 +342,11 @@ void rxe_rcv(struct sk_buff *skb)
>   	if (unlikely(err))
>   		goto drop;
>   
> +	if (skb->protocol == htons(ETH_P_IP))
> +		skblen += sizeof(struct iphdr);
> +	else if (skb->protocol == htons(ETH_P_IPV6))
> +		skblen += sizeof(struct ipv6hdr);
> +	rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);

From the code above, I think you want to count the total length
starting from the network layer (IP header).
Maybe the following is more compact:

"
unsigned int skblen = skb->len - skb_network_offset(skb);
rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);
"

Zhu Yanjun

>   	rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
>   
>   	if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index e800545d1046..0f5ffd94643f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -455,6 +455,12 @@ static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
>   	atomic64_inc(&rxe->stats_counters[index]);
>   }
>   
> +static inline void rxe_counter_add(struct rxe_dev *rxe, enum rxe_counters index,
> +				   s64 val)
> +{
> +	atomic64_add(val, &rxe->stats_counters[index]);
> +}
> +
>   static inline struct rxe_dev *to_rdev(struct ib_device *dev)
>   {
>   	return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;



* Re: [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes
  2026-04-06 14:55   ` Zhu Yanjun
@ 2026-04-07  0:58     ` zhenwei pi
  0 siblings, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2026-04-07  0:58 UTC (permalink / raw)
  To: Zhu Yanjun, linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon



On 4/6/26 22:55, Zhu Yanjun wrote:
> On 2026/4/6 6:28, zhenwei pi wrote:
>> There is a lack of sent/received counter in bytes.
>>
>> Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_hw_counters.c | 2 ++
>>   drivers/infiniband/sw/rxe/rxe_hw_counters.h | 2 ++
>>   drivers/infiniband/sw/rxe/rxe_net.c         | 2 ++
>>   drivers/infiniband/sw/rxe/rxe_recv.c        | 6 ++++++
>>   drivers/infiniband/sw/rxe/rxe_verbs.h       | 6 ++++++
>>   5 files changed, 18 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/ 
>> infiniband/sw/rxe/rxe_hw_counters.c
>> index 437917a7d8f2..17edaa9a9b9b 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
>> @@ -22,6 +22,8 @@ static const struct rdma_stat_desc 
>> rxe_counter_descs[] = {
>>       [RXE_CNT_LINK_DOWNED].name         =  "link_downed",
>>       [RXE_CNT_RDMA_SEND].name           =  "rdma_sends",
>>       [RXE_CNT_RDMA_RECV].name           =  "rdma_recvs",
>> +    [RXE_CNT_SENT_BYTES].name          =  "sent_bytes",
>> +    [RXE_CNT_RCVD_BYTES].name          =  "rcvd_bytes",
>>   };
>>   int rxe_ib_get_hw_stats(struct ib_device *ibdev,
>> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/ 
>> infiniband/sw/rxe/rxe_hw_counters.h
>> index 051f9e1c3852..01b355103cbc 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
>> @@ -26,6 +26,8 @@ enum rxe_counters {
>>       RXE_CNT_LINK_DOWNED,
>>       RXE_CNT_RDMA_SEND,
>>       RXE_CNT_RDMA_RECV,
>> +    RXE_CNT_SENT_BYTES,
>> +    RXE_CNT_RCVD_BYTES,
>>       RXE_NUM_OF_COUNTERS
>>   };
>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/ 
>> sw/rxe/rxe_net.c
>> index 6621d01ac32d..86660031ffa2 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>> @@ -503,6 +503,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct 
>> rxe_pkt_info *pkt,
>>       int err;
>>       int is_request = pkt->mask & RXE_REQ_MASK;
>>       struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
>> +    unsigned int skblen = skb->len;
>>       unsigned long flags;
>>       spin_lock_irqsave(&qp->state_lock, flags);
>> @@ -526,6 +527,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct 
>> rxe_pkt_info *pkt,
>>       }
>>       rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
>> +    rxe_counter_add(rxe, RXE_CNT_SENT_BYTES, skblen);
>>       goto done;
>>   drop:
>> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/ 
>> infiniband/sw/rxe/rxe_recv.c
>> index 5861e4244049..0d9112e95eae 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
>> @@ -318,6 +318,7 @@ void rxe_rcv(struct sk_buff *skb)
>>       int err;
>>       struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
>>       struct rxe_dev *rxe = pkt->rxe;
>> +    unsigned int skblen = skb->len + sizeof(struct udphdr);
>>       if (unlikely(skb->len < RXE_BTH_BYTES))
>>           goto drop;
>> @@ -341,6 +342,11 @@ void rxe_rcv(struct sk_buff *skb)
>>       if (unlikely(err))
>>           goto drop;
>> +    if (skb->protocol == htons(ETH_P_IP))
>> +        skblen += sizeof(struct iphdr);
>> +    else if (skb->protocol == htons(ETH_P_IPV6))
>> +        skblen += sizeof(struct ipv6hdr);
>> +    rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);
> 
>  From the above source code, I think that you want to calculate total 
> length starting from the Network Layer (IP Header).
> Maybe the following is compact.
> 
> "
> unsigned int skblen = skb->len - skb_network_offset(skb);
> rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);
> "
> 
> Zhu Yanjun
> 

Yes, the TX side counts the total length of IP + UDP + IB as sent bytes,
so the RX side should record the same length. Because the IP and UDP
headers have already been pulled at this stage, the additional length is
added here. skb_network_offset(skb) is fine; I'll use it in the next
version.

Thanks.
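
A userspace sketch of why both forms give the same count, assuming the
network offset is measured from the current data pointer and therefore goes
negative once the IP and UDP headers have been pulled. struct demo_pkt and
the demo_* helpers are hypothetical models, not the kernel's sk_buff API:

```c
#include <assert.h>

/* Hypothetical packet model: offsets are relative to the buffer start. */
struct demo_pkt {
	int network_off;	/* where the IP header starts */
	int data_off;		/* where the current data pointer is */
	unsigned int len;	/* bytes from the data pointer to the end */
};

/* Advance the data pointer past n bytes of header. */
static void demo_pull(struct demo_pkt *p, unsigned int n)
{
	assert(p && n <= p->len);
	p->data_off += (int)n;
	p->len -= n;
}

/* Network header position relative to the data pointer; may be negative. */
static int demo_network_offset(const struct demo_pkt *p)
{
	return p->network_off - p->data_off;
}

/* Length counted from the start of the IP header to the end of the packet. */
static unsigned int demo_len_from_network(const struct demo_pkt *p)
{
	return (unsigned int)((int)p->len - demo_network_offset(p));
}
```

With a 20-byte IPv4 header and an 8-byte UDP header pulled, the offset is -28,
so subtracting it adds the 28 bytes back. Unlike sizeof(struct iphdr), this
form would also cover IPv4 options if any are present.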

>>       rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
>>       if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/ 
>> infiniband/sw/rxe/rxe_verbs.h
>> index e800545d1046..0f5ffd94643f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> @@ -455,6 +455,12 @@ static inline void rxe_counter_inc(struct rxe_dev 
>> *rxe, enum rxe_counters index)
>>       atomic64_inc(&rxe->stats_counters[index]);
>>   }
>> +static inline void rxe_counter_add(struct rxe_dev *rxe, enum 
>> rxe_counters index,
>> +                   s64 val)
>> +{
>> +    atomic64_add(val, &rxe->stats_counters[index]);
>> +}
>> +
>>   static inline struct rxe_dev *to_rdev(struct ib_device *dev)
>>   {
>>       return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
> 



* Re: [PATCH v4 1/4] RDMA/core: Fix memory free for GID table
  2026-04-06 13:28 ` [PATCH v4 1/4] RDMA/core: Fix memory free for GID table zhenwei pi
@ 2026-04-07 14:51   ` Jason Gunthorpe
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2026-04-07 14:51 UTC (permalink / raw)
  To: zhenwei pi; +Cc: linux-kernel, linux-rdma, zyjzyj2000, leon

On Mon, Apr 06, 2026 at 09:28:26PM +0800, zhenwei pi wrote:
> Remove RXE device, kernel shows:
> RIP: 0010:free_large_kmalloc+0xf6/0x140
> Code: 75 28 0f 0b 44 0f b6 2d a5 d6 d1 01 41 80 fd 01 0f 87 7c d1 ad ff 41 83 e5 01 74 3d 41 bc 00 f0 ff ff 45 31 ed e9 61 ff ff ff <0f> 0b 48 c7 c6 af b1 70 83 48 89 df e8 79 0a fa ff 5b 41 5c 41 5d
> RSP: 0018:ffffd038c18074d8 EFLAGS: 00010293
> RAX: 0017ffffc0000000 RBX: fffff86984219d00 RCX: 0000000000000000
> RDX: 00000000000000f0 RSI: ffff899b88674000 RDI: fffff86984219d00
> RBP: ffffd038c18074f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff899b88674000
> R13: 0000000000000001 R14: ffff899b88674000 R15: ffff899b86180000
> FS:  00007b163c71c740(0000) GS:ffff899c378bf000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007b163c730200 CR3: 0000000106a1d000 CR4: 0000000000350ef0
> Call Trace:
>  <TASK>
>  kfree+0x163/0x3a0
>  gid_table_release_one+0xaf/0xf0 [ib_core]
>  ib_cache_release_one+0x66/0x80 [ib_core]
>  ib_device_release+0x48/0xb0 [ib_core]
>  device_release+0x44/0xa0
>  kobject_put+0x9b/0x250
>  put_device+0x13/0x30
>  ib_unregister_device_and_put+0x40/0x60 [ib_core]
>  nldev_dellink+0xd3/0x140 [ib_core]
>  rdma_nl_rcv_msg+0x11d/0x300 [ib_core]
>  ? netlink_bind+0x141/0x3a0
>  rdma_nl_rcv_skb.constprop.0.isra.0+0xba/0x110 [ib_core]
>  rdma_nl_rcv+0xe/0x20 [ib_core]
>  netlink_unicast+0x28d/0x3e0
>  netlink_sendmsg+0x214/0x470
>  __sys_sendto+0x21f/0x230
>  __x64_sys_sendto+0x24/0x40
>  x64_sys_call+0x1888/0x26e0
>  do_syscall_64+0xcb/0x14d0
>  ? _copy_from_user+0x27/0x70
>  ? do_sock_setsockopt+0xbd/0x190
>  ? __sys_setsockopt+0x72/0xd0
>  ? __x64_sys_setsockopt+0x1f/0x40
>  ? x64_sys_call+0x221b/0x26e0
>  ? do_syscall_64+0x109/0x14d0
>  ? exc_page_fault+0x92/0x1c0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> GID table is allocated by *kzalloc_flex* instead of raw *kzalloc_obj*,
> it also should be released in new style.
> 
> Fixes: 74e2711bb2af ("RDMA/core: Use kzalloc_flex for GID table")
> Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
> ---
>  drivers/infiniband/core/cache.c | 1 -
>  1 file changed, 1 deletion(-)

Applied this patch to for-next

Jason

