* [PATCH v4 0/4] Support PERF MGMT for RXE
@ 2026-04-06 13:28 zhenwei pi
2026-04-06 13:28 ` [PATCH v4 1/4] RDMA/core: Fix memory free for GID table zhenwei pi
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi
v4:
- drop rxe_ib_device_get_netdev and RXE_PORT, use 1 instead
- avoid a UAF when reading the skb length
- remove one-line wrapper rxe_counter_get, use atomic64_read instead
- fix memory free for GID table, this is a new patch in this series.
v3:
- merge 'RDMA/rxe: use rxe_counter_get' into previous commit
- zero *out* MAD memory
- return success with error status rather than failure to avoid an
upper-layer hang
v2:
- Fix overflow for PMA counter *link_downed_counter*
- Use *rxe_counter_get* instead of *atomic64_read* for hw-counters
v1:
Support PERF MGMT for RXE, add sent/received bytes for RXE counters,
also improve coding style.
zhenwei pi (4):
RDMA/core: Fix memory free for GID table
RDMA/rxe: remove rxe_ib_device_get_netdev() and RXE_PORT
RDMA/rxe: add SENT/RCVD bytes
RDMA/rxe: support perf mgmt GET method
drivers/infiniband/core/cache.c | 1 -
drivers/infiniband/sw/rxe/Makefile | 1 +
drivers/infiniband/sw/rxe/rxe_hw_counters.c | 2 +
drivers/infiniband/sw/rxe/rxe_hw_counters.h | 2 +
drivers/infiniband/sw/rxe/rxe_loc.h | 6 ++
drivers/infiniband/sw/rxe/rxe_mad.c | 101 ++++++++++++++++++++
drivers/infiniband/sw/rxe/rxe_mcast.c | 4 +-
drivers/infiniband/sw/rxe/rxe_net.c | 9 +-
drivers/infiniband/sw/rxe/rxe_recv.c | 6 ++
drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +-
drivers/infiniband/sw/rxe/rxe_verbs.h | 10 +-
11 files changed, 133 insertions(+), 14 deletions(-)
create mode 100644 drivers/infiniband/sw/rxe/rxe_mad.c
--
2.43.0
* [PATCH v4 1/4] RDMA/core: Fix memory free for GID table
2026-04-06 13:28 [PATCH v4 0/4] Support PERF MGMT for RXE zhenwei pi
@ 2026-04-06 13:28 ` zhenwei pi
2026-04-07 14:51 ` Jason Gunthorpe
2026-04-06 13:28 ` [PATCH v4 2/4] RDMA/rxe: remove rxe_ib_device_get_netdev() and RXE_PORT zhenwei pi
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi
Removing an RXE device makes the kernel report:
RIP: 0010:free_large_kmalloc+0xf6/0x140
Code: 75 28 0f 0b 44 0f b6 2d a5 d6 d1 01 41 80 fd 01 0f 87 7c d1 ad ff 41 83 e5 01 74 3d 41 bc 00 f0 ff ff 45 31 ed e9 61 ff ff ff <0f> 0b 48 c7 c6 af b1 70 83 48 89 df e8 79 0a fa ff 5b 41 5c 41 5d
RSP: 0018:ffffd038c18074d8 EFLAGS: 00010293
RAX: 0017ffffc0000000 RBX: fffff86984219d00 RCX: 0000000000000000
RDX: 00000000000000f0 RSI: ffff899b88674000 RDI: fffff86984219d00
RBP: ffffd038c18074f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff899b88674000
R13: 0000000000000001 R14: ffff899b88674000 R15: ffff899b86180000
FS: 00007b163c71c740(0000) GS:ffff899c378bf000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007b163c730200 CR3: 0000000106a1d000 CR4: 0000000000350ef0
Call Trace:
<TASK>
kfree+0x163/0x3a0
gid_table_release_one+0xaf/0xf0 [ib_core]
ib_cache_release_one+0x66/0x80 [ib_core]
ib_device_release+0x48/0xb0 [ib_core]
device_release+0x44/0xa0
kobject_put+0x9b/0x250
put_device+0x13/0x30
ib_unregister_device_and_put+0x40/0x60 [ib_core]
nldev_dellink+0xd3/0x140 [ib_core]
rdma_nl_rcv_msg+0x11d/0x300 [ib_core]
? netlink_bind+0x141/0x3a0
rdma_nl_rcv_skb.constprop.0.isra.0+0xba/0x110 [ib_core]
rdma_nl_rcv+0xe/0x20 [ib_core]
netlink_unicast+0x28d/0x3e0
netlink_sendmsg+0x214/0x470
__sys_sendto+0x21f/0x230
__x64_sys_sendto+0x24/0x40
x64_sys_call+0x1888/0x26e0
do_syscall_64+0xcb/0x14d0
? _copy_from_user+0x27/0x70
? do_sock_setsockopt+0xbd/0x190
? __sys_setsockopt+0x72/0xd0
? __x64_sys_setsockopt+0x1f/0x40
? x64_sys_call+0x221b/0x26e0
? do_syscall_64+0x109/0x14d0
? exc_page_fault+0x92/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
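The GID table is now allocated with *kzalloc_flex* rather than
*kzalloc_obj*, so data_vec is part of the table allocation itself.
Release it accordingly: drop the separate kfree(table->data_vec) and
free only the table.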
Fixes: 74e2711bb2af ("RDMA/core: Use kzalloc_flex for GID table")
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
drivers/infiniband/core/cache.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 896486fa6185..647a547e2d7f 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -801,7 +801,6 @@ static void release_gid_table(struct ib_device *device,
}
mutex_destroy(&table->lock);
- kfree(table->data_vec);
kfree(table);
}
--
2.43.0
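
The reasoning behind the one-line fix can be illustrated in user space.
The following is a minimal sketch, assuming the converted table ends in
a flexible array member; the demo_* names are hypothetical stand-ins,
and calloc()/free() stand in for the kernel allocator. Because the
array lives inside the table's own allocation, freeing the array
separately would hand the allocator a pointer into the middle of a
block, which is presumably what the trace above reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GID table types. */
struct demo_entry {
	uint32_t state;
};

struct demo_table {
	size_t sz;
	struct demo_entry *data_vec[];	/* flexible array member */
};

/* One allocation covers the header and all sz array slots. */
static struct demo_table *demo_table_alloc(size_t sz)
{
	struct demo_table *table;

	table = calloc(1, sizeof(*table) + sz * sizeof(table->data_vec[0]));
	if (!table)
		return NULL;
	table->sz = sz;
	return table;
}

/* The array is part of the same block, so exactly one free() is needed. */
static void demo_table_free(struct demo_table *table)
{
	free(table);
}

/* Returns 1 when data_vec starts inside the table allocation itself. */
static int demo_vec_is_embedded(const struct demo_table *table)
{
	const char *base = (const char *)table;
	const char *vec = (const char *)table->data_vec;

	return vec == base + offsetof(struct demo_table, data_vec);
}
```

Calling free(table->data_vec) in this model would be undefined
behavior, since that address was never returned by the allocator.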
* [PATCH v4 2/4] RDMA/rxe: remove rxe_ib_device_get_netdev() and RXE_PORT
2026-04-06 13:28 [PATCH v4 0/4] Support PERF MGMT for RXE zhenwei pi
2026-04-06 13:28 ` [PATCH v4 1/4] RDMA/core: Fix memory free for GID table zhenwei pi
@ 2026-04-06 13:28 ` zhenwei pi
2026-04-06 13:28 ` [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes zhenwei pi
2026-04-06 13:28 ` [PATCH v4 4/4] RDMA/rxe: support perf mgmt GET method zhenwei pi
3 siblings, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi
As suggested by Leon, remove the rxe_ib_device_get_netdev() wrapper and
the RXE_PORT definition. They do not improve readability, and RXE has
always had only a single port.
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
drivers/infiniband/sw/rxe/rxe_mcast.c | 4 ++--
drivers/infiniband/sw/rxe/rxe_net.c | 7 +++----
drivers/infiniband/sw/rxe/rxe_verbs.c | 4 ++--
drivers/infiniband/sw/rxe/rxe_verbs.h | 6 ------
4 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 5cad72073eca..acd03bd87794 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -34,7 +34,7 @@ static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
struct net_device *ndev;
int ret;
- ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+ ndev = ib_device_get_netdev(&rxe->ib_dev, 1);
if (!ndev)
return -ENODEV;
@@ -59,7 +59,7 @@ static int rxe_mcast_del(struct rxe_dev *rxe, union ib_gid *mgid)
struct net_device *ndev;
int ret;
- ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+ ndev = ib_device_get_netdev(&rxe->ib_dev, 1);
if (!ndev)
return -ENODEV;
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 211bd3000acc..6621d01ac32d 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -602,7 +602,7 @@ const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num)
struct net_device *ndev;
char *ndev_name;
- ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+ ndev = ib_device_get_netdev(&rxe->ib_dev, 1);
if (!ndev)
return NULL;
ndev_name = ndev->name;
@@ -646,12 +646,11 @@ static void rxe_sock_put(struct sock *sk,
void rxe_net_del(struct ib_device *dev)
{
- struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
struct net_device *ndev;
struct sock *sk;
struct net *net;
- ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+ ndev = ib_device_get_netdev(dev, 1);
if (!ndev)
return;
@@ -699,7 +698,7 @@ void rxe_set_port_state(struct rxe_dev *rxe)
{
struct net_device *ndev;
- ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
+ ndev = ib_device_get_netdev(&rxe->ib_dev, 1);
if (!ndev)
return;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 4e5c429aea37..d3b2d610ca37 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -50,7 +50,7 @@ static int rxe_query_port(struct ib_device *ibdev,
goto err_out;
}
- ndev = rxe_ib_device_get_netdev(ibdev);
+ ndev = ib_device_get_netdev(ibdev, 1);
if (!ndev) {
err = -ENODEV;
goto err_out;
@@ -1450,7 +1450,7 @@ static int rxe_enable_driver(struct ib_device *ib_dev)
struct rxe_dev *rxe = container_of(ib_dev, struct rxe_dev, ib_dev);
struct net_device *ndev;
- ndev = rxe_ib_device_get_netdev(ib_dev);
+ ndev = ib_device_get_netdev(ib_dev, 1);
if (!ndev)
return -ENODEV;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index d92f80d16f78..e800545d1046 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -415,7 +415,6 @@ struct rxe_port {
u32 qp_gsi_index;
};
-#define RXE_PORT 1
struct rxe_dev {
struct ib_device ib_dev;
struct ib_device_attr attr;
@@ -451,11 +450,6 @@ struct rxe_dev {
struct rxe_port port;
};
-static inline struct net_device *rxe_ib_device_get_netdev(struct ib_device *dev)
-{
- return ib_device_get_netdev(dev, RXE_PORT);
-}
-
static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
{
atomic64_inc(&rxe->stats_counters[index]);
--
2.43.0
* [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes
2026-04-06 13:28 [PATCH v4 0/4] Support PERF MGMT for RXE zhenwei pi
2026-04-06 13:28 ` [PATCH v4 1/4] RDMA/core: Fix memory free for GID table zhenwei pi
2026-04-06 13:28 ` [PATCH v4 2/4] RDMA/rxe: remove rxe_ib_device_get_netdev() and RXE_PORT zhenwei pi
@ 2026-04-06 13:28 ` zhenwei pi
2026-04-06 14:55 ` Zhu Yanjun
2026-04-06 13:28 ` [PATCH v4 4/4] RDMA/rxe: support perf mgmt GET method zhenwei pi
3 siblings, 1 reply; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi
RXE counts sent and received packets but has no counters for sent and
received bytes. Add them.
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
drivers/infiniband/sw/rxe/rxe_hw_counters.c | 2 ++
drivers/infiniband/sw/rxe/rxe_hw_counters.h | 2 ++
drivers/infiniband/sw/rxe/rxe_net.c | 2 ++
drivers/infiniband/sw/rxe/rxe_recv.c | 6 ++++++
drivers/infiniband/sw/rxe/rxe_verbs.h | 6 ++++++
5 files changed, 18 insertions(+)
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index 437917a7d8f2..17edaa9a9b9b 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -22,6 +22,8 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
[RXE_CNT_LINK_DOWNED].name = "link_downed",
[RXE_CNT_RDMA_SEND].name = "rdma_sends",
[RXE_CNT_RDMA_RECV].name = "rdma_recvs",
+ [RXE_CNT_SENT_BYTES].name = "sent_bytes",
+ [RXE_CNT_RCVD_BYTES].name = "rcvd_bytes",
};
int rxe_ib_get_hw_stats(struct ib_device *ibdev,
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 051f9e1c3852..01b355103cbc 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -26,6 +26,8 @@ enum rxe_counters {
RXE_CNT_LINK_DOWNED,
RXE_CNT_RDMA_SEND,
RXE_CNT_RDMA_RECV,
+ RXE_CNT_SENT_BYTES,
+ RXE_CNT_RCVD_BYTES,
RXE_NUM_OF_COUNTERS
};
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 6621d01ac32d..86660031ffa2 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -503,6 +503,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
int err;
int is_request = pkt->mask & RXE_REQ_MASK;
struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+ unsigned int skblen = skb->len;
unsigned long flags;
spin_lock_irqsave(&qp->state_lock, flags);
@@ -526,6 +527,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
}
rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
+ rxe_counter_add(rxe, RXE_CNT_SENT_BYTES, skblen);
goto done;
drop:
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 5861e4244049..0d9112e95eae 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -318,6 +318,7 @@ void rxe_rcv(struct sk_buff *skb)
int err;
struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
struct rxe_dev *rxe = pkt->rxe;
+ unsigned int skblen = skb->len + sizeof(struct udphdr);
if (unlikely(skb->len < RXE_BTH_BYTES))
goto drop;
@@ -341,6 +342,11 @@ void rxe_rcv(struct sk_buff *skb)
if (unlikely(err))
goto drop;
+ if (skb->protocol == htons(ETH_P_IP))
+ skblen += sizeof(struct iphdr);
+ else if (skb->protocol == htons(ETH_P_IPV6))
+ skblen += sizeof(struct ipv6hdr);
+ rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);
rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index e800545d1046..0f5ffd94643f 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -455,6 +455,12 @@ static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
atomic64_inc(&rxe->stats_counters[index]);
}
+static inline void rxe_counter_add(struct rxe_dev *rxe, enum rxe_counters index,
+ s64 val)
+{
+ atomic64_add(val, &rxe->stats_counters[index]);
+}
+
static inline struct rxe_dev *to_rdev(struct ib_device *dev)
{
return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
--
2.43.0
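
The transmit hunk reads skb->len into a local before the packet is
handed off, which matches the "avoid a UAF" item in the v4 changelog.
The pattern can be sketched in user space as follows. This is only an
illustration: struct demo_buf, demo_send() and demo_xmit() are
hypothetical names, and free() stands in for whatever releases the
buffer once ownership has passed on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature packet buffer; not the kernel's struct sk_buff. */
struct demo_buf {
	unsigned int len;
};

static uint64_t demo_sent_bytes;

/* Consumes the buffer: after this returns, buf must not be touched. */
static int demo_send(struct demo_buf *buf)
{
	free(buf);
	return 0;
}

static int demo_xmit(struct demo_buf *buf)
{
	/* Read the length while the caller still owns the buffer. */
	unsigned int len = buf->len;
	int err;

	err = demo_send(buf);
	if (err)
		return err;

	/* Use the saved copy; reading buf->len here would be a use-after-free. */
	demo_sent_bytes += len;
	return 0;
}
```

The byte counter is updated only on the success path, mirroring where
the patch places rxe_counter_add() next to the packet counter.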
* [PATCH v4 4/4] RDMA/rxe: support perf mgmt GET method
2026-04-06 13:28 [PATCH v4 0/4] Support PERF MGMT for RXE zhenwei pi
` (2 preceding siblings ...)
2026-04-06 13:28 ` [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes zhenwei pi
@ 2026-04-06 13:28 ` zhenwei pi
3 siblings, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2026-04-06 13:28 UTC (permalink / raw)
To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi
In RXE, hardware counters are already supported, but not in a
standardized manner. For instance, user-space monitoring tools like
atop only read from the *counters* directory. Therefore, it is
necessary to add perf management support to RXE.
Also use rxe_counter_get instead of raw atomic64_read in hw-counters.
Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
drivers/infiniband/sw/rxe/Makefile | 1 +
drivers/infiniband/sw/rxe/rxe_loc.h | 6 ++
drivers/infiniband/sw/rxe/rxe_mad.c | 101 ++++++++++++++++++++++++++
drivers/infiniband/sw/rxe/rxe_verbs.c | 1 +
4 files changed, 109 insertions(+)
create mode 100644 drivers/infiniband/sw/rxe/rxe_mad.c
diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 3977f4f13258..e097c1ca1874 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -23,6 +23,7 @@ rdma_rxe-y := \
rxe_task.o \
rxe_net.o \
rxe_hw_counters.o \
+ rxe_mad.o \
rxe_ns.o
rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index e095c12699cb..64d636bf80fd 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -242,4 +242,10 @@ static inline int rxe_ib_advise_mr(struct ib_pd *pd,
#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+/* rxe_mad.c */
+int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32 port_num,
+ const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+ const struct ib_mad *in, struct ib_mad *out,
+ size_t *out_mad_size, u16 *out_mad_pkey_index);
+
#endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mad.c b/drivers/infiniband/sw/rxe/rxe_mad.c
new file mode 100644
index 000000000000..7cf6d94e636e
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_mad.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2026 zhenwei pi <zhenwei.pi@linux.dev>
+ */
+
+#include <rdma/ib_pma.h>
+#include "rxe.h"
+#include "rxe_hw_counters.h"
+
+static int rxe_get_pma_info(struct ib_mad *out)
+{
+ struct ib_class_port_info cpi = {};
+
+ cpi.capability_mask = IB_PMA_CLASS_CAP_EXT_WIDTH;
+ memcpy((out->data + 40), &cpi, sizeof(cpi));
+
+ return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static int rxe_get_pma_counters(struct rxe_dev *rxe, struct ib_mad *out)
+{
+ struct ib_pma_portcounters *pma_cnt = (struct ib_pma_portcounters *)(out->data + 40);
+ s64 val;
+
+ /* IBA release 1.8, 16.1.3.5: During operation, instead of overflowing, they shall stop
+ * at all ones.
+ */
+ val = atomic64_read(&rxe->stats_counters[RXE_CNT_LINK_DOWNED]);
+ if (val > U8_MAX)
+ pma_cnt->link_downed_counter = U8_MAX;
+ else
+ pma_cnt->link_downed_counter = (u8)val;
+
+ return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static int rxe_get_pma_counters_ext(struct rxe_dev *rxe, struct ib_mad *out)
+{
+ struct ib_pma_portcounters_ext *pma_cnt_ext =
+ (struct ib_pma_portcounters_ext *)(out->data + 40);
+ s64 val;
+
+ val = atomic64_read(&rxe->stats_counters[RXE_CNT_SENT_BYTES]);
+ pma_cnt_ext->port_xmit_data = cpu_to_be64(val >> 2);
+
+ val = atomic64_read(&rxe->stats_counters[RXE_CNT_RCVD_BYTES]);
+ pma_cnt_ext->port_rcv_data = cpu_to_be64(val >> 2);
+
+ val = atomic64_read(&rxe->stats_counters[RXE_CNT_SENT_PKTS]);
+ pma_cnt_ext->port_xmit_packets = cpu_to_be64(val);
+
+ val = atomic64_read(&rxe->stats_counters[RXE_CNT_RCVD_PKTS]);
+ pma_cnt_ext->port_rcv_packets = cpu_to_be64(val);
+
+ return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static int rxe_get_perf_mgmt(struct rxe_dev *rxe, const struct ib_mad *in, struct ib_mad *out)
+{
+ switch (in->mad_hdr.attr_id) {
+ case IB_PMA_CLASS_PORT_INFO:
+ return rxe_get_pma_info(out);
+
+ case IB_PMA_PORT_COUNTERS:
+ return rxe_get_pma_counters(rxe, out);
+
+ case IB_PMA_PORT_COUNTERS_EXT:
+ return rxe_get_pma_counters_ext(rxe, out);
+
+ default:
+ out->mad_hdr.status = cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD_ATTRIB);
+ return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+ }
+}
+
+int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32 port_num,
+ const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+ const struct ib_mad *in, struct ib_mad *out,
+ size_t *out_mad_size, u16 *out_mad_pkey_index)
+{
+ struct rxe_dev *rxe = to_rdev(ibdev);
+ u8 mgmt_class = in->mad_hdr.mgmt_class;
+ u8 method = in->mad_hdr.method;
+
+ if (port_num != 1)
+ return IB_MAD_RESULT_FAILURE;
+
+ memset(out, 0, sizeof(*out));
+ switch (mgmt_class) {
+ case IB_MGMT_CLASS_PERF_MGMT:
+ if (method == IB_MGMT_METHOD_GET)
+ return rxe_get_perf_mgmt(rxe, in, out);
+ break;
+
+ default:
+ break;
+ }
+
+ out->mad_hdr.status = cpu_to_be16(IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD);
+ return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index d3b2d610ca37..1ef5cddf620a 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1505,6 +1505,7 @@ static const struct ib_device_ops rxe_dev_ops = {
.post_recv = rxe_post_recv,
.post_send = rxe_post_send,
.post_srq_recv = rxe_post_srq_recv,
+ .process_mad = rxe_process_mad,
.query_ah = rxe_query_ah,
.query_device = rxe_query_device,
.query_pkey = rxe_query_pkey,
--
2.43.0
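
The counter conversions in rxe_mad.c can be tried outside the kernel.
The sketch below is a user-space approximation with hypothetical demo_*
helpers: it clamps a wide counter into an 8-bit field, as the
link_downed_counter code does; converts bytes to 4-byte units, as the
val >> 2 shifts do (PMA data counters are, to my understanding, defined
as octets divided by 4); and stores a value in big-endian order, which
is what cpu_to_be64() yields on the wire.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit software counter into an 8-bit PMA field: stick at
 * all ones instead of wrapping. */
static uint8_t demo_saturate_u8(int64_t val)
{
	if (val > UINT8_MAX)
		return UINT8_MAX;
	return (uint8_t)val;
}

/* Convert a byte count to 4-byte units, as the patch does with val >> 2. */
static uint64_t demo_bytes_to_dwords(uint64_t bytes)
{
	return bytes >> 2;
}

/* Store a 64-bit value in big-endian (wire) order, independent of host. */
static void demo_put_be64(uint8_t out[8], uint64_t val)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(val >> (56 - 8 * i));
}
```

Note that the shift truncates: a counter holding 7 bytes reports 1 unit,
so small remainders are not visible through the extended counters.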
* Re: [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes
2026-04-06 13:28 ` [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes zhenwei pi
@ 2026-04-06 14:55 ` Zhu Yanjun
2026-04-07 0:58 ` zhenwei pi
0 siblings, 1 reply; 8+ messages in thread
From: Zhu Yanjun @ 2026-04-06 14:55 UTC (permalink / raw)
To: zhenwei pi, linux-kernel, linux-rdma, yanjun.zhu@linux.dev
Cc: zyjzyj2000, jgg, leon
On 2026/4/6 6:28, zhenwei pi wrote:
> There is a lack of sent/received counter in bytes.
>
> Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
> ---
> drivers/infiniband/sw/rxe/rxe_hw_counters.c | 2 ++
> drivers/infiniband/sw/rxe/rxe_hw_counters.h | 2 ++
> drivers/infiniband/sw/rxe/rxe_net.c | 2 ++
> drivers/infiniband/sw/rxe/rxe_recv.c | 6 ++++++
> drivers/infiniband/sw/rxe/rxe_verbs.h | 6 ++++++
> 5 files changed, 18 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
> index 437917a7d8f2..17edaa9a9b9b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
> @@ -22,6 +22,8 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
> [RXE_CNT_LINK_DOWNED].name = "link_downed",
> [RXE_CNT_RDMA_SEND].name = "rdma_sends",
> [RXE_CNT_RDMA_RECV].name = "rdma_recvs",
> + [RXE_CNT_SENT_BYTES].name = "sent_bytes",
> + [RXE_CNT_RCVD_BYTES].name = "rcvd_bytes",
> };
>
> int rxe_ib_get_hw_stats(struct ib_device *ibdev,
> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
> index 051f9e1c3852..01b355103cbc 100644
> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
> @@ -26,6 +26,8 @@ enum rxe_counters {
> RXE_CNT_LINK_DOWNED,
> RXE_CNT_RDMA_SEND,
> RXE_CNT_RDMA_RECV,
> + RXE_CNT_SENT_BYTES,
> + RXE_CNT_RCVD_BYTES,
> RXE_NUM_OF_COUNTERS
> };
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 6621d01ac32d..86660031ffa2 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -503,6 +503,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
> int err;
> int is_request = pkt->mask & RXE_REQ_MASK;
> struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
> + unsigned int skblen = skb->len;
> unsigned long flags;
>
> spin_lock_irqsave(&qp->state_lock, flags);
> @@ -526,6 +527,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
> }
>
> rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
> + rxe_counter_add(rxe, RXE_CNT_SENT_BYTES, skblen);
> goto done;
>
> drop:
> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 5861e4244049..0d9112e95eae 100644
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -318,6 +318,7 @@ void rxe_rcv(struct sk_buff *skb)
> int err;
> struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
> struct rxe_dev *rxe = pkt->rxe;
> + unsigned int skblen = skb->len + sizeof(struct udphdr);
>
> if (unlikely(skb->len < RXE_BTH_BYTES))
> goto drop;
> @@ -341,6 +342,11 @@ void rxe_rcv(struct sk_buff *skb)
> if (unlikely(err))
> goto drop;
>
> + if (skb->protocol == htons(ETH_P_IP))
> + skblen += sizeof(struct iphdr);
> + else if (skb->protocol == htons(ETH_P_IPV6))
> + skblen += sizeof(struct ipv6hdr);
> + rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);
From the code above, I think you want the total length starting from
the network layer (IP header).
Maybe the following is more compact:
"
unsigned int skblen = skb->len - skb_network_offset(skb);
rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);
"
Zhu Yanjun
> rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
>
> if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index e800545d1046..0f5ffd94643f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -455,6 +455,12 @@ static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
> atomic64_inc(&rxe->stats_counters[index]);
> }
>
> +static inline void rxe_counter_add(struct rxe_dev *rxe, enum rxe_counters index,
> + s64 val)
> +{
> + atomic64_add(val, &rxe->stats_counters[index]);
> +}
> +
> static inline struct rxe_dev *to_rdev(struct ib_device *dev)
> {
> return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
* Re: [PATCH v4 3/4] RDMA/rxe: add SENT/RCVD bytes
2026-04-06 14:55 ` Zhu Yanjun
@ 2026-04-07 0:58 ` zhenwei pi
0 siblings, 0 replies; 8+ messages in thread
From: zhenwei pi @ 2026-04-07 0:58 UTC (permalink / raw)
To: Zhu Yanjun, linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon
On 4/6/26 22:55, Zhu Yanjun wrote:
> On 2026/4/6 6:28, zhenwei pi wrote:
>> There is a lack of sent/received counter in bytes.
>>
>> Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
>> ---
>> drivers/infiniband/sw/rxe/rxe_hw_counters.c | 2 ++
>> drivers/infiniband/sw/rxe/rxe_hw_counters.h | 2 ++
>> drivers/infiniband/sw/rxe/rxe_net.c | 2 ++
>> drivers/infiniband/sw/rxe/rxe_recv.c | 6 ++++++
>> drivers/infiniband/sw/rxe/rxe_verbs.h | 6 ++++++
>> 5 files changed, 18 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/
>> infiniband/sw/rxe/rxe_hw_counters.c
>> index 437917a7d8f2..17edaa9a9b9b 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
>> @@ -22,6 +22,8 @@ static const struct rdma_stat_desc
>> rxe_counter_descs[] = {
>> [RXE_CNT_LINK_DOWNED].name = "link_downed",
>> [RXE_CNT_RDMA_SEND].name = "rdma_sends",
>> [RXE_CNT_RDMA_RECV].name = "rdma_recvs",
>> + [RXE_CNT_SENT_BYTES].name = "sent_bytes",
>> + [RXE_CNT_RCVD_BYTES].name = "rcvd_bytes",
>> };
>> int rxe_ib_get_hw_stats(struct ib_device *ibdev,
>> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/
>> infiniband/sw/rxe/rxe_hw_counters.h
>> index 051f9e1c3852..01b355103cbc 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
>> @@ -26,6 +26,8 @@ enum rxe_counters {
>> RXE_CNT_LINK_DOWNED,
>> RXE_CNT_RDMA_SEND,
>> RXE_CNT_RDMA_RECV,
>> + RXE_CNT_SENT_BYTES,
>> + RXE_CNT_RCVD_BYTES,
>> RXE_NUM_OF_COUNTERS
>> };
>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/
>> sw/rxe/rxe_net.c
>> index 6621d01ac32d..86660031ffa2 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>> @@ -503,6 +503,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct
>> rxe_pkt_info *pkt,
>> int err;
>> int is_request = pkt->mask & RXE_REQ_MASK;
>> struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
>> + unsigned int skblen = skb->len;
>> unsigned long flags;
>> spin_lock_irqsave(&qp->state_lock, flags);
>> @@ -526,6 +527,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct
>> rxe_pkt_info *pkt,
>> }
>> rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
>> + rxe_counter_add(rxe, RXE_CNT_SENT_BYTES, skblen);
>> goto done;
>> drop:
>> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/
>> infiniband/sw/rxe/rxe_recv.c
>> index 5861e4244049..0d9112e95eae 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
>> @@ -318,6 +318,7 @@ void rxe_rcv(struct sk_buff *skb)
>> int err;
>> struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
>> struct rxe_dev *rxe = pkt->rxe;
>> + unsigned int skblen = skb->len + sizeof(struct udphdr);
>> if (unlikely(skb->len < RXE_BTH_BYTES))
>> goto drop;
>> @@ -341,6 +342,11 @@ void rxe_rcv(struct sk_buff *skb)
>> if (unlikely(err))
>> goto drop;
>> + if (skb->protocol == htons(ETH_P_IP))
>> + skblen += sizeof(struct iphdr);
>> + else if (skb->protocol == htons(ETH_P_IPV6))
>> + skblen += sizeof(struct ipv6hdr);
>> + rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);
>
> From the above source code, I think that you want to calculate total
> length starting from the Network Layer (IP Header).
> Maybe the following is compact.
>
> "
> unsigned int skblen = skb->len - skb_network_offset(skb);
> rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skblen);
> "
>
> Zhu Yanjun
>
Yes. The TX side counts the total length of IP + UDP + IB as sent
bytes, so the RX side should record the same length. The IP and UDP
headers have already been pulled at this stage, so the additional
length is added back here.
skb_network_offset(skb) is fine; I'll use it in the next version.
Thanks.
>> rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
>> if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/
>> infiniband/sw/rxe/rxe_verbs.h
>> index e800545d1046..0f5ffd94643f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> @@ -455,6 +455,12 @@ static inline void rxe_counter_inc(struct rxe_dev
>> *rxe, enum rxe_counters index)
>> atomic64_inc(&rxe->stats_counters[index]);
>> }
>> +static inline void rxe_counter_add(struct rxe_dev *rxe, enum
>> rxe_counters index,
>> + s64 val)
>> +{
>> + atomic64_add(val, &rxe->stats_counters[index]);
>> +}
>> +
>> static inline struct rxe_dev *to_rdev(struct ib_device *dev)
>> {
>> return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
>
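
The two ways of computing the received length discussed in this
exchange can be compared with a small model. This is a sketch under
stated assumptions: struct demo_skb is a hypothetical reduction of
struct sk_buff to three offsets, and demo_network_offset() only mirrors
the idea of skb_network_offset() (network header position minus the
current data position), which goes negative once the headers have been
pulled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the few sk_buff fields involved; offsets are
 * relative to the start of the packet memory. */
struct demo_skb {
	size_t network_header;	/* where the IP header starts */
	size_t data;		/* current data position, after header pulls */
	size_t len;		/* bytes from data to the end of the packet */
};

/* Mirrors the idea of skb_network_offset(): negative once data has
 * moved past the IP header. */
static long demo_network_offset(const struct demo_skb *skb)
{
	return (long)skb->network_header - (long)skb->data;
}

/* Bytes from the start of the IP header to the end of the packet. */
static size_t demo_len_from_network(const struct demo_skb *skb)
{
	return (size_t)((long)skb->len - demo_network_offset(skb));
}

/* The v4 approach: add fixed header sizes back by hand. */
static size_t demo_len_fixed(const struct demo_skb *skb, size_t ip_hdr_size)
{
	return skb->len + ip_hdr_size + 8;	/* 8 = UDP header size */
}
```

For a plain 20-byte IPv4 header both forms agree. They differ when the
IP header is longer than the fixed struct size (for example IPv4
options), where only the offset-based form counts the extra bytes.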
* Re: [PATCH v4 1/4] RDMA/core: Fix memory free for GID table
2026-04-06 13:28 ` [PATCH v4 1/4] RDMA/core: Fix memory free for GID table zhenwei pi
@ 2026-04-07 14:51 ` Jason Gunthorpe
0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2026-04-07 14:51 UTC (permalink / raw)
To: zhenwei pi; +Cc: linux-kernel, linux-rdma, zyjzyj2000, leon
On Mon, Apr 06, 2026 at 09:28:26PM +0800, zhenwei pi wrote:
> Remove RXE device, kernel shows:
> RIP: 0010:free_large_kmalloc+0xf6/0x140
> Code: 75 28 0f 0b 44 0f b6 2d a5 d6 d1 01 41 80 fd 01 0f 87 7c d1 ad ff 41 83 e5 01 74 3d 41 bc 00 f0 ff ff 45 31 ed e9 61 ff ff ff <0f> 0b 48 c7 c6 af b1 70 83 48 89 df e8 79 0a fa ff 5b 41 5c 41 5d
> RSP: 0018:ffffd038c18074d8 EFLAGS: 00010293
> RAX: 0017ffffc0000000 RBX: fffff86984219d00 RCX: 0000000000000000
> RDX: 00000000000000f0 RSI: ffff899b88674000 RDI: fffff86984219d00
> RBP: ffffd038c18074f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff899b88674000
> R13: 0000000000000001 R14: ffff899b88674000 R15: ffff899b86180000
> FS: 00007b163c71c740(0000) GS:ffff899c378bf000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007b163c730200 CR3: 0000000106a1d000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> kfree+0x163/0x3a0
> gid_table_release_one+0xaf/0xf0 [ib_core]
> ib_cache_release_one+0x66/0x80 [ib_core]
> ib_device_release+0x48/0xb0 [ib_core]
> device_release+0x44/0xa0
> kobject_put+0x9b/0x250
> put_device+0x13/0x30
> ib_unregister_device_and_put+0x40/0x60 [ib_core]
> nldev_dellink+0xd3/0x140 [ib_core]
> rdma_nl_rcv_msg+0x11d/0x300 [ib_core]
> ? netlink_bind+0x141/0x3a0
> rdma_nl_rcv_skb.constprop.0.isra.0+0xba/0x110 [ib_core]
> rdma_nl_rcv+0xe/0x20 [ib_core]
> netlink_unicast+0x28d/0x3e0
> netlink_sendmsg+0x214/0x470
> __sys_sendto+0x21f/0x230
> __x64_sys_sendto+0x24/0x40
> x64_sys_call+0x1888/0x26e0
> do_syscall_64+0xcb/0x14d0
> ? _copy_from_user+0x27/0x70
> ? do_sock_setsockopt+0xbd/0x190
> ? __sys_setsockopt+0x72/0xd0
> ? __x64_sys_setsockopt+0x1f/0x40
> ? x64_sys_call+0x221b/0x26e0
> ? do_syscall_64+0x109/0x14d0
> ? exc_page_fault+0x92/0x1c0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> GID table is allocated by *kzalloc_flex* instead of raw *kzalloc_obj*,
> it also should be released in new style.
>
> Fixes: 74e2711bb2af ("RDMA/core: Use kzalloc_flex for GID table")
> Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
> ---
> drivers/infiniband/core/cache.c | 1 -
> 1 file changed, 1 deletion(-)
Applied this patch to for-next
Jason