* [PATCH 0/3] Support PERF MGMT for RXE
From: zhenwei pi @ 2026-03-28 9:28 UTC
To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi

Add performance management (PERF MGMT) support to RXE, add sent/received
byte counters to the RXE counters, and clean up the coding style.

zhenwei pi (3):
  RDMA/rxe: use RXE_PORT instead of magic number 1
  RDMA/rxe: add SENT/RCVD bytes
  RDMA/rxe: support perf mgmt

 drivers/infiniband/sw/rxe/Makefile          |  1 +
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |  2 +
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |  2 +
 drivers/infiniband/sw/rxe/rxe_loc.h         |  6 ++
 drivers/infiniband/sw/rxe/rxe_mad.c         | 86 +++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_net.c         |  7 +-
 drivers/infiniband/sw/rxe/rxe_recv.c        |  1 +
 drivers/infiniband/sw/rxe/rxe_verbs.c       |  9 ++-
 drivers/infiniband/sw/rxe/rxe_verbs.h       | 11 +++
 9 files changed, 118 insertions(+), 7 deletions(-)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_mad.c

--
2.43.0

* [PATCH 1/3] RDMA/rxe: use RXE_PORT instead of magic number 1
From: zhenwei pi @ 2026-03-28 9:28 UTC
To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi

Align with the existing code:

	static ... rxe_ib_device_get_netdev(struct ib_device *dev)
	{
		return ib_device_get_netdev(dev, RXE_PORT);
	}

Use *RXE_PORT* instead of magic number 1 for all.

Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_net.c   | 6 +++---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 0bd0902b11f7..20338cb8e3c2 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -234,7 +234,7 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 
 	udph = udp_hdr(skb);
 	pkt->rxe = rxe;
-	pkt->port_num = 1;
+	pkt->port_num = RXE_PORT;
 	pkt->hdr = (u8 *)(udph + 1);
 	pkt->mask = RXE_GRH_MASK;
 	pkt->paylen = be16_to_cpu(udph->len) - sizeof(*udph);
@@ -535,7 +535,7 @@ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
 	struct sk_buff *skb = NULL;
 	struct net_device *ndev;
 	const struct ib_gid_attr *attr;
-	const int port_num = 1;
+	const int port_num = RXE_PORT;
 
 	attr = rdma_get_gid_attr(&rxe->ib_dev, port_num, av->grh.sgid_index);
 	if (IS_ERR(attr))
@@ -630,7 +630,7 @@ static void rxe_port_event(struct rxe_dev *rxe,
 	struct ib_event ev;
 
 	ev.device = &rxe->ib_dev;
-	ev.element.port_num = 1;
+	ev.element.port_num = RXE_PORT;
 	ev.event = event;
 
 	ib_dispatch_event(&ev);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index fe41362c5144..bcd486e8668b 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -44,7 +44,7 @@ static int rxe_query_port(struct ib_device *ibdev,
 	struct net_device *ndev;
 	int err, ret;
 
-	if (port_num != 1) {
+	if (port_num != RXE_PORT) {
 		err = -EINVAL;
 		rxe_dbg_dev(rxe, "bad port_num = %d\n", port_num);
 		goto err_out;
@@ -147,7 +147,7 @@ static int rxe_modify_port(struct ib_device *ibdev, u32 port_num,
 	struct rxe_port *port;
 	int err;
 
-	if (port_num != 1) {
+	if (port_num != RXE_PORT) {
 		err = -EINVAL;
 		rxe_dbg_dev(rxe, "bad port_num = %d\n", port_num);
 		goto err_out;
@@ -180,7 +180,7 @@ static enum rdma_link_layer rxe_get_link_layer(struct ib_device *ibdev,
 	struct rxe_dev *rxe = to_rdev(ibdev);
 	int err;
 
-	if (port_num != 1) {
+	if (port_num != RXE_PORT) {
 		err = -EINVAL;
 		rxe_dbg_dev(rxe, "bad port_num = %d\n", port_num);
 		goto err_out;
@@ -200,7 +200,7 @@ static int rxe_port_immutable(struct ib_device *ibdev, u32 port_num,
 	struct ib_port_attr attr = {};
 	int err;
 
-	if (port_num != 1) {
+	if (port_num != RXE_PORT) {
 		err = -EINVAL;
 		rxe_dbg_dev(rxe, "bad port_num = %d\n", port_num);
 		goto err_out;
-- 
2.43.0

* [PATCH 2/3] RDMA/rxe: add SENT/RCVD bytes
From: zhenwei pi @ 2026-03-28 9:28 UTC
To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi

There is a lack of sent/received counter in bytes.

Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_hw_counters.c | 2 ++
 drivers/infiniband/sw/rxe/rxe_hw_counters.h | 2 ++
 drivers/infiniband/sw/rxe/rxe_net.c         | 1 +
 drivers/infiniband/sw/rxe/rxe_recv.c        | 1 +
 drivers/infiniband/sw/rxe/rxe_verbs.h       | 6 ++++++
 5 files changed, 12 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index 437917a7d8f2..17edaa9a9b9b 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -22,6 +22,8 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
 	[RXE_CNT_LINK_DOWNED].name = "link_downed",
 	[RXE_CNT_RDMA_SEND].name = "rdma_sends",
 	[RXE_CNT_RDMA_RECV].name = "rdma_recvs",
+	[RXE_CNT_SENT_BYTES].name = "sent_bytes",
+	[RXE_CNT_RCVD_BYTES].name = "rcvd_bytes",
 };
 
 int rxe_ib_get_hw_stats(struct ib_device *ibdev,
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 051f9e1c3852..01b355103cbc 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -26,6 +26,8 @@ enum rxe_counters {
 	RXE_CNT_LINK_DOWNED,
 	RXE_CNT_RDMA_SEND,
 	RXE_CNT_RDMA_RECV,
+	RXE_CNT_SENT_BYTES,
+	RXE_CNT_RCVD_BYTES,
 	RXE_NUM_OF_COUNTERS
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 20338cb8e3c2..ec0ae7479fe7 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -519,6 +519,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	}
 
 	rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
+	rxe_counter_add(rxe, RXE_CNT_SENT_BYTES, skb->len);
 	goto done;
 
 drop:
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 5861e4244049..b5522017852d 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -342,6 +342,7 @@ void rxe_rcv(struct sk_buff *skb)
 		goto drop;
 
 	rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
+	rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skb->len);
 
 	if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
 		rxe_rcv_mcast_pkt(rxe, skb);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index fb149f37e91d..2bcfb919a40b 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -460,6 +460,12 @@ static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
 	atomic64_inc(&rxe->stats_counters[index]);
 }
 
+static inline void rxe_counter_add(struct rxe_dev *rxe, enum rxe_counters index,
+				   s64 val)
+{
+	atomic64_add(val, &rxe->stats_counters[index]);
+}
+
 static inline struct rxe_dev *to_rdev(struct ib_device *dev)
 {
 	return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
-- 
2.43.0

* Re: [PATCH 2/3] RDMA/rxe: add SENT/RCVD bytes
From: Zhu Yanjun @ 2026-03-28 18:02 UTC
To: zhenwei pi, linux-kernel, linux-rdma, yanjun.zhu@linux.dev
Cc: zyjzyj2000, jgg, leon

On 2026/3/28 2:28, zhenwei pi wrote:
> There is a lack of sent/received counter in bytes.
>
> Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
> ---
>  drivers/infiniband/sw/rxe/rxe_hw_counters.c | 2 ++
>  drivers/infiniband/sw/rxe/rxe_hw_counters.h | 2 ++
>  drivers/infiniband/sw/rxe/rxe_net.c         | 1 +
>  drivers/infiniband/sw/rxe/rxe_recv.c        | 1 +
>  drivers/infiniband/sw/rxe/rxe_verbs.h       | 6 ++++++
>  5 files changed, 12 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
> index 437917a7d8f2..17edaa9a9b9b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
> @@ -22,6 +22,8 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
>  	[RXE_CNT_LINK_DOWNED].name = "link_downed",
>  	[RXE_CNT_RDMA_SEND].name = "rdma_sends",
>  	[RXE_CNT_RDMA_RECV].name = "rdma_recvs",
> +	[RXE_CNT_SENT_BYTES].name = "sent_bytes",
> +	[RXE_CNT_RCVD_BYTES].name = "rcvd_bytes",
>  };
>
>  int rxe_ib_get_hw_stats(struct ib_device *ibdev,
> diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
> index 051f9e1c3852..01b355103cbc 100644
> --- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
> +++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
> @@ -26,6 +26,8 @@ enum rxe_counters {
>  	RXE_CNT_LINK_DOWNED,
>  	RXE_CNT_RDMA_SEND,
>  	RXE_CNT_RDMA_RECV,
> +	RXE_CNT_SENT_BYTES,
> +	RXE_CNT_RCVD_BYTES,
>  	RXE_NUM_OF_COUNTERS
>  };
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 20338cb8e3c2..ec0ae7479fe7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -519,6 +519,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
>  	}
>
>  	rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
> +	rxe_counter_add(rxe, RXE_CNT_SENT_BYTES, skb->len);
>  	goto done;
>
>  drop:
> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 5861e4244049..b5522017852d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -342,6 +342,7 @@ void rxe_rcv(struct sk_buff *skb)
>  		goto drop;
>
>  	rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
> +	rxe_counter_add(rxe, RXE_CNT_RCVD_BYTES, skb->len);
>
>  	if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
>  		rxe_rcv_mcast_pkt(rxe, skb);
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index fb149f37e91d..2bcfb919a40b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -460,6 +460,12 @@ static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
>  	atomic64_inc(&rxe->stats_counters[index]);
>  }
>
> +static inline void rxe_counter_add(struct rxe_dev *rxe, enum rxe_counters index,
> +				   s64 val)
> +{
> +	atomic64_add(val, &rxe->stats_counters[index]);

The statistics are currently kept in atomic64 variables. Per-CPU
variables generally give better performance, but since RXE is a
software-emulated RDMA driver, atomic64 variables are fine in RXE.

For the mlx5, broadcom and efa drivers, per-CPU variables are preferred
for better performance.

Zhu Yanjun

> +}
> +
>  static inline struct rxe_dev *to_rdev(struct ib_device *dev)
>  {
>  	return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
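
The trade-off raised in the review above (one shared atomic64 versus per-CPU counters) can be sketched in userspace C11. This is only an illustration under stated assumptions, not kernel code: `NSHARDS`, `shared_counter` and `sharded_counter` are hypothetical names, and the explicit shard index stands in for "the current CPU", which the kernel's per-CPU helpers would supply.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NSHARDS 4	/* hypothetical: stands in for the number of CPUs */

/* One shared counter: every writer updates the same cache line. */
struct shared_counter {
	_Atomic int64_t v;
};

/* One slot per shard, each aligned to its own 64-byte cache line. */
struct sharded_counter {
	struct {
		_Alignas(64) _Atomic int64_t v;
	} slot[NSHARDS];
};

static inline void shared_add(struct shared_counter *c, int64_t n)
{
	atomic_fetch_add_explicit(&c->v, n, memory_order_relaxed);
}

static inline int64_t shared_read(struct shared_counter *c)
{
	return atomic_load_explicit(&c->v, memory_order_relaxed);
}

/* Writers touch only their own slot, so they do not contend. */
static inline void sharded_add(struct sharded_counter *c, unsigned int shard,
			       int64_t n)
{
	assert(shard < NSHARDS);
	atomic_fetch_add_explicit(&c->slot[shard].v, n, memory_order_relaxed);
}

/* Readers pay instead: the total is the sum over all slots. */
static inline int64_t sharded_read(struct sharded_counter *c)
{
	int64_t sum = 0;

	for (unsigned int i = 0; i < NSHARDS; i++)
		sum += atomic_load_explicit(&c->slot[i].v, memory_order_relaxed);
	return sum;
}
```

The sharded variant moves cost from the hot path (every packet) to the rare path (reading statistics), which is presumably why it is suggested for hardware drivers while the simpler shared counter is considered acceptable for RXE.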

* [PATCH 3/3] RDMA/rxe: support perf mgmt
From: zhenwei pi @ 2026-03-28 9:28 UTC
To: linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon, zhenwei pi

In RXE, hardware counters are already supported, but not in a
standardized manner. For instance, user-space monitoring tools like
atop only read from the *counters* directory. Therefore, it is
necessary to add perf management support to RXE.

Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
---
 drivers/infiniband/sw/rxe/Makefile    |  1 +
 drivers/infiniband/sw/rxe/rxe_loc.h   |  6 ++
 drivers/infiniband/sw/rxe/rxe_mad.c   | 86 +++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_verbs.c |  1 +
 drivers/infiniband/sw/rxe/rxe_verbs.h |  5 ++
 5 files changed, 99 insertions(+)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_mad.c

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 93134f1d1d0c..3c47e5b982c2 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -22,6 +22,7 @@ rdma_rxe-y := \
 	rxe_mcast.o \
 	rxe_task.o \
 	rxe_net.o \
+	rxe_mad.o \
 	rxe_hw_counters.o
 
 rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 7992290886e1..a8ce85147c1f 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -245,4 +245,10 @@ static inline int rxe_ib_advise_mr(struct ib_pd *pd,
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
+/* rxe-mad.c */
+int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32 port_num,
+		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+		    const struct ib_mad *in, struct ib_mad *out,
+		    size_t *out_mad_size, u16 *out_mad_pkey_index);
+
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mad.c b/drivers/infiniband/sw/rxe/rxe_mad.c
new file mode 100644
index 000000000000..9148f384b05c
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_mad.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2026 zhenwei pi <zhenwei.pi@linux.dev>
+ */
+
+#include <rdma/ib_pma.h>
+#include "rxe.h"
+#include "rxe_hw_counters.h"
+
+static int rxe_process_pma_info(struct ib_mad *out)
+{
+	struct ib_class_port_info cpi = {};
+
+	cpi.capability_mask = IB_PMA_CLASS_CAP_EXT_WIDTH;
+	memcpy((out->data + 40), &cpi, sizeof(cpi));
+
+	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static int rxe_process_pma_counters(struct rxe_dev *rxe, struct ib_mad *out)
+{
+	struct ib_pma_portcounters *pma_cnt =
+		(struct ib_pma_portcounters *)(out->data + 40);
+
+	pma_cnt->link_downed_counter = rxe_counter_get(rxe, RXE_CNT_LINK_DOWNED);
+
+	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static int rxe_process_pma_counters_ext(struct rxe_dev *rxe, struct ib_mad *out)
+{
+	struct ib_pma_portcounters_ext *pma_cnt_ext =
+		(struct ib_pma_portcounters_ext *)(out->data + 40);
+
+	pma_cnt_ext->port_xmit_data = cpu_to_be64(rxe_counter_get(rxe, RXE_CNT_SENT_BYTES) >> 2);
+	pma_cnt_ext->port_rcv_data = cpu_to_be64(rxe_counter_get(rxe, RXE_CNT_RCVD_BYTES) >> 2);
+	pma_cnt_ext->port_xmit_packets = cpu_to_be64(rxe_counter_get(rxe, RXE_CNT_SENT_PKTS));
+	pma_cnt_ext->port_rcv_packets = cpu_to_be64(rxe_counter_get(rxe, RXE_CNT_RCVD_PKTS));
+
+	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static int rxe_process_perf_mgmt(struct rxe_dev *rxe, const struct ib_mad *in,
+				 struct ib_mad *out)
+{
+	switch (in->mad_hdr.attr_id) {
+	case IB_PMA_CLASS_PORT_INFO:
+		return rxe_process_pma_info(out);
+
+	case IB_PMA_PORT_COUNTERS:
+		return rxe_process_pma_counters(rxe, out);
+
+	case IB_PMA_PORT_COUNTERS_EXT:
+		return rxe_process_pma_counters_ext(rxe, out);
+
+	default:
+		break;
+	}
+
+	return IB_MAD_RESULT_FAILURE;
+}
+
+int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32 port_num,
+		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+		    const struct ib_mad *in, struct ib_mad *out,
+		    size_t *out_mad_size, u16 *out_mad_pkey_index)
+{
+	struct rxe_dev *rxe = to_rdev(ibdev);
+	u8 mgmt_class = in->mad_hdr.mgmt_class;
+	u8 method = in->mad_hdr.method;
+
+	if (port_num != RXE_PORT)
+		return IB_MAD_RESULT_FAILURE;
+
+	switch (mgmt_class) {
+	case IB_MGMT_CLASS_PERF_MGMT:
+		if (method == IB_MGMT_METHOD_GET)
+			return rxe_process_perf_mgmt(rxe, in, out);
+		break;
+
+	default:
+		break;
+	}
+
+	return IB_MAD_RESULT_FAILURE;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index bcd486e8668b..7df0cb5a09a3 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1509,6 +1509,7 @@ static const struct ib_device_ops rxe_dev_ops = {
 	.post_recv = rxe_post_recv,
 	.post_send = rxe_post_send,
 	.post_srq_recv = rxe_post_srq_recv,
+	.process_mad = rxe_process_mad,
 	.query_ah = rxe_query_ah,
 	.query_device = rxe_query_device,
 	.query_pkey = rxe_query_pkey,
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 2bcfb919a40b..1c4fa8eaa733 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -466,6 +466,11 @@ static inline void rxe_counter_add(struct rxe_dev *rxe, enum rxe_counters index,
 	atomic64_add(val, &rxe->stats_counters[index]);
 }
 
+static inline s64 rxe_counter_get(struct rxe_dev *rxe, enum rxe_counters index)
+{
+	return atomic64_read(&rxe->stats_counters[index]);
+}
+
 static inline struct rxe_dev *to_rdev(struct ib_device *dev)
 {
 	return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
-- 
2.43.0

* Re: [PATCH 3/3] RDMA/rxe: support perf mgmt
From: Zhu Yanjun @ 2026-03-28 16:56 UTC
To: zhenwei pi, linux-kernel, linux-rdma, yanjun.zhu@linux.dev
Cc: zyjzyj2000, jgg, leon

On 2026/3/28 2:28, zhenwei pi wrote:
> In RXE, hardware counters are already supported, but not in a
> standardized manner. For instance, user-space monitoring tools like
> atop only read from the *counters* directory. Therefore, it is
> necessary to add perf management support to RXE.
>
> Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
> ---
>  drivers/infiniband/sw/rxe/Makefile    |  1 +
>  drivers/infiniband/sw/rxe/rxe_loc.h   |  6 ++
>  drivers/infiniband/sw/rxe/rxe_mad.c   | 86 +++++++++++++++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_verbs.c |  1 +
>  drivers/infiniband/sw/rxe/rxe_verbs.h |  5 ++
>  5 files changed, 99 insertions(+)
>  create mode 100644 drivers/infiniband/sw/rxe/rxe_mad.c
>
> diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
> index 93134f1d1d0c..3c47e5b982c2 100644
> --- a/drivers/infiniband/sw/rxe/Makefile
> +++ b/drivers/infiniband/sw/rxe/Makefile
> @@ -22,6 +22,7 @@ rdma_rxe-y := \
>  	rxe_mcast.o \
>  	rxe_task.o \
>  	rxe_net.o \
> +	rxe_mad.o \
>  	rxe_hw_counters.o
>
>  rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
> index 7992290886e1..a8ce85147c1f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
> @@ -245,4 +245,10 @@ static inline int rxe_ib_advise_mr(struct ib_pd *pd,
>
>  #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
>
> +/* rxe-mad.c */
> +int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32 port_num,
> +		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
> +		    const struct ib_mad *in, struct ib_mad *out,
> +		    size_t *out_mad_size, u16 *out_mad_pkey_index);
> +
>  #endif /* RXE_LOC_H */
> diff --git a/drivers/infiniband/sw/rxe/rxe_mad.c b/drivers/infiniband/sw/rxe/rxe_mad.c
> new file mode 100644
> index 000000000000..9148f384b05c
> --- /dev/null
> +++ b/drivers/infiniband/sw/rxe/rxe_mad.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2026 zhenwei pi <zhenwei.pi@linux.dev>
> + */
> +
> +#include <rdma/ib_pma.h>
> +#include "rxe.h"
> +#include "rxe_hw_counters.h"
> +
> +static int rxe_process_pma_info(struct ib_mad *out)
> +{
> +	struct ib_class_port_info cpi = {};
> +
> +	cpi.capability_mask = IB_PMA_CLASS_CAP_EXT_WIDTH;
> +	memcpy((out->data + 40), &cpi, sizeof(cpi));
> +
> +	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
> +}
> +
> +static int rxe_process_pma_counters(struct rxe_dev *rxe, struct ib_mad *out)
> +{
> +	struct ib_pma_portcounters *pma_cnt =
> +		(struct ib_pma_portcounters *)(out->data + 40);
> +
> +	pma_cnt->link_downed_counter = rxe_counter_get(rxe, RXE_CNT_LINK_DOWNED);
> +
> +	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
> +}
> +
> +static int rxe_process_pma_counters_ext(struct rxe_dev *rxe, struct ib_mad *out)
> +{
> +	struct ib_pma_portcounters_ext *pma_cnt_ext =
> +		(struct ib_pma_portcounters_ext *)(out->data + 40);
> +
> +	pma_cnt_ext->port_xmit_data = cpu_to_be64(rxe_counter_get(rxe, RXE_CNT_SENT_BYTES) >> 2);
> +	pma_cnt_ext->port_rcv_data = cpu_to_be64(rxe_counter_get(rxe, RXE_CNT_RCVD_BYTES) >> 2);
> +	pma_cnt_ext->port_xmit_packets = cpu_to_be64(rxe_counter_get(rxe, RXE_CNT_SENT_PKTS));
> +	pma_cnt_ext->port_rcv_packets = cpu_to_be64(rxe_counter_get(rxe, RXE_CNT_RCVD_PKTS));

rxe_process_pma_counters_ext() uses cpu_to_be64() to convert the 64-bit
counters.

But rxe_process_pma_counters() assigns pma_cnt->link_downed_counter =
rxe_counter_get() directly, which misses the endianness conversion. The
IB spec requires MAD data to be big-endian.

struct ib_pma_portcounters {
	u8 reserved;
	u8 port_select;
	__be16 counter_select;
	__be16 symbol_error_counter;
	u8 link_error_recovery_counter;
	u8 link_downed_counter;   <--- It is u8.
	__be16 port_rcv_errors;
	__be16 port_rcv_remphys_errors;
	__be16 port_rcv_switch_relay_errors;
	__be16 port_xmit_discards;
	u8 port_xmit_constraint_errors;
	u8 port_rcv_constraint_errors;
	u8 reserved1;
	u8 link_overrun_errors; /* LocalLink: 7:4, BufferOverrun: 3:0 */
	__be16 reserved2;
	__be16 vl15_dropped;
	__be32 port_xmit_data;
	__be32 port_rcv_data;
	__be32 port_xmit_packets;
	__be32 port_rcv_packets;
	__be32 port_xmit_wait;
} __packed;

Please fix it.

> +
> +	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
> +}
> +
> +static int rxe_process_perf_mgmt(struct rxe_dev *rxe, const struct ib_mad *in,
> +				 struct ib_mad *out)
> +{
> +	switch (in->mad_hdr.attr_id) {
> +	case IB_PMA_CLASS_PORT_INFO:
> +		return rxe_process_pma_info(out);
> +
> +	case IB_PMA_PORT_COUNTERS:
> +		return rxe_process_pma_counters(rxe, out);
> +
> +	case IB_PMA_PORT_COUNTERS_EXT:
> +		return rxe_process_pma_counters_ext(rxe, out);
> +
> +	default:
> +		break;
> +	}
> +
> +	return IB_MAD_RESULT_FAILURE;
> +}
> +
> +int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32 port_num,
> +		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
> +		    const struct ib_mad *in, struct ib_mad *out,
> +		    size_t *out_mad_size, u16 *out_mad_pkey_index)
> +{
> +	struct rxe_dev *rxe = to_rdev(ibdev);
> +	u8 mgmt_class = in->mad_hdr.mgmt_class;
> +	u8 method = in->mad_hdr.method;
> +
> +	if (port_num != RXE_PORT)
> +		return IB_MAD_RESULT_FAILURE;
> +
> +	switch (mgmt_class) {
> +	case IB_MGMT_CLASS_PERF_MGMT:
> +		if (method == IB_MGMT_METHOD_GET)

The function rxe_process_mad receives the *in* MAD and populates the
*out* MAD. Since the out buffer is eventually transmitted back to the
caller (which could be a user-space application or a remote node on the
network), it is important to ensure that any unassigned bytes (padding)
within the out packet are initialized to zero. If the underlying
framework does not pre-zero the out memory, areas outside of the
memcpy((out->data + 40), &cpi, sizeof(cpi)) call could inadvertently
leak sensitive residual kernel data.

So please add this line in the function before filling in the data:

	memset(out->data, 0, 256);

256 is the length of out->data; please adjust this number based on the
actual data length.

Please also consider adding support for IB_MGMT_METHOD_SET. This would
allow a user-space application to clear or reset specific metrics by
calling atomic64_set(..., 0) on the underlying RXE counters.

Thanks a lot.
Zhu Yanjun

> +			return rxe_process_perf_mgmt(rxe, in, out);
> +		break;
> +
> +	default:
> +		break;
> +	}
> +
> +	return IB_MAD_RESULT_FAILURE;
> +}
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index bcd486e8668b..7df0cb5a09a3 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -1509,6 +1509,7 @@ static const struct ib_device_ops rxe_dev_ops = {
>  	.post_recv = rxe_post_recv,
>  	.post_send = rxe_post_send,
>  	.post_srq_recv = rxe_post_srq_recv,
> +	.process_mad = rxe_process_mad,
>  	.query_ah = rxe_query_ah,
>  	.query_device = rxe_query_device,
>  	.query_pkey = rxe_query_pkey,
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index 2bcfb919a40b..1c4fa8eaa733 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -466,6 +466,11 @@ static inline void rxe_counter_add(struct rxe_dev *rxe, enum rxe_counters index,
>  	atomic64_add(val, &rxe->stats_counters[index]);
>  }
>
> +static inline s64 rxe_counter_get(struct rxe_dev *rxe, enum rxe_counters index)
> +{
> +	return atomic64_read(&rxe->stats_counters[index]);
> +}
> +
>  static inline struct rxe_dev *to_rdev(struct ib_device *dev)
>  {
>  	return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
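
The encoding questions in the review above can be made concrete with a small userspace sketch. It assumes, as one possible reading of the comment, that a single-byte field such as link_downed_counter needs no byte swap but cannot hold a 64-bit software counter, so the value is clamped; the clamping policy and the helper names (`pma_sat_u8`, `pma_bytes_to_dwords`, `pma_put_be64`) are hypothetical and not taken from the patch. The `>> 2` step mirrors the patch, which reports the data counters in units of four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 64-bit software counter into an 8-bit field (saturate at 0xff). */
static inline uint8_t pma_sat_u8(int64_t v)
{
	if (v < 0)
		return 0;
	return v > UINT8_MAX ? UINT8_MAX : (uint8_t)v;
}

/* Convert a byte count into 32-bit words, as the patch does with ">> 2". */
static inline uint64_t pma_bytes_to_dwords(uint64_t bytes)
{
	return bytes >> 2;
}

/* Store a 64-bit value most-significant byte first (big-endian). */
static inline void pma_put_be64(uint8_t *dst, uint64_t v)
{
	assert(dst != NULL);
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (56 - 8 * i));
}
```

A plain assignment of a 64-bit value to a u8 would silently keep only the low byte, so a counter of 256 would read back as 0; clamping avoids that wrap-around at the cost of losing precision above 255.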

* Re: [PATCH 3/3] RDMA/rxe: support perf mgmt
From: zhenwei pi @ 2026-03-29 2:53 UTC
To: Zhu Yanjun, linux-kernel, linux-rdma; +Cc: zyjzyj2000, jgg, leon

On 3/29/26 00:56, Zhu Yanjun wrote:
> On 2026/3/28 2:28, zhenwei pi wrote:
>> In RXE, hardware counters are already supported, but not in a
>> standardized manner. For instance, user-space monitoring tools like
>> atop only read from the *counters* directory. Therefore, it is
>> necessary to add perf management support to RXE.
>>
>> Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
>> ---
>>  drivers/infiniband/sw/rxe/Makefile    |  1 +
>>  drivers/infiniband/sw/rxe/rxe_loc.h   |  6 ++
>>  drivers/infiniband/sw/rxe/rxe_mad.c   | 86 +++++++++++++++++++++++++++
>>  drivers/infiniband/sw/rxe/rxe_verbs.c |  1 +
>>  drivers/infiniband/sw/rxe/rxe_verbs.h |  5 ++
>>  5 files changed, 99 insertions(+)
>>  create mode 100644 drivers/infiniband/sw/rxe/rxe_mad.c
>>
>> diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/
>> sw/rxe/Makefile
>> index 93134f1d1d0c..3c47e5b982c2 100644
>> --- a/drivers/infiniband/sw/rxe/Makefile
>> +++ b/drivers/infiniband/sw/rxe/Makefile
>> @@ -22,6 +22,7 @@ rdma_rxe-y := \
>>  	rxe_mcast.o \
>>  	rxe_task.o \
>>  	rxe_net.o \
>> +	rxe_mad.o \
>>  	rxe_hw_counters.o
>>  rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
>> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/
>> sw/rxe/rxe_loc.h
>> index 7992290886e1..a8ce85147c1f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
>> @@ -245,4 +245,10 @@ static inline int rxe_ib_advise_mr(struct ib_pd *pd,
>>  #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
>> +/* rxe-mad.c */
>> +int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32
>> port_num,
>> +		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
>> +		    const struct ib_mad *in, struct ib_mad *out,
>> +		    size_t *out_mad_size, u16 *out_mad_pkey_index);
>> +
>>  #endif /* RXE_LOC_H */
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mad.c b/drivers/infiniband/
>> sw/rxe/rxe_mad.c
>> new file mode 100644
>> index 000000000000..9148f384b05c
>> --- /dev/null
>> +++ b/drivers/infiniband/sw/rxe/rxe_mad.c
>> @@ -0,0 +1,86 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> +/*
>> + * Copyright (c) 2026 zhenwei pi <zhenwei.pi@linux.dev>
>> + */
>> +
>> +#include <rdma/ib_pma.h>
>> +#include "rxe.h"
>> +#include "rxe_hw_counters.h"
>> +
>> +static int rxe_process_pma_info(struct ib_mad *out)
>> +{
>> +	struct ib_class_port_info cpi = {};
>> +
>> +	cpi.capability_mask = IB_PMA_CLASS_CAP_EXT_WIDTH;
>> +	memcpy((out->data + 40), &cpi, sizeof(cpi));
>> +
>> +	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
>> +}
>> +
>> +static int rxe_process_pma_counters(struct rxe_dev *rxe, struct
>> ib_mad *out)
>> +{
>> +	struct ib_pma_portcounters *pma_cnt =
>> +		(struct ib_pma_portcounters *)(out->data + 40);
>> +
>> +	pma_cnt->link_downed_counter = rxe_counter_get(rxe,
>> RXE_CNT_LINK_DOWNED);
>> +
>> +	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
>> +}
>> +
>> +static int rxe_process_pma_counters_ext(struct rxe_dev *rxe, struct
>> ib_mad *out)
>> +{
>> +	struct ib_pma_portcounters_ext *pma_cnt_ext =
>> +		(struct ib_pma_portcounters_ext *)(out->data + 40);
>> +
>> +	pma_cnt_ext->port_xmit_data = cpu_to_be64(rxe_counter_get(rxe,
>> RXE_CNT_SENT_BYTES) >> 2);
>> +	pma_cnt_ext->port_rcv_data = cpu_to_be64(rxe_counter_get(rxe,
>> RXE_CNT_RCVD_BYTES) >> 2);
>> +	pma_cnt_ext->port_xmit_packets = cpu_to_be64(rxe_counter_get(rxe,
>> RXE_CNT_SENT_PKTS));
>> +	pma_cnt_ext->port_rcv_packets = cpu_to_be64(rxe_counter_get(rxe,
>> RXE_CNT_RCVD_PKTS));
>
> In rxe_process_pma_counters_ext, cpu_to_be64() is used to calculate the
> 64-bit counters.
>
> But in rxe_process_pma_counters, pma_cnt->link_downed_counter =
> rxe_counter_get(), this misses the endianness conversion. The IB spec
> requires MAD data to be in Big Endian.
>
> struct ib_pma_portcounters {
> 	u8 reserved;
> 	u8 port_select;
> 	__be16 counter_select;
> 	__be16 symbol_error_counter;
> 	u8 link_error_recovery_counter;
> 	u8 link_downed_counter;   <--- It is u8.
> 	__be16 port_rcv_errors;
> 	__be16 port_rcv_remphys_errors;
> 	__be16 port_rcv_switch_relay_errors;
> 	__be16 port_xmit_discards;
> 	u8 port_xmit_constraint_errors;
> 	u8 port_rcv_constraint_errors;
> 	u8 reserved1;
> 	u8 link_overrun_errors; /* LocalLink: 7:4, BufferOverrun: 3:0 */
> 	__be16 reserved2;
> 	__be16 vl15_dropped;
> 	__be32 port_xmit_data;
> 	__be32 port_rcv_data;
> 	__be32 port_xmit_packets;
> 	__be32 port_rcv_packets;
> 	__be32 port_xmit_wait;
> } __packed;
>
> Please fix it.

OK.

>
>> +
>> +	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
>> +}
>> +
>> +static int rxe_process_perf_mgmt(struct rxe_dev *rxe, const struct
>> ib_mad *in,
>> +				 struct ib_mad *out)
>> +{
>> +	switch (in->mad_hdr.attr_id) {
>> +	case IB_PMA_CLASS_PORT_INFO:
>> +		return rxe_process_pma_info(out);
>> +
>> +	case IB_PMA_PORT_COUNTERS:
>> +		return rxe_process_pma_counters(rxe, out);
>> +
>> +	case IB_PMA_PORT_COUNTERS_EXT:
>> +		return rxe_process_pma_counters_ext(rxe, out);
>> +
>> +	default:
>> +		break;
>> +	}
>> +
>> +	return IB_MAD_RESULT_FAILURE;
>> +}
>> +
>> +int rxe_process_mad(struct ib_device *ibdev, int mad_flags, u32
>> port_num,
>> +		    const struct ib_wc *in_wc, const struct ib_grh *in_grh,
>> +		    const struct ib_mad *in, struct ib_mad *out,
>> +		    size_t *out_mad_size, u16 *out_mad_pkey_index)
>> +{
>> +	struct rxe_dev *rxe = to_rdev(ibdev);
>> +	u8 mgmt_class = in->mad_hdr.mgmt_class;
>> +	u8 method = in->mad_hdr.method;
>> +
>> +	if (port_num != RXE_PORT)
>> +		return IB_MAD_RESULT_FAILURE;
>> +
>> +	switch (mgmt_class) {
>> +	case IB_MGMT_CLASS_PERF_MGMT:
>> +		if (method == IB_MGMT_METHOD_GET)
>
> The function rxe_process_mad receives *in* MAD and populates the *out*
> MAD. Since the out buffer is eventually transmitted back to the caller
> (which could be a user-space application or a remote node on the network),
> it is important to ensure that any unassigned bytes (padding) within the
> out packet are initialized to zero. If the underlying framework does not
> pre-zero the out memory, areas outside of the memcpy((out->data + 40),
> &cpi, sizeof(cpi)) call could inadvertently leak sensitive residual
> kernel data.
>
> Thus add this line in the function before filling in data:
>
> memset(out->data, 0, 256);
>
> 256 is the length of out->data, please adjust this number based on the
> actual data length.
>

I notice that all of the callers use kzalloc to allocate the out MAD,
so there is no need to clear it again. In addition, other drivers such
as mlx5 do not clear the memory either.

I have no objection to clearing the buffer in RXE, though; please let
me know if it's necessary.

> Please also consider adding support for IB_MGMT_METHOD_SET. This would
> allow the user space application to clear or reset specific metrics by
> calling atomic64_set(..., 0) on the underlying RXE counters.
>

Sure, I'll rename the functions, for example 'rxe_process_perf_mgmt' to
'rxe_get_perf_mgmt', so that 'rxe_set_perf_mgmt' can easily be added in
the future.

I'm not going to support the SET method in this series, because these
sysfs attributes lack a *store* operation.

Yanjun, thanks!

> Thanks a lot.
> Zhu Yanjun
>> +			return rxe_process_perf_mgmt(rxe, in, out);
>> +		break;
>> +
>> +	default:
>> +		break;
>> +	}
>> +
>> +	return IB_MAD_RESULT_FAILURE;
>> +}
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/
>> infiniband/sw/rxe/rxe_verbs.c
>> index bcd486e8668b..7df0cb5a09a3 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
>> @@ -1509,6 +1509,7 @@ static const struct ib_device_ops rxe_dev_ops = {
>>  	.post_recv = rxe_post_recv,
>>  	.post_send = rxe_post_send,
>>  	.post_srq_recv = rxe_post_srq_recv,
>> +	.process_mad = rxe_process_mad,
>>  	.query_ah = rxe_query_ah,
>>  	.query_device = rxe_query_device,
>>  	.query_pkey = rxe_query_pkey,
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/
>> infiniband/sw/rxe/rxe_verbs.h
>> index 2bcfb919a40b..1c4fa8eaa733 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> @@ -466,6 +466,11 @@ static inline void rxe_counter_add(struct rxe_dev
>> *rxe, enum rxe_counters index,
>>  	atomic64_add(val, &rxe->stats_counters[index]);
>>  }
>> +static inline s64 rxe_counter_get(struct rxe_dev *rxe, enum
>> rxe_counters index)
>> +{
>> +	return atomic64_read(&rxe->stats_counters[index]);
>> +}
>> +
>>  static inline struct rxe_dev *to_rdev(struct ib_device *dev)
>>  {
>>  	return dev ? container_of(dev, struct rxe_dev, ib_dev) : NULL;
>
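
The last exchange in the thread, on whether the reply buffer needs an explicit memset, can be illustrated with a userspace mock. Everything here is an assumption for illustration: `mock_mad` only imitates the shape of a MAD (a header followed by a data area), `MOCK_MAD_DATA` is set to 232 on the assumption that it mirrors the kernel's IB_MGMT_MAD_DATA, and `calloc` stands in for a zeroing allocation such as kzalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_MAD_HDR	24	/* assumption: size of the common MAD header */
#define MOCK_MAD_DATA	232	/* assumption: mirrors IB_MGMT_MAD_DATA */
#define MOCK_PMA_OFFSET	40	/* offset used by the patch: out->data + 40 */

struct mock_mad {
	uint8_t hdr[MOCK_MAD_HDR];
	uint8_t data[MOCK_MAD_DATA];
};

/* Zeroing allocation: every byte of the reply starts out as 0. */
static inline struct mock_mad *mock_mad_alloc(void)
{
	return calloc(1, sizeof(struct mock_mad));
}

/* Write only the payload, the way the handlers write at data + 40. */
static inline void mock_fill_payload(struct mock_mad *out, const void *payload,
				     size_t len)
{
	assert(len <= sizeof(out->data) - MOCK_PMA_OFFSET);
	memcpy(out->data + MOCK_PMA_OFFSET, payload, len);
}

/* Count non-zero bytes outside the payload, i.e. potential stale data. */
static inline size_t mock_stray_bytes(const struct mock_mad *out, size_t len)
{
	size_t stray = 0;

	for (size_t i = 0; i < sizeof(out->data); i++) {
		int in_payload = i >= MOCK_PMA_OFFSET &&
				 i < MOCK_PMA_OFFSET + len;

		if (!in_payload && out->data[i] != 0)
			stray++;
	}
	return stray;
}
```

With a zeroing allocator the stray count is 0 without any memset, which matches the author's argument. If a memset were added anyway, sizing it with sizeof(out->data) rather than a literal would keep it correct regardless of the actual buffer length.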