* [PATCH net-next,V3, 0/3] add ethtool COALESCE_RX_CQE_FRAMES/NSECS and use it in MANA driver
@ 2026-03-06 23:19 Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 1/3] net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECS Haiyang Zhang
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Haiyang Zhang @ 2026-03-06 23:19 UTC (permalink / raw)
To: linux-hyperv, netdev; +Cc: haiyangz, paulros
From: Haiyang Zhang <haiyangz@microsoft.com>
Add two parameters for drivers supporting Rx CQE Coalescing.
ETHTOOL_A_COALESCE_RX_CQE_FRAMES:
Maximum number of frames that can be coalesced into a CQE.
ETHTOOL_A_COALESCE_RX_CQE_NSECS:
Time out value in nanoseconds after the first packet arrival in a
coalesced CQE to be sent.
Also implement in MANA driver with the new parameter and
counters.
Haiyang Zhang (3):
net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECS
net: mana: Add support for RX CQE Coalescing
net: mana: Add ethtool counters for RX CQEs in coalesced type
Documentation/netlink/specs/ethtool.yaml | 8 ++
Documentation/networking/ethtool-netlink.rst | 10 ++
drivers/net/ethernet/microsoft/mana/mana_en.c | 91 +++++++++++++------
.../ethernet/microsoft/mana/mana_ethtool.c | 75 ++++++++++++++-
include/linux/ethtool.h | 6 +-
include/net/mana/mana.h | 17 +++-
.../uapi/linux/ethtool_netlink_generated.h | 2 +
net/ethtool/coalesce.c | 14 ++-
8 files changed, 187 insertions(+), 36 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net-next,V3, 1/3] net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECS
2026-03-06 23:19 [PATCH net-next,V3, 0/3] add ethtool COALESCE_RX_CQE_FRAMES/NSECS and use it in MANA driver Haiyang Zhang
@ 2026-03-06 23:19 ` Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 2/3] net: mana: Add support for RX CQE Coalescing Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 3/3] net: mana: Add ethtool counters for RX CQEs in coalesced type Haiyang Zhang
2 siblings, 0 replies; 5+ messages in thread
From: Haiyang Zhang @ 2026-03-06 23:19 UTC (permalink / raw)
To: linux-hyperv, netdev, Andrew Lunn, Jakub Kicinski,
David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Donald Hunter, Jonathan Corbet, Shuah Khan,
Kory Maincent (Dent Project), Gal Pressman, Oleksij Rempel,
Vadim Fedorenko, linux-kernel, linux-doc
Cc: haiyangz, paulros
From: Haiyang Zhang <haiyangz@microsoft.com>
Add two parameters for drivers supporting Rx CQE Coalescing.
ETHTOOL_A_COALESCE_RX_CQE_FRAMES:
Maximum number of frames that can be coalesced into a CQE.
ETHTOOL_A_COALESCE_RX_CQE_NSECS:
Time out value in nanoseconds after the first packet arrival in a
coalesced CQE to be sent.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
Documentation/netlink/specs/ethtool.yaml | 8 ++++++++
Documentation/networking/ethtool-netlink.rst | 10 ++++++++++
include/linux/ethtool.h | 6 +++++-
include/uapi/linux/ethtool_netlink_generated.h | 2 ++
net/ethtool/coalesce.c | 14 +++++++++++++-
5 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 4707063af3b4..d254e26c014c 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -861,6 +861,12 @@ attribute-sets:
name: tx-profile
type: nest
nested-attributes: profile
+ -
+ name: rx-cqe-frames
+ type: u32
+ -
+ name: rx-cqe-nsecs
+ type: u32
-
name: pause-stat
@@ -2257,6 +2263,8 @@ operations:
- tx-aggr-time-usecs
- rx-profile
- tx-profile
+ - rx-cqe-frames
+ - rx-cqe-nsecs
dump: *coalesce-get-op
-
name: coalesce-set
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index 32179168eb73..a9fbb16891fa 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -1076,6 +1076,8 @@ Kernel response contents:
``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS`` u32 time (us), aggr, Tx
``ETHTOOL_A_COALESCE_RX_PROFILE`` nested profile of DIM, Rx
``ETHTOOL_A_COALESCE_TX_PROFILE`` nested profile of DIM, Tx
+ ``ETHTOOL_A_COALESCE_RX_CQE_FRAMES`` u32 max packets, Rx CQE
+ ``ETHTOOL_A_COALESCE_RX_CQE_NSECS`` u32 delay (ns), Rx CQE
=========================================== ====== =======================
Attributes are only included in reply if their value is not zero or the
@@ -1109,6 +1111,12 @@ well with frequent small-sized URBs transmissions.
to DIM parameters, see `Generic Network Dynamic Interrupt Moderation (Net DIM)
<https://www.kernel.org/doc/Documentation/networking/net_dim.rst>`_.
+Rx CQE coalescing allows multiple received packets to be coalesced into a single
+Completion Queue Entry (CQE). ``ETHTOOL_A_COALESCE_RX_CQE_FRAMES`` describes the
+maximum number of frames that can be coalesced into a CQE.
+``ETHTOOL_A_COALESCE_RX_CQE_NSECS`` describes max time in nanoseconds after the
+first packet arrival in a coalesced CQE to be sent.
+
COALESCE_SET
============
@@ -1147,6 +1155,8 @@ Request contents:
``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS`` u32 time (us), aggr, Tx
``ETHTOOL_A_COALESCE_RX_PROFILE`` nested profile of DIM, Rx
``ETHTOOL_A_COALESCE_TX_PROFILE`` nested profile of DIM, Tx
+ ``ETHTOOL_A_COALESCE_RX_CQE_FRAMES`` u32 max packets, Rx CQE
+ ``ETHTOOL_A_COALESCE_RX_CQE_NSECS`` u32 delay (ns), Rx CQE
=========================================== ====== =======================
Request is rejected if it attributes declared as unsupported by driver (i.e.
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 83c375840835..656d465bcd06 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -332,6 +332,8 @@ struct kernel_ethtool_coalesce {
u32 tx_aggr_max_bytes;
u32 tx_aggr_max_frames;
u32 tx_aggr_time_usecs;
+ u32 rx_cqe_frames;
+ u32 rx_cqe_nsecs;
};
/**
@@ -380,7 +382,9 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32,
#define ETHTOOL_COALESCE_TX_AGGR_TIME_USECS BIT(26)
#define ETHTOOL_COALESCE_RX_PROFILE BIT(27)
#define ETHTOOL_COALESCE_TX_PROFILE BIT(28)
-#define ETHTOOL_COALESCE_ALL_PARAMS GENMASK(28, 0)
+#define ETHTOOL_COALESCE_RX_CQE_FRAMES BIT(29)
+#define ETHTOOL_COALESCE_RX_CQE_NSECS BIT(30)
+#define ETHTOOL_COALESCE_ALL_PARAMS GENMASK(30, 0)
#define ETHTOOL_COALESCE_USECS \
(ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_TX_USECS)
diff --git a/include/uapi/linux/ethtool_netlink_generated.h b/include/uapi/linux/ethtool_netlink_generated.h
index 114b83017297..8134baf7860f 100644
--- a/include/uapi/linux/ethtool_netlink_generated.h
+++ b/include/uapi/linux/ethtool_netlink_generated.h
@@ -371,6 +371,8 @@ enum {
ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS,
ETHTOOL_A_COALESCE_RX_PROFILE,
ETHTOOL_A_COALESCE_TX_PROFILE,
+ ETHTOOL_A_COALESCE_RX_CQE_FRAMES,
+ ETHTOOL_A_COALESCE_RX_CQE_NSECS,
__ETHTOOL_A_COALESCE_CNT,
ETHTOOL_A_COALESCE_MAX = (__ETHTOOL_A_COALESCE_CNT - 1)
diff --git a/net/ethtool/coalesce.c b/net/ethtool/coalesce.c
index 3e18ca1ccc5e..349bb02c517a 100644
--- a/net/ethtool/coalesce.c
+++ b/net/ethtool/coalesce.c
@@ -118,6 +118,8 @@ static int coalesce_reply_size(const struct ethnl_req_info *req_base,
nla_total_size(sizeof(u32)) + /* _TX_AGGR_MAX_BYTES */
nla_total_size(sizeof(u32)) + /* _TX_AGGR_MAX_FRAMES */
nla_total_size(sizeof(u32)) + /* _TX_AGGR_TIME_USECS */
+ nla_total_size(sizeof(u32)) + /* _RX_CQE_FRAMES */
+ nla_total_size(sizeof(u32)) + /* _RX_CQE_NSECS */
total_modersz * 2; /* _{R,T}X_PROFILE */
}
@@ -269,7 +271,11 @@ static int coalesce_fill_reply(struct sk_buff *skb,
coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES,
kcoal->tx_aggr_max_frames, supported) ||
coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS,
- kcoal->tx_aggr_time_usecs, supported))
+ kcoal->tx_aggr_time_usecs, supported) ||
+ coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_CQE_FRAMES,
+ kcoal->rx_cqe_frames, supported) ||
+ coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_CQE_NSECS,
+ kcoal->rx_cqe_nsecs, supported))
return -EMSGSIZE;
if (!req_base->dev || !req_base->dev->irq_moder)
@@ -338,6 +344,8 @@ const struct nla_policy ethnl_coalesce_set_policy[] = {
[ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES] = { .type = NLA_U32 },
[ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES] = { .type = NLA_U32 },
[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS] = { .type = NLA_U32 },
+ [ETHTOOL_A_COALESCE_RX_CQE_FRAMES] = { .type = NLA_U32 },
+ [ETHTOOL_A_COALESCE_RX_CQE_NSECS] = { .type = NLA_U32 },
[ETHTOOL_A_COALESCE_RX_PROFILE] =
NLA_POLICY_NESTED(coalesce_profile_policy),
[ETHTOOL_A_COALESCE_TX_PROFILE] =
@@ -570,6 +578,10 @@ __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info,
tb[ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES], &mod);
ethnl_update_u32(&kernel_coalesce.tx_aggr_time_usecs,
tb[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS], &mod);
+ ethnl_update_u32(&kernel_coalesce.rx_cqe_frames,
+ tb[ETHTOOL_A_COALESCE_RX_CQE_FRAMES], &mod);
+ ethnl_update_u32(&kernel_coalesce.rx_cqe_nsecs,
+ tb[ETHTOOL_A_COALESCE_RX_CQE_NSECS], &mod);
if (dev->irq_moder && dev->irq_moder->profile_flags & DIM_PROFILE_RX) {
ret = ethnl_update_profile(dev, &dev->irq_moder->rx_profile,
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next,V3, 2/3] net: mana: Add support for RX CQE Coalescing
2026-03-06 23:19 [PATCH net-next,V3, 0/3] add ethtool COALESCE_RX_CQE_FRAMES/NSECS and use it in MANA driver Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 1/3] net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECS Haiyang Zhang
@ 2026-03-06 23:19 ` Haiyang Zhang
2026-03-08 16:30 ` Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 3/3] net: mana: Add ethtool counters for RX CQEs in coalesced type Haiyang Zhang
2 siblings, 1 reply; 5+ messages in thread
From: Haiyang Zhang @ 2026-03-06 23:19 UTC (permalink / raw)
To: linux-hyperv, netdev, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
Erni Sri Satya Vennela, Shradha Gupta, Dipayaan Roy,
Shiraz Saleem, Kees Cook, Subbaraya Sundeep, Breno Leitao,
Aditya Garg, linux-kernel, linux-rdma
Cc: paulros
From: Haiyang Zhang <haiyangz@microsoft.com>
Our NIC can have up to 4 RX packets on 1 CQE. To support this feature,
check and process the type CQE_RX_COALESCED_4. The default setting is
disabled, to avoid possible regression on latency.
And, add ethtool handler to switch this feature. To turn it on, run:
ethtool -C <nic> rx-cqe-frames 4
To turn it off:
ethtool -C <nic> rx-cqe-frames 1
The rx-cqe-nsec is the time out value in nanoseconds after the first
packet arrival in a coalesced CQE to be sent. It's read-only for this
NIC.
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 72 ++++++++++++-------
.../ethernet/microsoft/mana/mana_ethtool.c | 60 +++++++++++++++-
include/net/mana/mana.h | 8 ++-
3 files changed, 111 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ea71de39f996..c06fec50e51f 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1365,6 +1365,7 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
sizeof(resp));
req->hdr.req.msg_version = GDMA_MESSAGE_V2;
+ req->hdr.resp.msg_version = GDMA_MESSAGE_V2;
req->vport = apc->port_handle;
req->num_indir_entries = apc->indir_table_sz;
@@ -1376,7 +1377,9 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
req->update_hashkey = update_key;
req->update_indir_tab = update_tab;
req->default_rxobj = apc->default_rxobj;
- req->cqe_coalescing_enable = 0;
+
+ if (rx != TRI_STATE_FALSE)
+ req->cqe_coalescing_enable = apc->cqe_coalescing_enable;
if (update_key)
memcpy(&req->hashkey, apc->hashkey, MANA_HASH_KEY_SIZE);
@@ -1407,6 +1410,10 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
err = -EPROTO;
}
+ if (resp.hdr.response.msg_version >= GDMA_MESSAGE_V2)
+ apc->cqe_coalescing_timeout_ns =
+ resp.cqe_coalescing_timeout_ns;
+
netdev_info(ndev, "Configured steering vPort %llu entries %u\n",
apc->port_handle, apc->indir_table_sz);
out:
@@ -1915,11 +1922,12 @@ static struct sk_buff *mana_build_skb(struct mana_rxq *rxq, void *buf_va,
}
static void mana_rx_skb(void *buf_va, bool from_pool,
- struct mana_rxcomp_oob *cqe, struct mana_rxq *rxq)
+ struct mana_rxcomp_oob *cqe, struct mana_rxq *rxq,
+ int i)
{
struct mana_stats_rx *rx_stats = &rxq->stats;
struct net_device *ndev = rxq->ndev;
- uint pkt_len = cqe->ppi[0].pkt_len;
+ uint pkt_len = cqe->ppi[i].pkt_len;
u16 rxq_idx = rxq->rxq_idx;
struct napi_struct *napi;
struct xdp_buff xdp = {};
@@ -1963,7 +1971,7 @@ static void mana_rx_skb(void *buf_va, bool from_pool,
}
if (cqe->rx_hashtype != 0 && (ndev->features & NETIF_F_RXHASH)) {
- hash_value = cqe->ppi[0].pkt_hash;
+ hash_value = cqe->ppi[i].pkt_hash;
if (cqe->rx_hashtype & MANA_HASH_L4)
skb_set_hash(skb, hash_value, PKT_HASH_TYPE_L4);
@@ -2098,9 +2106,11 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
struct mana_recv_buf_oob *rxbuf_oob;
struct mana_port_context *apc;
struct device *dev = gc->dev;
+ bool coalesced = false;
void *old_buf = NULL;
u32 curr, pktlen;
bool old_fp;
+ int i;
apc = netdev_priv(ndev);
@@ -2112,13 +2122,16 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
++ndev->stats.rx_dropped;
rxbuf_oob = &rxq->rx_oobs[rxq->buf_index];
netdev_warn_once(ndev, "Dropped a truncated packet\n");
- goto drop;
- case CQE_RX_COALESCED_4:
- netdev_err(ndev, "RX coalescing is unsupported\n");
- apc->eth_stats.rx_coalesced_err++;
+ mana_move_wq_tail(rxq->gdma_rq,
+ rxbuf_oob->wqe_inf.wqe_size_in_bu);
+ mana_post_pkt_rxq(rxq);
return;
+ case CQE_RX_COALESCED_4:
+ coalesced = true;
+ break;
+
case CQE_RX_OBJECT_FENCE:
complete(&rxq->fence_event);
return;
@@ -2130,30 +2143,36 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
return;
}
- pktlen = oob->ppi[0].pkt_len;
+ for (i = 0; i < MANA_RXCOMP_OOB_NUM_PPI; i++) {
+ pktlen = oob->ppi[i].pkt_len;
+ if (pktlen == 0) {
+ if (i == 0)
+ netdev_err_once(
+ ndev,
+ "RX pkt len=0, rq=%u, cq=%u, rxobj=0x%llx\n",
+ rxq->gdma_id, cq->gdma_id, rxq->rxobj);
+ break;
+ }
- if (pktlen == 0) {
- /* data packets should never have packetlength of zero */
- netdev_err(ndev, "RX pkt len=0, rq=%u, cq=%u, rxobj=0x%llx\n",
- rxq->gdma_id, cq->gdma_id, rxq->rxobj);
- return;
- }
+ curr = rxq->buf_index;
+ rxbuf_oob = &rxq->rx_oobs[curr];
+ WARN_ON_ONCE(rxbuf_oob->wqe_inf.wqe_size_in_bu != 1);
- curr = rxq->buf_index;
- rxbuf_oob = &rxq->rx_oobs[curr];
- WARN_ON_ONCE(rxbuf_oob->wqe_inf.wqe_size_in_bu != 1);
+ mana_refill_rx_oob(dev, rxq, rxbuf_oob, &old_buf, &old_fp);
- mana_refill_rx_oob(dev, rxq, rxbuf_oob, &old_buf, &old_fp);
+ /* Unsuccessful refill will have old_buf == NULL.
+ * In this case, mana_rx_skb() will drop the packet.
+ */
+ mana_rx_skb(old_buf, old_fp, oob, rxq, i);
- /* Unsuccessful refill will have old_buf == NULL.
- * In this case, mana_rx_skb() will drop the packet.
- */
- mana_rx_skb(old_buf, old_fp, oob, rxq);
+ mana_move_wq_tail(rxq->gdma_rq,
+ rxbuf_oob->wqe_inf.wqe_size_in_bu);
-drop:
- mana_move_wq_tail(rxq->gdma_rq, rxbuf_oob->wqe_inf.wqe_size_in_bu);
+ mana_post_pkt_rxq(rxq);
- mana_post_pkt_rxq(rxq);
+ if (!coalesced)
+ break;
+ }
}
static void mana_poll_rx_cq(struct mana_cq *cq)
@@ -3332,6 +3351,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
apc->port_handle = INVALID_MANA_HANDLE;
apc->pf_filter_handle = INVALID_MANA_HANDLE;
apc->port_idx = port_idx;
+ apc->cqe_coalescing_enable = 0;
mutex_init(&apc->vport_mutex);
apc->vport_use_count = 0;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index f2d220b371b5..4b234b16e57a 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -20,8 +20,6 @@ static const struct mana_stats_desc mana_eth_stats[] = {
tx_cqe_unknown_type)},
{"tx_linear_pkt_cnt", offsetof(struct mana_ethtool_stats,
tx_linear_pkt_cnt)},
- {"rx_coalesced_err", offsetof(struct mana_ethtool_stats,
- rx_coalesced_err)},
{"rx_cqe_unknown_type", offsetof(struct mana_ethtool_stats,
rx_cqe_unknown_type)},
};
@@ -390,6 +388,61 @@ static void mana_get_channels(struct net_device *ndev,
channel->combined_count = apc->num_queues;
}
+#define MANA_RX_CQE_NSEC_DEF 2048
+static int mana_get_coalesce(struct net_device *ndev,
+ struct ethtool_coalesce *ec,
+ struct kernel_ethtool_coalesce *kernel_coal,
+ struct netlink_ext_ack *extack)
+{
+ struct mana_port_context *apc = netdev_priv(ndev);
+
+ kernel_coal->rx_cqe_frames =
+ apc->cqe_coalescing_enable ? MANA_RXCOMP_OOB_NUM_PPI : 1;
+
+ kernel_coal->rx_cqe_nsecs = apc->cqe_coalescing_timeout_ns;
+
+ /* Return the default timeout value for old FW not providing
+ * this value.
+ */
+ if (apc->port_is_up && apc->cqe_coalescing_enable &&
+ !kernel_coal->rx_cqe_nsecs)
+ kernel_coal->rx_cqe_nsecs = MANA_RX_CQE_NSEC_DEF;
+
+ return 0;
+}
+
+static int mana_set_coalesce(struct net_device *ndev,
+ struct ethtool_coalesce *ec,
+ struct kernel_ethtool_coalesce *kernel_coal,
+ struct netlink_ext_ack *extack)
+{
+ struct mana_port_context *apc = netdev_priv(ndev);
+ u8 saved_cqe_coalescing_enable;
+ int err;
+
+ if (kernel_coal->rx_cqe_frames != 1 &&
+ kernel_coal->rx_cqe_frames != MANA_RXCOMP_OOB_NUM_PPI) {
+ NL_SET_ERR_MSG_FMT(extack,
+ "rx-frames must be 1 or %u, got %u",
+ MANA_RXCOMP_OOB_NUM_PPI,
+ kernel_coal->rx_cqe_frames);
+ return -EINVAL;
+ }
+
+ saved_cqe_coalescing_enable = apc->cqe_coalescing_enable;
+ apc->cqe_coalescing_enable =
+ kernel_coal->rx_cqe_frames == MANA_RXCOMP_OOB_NUM_PPI;
+
+ if (!apc->port_is_up)
+ return 0;
+
+ err = mana_config_rss(apc, TRI_STATE_TRUE, false, false);
+ if (err)
+ apc->cqe_coalescing_enable = saved_cqe_coalescing_enable;
+
+ return err;
+}
+
static int mana_set_channels(struct net_device *ndev,
struct ethtool_channels *channels)
{
@@ -510,6 +563,7 @@ static int mana_get_link_ksettings(struct net_device *ndev,
}
const struct ethtool_ops mana_ethtool_ops = {
+ .supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
.get_ethtool_stats = mana_get_ethtool_stats,
.get_sset_count = mana_get_sset_count,
.get_strings = mana_get_strings,
@@ -520,6 +574,8 @@ const struct ethtool_ops mana_ethtool_ops = {
.set_rxfh = mana_set_rxfh,
.get_channels = mana_get_channels,
.set_channels = mana_set_channels,
+ .get_coalesce = mana_get_coalesce,
+ .set_coalesce = mana_set_coalesce,
.get_ringparam = mana_get_ringparam,
.set_ringparam = mana_set_ringparam,
.get_link_ksettings = mana_get_link_ksettings,
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index a078af283bdd..a7f89e7ddc56 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -378,7 +378,6 @@ struct mana_ethtool_stats {
u64 tx_cqe_err;
u64 tx_cqe_unknown_type;
u64 tx_linear_pkt_cnt;
- u64 rx_coalesced_err;
u64 rx_cqe_unknown_type;
};
@@ -557,6 +556,9 @@ struct mana_port_context {
bool port_is_up;
bool port_st_save; /* Saved port state */
+ u8 cqe_coalescing_enable;
+ u32 cqe_coalescing_timeout_ns;
+
struct mana_ethtool_stats eth_stats;
struct mana_ethtool_phy_stats phy_stats;
@@ -902,6 +904,10 @@ struct mana_cfg_rx_steer_req_v2 {
struct mana_cfg_rx_steer_resp {
struct gdma_resp_hdr hdr;
+
+ /* V2 */
+ u32 cqe_coalescing_timeout_ns;
+ u32 reserved1;
}; /* HW DATA */
/* Register HW vPort */
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next,V3, 3/3] net: mana: Add ethtool counters for RX CQEs in coalesced type
2026-03-06 23:19 [PATCH net-next,V3, 0/3] add ethtool COALESCE_RX_CQE_FRAMES/NSECS and use it in MANA driver Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 1/3] net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECS Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 2/3] net: mana: Add support for RX CQE Coalescing Haiyang Zhang
@ 2026-03-06 23:19 ` Haiyang Zhang
2 siblings, 0 replies; 5+ messages in thread
From: Haiyang Zhang @ 2026-03-06 23:19 UTC (permalink / raw)
To: linux-hyperv, netdev, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
Erni Sri Satya Vennela, Dipayaan Roy, Shradha Gupta,
Shiraz Saleem, Kees Cook, Subbaraya Sundeep, Aditya Garg,
Breno Leitao, linux-kernel, linux-rdma
Cc: paulros
From: Haiyang Zhang <haiyangz@microsoft.com>
For RX CQEs with type CQE_RX_COALESCED_4, to measure the coalescing
efficiency, add counters to count how many contains 2, 3, 4 packets
respectively.
Also, add a counter for the error case of first packet with length == 0.
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 21 ++++++++++++++++++-
.../ethernet/microsoft/mana/mana_ethtool.c | 15 +++++++++++--
include/net/mana/mana.h | 9 +++++---
3 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index c06fec50e51f..11ea2b17502d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2146,11 +2146,23 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
for (i = 0; i < MANA_RXCOMP_OOB_NUM_PPI; i++) {
pktlen = oob->ppi[i].pkt_len;
if (pktlen == 0) {
- if (i == 0)
+ /* Collect coalesced CQE count based on packets processed.
+ * Coalesced CQEs have at least 2 packets, so index is i - 2.
+ */
+ if (i > 1) {
+ u64_stats_update_begin(&rxq->stats.syncp);
+ rxq->stats.coalesced_cqe[i - 2]++;
+ u64_stats_update_end(&rxq->stats.syncp);
+ } else if (i == 0) {
+ /* Error case stat */
+ u64_stats_update_begin(&rxq->stats.syncp);
+ rxq->stats.pkt_len0_err++;
+ u64_stats_update_end(&rxq->stats.syncp);
netdev_err_once(
ndev,
"RX pkt len=0, rq=%u, cq=%u, rxobj=0x%llx\n",
rxq->gdma_id, cq->gdma_id, rxq->rxobj);
+ }
break;
}
@@ -2173,6 +2185,13 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
if (!coalesced)
break;
}
+
+ /* Coalesced CQE with all 4 packets */
+ if (coalesced && i == MANA_RXCOMP_OOB_NUM_PPI) {
+ u64_stats_update_begin(&rxq->stats.syncp);
+ rxq->stats.coalesced_cqe[MANA_RXCOMP_OOB_NUM_PPI - 2]++;
+ u64_stats_update_end(&rxq->stats.syncp);
+ }
}
static void mana_poll_rx_cq(struct mana_cq *cq)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index 4b234b16e57a..6a4b42fe0944 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -149,7 +149,7 @@ static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
{
struct mana_port_context *apc = netdev_priv(ndev);
unsigned int num_queues = apc->num_queues;
- int i;
+ int i, j;
if (stringset != ETH_SS_STATS)
return;
@@ -168,6 +168,9 @@ static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
ethtool_sprintf(&data, "rx_%d_xdp_drop", i);
ethtool_sprintf(&data, "rx_%d_xdp_tx", i);
ethtool_sprintf(&data, "rx_%d_xdp_redirect", i);
+ ethtool_sprintf(&data, "rx_%d_pkt_len0_err", i);
+ for (j = 0; j < MANA_RXCOMP_OOB_NUM_PPI - 1; j++)
+ ethtool_sprintf(&data, "rx_%d_coalesced_cqe_%d", i, j + 2);
}
for (i = 0; i < num_queues; i++) {
@@ -201,6 +204,8 @@ static void mana_get_ethtool_stats(struct net_device *ndev,
u64 xdp_xmit;
u64 xdp_drop;
u64 xdp_tx;
+ u64 pkt_len0_err;
+ u64 coalesced_cqe[MANA_RXCOMP_OOB_NUM_PPI - 1];
u64 tso_packets;
u64 tso_bytes;
u64 tso_inner_packets;
@@ -209,7 +214,7 @@ static void mana_get_ethtool_stats(struct net_device *ndev,
u64 short_pkt_fmt;
u64 csum_partial;
u64 mana_map_err;
- int q, i = 0;
+ int q, i = 0, j;
if (!apc->port_is_up)
return;
@@ -239,6 +244,9 @@ static void mana_get_ethtool_stats(struct net_device *ndev,
xdp_drop = rx_stats->xdp_drop;
xdp_tx = rx_stats->xdp_tx;
xdp_redirect = rx_stats->xdp_redirect;
+ pkt_len0_err = rx_stats->pkt_len0_err;
+ for (j = 0; j < MANA_RXCOMP_OOB_NUM_PPI - 1; j++)
+ coalesced_cqe[j] = rx_stats->coalesced_cqe[j];
} while (u64_stats_fetch_retry(&rx_stats->syncp, start));
data[i++] = packets;
@@ -246,6 +254,9 @@ static void mana_get_ethtool_stats(struct net_device *ndev,
data[i++] = xdp_drop;
data[i++] = xdp_tx;
data[i++] = xdp_redirect;
+ data[i++] = pkt_len0_err;
+ for (j = 0; j < MANA_RXCOMP_OOB_NUM_PPI - 1; j++)
+ data[i++] = coalesced_cqe[j];
}
for (q = 0; q < num_queues; q++) {
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index a7f89e7ddc56..3336688fed5e 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -61,8 +61,11 @@ enum TRI_STATE {
#define MAX_PORTS_IN_MANA_DEV 256
+/* Maximum number of packets per coalesced CQE */
+#define MANA_RXCOMP_OOB_NUM_PPI 4
+
/* Update this count whenever the respective structures are changed */
-#define MANA_STATS_RX_COUNT 5
+#define MANA_STATS_RX_COUNT (6 + MANA_RXCOMP_OOB_NUM_PPI - 1)
#define MANA_STATS_TX_COUNT 11
#define MANA_RX_FRAG_ALIGNMENT 64
@@ -73,6 +76,8 @@ struct mana_stats_rx {
u64 xdp_drop;
u64 xdp_tx;
u64 xdp_redirect;
+ u64 pkt_len0_err;
+ u64 coalesced_cqe[MANA_RXCOMP_OOB_NUM_PPI - 1];
struct u64_stats_sync syncp;
};
@@ -227,8 +232,6 @@ struct mana_rxcomp_perpkt_info {
u32 pkt_hash;
}; /* HW DATA */
-#define MANA_RXCOMP_OOB_NUM_PPI 4
-
/* Receive completion OOB */
struct mana_rxcomp_oob {
struct mana_cqe_header cqe_hdr;
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* RE: [PATCH net-next,V3, 2/3] net: mana: Add support for RX CQE Coalescing
2026-03-06 23:19 ` [PATCH net-next,V3, 2/3] net: mana: Add support for RX CQE Coalescing Haiyang Zhang
@ 2026-03-08 16:30 ` Haiyang Zhang
0 siblings, 0 replies; 5+ messages in thread
From: Haiyang Zhang @ 2026-03-08 16:30 UTC (permalink / raw)
To: Haiyang Zhang, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, KY Srinivasan, Wei Liu, Dexuan Cui,
Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
Erni Sri Satya Vennela, Shradha Gupta, Dipayaan Roy,
Shiraz Saleem, Kees Cook, Subbaraya Sundeep, Breno Leitao,
Aditya Garg, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org
Cc: Paul Rosswurm
> -----Original Message-----
> From: Haiyang Zhang <haiyangz@linux.microsoft.com>
> Sent: Friday, March 6, 2026 6:19 PM
> To: linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Wei Liu
> <wei.liu@kernel.org>; Dexuan Cui <DECUI@microsoft.com>; Long Li
> <longli@microsoft.com>; Andrew Lunn <andrew+netdev@lunn.ch>; David S.
> Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub
> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Konstantin
> Taranov <kotaranov@microsoft.com>; Simon Horman <horms@kernel.org>; Erni
> Sri Satya Vennela <ernis@linux.microsoft.com>; Shradha Gupta
> <shradhagupta@linux.microsoft.com>; Dipayaan Roy
> <dipayanroy@linux.microsoft.com>; Shiraz Saleem
> <shirazsaleem@microsoft.com>; Kees Cook <kees@kernel.org>; Subbaraya
> Sundeep <sbhatta@marvell.com>; Breno Leitao <leitao@debian.org>; Aditya
> Garg <gargaditya@linux.microsoft.com>; linux-kernel@vger.kernel.org;
> linux-rdma@vger.kernel.org
> Cc: Paul Rosswurm <paulros@microsoft.com>
> Subject: [PATCH net-next,V3, 2/3] net: mana: Add support for RX CQE
> Coalescing
>
> From: Haiyang Zhang <haiyangz@microsoft.com>
> @@ -2112,13 +2122,16 @@ static void mana_process_rx_cqe(struct mana_rxq
> *rxq, struct mana_cq *cq,
> ++ndev->stats.rx_dropped;
> rxbuf_oob = &rxq->rx_oobs[rxq->buf_index];
> netdev_warn_once(ndev, "Dropped a truncated packet\n");
> - goto drop;
>
> - case CQE_RX_COALESCED_4:
> - netdev_err(ndev, "RX coalescing is unsupported\n");
> - apc->eth_stats.rx_coalesced_err++;
> + mana_move_wq_tail(rxq->gdma_rq,
> + rxbuf_oob->wqe_inf.wqe_size_in_bu);
> + mana_post_pkt_rxq(rxq);
> return;
>
> + case CQE_RX_COALESCED_4:
> + coalesced = true;
> + break;
> +
> case CQE_RX_OBJECT_FENCE:
> complete(&rxq->fence_event);
> return;
> @@ -2130,30 +2143,36 @@ static void mana_process_rx_cqe(struct mana_rxq
> *rxq, struct mana_cq *cq,
> return;
> }
>
> - pktlen = oob->ppi[0].pkt_len;
> + for (i = 0; i < MANA_RXCOMP_OOB_NUM_PPI; i++) {
> + pktlen = oob->ppi[i].pkt_len;
> + if (pktlen == 0) {
> + if (i == 0)
> + netdev_err_once(
> + ndev,
> + "RX pkt len=0, rq=%u, cq=%u,
> rxobj=0x%llx\n",
> + rxq->gdma_id, cq->gdma_id, rxq->rxobj);
> + break;
> + }
>
> - if (pktlen == 0) {
> - /* data packets should never have packetlength of zero */
> - netdev_err(ndev, "RX pkt len=0, rq=%u, cq=%u, rxobj=0x%llx\n",
> - rxq->gdma_id, cq->gdma_id, rxq->rxobj);
> - return;
> - }
> + curr = rxq->buf_index;
> + rxbuf_oob = &rxq->rx_oobs[curr];
> + WARN_ON_ONCE(rxbuf_oob->wqe_inf.wqe_size_in_bu != 1);
>
> - curr = rxq->buf_index;
> - rxbuf_oob = &rxq->rx_oobs[curr];
> - WARN_ON_ONCE(rxbuf_oob->wqe_inf.wqe_size_in_bu != 1);
> + mana_refill_rx_oob(dev, rxq, rxbuf_oob, &old_buf, &old_fp);
>
> - mana_refill_rx_oob(dev, rxq, rxbuf_oob, &old_buf, &old_fp);
> + /* Unsuccessful refill will have old_buf == NULL.
> + * In this case, mana_rx_skb() will drop the packet.
> + */
> + mana_rx_skb(old_buf, old_fp, oob, rxq, i);
>
> - /* Unsuccessful refill will have old_buf == NULL.
> - * In this case, mana_rx_skb() will drop the packet.
> - */
> - mana_rx_skb(old_buf, old_fp, oob, rxq);
> + mana_move_wq_tail(rxq->gdma_rq,
> + rxbuf_oob->wqe_inf.wqe_size_in_bu);
I will fix this pointed out by AI review:
> The comment says "Unsuccessful refill will have old_buf == NULL" but this is
> only true for the first iteration.
> Should old_buf be set to NULL at the top of the loop, before calling
> mana_refill_rx_oob()?
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-03-08 16:30 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-06 23:19 [PATCH net-next,V3, 0/3] add ethtool COALESCE_RX_CQE_FRAMES/NSECS and use it in MANA driver Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 1/3] net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECS Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 2/3] net: mana: Add support for RX CQE Coalescing Haiyang Zhang
2026-03-08 16:30 ` Haiyang Zhang
2026-03-06 23:19 ` [PATCH net-next,V3, 3/3] net: mana: Add ethtool counters for RX CQEs in coalesced type Haiyang Zhang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox