Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
From: Simon Horman <horms@kernel.org>
To: haiyangz@linux.microsoft.com
Cc: Simon Horman <horms@kernel.org>,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, kotaranov@microsoft.com,
	shradhagupta@linux.microsoft.com, ernis@linux.microsoft.com,
	dipayanroy@linux.microsoft.com, gargaditya@linux.microsoft.com,
	kees@kernel.org, leitao@debian.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, paulros@microsoft.com
Subject: Re: [PATCH net-next v3] net: mana: Add Interrupt Moderation support
Date: Sat, 13 Jun 2026 09:20:14 +0100	[thread overview]
Message-ID: <20260613082014.715350-1-horms@kernel.org> (raw)
In-Reply-To: <20260611190239.2532429-1-haiyangz@linux.microsoft.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
Date:   Thu Jun 11 19:02:22 2026 +0000

    net: mana: Add Interrupt Moderation support
    
    This commit adds Static and Dynamic Interrupt Moderation (DIM) support for
    Rx and Tx. It updates the queue creation procedure, adds functions to
    collect stats for DIM, and provides ethtool handler support to get/set
    the moderation settings.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 26aef21c6c2c8..d92e022d8533c 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1579,6 +1579,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
>  
>  	mana_gd_init_req_hdr(&req.hdr, MANA_CREATE_WQ_OBJ,
>  			     sizeof(req), sizeof(resp));
> +
> +	req.hdr.req.msg_version = GDMA_MESSAGE_V3;
> +	req.hdr.resp.msg_version = GDMA_MESSAGE_V2;
>  	req.vport = vport;
>  	req.wq_type = wq_type;
>  	req.wq_gdma_region = wq_spec->gdma_region;
> @@ -1587,6 +1590,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
>  	req.cq_size = cq_spec->queue_size;
>  	req.cq_moderation_ctx_id = cq_spec->modr_ctx_id;
>  	req.cq_parent_qid = cq_spec->attached_eq;
> +	req.req_cq_moderation = cq_spec->req_cq_moderation;
> +	req.cq_moderation_comp = cq_spec->cq_moderation_comp;
> +	req.cq_moderation_usec = cq_spec->cq_moderation_usec;

[Severity: High]
Does this break queue creation on older Azure host firmwares?

The code unconditionally bumps the WQ object creation request to use
GDMA_MESSAGE_V3 and passes the req_cq_moderation flag. If the firmware does
not support GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION, could the firmware
reject the queue creation request and break device initialization entirely?

[ ... ]

> +static void mana_update_tx_dim(struct mana_cq *cq)
> +{
> +	struct mana_port_context *apc = netdev_priv(cq->txq->ndev);
> +	struct dim_sample dim_sample = {};
> +	struct mana_txq *txq = cq->txq;
> +
> +	if (!apc->tx_dim_enabled)
> +		return;
> +
> +	dim_update_sample(READ_ONCE(cq->dim_event_ctr), txq->stats.packets,
> +			  txq->stats.bytes, &dim_sample);
> +	net_dim(&cq->dim, &dim_sample);
> +}

[Severity: High]
Does this measure the CPU enqueue rate rather than the hardware completion
rate?

It appears that txq->stats.packets is incremented in mana_start_xmit()
upon enqueue, not during completion in mana_poll_tx_cq(). Dynamic interrupt
moderation relies on measuring the actual hardware completion rate to
function correctly.

Additionally, is it safe to read txq->stats.packets here without the
u64_stats_fetch_begin() retry loop? Reading a 64-bit statistic locklessly
without the synchronization primitives could result in torn reads on 32-bit
systems.

[ ... ]

> @@ -2324,7 +2434,13 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
>  	if (w < cq->budget) {
>  		mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
>  		cq->work_done_since_doorbell = 0;
> -		napi_complete_done(&cq->napi, w);
> +
> +		if (napi_complete_done(&cq->napi, w)) {
> +			if (cq->type == MANA_CQ_TYPE_RX)
> +				mana_update_rx_dim(cq);
> +			else
> +				mana_update_tx_dim(cq);
> +		}

[Severity: High]
Is it safe to call mana_update_rx_dim() and mana_update_tx_dim() after
napi_complete_done()?

Since napi_complete_done() ends the NAPI polling cycle and relinquishes
mutual exclusion, and the hardware interrupt was already re-armed by
mana_gd_ring_cq() right before this, couldn't a new packet trigger an
interrupt immediately? This could allow NAPI to be scheduled on another CPU
concurrently while this CPU is still running the DIM update, potentially
corrupting the non-thread-safe net_dim() state machine.

[ ... ]

> @@ -2562,6 +2685,9 @@ static int mana_create_txq(struct mana_port_context *apc,
>  		napi_enable_locked(&cq->napi);
>  		txq->napi_initialized = true;
>  
> +		INIT_WORK(&cq->dim.work, mana_tx_dim_work);
> +		cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;

[Severity: High]
Could this allow NAPI to schedule an uninitialized work_struct?

napi_enable_locked() is called right before INIT_WORK(). Once NAPI is
enabled, it can be scheduled to run immediately (for example, via netpoll).
If NAPI executes before INIT_WORK completes, mana_update_tx_dim() would
pass the uninitialized cq->dim to net_dim(), which could cause a kernel
panic when it attempts to call schedule_work().

This pattern also occurs in mana_create_rxq().

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index 94e658d07a27e..67a2b282ff4da 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c

[ ... ]

> @@ -440,18 +474,123 @@ static int mana_set_coalesce(struct net_device *ndev,
> +		if (enable_rx_dim)
> +			apc->rx_dim_enabled = true;
> +		if (enable_tx_dim)
> +			apc->tx_dim_enabled = true;

[Severity: High]
Could the lack of memory barriers here expose uninitialized DIM structures
to NAPI?

The driver initializes DIM via mana_dim_change() and then sets
apc->rx_dim_enabled to true. Without an smp_store_release() here and a
corresponding smp_load_acquire() in mana_update_rx_dim(), weakly-ordered
CPUs like ARM64 might reorder the stores. Concurrently, NAPI polling might
observe the flag as true before the initialization is fully visible in memory,
potentially invoking net_dim() on garbage memory.

      reply	other threads:[~2026-06-13  8:20 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11 19:02 [PATCH net-next v3] net: mana: Add Interrupt Moderation support Haiyang Zhang
2026-06-13  8:20 ` Simon Horman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260613082014.715350-1-horms@kernel.org \
    --to=horms@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=dipayanroy@linux.microsoft.com \
    --cc=edumazet@google.com \
    --cc=ernis@linux.microsoft.com \
    --cc=gargaditya@linux.microsoft.com \
    --cc=haiyangz@linux.microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=kees@kernel.org \
    --cc=kotaranov@microsoft.com \
    --cc=kuba@kernel.org \
    --cc=kys@microsoft.com \
    --cc=leitao@debian.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=paulros@microsoft.com \
    --cc=shradhagupta@linux.microsoft.com \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox