From: Jakub Kicinski <kuba@kernel.org>
To: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, pabeni@redhat.com, longli@microsoft.com,
kotaranov@microsoft.com, horms@kernel.org,
shradhagupta@linux.microsoft.com, ssengar@linux.microsoft.com,
ernis@linux.microsoft.com, shirazsaleem@microsoft.com,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
dipayanroy@microsoft.com
Subject: Re: [PATCH net-next v2] net: mana: Implement ndo_tx_timeout and serialize queue resets per port.
Date: Wed, 29 Oct 2025 18:22:33 -0700 [thread overview]
Message-ID: <20251029182233.59aea2d3@kernel.org> (raw)
In-Reply-To: <20251027201351.GA8995@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
On Mon, 27 Oct 2025 13:13:51 -0700 Dipayaan Roy wrote:
> Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected
> and a device-controlled port reset for all queues can be scheduled to a
> ordered workqueue. The reset for all queues on stall detection is
> recomended by hardware team.
>
> The change introduces a single ordered workqueue
> ("mana_per_port_queue_reset_wq") with WQ_UNBOUND | WQ_MEM_RECLAIM and
> queues exactly one work_struct per port onto it. This achieves:
>
> * Global FIFO across all port reset requests (alloc_ordered_workqueue).
Why does this matter?
> * Natural per-port de-duplication: the same work_struct cannot be
> queued twice while pending/running.
That's true for all work_structs even without a separate wq.
> * Avoids hogging a per-CPU kworker for long, may-sleep reset paths
> (WQ_UNBOUND).
?
> * Guarantees forward progress during memory pressure
> (WQ_MEM_RECLAIM rescuer).
? meaning? What scenario are you preventing?
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> ---
> Changes in v2:
> - Fixed cosmetic changes.
> ---
> drivers/net/ethernet/microsoft/mana/mana_en.c | 82 +++++++++++++++++++
> include/net/mana/gdma.h | 6 +-
> include/net/mana/mana.h | 15 ++++
> 3 files changed, 102 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index f4fc86f20213..05b7046ae3b5 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -258,6 +258,45 @@ static int mana_get_gso_hs(struct sk_buff *skb)
> return gso_hs;
> }
>
> +static void mana_per_port_queue_reset_work_handler(struct work_struct *work)
> +{
> + struct mana_queue_reset_work *reset_queue_work =
> + container_of(work, struct mana_queue_reset_work, work);
> + struct mana_port_context *apc = reset_queue_work->apc;
> + struct net_device *ndev = apc->ndev;
> + struct mana_context *ac = apc->ac;
> + int err;
> +
> + if (!rtnl_trylock()) {
Use disable_work_sync() in the remove path and you won't have to do
the retry dance, right?
> + /* Someone else holds RTNL, requeue and exit. */
> + queue_work(ac->per_port_queue_reset_wq,
> + &apc->queue_reset_work.work);
> + return;
> + }
> +static void mana_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> +{
> + struct mana_port_context *apc = netdev_priv(netdev);
> + struct mana_context *ac = apc->ac;
> + struct gdma_context *gc = ac->gdma_dev->gdma_context;
> +
> + netdev_warn(netdev, "%s(): called on txq: %u\n", __func__, txqueue);
I believe that core already prints this sort of thing, please don't
duplicate.
> + /* Already in service, hence tx queue reset is not required.*/
> + if (gc->in_service)
> + return;
> +
> + /* Note: If there are pending queue reset work for this port(apc),
> + * subsequent request queued up from here are ignored. This is because
> + * we are using the same work instance per port(apc).
> + */
> + queue_work(ac->per_port_queue_reset_wq, &apc->queue_reset_work.work);
> +}
> +
> static int mana_shaper_set(struct net_shaper_binding *binding,
> const struct net_shaper *shaper,
> struct netlink_ext_ack *extack)
> @@ -844,7 +902,9 @@ static const struct net_device_ops mana_devops = {
> .ndo_bpf = mana_bpf,
> .ndo_xdp_xmit = mana_xdp_xmit,
> .ndo_change_mtu = mana_change_mtu,
> + .ndo_tx_timeout = mana_tx_timeout,
other ndos were aligned with tabs you're using spaces
> .net_shaper_ops = &mana_shaper_ops,
> +
> };
>
> static void mana_cleanup_port_context(struct mana_port_context *apc)
> @@ -3218,6 +3278,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
> ndev->min_mtu = ETH_MIN_MTU;
> ndev->needed_headroom = MANA_HEADROOM;
> ndev->dev_port = port_idx;
> + ndev->watchdog_timeo = MANA_TXQ_TIMEOUT;
why you need a define for this is unclear. You can use 15 * HZ here
directly
> SET_NETDEV_DEV(ndev, gc->dev);
>
> netif_set_tso_max_size(ndev, GSO_MAX_SIZE);
> @@ -3255,6 +3316,11 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
>
> debugfs_create_u32("current_speed", 0400, apc->mana_port_debugfs, &apc->speed);
>
> + /* Initialize the per port queue reset work.*/
> + apc->queue_reset_work.apc = apc;
> + INIT_WORK(&apc->queue_reset_work.work,
> + mana_per_port_queue_reset_work_handler);
> +
> return 0;
>
> free_indir:
> @@ -3456,6 +3522,15 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
> if (ac->num_ports > MAX_PORTS_IN_MANA_DEV)
> ac->num_ports = MAX_PORTS_IN_MANA_DEV;
>
> + ac->per_port_queue_reset_wq =
> + alloc_ordered_workqueue("mana_per_port_queue_reset_wq",
> + WQ_UNBOUND | WQ_MEM_RECLAIM);
> + if (!ac->per_port_queue_reset_wq) {
> + dev_err(dev, "Failed to allocate per port queue reset workqueue\n");
> + err = -ENOMEM;
> + goto out;
> + }
> +
> if (!resuming) {
> for (i = 0; i < ac->num_ports; i++) {
> err = mana_probe_port(ac, i, &ac->ports[i]);
> @@ -3528,6 +3603,8 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
> */
> rtnl_lock();
>
> + cancel_work_sync(&apc->queue_reset_work.work);
> +
> err = mana_detach(ndev, false);
> if (err)
> netdev_err(ndev, "Failed to detach vPort %d: %d\n",
> @@ -3547,6 +3624,11 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
> free_netdev(ndev);
> }
>
> + if (ac->per_port_queue_reset_wq) {
> + destroy_workqueue(ac->per_port_queue_reset_wq);
> + ac->per_port_queue_reset_wq = NULL;
> + }
> +
> mana_destroy_eq(ac);
> out:
> mana_gd_deregister_device(gd);
> diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> index 57df78cfbf82..1f8c536ba3be 100644
> --- a/include/net/mana/gdma.h
> +++ b/include/net/mana/gdma.h
> @@ -591,6 +591,9 @@ enum {
> /* Driver can self reset on FPGA Reconfig EQE notification */
> #define GDMA_DRV_CAP_FLAG_1_HANDLE_RECONFIG_EQE BIT(17)
>
> +/* Driver detects stalled send queues and recovers them */
> +#define GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY BIT(18)
> +
> #define GDMA_DRV_CAP_FLAGS1 \
> (GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
> GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
> @@ -599,7 +602,8 @@ enum {
> GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP | \
> GDMA_DRV_CAP_FLAG_1_DYNAMIC_IRQ_ALLOC_SUPPORT | \
> GDMA_DRV_CAP_FLAG_1_SELF_RESET_ON_EQE | \
> - GDMA_DRV_CAP_FLAG_1_HANDLE_RECONFIG_EQE)
> + GDMA_DRV_CAP_FLAG_1_HANDLE_RECONFIG_EQE | \
> + GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY)
>
> #define GDMA_DRV_CAP_FLAGS2 0
>
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 0921485565c0..e0b44ae2226a 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -67,6 +67,11 @@ enum TRI_STATE {
>
> #define MANA_RX_FRAG_ALIGNMENT 64
>
> +/* Timeout value for Txq stall detection & recovery used by ndo_tx_timeout.
> + * The value is chosen after considering fpga re-config scenarios.
> + */
> +#define MANA_TXQ_TIMEOUT (15 * HZ)
> +
> struct mana_stats_rx {
> u64 packets;
> u64 bytes;
> @@ -475,13 +480,23 @@ struct mana_context {
>
> struct mana_eq *eqs;
> struct dentry *mana_eqs_debugfs;
> + struct workqueue_struct *per_port_queue_reset_wq;
>
> struct net_device *ports[MAX_PORTS_IN_MANA_DEV];
> };
>
> +struct mana_queue_reset_work {
> +
nit spurious empty line
> + /* Work structure */
> + struct work_struct work;
> + /* Pointer to the port context */
> + struct mana_port_context *apc;
This struct is placed in apc, unclear why you need a backpointer
instead of using container_of()
> +};
> +
> struct mana_port_context {
> struct mana_context *ac;
> struct net_device *ndev;
> + struct mana_queue_reset_work queue_reset_work;
>
> u8 mac_addr[ETH_ALEN];
>
--
pw-bot: cr
next prev parent reply other threads:[~2025-10-30 1:22 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-27 20:13 [PATCH net-next v2] net: mana: Implement ndo_tx_timeout and serialize queue resets per port Dipayaan Roy
2025-10-28 3:58 ` Pavan Chebbi
2025-10-30 1:22 ` Jakub Kicinski [this message]
2025-11-06 4:37 ` Dipayaan Roy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251029182233.59aea2d3@kernel.org \
--to=kuba@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=decui@microsoft.com \
--cc=dipayanroy@linux.microsoft.com \
--cc=dipayanroy@microsoft.com \
--cc=edumazet@google.com \
--cc=ernis@linux.microsoft.com \
--cc=haiyangz@microsoft.com \
--cc=horms@kernel.org \
--cc=kotaranov@microsoft.com \
--cc=kys@microsoft.com \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=longli@microsoft.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shirazsaleem@microsoft.com \
--cc=shradhagupta@linux.microsoft.com \
--cc=ssengar@linux.microsoft.com \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.