Netdev List
* [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
@ 2026-05-28 18:07 Erni Sri Satya Vennela
  2026-05-29 23:14 ` Jacob Keller
  2026-06-02 20:21 ` Jakub Kicinski
  0 siblings, 2 replies; 5+ messages in thread
From: Erni Sri Satya Vennela @ 2026-05-28 18:07 UTC
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, kotaranov, horms, ernis, dipayanroy, kees,
	linux-hyperv, netdev, linux-kernel, linux-rdma

mana_query_link_cfg() sends an HWC command to firmware on every call,
but the link speed and QoS values it returns only change when the
driver explicitly calls mana_set_bw_clamp(). This function is called
not only by userspace via ethtool get_link_ksettings, but also
periodically by hv_netvsc through netvsc_get_link_ksettings and by
the sysfs speed_show attribute via dev_attr_show, resulting in
unnecessary HWC traffic every few minutes.

Add a link_cfg_error field to mana_port_context to cache the query
result. The field uses three states: 1 (not yet queried, initial
value set during mana_probe_port), 0 (success, speed/max_speed are
valid), or a negative errno for permanent errors like -EOPNOTSUPP
when the hardware does not support the command. Transient errors and
qos_unconfigured responses are not cached so that subsequent calls
will retry.

To prevent a concurrent mana_set_bw_clamp() from racing with an
in-flight query and publishing stale pre-clamp speed/max_speed,
serialize the firmware transaction and the cache update under a new
per-port mutex (link_cfg_mutex). The mutex covers both the HWC
request and the subsequent stores in mana_query_link_cfg(), and the
HWC request and invalidation in mana_set_bw_clamp(). With this lock
held, two queries can no longer interleave their speed/max_speed
stores, and an invalidation can no longer slip in between a query's
response and its publish.

Invalidate the cache inside mana_set_bw_clamp() on success, so all
current and future callers that change the link configuration
automatically trigger a fresh query on the next mana_query_link_cfg()
call. Also reset link_cfg_error during resume in mana_probe() under
link_cfg_mutex, so that any slow-path query already in flight cannot
later store 0 and silently overwrite the post-resume invalidation.
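
For illustration only, the caching scheme described above can be modelled
in userspace. This is a sketch and not the driver code: struct port_model,
fake_hwc_query() and the fw_* fields are hypothetical stand-ins for the
HWC round trip, and a pthread mutex stands in for the kernel mutex. Only
the three-state link_cfg_error convention and the invalidate-on-clamp
rule are taken from the description.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

struct port_model {
	pthread_mutex_t link_cfg_mutex; /* stands in for the kernel mutex */
	int link_cfg_error;   /* 1 = not queried, 0 = cached, <0 = permanent error */
	unsigned int speed;   /* cached value, valid only when link_cfg_error == 0 */
	int fw_result;        /* hypothetical: error the fake firmware returns */
	unsigned int fw_speed;/* hypothetical: speed the fake firmware reports */
	int fw_calls;         /* number of simulated HWC round trips */
};

static void port_model_init(struct port_model *p)
{
	pthread_mutex_init(&p->link_cfg_mutex, NULL);
	p->link_cfg_error = 1;
	p->speed = 0;
	p->fw_result = 0;
	p->fw_speed = 0;
	p->fw_calls = 0;
}

/* Hypothetical stand-in for the HWC request to firmware. */
static int fake_hwc_query(struct port_model *p, unsigned int *speed)
{
	p->fw_calls++;
	if (p->fw_result)
		return p->fw_result;
	*speed = p->fw_speed;
	return 0;
}

static int model_query_link_cfg(struct port_model *p)
{
	unsigned int speed = 0;
	int err;

	pthread_mutex_lock(&p->link_cfg_mutex);

	err = p->link_cfg_error;
	if (err <= 0)	/* cached success or cached permanent error */
		goto out;

	err = fake_hwc_query(p, &speed);
	if (err) {
		if (err == -EOPNOTSUPP)
			p->link_cfg_error = err; /* permanent: cache it */
		goto out;	/* transient: state stays 1, so the next call retries */
	}

	/* Publish the value and mark the cache valid under the same lock. */
	p->speed = speed;
	p->link_cfg_error = 0;
out:
	pthread_mutex_unlock(&p->link_cfg_mutex);
	return err;
}

static int model_set_bw_clamp(struct port_model *p, unsigned int speed)
{
	pthread_mutex_lock(&p->link_cfg_mutex);
	p->fw_speed = speed;	/* pretend the firmware accepted the clamp */
	p->link_cfg_error = 1;	/* invalidate; the next query goes to firmware */
	pthread_mutex_unlock(&p->link_cfg_mutex);
	return 0;
}
```

In this model, repeated queries after a success cost one firmware call,
a clamp change forces exactly one more, a transient error such as -EIO
is retried on every call, and -EOPNOTSUPP is asked only once.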

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 41 +++++++++++++++----
 include/net/mana/mana.h                       |  4 ++
 2 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 82f1461a48e9..43018bc13dc1 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1456,6 +1456,12 @@ int mana_query_link_cfg(struct mana_port_context *apc)
 	struct mana_query_link_config_req req = {};
 	int err;
 
+	mutex_lock(&apc->link_cfg_mutex);
+
+	err = apc->link_cfg_error;
+	if (err <= 0)
+		goto out;
+
 	mana_gd_init_req_hdr(&req.hdr, MANA_QUERY_LINK_CONFIG,
 			     sizeof(req), sizeof(resp));
 
@@ -1468,10 +1474,11 @@ int mana_query_link_cfg(struct mana_port_context *apc)
 	if (err) {
 		if (err == -EOPNOTSUPP) {
 			netdev_info_once(ndev, "MANA_QUERY_LINK_CONFIG not supported\n");
-			return err;
+			apc->link_cfg_error = err;
+			goto out;
 		}
 		netdev_err(ndev, "Failed to query link config: %d\n", err);
-		return err;
+		goto out;
 	}
 
 	err = mana_verify_resp_hdr(&resp.hdr, MANA_QUERY_LINK_CONFIG,
@@ -1482,16 +1489,20 @@ int mana_query_link_cfg(struct mana_port_context *apc)
 			   resp.hdr.status);
 		if (!err)
 			err = -EOPNOTSUPP;
-		return err;
+		goto out;
 	}
 
 	if (resp.qos_unconfigured) {
 		err = -EINVAL;
-		return err;
+		goto out;
 	}
 	apc->speed = resp.link_speed_mbps;
 	apc->max_speed = resp.qos_speed_mbps;
-	return 0;
+	apc->link_cfg_error = 0;
+	err = 0;
+out:
+	mutex_unlock(&apc->link_cfg_mutex);
+	return err;
 }
 
 int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
@@ -1508,17 +1519,19 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
 	req.link_speed_mbps = speed;
 	req.enable_clamping = enable_clamping;
 
+	mutex_lock(&apc->link_cfg_mutex);
+
 	err = mana_send_request(apc->ac, &req, sizeof(req), &resp,
 				sizeof(resp));
 
 	if (err) {
 		if (err == -EOPNOTSUPP) {
 			netdev_info_once(ndev, "MANA_SET_BW_CLAMP not supported\n");
-			return err;
+			goto out;
 		}
 		netdev_err(ndev, "Failed to set bandwidth clamp for speed %u, err = %d",
 			   speed, err);
-		return err;
+		goto out;
 	}
 
 	err = mana_verify_resp_hdr(&resp.hdr, MANA_SET_BW_CLAMP,
@@ -1529,13 +1542,18 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
 			   resp.hdr.status);
 		if (!err)
 			err = -EOPNOTSUPP;
-		return err;
+		goto out;
 	}
 
 	if (resp.qos_unconfigured)
 		netdev_info(ndev, "QoS is unconfigured\n");
 
-	return 0;
+	/* Invalidate the cache; next query will re-fetch from firmware. */
+	apc->link_cfg_error = 1;
+	err = 0;
+out:
+	mutex_unlock(&apc->link_cfg_mutex);
+	return err;
 }
 
 int mana_create_wq_obj(struct mana_port_context *apc,
@@ -3430,6 +3448,8 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	apc->port_handle = INVALID_MANA_HANDLE;
 	apc->pf_filter_handle = INVALID_MANA_HANDLE;
 	apc->port_idx = port_idx;
+	apc->link_cfg_error = 1;
+	mutex_init(&apc->link_cfg_mutex);
 	apc->cqe_coalescing_enable = 0;
 
 	mutex_init(&apc->vport_mutex);
@@ -3750,6 +3770,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 			rtnl_lock();
 			apc = netdev_priv(ac->ports[i]);
 			enable_work(&apc->queue_reset_work);
+			mutex_lock(&apc->link_cfg_mutex);
+			apc->link_cfg_error = 1;
+			mutex_unlock(&apc->link_cfg_mutex);
 			err = mana_attach(ac->ports[i]);
 			rtnl_unlock();
 			/* Log the port for which the attach failed, stop
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index d9c27310fd04..af772b7297ec 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -555,6 +555,10 @@ struct mana_port_context {
 	u32 speed;
 	/* Maximum speed supported by the SKU (mbps) */
 	u32 max_speed;
+	/* 1 = not queried, 0 = cached success, negative = permanent error */
+	int link_cfg_error;
+	/* Serializes mana_query_link_cfg() and mana_set_bw_clamp(). */
+	struct mutex link_cfg_mutex;
 
 	bool port_is_up;
 	bool port_st_save; /* Saved port state */
-- 
2.34.1



* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
  2026-05-28 18:07 [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries Erni Sri Satya Vennela
@ 2026-05-29 23:14 ` Jacob Keller
  2026-06-02 20:21 ` Jakub Kicinski
  1 sibling, 0 replies; 5+ messages in thread
From: Jacob Keller @ 2026-05-29 23:14 UTC
  To: Erni Sri Satya Vennela, kys, haiyangz, wei.liu, decui, longli,
	andrew+netdev, davem, edumazet, kuba, pabeni, kotaranov, horms,
	dipayanroy, kees, linux-hyperv, netdev, linux-kernel, linux-rdma

On 5/28/2026 11:07 AM, Erni Sri Satya Vennela wrote:
> mana_query_link_cfg() sends an HWC command to firmware on every call,
> but the link speed and QoS values it returns only change when the
> driver explicitly calls mana_set_bw_clamp(). This function is called
> not only by userspace via ethtool get_link_ksettings, but also
> periodically by hv_netvsc through netvsc_get_link_ksettings and by
> the sysfs speed_show attribute via dev_attr_show, resulting in
> unnecessary HWC traffic every few minutes.
> 
> Add a link_cfg_error field to mana_port_context to cache the query
> result. The field uses three states: 1 (not yet queried, initial
> value set during mana_probe_port), 0 (success, speed/max_speed are
> valid), or a negative errno for permanent errors like -EOPNOTSUPP
> when the hardware does not support the command. Transient errors and
> qos_unconfigured responses are not cached so that subsequent calls
> will retry.
> 
> To prevent a concurrent mana_set_bw_clamp() from racing with an
> in-flight query and publishing stale pre-clamp speed/max_speed,
> serialize the firmware transaction and the cache update under a new
> per-port mutex (link_cfg_mutex). The mutex covers both the HWC
> request and the subsequent stores in mana_query_link_cfg(), and the
> HWC request and invalidation in mana_set_bw_clamp(). With this lock
> held, two queries can no longer interleave their speed/max_speed
> stores, and an invalidation can no longer slip in between a query's
> response and its publish.
> 
> Invalidate the cache inside mana_set_bw_clamp() on success, so all
> current and future callers that change the link configuration
> automatically trigger a fresh query on the next mana_query_link_cfg()
> call. Also reset link_cfg_error during resume in mana_probe() under
> link_cfg_mutex, so that any slow-path query already in flight cannot
> later store 0 and silently overwrite the post-resume invalidation.
> 
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>


* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
  2026-05-28 18:07 [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries Erni Sri Satya Vennela
  2026-05-29 23:14 ` Jacob Keller
@ 2026-06-02 20:21 ` Jakub Kicinski
  2026-06-05  5:29   ` Erni Sri Satya Vennela
  1 sibling, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2026-06-02 20:21 UTC
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
	linux-hyperv, netdev, linux-kernel, linux-rdma

On Thu, 28 May 2026 11:07:51 -0700 Erni Sri Satya Vennela wrote:
> mana_query_link_cfg() sends an HWC command to firmware on every call,
> but the link speed and QoS values it returns only change when the
> driver explicitly calls mana_set_bw_clamp(). This function is called
> not only by userspace via ethtool get_link_ksettings, but also
> periodically by hv_netvsc through netvsc_get_link_ksettings and by
> the sysfs speed_show attribute via dev_attr_show, resulting in
> unnecessary HWC traffic every few minutes.

mana is ops-locked, right? Because you support net shapers

Could you instead take the netdev_lock() in the work?
It's already held around the user space originated calls.


* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
  2026-06-02 20:21 ` Jakub Kicinski
@ 2026-06-05  5:29   ` Erni Sri Satya Vennela
  2026-06-05 23:13     ` Jakub Kicinski
  0 siblings, 1 reply; 5+ messages in thread
From: Erni Sri Satya Vennela @ 2026-06-05  5:29 UTC
  To: Jakub Kicinski
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
	linux-hyperv, netdev, linux-kernel, linux-rdma

On Tue, Jun 02, 2026 at 01:21:27PM -0700, Jakub Kicinski wrote:
> On Thu, 28 May 2026 11:07:51 -0700 Erni Sri Satya Vennela wrote:
> > mana_query_link_cfg() sends an HWC command to firmware on every call,
> > but the link speed and QoS values it returns only change when the
> > driver explicitly calls mana_set_bw_clamp(). This function is called
> > not only by userspace via ethtool get_link_ksettings, but also
> > periodically by hv_netvsc through netvsc_get_link_ksettings and by
> > the sysfs speed_show attribute via dev_attr_show, resulting in
> > unnecessary HWC traffic every few minutes.
> 
> mana is ops-locked, right? Because you support net shapers
> 
> Could you instead take the netdev_lock() in the work?
> It's already held around the user space originated calls.

Hi Jakub,

I tried two netdev_lock-based variants. 

mana_query_link_cfg() has four callers:

#  Caller                                  RTNL  netdev->lock
1  ethtool ioctl/netlink                   yes   yes
2  sysfs speed_show/duplex_show            yes   no
3  netvsc_get_link_ksettings VF forward    yes   no
4  mana_shaper_set                         no    yes

No existing lock covers all four.
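
As a small worked check of that statement, the caller table can be
encoded and reduced with a logical AND per lock. This is only a sketch:
struct caller_ctx and locks_held_by_all() are hypothetical names, and the
yes/no values are copied from the table above.

```c
#include <stdbool.h>
#include <stddef.h>

/* One row of the caller table. */
struct caller_ctx {
	const char *name;
	bool has_rtnl;
	bool has_netdev_lock;
};

static const struct caller_ctx callers[] = {
	{ "ethtool ioctl/netlink",                true,  true  },
	{ "sysfs speed_show/duplex_show",         true,  false },
	{ "netvsc_get_link_ksettings VF forward", true,  false },
	{ "mana_shaper_set",                      false, true  },
};

/* A lock covers every caller only if each row holds it. */
static void locks_held_by_all(bool *rtnl, bool *netdev_lock)
{
	size_t i;

	*rtnl = true;
	*netdev_lock = true;
	for (i = 0; i < sizeof(callers) / sizeof(callers[0]); i++) {
		*rtnl = *rtnl && callers[i].has_rtnl;
		*netdev_lock = *netdev_lock && callers[i].has_netdev_lock;
	}
}
```

Both outputs come back false: caller 4 rules out RTNL, and callers 2 and
3 rule out netdev->lock.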

A. netdev_assert_locked() in mana_query_link_cfg():
Lockdep WARNs on every sysfs cat /sys/class/net/eth*/speed and on every
periodic netvsc_get_link_ksettings() poll, since callers 2 and 3 hold
RTNL only.
A slow firmware reply on callers 2/3 can land after mana_shaper_set
(caller 4) has changed the rate and invalidated the cache,
publishing a stale apc->speed / apc->max_speed as "cached valid".
Because the value is cached, the staleness then persists until the next
shaper change.

B. ASSERT_RTNL() + netdev_lock_ops() inside mana_query_link_cfg():
Self-deadlocks on caller 1 (__dev_ethtool() already holds netdev->lock)
and on caller 4 (mana_shaper_set() already holds it and calls
mana_query_link_cfg() before the clamp).
ASSERT_RTNL() also WARNs from caller 4, because shaper genl ops don't
take RTNL.

E.g., the deadlock scenario for caller 1:
__dev_ethtool()
  netdev_lock_ops(dev)              ← held
    ops->get_link_ksettings()
      mana_get_link_ksettings()
        mana_query_link_cfg()
          netdev_lock_ops(ndev)     ← DEADLOCK
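
The same pattern can be reproduced in userspace as a rough model. This is
a sketch under stated assumptions: model_dev_lock, model_query_taking_lock()
and model_ethtool_path() are hypothetical names, and a POSIX mutex of type
PTHREAD_MUTEX_ERRORCHECK stands in for netdev->lock. The error-checking
type is used so that the second acquisition returns EDEADLK and the
example terminates; a kernel mutex is not recursive either, but there
the thread would simply block on itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for netdev->lock. */
static pthread_mutex_t model_dev_lock;

static int model_dev_lock_init(void)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	/* ERRORCHECK: relocking by the owner fails with EDEADLK. */
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&model_dev_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Inner function that takes the lock itself, as in variant B. */
static int model_query_taking_lock(void)
{
	int err = pthread_mutex_lock(&model_dev_lock);

	if (err)
		return err;	/* EDEADLK when the caller already holds it */
	pthread_mutex_unlock(&model_dev_lock);
	return 0;
}

/* Outer path that already holds the lock, like __dev_ethtool(). */
static int model_ethtool_path(void)
{
	int err;

	pthread_mutex_lock(&model_dev_lock);
	err = model_query_taking_lock();
	pthread_mutex_unlock(&model_dev_lock);
	return err;
}
```

Called without the lock held, model_query_taking_lock() succeeds; called
through model_ethtool_path(), it reports EDEADLK, which corresponds to
the hang in the trace above.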

Can we consider a private link_cfg_mutex, which is orthogonal to RTNL and
netdev->lock and covers all four callers?

Thanks,
Vennela


* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
  2026-06-05  5:29   ` Erni Sri Satya Vennela
@ 2026-06-05 23:13     ` Jakub Kicinski
  0 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2026-06-05 23:13 UTC
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
	linux-hyperv, netdev, linux-kernel, linux-rdma

On Thu, 4 Jun 2026 22:29:29 -0700 Erni Sri Satya Vennela wrote:
> I tried two netdev_lock-based variants. 
> 
> mana_query_link_cfg() has four callers:
> 
> 1 ethtool ioctl/netlink			- has RTNL	- has netdev->lock
> 2 sysfs speed_show/duplex_show		- has RTNL	- no netdev->lock
> 3 netvsc_get_link_ksettings VF forward	- has RTNL	- no netdev->lock
> 4 mana_shaper_set			- no RTNL	- has netdev->lock
> 
> No existing lock covers all four.

How fresh is your tree? The just-minted commit 9f275c2e9020 should
address the gap, I believe?

