* [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
@ 2026-05-28 18:07 Erni Sri Satya Vennela
2026-05-29 23:14 ` Jacob Keller
2026-06-02 20:21 ` Jakub Kicinski
0 siblings, 2 replies; 5+ messages in thread
From: Erni Sri Satya Vennela @ 2026-05-28 18:07 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, kotaranov, horms, ernis, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
mana_query_link_cfg() sends an HWC command to firmware on every call,
but the link speed and QoS values it returns only change when the
driver explicitly calls mana_set_bw_clamp(). This function is called
not only by userspace via ethtool get_link_ksettings, but also
periodically by hv_netvsc through netvsc_get_link_ksettings and by
the sysfs speed_show attribute via dev_attr_show, resulting in
unnecessary HWC traffic every few minutes.
Add a link_cfg_error field to mana_port_context to cache the query
result. The field uses three states: 1 (not yet queried, initial
value set during mana_probe_port), 0 (success, speed/max_speed are
valid), or a negative errno for permanent errors like -EOPNOTSUPP
when the hardware does not support the command. Transient errors and
qos_unconfigured responses are not cached so that subsequent calls
will retry.
To prevent a concurrent mana_set_bw_clamp() from racing with an
in-flight query and publishing stale pre-clamp speed/max_speed,
serialize the firmware transaction and the cache update under a new
per-port mutex (link_cfg_mutex). The mutex covers both the HWC
request and the subsequent stores in mana_query_link_cfg(), and the
HWC request and invalidation in mana_set_bw_clamp(). With this lock
held, two queries can no longer interleave their speed/max_speed
stores, and an invalidation can no longer slip in between a query's
response and its publish.
Invalidate the cache inside mana_set_bw_clamp() on success, so all
current and future callers that change the link configuration
automatically trigger a fresh query on the next mana_query_link_cfg()
call. Also reset link_cfg_error during resume in mana_probe() under
link_cfg_mutex, so that any slow-path query already in flight cannot
later store 0 and silently overwrite the post-resume invalidation.
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 41 +++++++++++++++----
include/net/mana/mana.h | 4 ++
2 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 82f1461a48e9..43018bc13dc1 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1456,6 +1456,12 @@ int mana_query_link_cfg(struct mana_port_context *apc)
struct mana_query_link_config_req req = {};
int err;
+ mutex_lock(&apc->link_cfg_mutex);
+
+ err = apc->link_cfg_error;
+ if (err <= 0)
+ goto out;
+
mana_gd_init_req_hdr(&req.hdr, MANA_QUERY_LINK_CONFIG,
sizeof(req), sizeof(resp));
@@ -1468,10 +1474,11 @@ int mana_query_link_cfg(struct mana_port_context *apc)
if (err) {
if (err == -EOPNOTSUPP) {
netdev_info_once(ndev, "MANA_QUERY_LINK_CONFIG not supported\n");
- return err;
+ apc->link_cfg_error = err;
+ goto out;
}
netdev_err(ndev, "Failed to query link config: %d\n", err);
- return err;
+ goto out;
}
err = mana_verify_resp_hdr(&resp.hdr, MANA_QUERY_LINK_CONFIG,
@@ -1482,16 +1489,20 @@ int mana_query_link_cfg(struct mana_port_context *apc)
resp.hdr.status);
if (!err)
err = -EOPNOTSUPP;
- return err;
+ goto out;
}
if (resp.qos_unconfigured) {
err = -EINVAL;
- return err;
+ goto out;
}
apc->speed = resp.link_speed_mbps;
apc->max_speed = resp.qos_speed_mbps;
- return 0;
+ apc->link_cfg_error = 0;
+ err = 0;
+out:
+ mutex_unlock(&apc->link_cfg_mutex);
+ return err;
}
int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
@@ -1508,17 +1519,19 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
req.link_speed_mbps = speed;
req.enable_clamping = enable_clamping;
+ mutex_lock(&apc->link_cfg_mutex);
+
err = mana_send_request(apc->ac, &req, sizeof(req), &resp,
sizeof(resp));
if (err) {
if (err == -EOPNOTSUPP) {
netdev_info_once(ndev, "MANA_SET_BW_CLAMP not supported\n");
- return err;
+ goto out;
}
netdev_err(ndev, "Failed to set bandwidth clamp for speed %u, err = %d",
speed, err);
- return err;
+ goto out;
}
err = mana_verify_resp_hdr(&resp.hdr, MANA_SET_BW_CLAMP,
@@ -1529,13 +1542,18 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
resp.hdr.status);
if (!err)
err = -EOPNOTSUPP;
- return err;
+ goto out;
}
if (resp.qos_unconfigured)
netdev_info(ndev, "QoS is unconfigured\n");
- return 0;
+ /* Invalidate the cache; next query will re-fetch from firmware. */
+ apc->link_cfg_error = 1;
+ err = 0;
+out:
+ mutex_unlock(&apc->link_cfg_mutex);
+ return err;
}
int mana_create_wq_obj(struct mana_port_context *apc,
@@ -3430,6 +3448,8 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
apc->port_handle = INVALID_MANA_HANDLE;
apc->pf_filter_handle = INVALID_MANA_HANDLE;
apc->port_idx = port_idx;
+ apc->link_cfg_error = 1;
+ mutex_init(&apc->link_cfg_mutex);
apc->cqe_coalescing_enable = 0;
mutex_init(&apc->vport_mutex);
@@ -3750,6 +3770,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
rtnl_lock();
apc = netdev_priv(ac->ports[i]);
enable_work(&apc->queue_reset_work);
+ mutex_lock(&apc->link_cfg_mutex);
+ apc->link_cfg_error = 1;
+ mutex_unlock(&apc->link_cfg_mutex);
err = mana_attach(ac->ports[i]);
rtnl_unlock();
/* Log the port for which the attach failed, stop
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index d9c27310fd04..af772b7297ec 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -555,6 +555,10 @@ struct mana_port_context {
u32 speed;
/* Maximum speed supported by the SKU (mbps) */
u32 max_speed;
+ /* 1 = not queried, 0 = cached success, negative = permanent error */
+ int link_cfg_error;
+ /* Serializes mana_query_link_cfg() and mana_set_bw_clamp(). */
+ struct mutex link_cfg_mutex;
bool port_is_up;
bool port_st_save; /* Saved port state */
--
2.34.1
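
The caching scheme in the patch above can be modelled in userspace. What follows is a minimal sketch, not the driver code: struct port_ctx, fake_fw_query(), port_init(), query_link_cfg() and set_bw_clamp() are hypothetical stand-ins, a pthread mutex replaces the kernel mutex, and a counter replaces the HWC round trip. It assumes the fake firmware reports a single speed value for both speed and max_speed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for struct mana_port_context. */
struct port_ctx {
	unsigned int speed;
	unsigned int max_speed;
	int link_cfg_error;	/* 1 = not queried, 0 = cached, <0 = permanent error */
	pthread_mutex_t link_cfg_mutex;
	int fw_calls;		/* counts simulated firmware round trips */
	int fw_result;		/* error the fake firmware returns, 0 for success */
	unsigned int fw_speed;	/* speed the fake firmware currently reports */
};

void port_init(struct port_ctx *p, unsigned int fw_speed)
{
	int rc = pthread_mutex_init(&p->link_cfg_mutex, NULL);

	assert(rc == 0);
	(void)rc;
	p->speed = 0;
	p->max_speed = 0;
	p->link_cfg_error = 1;	/* start in the "not yet queried" state */
	p->fw_calls = 0;
	p->fw_result = 0;
	p->fw_speed = fw_speed;
}

/* Fake firmware query: counts the call and returns the configured result. */
int fake_fw_query(struct port_ctx *p, unsigned int *speed)
{
	p->fw_calls++;
	if (p->fw_result)
		return p->fw_result;
	*speed = p->fw_speed;
	return 0;
}

int query_link_cfg(struct port_ctx *p)
{
	unsigned int speed = 0;
	int err;

	pthread_mutex_lock(&p->link_cfg_mutex);

	err = p->link_cfg_error;
	if (err <= 0)
		goto out;	/* cached success or cached permanent error */

	err = fake_fw_query(p, &speed);
	if (err) {
		if (err == -EOPNOTSUPP)
			p->link_cfg_error = err;	/* permanent: cache it */
		goto out;	/* transient: leave state at 1 so the next call retries */
	}

	p->speed = speed;
	p->max_speed = speed;
	p->link_cfg_error = 0;
out:
	pthread_mutex_unlock(&p->link_cfg_mutex);
	return err;
}

int set_bw_clamp(struct port_ctx *p, unsigned int speed)
{
	pthread_mutex_lock(&p->link_cfg_mutex);
	p->fw_calls++;		/* the clamp request is itself a firmware call */
	p->fw_speed = speed;
	p->link_cfg_error = 1;	/* invalidate; next query re-fetches */
	pthread_mutex_unlock(&p->link_cfg_mutex);
	return 0;
}
```

In this model, two back-to-back query_link_cfg() calls cost one firmware round trip, a set_bw_clamp() forces the next query to go back to the fake firmware, a transient error such as -EIO is retried on every call, and -EOPNOTSUPP is returned from the cache after the first failure.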
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
2026-05-28 18:07 [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries Erni Sri Satya Vennela
@ 2026-05-29 23:14 ` Jacob Keller
2026-06-02 20:21 ` Jakub Kicinski
1 sibling, 0 replies; 5+ messages in thread
From: Jacob Keller @ 2026-05-29 23:14 UTC (permalink / raw)
To: Erni Sri Satya Vennela, kys, haiyangz, wei.liu, decui, longli,
andrew+netdev, davem, edumazet, kuba, pabeni, kotaranov, horms,
dipayanroy, kees, linux-hyperv, netdev, linux-kernel, linux-rdma
On 5/28/2026 11:07 AM, Erni Sri Satya Vennela wrote:
> mana_query_link_cfg() sends an HWC command to firmware on every call,
> but the link speed and QoS values it returns only change when the
> driver explicitly calls mana_set_bw_clamp(). This function is called
> not only by userspace via ethtool get_link_ksettings, but also
> periodically by hv_netvsc through netvsc_get_link_ksettings and by
> the sysfs speed_show attribute via dev_attr_show, resulting in
> unnecessary HWC traffic every few minutes.
>
> Add a link_cfg_error field to mana_port_context to cache the query
> result. The field uses three states: 1 (not yet queried, initial
> value set during mana_probe_port), 0 (success, speed/max_speed are
> valid), or a negative errno for permanent errors like -EOPNOTSUPP
> when the hardware does not support the command. Transient errors and
> qos_unconfigured responses are not cached so that subsequent calls
> will retry.
>
> To prevent a concurrent mana_set_bw_clamp() from racing with an
> in-flight query and publishing stale pre-clamp speed/max_speed,
> serialize the firmware transaction and the cache update under a new
> per-port mutex (link_cfg_mutex). The mutex covers both the HWC
> request and the subsequent stores in mana_query_link_cfg(), and the
> HWC request and invalidation in mana_set_bw_clamp(). With this lock
> held, two queries can no longer interleave their speed/max_speed
> stores, and an invalidation can no longer slip in between a query's
> response and its publish.
>
> Invalidate the cache inside mana_set_bw_clamp() on success, so all
> current and future callers that change the link configuration
> automatically trigger a fresh query on the next mana_query_link_cfg()
> call. Also reset link_cfg_error during resume in mana_probe() under
> link_cfg_mutex, so that any slow-path query already in flight cannot
> later store 0 and silently overwrite the post-resume invalidation.
>
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
2026-05-28 18:07 [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries Erni Sri Satya Vennela
2026-05-29 23:14 ` Jacob Keller
@ 2026-06-02 20:21 ` Jakub Kicinski
2026-06-05 5:29 ` Erni Sri Satya Vennela
1 sibling, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2026-06-02 20:21 UTC (permalink / raw)
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
On Thu, 28 May 2026 11:07:51 -0700 Erni Sri Satya Vennela wrote:
> mana_query_link_cfg() sends an HWC command to firmware on every call,
> but the link speed and QoS values it returns only change when the
> driver explicitly calls mana_set_bw_clamp(). This function is called
> not only by userspace via ethtool get_link_ksettings, but also
> periodically by hv_netvsc through netvsc_get_link_ksettings and by
> the sysfs speed_show attribute via dev_attr_show, resulting in
> unnecessary HWC traffic every few minutes.
mana is ops-locked, right? Because you support net shapers
Could you instead take the netdev_lock() in the work?
It's already held around the user space originated calls.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
2026-06-02 20:21 ` Jakub Kicinski
@ 2026-06-05 5:29 ` Erni Sri Satya Vennela
2026-06-05 23:13 ` Jakub Kicinski
0 siblings, 1 reply; 5+ messages in thread
From: Erni Sri Satya Vennela @ 2026-06-05 5:29 UTC (permalink / raw)
To: Jakub Kicinski
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
On Tue, Jun 02, 2026 at 01:21:27PM -0700, Jakub Kicinski wrote:
> On Thu, 28 May 2026 11:07:51 -0700 Erni Sri Satya Vennela wrote:
> > mana_query_link_cfg() sends an HWC command to firmware on every call,
> > but the link speed and QoS values it returns only change when the
> > driver explicitly calls mana_set_bw_clamp(). This function is called
> > not only by userspace via ethtool get_link_ksettings, but also
> > periodically by hv_netvsc through netvsc_get_link_ksettings and by
> > the sysfs speed_show attribute via dev_attr_show, resulting in
> > unnecessary HWC traffic every few minutes.
>
> mana is ops-locked, right? Because you support net shapers
>
> Could you instead take the netdev_lock() in the work?
> It's already held around the user space originated calls.
Hi Jakub,
I tried two netdev_lock-based variants.
mana_query_link_cfg() has four callers:
1 ethtool ioctl/netlink - has RTNL - has netdev->lock
2 sysfs speed_show/duplex_show - has RTNL - no netdev->lock
3 netvsc_get_link_ksettings VF forward - has RTNL - no netdev->lock
4 mana_shaper_set - no RTNL - has netdev->lock
No existing lock covers all four.
A. netdev_assert_locked() inside mana_query_link_cfg():
Lockdep WARNs on every sysfs cat /sys/class/net/eth*/speed and on every
periodic netvsc_get_link_ksettings() poll, since callers 2 and 3 hold
RTNL only.
In addition, a slow firmware reply on callers 2/3 can land after
mana_shaper_set (caller 4) has changed the rate and invalidated the cache,
publishing a stale apc->speed / apc->max_speed as "cached valid".
Because the value is cached, the staleness then persists until the next
shaper change.
B. ASSERT_RTNL() + netdev_lock_ops() inside mana_query_link_cfg():
Self-deadlocks on #1 (__dev_ethtool already holds the lock) and on #4
(mana_shaper_set already holds it and calls mana_query_link_cfg() before
the clamp).
ASSERT_RTNL() also WARNs from #4, since shaper genl ops don't take RTNL.
E.g., deadlock scenario:
__dev_ethtool()
netdev_lock_ops(dev) ← held
ops->get_link_ksettings()
mana_get_link_ksettings()
mana_query_link_cfg()
netdev_lock_ops(ndev) ← DEADLOCK
Can we consider a private link_cfg_mutex, which is orthogonal to RTNL and
netdev->lock and covers all four callers?
Thanks,
Vennela
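
The self-deadlock described for variant B can be reproduced in userspace. This is a minimal sketch under stated assumptions: struct fake_netdev, fake_netdev_init(), query_taking_lock(), ethtool_path() and sysfs_path() are hypothetical names, and a POSIX error-checking mutex stands in for netdev->lock so that the relock reports EDEADLK instead of hanging. A kernel mutex is not recursive and would simply block forever in the same situation.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a netdev with a per-instance lock. */
struct fake_netdev {
	pthread_mutex_t lock;
};

int fake_netdev_init(struct fake_netdev *dev)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	/* Error-checking type: a relock by the owner fails with EDEADLK. */
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&dev->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Models variant B: the query takes the instance lock unconditionally. */
int query_taking_lock(struct fake_netdev *dev)
{
	int err = pthread_mutex_lock(&dev->lock);

	if (err)
		return -err;	/* -EDEADLK when this thread already owns the lock */
	pthread_mutex_unlock(&dev->lock);
	return 0;
}

/* Models caller 1: the lock is already held around the driver callback. */
int ethtool_path(struct fake_netdev *dev)
{
	int rc = pthread_mutex_lock(&dev->lock);
	int err;

	assert(rc == 0);
	(void)rc;
	err = query_taking_lock(dev);
	pthread_mutex_unlock(&dev->lock);
	return err;
}

/* Models callers 2 and 3: no instance lock is held on entry. */
int sysfs_path(struct fake_netdev *dev)
{
	return query_taking_lock(dev);
}
```

With this model, sysfs_path() succeeds while ethtool_path() fails with -EDEADLK. A lock taken inside the query works only for callers that do not already hold it.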
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
2026-06-05 5:29 ` Erni Sri Satya Vennela
@ 2026-06-05 23:13 ` Jakub Kicinski
0 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2026-06-05 23:13 UTC (permalink / raw)
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
On Thu, 4 Jun 2026 22:29:29 -0700 Erni Sri Satya Vennela wrote:
> I tried two netdev_lock-based variants.
>
> mana_query_link_cfg() has four callers:
>
> 1 ethtool ioctl/netlink - has RTNL - has netdev->lock
> 2 sysfs speed_show/duplex_show - has RTNL - no netdev->lock
> 3 netvsc_get_link_ksettings VF forward - has RTNL - no netdev->lock
> 4 mana_shaper_set - no RTNL - has netdev->lock
>
> No existing lock covers all four.
How fresh is your tree? The just-minted commit 9f275c2e9020 should
address the gap, I believe?
^ permalink raw reply [flat|nested] 5+ messages in thread