* [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
@ 2026-05-28 18:07 Erni Sri Satya Vennela
2026-05-29 23:14 ` Jacob Keller
2026-06-02 20:21 ` Jakub Kicinski
0 siblings, 2 replies; 6+ messages in thread
From: Erni Sri Satya Vennela @ 2026-05-28 18:07 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, kotaranov, horms, ernis, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
mana_query_link_cfg() sends an HWC command to firmware on every call,
but the link speed and QoS values it returns only change when the
driver explicitly calls mana_set_bw_clamp(). This function is called
not only by userspace via ethtool get_link_ksettings, but also
periodically by hv_netvsc through netvsc_get_link_ksettings and by
the sysfs speed_show attribute via dev_attr_show, resulting in
unnecessary HWC traffic every few minutes.
Add a link_cfg_error field to mana_port_context to cache the query
result. The field uses three states: 1 (not yet queried, initial
value set during mana_probe_port), 0 (success, speed/max_speed are
valid), or a negative errno for permanent errors like -EOPNOTSUPP
when the hardware does not support the command. Transient errors and
qos_unconfigured responses are not cached so that subsequent calls
will retry.
To prevent a concurrent mana_set_bw_clamp() from racing with an
in-flight query and publishing stale pre-clamp speed/max_speed,
serialize the firmware transaction and the cache update under a new
per-port mutex (link_cfg_mutex). The mutex covers both the HWC
request and the subsequent stores in mana_query_link_cfg(), and the
HWC request and invalidation in mana_set_bw_clamp(). With this lock
held, two queries can no longer interleave their speed/max_speed
stores, and an invalidation can no longer slip in between a query's
response and its publish.
Invalidate the cache inside mana_set_bw_clamp() on success, so all
current and future callers that change the link configuration
automatically trigger a fresh query on the next mana_query_link_cfg()
call. Also reset link_cfg_error during resume in mana_probe() under
link_cfg_mutex, so that any slow-path query already in flight cannot
later store 0 and silently overwrite the post-resume invalidation.
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 41 +++++++++++++++----
include/net/mana/mana.h | 4 ++
2 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 82f1461a48e9..43018bc13dc1 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1456,6 +1456,12 @@ int mana_query_link_cfg(struct mana_port_context *apc)
struct mana_query_link_config_req req = {};
int err;
+ mutex_lock(&apc->link_cfg_mutex);
+
+ err = apc->link_cfg_error;
+ if (err <= 0)
+ goto out;
+
mana_gd_init_req_hdr(&req.hdr, MANA_QUERY_LINK_CONFIG,
sizeof(req), sizeof(resp));
@@ -1468,10 +1474,11 @@ int mana_query_link_cfg(struct mana_port_context *apc)
if (err) {
if (err == -EOPNOTSUPP) {
netdev_info_once(ndev, "MANA_QUERY_LINK_CONFIG not supported\n");
- return err;
+ apc->link_cfg_error = err;
+ goto out;
}
netdev_err(ndev, "Failed to query link config: %d\n", err);
- return err;
+ goto out;
}
err = mana_verify_resp_hdr(&resp.hdr, MANA_QUERY_LINK_CONFIG,
@@ -1482,16 +1489,20 @@ int mana_query_link_cfg(struct mana_port_context *apc)
resp.hdr.status);
if (!err)
err = -EOPNOTSUPP;
- return err;
+ goto out;
}
if (resp.qos_unconfigured) {
err = -EINVAL;
- return err;
+ goto out;
}
apc->speed = resp.link_speed_mbps;
apc->max_speed = resp.qos_speed_mbps;
- return 0;
+ apc->link_cfg_error = 0;
+ err = 0;
+out:
+ mutex_unlock(&apc->link_cfg_mutex);
+ return err;
}
int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
@@ -1508,17 +1519,19 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
req.link_speed_mbps = speed;
req.enable_clamping = enable_clamping;
+ mutex_lock(&apc->link_cfg_mutex);
+
err = mana_send_request(apc->ac, &req, sizeof(req), &resp,
sizeof(resp));
if (err) {
if (err == -EOPNOTSUPP) {
netdev_info_once(ndev, "MANA_SET_BW_CLAMP not supported\n");
- return err;
+ goto out;
}
netdev_err(ndev, "Failed to set bandwidth clamp for speed %u, err = %d",
speed, err);
- return err;
+ goto out;
}
err = mana_verify_resp_hdr(&resp.hdr, MANA_SET_BW_CLAMP,
@@ -1529,13 +1542,18 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
resp.hdr.status);
if (!err)
err = -EOPNOTSUPP;
- return err;
+ goto out;
}
if (resp.qos_unconfigured)
netdev_info(ndev, "QoS is unconfigured\n");
- return 0;
+ /* Invalidate the cache; next query will re-fetch from firmware. */
+ apc->link_cfg_error = 1;
+ err = 0;
+out:
+ mutex_unlock(&apc->link_cfg_mutex);
+ return err;
}
int mana_create_wq_obj(struct mana_port_context *apc,
@@ -3430,6 +3448,8 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
apc->port_handle = INVALID_MANA_HANDLE;
apc->pf_filter_handle = INVALID_MANA_HANDLE;
apc->port_idx = port_idx;
+ apc->link_cfg_error = 1;
+ mutex_init(&apc->link_cfg_mutex);
apc->cqe_coalescing_enable = 0;
mutex_init(&apc->vport_mutex);
@@ -3750,6 +3770,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
rtnl_lock();
apc = netdev_priv(ac->ports[i]);
enable_work(&apc->queue_reset_work);
+ mutex_lock(&apc->link_cfg_mutex);
+ apc->link_cfg_error = 1;
+ mutex_unlock(&apc->link_cfg_mutex);
err = mana_attach(ac->ports[i]);
rtnl_unlock();
/* Log the port for which the attach failed, stop
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index d9c27310fd04..af772b7297ec 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -555,6 +555,10 @@ struct mana_port_context {
u32 speed;
/* Maximum speed supported by the SKU (mbps) */
u32 max_speed;
+ /* 1 = not queried, 0 = cached success, negative = permanent error */
+ int link_cfg_error;
+ /* Serializes mana_query_link_cfg() and mana_set_bw_clamp(). */
+ struct mutex link_cfg_mutex;
bool port_is_up;
bool port_st_save; /* Saved port state */
--
2.34.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
2026-05-28 18:07 [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries Erni Sri Satya Vennela
@ 2026-05-29 23:14 ` Jacob Keller
2026-06-02 20:21 ` Jakub Kicinski
1 sibling, 0 replies; 6+ messages in thread
From: Jacob Keller @ 2026-05-29 23:14 UTC (permalink / raw)
To: Erni Sri Satya Vennela, kys, haiyangz, wei.liu, decui, longli,
andrew+netdev, davem, edumazet, kuba, pabeni, kotaranov, horms,
dipayanroy, kees, linux-hyperv, netdev, linux-kernel, linux-rdma
On 5/28/2026 11:07 AM, Erni Sri Satya Vennela wrote:
> mana_query_link_cfg() sends an HWC command to firmware on every call,
> but the link speed and QoS values it returns only change when the
> driver explicitly calls mana_set_bw_clamp(). This function is called
> not only by userspace via ethtool get_link_ksettings, but also
> periodically by hv_netvsc through netvsc_get_link_ksettings and by
> the sysfs speed_show attribute via dev_attr_show, resulting in
> unnecessary HWC traffic every few minutes.
>
> Add a link_cfg_error field to mana_port_context to cache the query
> result. The field uses three states: 1 (not yet queried, initial
> value set during mana_probe_port), 0 (success, speed/max_speed are
> valid), or a negative errno for permanent errors like -EOPNOTSUPP
> when the hardware does not support the command. Transient errors and
> qos_unconfigured responses are not cached so that subsequent calls
> will retry.
>
> To prevent a concurrent mana_set_bw_clamp() from racing with an
> in-flight query and publishing stale pre-clamp speed/max_speed,
> serialize the firmware transaction and the cache update under a new
> per-port mutex (link_cfg_mutex). The mutex covers both the HWC
> request and the subsequent stores in mana_query_link_cfg(), and the
> HWC request and invalidation in mana_set_bw_clamp(). With this lock
> held, two queries can no longer interleave their speed/max_speed
> stores, and an invalidation can no longer slip in between a query's
> response and its publish.
>
> Invalidate the cache inside mana_set_bw_clamp() on success, so all
> current and future callers that change the link configuration
> automatically trigger a fresh query on the next mana_query_link_cfg()
> call. Also reset link_cfg_error during resume in mana_probe() under
> link_cfg_mutex, so that any slow-path query already in flight cannot
> later store 0 and silently overwrite the post-resume invalidation.
>
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
2026-05-28 18:07 [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries Erni Sri Satya Vennela
2026-05-29 23:14 ` Jacob Keller
@ 2026-06-02 20:21 ` Jakub Kicinski
2026-06-05 5:29 ` Erni Sri Satya Vennela
1 sibling, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2026-06-02 20:21 UTC (permalink / raw)
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
On Thu, 28 May 2026 11:07:51 -0700 Erni Sri Satya Vennela wrote:
> mana_query_link_cfg() sends an HWC command to firmware on every call,
> but the link speed and QoS values it returns only change when the
> driver explicitly calls mana_set_bw_clamp(). This function is called
> not only by userspace via ethtool get_link_ksettings, but also
> periodically by hv_netvsc through netvsc_get_link_ksettings and by
> the sysfs speed_show attribute via dev_attr_show, resulting in
> unnecessary HWC traffic every few minutes.
mana is ops-locked, right? Because you support net shapers
Could you instead take the netdev_lock() in the work?
It's already held around the user space originated calls.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
2026-06-02 20:21 ` Jakub Kicinski
@ 2026-06-05 5:29 ` Erni Sri Satya Vennela
2026-06-05 23:13 ` Jakub Kicinski
0 siblings, 1 reply; 6+ messages in thread
From: Erni Sri Satya Vennela @ 2026-06-05 5:29 UTC (permalink / raw)
To: Jakub Kicinski
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
On Tue, Jun 02, 2026 at 01:21:27PM -0700, Jakub Kicinski wrote:
> On Thu, 28 May 2026 11:07:51 -0700 Erni Sri Satya Vennela wrote:
> > mana_query_link_cfg() sends an HWC command to firmware on every call,
> > but the link speed and QoS values it returns only change when the
> > driver explicitly calls mana_set_bw_clamp(). This function is called
> > not only by userspace via ethtool get_link_ksettings, but also
> > periodically by hv_netvsc through netvsc_get_link_ksettings and by
> > the sysfs speed_show attribute via dev_attr_show, resulting in
> > unnecessary HWC traffic every few minutes.
>
> mana is ops-locked, right? Because you support net shapers
>
> Could you instead take the netdev_lock() in the work?
> It's already held around the user space originated calls.
Hi Jakub,
I tried two netdev_lock-based variants.
mana_query_link_cfg() has four callers:
1 ethtool ioctl/netlink - has RTNL - has netdev->lock
2 sysfs speed_show/duplex_show - has RTNL - no netdev->lock
3 netvsc_get_link_ksettings VF forward - has RTNL - no netdev->lock
4 mana_shaper_set - no RTNL - has netdev->lock
No existing lock covers all four.
A. netdev_assert_locked() in the mana_query_link_cfg() :
Lockdep WARN on every sysfs cat /sys/class/net/eth*/speed and every
periodic netvsc_get_link_ksettings() poll since callers 2 and 3 hold
RTNL only.
A slow firmware reply on callers 2/3 can land after mana_shaper_set
(caller 4) has changed the rate and invalidated the cache,
publishing a stale apc->speed / apc->max_speed as "cached valid".
Because the value is cached, the staleness then persists until the next
shaper change
B. ASSERT_RTNL() + netdev_lock_ops() inside mana_query_link_cfg():
Self-deadlocks on #1 (__dev_ethtool already holds it) and #4
(mana_shaper_set already holds it and calls mana_query_link_cfg() before
the clamp).
ASSERT_RTNL() also WARNs from #4 — shaper genl ops don't take RTNL.
Eg. Deadlock scenario:
__dev_ethtool()
netdev_lock_ops(dev) ← held
ops->get_link_ksettings()
mana_get_link_ksettings()
mana_query_link_cfg()
netdev_lock_ops(ndev) ← DEADLOCK
Can we consider private link_cfg_mutex which is orthogonal to RTNL and
netdev->lock and covers all four callers?
Thanks,
Vennela
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
2026-06-05 5:29 ` Erni Sri Satya Vennela
@ 2026-06-05 23:13 ` Jakub Kicinski
2026-06-06 11:17 ` Erni Sri Satya Vennela
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2026-06-05 23:13 UTC (permalink / raw)
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
On Thu, 4 Jun 2026 22:29:29 -0700 Erni Sri Satya Vennela wrote:
> I tried two netdev_lock-based variants.
>
> mana_query_link_cfg() has four callers:
>
> 1 ethtool ioctl/netlink - has RTNL - has netdev->lock
> 2 sysfs speed_show/duplex_show - has RTNL - no netdev->lock
> 3 netvsc_get_link_ksettings VF forward - has RTNL - no netdev->lock
> 4 mana_shaper_set - no RTNL - has netdev->lock
>
> No existing lock covers all four.
How fresh is your tree? The just-minted commit 9f275c2e9020 should
address the gap, I believe?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
2026-06-05 23:13 ` Jakub Kicinski
@ 2026-06-06 11:17 ` Erni Sri Satya Vennela
0 siblings, 0 replies; 6+ messages in thread
From: Erni Sri Satya Vennela @ 2026-06-06 11:17 UTC (permalink / raw)
To: Jakub Kicinski
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, pabeni, kotaranov, horms, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
On Fri, Jun 05, 2026 at 04:13:15PM -0700, Jakub Kicinski wrote:
> On Thu, 4 Jun 2026 22:29:29 -0700 Erni Sri Satya Vennela wrote:
> > I tried two netdev_lock-based variants.
> >
> > mana_query_link_cfg() has four callers:
> >
> > 1 ethtool ioctl/netlink - has RTNL - has netdev->lock
> > 2 sysfs speed_show/duplex_show - has RTNL - no netdev->lock
> > 3 netvsc_get_link_ksettings VF forward - has RTNL - no netdev->lock
> > 4 mana_shaper_set - no RTNL - has netdev->lock
> >
> > No existing lock covers all four.
>
> How fresh is your tree? The just-minted commit 9f275c2e9020 should
> address the gap, I believe?
Hi Jakub,
Thanks for pointing out the commit 9f275c2e9020. It does close the gap.
My analysis was against a tree that predated it but the commit landed
on net-next on Jun 4 21:30 UTC and my reply went out about an hour later,
so the rebase that picked it up hadn't happened on my end yet.
I've now rebased to current net-next and re-walked the four callers of
mana_query_link_cfg(). All of them hold netdev->lock by the time they
reach mana_query_link_cfg(), and the race scenarios I described no longer
apply.
Thanks,
Vennela
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-06-06 11:17 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-28 18:07 [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries Erni Sri Satya Vennela
2026-05-29 23:14 ` Jacob Keller
2026-06-02 20:21 ` Jakub Kicinski
2026-06-05 5:29 ` Erni Sri Satya Vennela
2026-06-05 23:13 ` Jakub Kicinski
2026-06-06 11:17 ` Erni Sri Satya Vennela
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.