* [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc
@ 2026-05-18 5:56 Keno Fischer
2026-05-19 15:14 ` Petr Machata
0 siblings, 1 reply; 2+ messages in thread
From: Keno Fischer @ 2026-05-18 5:56 UTC (permalink / raw)
To: netdev; +Cc: Ido Schimmel, Petr Machata, linux-kernel
Some bits of state in the mlxsw driver can be controlled by two
mechanisms. The first is the DCB netlink command set (RTM_SETDCB
and e.g. DCB_ATTR_IEEE_ETS/DCB_ATTR_IEEE_MAXRATE), reachable e.g. from
the `dcb` userspace utility. The second is via the qdisc framework
(reachable e.g. from the `tc` userspace utility). Because they touch
the same bit of state, it is currently possible for the ETS config
to get out of sync with what the qdisc expects. One hard-to-debug
consequence is that a config script becomes non-idempotent. E.g.,
consider
```
dcb ets set dev "${port}" prio-tc 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
tc qdisc replace dev "${port}" root handle 1: prio bands 8 \
priomap 0 1 2 3 4 5 6 7
```
Here the user made a mistake and put the priomap in the wrong order.
The first time this code is run, the `qdisc` specification overwrites
the `prio-tc` specification. However, the second time this code is run,
the subsystem sees that the qdisc is identical and does nothing, causing
the `prio-tc` setting to stick and the final mapping to be different.
This can be very confusing, since the same config script creates two
different states.
I think there's two reasonable ways to fix this. One would be by
reflecting the ets state changes back into the qdisc, so that the
qdisc subsystem notices that it's been changed. However, I think it
is clearer to just forbid the non-qdisc API from trying to make
changes while the qdisc is offloaded.
This is similar to the existing check in mlxsw_sp_dcbnl_setbuffer,
which does essentially the same thing in reverse (buffers are only
explicitly managed when there is an offloaded qdisc, otherwise
they're managed explicitly).
I hope this saves the next person a few hours of debugging wondering
why their tc_no_buffer_discard_uc_tc_ counters are going up.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Keno Fischer <keno@juliahub.com>
---
.../net/ethernet/mellanox/mlxsw/spectrum.h | 2 +
.../ethernet/mellanox/mlxsw/spectrum_dcb.c | 10 +++++
.../ethernet/mellanox/mlxsw/spectrum_qdisc.c | 43 +++++++++++++++++++
3 files changed, 55 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index b03ff9e044f9..ecf4525c42a4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -1260,6 +1260,8 @@ int mlxsw_sp_setup_tc_block_qevent_early_drop(struct mlxsw_sp_port *mlxsw_sp_por
struct flow_block_offload *f);
int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
struct flow_block_offload *f);
+bool mlxsw_sp_qdisc_has_prio_ets(struct mlxsw_sp_port *mlxsw_sp_port);
+bool mlxsw_sp_qdisc_has_tbf_subgroup(struct mlxsw_sp_port *mlxsw_sp_port);
/* spectrum_fid.c */
struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_index(struct mlxsw_sp *mlxsw_sp,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
index f1ad937405a3..6e689d5de6fa 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
@@ -151,6 +151,11 @@ static int mlxsw_sp_dcbnl_ieee_setets(struct net_device *dev,
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
int err;
+ if (mlxsw_sp_qdisc_has_prio_ets(mlxsw_sp_port)) {
+ netdev_err(dev, "ETS configuration is controlled by offloaded qdisc\n");
+ return -EBUSY;
+ }
+
err = mlxsw_sp_port_ets_validate(mlxsw_sp_port, ets);
if (err)
return err;
@@ -450,6 +455,11 @@ static int mlxsw_sp_dcbnl_ieee_setmaxrate(struct net_device *dev,
struct ieee_maxrate *my_maxrate = mlxsw_sp_port->dcb.maxrate;
int err, i;
+ if (mlxsw_sp_qdisc_has_tbf_subgroup(mlxsw_sp_port)) {
+ netdev_err(dev, "maxrate is controlled by offloaded qdisc\n");
+ return -EBUSY;
+ }
+
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
err = mlxsw_sp_port_ets_maxrate_set(mlxsw_sp_port,
MLXSW_REG_QEEC_HR_SUBGROUP,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
index 73519301b744..c2ce4f3f562e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
@@ -2316,6 +2316,49 @@ int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
action_mask);
}
+bool mlxsw_sp_qdisc_has_prio_ets(struct mlxsw_sp_port *mlxsw_sp_port)
+{
+ struct mlxsw_sp_qdisc_state *qdisc_state = mlxsw_sp_port->qdisc;
+ struct mlxsw_sp_qdisc *root_qdisc;
+ bool found;
+
+ mutex_lock(&qdisc_state->lock);
+ root_qdisc = &qdisc_state->root_qdisc;
+ found = root_qdisc->ops &&
+ (root_qdisc->ops->type == MLXSW_SP_QDISC_PRIO ||
+ root_qdisc->ops->type == MLXSW_SP_QDISC_ETS);
+ mutex_unlock(&qdisc_state->lock);
+
+ return found;
+}
+
+static struct mlxsw_sp_qdisc *
+mlxsw_sp_qdisc_walk_cb_find_subgroup_tbf(struct mlxsw_sp_qdisc *qdisc,
+ void *data)
+{
+ struct mlxsw_sp_port *mlxsw_sp_port = (struct mlxsw_sp_port *)data;
+
+ if (qdisc->ops && qdisc->ops->type == MLXSW_SP_QDISC_TBF &&
+ mlxsw_sp_qdisc_tbf_hr(mlxsw_sp_port, qdisc) ==
+ MLXSW_REG_QEEC_HR_SUBGROUP)
+ return qdisc;
+ return NULL;
+}
+
+bool mlxsw_sp_qdisc_has_tbf_subgroup(struct mlxsw_sp_port *mlxsw_sp_port)
+{
+ struct mlxsw_sp_qdisc_state *qdisc_state = mlxsw_sp_port->qdisc;
+ bool found;
+
+ mutex_lock(&qdisc_state->lock);
+ found = mlxsw_sp_qdisc_walk(&qdisc_state->root_qdisc,
+ mlxsw_sp_qdisc_walk_cb_find_subgroup_tbf,
+ mlxsw_sp_port) != NULL;
+ mutex_unlock(&qdisc_state->lock);
+
+ return found;
+}
+
int mlxsw_sp_tc_qdisc_init(struct mlxsw_sp_port *mlxsw_sp_port)
{
struct mlxsw_sp_qdisc_state *qdisc_state;
--
2.54.0
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc
2026-05-18 5:56 [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc Keno Fischer
@ 2026-05-19 15:14 ` Petr Machata
0 siblings, 0 replies; 2+ messages in thread
From: Petr Machata @ 2026-05-19 15:14 UTC (permalink / raw)
To: Keno Fischer; +Cc: netdev, Ido Schimmel, Petr Machata, linux-kernel
Keno Fischer <keno@juliahub.com> writes:
> Some bits of state in the mlxsw driver can be controlled by two
> mechanisms. The first is the DCB netlink command set (RTM_SETDCB
> and e.g. DCB_ATTR_IEEE_ETS/DCB_ATTR_IEEE_MAXRATE), reachable e.g. from
> the `dcb` userspace utility. The second is via the qdisc framework
> (reachable e.g. from the `tc` userspace utility). Because they touch
> the same bit of state, it is currently possible for the ETS config
> to get out of sync with what the qdisc expects. One hard-to-debug
> consequence is that a config script becomes non-idempotent. E.g.,
> consider
>
> ```
> dcb ets set dev "${port}" prio-tc 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
> tc qdisc replace dev "${port}" root handle 1: prio bands 8 \
> priomap 0 1 2 3 4 5 6 7
> ```
>
> Here the user made a mistake and put the priomap in the wrong order.
> The first time this code is run, the `qdisc` specification overwrites
> the `prio-tc` specification. However, the second time this code is run,
> the subsystem sees that the qdisc is identical and does nothing, causing
> the `prio-tc` setting to stick and the final mapping to be different.
> This can be very confusing, since the same config script creates two
> different states.
>
> I think there's two reasonable ways to fix this. One would be by
> reflecting the ets state changes back into the qdisc, so that the
> qdisc subsystem notices that it's been changed. However, I think it
> is clearer to just forbid the non-qdisc API from trying to make
> changes while the qdisc is offloaded.
>
> This is similar to the existing check in mlxsw_sp_dcbnl_setbuffer,
> which does essentially the same thing in reverse (buffers are only
> explicitly managed when there is an offloaded qdisc, otherwise
> they're managed explicitly).
>
> I hope this saves the next person a few hours of debugging wondering
> why their tc_no_buffer_discard_uc_tc_ counters are going up.
>
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Keno Fischer <keno@juliahub.com>
We do have this overwriting behavior documented on the wiki, but I don't
think it was ever really a deliberate consideration. I think it makes
sense to distance the DCB and qdisc configurations that way. I ran the
mlxsw-specific qos / sch tests that we have have, and everything passes,
which indicates that at least we didn't rely on the behavior in any way.
I'll let it pass through a full regression tonight. Overall the patch
looks OK to me, but Sashiko found something:
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> index 73519301b744..c2ce4f3f562e 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> @@ -2316,6 +2316,49 @@ int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
> action_mask);
> }
>
> +bool mlxsw_sp_qdisc_has_prio_ets(struct mlxsw_sp_port *mlxsw_sp_port)
> +{
> + struct mlxsw_sp_qdisc_state *qdisc_state = mlxsw_sp_port->qdisc;
> + struct mlxsw_sp_qdisc *root_qdisc;
> + bool found;
> +
> + mutex_lock(&qdisc_state->lock);
> + root_qdisc = &qdisc_state->root_qdisc;
> + found = root_qdisc->ops &&
> + (root_qdisc->ops->type == MLXSW_SP_QDISC_PRIO ||
> + root_qdisc->ops->type == MLXSW_SP_QDISC_ETS);
> + mutex_unlock(&qdisc_state->lock);
> +
> + return found;
> +}
From Sashiko review:
The check here only examines the root qdisc type. However, a prio or ets
qdisc can be a valid child of a root tbf qdisc in the offloaded
configuration.
This is correct, ETS/PRIO are offloaded when under root TBF as well.
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-05-19 16:03 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 5:56 [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc Keno Fischer
2026-05-19 15:14 ` Petr Machata
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox