Netdev List
 help / color / mirror / Atom feed
From: Keno Fischer <keno@juliahub.com>
To: netdev@vger.kernel.org
Cc: Ido Schimmel <idosch@nvidia.com>, Petr Machata <petrm@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc
Date: Mon, 18 May 2026 01:56:59 -0400	[thread overview]
Message-ID: <agqqK_6PxbvAwsMX@juliahub.com> (raw)

Some bits of state in the mlxsw driver can be controlled by two
mechanisms. The first is the DCB netlink command set (RTM_SETDCB
and e.g. DCB_ATTR_IEEE_ETS/DCB_ATTR_IEEE_MAXRATE), reachable e.g. from
the `dcb` userspace utility. The second is via the qdisc framework
(reachable e.g. from the `tc` userspace utility). Because they touch
the same bit of state, it is currently possible for the ETS config
to get out of sync with what the qdisc expects. One hard-to-debug
consequence is that a config script becomes non-idempotent. E.g.,
consider

```
dcb ets set dev "${port}" prio-tc 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
tc qdisc replace dev "${port}" root handle 1: prio bands 8 \
	priomap 0 1 2 3 4 5 6 7
```

Here the user made a mistake and put the priomap in the wrong order.
The first time this code is run, the `qdisc` specification overwrites
the `prio-tc` specification. However, the second time this code is run,
the subsystem sees that the qdisc is identical and does nothing, causing
the `prio-tc` setting to stick and the final mapping to be different.
This can be very confusing, since the same config script creates two
different states.

I think there's two reasonable ways to fix this. One would be by
reflecting the ets state changes back into the qdisc, so that the
qdisc subsystem notices that it's been changed. However, I think it
is clearer to just forbid the non-qdisc API from trying to make
changes while the qdisc is offloaded.

This is similar to the existing check in mlxsw_sp_dcbnl_setbuffer,
which does essentially the same thing in reverse (buffers are only
explicitly managed when there is an offloaded qdisc, otherwise
they're managed explicitly).

I hope this saves the next person a few hours of debugging wondering
why their tc_no_buffer_discard_uc_tc_ counters are going up.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Keno Fischer <keno@juliahub.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |  2 +
 .../ethernet/mellanox/mlxsw/spectrum_dcb.c    | 10 +++++
 .../ethernet/mellanox/mlxsw/spectrum_qdisc.c  | 43 +++++++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index b03ff9e044f9..ecf4525c42a4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -1260,6 +1260,8 @@ int mlxsw_sp_setup_tc_block_qevent_early_drop(struct mlxsw_sp_port *mlxsw_sp_por
 					      struct flow_block_offload *f);
 int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
 					struct flow_block_offload *f);
+bool mlxsw_sp_qdisc_has_prio_ets(struct mlxsw_sp_port *mlxsw_sp_port);
+bool mlxsw_sp_qdisc_has_tbf_subgroup(struct mlxsw_sp_port *mlxsw_sp_port);
 
 /* spectrum_fid.c */
 struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_index(struct mlxsw_sp *mlxsw_sp,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
index f1ad937405a3..6e689d5de6fa 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
@@ -151,6 +151,11 @@ static int mlxsw_sp_dcbnl_ieee_setets(struct net_device *dev,
 	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
 	int err;
 
+	if (mlxsw_sp_qdisc_has_prio_ets(mlxsw_sp_port)) {
+		netdev_err(dev, "ETS configuration is controlled by offloaded qdisc\n");
+		return -EBUSY;
+	}
+
 	err = mlxsw_sp_port_ets_validate(mlxsw_sp_port, ets);
 	if (err)
 		return err;
@@ -450,6 +455,11 @@ static int mlxsw_sp_dcbnl_ieee_setmaxrate(struct net_device *dev,
 	struct ieee_maxrate *my_maxrate = mlxsw_sp_port->dcb.maxrate;
 	int err, i;
 
+	if (mlxsw_sp_qdisc_has_tbf_subgroup(mlxsw_sp_port)) {
+		netdev_err(dev, "maxrate is controlled by offloaded qdisc\n");
+		return -EBUSY;
+	}
+
 	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
 		err = mlxsw_sp_port_ets_maxrate_set(mlxsw_sp_port,
 						    MLXSW_REG_QEEC_HR_SUBGROUP,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
index 73519301b744..c2ce4f3f562e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
@@ -2316,6 +2316,49 @@ int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
 					      action_mask);
 }
 
+bool mlxsw_sp_qdisc_has_prio_ets(struct mlxsw_sp_port *mlxsw_sp_port)
+{
+	struct mlxsw_sp_qdisc_state *qdisc_state = mlxsw_sp_port->qdisc;
+	struct mlxsw_sp_qdisc *root_qdisc;
+	bool found;
+
+	mutex_lock(&qdisc_state->lock);
+	root_qdisc = &qdisc_state->root_qdisc;
+	found = root_qdisc->ops &&
+		    (root_qdisc->ops->type == MLXSW_SP_QDISC_PRIO ||
+		     root_qdisc->ops->type == MLXSW_SP_QDISC_ETS);
+	mutex_unlock(&qdisc_state->lock);
+
+	return found;
+}
+
+static struct mlxsw_sp_qdisc *
+mlxsw_sp_qdisc_walk_cb_find_subgroup_tbf(struct mlxsw_sp_qdisc *qdisc,
+					 void *data)
+{
+	struct mlxsw_sp_port *mlxsw_sp_port = (struct mlxsw_sp_port *)data;
+
+	if (qdisc->ops && qdisc->ops->type == MLXSW_SP_QDISC_TBF &&
+	    mlxsw_sp_qdisc_tbf_hr(mlxsw_sp_port, qdisc) ==
+	    MLXSW_REG_QEEC_HR_SUBGROUP)
+		return qdisc;
+	return NULL;
+}
+
+bool mlxsw_sp_qdisc_has_tbf_subgroup(struct mlxsw_sp_port *mlxsw_sp_port)
+{
+	struct mlxsw_sp_qdisc_state *qdisc_state = mlxsw_sp_port->qdisc;
+	bool found;
+
+	mutex_lock(&qdisc_state->lock);
+	found = mlxsw_sp_qdisc_walk(&qdisc_state->root_qdisc,
+				    mlxsw_sp_qdisc_walk_cb_find_subgroup_tbf,
+				    mlxsw_sp_port) != NULL;
+	mutex_unlock(&qdisc_state->lock);
+
+	return found;
+}
+
 int mlxsw_sp_tc_qdisc_init(struct mlxsw_sp_port *mlxsw_sp_port)
 {
 	struct mlxsw_sp_qdisc_state *qdisc_state;
-- 
2.54.0


             reply	other threads:[~2026-05-18  5:57 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-18  5:56 Keno Fischer [this message]
2026-05-19 15:14 ` [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc Petr Machata

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agqqK_6PxbvAwsMX@juliahub.com \
    --to=keno@juliahub.com \
    --cc=idosch@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=petrm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox