Netdev List
 help / color / mirror / Atom feed
From: Petr Machata <petrm@nvidia.com>
To: Keno Fischer <keno@juliahub.com>
Cc: <netdev@vger.kernel.org>, Ido Schimmel <idosch@nvidia.com>,
	Petr Machata <petrm@nvidia.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc
Date: Tue, 19 May 2026 17:14:48 +0200	[thread overview]
Message-ID: <87ik8jmlow.fsf@nvidia.com> (raw)
In-Reply-To: <agqqK_6PxbvAwsMX@juliahub.com>


Keno Fischer <keno@juliahub.com> writes:

> Some bits of state in the mlxsw driver can be controlled by two
> mechanisms. The first is the DCB netlink command set (RTM_SETDCB
> and e.g. DCB_ATTR_IEEE_ETS/DCB_ATTR_IEEE_MAXRATE), reachable e.g. from
> the `dcb` userspace utility. The second is via the qdisc framework
> (reachable e.g. from the `tc` userspace utility). Because they touch
> the same bit of state, it is currently possible for the ETS config
> to get out of sync with what the qdisc expects. One hard-to-debug
> consequence is that a config script becomes non-idempotent. E.g.,
> consider
>
> ```
> dcb ets set dev "${port}" prio-tc 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
> tc qdisc replace dev "${port}" root handle 1: prio bands 8 \
> 	priomap 0 1 2 3 4 5 6 7
> ```
>
> Here the user made a mistake and put the priomap in the wrong order.
> The first time this code is run, the `qdisc` specification overwrites
> the `prio-tc` specification. However, the second time this code is run,
> the subsystem sees that the qdisc is identical and does nothing, causing
> the `prio-tc` setting to stick and the final mapping to be different.
> This can be very confusing, since the same config script creates two
> different states.
>
> I think there's two reasonable ways to fix this. One would be by
> reflecting the ets state changes back into the qdisc, so that the
> qdisc subsystem notices that it's been changed. However, I think it
> is clearer to just forbid the non-qdisc API from trying to make
> changes while the qdisc is offloaded.
>
> This is similar to the existing check in mlxsw_sp_dcbnl_setbuffer,
> which does essentially the same thing in reverse (buffers are only
> explicitly managed when there is an offloaded qdisc, otherwise
> they're managed explicitly).
>
> I hope this saves the next person a few hours of debugging wondering
> why their tc_no_buffer_discard_uc_tc_ counters are going up.
>
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Keno Fischer <keno@juliahub.com>

We do have this overwriting behavior documented on the wiki, but I don't
think it was ever really a deliberate consideration. I think it makes
sense to distance the DCB and qdisc configurations that way. I ran the
mlxsw-specific qos / sch tests that we have have, and everything passes,
which indicates that at least we didn't rely on the behavior in any way.

I'll let it pass through a full regression tonight. Overall the patch
looks OK to me, but Sashiko found something:

> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> index 73519301b744..c2ce4f3f562e 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> @@ -2316,6 +2316,49 @@ int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
>  					      action_mask);
>  }
>  
> +bool mlxsw_sp_qdisc_has_prio_ets(struct mlxsw_sp_port *mlxsw_sp_port)
> +{
> +	struct mlxsw_sp_qdisc_state *qdisc_state = mlxsw_sp_port->qdisc;
> +	struct mlxsw_sp_qdisc *root_qdisc;
> +	bool found;
> +
> +	mutex_lock(&qdisc_state->lock);
> +	root_qdisc = &qdisc_state->root_qdisc;
> +	found = root_qdisc->ops &&
> +		    (root_qdisc->ops->type == MLXSW_SP_QDISC_PRIO ||
> +		     root_qdisc->ops->type == MLXSW_SP_QDISC_ETS);
> +	mutex_unlock(&qdisc_state->lock);
> +
> +	return found;
> +}

From Sashiko review:

    The check here only examines the root qdisc type. However, a prio or ets
    qdisc can be a valid child of a root tbf qdisc in the offloaded
    configuration.

This is correct, ETS/PRIO are offloaded when under root TBF as well.

  reply	other threads:[~2026-05-19 16:03 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-18  5:56 [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc Keno Fischer
2026-05-19 15:14 ` Petr Machata [this message]
2026-05-19 22:55   ` Keno Fischer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ik8jmlow.fsf@nvidia.com \
    --to=petrm@nvidia.com \
    --cc=idosch@nvidia.com \
    --cc=keno@juliahub.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox