All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Machata <petrm@nvidia.com>
To: Keno Fischer <keno@juliahub.com>
Cc: <netdev@vger.kernel.org>, Ido Schimmel <idosch@nvidia.com>,
	Petr Machata <petrm@nvidia.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc
Date: Tue, 19 May 2026 17:14:48 +0200	[thread overview]
Message-ID: <87ik8jmlow.fsf@nvidia.com> (raw)
In-Reply-To: <agqqK_6PxbvAwsMX@juliahub.com>


Keno Fischer <keno@juliahub.com> writes:

> Some bits of state in the mlxsw driver can be controlled by two
> mechanisms. The first is the DCB netlink command set (RTM_SETDCB
> and e.g. DCB_ATTR_IEEE_ETS/DCB_ATTR_IEEE_MAXRATE), reachable e.g. from
> the `dcb` userspace utility. The second is via the qdisc framework
> (reachable e.g. from the `tc` userspace utility). Because they touch
> the same bit of state, it is currently possible for the ETS config
> to get out of sync with what the qdisc expects. One hard-to-debug
> consequence is that a config script becomes non-idempotent. E.g.,
> consider
>
> ```
> dcb ets set dev "${port}" prio-tc 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
> tc qdisc replace dev "${port}" root handle 1: prio bands 8 \
> 	priomap 0 1 2 3 4 5 6 7
> ```
>
> Here the user made a mistake and put the priomap in the wrong order.
> The first time this code is run, the `qdisc` specification overwrites
> the `prio-tc` specification. However, the second time this code is run,
> the subsystem sees that the qdisc is identical and does nothing, causing
> the `prio-tc` setting to stick and the final mapping to be different.
> This can be very confusing, since the same config script creates two
> different states.
>
> I think there's two reasonable ways to fix this. One would be by
> reflecting the ets state changes back into the qdisc, so that the
> qdisc subsystem notices that it's been changed. However, I think it
> is clearer to just forbid the non-qdisc API from trying to make
> changes while the qdisc is offloaded.
>
> This is similar to the existing check in mlxsw_sp_dcbnl_setbuffer,
> which does essentially the same thing in reverse (buffers are only
> explicitly managed when there is an offloaded qdisc, otherwise
> they're managed explicitly).
>
> I hope this saves the next person a few hours of debugging wondering
> why their tc_no_buffer_discard_uc_tc_ counters are going up.
>
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Keno Fischer <keno@juliahub.com>

We do have this overwriting behavior documented on the wiki, but I don't
think it was ever really a deliberate consideration. I think it makes
sense to distance the DCB and qdisc configurations that way. I ran the
mlxsw-specific qos / sch tests that we have have, and everything passes,
which indicates that at least we didn't rely on the behavior in any way.

I'll let it pass through a full regression tonight. Overall the patch
looks OK to me, but Sashiko found something:

> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> index 73519301b744..c2ce4f3f562e 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> @@ -2316,6 +2316,49 @@ int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
>  					      action_mask);
>  }
>  
> +bool mlxsw_sp_qdisc_has_prio_ets(struct mlxsw_sp_port *mlxsw_sp_port)
> +{
> +	struct mlxsw_sp_qdisc_state *qdisc_state = mlxsw_sp_port->qdisc;
> +	struct mlxsw_sp_qdisc *root_qdisc;
> +	bool found;
> +
> +	mutex_lock(&qdisc_state->lock);
> +	root_qdisc = &qdisc_state->root_qdisc;
> +	found = root_qdisc->ops &&
> +		    (root_qdisc->ops->type == MLXSW_SP_QDISC_PRIO ||
> +		     root_qdisc->ops->type == MLXSW_SP_QDISC_ETS);
> +	mutex_unlock(&qdisc_state->lock);
> +
> +	return found;
> +}

From Sashiko review:

    The check here only examines the root qdisc type. However, a prio or ets
    qdisc can be a valid child of a root tbf qdisc in the offloaded
    configuration.

This is correct, ETS/PRIO are offloaded when under root TBF as well.

  reply	other threads:[~2026-05-19 16:03 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-18  5:56 [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc Keno Fischer
2026-05-19 15:14 ` Petr Machata [this message]
2026-05-19 22:55   ` Keno Fischer
2026-05-20  8:35     ` Petr Machata

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ik8jmlow.fsf@nvidia.com \
    --to=petrm@nvidia.com \
    --cc=idosch@nvidia.com \
    --cc=keno@juliahub.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.