From: Stephen Hemminger <stephen@networkplumber.org>
To: Victor Nogueira <victor@mojatatu.com>
Cc: davem@davemloft.net, kuba@kernel.org, edumazet@google.com,
pabeni@redhat.com, jhs@mojatatu.com, jiri@resnulli.us,
xiyou.wangcong@gmail.com, horms@kernel.org,
netdev@vger.kernel.org
Subject: Re: [RFC PATCH net-next v2] net/sched: Introduce qdisc quirk_chk op
Date: Tue, 25 Nov 2025 19:11:07 -0800 [thread overview]
Message-ID: <20251125191107.6d9efcfb@phoenix.local> (raw)
In-Reply-To: <20251125224604.872351-1-victor@mojatatu.com>
On Tue, 25 Nov 2025 19:46:04 -0300
Victor Nogueira <victor@mojatatu.com> wrote:
> There is a pattern of bugs that end up creating UAFs or null ptr derefs.
> The majority of these bugs follow the formula below:
> a) create a nonsense hierarchy of qdiscs which has no practical value,
> b) start sending packets
> Optional c) netlink cmds to change hierarchy some more; It's more fun if
> you can get packets stuck - the formula in this case includes non
> work-conserving qdiscs somewhere in the hierarchy
> Optional d dependent on c) send more packets
> e) profit
>
> Current init/change qdisc APIs are localised to validate only within the
> constraint of a single qdisc. So catching #a or #c is a challenge. Our
> policy, when said bugs are presented, is to "make it work" by modifying
> generally used data structures and code, but these come at the expense of
> adding special checks for corner cases which are nonsensical to begin with.
>
> The goal of this patchset is to create an equivalent to PCI quirks, which
> will catch nonsensical hierarchies in #a and #c and reject such a config.
>
> With that in mind, we are proposing the addition of a new qdisc op
> (quirk_chk). We introduce, as a first example, the quirk_chk op to netem.
> Its purpose here is to validate whether the user is attempting to add 2
> netem duplicates in the same qdisc tree which will be forbidden unless
> the root qdisc is multiqueue.
>
> Here is an example that should now work:
>
> DEV="eth0"
> NUM_QUEUES=4
> DUPLICATE_PERCENT="5%"
>
> tc qdisc del dev $DEV root > /dev/null 2>&1
> tc qdisc add dev $DEV root handle 1: mq
>
> for i in $(seq 1 $NUM_QUEUES); do
> HANDLE_ID=$((i * 10))
> PARENT_ID="1:$i"
> tc qdisc add dev $DEV parent $PARENT_ID handle \
> ${HANDLE_ID}: netem duplicate $DUPLICATE_PERCENT
> done
>
> Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Victor Nogueira <victor@mojatatu.com>
> ---
> v1 -> v2:
> - Simplify process of getting root qdisc in netem_quirk_chk
> - Use parent's major directly instead of looking up parent qdisc in
> netem_quirk_chk
> - Call parse_attrs in netem_quirk_chk to avoid issue caught by syzbot
>
> Link to v1:
> https://lore.kernel.org/netdev/20251124223749.503979-1-victor@mojatatu.com/
> ---
> include/net/sch_generic.h | 3 +++
> net/sched/sch_api.c | 12 ++++++++++++
> net/sched/sch_netem.c | 40 +++++++++++++++++++++++++++------------
> 3 files changed, 43 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 94966692ccdf..60450372c5d5 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -313,6 +313,9 @@ struct Qdisc_ops {
> u32 block_index);
> void (*egress_block_set)(struct Qdisc *sch,
> u32 block_index);
> + int (*quirk_chk)(struct Qdisc *sch,
> + struct nlattr *arg,
> + struct netlink_ext_ack *extack);
> u32 (*ingress_block_get)(struct Qdisc *sch);
> u32 (*egress_block_get)(struct Qdisc *sch);
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index f56b18c8aebf..a850df437691 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1315,6 +1315,12 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
> rcu_assign_pointer(sch->stab, stab);
> }
>
> + if (ops->quirk_chk) {
> + err = ops->quirk_chk(sch, tca[TCA_OPTIONS], extack);
> + if (err != 0)
> + goto err_out3;
> + }
> +
> if (ops->init) {
> err = ops->init(sch, tca[TCA_OPTIONS], extack);
> if (err != 0)
> @@ -1378,6 +1384,12 @@ static int qdisc_change(struct Qdisc *sch, struct nlattr **tca,
> NL_SET_ERR_MSG(extack, "Change of blocks is not supported");
> return -EOPNOTSUPP;
> }
> + if (sch->ops->quirk_chk) {
> + err = sch->ops->quirk_chk(sch, tca[TCA_OPTIONS],
> + extack);
> + if (err != 0)
> + return err;
> + }
> err = sch->ops->change(sch, tca[TCA_OPTIONS], extack);
> if (err)
> return err;
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index eafc316ae319..ceece2ae37bc 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -975,13 +975,27 @@ static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
>
> static const struct Qdisc_class_ops netem_class_ops;
>
> -static int check_netem_in_tree(struct Qdisc *sch, bool duplicates,
> - struct netlink_ext_ack *extack)
> +static int netem_quirk_chk(struct Qdisc *sch, struct nlattr *opt,
> + struct netlink_ext_ack *extack)
> {
> + struct nlattr *tb[TCA_NETEM_MAX + 1];
> + struct tc_netem_qopt *qopt;
> struct Qdisc *root, *q;
> + struct net_device *dev;
> + bool root_is_mq;
> + bool duplicates;
> unsigned int i;
> + int ret;
> +
> + ret = parse_attr(tb, TCA_NETEM_MAX, opt, netem_policy, sizeof(*qopt));
> + if (ret < 0)
> + return ret;
>
> - root = qdisc_root_sleeping(sch);
> + qopt = nla_data(opt);
> + duplicates = qopt->duplicate;
> +
> + dev = sch->dev_queue->dev;
> + root = rtnl_dereference(dev->qdisc);
>
> if (sch != root && root->ops->cl_ops == &netem_class_ops) {
> if (duplicates ||
> @@ -992,19 +1006,25 @@ static int check_netem_in_tree(struct Qdisc *sch, bool duplicates,
> if (!qdisc_dev(root))
> return 0;
>
> + root_is_mq = root->flags & TCQ_F_MQROOT;
> +
What about HTB or other inherently multi-q qdisc?
Using netem on HTB on some branches is common practice.
next prev parent reply other threads:[~2025-11-26 3:11 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-25 22:46 [RFC PATCH net-next v2] net/sched: Introduce qdisc quirk_chk op Victor Nogueira
2025-11-26 3:11 ` Stephen Hemminger [this message]
2025-11-26 16:29 ` Jamal Hadi Salim
2025-11-26 20:31 ` Victor Nogueira
2025-11-26 4:02 ` Cong Wang
2025-11-26 8:16 ` [syzbot ci] " syzbot ci
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251125191107.6d9efcfb@phoenix.local \
--to=stephen@networkplumber.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=victor@mojatatu.com \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.