From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
To: Vladimir Oltean <vladimir.oltean@nxp.com>, netdev@vger.kernel.org
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jiri Pirko <jiri@resnulli.us>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Voon Weifeng <weifeng.voon@intel.com>,
Vladimir Oltean <olteanv@gmail.com>,
Kurt Kanzenbach <kurt@linutronix.de>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled
Date: Thu, 15 Sep 2022 14:52:29 -0700 [thread overview]
Message-ID: <877d24gvaa.fsf@intel.com> (raw)
In-Reply-To: <20220915100802.2308279-2-vladimir.oltean@nxp.com>
Vladimir Oltean <vladimir.oltean@nxp.com> writes:
> In an incredibly strange API design decision, qdisc->destroy() gets
> called even if qdisc->init() never succeeded, not exclusively since
> commit 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"),
> but apparently also earlier (in the case of qdisc_create_dflt()).
>
> The taprio qdisc does not fully acknowledge this when it attempts full
> offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
> taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
> parsed from netlink (in taprio_change(), tail called from taprio_init()).
>
> But in taprio_destroy(), we call taprio_disable_offload(), and this
> determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).
>
> But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
> (a bitwise check of bit 1 in q->flags), it is invalid to call this macro
> on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
> to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
> an invalid set of flags.
>
> As a result, it is possible to crash the kernel if user space forces an
> error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
> of taprio_enable_offload(). This is because drivers do not expect the
> offload to be disabled when it was never enabled.
>
> The error that we force here is to attach taprio as a non-root qdisc,
> but instead as child of an mqprio root qdisc:
>
> $ tc qdisc add dev swp0 root handle 1: \
> mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
> $ tc qdisc replace dev swp0 parent 1:1 \
> taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
> sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> flags 0x0 clockid CLOCK_TAI
> Unable to handle kernel paging request at virtual address fffffffffffffff8
> [fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 96000004 [#1] PREEMPT SMP
> Call trace:
> taprio_dump+0x27c/0x310
> vsc9959_port_setup_tc+0x1f4/0x460
> felix_port_setup_tc+0x24/0x3c
> dsa_slave_setup_tc+0x54/0x27c
> taprio_disable_offload.isra.0+0x58/0xe0
> taprio_destroy+0x80/0x104
> qdisc_create+0x240/0x470
> tc_modify_qdisc+0x1fc/0x6b0
> rtnetlink_rcv_msg+0x12c/0x390
> netlink_rcv_skb+0x5c/0x130
> rtnetlink_rcv+0x1c/0x2c
>
> Fix this by keeping track of the operations we made, and undo the
> offload only if we actually did it.
>
> I've added "bool offloaded" inside a 4 byte hole between "int clockid"
> and "atomic64_t picos_per_byte". Now the first cache line looks like
> below:
>
> $ pahole -C taprio_sched net/sched/sch_taprio.o
> struct taprio_sched {
> struct Qdisc * * qdiscs; /* 0 8 */
> struct Qdisc * root; /* 8 8 */
> u32 flags; /* 16 4 */
> enum tk_offsets tk_offset; /* 20 4 */
> int clockid; /* 24 4 */
> bool offloaded; /* 28 1 */
>
> /* XXX 3 bytes hole, try to pack */
>
> atomic64_t picos_per_byte; /* 32 0 */
>
> /* XXX 8 bytes hole, try to pack */
>
> spinlock_t current_entry_lock; /* 40 0 */
>
> /* XXX 8 bytes hole, try to pack */
>
> struct sched_entry * current_entry; /* 48 8 */
> struct sched_gate_list * oper_sched; /* 56 8 */
> /* --- cacheline 1 boundary (64 bytes) --- */
>
> Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cheers,
--
Vinicius
next prev parent reply other threads:[~2022-09-15 21:52 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-15 10:08 [PATCH v2 net 0/2] Fixes for tc-taprio software mode Vladimir Oltean
2022-09-15 10:08 ` [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled Vladimir Oltean
2022-09-15 21:52 ` Vinicius Costa Gomes [this message]
2022-09-15 10:08 ` [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs Vladimir Oltean
2022-09-15 21:54 ` Vinicius Costa Gomes
2023-02-22 14:03 ` Kurt Kanzenbach
2023-02-22 14:25 ` Vladimir Oltean
2023-02-23 13:03 ` Greg Kroah-Hartman
2023-02-23 13:18 ` Vladimir Oltean
2022-09-20 19:00 ` [PATCH v2 net 0/2] Fixes for tc-taprio software mode patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877d24gvaa.fsf@intel.com \
--to=vinicius.gomes@intel.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=kurt@linutronix.de \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=olteanv@gmail.com \
--cc=pabeni@redhat.com \
--cc=vladimir.oltean@nxp.com \
--cc=weifeng.voon@intel.com \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.