[PATCH bpf] net/sched: cls_bpf: prevent unbounded recursion in offload rollback
From: Jiayuan Chen @ 2026-05-22 2:58 UTC
To: bpf, netdev, kuba, daniel, martin.lau
Cc: Jiayuan Chen, Alexei Starovoitov, Andrii Nakryiko,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Song Liu,
Yonghong Song, Jiri Olsa, John Fastabend, Stanislav Fomichev,
Jamal Hadi Salim, Jiri Pirko, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, linux-kernel
Quan Sun reported [1] a stack overflow in cls_bpf_offload_cmd().
Reproducer on netdevsim: add a skip_sw cls_bpf filter, set the
bpf_tc_accept debugfs knob to 0, then `tc filter replace`. The replace
calls tc_setup_cb_replace() which fails. cls_bpf_offload_cmd() then
swaps prog/oldprog and recursively calls itself to roll back. But
bpf_tc_accept=0 makes the rollback fail too, which triggers yet another
rollback frame with prog/oldprog swapped back to the original order,
and so on until the stack is exhausted.
bpf_tc_accept is just a convenient knob for the reproducer. Any driver
whose tc_setup_cb_replace() fails twice in a row can hit the same loop,
so this is not a netdevsim-only issue.
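
The loop described above can be modelled in user space. The following is a toy sketch, not kernel code: every name in it (model_prog, model_accept, model_hw_replace(), model_offload_cmd_old(), DEPTH_CAP) is hypothetical, the error value is arbitrary, and the depth cap exists only so that the model terminates instead of overflowing its own stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these exist in the kernel. */
#define MODEL_ERR  (-1)   /* arbitrary error value for the model */
#define DEPTH_CAP  64     /* artificial limit so the model terminates */

struct model_prog { int id; };

static bool model_accept;     /* plays the role of bpf_tc_accept */
static int model_max_depth;   /* deepest recursion level reached */

/* Stand-in for the driver callback reached via tc_setup_cb_replace(). */
static int model_hw_replace(void)
{
	return model_accept ? 0 : MODEL_ERR;
}

/* Pre-patch shape: a failed replace always re-enters with swapped arguments. */
static int model_offload_cmd_old(struct model_prog *prog,
				 struct model_prog *oldprog, int depth)
{
	int err;

	if (depth > model_max_depth)
		model_max_depth = depth;
	if (depth >= DEPTH_CAP)   /* the real function has no such cap */
		return MODEL_ERR;

	err = model_hw_replace();
	if (prog && err) {
		/* The rollback fails for the same reason, so it recurses again. */
		model_offload_cmd_old(oldprog, prog, depth + 1);
		return err;
	}
	return 0;
}

/* While the driver keeps rejecting, recursion only stops at the cap. */
static void model_demo(void)
{
	struct model_prog a = { 1 }, b = { 2 };

	model_accept = false;
	model_max_depth = 0;
	assert(model_offload_cmd_old(&a, &b, 0) == MODEL_ERR);
	assert(model_max_depth == DEPTH_CAP);
}
```

In the model, nothing but the artificial cap bounds the depth while the driver keeps rejecting the request, which is the situation the reproducer creates.
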
Two ways to fix it:
  1) Have the rollback call tc_setup_cb_add() on oldprog instead of
     re-entering cls_bpf_offload_cmd().
  2) Mark the rollback frame with a flag and skip a second-level
     rollback from inside it.
Go with (2). It is the smaller change and keeps the original behaviour:
the rollback still goes through tc_setup_cb_replace(), so the driver
gets one real chance to restore its state. If that attempt also fails,
we just return the original error instead of recursing.
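
The effect of option (2) can be sketched with a small user-space model. This is an illustration under assumptions, not the patched kernel function: sketch_prog, sketch_accept, sketch_hw_calls, sketch_hw_replace() and sketch_offload_cmd() are hypothetical names, the error value is arbitrary, and the model returns err from the rollback frame for simplicity (the outer frame ignores that value either way).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these exist in the kernel. */
struct sketch_prog { int id; };

static bool sketch_accept;    /* plays the role of bpf_tc_accept */
static int sketch_hw_calls;   /* how many times the driver was asked */

/* Stand-in for the driver callback reached via tc_setup_cb_replace(). */
static int sketch_hw_replace(void)
{
	sketch_hw_calls++;
	return sketch_accept ? 0 : -1;
}

/* Post-patch shape: the rollback frame never starts a rollback of its own. */
static int sketch_offload_cmd(struct sketch_prog *prog,
			      struct sketch_prog *oldprog, bool is_rollback)
{
	int err = sketch_hw_replace();

	if (prog && err && !is_rollback) {
		/* One real attempt to restore the old program, result ignored. */
		sketch_offload_cmd(oldprog, prog, true);
		return err;   /* the original error is what the caller sees */
	}
	return err;
}

/* A rejecting driver is asked exactly twice: the replace and one rollback. */
static void sketch_demo(void)
{
	struct sketch_prog a = { 1 }, b = { 2 };

	sketch_accept = false;
	sketch_hw_calls = 0;
	assert(sketch_offload_cmd(&a, &b, false) == -1);
	assert(sketch_hw_calls == 2);
}
```

With the flag in place, the depth is bounded at two frames regardless of how the driver behaves, and the caller still receives the error from the first attempt.
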
[1]: https://lore.kernel.org/bpf/ce5a6005-3c5e-4696-9e05-eba9461dc860@std.uestc.edu.cn/T/#u
Fixes: 102740bd9436 ("cls_bpf: fix offload assumptions after callback conversion")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
net/sched/cls_bpf.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 9a346b6221b35..d71fbf7cb407e 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -142,7 +142,8 @@ static bool cls_bpf_is_ebpf(const struct cls_bpf_prog *prog)
 
 static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 			       struct cls_bpf_prog *oldprog,
-			       struct netlink_ext_ack *extack)
+			       struct netlink_ext_ack *extack,
+			       bool is_rollback)
 {
 	struct tcf_block *block = tp->chain->block;
 	struct tc_cls_bpf_offload cls_bpf = {};
@@ -176,8 +177,8 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 					  skip_sw, &oldprog->gen_flags,
 					  &oldprog->in_hw_count, true);
 
-	if (prog && err) {
-		cls_bpf_offload_cmd(tp, oldprog, prog, extack);
+	if (prog && err && !is_rollback) {
+		cls_bpf_offload_cmd(tp, oldprog, prog, extack, true);
 		return err;
 	}
 
@@ -208,7 +209,7 @@ static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 	if (!prog && !oldprog)
 		return 0;
 
-	return cls_bpf_offload_cmd(tp, prog, oldprog, extack);
+	return cls_bpf_offload_cmd(tp, prog, oldprog, extack, false);
 }
 
 static void cls_bpf_stop_offload(struct tcf_proto *tp,
@@ -217,7 +218,7 @@ static void cls_bpf_stop_offload(struct tcf_proto *tp,
 {
 	int err;
 
-	err = cls_bpf_offload_cmd(tp, NULL, prog, extack);
+	err = cls_bpf_offload_cmd(tp, NULL, prog, extack, false);
 	if (err)
 		pr_err("Stopping hardware offload failed: %d\n", err);
 }
--
2.43.0