From: Jamal Hadi Salim <jhs@mojatatu.com>
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, toke@toke.dk,
jiri@resnulli.us, bigeasy@linutronix.de, clrkwllms@kernel.org,
rostedt@goodmis.org, kuniyu@google.com, sdf.kernel@gmail.com,
skhawaja@google.com, liuhangbin@gmail.com, krikku@gmail.com,
mkarsten@uwaterloo.ca, victor@mojatatu.com, ast@kernel.org,
hawk@kernel.org, john.fastabend@gmail.com, daniel@iogearbox.net,
Jamal Hadi Salim <jhs@mojatatu.com>,
Sashiko <sashiko-bot@kernel.org>
Subject: [PATCH net 1/3] net: Extend bpf_net_context lifetime to cover qdisc enqueue
Date: Fri, 26 Jun 2026 12:51:54 -0400
Message-ID: <20260626165156.169012-2-jhs@mojatatu.com>
In-Reply-To: <20260626165156.169012-1-jhs@mojatatu.com>
The bpf_net_context used by sch_handle_egress() is stack-allocated and is
torn down before that function returns. By the time tcf_qevent_handle()
runs, current->bpf_net_context is NULL.
When a filter attached to a qevent block (e.g. RED's early_drop or mark
qevents, which always use shared blocks) returns TC_ACT_REDIRECT,
tcf_qevent_handle() calls skb_do_redirect(), which in turn calls the BPF
helper bpf_net_ctx_get_ri(). That helper unconditionally dereferences
current->bpf_net_context, resulting in a NULL pointer dereference.
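To make the lifetime problem concrete, here is a minimal user-space model in C. It is a sketch, not kernel code: the structs are simplified stand-ins, `current` is an ordinary global pointer, and every model_* function is a hypothetical name invented for illustration. model_get_ri() mirrors the described behaviour of bpf_net_ctx_get_ri() (no NULL check), and the pre-fix ordering (egress hook first, qdisc enqueue and qevent afterwards) shows that the context is already gone when the qevent path would need it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct bpf_redirect_info {
	unsigned int flags;
};

struct bpf_net_context {
	struct bpf_redirect_info ri;
};

struct task {
	struct bpf_net_context *bpf_net_context;
};

static struct task task0;
static struct task *current = &task0; /* plays the role of the kernel's `current` */

/* Like bpf_net_ctx_get_ri() as described above: no NULL check. */
static struct bpf_redirect_info *model_get_ri(void)
{
	return &current->bpf_net_context->ri;
}

/* Pre-fix sch_handle_egress(): the context lives on this stack frame
 * and is detached again before the function returns. */
static void model_sch_handle_egress_old(void)
{
	struct bpf_net_context ctx = { { 0 } };

	current->bpf_net_context = &ctx;
	model_get_ri()->flags = 0; /* safe: tcx/clsact programs run here */
	current->bpf_net_context = NULL;
}

/* Later, RED's enqueue reaches the qevent block. Rather than calling
 * model_get_ri() and crashing, report whether that call would be safe. */
static bool model_qevent_redirect_is_safe(void)
{
	return current->bpf_net_context != NULL;
}

/* Pre-fix ordering: egress hook first, qdisc enqueue (and qevent) after. */
static bool model_old_xmit_path(void)
{
	model_sch_handle_egress_old();
	return model_qevent_redirect_is_safe();
}
```

In this model model_old_xmit_path() returns false: by the time the qevent path runs, the pointer has already been reset to NULL, which is exactly the state in which the real helper would fault.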
Note: The same holds for actions that invoke BPF redirect helpers
(e.g. act_bpf running a program that calls bpf_redirect()) during qevent
classification itself. In fact, code outside of tc makes the same
assumption that current->bpf_net_context is valid whenever these
helpers run.
Fix:
Move the bpf_net_context lifecycle out of sch_handle_egress() into
__dev_queue_xmit(), so that it spans both the egress TC fast path and the
qdisc enqueue. The setup is placed outside the egress_needed_key static
branch because qevents are independent of the clsact/NF egress hooks, and
that key may stay disabled when only a qevent-bearing qdisc is
configured. Unfortunately, this adds a small unconditional per-packet
cost to the transmit path, guarded only by CONFIG_NET_XGRESS (two writes
and one read for bpf_net_ctx_set(), plus one write for
bpf_net_ctx_clear()).
This keeps all bpf_net_context management in net/core/dev.c, i.e. at the
existing boundary between tc core and BPF, without requiring any
net/sched/ code to know about BPF plumbing.
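The new lifetime can be sketched with another user-space model. This assumes that bpf_net_ctx_set() and bpf_net_ctx_clear() in include/linux/filter.h behave as recalled here (set returns NULL if a context is already installed, and clear only resets the pointer when handed a non-NULL owner); that assumption is also what makes a nested transmit safe. All model_* names are hypothetical, and the two exits stand in for the drop: and out: labels in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct bpf_redirect_info {
	unsigned int kern_flags;
};

struct bpf_net_context {
	struct bpf_redirect_info ri;
};

struct task {
	struct bpf_net_context *bpf_net_context;
};

static struct task task0;
static struct task *current = &task0; /* plays the role of `current` */

/* Assumed to mirror bpf_net_ctx_set(): one read (is a context already
 * installed?) and two writes (reset ri.kern_flags, install pointer).
 * A nested caller gets NULL back, so its clear becomes a no-op. */
static struct bpf_net_context *model_ctx_set(struct bpf_net_context *ctx)
{
	if (current->bpf_net_context != NULL)
		return NULL;
	ctx->ri.kern_flags = 0;
	current->bpf_net_context = ctx;
	return ctx;
}

/* Assumed to mirror bpf_net_ctx_clear(): one write, done only by the owner. */
static void model_ctx_clear(struct bpf_net_context *ctx)
{
	if (ctx)
		current->bpf_net_context = NULL;
}

static bool enqueue_saw_ctx; /* what the qdisc enqueue observed */

/* Stands in for qdisc enqueue -> tcf_qevent_handle() -> skb_do_redirect(). */
static void model_qdisc_enqueue(void)
{
	enqueue_saw_ctx = current->bpf_net_context != NULL;
}

/* Shape of the patched __dev_queue_xmit(): set before the egress static
 * branch, clear on both the drop: and out: exits. */
static int model_dev_queue_xmit(bool egress_needed_key, bool drop_early)
{
	struct bpf_net_context stack_ctx, *ctx;

	ctx = model_ctx_set(&stack_ctx);
	if (egress_needed_key) {
		/* sch_handle_egress() would run here; it no longer
		 * sets or clears the context itself. */
	}
	if (drop_early)
		goto drop;

	model_qdisc_enqueue();
	model_ctx_clear(ctx); /* out: */
	return 0;
drop:
	model_ctx_clear(ctx);
	return -1;
}

/* A transmit nested inside an outer one (e.g. a redirect that transmits
 * on another device) must leave the outer context installed. */
static bool model_nested_xmit_keeps_outer(void)
{
	struct bpf_net_context outer;
	bool kept;

	current->bpf_net_context = &outer;
	model_dev_queue_xmit(false, false);
	kept = current->bpf_net_context == &outer;
	current->bpf_net_context = NULL;
	return kept;
}
```

Calling model_dev_queue_xmit(false, false) models the qevent-only configuration: the egress key is off, yet the enqueue still sees a valid context, and the pointer is NULL again once the call returns on either exit.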
Reproducer (see the accompanying tdc test):
tc qdisc add dev eth0 root handle 1: red limit 1MB min 10KB max 20KB \
avpkt 1000 burst 100 qevent early_drop block 10
tc qdisc add dev eth0 clsact
tc filter add block 10 pref 1 bpf obj redirect.o
tc filter add dev eth0 egress protocol ip prio 1 matchall \
action gact pass
Sending traffic through eth0 triggers red_enqueue() -> tcf_qevent_handle()
and, on a redirect verdict, a NULL dereference in skb_do_redirect().
Fixes: 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260620130749.226642-1-jhs%40mojatatu.com
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
net/core/dev.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 4b3d5cfdf6e0..8c214bfff8aa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4527,14 +4527,11 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
{
struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_EGRESS;
- struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
int sch_ret;
if (!entry)
return skb;
- bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
-
/* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was
* already set by the caller.
*/
@@ -4550,12 +4547,10 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
/* No need to push/pop skb's mac_header here on egress! */
skb_do_redirect(skb);
*ret = NET_XMIT_SUCCESS;
- bpf_net_ctx_clear(bpf_net_ctx);
return NULL;
case TC_ACT_SHOT:
kfree_skb_reason(skb, drop_reason);
*ret = NET_XMIT_DROP;
- bpf_net_ctx_clear(bpf_net_ctx);
return NULL;
/* used by tc_run */
case TC_ACT_STOLEN:
@@ -4565,10 +4560,8 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
fallthrough;
case TC_ACT_CONSUMED:
*ret = NET_XMIT_SUCCESS;
- bpf_net_ctx_clear(bpf_net_ctx);
return NULL;
}
- bpf_net_ctx_clear(bpf_net_ctx);
return skb;
}
@@ -4767,6 +4760,9 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
*/
int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
{
+#ifdef CONFIG_NET_XGRESS
+ struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx = NULL;
+#endif
struct net_device *dev = skb->dev;
struct netdev_queue *txq = NULL;
enum skb_drop_reason reason;
@@ -4795,6 +4791,9 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
skb_update_prio(skb);
tcx_set_ingress(skb, false);
+#ifdef CONFIG_NET_XGRESS
+ bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
+#endif
#ifdef CONFIG_NET_EGRESS
if (static_branch_unlikely(&egress_needed_key)) {
if (nf_hook_egress_active()) {
@@ -4898,12 +4897,18 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
reason = SKB_DROP_REASON_RECURSION_LIMIT;
drop:
+#ifdef CONFIG_NET_XGRESS
+ bpf_net_ctx_clear(bpf_net_ctx);
+#endif
rcu_read_unlock_bh();
dev_core_stats_tx_dropped_inc(dev);
kfree_skb_list_reason(skb, reason);
return rc;
out:
+#ifdef CONFIG_NET_XGRESS
+ bpf_net_ctx_clear(bpf_net_ctx);
+#endif
rcu_read_unlock_bh();
return rc;
}
--
2.54.0
Thread overview: 4+ messages
2026-06-26 16:51 [PATCH net 0/3] Fix broken TC_ACT_REDIRECT Jamal Hadi Salim
2026-06-26 16:51 ` Jamal Hadi Salim [this message]
2026-06-26 16:51 ` [PATCH net 2/3] net/sched: Handle TC_ACT_REDIRECT from qdisc filter chains Jamal Hadi Salim
2026-06-26 16:51 ` [PATCH net 3/3] selftests/tc-testing: Verify bpf redirect on RED block with preceding clsact (egress) classifier Jamal Hadi Salim