* [PATCH net v2 0/2] net/sched: act_ct: preserve tc_skb_cb across defragmentation
@ 2026-06-13 17:42 Ren Wei
2026-06-13 17:42 ` [PATCH net v2 1/2] " Ren Wei
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Ren Wei @ 2026-06-13 17:42 UTC
To: netdev, linux-kselftest, linux-kernel
Cc: jhs, jiri, kuba, paulb, victor, yuantan098, yifanwucs,
tomapufckgml, bird, xizh2024, n05ec
From: Zihan Xi <xizh2024@lzu.edu.cn>
Hi Linux kernel maintainers,
We found and validated an issue in net/sched/act_ct.c. The bug is
reachable when configuring TC with act_ct on a netdev (requires
CAP_NET_ADMIN). We have tested the fix and do not expect it to affect
other functionality.
We provide bug details, a PoC, and a crash log below.
v2 adds a tc-testing (TDC) selftest case in patch 2, per maintainer
feedback.
---- details below ----
Bug details:
tcf_ct_handle_fragments() calls nf_ct_handle_fragments() without
saving and restoring skb->cb. The defrag helper clears IPCB/IP6CB,
which aliases the tc_skb_cb/qdisc_skb_cb control buffer in
include/net/sch_generic.h. Fragmented traffic through act_ct
therefore loses qdisc metadata such as pkt_segs.
Later qdisc dequeue paths call qdisc_bstats_update() ->
qdisc_pkt_segs(). For a non-GSO skb, a clobbered pkt_segs of 0 trips
DEBUG_NET_WARN_ON_ONCE() in qdisc_pkt_segs(). With panic_on_warn=1
the kernel then panics.
Unlike ovs_ct_handle_fragments() in net/openvswitch/conntrack.c, the
act_ct caller only restored mru after defrag, not the full control
buffer. The attached patch saves and restores struct tc_skb_cb around
nf_ct_handle_fragments(), matching the OVS pattern.
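To make the aliasing concrete, here is a small userspace model in Python.
It is only a sketch: the field layout (pkt_len, pkt_segs, mru packed at
offset 0), the MRU value and all function names are hypothetical and do not
reflect the real struct tc_skb_cb layout. It shows why writing back only mru
leaves pkt_segs at 0, and why a full snapshot and restore avoids that.

```python
import struct

CB_SIZE = 48                    # skb->cb is a 48-byte scratch area
VIEW = struct.Struct("<IHH")    # hypothetical tc view: pkt_len, pkt_segs, mru
ARBITRARY_MRU = 1500            # made-up value reported by the defrag stub


def defrag_stub(cb):
    """Stand-in for nf_ct_handle_fragments(): clears cb, returns an MRU."""
    cb[:] = bytes(CB_SIZE)
    return ARBITRARY_MRU


def handle_fragments_old(cb):
    """Pre-patch behaviour: only mru is written back after defrag."""
    mru = defrag_stub(cb)
    pkt_len, pkt_segs, _ = VIEW.unpack_from(cb)
    VIEW.pack_into(cb, 0, pkt_len, pkt_segs, mru)


def handle_fragments_new(cb):
    """Patched behaviour: snapshot, defrag, set mru in the snapshot, restore."""
    saved = bytearray(cb)
    mru = defrag_stub(cb)
    pkt_len, pkt_segs, _ = VIEW.unpack_from(saved)
    VIEW.pack_into(saved, 0, pkt_len, pkt_segs, mru)
    cb[:] = saved


def pkt_segs_ok(cb, gso_segs=None):
    """Model of the check described above: a non-GSO packet expects 1."""
    _, pkt_segs, _ = VIEW.unpack_from(cb)
    return pkt_segs == (1 if gso_segs is None else gso_segs)


def fresh_cb():
    """Control buffer as the qdisc layer might have filled it (made-up values)."""
    cb = bytearray(CB_SIZE)
    VIEW.pack_into(cb, 0, 4028, 1, 0)
    return cb


if __name__ == "__main__":
    for handler in (handle_fragments_old, handle_fragments_new):
        cb = fresh_cb()
        handler(cb)
        print(handler.__name__, VIEW.unpack_from(cb), pkt_segs_ok(cb))
```

In this model the old handler ends with pkt_segs == 0, which is the condition
the warning checks for, while the new handler keeps pkt_len and pkt_segs and
still records the MRU.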
Reproducer:
Run as root in the guest (QEMU bullseye image, eth0):
chmod +x ./poc.sh
./poc.sh eth0 10.0.2.2 100
The script installs a root prio qdisc, clsact egress with "action ct",
then sends oversized UDP datagrams with PMTUD disabled to force IPv4
fragmentation through the act_ct defrag path.
We ran the PoC in a 2 vCPU, 2 GB RAM x86 QEMU environment.
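As a rough check of why a 4000-byte UDP payload forces fragmentation, the
sketch below splits the datagram into IPv4 fragment payload sizes. It assumes
a 1500-byte MTU and a 20-byte IPv4 header without options; neither value is
stated in this thread, and the helper name is hypothetical.

```python
UDP_HEADER_LEN = 8


def ipv4_fragment_payloads(l4_len, mtu, ip_header_len=20):
    """Split l4_len bytes of IP payload into per-fragment payload sizes."""
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the fragment offset field counts in 8-byte units.
    per_frag = (mtu - ip_header_len) // 8 * 8
    sizes = []
    remaining = l4_len
    while remaining > per_frag:
        sizes.append(per_frag)
        remaining -= per_frag
    sizes.append(remaining)
    return sizes


if __name__ == "__main__":
    # 4000 bytes of payload plus the UDP header, assumed MTU of 1500.
    print(ipv4_fragment_payloads(4000 + UDP_HEADER_LEN, mtu=1500))
```

Under those assumptions each datagram leaves as three fragments, so every
sendto() in the PoC exercises the defrag path.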
------BEGIN poc.sh------
#!/bin/sh
set -eu
IFACE="${1:-eth0}"
DST="${2:-10.0.2.2}"
COUNT="${3:-100}"
sysctl -w kernel.panic_on_warn=1 >/dev/null
tc qdisc del dev "$IFACE" clsact 2>/dev/null || true
tc qdisc del dev "$IFACE" root 2>/dev/null || true
tc qdisc add dev "$IFACE" root handle 1: prio
tc qdisc add dev "$IFACE" clsact
tc filter add dev "$IFACE" egress protocol ip pref 1 u32 \
    match u32 0 0 action ct zone 1 pipe
python3 - "$DST" "$COUNT" <<'PY'
import socket
import sys
import time
dst = sys.argv[1]
count = int(sys.argv[2])
IP_MTU_DISCOVER = 10
IP_PMTUDISC_DONT = 0
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DONT)
payload = b"A" * 4000
for _ in range(count):
    s.sendto(payload, (dst, 9))
    time.sleep(0.01)
PY
------END poc.sh------
----BEGIN crash log----
[ 549.900801][T10210] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 549.901406][T10210] CPU: 2 UID: 0 PID: 10210 Comm: python3 Not tainted 7.1.0-rc1 #2 PREEMPT(full)
[ 549.902720][T10210] Call Trace:
[ 549.903756][T10210] ? qdisc_dequeue_head+0x287/0x370
[ 549.904713][T10210] check_panic_on_warn+0x61/0x80
[ 549.905053][T10210] __warn+0xe8/0x330
[ 549.905345][T10210] ? qdisc_dequeue_head+0x287/0x370
[ 549.909442][T10210] RIP: 0010:qdisc_dequeue_head+0x287/0x370
[ 549.914217][T10210] prio_dequeue+0x40c/0x6a0
[ 549.914539][T10210] __qdisc_run+0x170/0x1b30
[ 549.915561][T10210] __dev_queue_xmit+0x25e6/0x3ac0
[ 549.920352][T10210] ip_do_fragment+0x1188/0x19a0
[ 549.924214][T10210] udp_send_skb+0x885/0x1270
[ 549.924556][T10210] udp_sendmsg+0x13f3/0x20a0
-----END crash log-----
Best regards,
Zihan Xi
Zihan Xi (2):
net/sched: act_ct: preserve tc_skb_cb across defragmentation
selftests/tc-testing: act_ct: add TDC test for skb cb preservation
across defrag
net/sched/act_ct.c | 7 ++--
.../tc-testing/tc-tests/actions/ct.json | 38 +++++++++++++++++++
2 files changed, 42 insertions(+), 3 deletions(-)
--
2.43.0
* [PATCH net v2 1/2] net/sched: act_ct: preserve tc_skb_cb across defragmentation
2026-06-13 17:42 [PATCH net v2 0/2] net/sched: act_ct: preserve tc_skb_cb across defragmentation Ren Wei
@ 2026-06-13 17:42 ` Ren Wei
2026-06-13 17:42 ` [PATCH net v2 2/2] selftests/tc-testing: act_ct: add TDC test for skb cb preservation across defrag Ren Wei
2026-06-19 0:50 ` [PATCH net v2 0/2] net/sched: act_ct: preserve tc_skb_cb across defragmentation patchwork-bot+netdevbpf
2 siblings, 0 replies; 4+ messages in thread
From: Ren Wei @ 2026-06-13 17:42 UTC
To: netdev
Cc: jhs, jiri, kuba, paulb, victor, yuantan098, yifanwucs,
tomapufckgml, bird, xizh2024, n05ec
From: Zihan Xi <xizh2024@lzu.edu.cn>
tcf_ct_handle_fragments() calls nf_ct_handle_fragments() without saving
and restoring skb->cb. The defrag helper clears IPCB/IP6CB, which aliases
the tc_skb_cb/qdisc_skb_cb control buffer. Fragmented traffic through
act_ct therefore loses qdisc metadata such as pkt_segs and can trigger
DEBUG_NET_WARN_ON_ONCE() in qdisc_pkt_segs(), which panics the kernel when
panic_on_warn is enabled.
Save and restore the full tc_skb_cb around nf_ct_handle_fragments(),
matching the pattern used by ovs_ct_handle_fragments().
Fixes: ec624fe740b4 ("net/sched: Extend qdisc control block with tc control block")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Assisted-by: Codex:gpt-5.4
Signed-off-by: Zihan Xi <xizh2024@lzu.edu.cn>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
changes in v2:
- Add TDC selftest in patch 2 per maintainer feedback
- v1 Link: https://lore.kernel.org/all/20260611154939.2615919-1-n05ec@lzu.edu.cn/
net/sched/act_ct.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 6158e13c98d3..ebd40daf05a6 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -845,10 +845,10 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct;
+	struct tc_skb_cb cb;
 	int err = 0;
 	bool frag;
 	u8 proto;
-	u16 mru;
 
 	/* Previously seen (loopback)? Ignore. */
 	ct = nf_ct_get(skb, &ctinfo);
@@ -862,12 +862,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
 	if (err || !frag)
 		return err;
 
-	err = nf_ct_handle_fragments(net, skb, zone, family, &proto, &mru);
+	cb = *tc_skb_cb(skb);
+	err = nf_ct_handle_fragments(net, skb, zone, family, &proto, &cb.mru);
 	if (err)
 		return err;
 
 	*defrag = true;
-	tc_skb_cb(skb)->mru = mru;
+	*tc_skb_cb(skb) = cb;
 
 	return 0;
 }
--
2.43.0
* [PATCH net v2 2/2] selftests/tc-testing: act_ct: add TDC test for skb cb preservation across defrag
From: Ren Wei @ 2026-06-13 17:42 UTC
To: netdev, linux-kselftest, linux-kernel
Cc: jhs, jiri, kuba, paulb, victor, yuantan098, yifanwucs,
tomapufckgml, bird, xizh2024, n05ec
From: Zihan Xi <xizh2024@lzu.edu.cn>
Add a tc-testing case that sends IPv4 fragments through act_ct on clsact
egress while a root prio qdisc is present on the transmit path.
The test verifies that packet processing and qdisc accounting continue
to work after conntrack defragmentation, covering tc_skb_cb preservation
across defragmentation.
Signed-off-by: Zihan Xi <xizh2024@lzu.edu.cn>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
changes in v2:
- Add tc-testing case 9c2a for skb cb preservation across defrag
- v1 Link: https://lore.kernel.org/all/20260611154939.2615919-1-n05ec@lzu.edu.cn/
.../tc-testing/tc-tests/actions/ct.json | 38 +++++++++++++++++++
1 file changed, 38 insertions(+)
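For readers unfamiliar with the TDC verify step, the sketch below shows how
the matchPattern used by this test behaves on two sample outputs. The sample
text, the byte and packet counts, and the helper name are hypothetical; the
counting via re.findall() only approximates how the test runner compares
matches against matchCount.

```python
import re

# Pattern copied from the test case: non-zero byte and packet counters.
MATCH_PATTERN = r"Sent [1-9][0-9]* bytes [1-9][0-9]* pkt"

# Hypothetical `tc -s qdisc show` output after traffic passed the qdisc.
HEALTHY = (
    "qdisc prio 1: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1\n"
    " Sent 4242 bytes 3 pkt (dropped 0, overlimits 0 requeues 0)\n"
)

# Hypothetical output when nothing was accounted on the qdisc.
IDLE = (
    "qdisc prio 1: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1\n"
    " Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)\n"
)


def count_matches(output):
    """Number of times the pattern occurs in the command output."""
    return len(re.findall(MATCH_PATTERN, output, re.DOTALL | re.MULTILINE))


if __name__ == "__main__":
    print(count_matches(HEALTHY), count_matches(IDLE))
```

With matchCount set to 1, the test passes only if the prio qdisc reports a
non-zero "Sent" line, which requires that the defragmented packet was
dequeued and accounted.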
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json b/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json
index 33bb8f3ff8ed..da65f838bd52 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json
@@ -664,5 +664,43 @@
         "teardown": [
             "$TC qdisc del dev $DEV1 ingress_block 21 clsact"
         ]
+    },
+    {
+        "id": "9c2a",
+        "name": "Act_ct preserves skb cb across defrag before prio dequeue",
+        "category": [
+            "actions",
+            "ct",
+            "scapy"
+        ],
+        "plugins": {
+            "requires": [
+                "nsPlugin",
+                "scapyPlugin"
+            ]
+        },
+        "setup": [
+            "$TC qdisc add dev $DUMMY root handle 1: prio",
+            "$TC qdisc add dev $DUMMY clsact",
+            "$TC qdisc add dev $DEV1 clsact",
+            "$TC filter add dev $DEV1 ingress protocol ip prio 1 matchall action mirred egress redirect dev $DUMMY"
+        ],
+        "cmdUnderTest": "$TC filter add dev $DUMMY egress protocol ip prio 1 matchall action ct zone 1 pipe",
+        "scapy": [
+            {
+                "iface": "$DEV0",
+                "count": 1,
+                "packet": "[Ether()/frag for frag in fragment(IP(src='10.0.0.10', dst='10.0.0.1', id=1)/UDP(sport=12345, dport=9)/Raw(b'A' * 4000), fragsize=1400)]"
+            }
+        ],
+        "expExitCode": "0",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY | grep -A 1 '^qdisc prio 1:'",
+        "matchPattern": "Sent [1-9][0-9]* bytes [1-9][0-9]* pkt",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 clsact",
+            "$TC qdisc del dev $DUMMY clsact",
+            "$TC qdisc del dev $DUMMY root handle 1:"
+        ]
     }
 ]
--
2.43.0
end of thread, other threads:[~2026-06-19 0:50 UTC | newest]
Thread overview: 4+ messages
2026-06-13 17:42 [PATCH net v2 0/2] net/sched: act_ct: preserve tc_skb_cb across defragmentation Ren Wei
2026-06-13 17:42 ` [PATCH net v2 1/2] " Ren Wei
2026-06-13 17:42 ` [PATCH net v2 2/2] selftests/tc-testing: act_ct: add TDC test for skb cb preservation across defrag Ren Wei
2026-06-19 0:50 ` [PATCH net v2 0/2] net/sched: act_ct: preserve tc_skb_cb across defragmentation patchwork-bot+netdevbpf