* [PATCH net v2] net/sched: cls_fw: fix NULL dereference of "old" filters before change()
@ 2026-04-03 16:04 Davide Caratti
2026-04-03 18:59 ` Jamal Hadi Salim
0 siblings, 1 reply; 4+ messages in thread
From: Davide Caratti @ 2026-04-03 16:04 UTC (permalink / raw)
To: Jamal Hadi Salim, Jiri Pirko, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Xiang Mei, netdev
Cc: Victor Nogueira
Like pointed out by Sashiko [1], TC filters are added to a shared block
before their ->change() function is called. This is a problem for cls_fw:
an invalid filter created with the "old" method can still classify some
packets before it is destroyed by the validation logic added by Xiang.
Therefore, insisting with repeated runs of the following script:
# ip link add dev crash0 type dummy
# ip link set dev crash0 up
# mausezahn crash0 -c 100000 -P 10 \
> -A 4.3.2.1 -B 1.2.3.4 -t udp "dp=1234" -q &
# sleep 1
# tc qdisc add dev crash0 egress_block 1 clsact
# tc filter add block 1 protocol ip prio 1 matchall \
> action skbedit mark 65536 continue
# tc filter add block 1 protocol ip prio 2 fw
# ip link del dev crash0
can still make fw_classify() hit the WARN_ON() in [2]:
WARNING: ./include/net/pkt_cls.h:88 at fw_classify+0x244/0x250 [cls_fw], CPU#18: mausezahn/1399
Modules linked in: cls_fw(E) act_skbedit(E)
CPU: 18 UID: 0 PID: 1399 Comm: mausezahn Tainted: G E 7.0.0-rc6-virtme #17 PREEMPT(full)
Tainted: [E]=UNSIGNED_MODULE
Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
RIP: 0010:fw_classify+0x244/0x250 [cls_fw]
Code: 5c 49 c7 45 00 00 00 00 00 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 5b b8 ff ff ff ff 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 90 <0f> 0b 90 eb a0 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffd1b7026bf8a8 EFLAGS: 00010202
RAX: ffff8c5ac9c60800 RBX: ffff8c5ac99322c0 RCX: 0000000000000004
RDX: 0000000000000001 RSI: ffff8c5b74d7a000 RDI: ffff8c5ac8284f40
RBP: ffffd1b7026bf8d0 R08: 0000000000000000 R09: ffffd1b7026bf9b0
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000010000
R13: ffffd1b7026bf930 R14: ffff8c5ac8284f40 R15: 0000000000000000
FS: 00007fca40c37740(0000) GS:ffff8c5b74d7a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fca40e822a0 CR3: 0000000005ca0001 CR4: 0000000000172ef0
Call Trace:
<TASK>
tcf_classify+0x17d/0x5c0
tc_run+0x9d/0x150
__dev_queue_xmit+0x2ab/0x14d0
ip_finish_output2+0x340/0x8f0
ip_output+0xa4/0x250
raw_sendmsg+0x147d/0x14b0
__sys_sendto+0x1cc/0x1f0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x126/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fca40e822ba
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffc248a42c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000055ef233289d0 RCX: 00007fca40e822ba
RDX: 000000000000001e RSI: 000055ef23328c30 RDI: 0000000000000003
RBP: 000055ef233289d0 R08: 00007ffc248a42d0 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000001e
R13: 00000000000186a0 R14: 0000000000000000 R15: 00007fca41043000
</TASK>
irq event stamp: 1045778
hardirqs last enabled at (1045784): [<ffffffff864ec042>] __up_console_sem+0x52/0x60
hardirqs last disabled at (1045789): [<ffffffff864ec027>] __up_console_sem+0x37/0x60
softirqs last enabled at (1045426): [<ffffffff874d48c7>] __alloc_skb+0x207/0x260
softirqs last disabled at (1045434): [<ffffffff874fe8f8>] __dev_queue_xmit+0x78/0x14d0
Then, because of the value in the packet's mark, dereference on 'q->handle'
with NULL 'q' occurs:
BUG: kernel NULL pointer dereference, address: 0000000000000038
[...]
RIP: 0010:fw_classify+0x1fe/0x250 [cls_fw]
[...]
Skip "old-style" classification on shared blocks, so that the NULL
dereference is fixed and WARN_ON() is not hit anymore in the short
lifetime of invalid cls_fw "old-style" filters.
V2: avoid NULL dereference without hitting WARN_ON() anymore (Sashiko)
[1] https://sashiko.dev/#/patchset/20260331050217.504278-1-xmei5%40asu.edu
[2] https://elixir.bootlin.com/linux/v7.0-rc6/source/include/net/pkt_cls.h#L86
Fixes: faeea8bbf6e9 ("net/sched: cls_fw: fix NULL pointer dereference on shared blocks")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
net/sched/cls_fw.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 23884ef8b80c..646a730dca93 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -74,9 +74,13 @@ TC_INDIRECT_SCOPE int fw_classify(struct sk_buff *skb,
}
}
} else {
- struct Qdisc *q = tcf_block_q(tp->chain->block);
+ struct Qdisc *q;
/* Old method: classify the packet using its skb mark. */
+ if (tcf_block_shared(tp->chain->block))
+ return -1;
+
+ q = tcf_block_q(tp->chain->block);
if (id && (TC_H_MAJ(id) == 0 ||
!(TC_H_MAJ(id ^ q->handle)))) {
res->classid = id;
--
2.52.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH net v2] net/sched: cls_fw: fix NULL dereference of "old" filters before change()
2026-04-03 16:04 [PATCH net v2] net/sched: cls_fw: fix NULL dereference of "old" filters before change() Davide Caratti
@ 2026-04-03 18:59 ` Jamal Hadi Salim
2026-04-08 8:28 ` Davide Caratti
0 siblings, 1 reply; 4+ messages in thread
From: Jamal Hadi Salim @ 2026-04-03 18:59 UTC (permalink / raw)
To: Davide Caratti
Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Xiang Mei, netdev, Victor Nogueira
On Fri, Apr 3, 2026 at 12:04 PM Davide Caratti <dcaratti@redhat.com> wrote:
>
> Like pointed out by Sashiko [1],
Just found out about this ;-> So bye-bye old AI? ;->
I must say, finding this specific bug is impressive.
>TC filters are added to a shared block
> before their ->change() function is called. This is a problem for cls_fw:
> an invalid filter created with the "old" method can still classify some
> packets before it is destroyed by the validation logic added by Xiang.
>
I think the real issue is that the filter is exposed to the datapath
before change() is invoked. i.e as you noted 1. init() was invoked, 2.
then the filter was exposed and 3. The change() was initiated.
So the packet arrived somewhere between step 2 and 3 and boom.
In the past - before commit ed76f5edccc9
(https://lore.kernel.org/netdev/20190204123301.4223-7-vladbu@mellanox.com/)
you couldnt expose the filter to the datapath before the change() was
completed..
Observation on consistency pov:
You will always get a "q" when you invoke tcf_block_q() if there are
no shared blocks attached. But that doesnt mean the "create"
configuration is complete; it is only complete if step 3 above
completes. This is because we are not sure if change() will result in
"old" or "new" lookup setup. So the check you added for q may be
inconsistent from that perspective and fw should have returned -1 like
all classifiers...
I cant think of a simple solution to verify if the config is
"inconsistent" other than to add something that gets checked in the
datapath (and when absent, return -1)
From that perspective, your check is not catastrophic, so it may be ok.
At minimal the Fixes: needs to change to Vlads commit? Good news is
only fw _seems_ to suffer from this challenge...
cheers,
jamal
> Therefore, insisting with repeated runs of the following script:
>
> # ip link add dev crash0 type dummy
> # ip link set dev crash0 up
> # mausezahn crash0 -c 100000 -P 10 \
> > -A 4.3.2.1 -B 1.2.3.4 -t udp "dp=1234" -q &
> # sleep 1
> # tc qdisc add dev crash0 egress_block 1 clsact
> # tc filter add block 1 protocol ip prio 1 matchall \
> > action skbedit mark 65536 continue
> # tc filter add block 1 protocol ip prio 2 fw
> # ip link del dev crash0
>
> can still make fw_classify() hit the WARN_ON() in [2]:
>
> WARNING: ./include/net/pkt_cls.h:88 at fw_classify+0x244/0x250 [cls_fw], CPU#18: mausezahn/1399
> Modules linked in: cls_fw(E) act_skbedit(E)
> CPU: 18 UID: 0 PID: 1399 Comm: mausezahn Tainted: G E 7.0.0-rc6-virtme #17 PREEMPT(full)
> Tainted: [E]=UNSIGNED_MODULE
> Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
> RIP: 0010:fw_classify+0x244/0x250 [cls_fw]
> Code: 5c 49 c7 45 00 00 00 00 00 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 5b b8 ff ff ff ff 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 90 <0f> 0b 90 eb a0 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90
> RSP: 0018:ffffd1b7026bf8a8 EFLAGS: 00010202
> RAX: ffff8c5ac9c60800 RBX: ffff8c5ac99322c0 RCX: 0000000000000004
> RDX: 0000000000000001 RSI: ffff8c5b74d7a000 RDI: ffff8c5ac8284f40
> RBP: ffffd1b7026bf8d0 R08: 0000000000000000 R09: ffffd1b7026bf9b0
> R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000010000
> R13: ffffd1b7026bf930 R14: ffff8c5ac8284f40 R15: 0000000000000000
> FS: 00007fca40c37740(0000) GS:ffff8c5b74d7a000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fca40e822a0 CR3: 0000000005ca0001 CR4: 0000000000172ef0
> Call Trace:
> <TASK>
> tcf_classify+0x17d/0x5c0
> tc_run+0x9d/0x150
> __dev_queue_xmit+0x2ab/0x14d0
> ip_finish_output2+0x340/0x8f0
> ip_output+0xa4/0x250
> raw_sendmsg+0x147d/0x14b0
> __sys_sendto+0x1cc/0x1f0
> __x64_sys_sendto+0x24/0x30
> do_syscall_64+0x126/0xf80
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fca40e822ba
> Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
> RSP: 002b:00007ffc248a42c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 000055ef233289d0 RCX: 00007fca40e822ba
> RDX: 000000000000001e RSI: 000055ef23328c30 RDI: 0000000000000003
> RBP: 000055ef233289d0 R08: 00007ffc248a42d0 R09: 0000000000000010
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000001e
> R13: 00000000000186a0 R14: 0000000000000000 R15: 00007fca41043000
> </TASK>
> irq event stamp: 1045778
> hardirqs last enabled at (1045784): [<ffffffff864ec042>] __up_console_sem+0x52/0x60
> hardirqs last disabled at (1045789): [<ffffffff864ec027>] __up_console_sem+0x37/0x60
> softirqs last enabled at (1045426): [<ffffffff874d48c7>] __alloc_skb+0x207/0x260
> softirqs last disabled at (1045434): [<ffffffff874fe8f8>] __dev_queue_xmit+0x78/0x14d0
>
> Then, because of the value in the packet's mark, dereference on 'q->handle'
> with NULL 'q' occurs:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000038
> [...]
> RIP: 0010:fw_classify+0x1fe/0x250 [cls_fw]
> [...]
>
> Skip "old-style" classification on shared blocks, so that the NULL
> dereference is fixed and WARN_ON() is not hit anymore in the short
> lifetime of invalid cls_fw "old-style" filters.
>
> V2: avoid NULL dereference without hitting WARN_ON() anymore (Sashiko)
>
> [1] https://sashiko.dev/#/patchset/20260331050217.504278-1-xmei5%40asu.edu
> [2] https://elixir.bootlin.com/linux/v7.0-rc6/source/include/net/pkt_cls.h#L86
>
> Fixes: faeea8bbf6e9 ("net/sched: cls_fw: fix NULL pointer dereference on shared blocks")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
> net/sched/cls_fw.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
> index 23884ef8b80c..646a730dca93 100644
> --- a/net/sched/cls_fw.c
> +++ b/net/sched/cls_fw.c
> @@ -74,9 +74,13 @@ TC_INDIRECT_SCOPE int fw_classify(struct sk_buff *skb,
> }
> }
> } else {
> - struct Qdisc *q = tcf_block_q(tp->chain->block);
> + struct Qdisc *q;
>
> /* Old method: classify the packet using its skb mark. */
> + if (tcf_block_shared(tp->chain->block))
> + return -1;
> +
> + q = tcf_block_q(tp->chain->block);
> if (id && (TC_H_MAJ(id) == 0 ||
> !(TC_H_MAJ(id ^ q->handle)))) {
> res->classid = id;
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH net v2] net/sched: cls_fw: fix NULL dereference of "old" filters before change()
2026-04-03 18:59 ` Jamal Hadi Salim
@ 2026-04-08 8:28 ` Davide Caratti
2026-04-08 12:39 ` Jamal Hadi Salim
0 siblings, 1 reply; 4+ messages in thread
From: Davide Caratti @ 2026-04-08 8:28 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Xiang Mei, netdev, Victor Nogueira
On Fri, Apr 03, 2026 at 02:59:22PM -0400, Jamal Hadi Salim wrote:
> On Fri, Apr 3, 2026 at 12:04 PM Davide Caratti <dcaratti@redhat.com> wrote:
> >
> > Like pointed out by Sashiko [1],
>
> Just found out about this ;-> So bye-bye old AI? ;->
> I must say, finding this specific bug is impressive.
[...]
> Observation on consistency pov:
> You will always get a "q" when you invoke tcf_block_q() if there are
> no shared blocks attached. But that doesnt mean the "create"
> configuration is complete; it is only complete if step 3 above
> completes. This is because we are not sure if change() will result in
> "old" or "new" lookup setup. So the check you added for q may be
> inconsistent from that perspective and fw should have returned -1 like
> all classifiers...
"Hello Jamal, you are absolutely right! (emoji with fire) However, ..." :)
> I cant think of a simple solution to verify if the config is
> "inconsistent" other than to add something that gets checked in the
> datapath (and when absent, return -1)
^^ This. Specifically for cls_fw, that would mean converting fw_change() to
allocate some control data also for the "old" uapi, and I think it's too
much effort for the legacy.
IIUC the inconsistent behavior is: for a small amount of time, fwmark
classifier used in the "old" way would classify also when the filter's
'handle' is not zero.
> From that perspective, your check is not catastrophic, so it may be ok.
> At minimal the Fixes: needs to change to Vlads commit? Good news is
> only fw _seems_ to suffer from this challenge...
This is also what I understood by reading the code. Sure, I will edit
the Fixes: tag. Thanks for reading!
--
davide
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net v2] net/sched: cls_fw: fix NULL dereference of "old" filters before change()
2026-04-08 8:28 ` Davide Caratti
@ 2026-04-08 12:39 ` Jamal Hadi Salim
0 siblings, 0 replies; 4+ messages in thread
From: Jamal Hadi Salim @ 2026-04-08 12:39 UTC (permalink / raw)
To: Davide Caratti
Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Xiang Mei, netdev, Victor Nogueira
On Wed, Apr 8, 2026 at 4:28 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> On Fri, Apr 03, 2026 at 02:59:22PM -0400, Jamal Hadi Salim wrote:
> > On Fri, Apr 3, 2026 at 12:04 PM Davide Caratti <dcaratti@redhat.com> wrote:
> > >
> > > Like pointed out by Sashiko [1],
> >
> > Just found out about this ;-> So bye-bye old AI? ;->
> > I must say, finding this specific bug is impressive.
>
> [...]
>
> > Observation on consistency pov:
> > You will always get a "q" when you invoke tcf_block_q() if there are
> > no shared blocks attached. But that doesnt mean the "create"
> > configuration is complete; it is only complete if step 3 above
> > completes. This is because we are not sure if change() will result in
> > "old" or "new" lookup setup. So the check you added for q may be
> > inconsistent from that perspective and fw should have returned -1 like
> > all classifiers...
>
> "Hello Jamal, you are absolutely right! (emoji with fire) However, ..." :)
>
> > I cant think of a simple solution to verify if the config is
> > "inconsistent" other than to add something that gets checked in the
> > datapath (and when absent, return -1)
>
> ^^ This. Specifically for cls_fw, that would mean converting fw_change() to
> allocate some control data also for the "old" uapi, and I think it's too
> much effort for the legacy.
>
> IIUC the inconsistent behavior is: for a small amount of time, fwmark
> classifier used in the "old" way would classify also when the filter's
> 'handle' is not zero.
>
True.
> > From that perspective, your check is not catastrophic, so it may be ok.
> > At minimal the Fixes: needs to change to Vlads commit? Good news is
> > only fw _seems_ to suffer from this challenge...
>
> This is also what I understood by reading the code. Sure, I will edit
> the Fixes: tag. Thanks for reading!
>
If you resubmit please add my acked-by
cheers,
jamal
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-04-08 12:39 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-03 16:04 [PATCH net v2] net/sched: cls_fw: fix NULL dereference of "old" filters before change() Davide Caratti
2026-04-03 18:59 ` Jamal Hadi Salim
2026-04-08 8:28 ` Davide Caratti
2026-04-08 12:39 ` Jamal Hadi Salim
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox