* [PATCH net 0/2] netfilter: br_netfilter: fix panic with metadata_dst skb
@ 2024-10-01 15:43 Andy Roulin
2024-10-01 15:43 ` [PATCH net 1/2] " Andy Roulin
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Andy Roulin @ 2024-10-01 15:43 UTC
To: netdev
Cc: pablo, kadlec, roopa, razor, davem, edumazet, kuba, pabeni, shuah,
idosch, petrm, Andy Roulin
There's a kernel panic possible in the br_netfilter module when sending
untagged traffic via a VxLAN device. Traceback is included below.
This happens during the check for fragmentation in br_nf_dev_queue_xmit
if the MTU on the VxLAN device is not big enough.
It is dependent on:
1) the br_netfilter module being loaded;
2) net.bridge.bridge-nf-call-iptables set to 1;
3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
This case was never supported in the first place, so the first patch drops
such packets.
A regression selftest is added as part of the second patch.
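For a sense of scale, the size reported by ping in the trace below can be checked against the VxLAN MTU. The following is a minimal sketch, assuming an IPv4 header without options (20 bytes) and the 8-byte ICMP echo header. The function names are illustrative only and do not exist in the kernel; the MTU of 1000 used in the note is the value the selftest in patch 2 configures on the VxLAN device.

```c
#include <stdbool.h>

/* Header sizes assumed for this sketch: IPv4 without options, ICMP echo. */
enum { IPV4_HDR_LEN = 20, ICMP_ECHO_HDR_LEN = 8 };

/* Total IPv4 datagram length for an ICMP echo with the given payload size. */
unsigned int icmp_echo_ip_len(unsigned int payload)
{
	return payload + ICMP_ECHO_HDR_LEN + IPV4_HDR_LEN;
}

/* True when a datagram of ip_len bytes does not fit in the given MTU. */
bool exceeds_mtu(unsigned int ip_len, unsigned int mtu)
{
	return ip_len > mtu;
}
```

With a payload of 2000 bytes (ping -s 2000), icmp_echo_ip_len() yields 2028, which matches the "2000(2028)" in the ping output and exceeds an MTU of 1000.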
PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
[ 176.291791] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000110
[ 176.292101] Mem abort info:
[ 176.292184] ESR = 0x0000000096000004
[ 176.292322] EC = 0x25: DABT (current EL), IL = 32 bits
[ 176.292530] SET = 0, FnV = 0
[ 176.292709] EA = 0, S1PTW = 0
[ 176.292862] FSC = 0x04: level 0 translation fault
[ 176.293013] Data abort info:
[ 176.293104] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 176.293488] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 176.293787] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
[ 176.294166] [0000000000000110] pgd=0000000000000000,
p4d=0000000000000000
[ 176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
br_netfilter bridge stp llc ipv6 crct10dif_ce
[ 176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
6.8.0-rc3-g5b3fbd61b9d1 #2
[ 176.296314] Hardware name: linux,dummy-virt (DT)
[ 176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[ 176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[ 176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
[ 176.297636] sp : ffff800080003630
[ 176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
ffff6828c49ad9f8
[ 176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
00000000000003e8
[ 176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
ffff6828c3b16d28
[ 176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
0000000000000014
[ 176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
0000000095744632
[ 176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
ffffb7e137926a70
[ 176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
0000000000000000
[ 176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
f20e0100bebafeca
[ 176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
0000000000000000
[ 176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
ffff6828c7f918f0
[ 176.300889] Call trace:
[ 176.301123] br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[ 176.301411] br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
[ 176.301703] nf_hook_slow+0x48/0x124
[ 176.302060] br_forward_finish+0xc8/0xe8 [bridge]
[ 176.302371] br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[ 176.302605] br_nf_forward_finish+0x118/0x22c [br_netfilter]
[ 176.302824] br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
[ 176.303136] br_nf_forward+0x2b8/0x4e0 [br_netfilter]
[ 176.303359] nf_hook_slow+0x48/0x124
[ 176.303803] __br_forward+0xc4/0x194 [bridge]
[ 176.304013] br_flood+0xd4/0x168 [bridge]
[ 176.304300] br_handle_frame_finish+0x1d4/0x5c4 [bridge]
[ 176.304536] br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[ 176.304978] br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
[ 176.305188] br_nf_pre_routing+0x250/0x524 [br_netfilter]
[ 176.305428] br_handle_frame+0x244/0x3cc [bridge]
[ 176.305695] __netif_receive_skb_core.constprop.0+0x33c/0xecc
[ 176.306080] __netif_receive_skb_one_core+0x40/0x8c
[ 176.306197] __netif_receive_skb+0x18/0x64
[ 176.306369] process_backlog+0x80/0x124
[ 176.306540] __napi_poll+0x38/0x17c
[ 176.306636] net_rx_action+0x124/0x26c
[ 176.306758] __do_softirq+0x100/0x26c
[ 176.307051] ____do_softirq+0x10/0x1c
[ 176.307162] call_on_irq_stack+0x24/0x4c
[ 176.307289] do_softirq_own_stack+0x1c/0x2c
[ 176.307396] do_softirq+0x54/0x6c
[ 176.307485] __local_bh_enable_ip+0x8c/0x98
[ 176.307637] __dev_queue_xmit+0x22c/0xd28
[ 176.307775] neigh_resolve_output+0xf4/0x1a0
[ 176.308018] ip_finish_output2+0x1c8/0x628
[ 176.308137] ip_do_fragment+0x5b4/0x658
[ 176.308279] ip_fragment.constprop.0+0x48/0xec
[ 176.308420] __ip_finish_output+0xa4/0x254
[ 176.308593] ip_finish_output+0x34/0x130
[ 176.308814] ip_output+0x6c/0x108
[ 176.308929] ip_send_skb+0x50/0xf0
[ 176.309095] ip_push_pending_frames+0x30/0x54
[ 176.309254] raw_sendmsg+0x758/0xaec
[ 176.309568] inet_sendmsg+0x44/0x70
[ 176.309667] __sys_sendto+0x110/0x178
[ 176.309758] __arm64_sys_sendto+0x28/0x38
[ 176.309918] invoke_syscall+0x48/0x110
[ 176.310211] el0_svc_common.constprop.0+0x40/0xe0
[ 176.310353] do_el0_svc+0x1c/0x28
[ 176.310434] el0_svc+0x34/0xb4
[ 176.310551] el0t_64_sync_handler+0x120/0x12c
[ 176.310690] el0t_64_sync+0x190/0x194
[ 176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
[ 176.315743] ---[ end trace 0000000000000000 ]---
[ 176.316060] Kernel panic - not syncing: Oops: Fatal exception in
interrupt
[ 176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
[ 176.316564] PHYS_OFFSET: 0xffff97d780000000
[ 176.316782] CPU features: 0x0,88000203,3c020000,0100421b
[ 176.317210] Memory Limit: none
[ 176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
Exception in interrupt ]---\
Andy Roulin (2):
netfilter: br_netfilter: fix panic with metadata_dst skb
selftests: add regression test for br_netfilter panic
net/bridge/br_netfilter_hooks.c | 5 +
.../testing/selftests/net/netfilter/Makefile | 1 +
tools/testing/selftests/net/netfilter/config | 2 +
.../selftests/net/netfilter/vxlan_mtu_frag.sh | 121 ++++++++++++++++++
4 files changed, 129 insertions(+)
create mode 100755 tools/testing/selftests/net/netfilter/vxlan_mtu_frag.sh
--
2.39.2
* [PATCH net 1/2] netfilter: br_netfilter: fix panic with metadata_dst skb
From: Andy Roulin @ 2024-10-01 15:43 UTC
To: netdev
Cc: pablo, kadlec, roopa, razor, davem, edumazet, kuba, pabeni, shuah,
idosch, petrm, Andy Roulin
Fix a kernel panic in the br_netfilter module when sending untagged
traffic via a VxLAN device.
This happens during the check for fragmentation in br_nf_dev_queue_xmit.
It is dependent on:
1) the br_netfilter module being loaded;
2) net.bridge.bridge-nf-call-iptables set to 1;
3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
When forwarding the untagged packet to the VxLAN bridge port, before
the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.
Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
for frames that need to be fragmented: frames larger than the MTU of the
VxLAN device end up calling br_nf_ip_fragment, which in turn calls
ip_skb_dst_mtu.
ip_skb_dst_mtu then tries to use skb_dst(skb) as if it were a valid dst
with a valid dst->dev, hence the crash.
This case was never supported in the first place, so drop the packet
instead.
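The decision described above can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: the model_ structs, the flag value and the verdict enum are hypothetical stand-ins, not the kernel's definitions, and the size check is reduced to a single comparison. Only the shape of the validity test (a dst is valid when it is non-NULL and not flagged as metadata) is meant to mirror skb_valid_dst.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
#define MODEL_DST_METADATA 0x1u

struct model_dev {
	unsigned int mtu;
};

struct model_dst {
	unsigned int flags;
	struct model_dev *dev;	/* NULL for a metadata (tunnel) dst */
};

struct model_skb {
	unsigned int len;
	struct model_dst *dst;
};

enum model_verdict { MODEL_XMIT, MODEL_FRAGMENT, MODEL_DROP };

/* Same shape as skb_valid_dst(): non-NULL and not a metadata dst. */
bool model_valid_dst(const struct model_skb *skb)
{
	return skb->dst && !(skb->dst->flags & MODEL_DST_METADATA);
}

/*
 * Simplified decision for an outgoing frame: send it as-is if it fits,
 * drop it if it would need fragmenting but carries a metadata dst
 * (so dst->dev is never dereferenced), otherwise fragment it.
 */
enum model_verdict model_queue_xmit(const struct model_skb *skb,
				    unsigned int out_mtu)
{
	if (skb->len <= out_mtu)
		return MODEL_XMIT;
	if (!model_valid_dst(skb))
		return MODEL_DROP;
	return MODEL_FRAGMENT;
}
```

In this model an oversized frame with a metadata dst yields MODEL_DROP before anything looks at dst->dev, which is the ordering the fix relies on.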
Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
---
net/bridge/br_netfilter_hooks.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 0e8bc0ea6175..1d458e9da660 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -33,6 +33,7 @@
#include <net/ip.h>
#include <net/ipv6.h>
#include <net/addrconf.h>
+#include <net/dst_metadata.h>
#include <net/route.h>
#include <net/netfilter/br_netfilter.h>
#include <net/netns/generic.h>
@@ -879,6 +880,10 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
return br_dev_queue_push_xmit(net, sk, skb);
}
+ /* Fragmentation on metadata/template dst is not supported */
+ if (unlikely(!skb_valid_dst(skb)))
+ goto drop;
+
/* This is wrong! We should preserve the original fragment
* boundaries by preserving frag_list rather than refragmenting.
*/
--
2.39.2
* [PATCH net 2/2] selftests: add regression test for br_netfilter panic
From: Andy Roulin @ 2024-10-01 15:44 UTC
To: netdev
Cc: pablo, kadlec, roopa, razor, davem, edumazet, kuba, pabeni, shuah,
idosch, petrm, Andy Roulin
Add a new netfilter selftest to guard against br_netfilter panics when a
single-vxlan-device is used together with untagged traffic and frames
larger than the VxLAN MTU.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
---
.../testing/selftests/net/netfilter/Makefile | 1 +
tools/testing/selftests/net/netfilter/config | 2 +
.../selftests/net/netfilter/vxlan_mtu_frag.sh | 121 ++++++++++++++++++
3 files changed, 124 insertions(+)
create mode 100755 tools/testing/selftests/net/netfilter/vxlan_mtu_frag.sh
diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index e6c9e777fead..542f7886a0bc 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -31,6 +31,7 @@ TEST_PROGS += nft_tproxy_tcp.sh
TEST_PROGS += nft_tproxy_udp.sh
TEST_PROGS += nft_zones_many.sh
TEST_PROGS += rpath.sh
+TEST_PROGS += vxlan_mtu_frag.sh
TEST_PROGS += xt_string.sh
TEST_PROGS_EXTENDED = nft_concat_range_perf.sh
diff --git a/tools/testing/selftests/net/netfilter/config b/tools/testing/selftests/net/netfilter/config
index c5fe7b34eaf1..43d8b500d391 100644
--- a/tools/testing/selftests/net/netfilter/config
+++ b/tools/testing/selftests/net/netfilter/config
@@ -7,6 +7,7 @@ CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_NETFILTER=m
CONFIG_BRIDGE_NF_EBTABLES=m
+CONFIG_BRIDGE_VLAN_FILTERING=y
CONFIG_CGROUP_BPF=y
CONFIG_DUMMY=m
CONFIG_INET_ESP=m
@@ -84,6 +85,7 @@ CONFIG_NFT_SYNPROXY=m
CONFIG_NFT_TPROXY=m
CONFIG_VETH=m
CONFIG_VLAN_8021Q=m
+CONFIG_VXLAN=m
CONFIG_XFRM_USER=m
CONFIG_XFRM_STATISTICS=y
CONFIG_NET_PKTGEN=m
diff --git a/tools/testing/selftests/net/netfilter/vxlan_mtu_frag.sh b/tools/testing/selftests/net/netfilter/vxlan_mtu_frag.sh
new file mode 100755
index 000000000000..912cb9583af1
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/vxlan_mtu_frag.sh
@@ -0,0 +1,121 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source lib.sh
+
+if ! modprobe -q -n br_netfilter 2>&1; then
+ echo "SKIP: Test needs br_netfilter kernel module"
+ exit $ksft_skip
+fi
+
+cleanup()
+{
+ cleanup_all_ns
+}
+
+trap cleanup EXIT
+
+setup_ns host vtep router
+
+create_topology()
+{
+ ip link add host-eth0 netns "$host" type veth peer name vtep-host netns "$vtep"
+ ip link add vtep-router netns "$vtep" type veth peer name router-vtep netns "$router"
+}
+
+setup_host()
+{
+ # bring ports up
+ ip -n "$host" addr add 10.0.0.1/24 dev host-eth0
+ ip -n "$host" link set host-eth0 up
+
+ # Add VLAN 10,20
+ for vid in 10 20; do
+ ip -n "$host" link add link host-eth0 name host-eth0.$vid type vlan id $vid
+ ip -n "$host" addr add 10.0.$vid.1/24 dev host-eth0.$vid
+ ip -n "$host" link set host-eth0.$vid up
+ done
+}
+
+setup_vtep()
+{
+ # create bridge on vtep
+ ip -n "$vtep" link add name br0 type bridge
+ ip -n "$vtep" link set br0 type bridge vlan_filtering 1
+
+ # VLAN 10 is untagged PVID
+ ip -n "$vtep" link set dev vtep-host master br0
+ bridge -n "$vtep" vlan add dev vtep-host vid 10 pvid untagged
+
+ # VLAN 20 as other VID
+ ip -n "$vtep" link set dev vtep-host master br0
+ bridge -n "$vtep" vlan add dev vtep-host vid 20
+
+ # single-vxlan device on vtep
+ ip -n "$vtep" address add dev vtep-router 60.0.0.1/24
+ ip -n "$vtep" link add dev vxd type vxlan external \
+ vnifilter local 60.0.0.1 remote 60.0.0.2 dstport 4789 ttl 64
+ ip -n "$vtep" link set vxd master br0
+
+ # Add VLAN-VNI 1-1 mappings
+ bridge -n "$vtep" link set dev vxd vlan_tunnel on
+ for vid in 10 20; do
+ bridge -n "$vtep" vlan add dev vxd vid $vid
+ bridge -n "$vtep" vlan add dev vxd vid $vid tunnel_info id $vid
+ bridge -n "$vtep" vni add dev vxd vni $vid
+ done
+
+ # bring ports up
+ ip -n "$vtep" link set vxd up
+ ip -n "$vtep" link set vtep-router up
+ ip -n "$vtep" link set vtep-host up
+ ip -n "$vtep" link set dev br0 up
+}
+
+setup_router()
+{
+ # bring ports up
+ ip -n "$router" link set router-vtep up
+}
+
+setup()
+{
+ modprobe -q br_netfilter
+ create_topology
+ setup_host
+ setup_vtep
+ setup_router
+}
+
+test_large_mtu_untagged_traffic()
+{
+ ip -n "$vtep" link set vxd mtu 1000
+ ip -n "$host" neigh add 10.0.0.2 lladdr ca:fe:ba:be:00:01 dev host-eth0
+ ip netns exec "$host" \
+ ping -q 10.0.0.2 -I host-eth0 -c 1 -W 0.5 -s2000 > /dev/null 2>&1
+ return 0
+}
+
+test_large_mtu_tagged_traffic()
+{
+ for vid in 10 20; do
+ ip -n "$vtep" link set vxd mtu 1000
+ ip -n "$host" neigh add 10.0.$vid.2 lladdr ca:fe:ba:be:00:01 dev host-eth0.$vid
+ ip netns exec "$host" \
+ ping -q 10.0.$vid.2 -I host-eth0.$vid -c 1 -W 0.5 -s2000 > /dev/null 2>&1
+ done
+ return 0
+}
+
+do_test()
+{
+ # Frames will be dropped so ping will not succeed
+ # If it doesn't panic, it passes
+ test_large_mtu_tagged_traffic
+ test_large_mtu_untagged_traffic
+}
+
+setup && \
+echo "Test for VxLAN fragmentation with large MTU in br_netfilter:" && \
+do_test && echo "PASS!"
+exit $?
--
2.39.2
* Re: [PATCH net 1/2] netfilter: br_netfilter: fix panic with metadata_dst skb
From: Nikolay Aleksandrov @ 2024-10-03 12:12 UTC
To: Andy Roulin, netdev
Cc: pablo, kadlec, roopa, davem, edumazet, kuba, pabeni, shuah,
idosch, petrm
On 10/1/24 18:43, Andy Roulin wrote:
> Fix a kernel panic in the br_netfilter module when sending untagged
> traffic via a VxLAN device.
> This happens during the check for fragmentation in br_nf_dev_queue_xmit.
>
> It is dependent on:
> 1) the br_netfilter module being loaded;
> 2) net.bridge.bridge-nf-call-iptables set to 1;
> 3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
> 4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
>
> When forwarding the untagged packet to the VxLAN bridge port, before
> the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
> changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
> of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.
>
> Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
> for frames that needs to be fragmented: frames with higher MTU than the
> VxLAN device end up calling br_nf_ip_fragment, which in turns call
> ip_skb_dst_mtu.
>
> The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst
> with valid dst->dev, thus the crash.
>
> This case was never supported in the first place, so drop the packet
> instead.
>
> [...]
>
> Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> Signed-off-by: Andy Roulin <aroulin@nvidia.com>
> ---
> net/bridge/br_netfilter_hooks.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
> index 0e8bc0ea6175..1d458e9da660 100644
> --- a/net/bridge/br_netfilter_hooks.c
> +++ b/net/bridge/br_netfilter_hooks.c
> @@ -33,6 +33,7 @@
> #include <net/ip.h>
> #include <net/ipv6.h>
> #include <net/addrconf.h>
> +#include <net/dst_metadata.h>
> #include <net/route.h>
> #include <net/netfilter/br_netfilter.h>
> #include <net/netns/generic.h>
> @@ -879,6 +880,10 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
> return br_dev_queue_push_xmit(net, sk, skb);
> }
>
> + /* Fragmentation on metadata/template dst is not supported */
> + if (unlikely(!skb_valid_dst(skb)))
> + goto drop;
> +
> /* This is wrong! We should preserve the original fragment
> * boundaries by preserving frag_list rather than refragmenting.
> */
This helper's name is a bit misleading. :) But looking at it, it seems
only metadata dsts are not considered valid, so looks good to me.
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
* Re: [PATCH net 2/2] selftests: add regression test for br_netfilter panic
From: Nikolay Aleksandrov @ 2024-10-03 12:13 UTC
To: Andy Roulin, netdev
Cc: pablo, kadlec, roopa, davem, edumazet, kuba, pabeni, shuah,
idosch, petrm
On 10/1/24 18:44, Andy Roulin wrote:
> Add a new netfilter selftests to test against br_netfilter panics when
> VxLAN single-device is used together with untagged traffic and high MTU.
>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Andy Roulin <aroulin@nvidia.com>
> ---
> .../testing/selftests/net/netfilter/Makefile | 1 +
> tools/testing/selftests/net/netfilter/config | 2 +
> .../selftests/net/netfilter/vxlan_mtu_frag.sh | 121 ++++++++++++++++++
> 3 files changed, 124 insertions(+)
> create mode 100755 tools/testing/selftests/net/netfilter/vxlan_mtu_frag.sh
>
Always happy to see new tests, thanks!
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
* Re: [PATCH net 0/2] netfilter: br_netfilter: fix panic with metadata_dst skb
From: patchwork-bot+netdevbpf @ 2024-10-04 19:10 UTC
To: Andy Roulin
Cc: netdev, pablo, kadlec, roopa, razor, davem, edumazet, kuba,
pabeni, shuah, idosch, petrm
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 1 Oct 2024 08:43:58 -0700 you wrote:
> There's a kernel panic possible in the br_netfilter module when sending
> untagged traffic via a VxLAN device. Traceback is included below.
> This happens during the check for fragmentation in br_nf_dev_queue_xmit
> if the MTU on the VxLAN device is not big enough.
>
> It is dependent on:
> 1) the br_netfilter module being loaded;
> 2) net.bridge.bridge-nf-call-iptables set to 1;
> 3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
> 4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
>
> [...]
Here is the summary with links:
- [net,1/2] netfilter: br_netfilter: fix panic with metadata_dst skb
https://git.kernel.org/netdev/net/c/f9ff7665cd12
- [net,2/2] selftests: add regression test for br_netfilter panic
https://git.kernel.org/netdev/net/c/bc4d22b72a2d
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html