* [PATCH net v2 0/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
@ 2025-07-03 7:57 Dong Chenchen
2025-07-03 7:57 ` [PATCH net v2 1/2] " Dong Chenchen
2025-07-03 7:57 ` [PATCH net v2 2/2] selftests: Add test cases for vlan_filter modification " Dong Chenchen
0 siblings, 2 replies; 6+ messages in thread
From: Dong Chenchen @ 2025-07-03 7:57 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
pedro.netdev, idosch
Cc: netdev, linux-kernel, zhangchangzhong
Fix VLAN 0 refcount imbalance of toggling filtering during runtime.
v2:
Add auto_vid0 flag to track the refcount of vlan0
Add selftest case for vlan_filter
Dong Chenchen (2):
net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during
runtime
selftests: Add test cases for vlan_filter modification during runtime
net/8021q/vlan.c | 42 ++++++--
net/8021q/vlan.h | 1 +
tools/testing/selftests/net/vlan_hw_filter.sh | 98 ++++++++++++++++---
3 files changed, 120 insertions(+), 21 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH net v2 1/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
2025-07-03 7:57 [PATCH net v2 0/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime Dong Chenchen
@ 2025-07-03 7:57 ` Dong Chenchen
2025-07-04 15:40 ` Ido Schimmel
2025-07-03 7:57 ` [PATCH net v2 2/2] selftests: Add test cases for vlan_filter modification " Dong Chenchen
1 sibling, 1 reply; 6+ messages in thread
From: Dong Chenchen @ 2025-07-03 7:57 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
pedro.netdev, idosch
Cc: netdev, linux-kernel, zhangchangzhong
8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.
[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 #60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 #61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
<TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
vlan_filter_push_vids
ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
BUG_ON(!vlan_info); //will trigger it if step5 was not executed
vlan_group_set_device
array = vg->vlan_devices_arrays
//null-ptr-ref will be triggered after step5
E.g. the following sequence can reproduce null-ptr-ref
$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0
Add the auto_vid0 flag to track the refcount of vlan0, and use this
flag to determine whether to dec refcount while disabling real_dev.
Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Co-developed-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
---
net/8021q/vlan.c | 42 +++++++++++++++++++++++++++++++++---------
net/8021q/vlan.h | 1 +
2 files changed, 34 insertions(+), 9 deletions(-)
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 06908e37c3d9..9a6df8c1daf9 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -357,6 +357,35 @@ static int __vlan_device_event(struct net_device *dev, unsigned long event)
return err;
}
+static void vlan_vid0_add(struct net_device *dev)
+{
+ struct vlan_info *vlan_info;
+ int err;
+
+ if (!(dev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+ return;
+
+ pr_info("adding VLAN 0 to HW filter on device %s\n", dev->name);
+
+ err = vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
+ if (err)
+ return;
+
+ vlan_info = rtnl_dereference(dev->vlan_info);
+ vlan_info->auto_vid0 = true;
+}
+
+static void vlan_vid0_del(struct net_device *dev)
+{
+ struct vlan_info *vlan_info = rtnl_dereference(dev->vlan_info);
+
+ if (!vlan_info || !vlan_info->auto_vid0)
+ return;
+
+ vlan_info->auto_vid0 = false;
+ vlan_vid_del(dev, htons(ETH_P_8021Q), 0);
+}
+
static int vlan_device_event(struct notifier_block *unused, unsigned long event,
void *ptr)
{
@@ -378,15 +407,10 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
return notifier_from_errno(err);
}
- if ((event == NETDEV_UP) &&
- (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) {
- pr_info("adding VLAN 0 to HW filter on device %s\n",
- dev->name);
- vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
- }
- if (event == NETDEV_DOWN &&
- (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
- vlan_vid_del(dev, htons(ETH_P_8021Q), 0);
+ if (event == NETDEV_UP)
+ vlan_vid0_add(dev);
+ else if (event == NETDEV_DOWN)
+ vlan_vid0_del(dev);
vlan_info = rtnl_dereference(dev->vlan_info);
if (!vlan_info)
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index 5eaf38875554..c7ffe591d593 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -33,6 +33,7 @@ struct vlan_info {
struct vlan_group grp;
struct list_head vid_list;
unsigned int nr_vids;
+ bool auto_vid0;
struct rcu_head rcu;
};
--
2.25.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH net v2 2/2] selftests: Add test cases for vlan_filter modification during runtime
2025-07-03 7:57 [PATCH net v2 0/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime Dong Chenchen
2025-07-03 7:57 ` [PATCH net v2 1/2] " Dong Chenchen
@ 2025-07-03 7:57 ` Dong Chenchen
1 sibling, 0 replies; 6+ messages in thread
From: Dong Chenchen @ 2025-07-03 7:57 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
pedro.netdev, idosch
Cc: netdev, linux-kernel, zhangchangzhong
Add test cases for vlan_filter modification during runtime, which
may triger null-ptr-ref or memory leak of vlan0.
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
---
tools/testing/selftests/net/vlan_hw_filter.sh | 98 ++++++++++++++++---
1 file changed, 86 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/net/vlan_hw_filter.sh b/tools/testing/selftests/net/vlan_hw_filter.sh
index 7bc804ffaf7c..0fb56baf28e4 100755
--- a/tools/testing/selftests/net/vlan_hw_filter.sh
+++ b/tools/testing/selftests/net/vlan_hw_filter.sh
@@ -3,27 +3,101 @@
readonly NETNS="ns-$(mktemp -u XXXXXX)"
+ALL_TESTS="
+ test_vlan_filter_check
+ test_vlan0_del_crash_01
+ test_vlan0_del_crash_02
+ test_vlan0_del_crash_03
+ test_vid0_memleak
+"
+
ret=0
+setup() {
+ ip netns add ${NETNS}
+}
+
cleanup() {
- ip netns del $NETNS
+ ip netns del $NETNS 2>/dev/null
}
trap cleanup EXIT
fail() {
- echo "ERROR: ${1:-unexpected return code} (ret: $_)" >&2
- ret=1
+ echo "ERROR: ${1:-unexpected return code} (ret: $_)" >&2
+ ret=1
+}
+
+tests_run()
+{
+ local current_test
+ for current_test in ${TESTS:-$ALL_TESTS}; do
+ $current_test
+ done
+}
+
+test_vlan_filter_check() {
+ setup
+ ip netns exec ${NETNS} ip link add bond0 type bond mode 0
+ ip netns exec ${NETNS} ip link add bond_slave_1 type veth peer veth2
+ ip netns exec ${NETNS} ip link set bond_slave_1 master bond0
+ ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+ ip netns exec ${NETNS} ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0
+ ip netns exec ${NETNS} ip link add link bond0 name bond0.0 type vlan id 0
+ ip netns exec ${NETNS} ip link set bond_slave_1 nomaster
+ ip netns exec ${NETNS} ip link del veth2 || fail "Please check vlan HW filter function"
+ cleanup
}
-ip netns add ${NETNS}
-ip netns exec ${NETNS} ip link add bond0 type bond mode 0
-ip netns exec ${NETNS} ip link add bond_slave_1 type veth peer veth2
-ip netns exec ${NETNS} ip link set bond_slave_1 master bond0
-ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
-ip netns exec ${NETNS} ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0
-ip netns exec ${NETNS} ip link add link bond0 name bond0.0 type vlan id 0
-ip netns exec ${NETNS} ip link set bond_slave_1 nomaster
-ip netns exec ${NETNS} ip link del veth2 || fail "Please check vlan HW filter function"
+#enable vlan_filter feature of real_dev with vlan0 during running time
+test_vlan0_del_crash_01() {
+ setup
+ ip netns exec ${NETNS} ip link add bond0 type bond mode 0
+ ip netns exec ${NETNS} ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
+ ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+ ip netns exec ${NETNS} ifconfig bond0 up
+ ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter on
+ ip netns exec ${NETNS} ifconfig bond0 down
+ ip netns exec ${NETNS} ifconfig bond0 up
+ ip netns exec ${NETNS} ip link del vlan0 || fail "Please check vlan HW filter function"
+ cleanup
+}
+
+#enable vlan_filter feature and add vlan0 for real_dev during running time
+test_vlan0_del_crash_02() {
+ setup
+ ip netns exec ${NETNS} ip link add bond0 type bond mode 0
+ ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+ ip netns exec ${NETNS} ifconfig bond0 up
+ ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter on
+ ip netns exec ${NETNS} ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
+ ip netns exec ${NETNS} ifconfig bond0 down
+ ip netns exec ${NETNS} ifconfig bond0 up
+ ip netns exec ${NETNS} ip link del vlan0 || fail "Please check vlan HW filter function"
+ cleanup
+}
+
+#enable vlan_filter feature of real_dev during running time
+#test kernel_bug of vlan unregister
+test_vlan0_del_crash_03() {
+ setup
+ ip netns exec ${NETNS} ip link add bond0 type bond mode 0
+ ip netns exec ${NETNS} ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
+ ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+ ip netns exec ${NETNS} ifconfig bond0 up
+ ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter on
+ ip netns exec ${NETNS} ifconfig bond0 down
+ ip netns exec ${NETNS} ip link del vlan0 || fail "Please check vlan HW filter function"
+ cleanup
+}
+
+test_vid0_memleak() {
+ setup
+ ip netns exec ${NETNS} ip link add bond0 up type bond mode 0
+ ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+ ip netns exec ${NETNS} ip link del dev bond0 || fail "Please check vlan HW filter function"
+ cleanup
+}
+tests_run
exit $ret
--
2.25.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net v2 1/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
2025-07-03 7:57 ` [PATCH net v2 1/2] " Dong Chenchen
@ 2025-07-04 15:40 ` Ido Schimmel
2025-07-05 3:05 ` dongchenchen (A)
0 siblings, 1 reply; 6+ messages in thread
From: Ido Schimmel @ 2025-07-04 15:40 UTC (permalink / raw)
To: Dong Chenchen
Cc: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
pedro.netdev, netdev, linux-kernel, zhangchangzhong
On Thu, Jul 03, 2025 at 03:57:01PM +0800, Dong Chenchen wrote:
> 8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
> remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
> However, if changing filter feature during netdev runtime,
> null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
> for refcount imbalance.
[...]
> Root cause is as below:
> step1: add vlan0 for real_dev, such as bond, team.
> register_vlan_dev
> vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
> step2: disable vlan filter feature and enable real_dev
> step3: change filter from 0 to 1
> vlan_device_event
> vlan_filter_push_vids
> ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
> step4: real_dev down
> vlan_device_event
> vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
> vlan_info_rcu_free //free vlan0
> step5: real_dev up
> vlan_device_event
> vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
> vlan_info_alloc //alloc new empty vid0. refcnt=1
> step6: delete vlan0
> unregister_vlan_dev
> BUG_ON(!vlan_info); //will trigger it if step5 was not executed
> vlan_group_set_device
> array = vg->vlan_devices_arrays
> //null-ptr-ref will be triggered after step5
>
> E.g. the following sequence can reproduce null-ptr-ref
>
> $ ip link add bond0 type bond mode 0
> $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
> $ ethtool -K bond0 rx-vlan-filter off
> $ ifconfig bond0 up
> $ ethtool -K bond0 rx-vlan-filter on
> $ ifconfig bond0 down
> $ ifconfig bond0 up
> $ ip link del vlan0
>
> Add the auto_vid0 flag to track the refcount of vlan0, and use this
> flag to determine whether to dec refcount while disabling real_dev.
>
> Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
> Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
> Co-developed-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Given my name is also on the patch, do you mind making the commit
message a bit clearer? Specifically, in addition to the BUG_ON(), it
would be good to also describe the memory leak. Something like this:
"
Assuming the "rx-vlan-filter" feature is enabled on a net device, the
8021q module will automatically add or remove VLAN 0 when the net device
is put administratively up or down, respectively.
There are a couple of problems with the above scheme. The first problem
is a memory leak that can happen if the "rx-vlan-filter" feature is
disabled while the device is running:
# ip link add bond1 up type bond mode 0
# ethtool -K bond1 rx-vlan-filter off
# ip link del dev bond1
When the device is put administratively down the "rx-vlan-filter"
feature is disabled, so the 8021q module will not remove VLAN 0 and the
memory will be leaked [1].
Another problem that can happen is that the kernel can automatically
delete VLAN 0 when the device is put administratively down despite not
adding it when the device was put administratively up since during that
time the "rx-vlan-filter" feature was disabled.
This is not a problem if VLAN 0 does not exist, but if it belongs to an
upper VLAN device with VLAN ID 0, then a crash [2] can be triggered when the
VLAN device is deleted since the lower device does not have a VLAN info
structure associated with it:
# ip link add bond1 type bond mode 0
# ip link add link bond1 name vlan0 type vlan id 0 protocol 802.1q
# ethtool -K bond1 rx-vlan-filter off
# ip link set dev bond1 up
# ethtool -K bond1 rx-vlan-filter on
# ip link set dev bond1 down
# ip link del vlan0
Fix both problems by noting in the VLAN info whether VLAN 0 was
automatically added upon NETDEV_UP and based on that decide whether it
should be deleted upon NETDEV_DOWN, regardless of the state of the
"rx-vlan-filter" feature.
[1]
unreferenced object 0xffff8880068e3100 (size 256):
comm "ip", pid 384, jiffies 4296130254
hex dump (first 32 bytes):
00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0.............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 81ce31fa):
__kmalloc_cache_noprof+0x2b5/0x340
vlan_vid_add+0x434/0x940
vlan_device_event.cold+0x75/0xa8
notifier_call_chain+0xca/0x150
__dev_notify_flags+0xe3/0x250
rtnl_configure_link+0x193/0x260
rtnl_newlink_create+0x383/0x8e0
__rtnl_newlink+0x22c/0xa40
rtnl_newlink+0x627/0xb00
rtnetlink_rcv_msg+0x6fb/0xb70
netlink_rcv_skb+0x11f/0x350
netlink_unicast+0x426/0x710
netlink_sendmsg+0x75a/0xc20
__sock_sendmsg+0xc1/0x150
____sys_sendmsg+0x5aa/0x7b0
___sys_sendmsg+0xfc/0x180
unreferenced object 0xffff888008c06f20 (size 32):
comm "ip", pid 384, jiffies 4296130254
hex dump (first 32 bytes):
a0 31 8e 06 80 88 ff ff a0 31 8e 06 80 88 ff ff .1.......1......
81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc ................
backtrace (crc fe715fa5):
__kmalloc_cache_noprof+0x2b5/0x340
vlan_vid_add+0x25a/0x940
vlan_device_event.cold+0x75/0xa8
notifier_call_chain+0xca/0x150
__dev_notify_flags+0xe3/0x250
rtnl_configure_link+0x193/0x260
rtnl_newlink_create+0x383/0x8e0
__rtnl_newlink+0x22c/0xa40
rtnl_newlink+0x627/0xb00
rtnetlink_rcv_msg+0x6fb/0xb70
netlink_rcv_skb+0x11f/0x350
netlink_unicast+0x426/0x710
netlink_sendmsg+0x75a/0xc20
__sock_sendmsg+0xc1/0x150
____sys_sendmsg+0x5aa/0x7b0
___sys_sendmsg+0xfc/0x180
[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [#1] SMP KASAN
CPU: 4 UID: 0 PID: 405 Comm: ip Not tainted 6.16.0-rc4-virtme-gb9fd9888a565 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:unregister_vlan_dev+0x290/0x3b0
[...]
Call Trace:
<TASK>
rtnl_dellink+0x33d/0xa50
rtnetlink_rcv_msg+0x6fb/0xb70
netlink_rcv_skb+0x11f/0x350
netlink_unicast+0x426/0x710
netlink_sendmsg+0x75a/0xc20
__sock_sendmsg+0xc1/0x150
____sys_sendmsg+0x5aa/0x7b0
___sys_sendmsg+0xfc/0x180
__sys_sendmsg+0x124/0x1c0
do_syscall_64+0xbb/0x360
entry_SYSCALL_64_after_hwframe+0x4b/0x53
"
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v2 1/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
2025-07-04 15:40 ` Ido Schimmel
@ 2025-07-05 3:05 ` dongchenchen (A)
2025-07-06 7:00 ` Ido Schimmel
0 siblings, 1 reply; 6+ messages in thread
From: dongchenchen (A) @ 2025-07-05 3:05 UTC (permalink / raw)
To: Ido Schimmel
Cc: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
pedro.netdev, netdev, linux-kernel, zhangchangzhong
> On Thu, Jul 03, 2025 at 03:57:01PM +0800, Dong Chenchen wrote:
>> 8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
>> remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
>> However, if changing filter feature during netdev runtime,
>> null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
>> for refcount imbalance.
> [...]
>
>> Root cause is as below:
>> step1: add vlan0 for real_dev, such as bond, team.
>> register_vlan_dev
>> vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
>> step2: disable vlan filter feature and enable real_dev
>> step3: change filter from 0 to 1
>> vlan_device_event
>> vlan_filter_push_vids
>> ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
>> step4: real_dev down
>> vlan_device_event
>> vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
>> vlan_info_rcu_free //free vlan0
>> step5: real_dev up
>> vlan_device_event
>> vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
>> vlan_info_alloc //alloc new empty vid0. refcnt=1
>> step6: delete vlan0
>> unregister_vlan_dev
>> BUG_ON(!vlan_info); //will trigger it if step5 was not executed
>> vlan_group_set_device
>> array = vg->vlan_devices_arrays
>> //null-ptr-ref will be triggered after step5
>>
>> E.g. the following sequence can reproduce null-ptr-ref
>>
>> $ ip link add bond0 type bond mode 0
>> $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
>> $ ethtool -K bond0 rx-vlan-filter off
>> $ ifconfig bond0 up
>> $ ethtool -K bond0 rx-vlan-filter on
>> $ ifconfig bond0 down
>> $ ifconfig bond0 up
>> $ ip link del vlan0
>>
>> Add the auto_vid0 flag to track the refcount of vlan0, and use this
>> flag to determine whether to dec refcount while disabling real_dev.
>>
>> Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
>> Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
>> Co-developed-by: Ido Schimmel <idosch@idosch.org>
>> Signed-off-by: Ido Schimmel <idosch@idosch.org>
>> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
> Given my name is also on the patch, do you mind making the commit
Hi, Ido
Thanks for your review
I apologize for adding your signature to the patch without your
permission. Perhaps I should use "suggested-by"?
> message a bit clearer? Specifically, in addition to the BUG_ON(), it
> would be good to also describe the memory leak. Something like this:
>
> "
> Assuming the "rx-vlan-filter" feature is enabled on a net device, the
> 8021q module will automatically add or remove VLAN 0 when the net device
> is put administratively up or down, respectively.
>
> There are a couple of problems with the above scheme. The first problem
> is a memory leak that can happen if the "rx-vlan-filter" feature is
> disabled while the device is running:
>
> # ip link add bond1 up type bond mode 0
> # ethtool -K bond1 rx-vlan-filter off
> # ip link del dev bond1
>
> When the device is put administratively down the "rx-vlan-filter"
> feature is disabled, so the 8021q module will not remove VLAN 0 and the
> memory will be leaked [1].
>
> Another problem that can happen is that the kernel can automatically
> delete VLAN 0 when the device is put administratively down despite not
> adding it when the device was put administratively up since during that
> time the "rx-vlan-filter" feature was disabled.
>
> This is not a problem if VLAN 0 does not exist, but if it belongs to an
> upper VLAN device with VLAN ID 0, then a crash [2] can be triggered when the
> VLAN device is deleted since the lower device does not have a VLAN info
> structure associated with it:
>
> # ip link add bond1 type bond mode 0
> # ip link add link bond1 name vlan0 type vlan id 0 protocol 802.1q
> # ethtool -K bond1 rx-vlan-filter off
> # ip link set dev bond1 up
> # ethtool -K bond1 rx-vlan-filter on
> # ip link set dev bond1 down
> # ip link del vlan0
>
> Fix both problems by noting in the VLAN info whether VLAN 0 was
> automatically added upon NETDEV_UP and based on that decide whether it
> should be deleted upon NETDEV_DOWN, regardless of the state of the
> "rx-vlan-filter" feature.
>
> [1]
> unreferenced object 0xffff8880068e3100 (size 256):
> comm "ip", pid 384, jiffies 4296130254
> hex dump (first 32 bytes):
> 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0.............
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace (crc 81ce31fa):
> __kmalloc_cache_noprof+0x2b5/0x340
> vlan_vid_add+0x434/0x940
> vlan_device_event.cold+0x75/0xa8
> notifier_call_chain+0xca/0x150
> __dev_notify_flags+0xe3/0x250
> rtnl_configure_link+0x193/0x260
> rtnl_newlink_create+0x383/0x8e0
> __rtnl_newlink+0x22c/0xa40
> rtnl_newlink+0x627/0xb00
> rtnetlink_rcv_msg+0x6fb/0xb70
> netlink_rcv_skb+0x11f/0x350
> netlink_unicast+0x426/0x710
> netlink_sendmsg+0x75a/0xc20
> __sock_sendmsg+0xc1/0x150
> ____sys_sendmsg+0x5aa/0x7b0
> ___sys_sendmsg+0xfc/0x180
> unreferenced object 0xffff888008c06f20 (size 32):
> comm "ip", pid 384, jiffies 4296130254
> hex dump (first 32 bytes):
> a0 31 8e 06 80 88 ff ff a0 31 8e 06 80 88 ff ff .1.......1......
> 81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc ................
> backtrace (crc fe715fa5):
> __kmalloc_cache_noprof+0x2b5/0x340
> vlan_vid_add+0x25a/0x940
> vlan_device_event.cold+0x75/0xa8
> notifier_call_chain+0xca/0x150
> __dev_notify_flags+0xe3/0x250
> rtnl_configure_link+0x193/0x260
> rtnl_newlink_create+0x383/0x8e0
> __rtnl_newlink+0x22c/0xa40
> rtnl_newlink+0x627/0xb00
> rtnetlink_rcv_msg+0x6fb/0xb70
> netlink_rcv_skb+0x11f/0x350
> netlink_unicast+0x426/0x710
> netlink_sendmsg+0x75a/0xc20
> __sock_sendmsg+0xc1/0x150
> ____sys_sendmsg+0x5aa/0x7b0
> ___sys_sendmsg+0xfc/0x180
>
> [2]
> kernel BUG at net/8021q/vlan.c:99!
> Oops: invalid opcode: 0000 [#1] SMP KASAN
> CPU: 4 UID: 0 PID: 405 Comm: ip Not tainted 6.16.0-rc4-virtme-gb9fd9888a565 #1 PREEMPT(full)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
> RIP: 0010:unregister_vlan_dev+0x290/0x3b0
> [...]
> Call Trace:
> <TASK>
> rtnl_dellink+0x33d/0xa50
> rtnetlink_rcv_msg+0x6fb/0xb70
> netlink_rcv_skb+0x11f/0x350
> netlink_unicast+0x426/0x710
> netlink_sendmsg+0x75a/0xc20
> __sock_sendmsg+0xc1/0x150
> ____sys_sendmsg+0x5aa/0x7b0
> ___sys_sendmsg+0xfc/0x180
> __sys_sendmsg+0x124/0x1c0
> do_syscall_64+0xbb/0x360
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> "
v3 wil be send. Thank you for your guidance.
Best regards
Dong Chenchen
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v2 1/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
2025-07-05 3:05 ` dongchenchen (A)
@ 2025-07-06 7:00 ` Ido Schimmel
0 siblings, 0 replies; 6+ messages in thread
From: Ido Schimmel @ 2025-07-06 7:00 UTC (permalink / raw)
To: dongchenchen (A)
Cc: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
pedro.netdev, netdev, linux-kernel, zhangchangzhong
On Sat, Jul 05, 2025 at 11:05:38AM +0800, dongchenchen (A) wrote:
> I apologize for adding your signature to the patch without your
> permission. Perhaps I should use "suggested-by"?
I'm fine with "Suggested-by".
Thanks
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-07-06 7:00 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-03 7:57 [PATCH net v2 0/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime Dong Chenchen
2025-07-03 7:57 ` [PATCH net v2 1/2] " Dong Chenchen
2025-07-04 15:40 ` Ido Schimmel
2025-07-05 3:05 ` dongchenchen (A)
2025-07-06 7:00 ` Ido Schimmel
2025-07-03 7:57 ` [PATCH net v2 2/2] selftests: Add test cases for vlan_filter modification " Dong Chenchen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).