linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2 0/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
@ 2025-07-03  7:57 Dong Chenchen
  2025-07-03  7:57 ` [PATCH net v2 1/2] " Dong Chenchen
  2025-07-03  7:57 ` [PATCH net v2 2/2] selftests: Add test cases for vlan_filter modification " Dong Chenchen
  0 siblings, 2 replies; 6+ messages in thread
From: Dong Chenchen @ 2025-07-03  7:57 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
	pedro.netdev, idosch
  Cc: netdev, linux-kernel, zhangchangzhong

Fix VLAN 0 refcount imbalance of toggling filtering during runtime.

v2:
	Add auto_vid0 flag to track the refcount of vlan0 
	Add selftest case for vlan_filter

Dong Chenchen (2):
  net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during
    runtime
  selftests: Add test cases for vlan_filter modification during runtime

 net/8021q/vlan.c                              | 42 ++++++--
 net/8021q/vlan.h                              |  1 +
 tools/testing/selftests/net/vlan_hw_filter.sh | 98 ++++++++++++++++---
 3 files changed, 120 insertions(+), 21 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net v2 1/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
  2025-07-03  7:57 [PATCH net v2 0/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime Dong Chenchen
@ 2025-07-03  7:57 ` Dong Chenchen
  2025-07-04 15:40   ` Ido Schimmel
  2025-07-03  7:57 ` [PATCH net v2 2/2] selftests: Add test cases for vlan_filter modification " Dong Chenchen
  1 sibling, 1 reply; 6+ messages in thread
From: Dong Chenchen @ 2025-07-03  7:57 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
	pedro.netdev, idosch
  Cc: netdev, linux-kernel, zhangchangzhong

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 #60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 #61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Add the auto_vid0 flag to track the refcount of vlan0, and use this
flag to determine whether to dec refcount while disabling real_dev.

Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Co-developed-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
---
 net/8021q/vlan.c | 42 +++++++++++++++++++++++++++++++++---------
 net/8021q/vlan.h |  1 +
 2 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 06908e37c3d9..9a6df8c1daf9 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -357,6 +357,35 @@ static int __vlan_device_event(struct net_device *dev, unsigned long event)
 	return err;
 }
 
+static void vlan_vid0_add(struct net_device *dev)
+{
+	struct vlan_info *vlan_info;
+	int err;
+
+	if (!(dev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+		return;
+
+	pr_info("adding VLAN 0 to HW filter on device %s\n", dev->name);
+
+	err = vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
+	if (err)
+		return;
+
+	vlan_info = rtnl_dereference(dev->vlan_info);
+	vlan_info->auto_vid0 = true;
+}
+
+static void vlan_vid0_del(struct net_device *dev)
+{
+	struct vlan_info *vlan_info = rtnl_dereference(dev->vlan_info);
+
+	if (!vlan_info || !vlan_info->auto_vid0)
+		return;
+
+	vlan_info->auto_vid0 = false;
+	vlan_vid_del(dev, htons(ETH_P_8021Q), 0);
+}
+
 static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 			     void *ptr)
 {
@@ -378,15 +407,10 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 			return notifier_from_errno(err);
 	}
 
-	if ((event == NETDEV_UP) &&
-	    (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) {
-		pr_info("adding VLAN 0 to HW filter on device %s\n",
-			dev->name);
-		vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
-	}
-	if (event == NETDEV_DOWN &&
-	    (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
-		vlan_vid_del(dev, htons(ETH_P_8021Q), 0);
+	if (event == NETDEV_UP)
+		vlan_vid0_add(dev);
+	else if (event == NETDEV_DOWN)
+		vlan_vid0_del(dev);
 
 	vlan_info = rtnl_dereference(dev->vlan_info);
 	if (!vlan_info)
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index 5eaf38875554..c7ffe591d593 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -33,6 +33,7 @@ struct vlan_info {
 	struct vlan_group	grp;
 	struct list_head	vid_list;
 	unsigned int		nr_vids;
+	bool			auto_vid0;
 	struct rcu_head		rcu;
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net v2 2/2] selftests: Add test cases for vlan_filter modification during runtime
  2025-07-03  7:57 [PATCH net v2 0/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime Dong Chenchen
  2025-07-03  7:57 ` [PATCH net v2 1/2] " Dong Chenchen
@ 2025-07-03  7:57 ` Dong Chenchen
  1 sibling, 0 replies; 6+ messages in thread
From: Dong Chenchen @ 2025-07-03  7:57 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
	pedro.netdev, idosch
  Cc: netdev, linux-kernel, zhangchangzhong

Add test cases for vlan_filter modification during runtime, which
may triger null-ptr-ref or memory leak of vlan0.

Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
---
 tools/testing/selftests/net/vlan_hw_filter.sh | 98 ++++++++++++++++---
 1 file changed, 86 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/net/vlan_hw_filter.sh b/tools/testing/selftests/net/vlan_hw_filter.sh
index 7bc804ffaf7c..0fb56baf28e4 100755
--- a/tools/testing/selftests/net/vlan_hw_filter.sh
+++ b/tools/testing/selftests/net/vlan_hw_filter.sh
@@ -3,27 +3,101 @@
 
 readonly NETNS="ns-$(mktemp -u XXXXXX)"
 
+ALL_TESTS="
+	test_vlan_filter_check
+	test_vlan0_del_crash_01
+	test_vlan0_del_crash_02
+	test_vlan0_del_crash_03
+	test_vid0_memleak
+"
+
 ret=0
 
+setup() {
+	ip netns add ${NETNS}
+}
+
 cleanup() {
-	ip netns del $NETNS
+	ip netns del $NETNS 2>/dev/null
 }
 
 trap cleanup EXIT
 
 fail() {
-    echo "ERROR: ${1:-unexpected return code} (ret: $_)" >&2
-    ret=1
+	echo "ERROR: ${1:-unexpected return code} (ret: $_)" >&2
+	ret=1
+}
+
+tests_run()
+{
+	local current_test
+	for current_test in ${TESTS:-$ALL_TESTS}; do
+		$current_test
+	done
+}
+
+test_vlan_filter_check() {
+	setup
+	ip netns exec ${NETNS} ip link add bond0 type bond mode 0
+	ip netns exec ${NETNS} ip link add bond_slave_1 type veth peer veth2
+	ip netns exec ${NETNS} ip link set bond_slave_1 master bond0
+	ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+	ip netns exec ${NETNS} ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0
+	ip netns exec ${NETNS} ip link add link bond0 name bond0.0 type vlan id 0
+	ip netns exec ${NETNS} ip link set bond_slave_1 nomaster
+	ip netns exec ${NETNS} ip link del veth2 || fail "Please check vlan HW filter function"
+	cleanup
 }
 
-ip netns add ${NETNS}
-ip netns exec ${NETNS} ip link add bond0 type bond mode 0
-ip netns exec ${NETNS} ip link add bond_slave_1 type veth peer veth2
-ip netns exec ${NETNS} ip link set bond_slave_1 master bond0
-ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
-ip netns exec ${NETNS} ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0
-ip netns exec ${NETNS} ip link add link bond0 name bond0.0 type vlan id 0
-ip netns exec ${NETNS} ip link set bond_slave_1 nomaster
-ip netns exec ${NETNS} ip link del veth2 || fail "Please check vlan HW filter function"
+#enable vlan_filter feature of real_dev with vlan0 during running time
+test_vlan0_del_crash_01() {
+	setup
+	ip netns exec ${NETNS} ip link add bond0 type bond mode 0
+	ip netns exec ${NETNS} ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
+	ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+	ip netns exec ${NETNS} ifconfig bond0 up
+	ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter on
+	ip netns exec ${NETNS} ifconfig bond0 down
+	ip netns exec ${NETNS} ifconfig bond0 up
+	ip netns exec ${NETNS} ip link del vlan0 || fail "Please check vlan HW filter function"
+	cleanup
+}
+
+#enable vlan_filter feature and add vlan0 for real_dev during running time
+test_vlan0_del_crash_02() {
+	setup
+	ip netns exec ${NETNS} ip link add bond0 type bond mode 0
+	ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+	ip netns exec ${NETNS} ifconfig bond0 up
+	ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter on
+	ip netns exec ${NETNS} ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
+	ip netns exec ${NETNS} ifconfig bond0 down
+	ip netns exec ${NETNS} ifconfig bond0 up
+	ip netns exec ${NETNS} ip link del vlan0 || fail "Please check vlan HW filter function"
+	cleanup
+}
+
+#enable vlan_filter feature of real_dev during running time
+#test kernel_bug of vlan unregister
+test_vlan0_del_crash_03() {
+	setup
+	ip netns exec ${NETNS} ip link add bond0 type bond mode 0
+	ip netns exec ${NETNS} ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
+	ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+	ip netns exec ${NETNS} ifconfig bond0 up
+	ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter on
+	ip netns exec ${NETNS} ifconfig bond0 down
+	ip netns exec ${NETNS} ip link del vlan0 || fail "Please check vlan HW filter function"
+	cleanup
+}
+
+test_vid0_memleak() {
+	setup
+	ip netns exec ${NETNS} ip link add bond0 up type bond mode 0
+	ip netns exec ${NETNS} ethtool -K bond0 rx-vlan-filter off
+	ip netns exec ${NETNS} ip link del dev bond0 || fail "Please check vlan HW filter function"
+	cleanup
+}
 
+tests_run
 exit $ret
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2 1/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
  2025-07-03  7:57 ` [PATCH net v2 1/2] " Dong Chenchen
@ 2025-07-04 15:40   ` Ido Schimmel
  2025-07-05  3:05     ` dongchenchen (A)
  0 siblings, 1 reply; 6+ messages in thread
From: Ido Schimmel @ 2025-07-04 15:40 UTC (permalink / raw)
  To: Dong Chenchen
  Cc: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
	pedro.netdev, netdev, linux-kernel, zhangchangzhong

On Thu, Jul 03, 2025 at 03:57:01PM +0800, Dong Chenchen wrote:
> 8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
> remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
> However, if changing filter feature during netdev runtime,
> null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
> for refcount imbalance.

[...]

> Root cause is as below:
> step1: add vlan0 for real_dev, such as bond, team.
> register_vlan_dev
>     vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
> step2: disable vlan filter feature and enable real_dev
> step3: change filter from 0 to 1
> vlan_device_event
>     vlan_filter_push_vids
>     	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
> step4: real_dev down
> vlan_device_event
>     vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
>         vlan_info_rcu_free //free vlan0
> step5: real_dev up
> vlan_device_event
>     vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
>         vlan_info_alloc //alloc new empty vid0. refcnt=1
> step6: delete vlan0
> unregister_vlan_dev
>     BUG_ON(!vlan_info); //will trigger it if step5 was not executed
>     vlan_group_set_device
>         array = vg->vlan_devices_arrays
> 	//null-ptr-ref will be triggered after step5
> 
> E.g. the following sequence can reproduce null-ptr-ref
> 
> $ ip link add bond0 type bond mode 0
> $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
> $ ethtool -K bond0 rx-vlan-filter off
> $ ifconfig bond0 up
> $ ethtool -K bond0 rx-vlan-filter on
> $ ifconfig bond0 down
> $ ifconfig bond0 up
> $ ip link del vlan0
> 
> Add the auto_vid0 flag to track the refcount of vlan0, and use this
> flag to determine whether to dec refcount while disabling real_dev.
> 
> Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
> Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
> Co-developed-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>

Given my name is also on the patch, do you mind making the commit
message a bit clearer? Specifically, in addition to the BUG_ON(), it
would be good to also describe the memory leak. Something like this:

"
Assuming the "rx-vlan-filter" feature is enabled on a net device, the
8021q module will automatically add or remove VLAN 0 when the net device
is put administratively up or down, respectively.

There are a couple of problems with the above scheme. The first problem
is a memory leak that can happen if the "rx-vlan-filter" feature is
disabled while the device is running:

 # ip link add bond1 up type bond mode 0
 # ethtool -K bond1 rx-vlan-filter off
 # ip link del dev bond1

When the device is put administratively down the "rx-vlan-filter"
feature is disabled, so the 8021q module will not remove VLAN 0 and the
memory will be leaked [1].

Another problem that can happen is that the kernel can automatically
delete VLAN 0 when the device is put administratively down despite not
adding it when the device was put administratively up since during that
time the "rx-vlan-filter" feature was disabled.

This is not a problem if VLAN 0 does not exist, but if it belongs to an
upper VLAN device with VLAN ID 0, then a crash [2] can be triggered when the
VLAN device is deleted since the lower device does not have a VLAN info
structure associated with it:

 # ip link add bond1 type bond mode 0
 # ip link add link bond1 name vlan0 type vlan id 0 protocol 802.1q
 # ethtool -K bond1 rx-vlan-filter off
 # ip link set dev bond1 up
 # ethtool -K bond1 rx-vlan-filter on
 # ip link set dev bond1 down
 # ip link del vlan0

Fix both problems by noting in the VLAN info whether VLAN 0 was
automatically added upon NETDEV_UP and based on that decide whether it
should be deleted upon NETDEV_DOWN, regardless of the state of the
"rx-vlan-filter" feature.

[1]
unreferenced object 0xffff8880068e3100 (size 256):
  comm "ip", pid 384, jiffies 4296130254
  hex dump (first 32 bytes):
    00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00  . 0.............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 81ce31fa):
    __kmalloc_cache_noprof+0x2b5/0x340
    vlan_vid_add+0x434/0x940
    vlan_device_event.cold+0x75/0xa8
    notifier_call_chain+0xca/0x150
    __dev_notify_flags+0xe3/0x250
    rtnl_configure_link+0x193/0x260
    rtnl_newlink_create+0x383/0x8e0
    __rtnl_newlink+0x22c/0xa40
    rtnl_newlink+0x627/0xb00
    rtnetlink_rcv_msg+0x6fb/0xb70
    netlink_rcv_skb+0x11f/0x350
    netlink_unicast+0x426/0x710
    netlink_sendmsg+0x75a/0xc20
    __sock_sendmsg+0xc1/0x150
    ____sys_sendmsg+0x5aa/0x7b0
    ___sys_sendmsg+0xfc/0x180
unreferenced object 0xffff888008c06f20 (size 32):
  comm "ip", pid 384, jiffies 4296130254
  hex dump (first 32 bytes):
    a0 31 8e 06 80 88 ff ff a0 31 8e 06 80 88 ff ff  .1.......1......
    81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc  ................
  backtrace (crc fe715fa5):
    __kmalloc_cache_noprof+0x2b5/0x340
    vlan_vid_add+0x25a/0x940
    vlan_device_event.cold+0x75/0xa8
    notifier_call_chain+0xca/0x150
    __dev_notify_flags+0xe3/0x250
    rtnl_configure_link+0x193/0x260
    rtnl_newlink_create+0x383/0x8e0
    __rtnl_newlink+0x22c/0xa40
    rtnl_newlink+0x627/0xb00
    rtnetlink_rcv_msg+0x6fb/0xb70
    netlink_rcv_skb+0x11f/0x350
    netlink_unicast+0x426/0x710
    netlink_sendmsg+0x75a/0xc20
    __sock_sendmsg+0xc1/0x150
    ____sys_sendmsg+0x5aa/0x7b0
    ___sys_sendmsg+0xfc/0x180

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [#1] SMP KASAN
CPU: 4 UID: 0 PID: 405 Comm: ip Not tainted 6.16.0-rc4-virtme-gb9fd9888a565 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:unregister_vlan_dev+0x290/0x3b0
[...]
Call Trace:
 <TASK>
 rtnl_dellink+0x33d/0xa50
 rtnetlink_rcv_msg+0x6fb/0xb70
 netlink_rcv_skb+0x11f/0x350
 netlink_unicast+0x426/0x710
 netlink_sendmsg+0x75a/0xc20
 __sock_sendmsg+0xc1/0x150
 ____sys_sendmsg+0x5aa/0x7b0
 ___sys_sendmsg+0xfc/0x180
 __sys_sendmsg+0x124/0x1c0
 do_syscall_64+0xbb/0x360
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
"

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2 1/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
  2025-07-04 15:40   ` Ido Schimmel
@ 2025-07-05  3:05     ` dongchenchen (A)
  2025-07-06  7:00       ` Ido Schimmel
  0 siblings, 1 reply; 6+ messages in thread
From: dongchenchen (A) @ 2025-07-05  3:05 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
	pedro.netdev, netdev, linux-kernel, zhangchangzhong


> On Thu, Jul 03, 2025 at 03:57:01PM +0800, Dong Chenchen wrote:
>> 8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
>> remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
>> However, if changing filter feature during netdev runtime,
>> null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
>> for refcount imbalance.
> [...]
>
>> Root cause is as below:
>> step1: add vlan0 for real_dev, such as bond, team.
>> register_vlan_dev
>>      vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
>> step2: disable vlan filter feature and enable real_dev
>> step3: change filter from 0 to 1
>> vlan_device_event
>>      vlan_filter_push_vids
>>      	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
>> step4: real_dev down
>> vlan_device_event
>>      vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
>>          vlan_info_rcu_free //free vlan0
>> step5: real_dev up
>> vlan_device_event
>>      vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
>>          vlan_info_alloc //alloc new empty vid0. refcnt=1
>> step6: delete vlan0
>> unregister_vlan_dev
>>      BUG_ON(!vlan_info); //will trigger it if step5 was not executed
>>      vlan_group_set_device
>>          array = vg->vlan_devices_arrays
>> 	//null-ptr-ref will be triggered after step5
>>
>> E.g. the following sequence can reproduce null-ptr-ref
>>
>> $ ip link add bond0 type bond mode 0
>> $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
>> $ ethtool -K bond0 rx-vlan-filter off
>> $ ifconfig bond0 up
>> $ ethtool -K bond0 rx-vlan-filter on
>> $ ifconfig bond0 down
>> $ ifconfig bond0 up
>> $ ip link del vlan0
>>
>> Add the auto_vid0 flag to track the refcount of vlan0, and use this
>> flag to determine whether to dec refcount while disabling real_dev.
>>
>> Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
>> Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
>> Co-developed-by: Ido Schimmel <idosch@idosch.org>
>> Signed-off-by: Ido Schimmel <idosch@idosch.org>
>> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
> Given my name is also on the patch, do you mind making the commit

Hi, Ido

Thanks for your review
I apologize for adding your signature to the patch without your
permission. Perhaps I should use "suggested-by"?

> message a bit clearer? Specifically, in addition to the BUG_ON(), it
> would be good to also describe the memory leak. Something like this:
>
> "
> Assuming the "rx-vlan-filter" feature is enabled on a net device, the
> 8021q module will automatically add or remove VLAN 0 when the net device
> is put administratively up or down, respectively.
>
> There are a couple of problems with the above scheme. The first problem
> is a memory leak that can happen if the "rx-vlan-filter" feature is
> disabled while the device is running:
>
>   # ip link add bond1 up type bond mode 0
>   # ethtool -K bond1 rx-vlan-filter off
>   # ip link del dev bond1
>
> When the device is put administratively down the "rx-vlan-filter"
> feature is disabled, so the 8021q module will not remove VLAN 0 and the
> memory will be leaked [1].
>
> Another problem that can happen is that the kernel can automatically
> delete VLAN 0 when the device is put administratively down despite not
> adding it when the device was put administratively up since during that
> time the "rx-vlan-filter" feature was disabled.
>
> This is not a problem if VLAN 0 does not exist, but if it belongs to an
> upper VLAN device with VLAN ID 0, then a crash [2] can be triggered when the
> VLAN device is deleted since the lower device does not have a VLAN info
> structure associated with it:
>
>   # ip link add bond1 type bond mode 0
>   # ip link add link bond1 name vlan0 type vlan id 0 protocol 802.1q
>   # ethtool -K bond1 rx-vlan-filter off
>   # ip link set dev bond1 up
>   # ethtool -K bond1 rx-vlan-filter on
>   # ip link set dev bond1 down
>   # ip link del vlan0
>
> Fix both problems by noting in the VLAN info whether VLAN 0 was
> automatically added upon NETDEV_UP and based on that decide whether it
> should be deleted upon NETDEV_DOWN, regardless of the state of the
> "rx-vlan-filter" feature.
>
> [1]
> unreferenced object 0xffff8880068e3100 (size 256):
>    comm "ip", pid 384, jiffies 4296130254
>    hex dump (first 32 bytes):
>      00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00  . 0.............
>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    backtrace (crc 81ce31fa):
>      __kmalloc_cache_noprof+0x2b5/0x340
>      vlan_vid_add+0x434/0x940
>      vlan_device_event.cold+0x75/0xa8
>      notifier_call_chain+0xca/0x150
>      __dev_notify_flags+0xe3/0x250
>      rtnl_configure_link+0x193/0x260
>      rtnl_newlink_create+0x383/0x8e0
>      __rtnl_newlink+0x22c/0xa40
>      rtnl_newlink+0x627/0xb00
>      rtnetlink_rcv_msg+0x6fb/0xb70
>      netlink_rcv_skb+0x11f/0x350
>      netlink_unicast+0x426/0x710
>      netlink_sendmsg+0x75a/0xc20
>      __sock_sendmsg+0xc1/0x150
>      ____sys_sendmsg+0x5aa/0x7b0
>      ___sys_sendmsg+0xfc/0x180
> unreferenced object 0xffff888008c06f20 (size 32):
>    comm "ip", pid 384, jiffies 4296130254
>    hex dump (first 32 bytes):
>      a0 31 8e 06 80 88 ff ff a0 31 8e 06 80 88 ff ff  .1.......1......
>      81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc  ................
>    backtrace (crc fe715fa5):
>      __kmalloc_cache_noprof+0x2b5/0x340
>      vlan_vid_add+0x25a/0x940
>      vlan_device_event.cold+0x75/0xa8
>      notifier_call_chain+0xca/0x150
>      __dev_notify_flags+0xe3/0x250
>      rtnl_configure_link+0x193/0x260
>      rtnl_newlink_create+0x383/0x8e0
>      __rtnl_newlink+0x22c/0xa40
>      rtnl_newlink+0x627/0xb00
>      rtnetlink_rcv_msg+0x6fb/0xb70
>      netlink_rcv_skb+0x11f/0x350
>      netlink_unicast+0x426/0x710
>      netlink_sendmsg+0x75a/0xc20
>      __sock_sendmsg+0xc1/0x150
>      ____sys_sendmsg+0x5aa/0x7b0
>      ___sys_sendmsg+0xfc/0x180
>
> [2]
> kernel BUG at net/8021q/vlan.c:99!
> Oops: invalid opcode: 0000 [#1] SMP KASAN
> CPU: 4 UID: 0 PID: 405 Comm: ip Not tainted 6.16.0-rc4-virtme-gb9fd9888a565 #1 PREEMPT(full)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
> RIP: 0010:unregister_vlan_dev+0x290/0x3b0
> [...]
> Call Trace:
>   <TASK>
>   rtnl_dellink+0x33d/0xa50
>   rtnetlink_rcv_msg+0x6fb/0xb70
>   netlink_rcv_skb+0x11f/0x350
>   netlink_unicast+0x426/0x710
>   netlink_sendmsg+0x75a/0xc20
>   __sock_sendmsg+0xc1/0x150
>   ____sys_sendmsg+0x5aa/0x7b0
>   ___sys_sendmsg+0xfc/0x180
>   __sys_sendmsg+0x124/0x1c0
>   do_syscall_64+0xbb/0x360
>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
> "

v3 wil be send. Thank you for your guidance.

Best regards
Dong Chenchen


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2 1/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
  2025-07-05  3:05     ` dongchenchen (A)
@ 2025-07-06  7:00       ` Ido Schimmel
  0 siblings, 0 replies; 6+ messages in thread
From: Ido Schimmel @ 2025-07-06  7:00 UTC (permalink / raw)
  To: dongchenchen (A)
  Cc: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux,
	pedro.netdev, netdev, linux-kernel, zhangchangzhong

On Sat, Jul 05, 2025 at 11:05:38AM +0800, dongchenchen (A) wrote:
> I apologize for adding your signature to the patch without your
> permission. Perhaps I should use "suggested-by"?

I'm fine with "Suggested-by".

Thanks

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-07-06  7:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-03  7:57 [PATCH net v2 0/2] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime Dong Chenchen
2025-07-03  7:57 ` [PATCH net v2 1/2] " Dong Chenchen
2025-07-04 15:40   ` Ido Schimmel
2025-07-05  3:05     ` dongchenchen (A)
2025-07-06  7:00       ` Ido Schimmel
2025-07-03  7:57 ` [PATCH net v2 2/2] selftests: Add test cases for vlan_filter modification " Dong Chenchen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).