* [PATCH net] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime @ 2025-06-23 11:30 Dong Chenchen 2025-06-25 0:42 ` Jakub Kicinski 0 siblings, 1 reply; 6+ messages in thread From: Dong Chenchen @ 2025-06-23 11:30 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, horms, jiri, oscmaes92, linux, pedro.netdev Cc: netdev, linux-kernel, yuehaibing, zhangchangzhong 8021q(vlan_device_event) will add VLAN 0 when enabling the device, and remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set. However, if changing filter feature during netdev runtime, null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance. [1] BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) Write of size 8 at addr 0000000000000000 by task ip/382 CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 #60 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) kasan_report (mm/kasan/report.c:636) unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110) rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 #61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: real_dev up vlan_device_event vlan_vid_add(dev, htons(ETH_P_8021Q), 0); vlan_info_alloc //alloc new empty vid0. refcnt=1 step6: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //will trigger it if step5 was not executed vlan_group_set_device array = vg->vlan_devices_arrays //null-ptr-ref will be triggered after step5 E.g. the following sequence can reproduce null-ptr-ref $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ifconfig bond0 up $ ip link del vlan0 Fix it by correctly modifying vid0 refcnt of real_dev when changing the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime. Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> --- net/8021q/vlan.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 06908e37c3d9..6e01ece0a95c 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -504,12 +504,21 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, break; case NETDEV_CVLAN_FILTER_PUSH_INFO: + flgs = dev_get_flags(dev); + if (flgs & IFF_UP) { + pr_info("adding VLAN 0 to HW filter on device %s\n", + dev->name); + vlan_vid_add(dev, htons(ETH_P_8021Q), 0); + } err = vlan_filter_push_vids(vlan_info, htons(ETH_P_8021Q)); if (err) return notifier_from_errno(err); break; case NETDEV_CVLAN_FILTER_DROP_INFO: + flgs = dev_get_flags(dev); + if (flgs & IFF_UP) + vlan_vid_del(dev, htons(ETH_P_8021Q), 0); vlan_filter_drop_vids(vlan_info, htons(ETH_P_8021Q)); break; -- 2.25.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime 2025-06-23 11:30 [PATCH net] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime Dong Chenchen @ 2025-06-25 0:42 ` Jakub Kicinski 2025-06-26 3:41 ` dongchenchen (A) 0 siblings, 1 reply; 6+ messages in thread From: Jakub Kicinski @ 2025-06-25 0:42 UTC (permalink / raw) To: Dong Chenchen Cc: davem, edumazet, pabeni, horms, jiri, oscmaes92, linux, pedro.netdev, netdev, linux-kernel, yuehaibing, zhangchangzhong On Mon, 23 Jun 2025 19:30:08 +0800 Dong Chenchen wrote: > $ ip link add bond0 type bond mode 0 > $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q > $ ethtool -K bond0 rx-vlan-filter off > $ ifconfig bond0 up > $ ethtool -K bond0 rx-vlan-filter on > $ ifconfig bond0 down > $ ifconfig bond0 up > $ ip link del vlan0 Please try to figure out the reasonable combinations in which we can change the flags and bring the device up and down. Create a selftest in bash and add it under tools/testing/selftests/net > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c > index 06908e37c3d9..6e01ece0a95c 100644 > --- a/net/8021q/vlan.c > +++ b/net/8021q/vlan.c > @@ -504,12 +504,21 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, > break; > > case NETDEV_CVLAN_FILTER_PUSH_INFO: > + flgs = dev_get_flags(dev); Why call dev_get_flags()? You can test dev->flags & IFF_UP directly > + if (flgs & IFF_UP) { > + pr_info("adding VLAN 0 to HW filter on device %s\n", > + dev->name); > + vlan_vid_add(dev, htons(ETH_P_8021Q), 0); Not sure if this works always, because if we have no vlan at all when the device comes up vlan_info will be NULL and we won't even get here. IIUC adding vlan 0 has to be handled early, where UP is handled. > + } > err = vlan_filter_push_vids(vlan_info, htons(ETH_P_8021Q)); > if (err) > return notifier_from_errno(err); > break; > > case NETDEV_CVLAN_FILTER_DROP_INFO: > + flgs = dev_get_flags(dev); > + if (flgs & IFF_UP) > + vlan_vid_del(dev, htons(ETH_P_8021Q), 0); > vlan_filter_drop_vids(vlan_info, htons(ETH_P_8021Q)); -- pw-bot: cr ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime 2025-06-25 0:42 ` Jakub Kicinski @ 2025-06-26 3:41 ` dongchenchen (A) 2025-06-27 14:41 ` Ido Schimmel 0 siblings, 1 reply; 6+ messages in thread From: dongchenchen (A) @ 2025-06-26 3:41 UTC (permalink / raw) To: Jakub Kicinski Cc: davem, edumazet, pabeni, horms, jiri, oscmaes92, linux, pedro.netdev, netdev, linux-kernel, yuehaibing, zhangchangzhong > On Mon, 23 Jun 2025 19:30:08 +0800 Dong Chenchen wrote: >> $ ip link add bond0 type bond mode 0 >> $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q >> $ ethtool -K bond0 rx-vlan-filter off >> $ ifconfig bond0 up >> $ ethtool -K bond0 rx-vlan-filter on >> $ ifconfig bond0 down >> $ ifconfig bond0 up >> $ ip link del vlan0 Hi, Jakub Thanks for your review! > Please try to figure out the reasonable combinations in which we can > change the flags and bring the device up and down. Create a selftest > in bash and add it under tools/testing/selftests/net selftest patch will be added in v2. >> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c >> index 06908e37c3d9..6e01ece0a95c 100644 >> --- a/net/8021q/vlan.c >> +++ b/net/8021q/vlan.c >> @@ -504,12 +504,21 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, >> break; >> >> case NETDEV_CVLAN_FILTER_PUSH_INFO: >> + flgs = dev_get_flags(dev); > Why call dev_get_flags()? You can test dev->flags & IFF_UP directly Yes, there is no need to use this function, I will modify it in v2 >> + if (flgs & IFF_UP) { >> + pr_info("adding VLAN 0 to HW filter on device %s\n", >> + dev->name); >> + vlan_vid_add(dev, htons(ETH_P_8021Q), 0); > Not sure if this works always, because if we have no vlan at all when > the device comes up vlan_info will be NULL and we won't even get here. > > IIUC adding vlan 0 has to be handled early, where UP is handled. Yes, that's right. the sequence as below can still trigger the issue: $ ip link add bond0 type bond mode 0 $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ifconfig bond0 down $ ifconfig bond0 up $ ip link del vlan0 maybe we can fix it by: diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 6e01ece0a95c..262f8d3f06ef 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -378,14 +378,18 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, return notifier_from_errno(err); } - if ((event == NETDEV_UP) && - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) { + if (((event == NETDEV_UP) && + (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) || + (event == NETDEV_CVLAN_FILTER_PUSH_INFO && + (dev->flags & IFF_UP))) { pr_info("adding VLAN 0 to HW filter on device %s\n", dev->name); vlan_vid_add(dev, htons(ETH_P_8021Q), 0); } - if (event == NETDEV_DOWN && - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) + if ((event == NETDEV_DOWN && + (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) || + (event == NETDEV_CVLAN_FILTER_DROP_INFO && + (dev->flags & IFF_UP))) vlan_vid_del(dev, htons(ETH_P_8021Q), 0); vlan_info = rtnl_dereference(dev->vlan_info); ------ Best regards Dong Chenchen >> + } >> err = vlan_filter_push_vids(vlan_info, htons(ETH_P_8021Q)); >> if (err) >> return notifier_from_errno(err); >> break; >> >> case NETDEV_CVLAN_FILTER_DROP_INFO: >> + flgs = dev_get_flags(dev); >> + if (flgs & IFF_UP) >> + vlan_vid_del(dev, htons(ETH_P_8021Q), 0); >> vlan_filter_drop_vids(vlan_info, htons(ETH_P_8021Q)); ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime 2025-06-26 3:41 ` dongchenchen (A) @ 2025-06-27 14:41 ` Ido Schimmel 2025-06-30 1:25 ` dongchenchen (A) 0 siblings, 1 reply; 6+ messages in thread From: Ido Schimmel @ 2025-06-27 14:41 UTC (permalink / raw) To: dongchenchen (A) Cc: Jakub Kicinski, davem, edumazet, pabeni, horms, jiri, oscmaes92, linux, pedro.netdev, netdev, linux-kernel, yuehaibing, zhangchangzhong On Thu, Jun 26, 2025 at 11:41:45AM +0800, dongchenchen (A) wrote: > maybe we can fix it by: > > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c > index 6e01ece0a95c..262f8d3f06ef 100644 > --- a/net/8021q/vlan.c > +++ b/net/8021q/vlan.c > @@ -378,14 +378,18 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, > return notifier_from_errno(err); > } > - if ((event == NETDEV_UP) && > - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) { > + if (((event == NETDEV_UP) && > + (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) || > + (event == NETDEV_CVLAN_FILTER_PUSH_INFO && > + (dev->flags & IFF_UP))) { > pr_info("adding VLAN 0 to HW filter on device %s\n", > dev->name); > vlan_vid_add(dev, htons(ETH_P_8021Q), 0); > } > - if (event == NETDEV_DOWN && > - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) > + if ((event == NETDEV_DOWN && > + (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) || > + (event == NETDEV_CVLAN_FILTER_DROP_INFO && > + (dev->flags & IFF_UP))) > vlan_vid_del(dev, htons(ETH_P_8021Q), 0); > vlan_info = rtnl_dereference(dev->vlan_info); As I understand it, there are two issues: 1. VID 0 is deleted when it shouldn't. This leads to the crashes you shared. 2. VID 0 is not deleted when it should. This leads to memory leaks like [1] with a reproducer such as [2]. AFAICT, your proposed patch solves the second issue, but only partially addresses the first issue. The automatic addition of VID 0 is assumed to be successful, but it can fail due to hardware issues or memory allocation failures. I realize it is not common, but it can happen. If you annotate __vlan_vid_add() [3] and inject failures [4], you will see that the crashes still happen with your patch. WDYT about something like [5]? Basically, noting in the VLAN info whether VID 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of "rx-vlan-filter". [1] unreferenced object 0xffff888007468a00 (size 256): comm "ip", pid 386, jiffies 4294820761 hex dump (first 32 bytes): 00 20 26 0a 80 88 ff ff 00 00 00 00 00 00 00 00 . &............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 43031ab1): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 unreferenced object 0xffff888002fc3aa0 (size 32): comm "ip", pid 386, jiffies 4294820761 hex dump (first 32 bytes): a0 8a 46 07 80 88 ff ff a0 8a 46 07 80 88 ff ff ..F.......F..... 81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc ................ backtrace (crc c2f2186c): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x25a/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] #!/bin/bash ip link add bond1 up type bond mode 0 ethtool -K bond1 rx-vlan-filter off ip link del dev bond1 [3] diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c index 9404dd551dfd..6dc25c4877f2 100644 --- a/net/8021q/vlan_core.c +++ b/net/8021q/vlan_core.c @@ -314,6 +314,7 @@ static int __vlan_vid_add(struct vlan_info *vlan_info, __be16 proto, u16 vid, *pvid_info = vid_info; return 0; } +ALLOW_ERROR_INJECTION(__vlan_vid_add, ERRNO); int vlan_vid_add(struct net_device *dev, __be16 proto, u16 vid) { [4] #!/bin/bash echo __vlan_vid_add > /sys/kernel/debug/fail_function/inject printf %#x -12 > /sys/kernel/debug/fail_function/__vlan_vid_add/retval echo 100 > /sys/kernel/debug/fail_function/probability echo 1 > /sys/kernel/debug/fail_function/times echo 1 > /sys/kernel/debug/fail_function/verbose ip link add bond1 up type bond mode 0 ip link add link bond1 name vlan0 type vlan id 0 protocol 802.1q ip link set dev bond1 down ip link del vlan0 ip link del dev bond1 [5] diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 06908e37c3d9..9a6df8c1daf9 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -357,6 +357,35 @@ static int __vlan_device_event(struct net_device *dev, unsigned long event) return err; } +static void vlan_vid0_add(struct net_device *dev) +{ + struct vlan_info *vlan_info; + int err; + + if (!(dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) + return; + + pr_info("adding VLAN 0 to HW filter on device %s\n", dev->name); + + err = vlan_vid_add(dev, htons(ETH_P_8021Q), 0); + if (err) + return; + + vlan_info = rtnl_dereference(dev->vlan_info); + vlan_info->auto_vid0 = true; +} + +static void vlan_vid0_del(struct net_device *dev) +{ + struct vlan_info *vlan_info = rtnl_dereference(dev->vlan_info); + + if (!vlan_info || !vlan_info->auto_vid0) + return; + + vlan_info->auto_vid0 = false; + vlan_vid_del(dev, htons(ETH_P_8021Q), 0); +} + static int vlan_device_event(struct notifier_block *unused, unsigned long event, void *ptr) { @@ -378,15 +407,10 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, return notifier_from_errno(err); } - if ((event == NETDEV_UP) && - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) { - pr_info("adding VLAN 0 to HW filter on device %s\n", - dev->name); - vlan_vid_add(dev, htons(ETH_P_8021Q), 0); - } - if (event == NETDEV_DOWN && - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) - vlan_vid_del(dev, htons(ETH_P_8021Q), 0); + if (event == NETDEV_UP) + vlan_vid0_add(dev); + else if (event == NETDEV_DOWN) + vlan_vid0_del(dev); vlan_info = rtnl_dereference(dev->vlan_info); if (!vlan_info) diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index 5eaf38875554..c7ffe591d593 100644 --- a/net/8021q/vlan.h +++ b/net/8021q/vlan.h @@ -33,6 +33,7 @@ struct vlan_info { struct vlan_group grp; struct list_head vid_list; unsigned int nr_vids; + bool auto_vid0; struct rcu_head rcu; }; ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime 2025-06-27 14:41 ` Ido Schimmel @ 2025-06-30 1:25 ` dongchenchen (A) 2025-06-30 8:14 ` Ido Schimmel 0 siblings, 1 reply; 6+ messages in thread From: dongchenchen (A) @ 2025-06-30 1:25 UTC (permalink / raw) To: Ido Schimmel Cc: Jakub Kicinski, davem, edumazet, pabeni, horms, jiri, oscmaes92, linux, pedro.netdev, netdev, linux-kernel, yuehaibing, zhangchangzhong > On Thu, Jun 26, 2025 at 11:41:45AM +0800, dongchenchen (A) wrote: > As I understand it, there are two issues: > > 1. VID 0 is deleted when it shouldn't. This leads to the crashes you > shared. > > 2. VID 0 is not deleted when it should. This leads to memory leaks like > [1] with a reproducer such as [2]. > > AFAICT, your proposed patch solves the second issue, but only partially > addresses the first issue. The automatic addition of VID 0 is assumed to > be successful, but it can fail due to hardware issues or memory > allocation failures. I realize it is not common, but it can happen. If > you annotate __vlan_vid_add() [3] and inject failures [4], you will see > that the crashes still happen with your patch. Hi, Ido Thanks for your review! > WDYT about something like [5]? Basically, noting in the VLAN info This fix [5] can completely solve the problem. I will send it together with selftest patch. > whether VID 0 was automatically added upon NETDEV_UP and based on that > decide whether it should be deleted upon NETDEV_DOWN, regardless of > "rx-vlan-filter". one small additional question: vlan0 will not exist if netdev set rx-vlan-filter after NETDEV_UP. Will this cause a difference in the processing logic for 8021q packets? Best regards Dong Chenchen > [2] > #!/bin/bash > > ip link add bond1 up type bond mode 0 > ethtool -K bond1 rx-vlan-filter off > ip link del dev bond1 > > [3] > diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c > index 9404dd551dfd..6dc25c4877f2 100644 > --- a/net/8021q/vlan_core.c > +++ b/net/8021q/vlan_core.c > @@ -314,6 +314,7 @@ static int __vlan_vid_add(struct vlan_info *vlan_info, __be16 proto, u16 vid, > *pvid_info = vid_info; > return 0; > } > +ALLOW_ERROR_INJECTION(__vlan_vid_add, ERRNO); > > int vlan_vid_add(struct net_device *dev, __be16 proto, u16 vid) > { > > [4] > #!/bin/bash > > echo __vlan_vid_add > /sys/kernel/debug/fail_function/inject > printf %#x -12 > /sys/kernel/debug/fail_function/__vlan_vid_add/retval > echo 100 > /sys/kernel/debug/fail_function/probability > echo 1 > /sys/kernel/debug/fail_function/times > echo 1 > /sys/kernel/debug/fail_function/verbose > ip link add bond1 up type bond mode 0 > ip link add link bond1 name vlan0 type vlan id 0 protocol 802.1q > ip link set dev bond1 down > ip link del vlan0 > ip link del dev bond1 > > [5] > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c > index 06908e37c3d9..9a6df8c1daf9 100644 > --- a/net/8021q/vlan.c > +++ b/net/8021q/vlan.c > @@ -357,6 +357,35 @@ static int __vlan_device_event(struct net_device *dev, unsigned long event) > return err; > } > > +static void vlan_vid0_add(struct net_device *dev) > +{ > + struct vlan_info *vlan_info; > + int err; > + > + if (!(dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) > + return; > + > + pr_info("adding VLAN 0 to HW filter on device %s\n", dev->name); > + > + err = vlan_vid_add(dev, htons(ETH_P_8021Q), 0); > + if (err) > + return; > + > + vlan_info = rtnl_dereference(dev->vlan_info); > + vlan_info->auto_vid0 = true; > +} > + > +static void vlan_vid0_del(struct net_device *dev) > +{ > + struct vlan_info *vlan_info = rtnl_dereference(dev->vlan_info); > + > + if (!vlan_info || !vlan_info->auto_vid0) > + return; > + > + vlan_info->auto_vid0 = false; > + vlan_vid_del(dev, htons(ETH_P_8021Q), 0); > +} > + > static int vlan_device_event(struct notifier_block *unused, unsigned long event, > void *ptr) > { > @@ -378,15 +407,10 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, > return notifier_from_errno(err); > } > > - if ((event == NETDEV_UP) && > - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) { > - pr_info("adding VLAN 0 to HW filter on device %s\n", > - dev->name); > - vlan_vid_add(dev, htons(ETH_P_8021Q), 0); > - } > - if (event == NETDEV_DOWN && > - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) > - vlan_vid_del(dev, htons(ETH_P_8021Q), 0); > + if (event == NETDEV_UP) > + vlan_vid0_add(dev); > + else if (event == NETDEV_DOWN) > + vlan_vid0_del(dev); > > vlan_info = rtnl_dereference(dev->vlan_info); > if (!vlan_info) > diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h > index 5eaf38875554..c7ffe591d593 100644 > --- a/net/8021q/vlan.h > +++ b/net/8021q/vlan.h > @@ -33,6 +33,7 @@ struct vlan_info { > struct vlan_group grp; > struct list_head vid_list; > unsigned int nr_vids; > + bool auto_vid0; > struct rcu_head rcu; > }; > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime 2025-06-30 1:25 ` dongchenchen (A) @ 2025-06-30 8:14 ` Ido Schimmel 0 siblings, 0 replies; 6+ messages in thread From: Ido Schimmel @ 2025-06-30 8:14 UTC (permalink / raw) To: dongchenchen (A) Cc: Jakub Kicinski, davem, edumazet, pabeni, horms, jiri, oscmaes92, linux, pedro.netdev, netdev, linux-kernel, yuehaibing, zhangchangzhong On Mon, Jun 30, 2025 at 09:25:42AM +0800, dongchenchen (A) wrote: > > > On Thu, Jun 26, 2025 at 11:41:45AM +0800, dongchenchen (A) wrote: > > As I understand it, there are two issues: > > > > 1. VID 0 is deleted when it shouldn't. This leads to the crashes you > > shared. > > > > 2. VID 0 is not deleted when it should. This leads to memory leaks like > > [1] with a reproducer such as [2]. > > > > AFAICT, your proposed patch solves the second issue, but only partially > > addresses the first issue. The automatic addition of VID 0 is assumed to > > be successful, but it can fail due to hardware issues or memory > > allocation failures. I realize it is not common, but it can happen. If > > you annotate __vlan_vid_add() [3] and inject failures [4], you will see > > that the crashes still happen with your patch. > > Hi, Ido > Thanks for your review! > > > WDYT about something like [5]? Basically, noting in the VLAN info > > This fix [5] can completely solve the problem. I will send it together with > selftest patch. Thanks. Please add tests for both cases (memory leak and crash). > > > whether VID 0 was automatically added upon NETDEV_UP and based on that > > decide whether it should be deleted upon NETDEV_DOWN, regardless of > > "rx-vlan-filter". > > one small additional question: vlan0 will not exist if netdev set rx-vlan-filter after NETDEV_UP. > Will this cause a difference in the processing logic for 8021q packets? AFAICT the proposed patch does not change this behavior. Users can bring the netdev down and then up if they want the kernel to add VID 0. My understanding is that "rx-vlan-filter on" without VID 0 will cause prio-tagged packets to be dropped by the underlying device. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-06-30 8:14 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-23 11:30 [PATCH net] net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime Dong Chenchen 2025-06-25 0:42 ` Jakub Kicinski 2025-06-26 3:41 ` dongchenchen (A) 2025-06-27 14:41 ` Ido Schimmel 2025-06-30 1:25 ` dongchenchen (A) 2025-06-30 8:14 ` Ido Schimmel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).