Re: Bridge VLAN memory leak

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ido Schimmel <idosch@nvidia.com>
To: Vlad Buslov <vladbu@nvidia.com>
Cc: netdev@vger.kernel.org, Maor Dickman <maord@nvidia.com>,
	Roopa Prabhu <roopa@nvidia.com>,
	razor@blackwall.org
Subject: Re: Bridge VLAN memory leak
Date: Tue, 5 Jul 2022 16:55:25 +0300	[thread overview]
Message-ID: <YsRCzTrajO5GZemf@shredder> (raw)
In-Reply-To: <87a69oc0qs.fsf@nvidia.com>

On Mon, Jul 04, 2022 at 06:47:29PM +0300, Vlad Buslov wrote:
> Hi Ido,
> 
> While implementing QinQ offload support in mlx5 I encountered a memory
> leak[0] in the bridge implementation which seems to be related to the new
> BR_VLFLAG_ADDED_BY_SWITCHDEV flag that you have recently added.

FTR, added here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=279737939a8194f02fa352ab4476a1b241f44ef4

> 
> To reproduce the issue netdevice must support bridge VLAN offload, so I
> can't provide a simple script that uses veth or anything like that.
> Instead, I'll describe the issue step-by-step:
> 
> 1. Create a bridge, add offload-capable netdevs to it and assign some
> VLAN to them. __vlan_vid_add() function will set the
> BR_VLFLAG_ADDED_BY_SWITCHDEV flag since br_switchdev_port_vlan_add()
> should return 0 if dev can offload VLANs and will also skip call to
> vlan_vid_add() in such case:
> 
>         /* Try switchdev op first. In case it is not supported, fallback to
>          * 8021q add.
>          */
>         err = br_switchdev_port_vlan_add(dev, v->vid, flags, false, extack);
>         if (err == -EOPNOTSUPP)
>                 return vlan_vid_add(dev, br->vlan_proto, v->vid);
>         v->priv_flags |= BR_VLFLAG_ADDED_BY_SWITCHDEV;
> 
> 2. Enable filtering and set VLAN protocol to 802.1ad. That will trigger
> the following code in __br_vlan_set_proto() that re-creates existing
> VLANs with vlan_vid_add() function call whether they have the
> BR_VLFLAG_ADDED_BY_SWITCHDEV flag set or not:
> 
>          /* Add VLANs for the new proto to the device filter. */
>          list_for_each_entry(p, &br->port_list, list) {
>                  vg = nbp_vlan_group(p);
>                  list_for_each_entry(vlan, &vg->vlan_list, vlist) {
>                          err = vlan_vid_add(p->dev, proto, vlan->vid);
>                          if (err)
>                                  goto err_filt;
>                  }
>          }
> 
> 3. Now delete the bridge. That will delete all existing VLANs via
> __vlan_vid_del() function, which skips calling vlan_vid_del() (that is
> necessary to clean up after vlan_vid_add()) if VLAN has
> BR_VLFLAG_ADDED_BY_SWITCHDEV flag set:
> 
>          /* Try switchdev op first. In case it is not supported, fallback to
>           * 8021q del.
>           */
>          err = br_switchdev_port_vlan_del(dev, v->vid);
>          if (!(v->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV))
>                  vlan_vid_del(dev, br->vlan_proto, v->vid);

Looking at the code before the change, I'm pretty sure you will be able
to reproduce the leak prior to above mentioned commit:

```
-	err = br_switchdev_port_vlan_del(dev, vid);
-	if (err == -EOPNOTSUPP) {
-		vlan_vid_del(dev, br->vlan_proto, vid);
-		return 0;
-	}
-	return err;
```

> 
> 
> The issue doesn't reproduce for me anymore if I just clear the
> BR_VLFLAG_ADDED_BY_SWITCHDEV flag when re-creating VLANs on step 2.
> However, I'm not sure whether it is the right approach in this case.
> WDYT?

As a switchdev driver you already know about the new VLAN protocol via
'SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL' so do we really need the VLANs
to be programmed again? The VLAN protocol is not communicated in
'SWITCHDEV_OBJ_ID_PORT_VLAN' anyway.

Can you try the below (compile tested only)?

```
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 6e53dc991409..9ffd40b8270c 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -959,6 +959,8 @@ int __br_vlan_set_proto(struct net_bridge *br, __be16 proto,
 	list_for_each_entry(p, &br->port_list, list) {
 		vg = nbp_vlan_group(p);
 		list_for_each_entry(vlan, &vg->vlan_list, vlist) {
+			if (vlan->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
+				continue;
 			err = vlan_vid_add(p->dev, proto, vlan->vid);
 			if (err)
 				goto err_filt;
@@ -973,8 +975,11 @@ int __br_vlan_set_proto(struct net_bridge *br, __be16 proto,
 	/* Delete VLANs for the old proto from the device filter. */
 	list_for_each_entry(p, &br->port_list, list) {
 		vg = nbp_vlan_group(p);
-		list_for_each_entry(vlan, &vg->vlan_list, vlist)
+		list_for_each_entry(vlan, &vg->vlan_list, vlist) {
+			if (vlan->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
+				continue;
 			vlan_vid_del(p->dev, oldproto, vlan->vid);
+		}
 	}
 
 	return 0;
@@ -983,13 +988,19 @@ int __br_vlan_set_proto(struct net_bridge *br, __be16 proto,
 	attr.u.vlan_protocol = ntohs(oldproto);
 	switchdev_port_attr_set(br->dev, &attr, NULL);
 
-	list_for_each_entry_continue_reverse(vlan, &vg->vlan_list, vlist)
+	list_for_each_entry_continue_reverse(vlan, &vg->vlan_list, vlist) {
+		if (vlan->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
+			continue;
 		vlan_vid_del(p->dev, proto, vlan->vid);
+	}
 
 	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
 		vg = nbp_vlan_group(p);
-		list_for_each_entry(vlan, &vg->vlan_list, vlist)
+		list_for_each_entry(vlan, &vg->vlan_list, vlist) {
+			if (vlan->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
+				continue;
 			vlan_vid_del(p->dev, proto, vlan->vid);
+		}
 	}
 
 	return err;
```

> 
> [0]:
> 
> unreferenced object 0xffff8881f6771200 (size 256):
>   comm "ip", pid 446855, jiffies 4298238841 (age 55.240s)
>   hex dump (first 32 bytes):
>     00 00 7f 0e 83 88 ff ff 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<00000000012819ac>] vlan_vid_add+0x437/0x750
>     [<00000000f2281fad>] __br_vlan_set_proto+0x289/0x920
>     [<000000000632b56f>] br_changelink+0x3d6/0x13f0
>     [<0000000089d25f04>] __rtnl_newlink+0x8ae/0x14c0
>     [<00000000f6276baf>] rtnl_newlink+0x5f/0x90
>     [<00000000746dc902>] rtnetlink_rcv_msg+0x336/0xa00
>     [<000000001c2241c0>] netlink_rcv_skb+0x11d/0x340
>     [<0000000010588814>] netlink_unicast+0x438/0x710
>     [<00000000e1a4cd5c>] netlink_sendmsg+0x788/0xc40
>     [<00000000e8992d4e>] sock_sendmsg+0xb0/0xe0
>     [<00000000621b8f91>] ____sys_sendmsg+0x4ff/0x6d0
>     [<000000000ea26996>] ___sys_sendmsg+0x12e/0x1b0
>     [<00000000684f7e25>] __sys_sendmsg+0xab/0x130
>     [<000000004538b104>] do_syscall_64+0x3d/0x90
>     [<0000000091ed9678>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> 
> Regards,
> Vlad

next prev parent reply	other threads:[~2022-07-05 14:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-04 15:47 Bridge VLAN memory leak Vlad Buslov
2022-07-05 13:55 ` Ido Schimmel [this message]
2022-07-05 14:46   ` Vlad Buslov
2022-07-05 15:42     ` Ido Schimmel
2022-07-05 16:25       ` Vlad Buslov

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:6e53dc99140 dfblob:9ffd40b8270 )
 OR (
bs:"Re: Bridge VLAN memory leak" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YsRCzTrajO5GZemf@shredder \
    --to=idosch@nvidia.com \
    --cc=maord@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=razor@blackwall.org \
    --cc=roopa@nvidia.com \
    --cc=vladbu@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.