netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ido Schimmel <idosch@idosch.org>
To: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	Florent Fourcot <florent.fourcot@wifirst.fr>,
	Hangbin Liu <liuhangbin@gmail.com>,
	Petr Machata <petrm@nvidia.com>, Jiri Pirko <jiri@resnulli.us>,
	Xin Long <lucien.xin@gmail.com>,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [BUG] kmemleak in rtnetlink_rcv() triggered by selftests/drivers/net/team in build cdc9718d5e59
Date: Sun, 9 Apr 2023 20:14:13 +0300	[thread overview]
Message-ID: <ZDLyZX545Cw+aLhE@shredder> (raw)
In-Reply-To: <78a8a03b-6070-3e6b-5042-f848dab16fb8@alu.unizg.hr>

On Sun, Apr 09, 2023 at 01:49:30PM +0200, Mirsad Goran Todorovac wrote:
> Hi all,
> 
> There appears to be a memleak triggered by the selftest drivers/net/team.

Thanks for the report. Not sure it's related to team, see below.

> 
> # cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff8c18def8ee00 (size 256):
>   comm "ip", pid 5727, jiffies 4294961159 (age 954.244s)
>   hex dump (first 32 bytes):
>     00 20 09 de 18 8c ff ff 00 00 00 00 00 00 00 00  . ..............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffffb60fb25c>] slab_post_alloc_hook+0x8c/0x3e0
>     [<ffffffffb6102b39>] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [<ffffffffb607684e>] kmalloc_trace+0x2e/0xc0
>     [<ffffffffb6dbc00b>] vlan_vid_add+0x11b/0x290
>     [<ffffffffb6dbcffc>] vlan_device_event+0x19c/0x880
>     [<ffffffffb5dde4d7>] raw_notifier_call_chain+0x47/0x70
>     [<ffffffffb6ab6940>] call_netdevice_notifiers_info+0x50/0xa0
>     [<ffffffffb6ac7574>] dev_open+0x94/0xa0
>     [<ffffffffc176515e>] 0xffffffffc176515e

Don't know what this is. Might be another issue.

>     [<ffffffffb6ada6b0>] do_set_master+0x90/0xb0
>     [<ffffffffb6adc5f4>] do_setlink+0x514/0x11f0
>     [<ffffffffb6ae4507>] __rtnl_newlink+0x4e7/0xa10
>     [<ffffffffb6ae4a8c>] rtnl_newlink+0x4c/0x70
>     [<ffffffffb6adf334>] rtnetlink_rcv_msg+0x184/0x5d0
>     [<ffffffffb6b6ad1e>] netlink_rcv_skb+0x5e/0x110
>     [<ffffffffb6ada0e9>] rtnetlink_rcv+0x19/0x20
> unreferenced object 0xffff8c18250d3700 (size 32):
>   comm "ip", pid 5727, jiffies 4294961159 (age 954.244s)
>   hex dump (first 32 bytes):
>     a0 ee f8 de 18 8c ff ff a0 ee f8 de 18 8c ff ff  ................
>     81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc  ................
>   backtrace:
>     [<ffffffffb60fb25c>] slab_post_alloc_hook+0x8c/0x3e0
>     [<ffffffffb6102b39>] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [<ffffffffb607684e>] kmalloc_trace+0x2e/0xc0
>     [<ffffffffb6dbc064>] vlan_vid_add+0x174/0x290
>     [<ffffffffb6dbcffc>] vlan_device_event+0x19c/0x880
>     [<ffffffffb5dde4d7>] raw_notifier_call_chain+0x47/0x70
>     [<ffffffffb6ab6940>] call_netdevice_notifiers_info+0x50/0xa0
>     [<ffffffffb6ac7574>] dev_open+0x94/0xa0
>     [<ffffffffc176515e>] 0xffffffffc176515e
>     [<ffffffffb6ada6b0>] do_set_master+0x90/0xb0
>     [<ffffffffb6adc5f4>] do_setlink+0x514/0x11f0
>     [<ffffffffb6ae4507>] __rtnl_newlink+0x4e7/0xa10
>     [<ffffffffb6ae4a8c>] rtnl_newlink+0x4c/0x70
>     [<ffffffffb6adf334>] rtnetlink_rcv_msg+0x184/0x5d0
>     [<ffffffffb6b6ad1e>] netlink_rcv_skb+0x5e/0x110
>     [<ffffffffb6ada0e9>] rtnetlink_rcv+0x19/0x20
> unreferenced object 0xffff8c1846e16800 (size 256):
>   comm "ip", pid 7837, jiffies 4295135225 (age 258.160s)
>   hex dump (first 32 bytes):
>     00 20 f7 de 18 8c ff ff 00 00 00 00 00 00 00 00  . ..............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffffb60fb25c>] slab_post_alloc_hook+0x8c/0x3e0
>     [<ffffffffb6102b39>] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [<ffffffffb607684e>] kmalloc_trace+0x2e/0xc0
>     [<ffffffffb6dbc00b>] vlan_vid_add+0x11b/0x290
>     [<ffffffffb6dbcffc>] vlan_device_event+0x19c/0x880
>     [<ffffffffb5dde4d7>] raw_notifier_call_chain+0x47/0x70
>     [<ffffffffb6ab6940>] call_netdevice_notifiers_info+0x50/0xa0
>     [<ffffffffb6ac7574>] dev_open+0x94/0xa0
>     [<ffffffffc177115e>] bond_enslave+0x34e/0x1840 [bonding]

This shows that the issue is related to the bond driver, not team.

>     [<ffffffffb6ada6b0>] do_set_master+0x90/0xb0
>     [<ffffffffb6adc5f4>] do_setlink+0x514/0x11f0
>     [<ffffffffb6ae4507>] __rtnl_newlink+0x4e7/0xa10
>     [<ffffffffb6ae4a8c>] rtnl_newlink+0x4c/0x70
>     [<ffffffffb6adf334>] rtnetlink_rcv_msg+0x184/0x5d0
>     [<ffffffffb6b6ad1e>] netlink_rcv_skb+0x5e/0x110
>     [<ffffffffb6ada0e9>] rtnetlink_rcv+0x19/0x20
> unreferenced object 0xffff8c184c5ff2a0 (size 32):

This is 'struct vlan_vid_info'

>   comm "ip", pid 7837, jiffies 4295135225 (age 258.160s)
>   hex dump (first 32 bytes):
>     a0 68 e1 46 18 8c ff ff a0 68 e1 46 18 8c ff ff  .h.F.....h.F....
>     81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc  ................
            ^ VLAN ID 0

>   backtrace:
>     [<ffffffffb60fb25c>] slab_post_alloc_hook+0x8c/0x3e0
>     [<ffffffffb6102b39>] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [<ffffffffb607684e>] kmalloc_trace+0x2e/0xc0
>     [<ffffffffb6dbc064>] vlan_vid_add+0x174/0x290
>     [<ffffffffb6dbcffc>] vlan_device_event+0x19c/0x880
>     [<ffffffffb5dde4d7>] raw_notifier_call_chain+0x47/0x70
>     [<ffffffffb6ab6940>] call_netdevice_notifiers_info+0x50/0xa0
>     [<ffffffffb6ac7574>] dev_open+0x94/0xa0
>     [<ffffffffc177115e>] bond_enslave+0x34e/0x1840 [bonding]
>     [<ffffffffb6ada6b0>] do_set_master+0x90/0xb0
>     [<ffffffffb6adc5f4>] do_setlink+0x514/0x11f0
>     [<ffffffffb6ae4507>] __rtnl_newlink+0x4e7/0xa10
>     [<ffffffffb6ae4a8c>] rtnl_newlink+0x4c/0x70
>     [<ffffffffb6adf334>] rtnetlink_rcv_msg+0x184/0x5d0
>     [<ffffffffb6b6ad1e>] netlink_rcv_skb+0x5e/0x110
>     [<ffffffffb6ada0e9>] rtnetlink_rcv+0x19/0x20

VLAN ID 0 is automatically added by the 8021q driver when a net device
is opened. In this case it's a device being enslaved to a bond. I
believe the issue was exposed by the new bond test that was added in
commit 222c94ec0ad4 ("selftests: bonding: add tests for ether type
changes") as part of v6.3-rc3.

The VLAN is supposed to be removed by the 8021q driver when a net device
is closed and the bond driver indeed calls dev_close() when a slave is
removed. However, this function is a NOP when 'IFF_UP' is not set.
Unfortunately, when a bond changes its type to Ethernet this flag is
incorrectly cleared in bond_ether_setup(), causing this VLAN to linger.
As far as I can tell, it's not a new issue.

Temporary fix is [1]. Please test it although we might end up with a
different fix (needs more thinking and it's already late here).

Reproduced using [2]. You can see in the before/after output how the
flag is cleared/retained [3].

[1]
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 236e5219c811..50dc068dc259 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1777,14 +1777,15 @@ void bond_lower_state_changed(struct slave *slave)
 
 /* The bonding driver uses ether_setup() to convert a master bond device
  * to ARPHRD_ETHER, that resets the target netdevice's flags so we always
- * have to restore the IFF_MASTER flag, and only restore IFF_SLAVE if it was set
+ * have to restore the IFF_MASTER flag, and only restore IFF_SLAVE and IFF_UP
+ * if they were set
  */
 static void bond_ether_setup(struct net_device *bond_dev)
 {
-	unsigned int slave_flag = bond_dev->flags & IFF_SLAVE;
+	unsigned int flags = bond_dev->flags & (IFF_SLAVE | IFF_UP);
 
 	ether_setup(bond_dev);
-	bond_dev->flags |= IFF_MASTER | slave_flag;
+	bond_dev->flags |= IFF_MASTER | flags;
 	bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 }
 
[2]
#!/bin/bash

ip link add name t-nlmon type nlmon
ip link add name t-dummy type dummy
ip link add name t-bond type bond mode active-backup

ip link set dev t-bond up
ip link set dev t-nlmon master t-bond
ip link set dev t-nlmon nomaster
ip link show dev t-bond
ip link set dev t-dummy master t-bond
ip link show dev t-bond

ip link del dev t-bond
ip link del dev t-dummy
ip link del dev t-nlmon

[3]
Before:

12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/netlink
12: t-bond: <BROADCAST,MULTICAST,MASTER,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ce:b2:31:0a:53:83 brd ff:ff:ff:ff:ff:ff

After:

12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/netlink
12: t-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5a:18:e7:85:11:73 brd ff:ff:ff:ff:ff:ff

  reply	other threads:[~2023-04-09 17:14 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-09 11:49 [BUG] kmemleak in rtnetlink_rcv() triggered by selftests/drivers/net/team in build cdc9718d5e59 Mirsad Goran Todorovac
2023-04-09 17:14 ` Ido Schimmel [this message]
2023-04-09 18:47   ` [BUG] [FIXED: TESTED] " Mirsad Goran Todorovac
2023-04-10 17:34     ` Mirsad Goran Todorovac
2023-04-10 17:49       ` [BUG] [TESTING: HANG] " Mirsad Goran Todorovac
2023-04-13 18:19       ` [BUG] [FIXED: TESTED] " Ido Schimmel
2023-04-21 11:35         ` Mirsad Goran Todorovac

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZDLyZX545Cw+aLhE@shredder \
    --to=idosch@idosch.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=florent.fourcot@wifirst.fr \
    --cc=hawk@kernel.org \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liuhangbin@gmail.com \
    --cc=lucien.xin@gmail.com \
    --cc=mirsad.todorovac@alu.unizg.hr \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=petrm@nvidia.com \
    --cc=razor@blackwall.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).