netdev.vger.kernel.org archive mirror
From: <Joe.Ghalam@dell.com>
To: <maheshb@google.com>
Cc: <herbert@gondor.apana.org.au>, <davem@davemloft.net>,
	<Clifford.Wichmann@dell.com>, <netdev@vger.kernel.org>
Subject: Re: macvlan: Fix device ref leak when purging bc_queue
Date: Fri, 21 Apr 2017 20:37:56 +0000
Message-ID: <1492807076414.83805@Dell.com>
In-Reply-To: <CAF2d9jis_NcDO4bfOuf1OxPf_1XvOJQA1eo0OKW5SEk2O-Ny0A@mail.gmail.com>

________________________________________
> From: Mahesh Bandewar (महेश बंडेवार) <maheshb@google.com>
> Sent: Friday, April 21, 2017 12:23 PM
> To: Ghalam, Joe
> Cc: herbert@gondor.apana.org.au; David Miller; Wichmann, Clifford; linux-netdev
> Subject: Re: macvlan: Fix device ref leak when purging bc_queue

> Maybe the system is busy and the snapshot is too small, and eventually
> process_broadcast() should get called. Deleting a slave does nothing
> about cancelling the work-queue, so it would happen eventually.

> The change that Herbert proposed is correct. When packets are enqueued
> for later processing, a dev reference is taken, and it is released when
> the packet is processed once the work gets scheduled. The backlog is per
> port, so it makes sense to drop the reference(s) before purging the queue
> prior to deleting the port.

I only included the snapshot of the logs that is relevant. The system in question has been left in that state for hours without process_broadcast() ever being called. And yes, I did check the CPU load: the system was running at around 20% load, so I don't think a busy system explains it. I would suggest taking a closer look at the code in macvlan_dellink(), where it performs the unlink and unregister:

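void macvlan_dellink(struct net_device *dev, struct list_head *head)
{
	struct macvlan_dev *vlan = netdev_priv(dev);

	/* Remove this macvlan from the port's list of devices. */
	list_del_rcu(&vlan->list);
	/* Queue the macvlan device for unregistration. */
	unregister_netdevice_queue(dev, head);
	/* Drop the upper/lower adjacency with the lower device. */
	netdev_upper_dev_unlink(vlan->lowerdev, dev);
}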

As I stated in my initial reply to Herbert, the code change he suggested is correct and needed, but it is not enough. We have tested with his change and observed the same behavior. I can guarantee you that the change to macvlan_port_destroy() has no effect on this issue, since macvlan_port_destroy() is not even called during the operation.
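
For anyone following along, here is a rough user-space sketch of the reference accounting discussed in the quoted text: a reference is taken per enqueued packet, and a purge has to drop those references before freeing the packets. This is not the kernel code. All names (model_dev, model_pkt, model_queue, model_enqueue, model_purge_leaky, model_purge_fixed) are made up for illustration, the plain integer counter only stands in for what dev_hold()/dev_put() would do in the kernel, and it models only the queue-purge path, not the macvlan_dellink() path this message is about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device: only the refcount matters. */
struct model_dev {
	int refcnt;
};

/* Hypothetical stand-in for a packet sitting on the per-port backlog. */
struct model_pkt {
	struct model_dev *dev;	/* device whose reference this packet holds */
	struct model_pkt *next;
};

struct model_queue {
	struct model_pkt *head;
};

/* Enqueue: take one device reference per queued packet. */
static int model_enqueue(struct model_queue *q, struct model_dev *dev)
{
	struct model_pkt *pkt = malloc(sizeof(*pkt));

	if (!pkt)
		return -1;
	dev->refcnt++;
	pkt->dev = dev;
	pkt->next = q->head;
	q->head = pkt;
	return 0;
}

/* Leaky purge: frees the packets but never drops their references. */
static void model_purge_leaky(struct model_queue *q)
{
	while (q->head) {
		struct model_pkt *pkt = q->head;

		q->head = pkt->next;
		free(pkt);
	}
}

/* Fixed purge: drops each packet's reference before freeing it. */
static void model_purge_fixed(struct model_queue *q)
{
	while (q->head) {
		struct model_pkt *pkt = q->head;

		q->head = pkt->next;
		assert(pkt->dev->refcnt > 0);
		pkt->dev->refcnt--;
		free(pkt);
	}
}
```

With two packets queued against one device, the leaky purge leaves the model's counter at 2 while the fixed purge brings it back to 0. In the model, a device whose counter never returns to 0 corresponds to one that can never finish unregistering.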

Here is a stack trace that I forced in order to show the removal call path:
Apr 20 06:23:40 OS10 kernel:  [<ffffffff810d312c>] __netdev_adjacent_dev_remove+0x3c/0x1a0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81bb6e87>] __netdev_adjacent_dev_unlink_lists+0x67/0x69
Apr 20 06:23:40 OS10 kernel:  [<ffffffff810d32a0>] __netdev_adjacent_dev_unlink+0x82/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff811d31e0>] netdev_upper_dev_unlink+0x10/0x20
Apr 20 06:23:40 OS10 kernel:  [<ffffffff8180e770>] macvlan_dellink+0x50/0x130
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2ca27>] rtnl_dellink+0xb7/0x120
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a609ab>] ? __netlink_ns_capable+0x3b/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2a6c5>] rtnetlink_rcv_msg+0x95/0x250
Apr 20 06:23:40 OS10 kernel:  [<ffffffff811c1499>] ? zone_statistics+0x89/0xa0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a0a9de>] ? __alloc_skb+0x7e/0x2a0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2a630>] ? rtnetlink_rcv+0x30/0x30
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a64f59>] netlink_rcv_skb+0xa9/0xc0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2a628>] rtnetlink_rcv+0x28/0x30
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a64603>] netlink_unicast+0xf3/0x200
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a64a1e>] netlink_sendmsg+0x30e/0x680
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a014fb>] sock_sendmsg+0x8b/0xc0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a011ee>] ? move_addr_to_kernel.part.18+0x1e/0x60
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a01ff1>] ? move_addr_to_kernel+0x21/0x30
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a018f6>] ___sys_sendmsg+0x376/0x390
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a0019f>] ? sock_destroy_inode+0x2f/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff810a161c>] ? __do_page_fault+0x20c/0x560
Apr 20 06:23:40 OS10 kernel:  [<ffffffff812279ad>] ? dput+0xad/0x180
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81230a74>] ? mntput+0x24/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81212a50>] ? __fput+0x190/0x220
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a026b2>] __sys_sendmsg+0x42/0x80
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a02702>] SyS_sendmsg+0x12/0x20
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81bc86cd>] system_call_fast_compare_end+0x10/0x15


Thread overview: 13+ messages
     [not found] <1492618528011.11322@Dell.com>
2017-04-20 12:55 ` macvlan: Fix device ref leak when purging bc_queue Herbert Xu
2017-04-20 16:09   ` Joe.Ghalam
2017-04-21  4:40     ` Herbert Xu
2017-04-21 14:40       ` Joe.Ghalam
2017-04-21 15:02         ` David Miller
2017-04-21 19:23         ` Mahesh Bandewar (महेश बंडेवार)
2017-04-21 20:37           ` Joe.Ghalam [this message]
2017-04-24  7:56         ` Herbert Xu
2017-04-24 15:01           ` Joe.Ghalam
2017-04-24 15:10             ` David Miller
2017-04-24 15:30               ` Joe.Ghalam
2017-04-25 14:42   ` David Miller
2017-04-25 15:19     ` Joe.Ghalam
