From: Rafał Miłecki
Subject: Re: Repeating "unregister_netdevice: waiting for lo to become free" caused by upstream 76da0704507bb ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER")
Date: Wed, 25 Apr 2018 16:44:58 +0200
Message-ID: <07b74ef0-5ce6-b391-7b0f-59685350e802@gmail.com>
References: <228a9486-06af-2cbb-d4b9-677642f2d754@gmail.com> <5e44211e-7056-96ee-0d12-4f6dc8b22734@yandex-team.ru>
Cc: Greg Kroah-Hartman, Stable, Dan Streetman, Dan Streetman, Mathias Tillman
To: Konstantin Khlebnikov, WANG Cong, "David S. Miller", Alexey Kuznetsov, Hideaki YOSHIFUJI, Network Development, jeffy, David Ahern
In-Reply-To: <5e44211e-7056-96ee-0d12-4f6dc8b22734@yandex-team.ru>
List-Id: netdev.vger.kernel.org

On 25.04.2018 16:30, Konstantin Khlebnikov wrote:
> On 25.04.2018 17:16, Rafał Miłecki wrote:
>> On 23.04.2018 15:08, Rafał Miłecki wrote:
>>> I've just updated my 4.4.x kernel and noticed a regression. Bisecting
>>> pointed me to commit 2417da3f4d6bc ("ipv6: only call
>>> ip6_route_dev_notify() once for NETDEV_UNREGISTER") [0], which is a
>>> backport of upstream 76da0704507bb. That backported commit
>>> appeared in 4.4.103.
>>>
>>> I use the OpenWrt/LEDE [1] distribution and LXC [2] 1.1.5. After
>>> stopping a container I start getting these messages:
>>> [ 229.419188] unregister_netdevice: waiting for lo to become free. Usage count = 1
>>> [ 239.660408] unregister_netdevice: waiting for lo to become free. Usage count = 1
>>> [ 249.839189] unregister_netdevice: waiting for lo to become free. Usage count = 1
>>> (...)
>>>
>>> Trying to start LXC anyway makes the lxc-start command hang around
>>> network configuration. Trying to query the LXC state afterwards makes
>>> the lxc-info command hang too.
>>>
>>> I tried Googling for this issue and found similar reports:
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729637
>>> https://github.com/fnproject/fn/issues/686
>>> https://lime-technology.com/forums/topic/66863-kernelunregister_netdevice-waiting-for-lo-to-become-free-usage-count-1/
>>> All of them are related to Docker, which is probably a similar use
>>> case to LXC.
>>>
>>> I couldn't find any reference to commit 76da0704507bb suggesting a
>>> fix for the problem I'm seeing.
>>>
>>> Does anyone have an idea what the issue I'm seeing is about? Or, even
>>> better, how to fix it? Can I provide any additional info that would
>>> help?
>>>
>>>
>>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=2417da3f4d6bc4fc6c77f613f0e2264090892aa5
>>> [1] https://openwrt.org/
>>> [2] https://linuxcontainers.org/
>>
>> Today I tried 4.14.34 to see if that helps. Unfortunately, it doesn't. I
>> still experience the same problem.
>>
>> From reading various reports regarding that "unregister_netdevice:
>> waiting for lo to become free" message, it appears the problem is caused
>> by a leaking dst refcnt somewhere in the kernel code.
>>
>> I found links to a few commits fixing leaks in various places:
>> 4a31a6b19f9dd ("sctp: fix dst refcnt leak in sctp_v4_get_dst")
>> 957d761cf91cd ("sctp: fix dst refcnt leak in sctp_v6_get_dst()")
>> 4ee806d51176b ("net: tcp: close sock if net namespace is exiting")
>> d747a7a51b009 ("tcp: reset sk_rx_dst in tcp_disconnect()")
>> 751eb6b6042a5 ("ipv6: addrconf: fix dev refcont leak when DAD failed")
>>
>> All of the above patches are present in linux-4.4.y and are part of the
>> 4.4.124 kernel I use. So it seems I'm facing yet another dst refcnt leak.
>>
>> Could commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once
>> for NETDEV_UNREGISTER") introduce a new dst refcnt leak? Or does it only
>> expose an existing one?
>
> Mathias Tillman reported this as "4.4.103 linux kernel regression".
> The last message in that thread (which I couldn't find in the mailing list archives) said:
> | As it turns out, it's due to a patch in the Turris Omnia/OpenWRT code that adds an in6_dev_get call without calling in6_dev_put.

Wow, this is very helpful, thank you! Somehow I didn't even think about
OpenWrt downstream patches. Too bad this wasn't reported to the OpenWrt
community; I spent 2 days on this.

There is indeed:
target/linux/generic/patches-4.4/670-ipv6-allow-rejecting-with-source-address-failed-policy.patch
[PATCH 1/2] ipv6: allow rejecting with "source address failed policy"

I'll move this discussion to OpenWrt/LEDE now; I hope we can sort it out there.
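To make the failure mode above concrete, here is a minimal userspace sketch of an unbalanced get/put. It is only a model under stated assumptions: `struct model_dev`, `model_get()`, `model_put()`, `balanced_path()`, `leaky_path()` and `model_unregister()` are hypothetical names, not kernel APIs. `model_get()`/`model_put()` merely stand in for the in6_dev_get()/in6_dev_put() pairing, and a count of 0 is taken to mean "no outstanding users". A path that takes a reference and returns without dropping it leaves the count stuck at 1, so the wait loop can never finish, which mirrors the repeating "Usage count = 1" message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a refcounted device; NOT a kernel struct.
 * Assumption: refcnt == 0 means nobody holds a reference any more. */
struct model_dev {
    const char *name;
    int refcnt;
};

/* Stand-in for a "get" such as in6_dev_get(): take a reference. */
void model_get(struct model_dev *dev)
{
    dev->refcnt++;
}

/* Stand-in for a "put" such as in6_dev_put(): drop a reference.
 * The assert catches a put without a matching get (underflow). */
void model_put(struct model_dev *dev)
{
    assert(dev->refcnt > 0);
    dev->refcnt--;
}

/* Correct pattern: every get is paired with a put on the same path. */
void balanced_path(struct model_dev *dev)
{
    model_get(dev);
    /* ... use dev ... */
    model_put(dev);
}

/* Buggy pattern: the reference is taken but never released. */
void leaky_path(struct model_dev *dev)
{
    model_get(dev);
    /* ... use dev, then return without calling model_put() ... */
}

/* Simplified stand-in for an unregister wait loop: polls up to max_polls
 * times, printing a message while references remain.
 * Returns true if the count reached 0 (device could be freed). */
bool model_unregister(struct model_dev *dev, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (dev->refcnt == 0)
            return true;
        printf("unregister: waiting for %s to become free. Usage count = %d\n",
               dev->name, dev->refcnt);
    }
    return false;
}
```

With a fresh `struct model_dev dev = { "lo", 0 };`, calling `leaky_path(&dev)` and then `model_unregister(&dev, 3)` prints the waiting message three times and returns false; after `balanced_path(&dev)` on a fresh device, `model_unregister()` returns true on the first poll.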