From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoph Paasch
Subject: Re: [PATCH net] netlink: make sure -EBUSY won't escape from netlink_insert
Date: Wed, 16 Sep 2015 22:41:14 -0700
Message-ID: <20150917054114.GN1983@Chimay.local>
References: <20150810.110015.1616067694505641781.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: daniel@iogearbox.net, herbert@gondor.apana.org.au, tgraf@suug.ch, torvalds@linux-foundation.org, netdev@vger.kernel.org
To: David Miller
Return-path:
Received: from mail-pa0-f51.google.com ([209.85.220.51]:33943 "EHLO mail-pa0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753424AbbIQFlP (ORCPT ); Thu, 17 Sep 2015 01:41:15 -0400
Received: by padhy16 with SMTP id hy16so10422016pad.1 for ; Wed, 16 Sep 2015 22:41:15 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <20150810.110015.1616067694505641781.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

Hello,

On 10/08/15 - 11:00:15, David Miller wrote:
> From: Daniel Borkmann
> Date: Fri, 7 Aug 2015 00:26:41 +0200
>
> > Linus reports the following deadlock on rtnl_mutex; triggered only
> > once so far (extract):
> ...
> > It seems so far plausible that the recursive call into rtnetlink_rcv()
> > looks suspicious. One way, where this could trigger is that the senders
> > NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so
> > the rtnl_getlink() request's answer would be sent to the kernel instead
> > to the actual user process, thus grabbing rtnl_mutex() twice.
> >
> > One theory would be that netlink_autobind() triggered via netlink_sendmsg()
> > internally overwrites the -EBUSY error to 0, but where it is wrongly
> > originating from __netlink_insert() instead. That would reset the
> > socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
> > later on. As commit d470e3b483dc ("[NETLINK]: Fix two socket hashing bugs.")
> > also puts it, -EBUSY should not be propagated from netlink_insert().
> >
> > It looks like it's very unlikely to reproduce. We need to trigger the
> > rhashtable_insert_rehash() handler under a situation where rehashing
> > currently occurs (one /rare/ way would be to hit ht->elasticity limits
> > while not filled enough to expand the hashtable, but that would rather
> > require a specifically crafted bind() sequence with knowledge about
> > destination slots, seems unlikely). It probably makes sense to guard
> > __netlink_insert() in any case and remap that error. It was suggested
> > that EOVERFLOW might be better than an already overloaded ENOMEM.
> >
> > Reference: http://thread.gmane.org/gmane.linux.network/372676
> > Reported-by: Linus Torvalds
> > Signed-off-by: Daniel Borkmann
>
> Applied and queued up for -stable, thanks.

can this patch get queued up for 4.1 as well? It seems to fix a similar
issue in 4.1.6.

Thanks,
Christoph