Subject: Re: [PATCH v2] notifier: Fix soft lockup for notifier_call_chain().
To: Eric Dumazet
CC: Eric Dumazet, "David S. Miller", Netdev, Cong Wang
From: Ding Tianhong
Date: Fri, 1 Jul 2016 11:06:11 +0800
Message-ID: <5775DE23.40101@huawei.com>
In-Reply-To: <1467095271.6850.192.camel@edumazet-glaptop3.roam.corp.google.com>
References: <57720369.6040507@huawei.com>
 <1467090824.6850.185.camel@edumazet-glaptop3.roam.corp.google.com>
 <5772149D.6040304@huawei.com>
 <1467094963.6850.189.camel@edumazet-glaptop3.roam.corp.google.com>
 <1467095271.6850.192.camel@edumazet-glaptop3.roam.corp.google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016/6/28 14:27, Eric Dumazet wrote:
> On Tue, 2016-06-28 at 08:22 +0200, Eric Dumazet wrote:
>
>> Follow the stack trace and add another cond_resched() where it is needed
>> then ?
>>
>> Lot of this code was written decade ago where nobody expected a root
>> user was going to try hard to crash its host ;)
>>
>> I did not check if the following is valid (Maybe __fib6_clean_all() is
>> called with some spinlock/rwlock held)
>
> Well, fib6_run_gc() can call it with
> spin_lock_bh(&net->ipv6.fib6_gc_lock) so this wont work.
>
> We need more invasive changes.
>

Hi Eric,

I debugged this problem and found that __fib6_clean_all() does not hold
the CPU for more than 1 second, even when there are a lot of IPv6
addresses to deal with. However, the notifier chain calls the IPv6
notifier several times and so holds the CPU for a long time. Adding
cond_resched() in addrconf_ifdown() would therefore solve the problem
correctly, so I think your first solution is the right way to fix this
bug.

Thanks,
Ding
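
For illustration only, here is a rough user-space sketch of the idea
described above. It is not the kernel patch and does not use kernel APIs:
every name in it (model_ifdown, model_call_chain, struct addr_list,
yield_count, the batch size of 1024) is hypothetical, and sched_yield()
merely stands in for cond_resched(). It shows a chain that invokes the
same long-running handler several times, with the handler giving up the
CPU periodically while it walks its address list.

```c
#include <sched.h>   /* sched_yield(): user-space stand-in for cond_resched() */
#include <stddef.h>

/* Hypothetical model types; these are NOT the kernel's notifier structures. */
typedef int (*notifier_fn)(unsigned long event, void *data);

struct notifier {
    notifier_fn call;
    struct notifier *next;
};

/* Stand-in for the set of addresses an "interface down" handler must process. */
struct addr_list {
    size_t n_addrs;    /* addresses handled per handler invocation */
    size_t processed;  /* running total across all invocations */
};

/* Counts how often the handler offered to give up the CPU. */
static unsigned long yield_count;

/* Assumed batch size; an arbitrary choice for this sketch. */
#define YIELD_BATCH 1024

/* Models a handler that walks many addresses and yields between batches. */
static int model_ifdown(unsigned long event, void *data)
{
    struct addr_list *list = data;
    (void)event;

    for (size_t i = 0; i < list->n_addrs; i++) {
        list->processed++;              /* stand-in for per-address teardown */
        if ((i + 1) % YIELD_BATCH == 0) {
            sched_yield();              /* let other runnable tasks proceed */
            yield_count++;
        }
    }
    return 0;
}

/* Models the chain walk: calls every registered handler, returns the count. */
static int model_call_chain(struct notifier *head, unsigned long event,
                            void *data)
{
    int calls = 0;

    for (struct notifier *nb = head; nb != NULL; nb = nb->next) {
        nb->call(event, data);
        calls++;
    }
    return calls;
}
```

In this model the yield point sits inside the handler, mirroring the
suggestion to add cond_resched() in addrconf_ifdown(): each invocation
stays short from the scheduler's point of view even when the chain calls
the handler several times.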