From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:58671 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751176AbcLECGq (ORCPT ); Sun, 4 Dec 2016 21:06:46 -0500
Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.17/8.16.0.17) with SMTP id uB525HaJ034449 for ; Sun, 4 Dec 2016 21:06:45 -0500
Received: from e36.co.us.ibm.com (e36.co.us.ibm.com [32.97.110.154]) by mx0a-001b2d01.pphosted.com with ESMTP id 2746ht0md1-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Sun, 04 Dec 2016 21:06:45 -0500
Received: from localhost by e36.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Sun, 4 Dec 2016 19:06:44 -0700
Date: Sun, 4 Dec 2016 18:06:47 -0800
From: "Paul E. McKenney"
To: Ding Tianhong
Cc: gregkh@linuxfoundation.org, dhaval.giani@oracle.com, stable@vger.kernel.org, stable-commits@vger.kernel.org
Subject: Re: Patch "rcu: Fix soft lockup for rcu_nocb_kthread" has been added to the 4.8-stable tree
Reply-To: paulmck@linux.vnet.ibm.com
References: <148075520919123@kroah.com> <32e0a260-b726-08b3-9124-99bdc1045278@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <32e0a260-b726-08b3-9124-99bdc1045278@huawei.com>
Message-Id: <20161205020647.GX3924@linux.vnet.ibm.com>
Sender: stable-owner@vger.kernel.org
List-ID:

Sorry, Ding, but this patch has been shown to fix other people's problems
as well as a few of yours. The fact that it doesn't solve -all- your
problems is absolutely no reason for you to stand in the way of its
solving these other people's problems.

Greg, please do add this patch to your -stable trees.
Thanx, Paul

On Mon, Dec 05, 2016 at 09:30:22AM +0800, Ding Tianhong wrote:
> Hi Greg:
>
> Please don't add this patch to the stable tree. I am still discussing this problem with Paul, and the original solution may not be
> the best one. I need more time to check, thanks.
>
> Ding
>
> On 2016/12/3 16:53, gregkh@linuxfoundation.org wrote:
> >
> > This is a note to let you know that I've just added the patch titled
> >
> >     rcu: Fix soft lockup for rcu_nocb_kthread
> >
> > to the 4.8-stable tree which can be found at:
> >     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> >
> > The filename of the patch is:
> >     rcu-fix-soft-lockup-for-rcu_nocb_kthread.patch
> > and it can be found in the queue-4.8 subdirectory.
> >
> > If you, or anyone else, feels it should not be added to the stable tree,
> > please let know about it.
> >
> >
> >>From bedc1969150d480c462cdac320fa944b694a7162 Mon Sep 17 00:00:00 2001
> > From: Ding Tianhong
> > Date: Wed, 15 Jun 2016 15:27:36 +0800
> > Subject: rcu: Fix soft lockup for rcu_nocb_kthread
> >
> > From: Ding Tianhong
> >
> > commit bedc1969150d480c462cdac320fa944b694a7162 upstream.
> >
> > Carrying out the following steps results in a softlockup in the
> > RCU callback-offload (rcuo) kthreads:
> >
> > 1. Connect to ixgbevf, and set the speed to 10Gb/s.
> > 2. Use ifconfig to bring the nic up and down repeatedly.
> >
> > [ 317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
> > [ 368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
> > [ 368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [ 368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
> > [ 368.106005] RIP: 0010:[] [] fib_table_lookup+0x14/0x390
> > [ 368.106005] RSP: 0018:ffff88061fc83ce8 EFLAGS: 00000286
> > [ 368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
> > [ 368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
> > [ 368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
> > [ 368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
> > [ 368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
> > [ 368.106005] FS: 0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
> > [ 368.106005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
> > [ 368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [ 368.106005] Stack:
> > [ 368.106005] 00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
> > [ 368.106005] ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
> > [ 368.106005] ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
> > [ 368.106005] Call Trace:
> > [ 368.106005]
> > [ 368.106005]
> > [ 368.106005] [] ip_route_input_noref+0x516/0xbd0
> > [ 368.106005] [] ? skb_release_data+0xd6/0x110
> > [ 368.106005] [] ? kfree_skb+0x3a/0xa0
> > [ 368.106005] [] ip_rcv_finish+0x29f/0x350
> > [ 368.106005] [] ip_rcv+0x234/0x380
> > [ 368.106005] [] __netif_receive_skb_core+0x676/0x870
> > [ 368.106005] [] __netif_receive_skb+0x18/0x60
> > [ 368.106005] [] process_backlog+0xae/0x180
> > [ 368.106005] [] net_rx_action+0x152/0x240
> > [ 368.106005] [] __do_softirq+0xef/0x280
> > [ 368.106005] [] call_softirq+0x1c/0x30
> > [ 368.106005]
> > [ 368.106005]
> > [ 368.106005] [] do_softirq+0x65/0xa0
> > [ 368.106005] [] local_bh_enable+0x94/0xa0
> > [ 368.106005] [] rcu_nocb_kthread+0x232/0x370
> > [ 368.106005] [] ? wake_up_bit+0x30/0x30
> > [ 368.106005] [] ? rcu_start_gp+0x40/0x40
> > [ 368.106005] [] kthread+0xcf/0xe0
> > [ 368.106005] [] ? kthread_create_on_node+0x140/0x140
> > [ 368.106005] [] ret_from_fork+0x58/0x90
> > [ 368.106005] [] ? kthread_create_on_node+0x140/0x140
> >
> > ==================================cut here==============================
> >
> > It turns out that the rcuos callback-offload kthread is busy processing
> > a very large quantity of RCU callbacks, and it is not relinquishing the
> > CPU while doing so. This commit therefore adds a cond_resched_rcu_qs()
> > within the loop to allow other tasks to run.
> >
> > Signed-off-by: Ding Tianhong
> > [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
> > Signed-off-by: Paul E. McKenney
> > Cc: Dhaval Giani
> > Signed-off-by: Greg Kroah-Hartman
> >
> > ---
> >  kernel/rcu/tree_plugin.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
> >              cl++;
> >              c++;
> >              local_bh_enable();
> > +            cond_resched_rcu_qs();
> >              list = next;
> >          }
> >          trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> >
> >
> > Patches currently in stable-queue which might be from dingtianhong@huawei.com are
> >
> > queue-4.8/rcu-fix-soft-lockup-for-rcu_nocb_kthread.patch
> >
> > .
> >
>
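
For readers following the thread, the shape of the fix described in the quoted commit message can be sketched in userspace C. This is only a rough analogy under stated assumptions, not the kernel code: struct callback, count_cb() and drain_callbacks() are hypothetical names made up for illustration, and POSIX sched_yield() stands in for the kernel's cond_resched_rcu_qs(), which also notes a quiescent state for RCU (something this sketch does not model).

```c
#include <sched.h>   /* sched_yield() */
#include <stddef.h>  /* NULL, size_t */

/* Hypothetical stand-in for a queued callback; not the kernel's struct rcu_head. */
struct callback {
    struct callback *next;
    void (*func)(struct callback *cb);
};

/* Example callback for demonstration: counts how many times it was invoked. */
static int invoked;

static void count_cb(struct callback *cb)
{
    (void)cb;
    invoked++;
}

/*
 * Invoke every callback on the list, offering the CPU to other
 * runnable tasks after each one. Returns the number of callbacks invoked.
 */
static size_t drain_callbacks(struct callback *list)
{
    size_t count = 0;

    while (list != NULL) {
        /* Read the link first: func() may free or reuse the node. */
        struct callback *next = list->next;

        list->func(list);
        count++;

        /*
         * Assumption: sched_yield() plays the role of the yield point
         * the patch adds inside the callback-invocation loop.
         */
        sched_yield();

        list = next;
    }
    return count;
}
```

A caller would link a few struct callback nodes together and pass the head to drain_callbacks(); however long the list is, the loop gives other tasks a chance to run between callbacks instead of holding the CPU until the list is empty.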