From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752751AbcEQTPd (ORCPT <rfc822;w@1wt.eu>);
	Tue, 17 May 2016 15:15:33 -0400
Received: from e37.co.us.ibm.com ([32.97.110.158]:52400 "EHLO
	e37.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751218AbcEQTPa (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 17 May 2016 15:15:30 -0400
X-IBM-Helo: d03dlp02.boulder.ibm.com
X-IBM-MailFrom: paulmck@linux.vnet.ibm.com
X-IBM-RcptTo: linux-kernel@vger.kernel.org
Date: Tue, 17 May 2016 12:15:29 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "santosh.shilimkar@oracle.com" <santosh.shilimkar@oracle.com>
Cc: linux-kernel@vger.kernel.org, Sasha Levin <sasha.levin@oracle.com>
Subject: Re: [rcu_sched stall] regression/miss-config ?
Message-ID: <20160517191529.GK3528@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <b074c577-03ee-ea23-2bee-79ff2f317c6c@oracle.com>
 <c83a6c22-d650-0a89-8b05-1ed6080d36de@oracle.com>
 <20160516120329.GB3528@linux.vnet.ibm.com>
 <3d5a2847-86d2-cd15-a7e8-8f4b2ee5a64d@oracle.com>
 <20160516173401.GG3528@linux.vnet.ibm.com>
 <67eb4bf6-c3d2-b9af-30ff-713a6d75e773@oracle.com>
 <20160517005820.GI3528@linux.vnet.ibm.com>
 <ae356fa9-6eb1-3e39-67ab-f1ad831205f9@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ae356fa9-6eb1-3e39-67ab-f1ad831205f9@oracle.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16051719-0025-0000-0000-000040C32714
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 17, 2016 at 06:46:22AM -0700, santosh.shilimkar@oracle.com wrote:
> On 5/16/16 5:58 PM, Paul E. McKenney wrote:
> >On Mon, May 16, 2016 at 12:49:41PM -0700, Santosh Shilimkar wrote:
> >>On 5/16/2016 10:34 AM, Paul E. McKenney wrote:
> >>>On Mon, May 16, 2016 at 09:33:57AM -0700, Santosh Shilimkar wrote:
> 
> [...]
> 
> >>>Are you running CONFIG_NO_HZ_FULL=y?  If so, the problem might be that
> >>>you need more housekeeping CPUs than you currently have configured.
> >>>
> >>Yes, CONFIG_NO_HZ_FULL=y. Do you mean "CONFIG_NO_HZ_FULL_ALL=y" for
> >>bookkeeping? It seems that without it, the clock-event code will just
> >>use CPU0 for things like broadcasting, which might become a bottleneck.
> >>This could explain the hrtimer_interrupt() path getting slowed
> >>down because of a bookkeeping bottleneck.
> >>
> >>$ grep NO_HZ .config
> >>CONFIG_NO_HZ_COMMON=y
> >># CONFIG_NO_HZ_IDLE is not set
> >>CONFIG_NO_HZ_FULL=y
> >># CONFIG_NO_HZ_FULL_ALL is not set
> >># CONFIG_NO_HZ_FULL_SYSIDLE is not set
> >>CONFIG_NO_HZ=y
> >># CONFIG_RCU_FAST_NO_HZ is not set
> >
> >Yes, CONFIG_NO_HZ_FULL_ALL=y would give you only one CPU for all
> >housekeeping tasks, including the RCU grace-period kthreads.  So you are
> >booting without any nohz_full boot parameter?  You can end up with the
> >same problem with CONFIG_NO_HZ_FULL=y and the nohz_full boot parameter
> >that you can with CONFIG_NO_HZ_FULL_ALL=y.
> >
> I see. Yes, the systems are booting without the nohz_full boot parameter.
> I will try to add more housekeeping CPUs and update the thread
> after verification, since it takes time to reproduce the issue.
> 
> Thanks for the discussion so far, Paul. It's very insightful for me.

Please let me know how things go with further testing, especially with
the priority setting.
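
On the housekeeping question quoted above, here is a minimal sketch
(Python, purely illustrative) of which CPUs are left for housekeeping
for a given nohz_full= list.  The helper names parse_cpulist() and
housekeeping_cpus() are hypothetical, and the parser assumes plain
"a-b,c" lists only, not the kernel's full cpulist syntax.

```python
def parse_cpulist(text):
    """Parse a simple CPU list such as "1-3,5" into a set of ints.

    Assumption: only comma-separated single CPUs and "lo-hi" ranges
    are handled; stride/group forms are not supported here.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list or stray commas
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def housekeeping_cpus(nr_cpus, nohz_full):
    """Return the sorted CPUs that are not in the nohz_full list."""
    return sorted(set(range(nr_cpus)) - parse_cpulist(nohz_full))
```

For example, housekeeping_cpus(8, "1-7") leaves only CPU 0, which is
the single-housekeeping-CPU situation described above, whereas
housekeeping_cpus(8, "2-7") leaves CPUs 0 and 1.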

							Thanx, Paul