Date: Thu, 19 May 2016 21:05:27 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Santosh Shilimkar
Cc: linux-kernel@vger.kernel.org, Sasha Levin
Subject: Re: [rcu_sched stall] regression/miss-config ?
Message-ID: <20160520040527.GZ3528@linux.vnet.ibm.com>
In-Reply-To: <140d3624-7167-3e72-3ef2-a4da47ce986c@oracle.com>
References: <20160516120329.GB3528@linux.vnet.ibm.com>
 <3d5a2847-86d2-cd15-a7e8-8f4b2ee5a64d@oracle.com>
 <20160516173401.GG3528@linux.vnet.ibm.com>
 <67eb4bf6-c3d2-b9af-30ff-713a6d75e773@oracle.com>
 <20160517005820.GI3528@linux.vnet.ibm.com>
 <20160517191529.GK3528@linux.vnet.ibm.com>
 <140d3624-7167-3e72-3ef2-a4da47ce986c@oracle.com>

On Thu, May 19, 2016 at 04:45:26PM -0700, Santosh Shilimkar wrote:
> Hi Paul,
>
> On 5/17/2016 12:15 PM, Paul E. McKenney wrote:
> >On Tue, May 17, 2016 at 06:46:22AM -0700, santosh.shilimkar@oracle.com wrote:
> >>On 5/16/16 5:58 PM, Paul E. McKenney wrote:
> >>>On Mon, May 16, 2016 at 12:49:41PM -0700, Santosh Shilimkar wrote:
> >>>>On 5/16/2016 10:34 AM, Paul E. McKenney wrote:
> >>>>>On Mon, May 16, 2016 at 09:33:57AM -0700, Santosh Shilimkar wrote:
> >>
> >>[...]
> >>
> >>>>>Are you running CONFIG_NO_HZ_FULL=y?  If so, the problem might be that
> >>>>>you need more housekeeping CPUs than you currently have configured.
> >>>>>
> >>>>Yes, CONFIG_NO_HZ_FULL=y.  Do you mean "CONFIG_NO_HZ_FULL_ALL=y" for
> >>>>housekeeping?  It seems that without it, the clock-event code will just
> >>>>use CPU0 for things like broadcasting, which might become a bottleneck.
> >>>>This could explain the hrtimer_interrupt() path getting slowed down
> >>>>because of the housekeeping bottleneck.
> >>>>
> >>>>$ cat .config | grep NO_HZ
> >>>>CONFIG_NO_HZ_COMMON=y
> >>>># CONFIG_NO_HZ_IDLE is not set
> >>>>CONFIG_NO_HZ_FULL=y
> >>>># CONFIG_NO_HZ_FULL_ALL is not set
> >>>># CONFIG_NO_HZ_FULL_SYSIDLE is not set
> >>>>CONFIG_NO_HZ=y
> >>>># CONFIG_RCU_FAST_NO_HZ is not set
> >>>
> >>>Yes, CONFIG_NO_HZ_FULL_ALL=y would give you only one CPU for all
> >>>housekeeping tasks, including the RCU grace-period kthreads.  So you are
> >>>booting without any nohz_full boot parameter?  You can end up with the
> >>>same problem with CONFIG_NO_HZ_FULL=y and the nohz_full boot parameter
> >>>as you can with CONFIG_NO_HZ_FULL_ALL=y.
> >>>
> >>I see.  Yes, the systems are booting without the nohz_full boot parameter.
> >>I will try to add more housekeeping CPUs and update the thread after
> >>verification, since it takes time to reproduce the issue.
> >>
> >>Thanks for the discussion so far, Paul.  It has been very insightful for me.
> >
> >Please let me know how things go with further testing, especially with
> >the priority setting.
> >
> Sorry for the delay.  I managed to get information about the XEN use case's
> custom config as discussed above.  To reduce variables, I disabled
> CONFIG_NO_HZ_FULL altogether.
> So the effective setting was:
>
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> CONFIG_TREE_RCU_TRACE=y
> CONFIG_RCU_KTHREAD_PRIO=1
> CONFIG_RCU_CPU_STALL_TIMEOUT=21
> CONFIG_RCU_TRACE=y
>
> Unfortunately, the XEN test still failed; the log is at the end of this
> email.  This test is a bit peculiar, though, since it is a database
> running in a VM with 1 or 2 CPUs.  One suspicion is that because the
> database's RT processes are hogging the CPU(s), the kernel's RCU kthread
> is not getting a chance to run, which eventually results in a stall.
> Does that make sense?
>
> Please note that it is a non-preempt kernel running RT processes. ;-)

If you have enough real-time processes to consume all CPUs, you will
indeed starve the grace-period kthread, so what you see below would then
be expected behavior.  Try setting CONFIG_RCU_KTHREAD_PRIO to be larger
than the real-time priority of your processes.

							Thanx, Paul

> # cat .config | grep PREEMPT
> CONFIG_PREEMPT_NOTIFIERS=y
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set
>
> Regards,
> Santosh
>
> ...
> ....
>
> rcu_sched kthread starved for 399032 jiffies!
> INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0,
>  t=462037 jiffies, g=118888, c=118887, q=0)
> All QSes seen, last rcu_sched kthread activity 462037
>  (4296277632-4295815595), jiffies_till_next_fqs=3, root ->qsmask 0x0
> ocssd.bin     R  running task    0 15375     1 0x00000000
>  0000000000000000 ffff8800ec003bc8 ffffffff810a8581 ffffffff81abf980
>  000000000001d068 ffff8800ec003c28 ffffffff810e9c98 0000000000000000
>  0000000000000086 0000000000000000 0000000000000086 0000000000000082
> Call Trace:
>  [] sched_show_task+0xb1/0x120
>  [] print_other_cpu_stall+0x288/0x2d0
>  [] __rcu_pending+0x180/0x230
>  [] rcu_check_callbacks+0x95/0x140
>  [] update_process_times+0x42/0x70
>  [] tick_sched_handle+0x39/0x80
>  [] tick_sched_timer+0x52/0xa0
>  [] __run_hrtimer+0x74/0x1d0
>  [] ? tick_nohz_handler+0xc0/0xc0
>  [] hrtimer_interrupt+0x102/0x240
>  [] xen_timer_interrupt+0x2e/0x130
>  [] ? add_interrupt_randomness+0x3a/0x1f0
>  [] ? store_cursor_blink+0xc0/0xc0
>  [] handle_irq_event_percpu+0x54/0x1b0
>  [] handle_percpu_irq+0x47/0x70
>  [] generic_handle_irq+0x27/0x40
>  [] evtchn_2l_handle_events+0x25a/0x260
>  [] ? __do_softirq+0x191/0x2f0
>  [] __xen_evtchn_do_upcall+0x4f/0x90
>  [] xen_evtchn_do_upcall+0x34/0x50
>  [] xen_hvm_callback_vector+0x6e/0x80
>
> rcu_sched kthread starved for 462037 jiffies!
>
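[As a concrete sketch of the CONFIG_RCU_KTHREAD_PRIO suggestion above:
the value 50 is illustrative, not taken from this thread; it simply needs
to exceed the real-time priority of the database processes, and the exact
Kconfig default and range depend on RCU_BOOST/RCU_EXPERT on a given
kernel version.]

    # Build-time fix: raise the RT priority of the RCU grace-period
    # kthreads in the kernel .config (value 50 is an assumed example):
    CONFIG_RCU_KTHREAD_PRIO=50

    # Runtime alternative without rebuilding, as root (assumes chrt from
    # util-linux and a kthread named rcu_sched, as in the stall log):
    chrt -f -p 50 $(pgrep -x rcu_sched)

Later kernels expose the same knob as the rcutree.kthread_prio= boot
parameter, which avoids the rebuild entirely.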