From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Matt Fleming <matt@codeblueprint.co.uk>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	Mike Galbraith <efault@gmx.de>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-next@vger.kernel.org" <linux-next@vger.kernel.org>,
	Ross Zwisler <zwisler@gmail.com>
Subject: Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
Date: Mon, 6 Feb 2017 07:10:48 -0800
Message-ID: <20170206151048.GK30506@linux.vnet.ibm.com>
In-Reply-To: <45DC157B-EA1A-4E1C-9449-8DD317837FCB@linux.vnet.ibm.com>

On Mon, Feb 06, 2017 at 11:53:10AM +0530, Sachin Sant wrote:
> 
> >>> I've seen it on tip. It looks like hot unplug goes really slow when
> >>> there's running tasks on the CPU being taken down.
> >>> 
> >>> What I did was something like:
> >>> 
> >>>  taskset -p $((1<<1)) $$
> >>>  for ((i=0; i<20; i++)) do while :; do :; done & done
> >>> 
> >>>  taskset -p $((1<<0)) $$
> >>>  echo 0 > /sys/devices/system/cpu/cpu1/online
> >>> 
> >>> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> >>> _really_ slow and the RCU stall triggers. What I suspect happens is that
> >>> hotplug stops participating in the RCU state machine early, but only
> >>> tells RCU about it really late, and in between it gets suspicious it
> >>> takes too long.
> >>> 
> >>> I've yet to dig through the RCU code to figure out the exact sequence of
> >>> events, but found the above to be fairly reliable in triggering the
> >>> issue.
> > 
> >> If you send me the full splat from the dmesg and the RCU portions of
> >> .config, I will take a look.  Is this new behavior, or a new test?
> > 
> 
> I have sent the required files to you via separate email.
> 
> > If new behavior, I would be most suspicious of these commits in -rcu which
> > recently entered -tip:
> > 
> > 19e4d983cda1 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
> > 913324b1364f rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
> > fcdcfefafa45 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
> > 0919a0b7e7a5 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
> > caa7c8e34293 rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
> > 41e4b159d516 rcu: Make rcu_all_qs() do deferred NOCB wakeups
> > b457a3356a68 rcu: Make call_rcu() do deferred NOCB wakeups
> > 
> > Does reverting any of these help?
> 
> I tried reverting the above commits. That does not help. I can still recreate the issue.

Thank you for testing, Sachin!

Could you please try building and testing with CONFIG_RCU_BOOST=y?
You will need to enable CONFIG_RCU_EXPERT=y to see this Kconfig option.
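
As a rough sketch of the above (not a tested recipe): from the top of a
kernel source tree, the in-tree scripts/config helper can set both options,
and a small check can confirm they survived Kconfig dependency resolution.
The check_rcu_boost function is a hypothetical helper written for this
illustration. It is also an assumption here that the kernel is built with
preemptible RCU, which RCU_BOOST depends on; without it the option is
dropped when the configuration is regenerated.

```shell
# Run from the top of the kernel source tree (assumed layout):
#   scripts/config --enable RCU_EXPERT --enable RCU_BOOST
#   make olddefconfig
#
# Hypothetical helper: succeed only if the given .config file
# enables both CONFIG_RCU_EXPERT and CONFIG_RCU_BOOST.
check_rcu_boost() {
    cfg=$1
    for opt in CONFIG_RCU_EXPERT CONFIG_RCU_BOOST; do
        # Match the exact "CONFIG_FOO=y" line; "# CONFIG_FOO is not set"
        # and a missing line both count as disabled.
        if ! grep -q "^${opt}=y\$" "$cfg"; then
            echo "$opt is not set in $cfg"
            return 1
        fi
    done
    echo "RCU priority boosting is enabled in $cfg"
}
```

Calling check_rcu_boost .config after "make olddefconfig" shows whether
CONFIG_RCU_BOOST=y was kept or silently removed because a dependency was
not met.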

							Thanx, Paul
