All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: linux-kernel@vger.kernel.org, xiaolong.ye@intel.com,
	tglx@linutronix.de, cmetcalf@mellanox.com, cl@linux.com,
	torvalds@linux-foundation.org, lcapitulino@redhat.com,
	efault@gmx.de, peterz@infradead.org, riel@redhat.com,
	kernellwp@gmail.com, mingo@kernel.org, john.stultz@linaro.org
Subject: Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
Date: Thu, 7 Dec 2017 09:29:06 -0800	[thread overview]
Message-ID: <20171207172906.GP7829@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAFTL4hzTHSLN5ehnoA6hR7vQ9cm2q5W-Ndgqs8QOg0w73fELcQ@mail.gmail.com>

On Thu, Dec 07, 2017 at 05:14:54PM +0100, Frederic Weisbecker wrote:
> 2017-12-04 18:16 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote:
> >> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney
> >> I would prefer to keep it. It's useful for automated boot testing
> >> based on configs such as 0-day or -tip test machines. But I'm likely
> >> to migrate it to isolcpus implementation. Maybe something along the
> >> lines of CONFIG_CPU_ISOLATION_ALL.
> >
> > How about instead allowing something like "nohz_full=1-" specify that
> > all CPUs other than CPU 0 should be nohz_full CPUs?  That would shrink
> > the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing
> > easy automation of that particular scenario.
> >
> > (Right now, the boot code complains about "nohz_full=1-", which means
> > that whatever is generating the boot parameters needs to know how many
> > CPUs there really are, which as you say can be a pain.)
> 
> Yes but automated boot testing is rather based on configs than boot
> options. It's much easier. I think that's how Wu Fengguang lab works,
> and -tip automated tests as well.

So you have gotten bug reports from them?  Because I see splats from
rcutorture testing rather frequently.  This thing is in no way a subtle
low-probability bug.  ;-)

> >> >> Did you have any nohz_full= or isolcpus= boot options?
> >> >
> >> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that
> >> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7
> >> > on an eight-CPU test.
> >> >
> >> > But it is relatively easy to test.  Running the rcutorture TREE04
> >> > scenario on a four-socket x86 gets me RCU CPU stall warnings within
> >> > a few minutes more than half the time.  ;-)
> >>
> >> Indeed I managed to trigger something. If it's the same thing I should
> >> be able to track down the root cause.
> >>
> >> [  123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0
> >> ->state 0x1 cpu 2
> >> [  123.915184] rcu_torture_wri S    0   111      2 0x80080000
> >> [  123.920673] Call Trace:
> >> [  123.923096]  ? __schedule+0x2bf/0xbb0
> >> [  123.926715]  ? _raw_spin_unlock_irqrestore+0x59/0x70
> >> [  123.931657]  schedule+0x3c/0x90
> >> [  123.934777]  schedule_timeout+0x1e1/0x560
> >
> > It might well be the same thing, as this schedule_timeout() does look
> > familiar.  I have some diagnostic patches in -rcu, please see below
> > for the overall effect.
> 
> I fear I can hit that even without any nohz_full CPU as well.

Indeed, I do hit that with my TREE01 scenario, which does not set
CONFIG_NO_HZ_FULL.  But it is much less frequent.  The good news is that
I have finally figured out a way to extract information from this thing
without suppressing it.  At the moment it appears to be a rather strange
deadlock involving CPU hotplug, timers, and RCU.

But that is a completely different bug from the ones for which I have
the two patches in my tree.

Anyway, I will keep those two patches because I cannot have the
corresponding bugs possibly hiding RCU bugs in my testing.  If you
put some other fix in place, I will drop those two patches in favor of
your fix.

							Thanx, Paul

> >> [  123.938785]  ? __next_timer_interrupt+0xd0/0xd0
> >> [  123.943276]  stutter_wait+0xc5/0xe0
> >> [  123.946738]  rcu_torture_writer+0x1ae/0x730
> >> [  123.950912]  ? rcu_torture_pipe_update+0xf0/0xf0
> >> [  123.955491]  kthread+0x15f/0x1a0
> >> [  123.958702]  ? kthread_unpark+0x60/0x60
> >> [  123.962523]  ret_from_fork+0x24/0x30
> >> [  123.966091] rcu_preempt: wait state: 1 ->state: 0x402
> >> [  123.971112] rcu_sched: wait state: 1 ->state: 0x402
> >> [  123.975953] rcu_bh: wait state: 1 ->state: 0x402
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > index ffebcf878fba..23af27461d8c 100644
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -1755,8 +1755,13 @@ static void process_timeout(struct timer_list *t)
> >   */
> >  signed long __sched schedule_timeout(signed long timeout)
> >  {
> > +	struct timer_base *base;
> >  	struct process_timer timer;
> >  	unsigned long expire;
> > +	unsigned long flags;
> > +	unsigned long i;
> > +	unsigned int idx, idx_now;
> > +	unsigned long j;
> >
> >  	switch (timeout)
> >  	{
> > @@ -1793,6 +1798,17 @@ signed long __sched schedule_timeout(signed long
> > timeout)
> >  	timer_setup_on_stack(&timer.timer, process_timeout, 0);
> >  	__mod_timer(&timer.timer, expire, 0);
> >  	schedule();
> > +	j = jiffies;
> > +	if (timeout < 5 && time_after(j, expire + 8 * HZ) &&
> > timer_pending(&timer.timer)) {
> > +		base = lock_timer_base(&timer.timer, &flags);
> > +		idx = timer_get_idx(&timer.timer);
> > +		idx_now = calc_wheel_index(j, base->clk);
> > +		raw_spin_unlock_irqrestore(&base->lock, flags);
> > +		pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx
> > base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %x
> > idx_now: %x base->pending_map ", __func__, base->clk, j, base->next_expiry,
> > timer.timer.flags, timer.timer.expires, idx, idx_now);
> > +		for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++)
> > +			pr_cont("%016lx", base->pending_map[i]);
> > +		pr_cont("\n");
> > +	}
> >  	del_singleshot_timer_sync(&timer.timer);
> >
> >  	/* Remove the timer from the object tracker */
> >
> >
> 
> Hmm, that message doesn't seem to trigger :-s
> 
> Thanks.
> 

  reply	other threads:[~2017-12-07 17:29 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-30 20:20 [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION Paul E. McKenney
2017-12-02 13:59 ` Frederic Weisbecker
2017-12-02 19:24   ` Paul E. McKenney
2017-12-04 15:53     ` Frederic Weisbecker
2017-12-04 17:16       ` Paul E. McKenney
2017-12-07 16:14         ` Frederic Weisbecker
2017-12-07 17:29           ` Paul E. McKenney [this message]
2017-12-09 13:09             ` Frederic Weisbecker
2017-12-09 18:09               ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171207172906.GP7829@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=cl@linux.com \
    --cc=cmetcalf@mellanox.com \
    --cc=efault@gmx.de \
    --cc=frederic@kernel.org \
    --cc=john.stultz@linaro.org \
    --cc=kernellwp@gmail.com \
    --cc=lcapitulino@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=xiaolong.ye@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.