From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: anna-maria@linutronix.de
Cc: tglx@linutronix.de, linux-kernel@vger.kernel.org
Subject: Re: Timer refuses to expire
Date: Wed, 6 Dec 2017 14:04:21 -0800 [thread overview]
Message-ID: <20171206220421.GA12886@linux.vnet.ibm.com> (raw)
In-Reply-To: <20171205233744.GA2453@linux.vnet.ibm.com>
On Tue, Dec 05, 2017 at 03:37:44PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 04, 2017 at 09:42:08AM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> > > Hello, Anna-Maria,
> > >
> > > It turned out that one set of problems was due to NO_HZ_FULL issues,
> > > which are addressed with a pair of patches. However, there is still one
> > > case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> > > The takes from two to four hours to trigger, and the most recent one
> > > gives the following diagnostic (see patch at the very end of this email):
> > >
> > > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> > >
> > > The message appears any time a timer for less than five jiffies takes
> > > more than eight seconds to expire. In all cases, this expiry did not
> > > happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> > > stall code. If I am interpreting this message correctly, base->clk
> > > has advanced past this timer, and the timer is correctly flagged in
> > > base->pending_map. This seems like part of the problem because the
> > > timer's timeout was only three jiffies. However, base->clk has not
> > > advanced for more than 20 seconds, which seems like another problem.
> > >
> > > What additional diagnostics would be helpful? I can capture data
> > > at the beginning of the schedule_timeout in the timer_list structure,
> > > and of course can print more information at the time of the wakeup.
> >
> > And on the off-chance that the following messages from this weekend's
> > runs are at all helpful. One interesting point is that starting at
> > time 4731.872311, there are repeated enqueues to CPU 5's timer wheel,
> > but CPU 5's ->clk does not advance. Things seem to advance at
> > time 4891.721190.
>
> Another layer on the onion... For at least some of the failures, there
> is a stalled CPU-hotplug operation. This of course can mean that the
> timers are stuck on the partially offlined CPU. So I dumped the stack
> of the task that is taking the CPU offline. Please see below for console
> output and patch.
>
> I am considering doing an unsolicited wakeup of the task doing the
> hotplug operation, but I am not convinced that the entirely of the CPU
> hotplug code is willing to put up with that sort of thing.
What I did instead was to dump out the state of the task that
__cpuhp_kick_ap() waits on, please see the patch at the very end of this
email. This triggered as shown below, and you guessed it, that task is
waiting on a grace period. Which I am guessing won't happen until the
outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
Which will prevent RCU's grace-period kthread from ever awakening, which
will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.
Deadlock.
Does that make sense, or am I missing a trick here?
I tried invoking timers_dead_cpu() from sched_cpu_deactivate(), but
that triggered the BUG_ON(cpu_online(cpu)). I removed this BUG_ON(),
and appear to have deadlocked on the timer ->lock.
Any thoughts on what else to try?
Thanx, Paul
------------------------------------------------------------------------
[ 2939.381210] schedule_timeout: Waylayed timer base->clk: 0x1002709c3 jiffies: 0x100284614 base->next_expiry: 0x1002709c3 timer->flags: 0x40000007 timer->expires 0x10027f3a2 idx: 100 idx_now: 105 base->pending_map 000000000000000800000000000000000002000000000000020080400002000868000000000000010000000001000000000000000000000000000000000000000000000000000000
[ 2939.381219] Torture onoff task state:
[ 2939.381221] torture_onoff D14584 842 2 0x80000000
[ 2939.381239] Call Trace:
[ 2939.381245] ? __schedule+0x33c/0x6f0
[ 2939.381248] ? preempt_count_add+0x51/0x90
[ 2939.381250] schedule+0x37/0x90
[ 2939.381270] schedule_timeout+0x20d/0x4c0
[ 2939.381276] wait_for_completion+0x95/0x100
[ 2939.381282] ? wake_up_q+0x60/0x60
[ 2939.381299] __cpuhp_kick_ap+0x49/0x60
[ 2939.381313] cpuhp_kick_ap+0x29/0x70
[ 2939.381316] cpuhp_kick_ap_work+0x34/0x130
[ 2939.381322] _cpu_down+0x17e/0x1f0
[ 2939.381326] do_cpu_down+0x30/0x50
[ 2939.381354] torture_offline+0x76/0x140
[ 2939.381368] torture_onoff+0xef/0x1c0
[ 2939.381371] kthread+0xf0/0x130
[ 2939.381374] ? torture_kthread_stopping+0x70/0x70
[ 2939.381376] ? kthread_destroy_worker+0x40/0x40
[ 2939.381378] ret_from_fork+0x1f/0x30
[ 2939.381381] __cpuhp_kick_ap task state:
[ 2939.381383] cpuhp/7 D14504 46 2 0x80000000
[ 2939.381399] Call Trace:
[ 2939.381403] ? __schedule+0x33c/0x6f0
[ 2939.381406] schedule+0x37/0x90
[ 2939.381408] schedule_timeout+0x20d/0x4c0
[ 2939.381414] ? enqueue_task_fair+0x556/0x12b0
[ 2939.381417] wait_for_completion+0x95/0x100
[ 2939.381419] ? wake_up_q+0x60/0x60
[ 2939.381431] __wait_rcu_gp+0x10d/0x140
[ 2939.381441] ? sched_cpu_activate+0xc0/0xc0
[ 2939.381443] sched_cpu_deactivate+0x36/0xa0
[ 2939.381453] ? kfree_call_rcu+0x20/0x20
[ 2939.381454] ? __call_rcu+0x530/0x530
[ 2939.381457] ? __rcu_read_unlock+0x50/0x50
[ 2939.381459] ? __rcu_read_unlock+0x50/0x50
[ 2939.381462] cpuhp_invoke_callback+0xa8/0x610
[ 2939.381466] cpuhp_thread_fun+0xc5/0x150
[ 2939.381469] smpboot_thread_fn+0x15c/0x220
[ 2939.381476] kthread+0xf0/0x130
[ 2939.381478] ? sort_range+0x20/0x20
[ 2939.381480] ? kthread_destroy_worker+0x40/0x40
[ 2939.381482] ret_from_fork+0x1f/0x30
------------------------------------------------------------------------
commit aa2b802cb432a725589884d2ba47833de21e0871
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Wed Dec 6 09:46:30 2017 -0800
EXP: timer: Dump __cpuhp_kick_ap task state
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 04892a82f6ac..dc9098946b3a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -381,6 +381,8 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
st->bringup = !st->bringup;
}
+void schedule_timeout_set_task3dump(struct task_struct *t);
+
/* Regular hotplug invocation of the AP hotplug thread */
static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
{
@@ -395,7 +397,9 @@ static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
smp_mb();
st->should_run = true;
wake_up_process(st->thread);
+ schedule_timeout_set_task3dump(st->thread);
wait_for_ap_thread(st, st->bringup);
+ schedule_timeout_set_task3dump(NULL);
}
static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 0a67df4cdeb6..532b43d05219 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1725,6 +1725,7 @@ static void process_timeout(struct timer_list *t)
}
static struct task_struct *schedule_timeout_task2dump;
+static struct task_struct *schedule_timeout_task3dump;
static DEFINE_MUTEX(schedule_timeout_task2dump_mutex);
void schedule_timeout_set_task2dump(struct task_struct *t)
{
@@ -1732,7 +1733,12 @@ void schedule_timeout_set_task2dump(struct task_struct *t)
WRITE_ONCE(schedule_timeout_task2dump, t);
mutex_unlock(&schedule_timeout_task2dump_mutex);
}
-EXPORT_SYMBOL_GPL(schedule_timeout_set_task2dump);
+void schedule_timeout_set_task3dump(struct task_struct *t)
+{
+ mutex_lock(&schedule_timeout_task2dump_mutex);
+ WRITE_ONCE(schedule_timeout_task3dump, t);
+ mutex_unlock(&schedule_timeout_task2dump_mutex);
+}
/**
* schedule_timeout - sleep until timeout
@@ -1824,6 +1830,10 @@ signed long __sched schedule_timeout(signed long timeout)
pr_info("Torture onoff task state:\n");
sched_show_task(schedule_timeout_task2dump);
}
+ if (schedule_timeout_task3dump) {
+ pr_info("__cpuhp_kick_ap task state:\n");
+ sched_show_task(schedule_timeout_task3dump);
+ }
mutex_unlock(&schedule_timeout_task2dump_mutex);
}
}
next prev parent reply other threads:[~2017-12-06 22:04 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-01 18:25 Timer refuses to expire Paul E. McKenney
2017-12-04 17:34 ` Paul E. McKenney
2017-12-04 17:42 ` Paul E. McKenney
2017-12-05 23:37 ` Paul E. McKenney
2017-12-06 22:04 ` Paul E. McKenney [this message]
2017-12-07 7:03 ` Boqun Feng
2017-12-07 14:56 ` Paul E. McKenney
2017-12-07 21:45 ` Paul E. McKenney
2017-12-08 0:31 ` Paul E. McKenney
2017-12-08 1:06 ` Boqun Feng
2017-12-08 1:26 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171206220421.GA12886@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=anna-maria@linutronix.de \
--cc=linux-kernel@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.