From: Chris Metcalf <cmetcalf@ezchip.com>
To: Frederic Weisbecker <fweisbec@gmail.com>,
Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>, Andrew Jones <drjones@redhat.com>,
Ulrich Obergfell <uobergfe@redhat.com>,
Fabian Frederick <fabf@skynet.be>,
Aaron Tomlin <atomlin@redhat.com>, Ben Zhang <benzh@chromium.org>,
Christoph Lameter <cl@linux.com>,
Gilad Ben-Yossef <gilad@benyossef.com>,
Steven Rostedt <rostedt@goodmis.org>,
<linux-kernel@vger.kernel.org>, Jonathan Corbet <corbet@lwn.net>,
<linux-doc@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v10 1/3] smpboot: allow excluding cpus from the smpboot threads
Date: Thu, 4 Jun 2015 13:25:49 -0400 [thread overview]
Message-ID: <55708A1D.70707@ezchip.com> (raw)
In-Reply-To: <20150501212329.GA4179@lerouge>
On 05/01/2015 05:23 PM, Frederic Weisbecker wrote:
> On Fri, May 01, 2015 at 03:57:51PM -0400, Chris Metcalf wrote:
>> On 05/01/2015 04:53 AM, Frederic Weisbecker wrote:
>>>> + /* Unpark any threads that were voluntarily parked. */
>>>> + for_each_cpu_not(cpu, &ht->cpumask) {
>>>> + if (cpu_online(cpu)) {
>>>> + struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
>>>> + if (tsk)
>>>> + kthread_unpark(tsk);
>>> I'm still not clear why we are doing that. kthread_stop() should be able
>>> to handle parked kthreads, otherwise it needs to be fixed.
>> Checking without the unpark, it's actually only a problem with nohz_full.
>> In a system without nohz_full, the kthreads are able to stop even when
>> they are parked; it's only in the nohz_full case that things wedge.
>> For example, booting with only cpu 0 as a housekeeping core (and
>> therefore all watchdogs 1-35 on my 36-core tilegx are parked), and
>> immediately doing "echo 0 > /proc/sys/kernel/watchdog", I see
>> (via SysRq ^O-l) the first parked watchdog, on cpu 1, hung with:
>>
>> frame 0: 0xfffffff7000f2928 lock_hrtimer_base+0xb8/0xc0
>> frame 1: 0xfffffff7000f2a28 hrtimer_try_to_cancel+0x40/0x170
>> frame 2: 0xfffffff7000f2a28 hrtimer_try_to_cancel+0x40/0x170
>> frame 3: 0xfffffff7000f2b98 hrtimer_cancel+0x40/0x68
>> frame 4: 0xfffffff70014cce0 watchdog_disable+0x50/0x70
>> frame 5: 0xfffffff70008c2d0 smpboot_thread_fn+0x350/0x438
>> frame 6: 0xfffffff700084b28 kthread+0x160/0x178
I finally had some time to look into this issue some more.
With PROVE_LOCKING enabled (after a fix I'll send to LKML shortly), we
get no warnings, and ^O-d to print locks shows:
Showing all locks held in the system:
3 locks held by watchdog/1/15:
#0: (&(&hp->lock)->rlock){-.....}, at: [<fffffff700620740>] hvc_poll+0xb8/0x4b8
#1: (rcu_read_lock){......}, at: [<fffffff70061d710>] __handle_sysrq+0x0/0x440
#2: (tasklist_lock){.+.+..}, at: [<fffffff7000d7310>] debug_show_all_locks+0xc0/0x350
3 locks held by sh/1732:
#0: (sb_writers#4){.+.+.+}, at: [<fffffff70022f6b8>] vfs_write+0x268/0x2c0
#1: (watchdog_proc_mutex){+.+.+.}, at: [<fffffff70016f368>] proc_watchdog_common+0x78/0x1c8
#2: (smpboot_threads_lock){+.+.+.}, at: [<fffffff700093558>] smpboot_unregister_percpu_thread+0x48/0x88
All the watchdog/1/15 locks are attributable to the fact that it's running
on the same core that ended up handling the "^O-d" request from SysRq.
The sh process from which I ran the echo eventually shows up as "blocked for
more than 120 seconds" and pretty much where you'd expect it to be, waiting
on a completion in kthread_stop() at kthread.c:473.
I instrumented lock_hrtimer_base(), and timer->base is null, and never gets
set non-null, so the loop spins forever. Perhaps something in nohz is preventing
the timer->base from being set?
I'm happy to keep debugging this but I'm not really clear on what could
be going wrong here. Any ideas?
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
next prev parent reply other threads:[~2015-06-04 17:26 UTC|newest]
Thread overview: 106+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-30 18:51 [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores cmetcalf
2015-03-30 19:12 ` Don Zickus
2015-03-30 19:32 ` [PATCH v2] " Chris Metcalf
2015-03-30 20:02 ` Don Zickus
2015-04-02 15:19 ` Frederic Weisbecker
2015-03-31 2:04 ` [PATCH] " Mike Galbraith
2015-03-31 6:34 ` Mike Galbraith
2015-03-31 18:32 ` Chris Metcalf
2015-03-31 7:25 ` Ingo Molnar
2015-03-31 18:30 ` Chris Metcalf
2015-04-02 13:35 ` Don Zickus
2015-04-02 13:49 ` Chris Metcalf
2015-04-02 14:15 ` Don Zickus
2015-04-02 15:38 ` Frederic Weisbecker
2015-04-02 15:42 ` Chris Metcalf
2015-04-02 16:08 ` Don Zickus
2015-04-02 16:48 ` Frederic Weisbecker
2015-04-02 17:39 ` [PATCH v3] watchdog: add watchdog_cpumask sysctl to assist nohz cmetcalf
2015-04-02 18:06 ` Peter Zijlstra
2015-04-02 18:16 ` Chris Metcalf
2015-04-02 18:33 ` Peter Zijlstra
2015-04-02 18:49 ` Chris Metcalf
2015-04-02 18:45 ` Don Zickus
2015-04-03 16:08 ` [PATCH v4 1/2] smpboot: allow excluding cpus from the smpboot threads cmetcalf
2015-04-03 16:08 ` [PATCH v4 2/2] watchdog: add watchdog_exclude sysctl to assist nohz cmetcalf
2015-04-05 16:46 ` Ulrich Obergfell
2015-04-06 19:45 ` [PATCH v5 0/2] nohz/watchdog/smp_hotplug_thread changes cmetcalf
2015-04-06 19:45 ` [PATCH v5 1/2] smpboot: allow excluding cpus from the smpboot threads cmetcalf
2015-04-08 13:28 ` Frederic Weisbecker
2015-04-08 14:06 ` Chris Metcalf
2015-04-08 17:29 ` Frederic Weisbecker
2015-04-06 19:45 ` [PATCH v5 2/2] watchdog: add watchdog_exclude sysctl to assist nohz cmetcalf
2015-04-07 15:44 ` Don Zickus
2015-04-07 15:56 ` Sasha Levin
2015-04-07 17:49 ` Chris Metcalf
2015-04-08 14:01 ` Frederic Weisbecker
2015-04-08 19:11 ` [PATCH v6 1/2] smpboot: allow excluding cpus from the smpboot threads cmetcalf
2015-04-08 19:11 ` [PATCH v6 2/2] watchdog: add watchdog_cpumask sysctl to assist nohz cmetcalf
2015-04-08 20:37 ` [PATCH v6 1/2] smpboot: allow excluding cpus from the smpboot threads Thomas Gleixner
2015-04-09 20:29 ` [PATCH] " Chris Metcalf
2015-04-10 1:58 ` Frederic Weisbecker
2015-04-10 16:33 ` Chris Metcalf
2015-04-12 19:14 ` Frederic Weisbecker
2015-04-13 16:06 ` Chris Metcalf
2015-04-13 21:54 ` Frederic Weisbecker
2015-04-14 19:37 ` [PATCH v8 1/3] " Chris Metcalf
2015-04-14 19:37 ` [PATCH v8 2/3] watchdog: add watchdog_cpumask sysctl to assist nohz Chris Metcalf
2015-04-16 10:46 ` Ulrich Obergfell
2015-04-17 15:41 ` Chris Metcalf
2015-04-22 8:20 ` Ulrich Obergfell
2015-04-28 17:52 ` Chris Metcalf
2015-04-29 8:48 ` Ulrich Obergfell
2015-04-17 1:31 ` Chai Wen
2015-04-17 16:10 ` Chris Metcalf
2015-04-14 19:37 ` [PATCH v8 3/3] procfs: treat parked tasks as sleeping for task state Chris Metcalf
2015-04-16 15:28 ` [PATCH v8 1/3] smpboot: allow excluding cpus from the smpboot threads Frederic Weisbecker
2015-04-16 15:50 ` Chris Metcalf
2015-04-16 16:48 ` Frederic Weisbecker
2015-04-17 16:17 ` Chris Metcalf
2015-04-17 18:37 ` [PATCH v9 " Chris Metcalf
2015-04-17 18:37 ` [PATCH v9 2/3] watchdog: add watchdog_cpumask sysctl to assist nohz Chris Metcalf
2015-04-21 12:32 ` Ulrich Obergfell
2015-04-28 18:07 ` Chris Metcalf
2015-04-29 9:49 ` Ulrich Obergfell
2015-04-29 13:10 ` Don Zickus
2015-04-21 14:07 ` Ulrich Obergfell
2015-04-22 15:18 ` Don Zickus
2015-04-25 15:42 ` Ulrich Obergfell
2015-04-22 11:02 ` Ulrich Obergfell
2015-04-22 15:21 ` Don Zickus
2015-04-27 20:27 ` Chris Metcalf
2015-04-28 15:17 ` Don Zickus
2015-04-28 19:42 ` Andrew Morton
2015-04-30 19:39 ` [PATCH v10 0/3] add watchdog_cpumask to help nohz_full Chris Metcalf
2015-04-30 19:39 ` [PATCH v10 1/3] smpboot: allow excluding cpus from the smpboot threads Chris Metcalf
2015-05-01 8:53 ` Frederic Weisbecker
2015-05-01 19:57 ` Chris Metcalf
2015-05-01 21:23 ` Frederic Weisbecker
2015-05-04 22:06 ` Chris Metcalf
2015-06-03 2:34 ` Don Zickus
2015-06-04 17:25 ` Chris Metcalf [this message]
2015-05-01 20:00 ` [PATCH] smpboot: dynamically allocate the cpumask Chris Metcalf
2015-04-30 19:39 ` [PATCH v10 2/3] watchdog: add watchdog_cpumask sysctl to assist nohz Chris Metcalf
2015-04-30 20:00 ` Don Zickus
2015-04-30 20:09 ` Chris Metcalf
2015-05-01 13:46 ` Don Zickus
2015-04-30 19:39 ` [PATCH v10 3/3] procfs: treat parked tasks as sleeping for task state Chris Metcalf
2015-04-29 22:26 ` [PATCH v9 2/3] watchdog: add watchdog_cpumask sysctl to assist nohz Andrew Morton
2015-04-29 22:26 ` Andrew Morton
2015-04-17 18:37 ` [PATCH v9 3/3] procfs: treat parked tasks as sleeping for task state Chris Metcalf
2015-04-29 22:26 ` Andrew Morton
2015-04-29 22:26 ` [PATCH v9 1/3] smpboot: allow excluding cpus from the smpboot threads Andrew Morton
2015-04-30 16:07 ` Chris Metcalf
2015-04-14 15:23 ` [PATCH] " Frederic Weisbecker
2015-04-14 15:39 ` Chris Metcalf
2015-04-14 17:57 ` Thomas Gleixner
2015-04-10 20:48 ` [PATCH v7 1/3] " Chris Metcalf
2015-04-10 20:48 ` [PATCH v7 2/3] watchdog: add watchdog_cpumask sysctl to assist nohz Chris Metcalf
2015-04-10 20:48 ` [PATCH v7 3/3] procfs: treat parked tasks as sleeping for task state Chris Metcalf
2015-04-10 21:11 ` [PATCH v7 1/3] smpboot: allow excluding cpus from the smpboot threads Andrew Morton
2015-04-13 15:48 ` Chris Metcalf
2015-04-08 19:21 ` [PATCH v5 2/2] watchdog: add watchdog_exclude sysctl to assist nohz Chris Metcalf
2015-04-08 22:31 ` Frederic Weisbecker
2015-03-31 10:17 ` [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores Christoph Lameter
2015-03-31 18:39 ` Chris Metcalf
2015-04-02 14:13 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55708A1D.70707@ezchip.com \
--to=cmetcalf@ezchip.com \
--cc=akpm@linux-foundation.org \
--cc=atomlin@redhat.com \
--cc=benzh@chromium.org \
--cc=cl@linux.com \
--cc=corbet@lwn.net \
--cc=drjones@redhat.com \
--cc=dzickus@redhat.com \
--cc=fabf@skynet.be \
--cc=fweisbec@gmail.com \
--cc=gilad@benyossef.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=uobergfe@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.