From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Heiko Carstens <heiko.carstens@de.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <sebastian.siewior@linutronix.de>,
linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
rt@linutronix.de, Martin Schwidefsky <schwidefsky@de.ibm.com>,
Anna-Maria Gleixner <anna-maria@linutronix.de>
Subject: [PATCH] cpu/hotplug: fix rollback during error-out in __cpu_disable()
Date: Tue, 5 Apr 2016 17:59:04 +0200 [thread overview]
Message-ID: <20160405155904.GA19022@linutronix.de> (raw)
In-Reply-To: <20160405121155.GF6890@osiris>
If we error out in __cpu_disable() (via takedown_cpu() which is
currently the last one that can fail) we don't rollback entirely to
CPUHP_ONLINE (where we started) but to CPUHP_AP_ONLINE_IDLE. This
happens because the former states were on the target CPU (the AP states)
and during the rollback we go back until the first BP state we started.
During the next cpu_down attempt (on the same failed CPU) will take
forever because the cpuhp thread is still down.
The fix this I rollback to where we started in _cpu_down() via a workqueue
to ensure that those callback will be run on the target CPU in
non-atomic context (as in normal cpu_up()).
The workqueues should be working again because the CPU_DOWN_FAILED were
already invoked.
notify_online() has been marked as ->skip_onerr because otherwise we
will see the CPU_ONLINE notifier in addition to the CPU_DOWN_FAILED.
However with ->skip_onerr we neither see CPU_ONLINE nor CPU_DOWN_FAILED
if something in between (CPU_DOWN_FAILED … CPUHP_TEARDOWN_CPU).
Currently there is nothing.
This regression got probably introduce in the rework while we introduced
the hotplug thread to offload the work to the target CPU.
Fixes: 4cb28ced23c4 ("cpu/hotplug: Create hotplug threads")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/cpu.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6ea42e8da861..35b3ce38cd93 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -724,6 +724,8 @@ static int takedown_cpu(unsigned int cpu)
/* CPU didn't die: tell everyone. Can't complain. */
cpu_notify_nofail(CPU_DOWN_FAILED, cpu);
irq_unlock_sparse();
+ kthread_unpark(per_cpu_ptr(&cpuhp_state, cpu)->thread);
+ /* smpboot threads are up via CPUHP_AP_SMPBOOT_THREADS */
return err;
}
BUG_ON(cpu_online(cpu));
@@ -787,6 +789,13 @@ void cpuhp_report_idle_dead(void)
#ifdef CONFIG_HOTPLUG_CPU
+static void undo_cpu_down_work(struct work_struct *work)
+{
+ struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
+
+ undo_cpu_down(smp_processor_id(), st, cpuhp_ap_states);
+}
+
/* Requires cpu_add_remove_lock to be held */
static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
enum cpuhp_state target)
@@ -832,6 +841,15 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
* to do the further cleanups.
*/
ret = cpuhp_down_callbacks(cpu, st, cpuhp_bp_states, target);
+ if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
+ struct work_struct undo_work;
+
+ INIT_WORK_ONSTACK(&undo_work, undo_cpu_down_work);
+ st->target = prev_state;
+ schedule_work_on(cpu, &undo_work);
+ flush_work(&undo_work);
+ destroy_work_on_stack(&undo_work);
+ }
hasdied = prev_state != st->state && st->state == CPUHP_OFFLINE;
out:
@@ -1249,6 +1267,7 @@ static struct cpuhp_step cpuhp_ap_states[] = {
.name = "notify:online",
.startup = notify_online,
.teardown = notify_down_prepare,
+ .skip_onerr = true,
},
#endif
/*
--
2.8.0.rc3
next prev parent reply other threads:[~2016-04-05 15:59 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-04 10:27 [PATCH] s390/cpum_sf: Remove superfluous SMP function call Anna-Maria Gleixner
2016-04-05 10:49 ` Heiko Carstens
2016-04-05 11:13 ` [PREEMPT-RT] " Sebastian Andrzej Siewior
2016-04-05 11:23 ` Heiko Carstens
2016-04-05 11:36 ` Heiko Carstens
2016-04-05 11:51 ` rcochran
2016-04-05 11:55 ` Heiko Carstens
2016-04-05 11:57 ` Sebastian Andrzej Siewior
2016-04-05 12:11 ` Heiko Carstens
2016-04-05 12:19 ` Sebastian Andrzej Siewior
2016-04-05 15:59 ` Sebastian Andrzej Siewior [this message]
2016-04-06 19:51 ` [PATCH] cpu/hotplug: fix rollback during error-out in __cpu_disable() Heiko Carstens
2016-04-07 15:14 ` Sebastian Andrzej Siewior
2016-04-08 6:19 ` Heiko Carstens
2016-04-08 12:40 ` [PATCH v2] " Sebastian Andrzej Siewior
2016-04-22 7:54 ` [tip:smp/urgent] cpu/hotplug: Fix " tip-bot for Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160405155904.GA19022@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=anna-maria@linutronix.de \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=rt@linutronix.de \
--cc=schwidefsky@de.ibm.com \
--cc=sebastian.siewior@linutronix.de \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox