From: Nathan Lynch <nathanl@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: tyreld@linux.ibm.com, ajd@linux.ibm.com, mmc@linux.vnet.ibm.com,
	cforno12@linux.vnet.ibm.com, drt@linux.vnet.ibm.com,
	brking@linux.ibm.com
Subject: [PATCH v2 12/28] powerpc/pseries/mobility: use stop_machine for join/suspend
Date: Mon,  7 Dec 2020 15:51:44 -0600
Message-ID: <20201207215200.1785968-13-nathanl@linux.ibm.com>
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>

The partition suspend sequence as specified in the platform
architecture requires that all active processor threads call
H_JOIN, which:

- suspends the calling thread until it is the target of
  an H_PROD; or
- immediately returns H_CONTINUE, if the calling thread is the last to
  call H_JOIN. This thread is expected to call ibm,suspend-me to
  completely suspend the partition.

Upon returning from ibm,suspend-me the calling thread must wake all
others using H_PROD.

rtas_ibm_suspend_me_unsafe() uses on_each_cpu() to implement this
protocol, but because of its synchronizing nature, this approach is
susceptible to deadlock against users of stop_machine() or other
callers of on_each_cpu().

Not only is stop_machine() intended for use cases like this, it
handles error propagation and allows us to keep the data shared
between CPUs minimal: a single atomic counter which ensures exactly
one CPU will wake the others from their joined states.

Switch the migration code to use stop_machine() and a less complex
local implementation of the H_JOIN/ibm,suspend-me logic, which
carries additional benefits:

- more informative error reporting, appropriately ratelimited
- a reset of the lockup detector / watchdog on resume, which prevents
  lockup warnings when the OS has been suspended for longer than the
  watchdog threshold.

Fixes: 91dc182ca6e2 ("[PATCH] powerpc: special-case ibm,suspend-me RTAS call")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
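Note for reviewers (not part of the commit message): the single
atomic counter described above can be illustrated with a small
userspace sketch. This is only an analogy, assuming C11 atomics and
POSIX threads stand in for CPUs returning from H_JOIN. The names
join_demo, join_return() and count_wakers() are hypothetical and do
not appear in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 64	/* arbitrary cap for this demo */

struct join_demo {
	atomic_int counter;	/* the only state shared for the election */
	atomic_int wakers;	/* demo bookkeeping: how many threads won */
};

static void *join_return(void *arg)
{
	struct join_demo *d = arg;

	/*
	 * atomic_fetch_add() returns the old value, so "old == 0" here
	 * corresponds to "atomic_inc_return(counter) == 1" in do_join().
	 */
	if (atomic_fetch_add(&d->counter, 1) == 0)
		atomic_fetch_add(&d->wakers, 1); /* stands in for prod_others() */

	return NULL;
}

/* Returns the number of threads that elected themselves, or -1 on error. */
int count_wakers(int nthreads)
{
	struct join_demo d;
	pthread_t tid[MAX_THREADS];
	int started, i, err = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	atomic_init(&d.counter, 0);
	atomic_init(&d.wakers, 0);

	for (started = 0; started < nthreads; started++) {
		if (pthread_create(&tid[started], NULL, join_return, &d) != 0) {
			err = -1;
			break;
		}
	}
	/* Only join the threads that were actually created. */
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	return err ? err : atomic_load(&d.wakers);
}
```

Whatever order the threads run in, exactly one of them observes the
counter going from 0 to 1, so count_wakers(8) is expected to return 1.
In the patch the same property means exactly one CPU calls
prod_others(), whether it is the CPU that returned from
ibm,suspend-me or one that failed H_JOIN early.
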
 arch/powerpc/platforms/pseries/mobility.c | 132 ++++++++++++++++++++--
 1 file changed, 125 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 573ed48b43d8..5a3951626a96 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -12,9 +12,11 @@
 #include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/kobject.h>
+#include <linux/nmi.h>
 #include <linux/sched.h>
 #include <linux/smp.h>
 #include <linux/stat.h>
+#include <linux/stop_machine.h>
 #include <linux/completion.h>
 #include <linux/device.h>
 #include <linux/delay.h>
@@ -405,6 +407,128 @@ static int wait_for_vasi_session_suspending(u64 handle)
 	return ret;
 }
 
+static void prod_single(unsigned int target_cpu)
+{
+	long hvrc;
+	int hwid;
+
+	hwid = get_hard_smp_processor_id(target_cpu);
+	hvrc = plpar_hcall_norets(H_PROD, hwid);
+	if (hvrc == H_SUCCESS)
+		return;
+	pr_err_ratelimited("H_PROD of CPU %u (hwid %d) error: %ld\n",
+			   target_cpu, hwid, hvrc);
+}
+
+static void prod_others(void)
+{
+	unsigned int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (cpu != smp_processor_id())
+			prod_single(cpu);
+	}
+}
+
+static u16 clamp_slb_size(void)
+{
+	u16 prev = mmu_slb_size;
+
+	slb_set_size(SLB_MIN_SIZE);
+
+	return prev;
+}
+
+static int do_suspend(void)
+{
+	u16 saved_slb_size;
+	int status;
+	int ret;
+
+	pr_info("calling ibm,suspend-me on CPU %i\n", smp_processor_id());
+
+	/*
+	 * The destination processor model may have fewer SLB entries
+	 * than the source. We reduce mmu_slb_size to a safe minimum
+	 * before suspending in order to minimize the possibility of
+	 * programming non-existent entries on the destination. If
+	 * suspend fails, we restore it before returning. On success
+	 * the OF reconfig path will update it from the new device
+	 * tree after resuming on the destination.
+	 */
+	saved_slb_size = clamp_slb_size();
+
+	ret = rtas_ibm_suspend_me(&status);
+	if (ret != 0) {
+		pr_err("ibm,suspend-me error: %d\n", status);
+		slb_set_size(saved_slb_size);
+	}
+
+	return ret;
+}
+
+static int do_join(void *arg)
+{
+	atomic_t *counter = arg;
+	long hvrc;
+	int ret;
+
+	/* Must ensure MSR.EE off for H_JOIN. */
+	hard_irq_disable();
+	hvrc = plpar_hcall_norets(H_JOIN);
+
+	switch (hvrc) {
+	case H_CONTINUE:
+		/*
+		 * All other CPUs are offline or in H_JOIN. This CPU
+		 * attempts the suspend.
+		 */
+		ret = do_suspend();
+		break;
+	case H_SUCCESS:
+		/*
+		 * The suspend is complete and this cpu has received a
+		 * prod.
+		 */
+		ret = 0;
+		break;
+	case H_BAD_MODE:
+	case H_HARDWARE:
+	default:
+		ret = -EIO;
+		pr_err_ratelimited("H_JOIN error %ld on CPU %i\n",
+				   hvrc, smp_processor_id());
+		break;
+	}
+
+	if (atomic_inc_return(counter) == 1) {
+		pr_info("CPU %u waking all threads\n", smp_processor_id());
+		prod_others();
+	}
+	/*
+	 * Execution may have been suspended for several seconds, so
+	 * reset the watchdog.
+	 */
+	touch_nmi_watchdog();
+	return ret;
+}
+
+static int pseries_migrate_partition(u64 handle)
+{
+	atomic_t counter = ATOMIC_INIT(0);
+	int ret;
+
+	ret = wait_for_vasi_session_suspending(handle);
+	if (ret)
+		return ret;
+
+	ret = stop_machine(do_join, &counter, cpu_online_mask);
+	if (ret == 0)
+		post_mobility_fixup();
+
+	return ret;
+}
+
 static ssize_t migration_store(struct class *class,
 			       struct class_attribute *attr, const char *buf,
 			       size_t count)
@@ -416,16 +540,10 @@ static ssize_t migration_store(struct class *class,
 	if (rc)
 		return rc;
 
-	rc = wait_for_vasi_session_suspending(streamid);
+	rc = pseries_migrate_partition(streamid);
 	if (rc)
 		return rc;
 
-	rc = rtas_ibm_suspend_me_unsafe(streamid);
-	if (rc)
-		return rc;
-
-	post_mobility_fixup();
-
 	return count;
 }
 
-- 
2.28.0

