LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 0/4] powernv: kvm: numa fault improvement
From: Liu ping fan @ 2014-01-22  8:33 UTC
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc
In-Reply-To: <87zjmoiogp.fsf@linux.vnet.ibm.com>

On Wed, Jan 22, 2014 at 1:18 PM, Aneesh Kumar K.V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> Paul Mackerras <paulus@samba.org> writes:
>
>> On Mon, Jan 20, 2014 at 03:48:36PM +0100, Alexander Graf wrote:
>>>
>>> On 15.01.2014, at 07:36, Liu ping fan <kernelfans@gmail.com> wrote:
>>>
>>> > On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf <agraf@suse.de> wrote:
>>> >>
>>> >> On 11.12.2013, at 09:47, Liu Ping Fan <kernelfans@gmail.com> wrote:
>>> >>
>>> >>> This series is based on Aneesh's series  "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64"
>>> >>>
>>> >>> For this series, I apply the same idea as in the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA"
>>> >>> (for which I am still trying to get a machine to show numbers).
>>> >>>
>>> >>> But for this series, I think I have a good justification: the well-known heavy cost of switching context
>>> >>> between guest and host.
>>> >>
>>> >> This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve what you're trying and convince your readers that it's a good idea to do it the way you do it.
>>> >>
>>> > Sorry for the unclear message. After the introduction of _PAGE_NUMA,
>>> > kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it
>>> > has to rely on the host's kvmppc_book3s_hv_page_fault() to call
>>> > do_numa_page() to do the numa fault check. This incurs the overhead
>>> > of exiting from rmode to vmode. My idea is to do a quick check in
>>> > kvmppc_do_h_enter(): if the page is correctly placed, there is no
>>> > need to exit to vmode (i.e. saving htab, slab switching).
>>> >
>>> >>> If my assumption is correct, I will CC kvm@vger.kernel.org starting with the next version.
>>> >>
>>> >> This translates to me as "This is an RFC"?
>>> >>
>>> > Yes, I am not quite sure about it. I have no bare-metal machine to
>>> > verify it on, so I hope it is at least correct in theory.
>>>
>>> Paul, could you please give this some thought and maybe benchmark it?
>>
>> OK, once I get Aneesh to tell me how I get to have ptes with
>> _PAGE_NUMA set in the first place. :)
>>
>
> I guess we want patch 2, which Liu has sent separately and I have
> reviewed. http://article.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/8619
> I am not sure about the rest of the patches in the series.
> We definitely don't want to numa migrate on henter. We may want to do
> that on fault. But even there, IMHO, we should let the host take the
> fault and do the numa migration instead of doing this in guest context.
>
My patch does NOT do the numa migration in guest context (h_enter).
Instead it just does a pre-check to see whether numa migration is
needed. If it is, the host will take the fault and do the numa
migration as it currently does. Otherwise, h_enter can directly set up
the hpte without HPTE_V_ABSENT.
And since pte_mknuma() is called periodically system-wide, the guest is
more likely to suffer from HPTE_V_ABSENT. (As in my previous reply, I
think we should also place the quick check in kvmppc_hpte_hv_fault.)
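
The pre-check described above can be sketched as a small userspace model.
This is not the kernel code from the series: the struct, the enum and the
function name are hypothetical and exist only for illustration. The one
thing taken from this mail is the decision itself: a NUMA-hinted page that
sits on the wrong node is left to the host fault path, anything else gets
its hpte installed directly.

```c
#include <stdbool.h>

/* Hypothetical model of what the h_enter pre-check would look at. */
struct page_model {
	bool numa_hint;   /* models a pte that has _PAGE_NUMA set */
	int  page_node;   /* node the page currently lives on */
	int  target_node; /* node the NUMA policy would prefer */
};

enum henter_action {
	HENTER_INSTALL_HPTE,  /* set up the hpte without HPTE_V_ABSENT */
	HENTER_DEFER_TO_HOST  /* leave it absent; the host takes the fault */
};

/* Decide in "real mode" whether the host has to get involved at all. */
static enum henter_action henter_numa_precheck(const struct page_model *pg)
{
	/* No NUMA hint on the pte: nothing to check. */
	if (!pg->numa_hint)
		return HENTER_INSTALL_HPTE;

	/* Hinted, but already on the preferred node: no migration needed. */
	if (pg->page_node == pg->target_node)
		return HENTER_INSTALL_HPTE;

	/* Misplaced page: migration stays in host context, as it does today. */
	return HENTER_DEFER_TO_HOST;
}
```

In this model the migration itself never happens in guest context; only
the cheap comparison does.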

Thx,
Fan

> -aneesh
>


* Re: [PATCH V2] cpuidle/governors: Fix logic in selection of idle states
From: Daniel Lezcano @ 2014-01-22  8:29 UTC
  To: Preeti U Murthy, svaidy, linux-pm, benh, rjw, linux-kernel,
	srivatsa.bhat, paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140117043351.21531.14192.stgit@preeti.in.ibm.com>

On 01/17/2014 05:33 AM, Preeti U Murthy wrote:
> The cpuidle governors today are not handling scenarios where no idle state
> can be chosen. Such scenarios could arise if the user has disabled all the
> idle states at runtime or the latency requirement from the cpus is very strict.
>
> The menu governor returns 0th index of the idle state table when no other
> idle state is suitable. This is even when the idle state corresponding to this
> index is disabled or the latency requirement is strict and the exit_latency
> of the lowest idle state is also not acceptable. Hence this patch
> fixes this logic in the menu governor by defaulting to an idle state index
> of -1 unless any other state is suitable.
>
> The ladder governor needs a few more fixes in addition to that required in the
> menu governor. When the ladder governor decides to demote the idle state of a
> CPU, it does not check if the lower idle states are enabled. Add this logic
> in addition to the logic where it chooses an index of -1 if it can neither
> promote or demote the idle state of a cpu nor can it choose the current idle
> state.
>
> The cpuidle_idle_call() will return back if the governor decides upon not
> entering any idle state. However it cannot return an error code because all
> archs have the logic today that if the call to cpuidle_idle_call() fails, it
> means that the cpuidle driver failed to *function*; for instance due to
> errors during registration. As a result they end up deciding upon a
> default idle state on their own, which could very well be a deep idle state.
> This is incorrect in cases where no idle state is suitable.
>
> Besides, for the scenario that this patch is addressing, the call actually
> succeeds. It's just that no idle state is thought to be suitable by the governors.
> Under such a circumstance return success code without entering any idle
> state.
>
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>
> Changes from V1:https://lkml.org/lkml/2014/1/14/26
>
> 1. Change the return code to success from -EINVAL due to the reason mentioned
> in the changelog.
> 2. Add logic that the patch is addressing in the ladder governor as well.
> 3. Added relevant comments and removed redundant logic as suggested in the
> above thread.
> ---
>
>   drivers/cpuidle/cpuidle.c          |   15 +++++-
>   drivers/cpuidle/governors/ladder.c |   98 ++++++++++++++++++++++++++----------
>   drivers/cpuidle/governors/menu.c   |    7 +--
>   3 files changed, 89 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..831b664 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>
>   	/* ask the governor for the next state */
>   	next_state = cpuidle_curr_governor->select(drv, dev);
> +
> +	dev->last_residency = 0;
>   	if (need_resched()) {
> -		dev->last_residency = 0;

Why do you need to do this change ? ^^^^^

>   		/* give the governor an opportunity to reflect on the outcome */
>   		if (cpuidle_curr_governor->reflect)
>   			cpuidle_curr_governor->reflect(dev, next_state);
> @@ -140,6 +141,18 @@ int cpuidle_idle_call(void)
>   		return 0;
>   	}
>
> +	/* Unlike in the need_resched() case, we return here because the
> +	 * governor did not find a suitable idle state. However idle is still
> +	 * in progress as we are not asked to reschedule. Hence we return
> +	 * without enabling interrupts.

That will lead to a WARN.

> +	 * NOTE: The return code should still be success, since the verdict of this
> +	 * call is "do not enter any idle state" and not a failed call due to
> +	 * errors.
> +	 */
> +	if (next_state < 0)
> +		return 0;
> +

Returning from here breaks the symmetry of the trace.

>   	trace_cpu_idle_rcuidle(next_state, dev->cpu);
>
>   	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index 9f08e8c..f495f57 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -58,6 +58,36 @@ static inline void ladder_do_selection(struct ladder_device *ldev,
>   	ldev->last_state_idx = new_idx;
>   }
>
> +static int can_promote(struct ladder_device *ldev, int last_idx,
> +				int last_residency)
> +{
> +	struct ladder_device_state *last_state;
> +
> +	last_state = &ldev->states[last_idx];
> +	if (last_residency > last_state->threshold.promotion_time) {
> +		last_state->stats.promotion_count++;
> +		last_state->stats.demotion_count = 0;
> +		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static int can_demote(struct ladder_device *ldev, int last_idx,
> +			int last_residency)
> +{
> +	struct ladder_device_state *last_state;
> +
> +	last_state = &ldev->states[last_idx];
> +	if (last_residency < last_state->threshold.demotion_time) {
> +		last_state->stats.demotion_count++;
> +		last_state->stats.promotion_count = 0;
> +		if (last_state->stats.demotion_count >= last_state->threshold.demotion_count)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
>   /**
>    * ladder_select_state - selects the next state to enter
>    * @drv: cpuidle driver
> @@ -73,29 +103,33 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>
>   	/* Special case when user has set very strict latency requirement */
>   	if (unlikely(latency_req == 0)) {
> -		ladder_do_selection(ldev, last_idx, 0);
> -		return 0;
> +		if (last_idx >= 0)
> +			ladder_do_selection(ldev, last_idx, -1);
> +		goto out;
>   	}
>
> -	last_state = &ldev->states[last_idx];
> -
> -	if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
> -		last_residency = cpuidle_get_last_residency(dev) - \
> -					 drv->states[last_idx].exit_latency;
> +	if (last_idx >= 0) {
> +		if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
> +			last_residency = cpuidle_get_last_residency(dev) - \
> +						 drv->states[last_idx].exit_latency;
> +		} else {
> +			last_state = &ldev->states[last_idx];
> +			last_residency = last_state->threshold.promotion_time + 1;
> +		}
>   	}
> -	else
> -		last_residency = last_state->threshold.promotion_time + 1;
>
>   	/* consider promotion */
>   	if (last_idx < drv->state_count - 1 &&
>   	    !drv->states[last_idx + 1].disabled &&
>   	    !dev->states_usage[last_idx + 1].disable &&
> -	    last_residency > last_state->threshold.promotion_time &&
>   	    drv->states[last_idx + 1].exit_latency <= latency_req) {
> -		last_state->stats.promotion_count++;
> -		last_state->stats.demotion_count = 0;
> -		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
> -			ladder_do_selection(ldev, last_idx, last_idx + 1);
> +		if (last_idx >= 0) {
> +			if (can_promote(ldev, last_idx, last_residency)) {
> +				ladder_do_selection(ldev, last_idx, last_idx + 1);
> +				return last_idx + 1;
> +			}
> +		} else {
> +			ldev->last_state_idx = last_idx + 1;
>   			return last_idx + 1;
>   		}
>   	}
> @@ -107,26 +141,36 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>   	    drv->states[last_idx].exit_latency > latency_req)) {
>   		int i;
>
> -		for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
> -			if (drv->states[i].exit_latency <= latency_req)
> +		for (i = last_idx - 1; i >= CPUIDLE_DRIVER_STATE_START; i--) {
> +			if (drv->states[i].exit_latency <= latency_req &&
> +				!(drv->states[i].disabled || dev->states_usage[i].disable))
>   				break;
>   		}
> -		ladder_do_selection(ldev, last_idx, i);
> -		return i;
> +		if (i >= 0) {
> +			ladder_do_selection(ldev, last_idx, i);
> +			return i;
> +		}
> +		goto out;
>   	}
>
> -	if (last_idx > CPUIDLE_DRIVER_STATE_START &&
> -	    last_residency < last_state->threshold.demotion_time) {
> -		last_state->stats.demotion_count++;
> -		last_state->stats.promotion_count = 0;
> -		if (last_state->stats.demotion_count >= last_state->threshold.demotion_count) {
> -			ladder_do_selection(ldev, last_idx, last_idx - 1);
> -			return last_idx - 1;
> +	if (last_idx > CPUIDLE_DRIVER_STATE_START) {
> +		int i = last_idx - 1;
> +
> +		if (can_demote(ldev, last_idx, last_residency) &&
> +			!(drv->states[i].disabled || dev->states_usage[i].disable)) {
> +			ladder_do_selection(ldev, last_idx, i);
> +			return i;
>   		}
> +		/* We come here when the last_idx is still a suitable idle state, just that
> +		 * promotion or demotion is not ideal.
> +		 */
> +		ldev->last_state_idx = last_idx;
> +		return last_idx;
>   	}
>
> -	/* otherwise remain at the current state */
> -	return last_idx;
> +	/* we come here if no idle state is suitable */
> +out:	ldev->last_state_idx = -1;
> +	return ldev->last_state_idx;
>   }
>
>   /**
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..e9f17ce 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -297,12 +297,12 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   		data->needs_update = 0;
>   	}
>
> -	data->last_state_idx = 0;
> +	data->last_state_idx = -1;
>   	data->exit_us = 0;
>
>   	/* Special case when user has set very strict latency requirement */
>   	if (unlikely(latency_req == 0))
> -		return 0;
> +		return data->last_state_idx;
>
>   	/* determine the expected residency time, round up */
>   	t = ktime_to_timespec(tick_nohz_get_sleep_length());
> @@ -368,7 +368,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   /**
>    * menu_reflect - records that data structures need update
>    * @dev: the CPU
> - * @index: the index of actual entered state
> + * @index: the index of actual entered state or -1 if no idle state is
> + * suitable.
>    *
>    * NOTE: it's important to be fast here because this operation will add to
>    *       the overall exit latency.
>
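
The selection rule the quoted changelog describes (return -1 when no idle
state is usable, and skip disabled states when demoting) can be sketched in
userspace as follows. This is a simplified model, not either governor: the
struct and both function names are hypothetical, and the real menu governor
also weighs predicted residency, which is left out here.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for a cpuidle state entry. */
struct idle_state_model {
	unsigned int exit_latency; /* microseconds */
	bool disabled;             /* disabled by driver or by the user */
};

/*
 * Pick the deepest enabled state whose exit latency fits latency_req.
 * Returns -1 when nothing qualifies, including the strict
 * latency_req == 0 case that the patch special-cases.
 */
static int select_idle_state(const struct idle_state_model *states,
			     int count, unsigned int latency_req)
{
	int chosen = -1;

	if (latency_req == 0)
		return -1;

	for (int i = 0; i < count; i++) {
		if (states[i].disabled)
			continue;
		if (states[i].exit_latency > latency_req)
			continue;
		chosen = i;
	}
	return chosen;
}

/*
 * Ladder-style demotion: walk down from last_idx and stop at the first
 * shallower state that is enabled and meets the latency requirement.
 * Returns -1 if every shallower state is unusable.
 */
static int demote_idle_state(const struct idle_state_model *states,
			     int last_idx, unsigned int latency_req)
{
	for (int i = last_idx - 1; i >= 0; i--) {
		if (!states[i].disabled &&
		    states[i].exit_latency <= latency_req)
			return i;
	}
	return -1;
}
```

A caller of either function would treat -1 as "do not enter any idle
state", which is a verdict and not an error.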




* [RESEND PATCH V5 8/8] cpuidle/powernv: Parse device tree to setup idle states
From: Preeti U Murthy @ 2014-01-22  7:09 UTC
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

Add deep idle states such as nap and fast sleep to the cpuidle state table
only if they are discovered from the device tree during cpuidle initialization.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 drivers/cpuidle/cpuidle-powernv.c |   81 +++++++++++++++++++++++++++++--------
 1 file changed, 64 insertions(+), 17 deletions(-)
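
For readers who want to see the table-building step in isolation, here is a
userspace sketch of it. The flag values, state names and latency numbers
are the ones used in the diff below; the struct, the function name and the
bounds check against the table size are assumptions added for this sketch
and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_IDLE_STATES     8
#define IDLE_USE_INST_NAP   0x00010000u /* use nap instruction */
#define IDLE_USE_INST_SLEEP 0x00020000u /* use sleep instruction */

/* Hypothetical reduced form of struct cpuidle_state. */
struct state_model {
	const char *name;
	unsigned int exit_latency;
	unsigned int target_residency;
};

/*
 * Fill the table from an array of flag words, as read from a property
 * such as ibm,cpu-idle-state-flags. Snooze is always slot 0.
 * Returns the number of states in the table.
 */
static int add_idle_states(struct state_model *table,
			   const uint32_t *flags, size_t nr_flags)
{
	int n = 0;

	table[n++] = (struct state_model){ "snooze", 0, 0 };

	for (size_t i = 0; i < nr_flags; i++) {
		/* The n < MAX_IDLE_STATES guard is an addition of this sketch. */
		if ((flags[i] & IDLE_USE_INST_NAP) && n < MAX_IDLE_STATES)
			table[n++] = (struct state_model){ "Nap", 10, 100 };

		if ((flags[i] & IDLE_USE_INST_SLEEP) && n < MAX_IDLE_STATES)
			table[n++] = (struct state_model){ "FastSleep", 300, 1000000 };
	}
	return n;
}
```

With an empty flags array the table holds only snooze, which matches the
fallback the patch takes when the power-mgt node or the property is
missing.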

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 90f0c2b..b3face5 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -12,10 +12,17 @@
 #include <linux/cpu.h>
 #include <linux/notifier.h>
 #include <linux/clockchips.h>
+#include <linux/of.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 
+/* Flags and constants used in PowerNV platform */
+
+#define MAX_POWERNV_IDLE_STATES	8
+#define IDLE_USE_INST_NAP	0x00010000 /* Use nap instruction */
+#define IDLE_USE_INST_SLEEP	0x00020000 /* Use sleep instruction */
+
 struct cpuidle_driver powernv_idle_driver = {
 	.name             = "powernv_idle",
 	.owner            = THIS_MODULE,
@@ -87,7 +94,7 @@ static int fastsleep_loop(struct cpuidle_device *dev,
 /*
  * States for dedicated partition case.
  */
-static struct cpuidle_state powernv_states[] = {
+static struct cpuidle_state powernv_states[MAX_POWERNV_IDLE_STATES] = {
 	{ /* Snooze */
 		.name = "snooze",
 		.desc = "snooze",
@@ -95,20 +102,6 @@ static struct cpuidle_state powernv_states[] = {
 		.exit_latency = 0,
 		.target_residency = 0,
 		.enter = &snooze_loop },
-	{ /* NAP */
-		.name = "NAP",
-		.desc = "NAP",
-		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
-		.enter = &nap_loop },
-	 { /* Fastsleep */
-		.name = "fastsleep",
-		.desc = "fastsleep",
-		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
-		.enter = &fastsleep_loop },
 };
 
 static int powernv_cpuidle_add_cpu_notifier(struct notifier_block *n,
@@ -169,19 +162,73 @@ static int powernv_cpuidle_driver_init(void)
 	return 0;
 }
 
+static int powernv_add_idle_states(void)
+{
+	struct device_node *power_mgt;
+	struct property *prop;
+	int nr_idle_states = 1; /* Snooze */
+	int dt_idle_states;
+	u32 *flags;
+	int i;
+
+	/* Currently we have snooze statically defined */
+
+	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+	if (!power_mgt) {
+		pr_warn("opal: PowerMgmt Node not found\n");
+		return nr_idle_states;
+	}
+
+	prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
+	if (!prop) {
+		pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+		return nr_idle_states;
+	}
+
+	dt_idle_states = prop->length / sizeof(u32);
+	flags = (u32 *) prop->value;
+
+	for (i = 0; i < dt_idle_states; i++) {
+
+		if (flags[i] & IDLE_USE_INST_NAP) {
+			/* Add NAP state */
+			strcpy(powernv_states[nr_idle_states].name, "Nap");
+			strcpy(powernv_states[nr_idle_states].desc, "Nap");
+			powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID;
+			powernv_states[nr_idle_states].exit_latency = 10;
+			powernv_states[nr_idle_states].target_residency = 100;
+			powernv_states[nr_idle_states].enter = &nap_loop;
+			nr_idle_states++;
+		}
+
+		if (flags[i] & IDLE_USE_INST_SLEEP) {
+			/* Add FASTSLEEP state */
+			strcpy(powernv_states[nr_idle_states].name, "FastSleep");
+			strcpy(powernv_states[nr_idle_states].desc, "FastSleep");
+			powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID;
+			powernv_states[nr_idle_states].exit_latency = 300;
+			powernv_states[nr_idle_states].target_residency = 1000000;
+			powernv_states[nr_idle_states].enter = &fastsleep_loop;
+			nr_idle_states++;
+		}
+	}
+
+	return nr_idle_states;
+}
+
 /*
  * powernv_idle_probe()
  * Choose state table for shared versus dedicated partition
  */
 static int powernv_idle_probe(void)
 {
-
 	if (cpuidle_disable != IDLE_NO_OVERRIDE)
 		return -ENODEV;
 
 	if (firmware_has_feature(FW_FEATURE_OPALv3)) {
 		cpuidle_state_table = powernv_states;
-		max_idle_state = ARRAY_SIZE(powernv_states);
+		/* Device tree can indicate more idle states */
+		max_idle_state = powernv_add_idle_states();
  	} else
  		return -ENODEV;
 


* [RESEND PATCH V5 7/8] cpuidle/powernv: Add "Fast-Sleep" CPU idle state
From: Preeti U Murthy @ 2014-01-22  7:09 UTC
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

Fast sleep is one of the deep idle states on Power8 in which local timers of
CPUs stop. On PowerPC we do not have an external clock device which can
handle wakeup of such CPUs. Now that we have the support in the tick broadcast
framework for archs that do not sport such a device and the low level support
for fast sleep, enable it in the cpuidle framework on PowerNV.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/Kconfig              |    2 ++
 arch/powerpc/kernel/time.c        |    2 +-
 drivers/cpuidle/cpuidle-powernv.c |   42 +++++++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)
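
The control flow of fastsleep_loop() in the diff below can be read as a
small decision function. The sketch here models only that decision in
userspace. The M_LPCR_* bit positions are made up for the model (the real
LPCR_* constants come from the kernel headers), and the struct and function
names are hypothetical. Going by the comments in the patch, PECE0 lets an
external interrupt end powersave and PECE1 does the same for the
decrementer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up bit positions, for this model only. */
#define M_LPCR_MER   (1u << 0)
#define M_LPCR_PECE0 (1u << 1) /* external interrupt can wake the CPU */
#define M_LPCR_PECE1 (1u << 2) /* decrementer can wake the CPU */
#define M_LPCR_PECE  (M_LPCR_PECE0 | M_LPCR_PECE1)

enum idle_insn { IDLE_NAP, IDLE_SLEEP };

struct fastsleep_plan {
	uint32_t lpcr;       /* value to program before idling */
	enum idle_insn insn; /* which idle instruction to execute */
};

/*
 * broadcast_enter_failed models a non-zero return from
 * clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ...): this CPU
 * has to keep its local timer running, so it may only nap.
 */
static struct fastsleep_plan plan_fastsleep(uint32_t old_lpcr,
					    bool broadcast_enter_failed)
{
	struct fastsleep_plan plan;

	plan.lpcr = old_lpcr & ~(M_LPCR_MER | M_LPCR_PECE);
	plan.lpcr |= M_LPCR_PECE0;

	if (broadcast_enter_failed) {
		plan.lpcr |= M_LPCR_PECE1; /* keep decrementer wakeups */
		plan.insn = IDLE_NAP;
	} else {
		plan.insn = IDLE_SLEEP;    /* another CPU will wake us */
	}
	return plan;
}
```

In both branches the caller is expected to restore old_lpcr afterwards, as
the patch does.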

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fa39517..ec91584 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,6 +129,8 @@ config PPC
 	select GENERIC_CMOS_UPDATE
 	select GENERIC_TIME_VSYSCALL_OLD
 	select GENERIC_CLOCKEVENTS
+	select GENERIC_CLOCKEVENTS_BROADCAST
+	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
 	select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index df2989b..95fa5ce 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -106,7 +106,7 @@ struct clock_event_device decrementer_clockevent = {
 	.irq            = 0,
 	.set_next_event = decrementer_set_next_event,
 	.set_mode       = decrementer_set_mode,
-	.features       = CLOCK_EVT_FEAT_ONESHOT,
+	.features       = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
 };
 EXPORT_SYMBOL(decrementer_clockevent);
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 78fd174..90f0c2b 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -11,6 +11,7 @@
 #include <linux/cpuidle.h>
 #include <linux/cpu.h>
 #include <linux/notifier.h>
+#include <linux/clockchips.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
@@ -49,6 +50,40 @@ static int nap_loop(struct cpuidle_device *dev,
 	return index;
 }
 
+static int fastsleep_loop(struct cpuidle_device *dev,
+				struct cpuidle_driver *drv,
+				int index)
+{
+	int cpu = dev->cpu;
+	unsigned long old_lpcr = mfspr(SPRN_LPCR);
+	unsigned long new_lpcr;
+
+	if (unlikely(system_state < SYSTEM_RUNNING))
+		return index;
+
+	new_lpcr = old_lpcr;
+	new_lpcr &= ~(LPCR_MER | LPCR_PECE); /* lpcr[mer] must be 0 */
+
+	/* exit powersave upon external interrupt, but not decrementer
+	 * interrupt, Emulate sleep.
+	 */
+	new_lpcr |= LPCR_PECE0;
+
+	if (clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu)) {
+		new_lpcr |= LPCR_PECE1;
+		mtspr(SPRN_LPCR, new_lpcr);
+		power7_nap();
+	} else {
+		mtspr(SPRN_LPCR, new_lpcr);
+		power7_sleep();
+	}
+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
+
+	mtspr(SPRN_LPCR, old_lpcr);
+
+	return index;
+}
+
 /*
  * States for dedicated partition case.
  */
@@ -67,6 +102,13 @@ static struct cpuidle_state powernv_states[] = {
 		.exit_latency = 10,
 		.target_residency = 100,
 		.enter = &nap_loop },
+	 { /* Fastsleep */
+		.name = "fastsleep",
+		.desc = "fastsleep",
+		.flags = CPUIDLE_FLAG_TIME_VALID,
+		.exit_latency = 10,
+		.target_residency = 100,
+		.enter = &fastsleep_loop },
 };
 
 static int powernv_cpuidle_add_cpu_notifier(struct notifier_block *n,


* [RESEND PATCH V5 6/8] time/cpuidle: Support in tick broadcast framework in the absence of external clock device
From: Preeti U Murthy @ 2014-01-22  7:09 UTC
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

On some architectures, in certain CPU deep idle states the local timers stop.
An external clock device is used to wakeup these CPUs. The kernel support for the
wakeup of these CPUs is provided by the tick broadcast framework by using the
external clock device as the wakeup source.

However, not all implementations of an architecture provide such an external
clock device; some PowerPC implementations do not. This patch adds support to
the broadcast framework to handle the wakeup of CPUs in deep idle states on
such systems by queuing a hrtimer on one of the CPUs, which then handles the
wakeup of the CPUs in deep idle states. This CPU is identified as the bc_cpu.

Each time the hrtimer expires, it is reprogrammed for the next wakeup of the
CPUs in deep idle state after handling broadcast. However when a CPU is about
to enter deep idle state with its wakeup time earlier than the time at which
the hrtimer is currently programmed, it *becomes the new bc_cpu* and restarts
the hrtimer on itself. This way the job of doing broadcast is handed around to
the CPUs that ask for the earliest wakeup just before entering deep idle
state. This is consistent with what happens in cases where an external clock
device is present. The smp affinity of this clock device is set to the CPU
with the earliest wakeup.

The important point here is that the bc_cpu cannot enter deep idle state
since it has a hrtimer queued to wakeup the other CPUs in deep idle. Hence it
cannot have its local timer stopped. Therefore for such a CPU, the
BROADCAST_ENTER notification has to fail implying that it cannot enter deep
idle state. On architectures where an external clock device is present, all
CPUs can enter deep idle.

During hotplug of the bc_cpu, the job of doing a broadcast is assigned to the
first cpu in the broadcast mask. This newly nominated bc_cpu is woken up by
an IPI so as to queue the above mentioned hrtimer on it.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 include/linux/clockchips.h   |    4 -
 kernel/time/clockevents.c    |    9 +-
 kernel/time/tick-broadcast.c |  192 ++++++++++++++++++++++++++++++++++++++----
 kernel/time/tick-internal.h  |    8 +-
 4 files changed, 186 insertions(+), 27 deletions(-)
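
The bc_cpu handover described in the changelog above can be modelled in a
few lines of userspace C. This is a sketch of the idea only: the struct and
function names are hypothetical, times are plain integers instead of
ktime_t, CPU numbers are assumed to be in the range 0 to 63 so that a
uint64_t can stand in for a cpumask, and no locking or hrtimer programming
is shown.

```c
#include <stdint.h>

#define WAKEUP_NONE UINT64_MAX /* stands in for KTIME_MAX */

struct bc_model {
	int bc_cpu;           /* CPU holding the broadcast hrtimer, or -1 */
	uint64_t next_wakeup; /* time the hrtimer is programmed for */
};

/*
 * Models BROADCAST_ENTER without an external clock device.
 * Returns 0 if the CPU may enter deep idle, 1 if it may not because it
 * is (or has just become) the bc_cpu.
 */
static int bc_enter(struct bc_model *bc, int cpu, uint64_t cpu_wakeup)
{
	/* Earliest wakeup wins: this CPU takes over the broadcast duty. */
	if (cpu_wakeup < bc->next_wakeup) {
		bc->next_wakeup = cpu_wakeup;
		bc->bc_cpu = cpu;
	}
	return cpu == bc->bc_cpu;
}

/*
 * Models hotplug of the bc_cpu: the duty goes to the first CPU left in
 * the broadcast mask. Returns the new bc_cpu, or -1 if the mask is empty.
 */
static int bc_handover(struct bc_model *bc, int dying_cpu,
		       uint64_t broadcast_mask)
{
	if (dying_cpu != bc->bc_cpu)
		return bc->bc_cpu;

	broadcast_mask &= ~(UINT64_C(1) << dying_cpu);
	bc->bc_cpu = -1;
	for (int cpu = 0; cpu < 64; cpu++) {
		if (broadcast_mask & (UINT64_C(1) << cpu)) {
			bc->bc_cpu = cpu;
			break;
		}
	}
	return bc->bc_cpu;
}
```

The model makes the key property visible: at any time exactly one CPU is
refused deep idle, and it is always the one with the earliest pending
wakeup among those that asked.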

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 493aa02..bbda37b 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -186,9 +186,9 @@ static inline int tick_check_broadcast_expired(void) { return 0; }
 #endif
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
-extern void clockevents_notify(unsigned long reason, void *arg);
+extern int clockevents_notify(unsigned long reason, void *arg);
 #else
-static inline void clockevents_notify(unsigned long reason, void *arg) {}
+static inline int clockevents_notify(unsigned long reason, void *arg) { return 0; }
 #endif
 
 #else /* CONFIG_GENERIC_CLOCKEVENTS_BUILD */
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 086ad60..d61404e 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -524,12 +524,13 @@ void clockevents_resume(void)
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 /**
  * clockevents_notify - notification about relevant events
+ * Returns non zero on error.
  */
-void clockevents_notify(unsigned long reason, void *arg)
+int clockevents_notify(unsigned long reason, void *arg)
 {
 	struct clock_event_device *dev, *tmp;
 	unsigned long flags;
-	int cpu;
+	int cpu, ret = 0;
 
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
 
@@ -542,11 +543,12 @@ void clockevents_notify(unsigned long reason, void *arg)
 
 	case CLOCK_EVT_NOTIFY_BROADCAST_ENTER:
 	case CLOCK_EVT_NOTIFY_BROADCAST_EXIT:
-		tick_broadcast_oneshot_control(reason);
+		ret = tick_broadcast_oneshot_control(reason);
 		break;
 
 	case CLOCK_EVT_NOTIFY_CPU_DYING:
 		tick_handover_do_timer(arg);
+		tick_handover_broadcast_cpu(arg);
 		break;
 
 	case CLOCK_EVT_NOTIFY_SUSPEND:
@@ -585,6 +587,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 		break;
 	}
 	raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(clockevents_notify);
 
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 9532690..1c23912 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -20,6 +20,7 @@
 #include <linux/sched.h>
 #include <linux/smp.h>
 #include <linux/module.h>
+#include <linux/slab.h>
 
 #include "tick-internal.h"
 
@@ -35,6 +36,15 @@ static cpumask_var_t tmpmask;
 static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
 static int tick_broadcast_force;
 
+/*
+ * Helper variables for handling broadcast in the absence of a
+ * tick_broadcast_device.
+ */
+static struct hrtimer *bc_hrtimer;
+static int bc_cpu = -1;
+static ktime_t bc_next_wakeup;
+static int hrtimer_initialized = 0;
+
 #ifdef CONFIG_TICK_ONESHOT
 static void tick_broadcast_clear_oneshot(int cpu);
 #else
@@ -528,6 +538,20 @@ static int tick_broadcast_set_event(struct clock_event_device *bc, int cpu,
 	return ret;
 }
 
+static void tick_broadcast_set_next_wakeup(int cpu, ktime_t expires, int force)
+{
+	struct clock_event_device *bc;
+
+	bc = tick_broadcast_device.evtdev;
+
+	if (bc) {
+		tick_broadcast_set_event(bc, cpu, expires, force);
+	} else {
+		hrtimer_start(bc_hrtimer, expires, HRTIMER_MODE_ABS_PINNED);
+		bc_cpu = cpu;
+	}
+}
+
 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 {
 	clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
@@ -558,15 +582,13 @@ void tick_check_oneshot_broadcast(int cpu)
 /*
  * Handle oneshot mode broadcasting
  */
-static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
+static int tick_oneshot_broadcast(void)
 {
 	struct tick_device *td;
 	ktime_t now, next_event;
 	int cpu, next_cpu = 0;
 
-	raw_spin_lock(&tick_broadcast_lock);
-again:
-	dev->next_event.tv64 = KTIME_MAX;
+	bc_next_wakeup.tv64 = KTIME_MAX;
 	next_event.tv64 = KTIME_MAX;
 	cpumask_clear(tmpmask);
 	now = ktime_get();
@@ -620,34 +642,95 @@ again:
 	 * in the event mask
 	 */
 	if (next_event.tv64 != KTIME_MAX) {
-		/*
-		 * Rearm the broadcast device. If event expired,
-		 * repeat the above
-		 */
-		if (tick_broadcast_set_event(dev, next_cpu, next_event, 0))
+		bc_next_wakeup = next_event;
+	}
+
+	return next_cpu;
+}
+
+/*
+ * Handler in oneshot mode for the external clock device
+ */
+static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
+{
+	int next_cpu;
+
+	raw_spin_lock(&tick_broadcast_lock);
+
+again:	next_cpu = tick_oneshot_broadcast();
+	/*
+	 * Rearm the broadcast device. If event expired,
+	 * repeat the above
+	 */
+	if (bc_next_wakeup.tv64 != KTIME_MAX)
+		if (tick_broadcast_set_event(dev, next_cpu, bc_next_wakeup, 0))
 			goto again;
+
+	raw_spin_unlock(&tick_broadcast_lock);
+}
+
+/*
+ * Handler in oneshot mode for the hrtimer queued when there is no external
+ * clock device.
+ */
+static enum hrtimer_restart handle_broadcast(struct hrtimer *hrtmr)
+{
+	ktime_t now, interval;
+
+	raw_spin_lock(&tick_broadcast_lock);
+	tick_oneshot_broadcast();
+
+	now = ktime_get();
+
+	if (bc_next_wakeup.tv64 != KTIME_MAX) {
+		interval = ktime_sub(bc_next_wakeup, now);
+		hrtimer_forward_now(bc_hrtimer, interval);
+		raw_spin_unlock(&tick_broadcast_lock);
+		return HRTIMER_RESTART;
 	}
 	raw_spin_unlock(&tick_broadcast_lock);
+	return HRTIMER_NORESTART;
+}
+
+/* The CPU could be asked to take over from the previous bc_cpu,
+ * if it is being hotplugged out.
+ */
+static void tick_broadcast_exit_check(int cpu)
+{
+	if (cpu == bc_cpu)
+		hrtimer_start(bc_hrtimer, bc_next_wakeup,
+				HRTIMER_MODE_ABS_PINNED);
+}
+
+static int can_enter_broadcast(int cpu)
+{
+	return cpu != bc_cpu;
 }
 
 /*
  * Powerstate information: The system enters/leaves a state, where
  * affected devices might stop
+ *
+ * Returns a non-zero value if the entry into the broadcast framework failed.
+ * This scenario can arise on certain implementations of archs which do
+ * not have an external clock device to do the broadcast. Then one of the
+ * CPUs gets nominated to handle broadcasting.
+ * Such a CPU cannot enter a state where its tick device can stop.
  */
-void tick_broadcast_oneshot_control(unsigned long reason)
+int tick_broadcast_oneshot_control(unsigned long reason)
 {
-	struct clock_event_device *bc, *dev;
+	struct clock_event_device *dev;
 	struct tick_device *td;
 	unsigned long flags;
 	ktime_t now;
-	int cpu;
+	int cpu, ret = 0;
 
 	/*
 	 * Periodic mode does not care about the enter/exit of power
 	 * states
 	 */
 	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
-		return;
+		return ret;
 
 	/*
 	 * We are called with preemtion disabled from the depth of the
@@ -658,9 +741,8 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	dev = td->evtdev;
 
 	if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
-		return;
+		return ret;
 
-	bc = tick_broadcast_device.evtdev;
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
@@ -676,12 +758,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 			 * woken by the IPI right away.
 			 */
 			if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
-			    dev->next_event.tv64 < bc->next_event.tv64)
-				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
+			    dev->next_event.tv64 < bc_next_wakeup.tv64) {
+				bc_next_wakeup = dev->next_event;
+				tick_broadcast_set_next_wakeup(cpu, dev->next_event, 1);
+			}
+
+			if (!can_enter_broadcast(cpu)) {
+				cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
+				clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
+				ret = 1;
+			}
 		}
 	} else {
 		if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
+
+			tick_broadcast_exit_check(cpu);
 			/*
 			 * The cpu which was handling the broadcast
 			 * timer marked this cpu in the broadcast
@@ -746,6 +838,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	}
 out:
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+	return ret;
 }
 
 /*
@@ -821,17 +914,57 @@ void tick_broadcast_switch_to_oneshot(void)
 {
 	struct clock_event_device *bc;
 	unsigned long flags;
+	int cpu = smp_processor_id();
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 
+	bc_next_wakeup.tv64 = KTIME_MAX;
+
 	tick_broadcast_device.mode = TICKDEV_MODE_ONESHOT;
 	bc = tick_broadcast_device.evtdev;
-	if (bc)
+	if (bc) {
 		tick_broadcast_setup_oneshot(bc);
+		bc_next_wakeup = bc->next_event;
+	} else if (hrtimer_initialized) {
+
+		/*
+		 * There may be CPUs waiting for periodic broadcast. We need
+		 * to set the oneshot bits for those and program the hrtimer
+		 * to fire at the next tick period.
+ 		 */
+		cpumask_copy(tmpmask, tick_broadcast_mask);
+		cpumask_clear_cpu(cpu, tmpmask);
+		cpumask_or(tick_broadcast_oneshot_mask,
+			   tick_broadcast_oneshot_mask, tmpmask);
+
+		if (!cpumask_empty(tmpmask)) {
+			tick_broadcast_init_next_event(tmpmask,
+						       tick_next_period);
+			hrtimer_start(bc_hrtimer, tick_next_period, HRTIMER_MODE_ABS_PINNED);
+			bc_next_wakeup = tick_next_period;
+		}
+	}
 
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 
+/*
+ * Use the broadcast function itself to wake up the new broadcast cpu
+ */
+void tick_handover_broadcast_cpu(int *cpup)
+{
+	struct tick_device *td;
+
+	if (*cpup == bc_cpu) {
+		int cpu = cpumask_first(tick_broadcast_oneshot_mask);
+
+		bc_cpu = (cpu < nr_cpu_ids) ? cpu : -1;
+		if (bc_cpu != -1) {
+			td = &per_cpu(tick_cpu_device, bc_cpu);
+			td->evtdev->broadcast(cpumask_of(bc_cpu));
+		}
+	}
+}
 
 /*
  * Remove a dead CPU from broadcasting
@@ -868,8 +1001,29 @@ int tick_broadcast_oneshot_active(void)
 bool tick_broadcast_oneshot_available(void)
 {
 	struct clock_event_device *bc = tick_broadcast_device.evtdev;
+	bool ret = true;
+	unsigned long flags;
 
-	return bc ? bc->features & CLOCK_EVT_FEAT_ONESHOT : false;
+	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+	if (bc) {
+		ret = bc->features & CLOCK_EVT_FEAT_ONESHOT;
+	} else if (!hrtimer_initialized) {
+		/* An alternative to tick_broadcast_device on archs which do not have
+		 * an external device
+		 */
+		bc_hrtimer = kmalloc(sizeof(*bc_hrtimer), GFP_NOWAIT);
+		if (!bc_hrtimer) {
+			ret = false;
+			goto out;
+		}
+		hrtimer_init(bc_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
+		bc_hrtimer->function = handle_broadcast;
+		hrtimer_initialized = 1;
+	}
+
+out:	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+	return ret;
 }
 
 #endif
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 18e71f7..9e42177 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -46,23 +46,25 @@ extern int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *));
 extern void tick_resume_oneshot(void);
 # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 extern void tick_broadcast_setup_oneshot(struct clock_event_device *bc);
-extern void tick_broadcast_oneshot_control(unsigned long reason);
+extern int tick_broadcast_oneshot_control(unsigned long reason);
 extern void tick_broadcast_switch_to_oneshot(void);
 extern void tick_shutdown_broadcast_oneshot(unsigned int *cpup);
 extern int tick_resume_broadcast_oneshot(struct clock_event_device *bc);
 extern int tick_broadcast_oneshot_active(void);
 extern void tick_check_oneshot_broadcast(int cpu);
+extern void tick_handover_broadcast_cpu(int *cpup);
 bool tick_broadcast_oneshot_available(void);
 # else /* BROADCAST */
 static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
 	BUG();
 }
-static inline void tick_broadcast_oneshot_control(unsigned long reason) { }
+static inline int tick_broadcast_oneshot_control(unsigned long reason) { return 0; }
 static inline void tick_broadcast_switch_to_oneshot(void) { }
 static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
 static inline int tick_broadcast_oneshot_active(void) { return 0; }
 static inline void tick_check_oneshot_broadcast(int cpu) { }
+static inline void tick_handover_broadcast_cpu(int *cpup) {}
 static inline bool tick_broadcast_oneshot_available(void) { return true; }
 # endif /* !BROADCAST */
 
@@ -87,7 +89,7 @@ static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
 	BUG();
 }
-static inline void tick_broadcast_oneshot_control(unsigned long reason) { }
+static inline int tick_broadcast_oneshot_control(unsigned long reason) { return 0; }
 static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
 static inline int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 {


* [RESEND PATCH V5 5/8] powermgt: Add OPAL call to resync timebase on wakeup
From: Preeti U Murthy @ 2014-01-22  7:09 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

In "Fast-Sleep" and deeper power-saving states, the decrementer and
timebase can be stopped, leaving the timebase out of sync with the rest
of the cores in the system.

Add a firmware call that asks the platform to resync the timebase
using low-level platform methods.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/include/asm/opal.h                |    2 ++
 arch/powerpc/kernel/exceptions-64s.S           |    2 +-
 arch/powerpc/kernel/idle_power7.S              |   27 ++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |    1 +
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9a87b44..8c4829f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -154,6 +154,7 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_FLASH_VALIDATE			76
 #define OPAL_FLASH_MANAGE			77
 #define OPAL_FLASH_UPDATE			78
+#define OPAL_RESYNC_TIMEBASE			79
 #define OPAL_GET_MSG				85
 #define OPAL_CHECK_ASYNC_COMPLETION		86
 
@@ -863,6 +864,7 @@ extern void opal_flash_init(void);
 extern int opal_machine_check(struct pt_regs *regs);
 
 extern void opal_shutdown(void);
+extern int opal_resync_timebase(void);
 
 extern void opal_lpc_init(void);
 
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b01a9cb..9533d7a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -145,7 +145,7 @@ BEGIN_FTR_SECTION
 
 	/* Fast Sleep wakeup on PowerNV */
 8:	GET_PACA(r13)
-	b 	.power7_wakeup_loss
+	b 	.power7_wakeup_tb_loss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 14f78be..c3ab869 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -17,6 +17,7 @@
 #include <asm/ppc-opcode.h>
 #include <asm/hw_irq.h>
 #include <asm/kvm_book3s_asm.h>
+#include <asm/opal.h>
 
 #undef DEBUG
 
@@ -125,6 +126,32 @@ _GLOBAL(power7_sleep)
 	b	power7_powersave_common
 	/* No return */
 
+_GLOBAL(power7_wakeup_tb_loss)
+	ld	r2,PACATOC(r13);
+	ld	r1,PACAR1(r13)
+
+	/* Time base re-sync */
+	li	r0,OPAL_RESYNC_TIMEBASE
+	LOAD_REG_ADDR(r11,opal);
+	ld	r12,8(r11);
+	ld	r2,0(r11);
+	mtctr	r12
+	bctrl
+
+	/* TODO: Check r3 for failure */
+
+	REST_NVGPRS(r1)
+	REST_GPR(2, r1)
+	ld	r3,_CCR(r1)
+	ld	r4,_MSR(r1)
+	ld	r5,_NIP(r1)
+	addi	r1,r1,INT_FRAME_SIZE
+	mtcr	r3
+	mfspr	r3,SPRN_SRR1		/* Return SRR1 */
+	mtspr	SPRN_SRR1,r4
+	mtspr	SPRN_SRR0,r5
+	rfid
+
 _GLOBAL(power7_wakeup_loss)
 	ld	r1,PACAR1(r13)
 	REST_NVGPRS(r1)
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 719aa5c..a11a87c 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -126,5 +126,6 @@ OPAL_CALL(opal_return_cpu,			OPAL_RETURN_CPU);
 OPAL_CALL(opal_validate_flash,			OPAL_FLASH_VALIDATE);
 OPAL_CALL(opal_manage_flash,			OPAL_FLASH_MANAGE);
 OPAL_CALL(opal_update_flash,			OPAL_FLASH_UPDATE);
+OPAL_CALL(opal_resync_timebase,			OPAL_RESYNC_TIMEBASE);
 OPAL_CALL(opal_get_msg,				OPAL_GET_MSG);
 OPAL_CALL(opal_check_completion,		OPAL_CHECK_ASYNC_COMPLETION);


* [RESEND PATCH V5 4/8] powernv/cpuidle: Add context management for Fast Sleep
From: Preeti U Murthy @ 2014-01-22  7:08 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

Before adding Fast-Sleep into the cpuidle framework, some low-level
support needs to be added to enable it. This includes saving certain
registers on entry to this state and restoring them on exit, just as
we do for the NAP idle state.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[Changelog modified by Preeti U. Murthy <preeti@linux.vnet.ibm.com>]
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/include/asm/processor.h |    1 +
 arch/powerpc/kernel/exceptions-64s.S |   10 ++++-
 arch/powerpc/kernel/idle_power7.S    |   63 ++++++++++++++++++++++++----------
 3 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index b62de43..d660dc3 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -450,6 +450,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 extern int powersave_nap;	/* set if nap mode can be used in idle loop */
 extern void power7_nap(void);
+extern void power7_sleep(void);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 38d5073..b01a9cb 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -121,9 +121,10 @@ BEGIN_FTR_SECTION
 	cmpwi	cr1,r13,2
 	/* Total loss of HV state is fatal, we could try to use the
 	 * PIR to locate a PACA, then use an emergency stack etc...
-	 * but for now, let's just stay stuck here
+	 * OPAL v3 based powernv platforms have new idle states
+	 * which fall in this category.
 	 */
-	bgt	cr1,.
+	bgt	cr1,8f
 	GET_PACA(r13)
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -141,6 +142,11 @@ BEGIN_FTR_SECTION
 	beq	cr1,2f
 	b	.power7_wakeup_noloss
 2:	b	.power7_wakeup_loss
+
+	/* Fast Sleep wakeup on PowerNV */
+8:	GET_PACA(r13)
+	b 	.power7_wakeup_loss
+
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #endif /* CONFIG_PPC_P7_NAP */
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 3fdef0f..14f78be 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -20,17 +20,27 @@
 
 #undef DEBUG
 
-	.text
+/* Idle state entry routines */
 
-_GLOBAL(power7_idle)
-	/* Now check if user or arch enabled NAP mode */
-	LOAD_REG_ADDRBASE(r3,powersave_nap)
-	lwz	r4,ADDROFF(powersave_nap)(r3)
-	cmpwi	0,r4,0
-	beqlr
-	/* fall through */
+#define	IDLE_STATE_ENTER_SEQ(IDLE_INST)				\
+	/* Magic NAP/SLEEP/WINKLE mode enter sequence */	\
+	std	r0,0(r1);					\
+	ptesync;						\
+	ld	r0,0(r1);					\
+1:	cmp	cr0,r0,r0;					\
+	bne	1b;						\
+	IDLE_INST;						\
+	b	.
 
-_GLOBAL(power7_nap)
+	.text
+
+/*
+ * Pass requested state in r3:
+ * 	0 - nap
+ * 	1 - sleep
+ */
+_GLOBAL(power7_powersave_common)
+	/* Use r3 to pass state nap/sleep/winkle */
 	/* NAP is a state loss, we create a regs frame on the
 	 * stack, fill it up with the state we care about and
 	 * stick a pointer to it in PACAR1. We really only
@@ -79,8 +89,8 @@ _GLOBAL(power7_nap)
 	/* Continue saving state */
 	SAVE_GPR(2, r1)
 	SAVE_NVGPRS(r1)
-	mfcr	r3
-	std	r3,_CCR(r1)
+	mfcr	r4
+	std	r4,_CCR(r1)
 	std	r9,_MSR(r1)
 	std	r1,PACAR1(r13)
 
@@ -90,15 +100,30 @@ _GLOBAL(power7_enter_nap_mode)
 	li	r4,KVM_HWTHREAD_IN_NAP
 	stb	r4,HSTATE_HWTHREAD_STATE(r13)
 #endif
+	cmpwi	cr0,r3,1
+	beq	2f
+	IDLE_STATE_ENTER_SEQ(PPC_NAP)
+	/* No return */
+2:	IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
+	/* No return */
 
-	/* Magic NAP mode enter sequence */
-	std	r0,0(r1)
-	ptesync
-	ld	r0,0(r1)
-1:	cmp	cr0,r0,r0
-	bne	1b
-	PPC_NAP
-	b	.
+_GLOBAL(power7_idle)
+	/* Now check if user or arch enabled NAP mode */
+	LOAD_REG_ADDRBASE(r3,powersave_nap)
+	lwz	r4,ADDROFF(powersave_nap)(r3)
+	cmpwi	0,r4,0
+	beqlr
+	/* fall through */
+
+_GLOBAL(power7_nap)
+	li	r3,0
+	b	power7_powersave_common
+	/* No return */
+
+_GLOBAL(power7_sleep)
+	li	r3,1
+	b	power7_powersave_common
+	/* No return */
 
 _GLOBAL(power7_wakeup_loss)
 	ld	r1,PACAR1(r13)


* [RESEND PATCH V5 3/8] cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines
From: Preeti U Murthy @ 2014-01-22  7:08 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

Split timer_interrupt(), the local timer interrupt handler on ppc, into the
routines called during regular interrupt handling and __timer_interrupt(),
which runs the local timers and collects time-related stats.

This lets callers that are interested only in running expired local timers
call __timer_interrupt() directly. One use case is tick broadcast IPI
handling, in which the sleeping CPUs need to handle the local timers that
have expired.
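
As a rough illustration of the split described above, here is a minimal
user-space sketch. It is not the kernel code: struct sim_cpu and the sim_*
functions are hypothetical stand-ins for the per-CPU decrementer state,
timer_interrupt(), __timer_interrupt() and the broadcast IPI handler, and
irq_depth only models the irq_enter()/irq_exit() bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU state; not the kernel's data structures. */
struct sim_cpu {
	uint64_t next_tb;      /* timebase value of the next local event */
	unsigned events_run;   /* how many times the event handler ran */
	int irq_depth;         /* models irq_enter()/irq_exit() nesting */
};

#define SIM_NO_EVENT (~(uint64_t)0)

/* Core routine: run the local timer if it has expired.
 * The caller must already be in interrupt context. */
static void sim_timer_core(struct sim_cpu *c, uint64_t now)
{
	assert(c->irq_depth > 0);
	if (now >= c->next_tb) {
		c->next_tb = SIM_NO_EVENT;
		c->events_run++;
	}
}

/* Regular path: entry/exit bookkeeping wrapped around the core. */
static void sim_timer_interrupt(struct sim_cpu *c, uint64_t now)
{
	c->irq_depth++;
	sim_timer_core(c, now);
	c->irq_depth--;
}

/* Broadcast IPI path: mark the local event as due now, then call
 * the core directly instead of going through the regular wrapper. */
static void sim_broadcast_ipi(struct sim_cpu *c, uint64_t now)
{
	c->irq_depth++;        /* models the IPI's own interrupt entry */
	c->next_tb = now;
	sim_timer_core(c, now);
	c->irq_depth--;
}
```

In this sketch a CPU whose next event lies in the future does nothing when
sim_timer_interrupt() runs early, while sim_broadcast_ipi() forces the event
to be treated as expired, which mirrors what the IPI handler in this patch
does by writing the current timebase into decrementers_next_tb.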

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/kernel/time.c |   81 +++++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3ff97db..df2989b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -478,6 +478,47 @@ void arch_irq_work_raise(void)
 
 #endif /* CONFIG_IRQ_WORK */
 
+void __timer_interrupt(void)
+{
+	struct pt_regs *regs = get_irq_regs();
+	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+	struct clock_event_device *evt = &__get_cpu_var(decrementers);
+	u64 now;
+
+	trace_timer_interrupt_entry(regs);
+
+	if (test_irq_work_pending()) {
+		clear_irq_work_pending();
+		irq_work_run();
+	}
+
+	now = get_tb_or_rtc();
+	if (now >= *next_tb) {
+		*next_tb = ~(u64)0;
+		if (evt->event_handler)
+			evt->event_handler(evt);
+		__get_cpu_var(irq_stat).timer_irqs_event++;
+	} else {
+		now = *next_tb - now;
+		if (now <= DECREMENTER_MAX)
+			set_dec((int)now);
+		/* We may have raced with new irq work */
+		if (test_irq_work_pending())
+			set_dec(1);
+		__get_cpu_var(irq_stat).timer_irqs_others++;
+	}
+
+#ifdef CONFIG_PPC64
+	/* collect purr register values often, for accurate calculations */
+	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+		struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
+		cu->current_tb = mfspr(SPRN_PURR);
+	}
+#endif
+
+	trace_timer_interrupt_exit(regs);
+}
+
 /*
  * timer_interrupt - gets called when the decrementer overflows,
  * with interrupts disabled.
@@ -486,8 +527,6 @@ void timer_interrupt(struct pt_regs * regs)
 {
 	struct pt_regs *old_regs;
 	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
-	struct clock_event_device *evt = &__get_cpu_var(decrementers);
-	u64 now;
 
 	/* Ensure a positive value is written to the decrementer, or else
 	 * some CPUs will continue to take decrementer exceptions.
@@ -519,39 +558,7 @@ void timer_interrupt(struct pt_regs * regs)
 	old_regs = set_irq_regs(regs);
 	irq_enter();
 
-	trace_timer_interrupt_entry(regs);
-
-	if (test_irq_work_pending()) {
-		clear_irq_work_pending();
-		irq_work_run();
-	}
-
-	now = get_tb_or_rtc();
-	if (now >= *next_tb) {
-		*next_tb = ~(u64)0;
-		if (evt->event_handler)
-			evt->event_handler(evt);
-		__get_cpu_var(irq_stat).timer_irqs_event++;
-	} else {
-		now = *next_tb - now;
-		if (now <= DECREMENTER_MAX)
-			set_dec((int)now);
-		/* We may have raced with new irq work */
-		if (test_irq_work_pending())
-			set_dec(1);
-		__get_cpu_var(irq_stat).timer_irqs_others++;
-	}
-
-#ifdef CONFIG_PPC64
-	/* collect purr register values often, for accurate calculations */
-	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-		struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
-		cu->current_tb = mfspr(SPRN_PURR);
-	}
-#endif
-
-	trace_timer_interrupt_exit(regs);
-
+	__timer_interrupt();
 	irq_exit();
 	set_irq_regs(old_regs);
 }
@@ -828,6 +835,10 @@ static void decrementer_set_mode(enum clock_event_mode mode,
 /* Interrupt handler for the timer broadcast IPI */
 void tick_broadcast_ipi_handler(void)
 {
+	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+
+	*next_tb = get_tb_or_rtc();
+	__timer_interrupt();
 }
 
 static void register_decrementer_clockevent(int cpu)


* [RESEND PATCH V5 2/8] powerpc: Implement tick broadcast IPI as a fixed IPI message
From: Preeti U Murthy @ 2014-01-22  7:08 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

For scalability and performance reasons, we want the tick broadcast IPIs
to be handled as efficiently as possible. Fixed IPI messages
are one of the most efficient mechanisms available - they are faster than
the smp_call_function mechanism because the IPI handlers are fixed and hence
they don't involve costly operations such as adding IPI handlers to the target
CPU's function queue, acquiring locks for synchronization etc.

Luckily we have an unused IPI message slot, so use that to implement
tick broadcast IPIs efficiently.
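
To make the "fixed message" idea concrete, here is a small user-space sketch
of a per-CPU pending bitmask with one bit per message slot, loosely modelled
on the demux loop touched by this patch. All sim_* names are hypothetical;
the real code uses do_message_pass() and smp_ipi_demux(), and the handlers
here are reduced to counters.

```c
#include <assert.h>
#include <stdint.h>

/* Slot numbers mirror the four PPC_MSG_* values in this patch. */
enum sim_msg {
	SIM_MSG_CALL_FUNCTION = 0,
	SIM_MSG_RESCHEDULE = 1,
	SIM_MSG_TICK_BROADCAST = 2,
	SIM_MSG_DEBUGGER_BREAK = 3,
	SIM_MSG_COUNT
};

/* Hypothetical per-CPU IPI state. */
struct sim_ipi_state {
	uint32_t pending;                  /* one bit per message slot */
	unsigned handled[SIM_MSG_COUNT];   /* per-slot handler call counts */
};

/* Sender side: posting a message only sets a bit. No handler is
 * queued and no lock is taken. */
static void sim_message_pass(struct sim_ipi_state *s, enum sim_msg m)
{
	assert((int)m >= 0 && (int)m < SIM_MSG_COUNT);
	s->pending |= 1u << m;
}

/* Receiver side: drain the bitmask and run the fixed handler for
 * each set bit. Returns the number of handlers invoked. */
static unsigned sim_ipi_demux(struct sim_ipi_state *s)
{
	unsigned total = 0;

	while (s->pending) {
		uint32_t all = s->pending;

		s->pending = 0;
		for (int m = 0; m < SIM_MSG_COUNT; m++) {
			if (all & (1u << m)) {
				s->handled[m]++;
				total++;
			}
		}
	}
	return total;
}
```

Note that two posts of the same message before the receiver runs collapse
into a single bit in this model, so the handler runs once for both.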

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
[Functions renamed to tick_broadcast* and Changelog modified by
 Preeti U. Murthy<preeti@linux.vnet.ibm.com>]
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Geoff Levand <geoff@infradead.org> [For the PS3 part]
---

 arch/powerpc/include/asm/smp.h          |    2 +-
 arch/powerpc/include/asm/time.h         |    1 +
 arch/powerpc/kernel/smp.c               |   19 +++++++++++++++----
 arch/powerpc/kernel/time.c              |    5 +++++
 arch/powerpc/platforms/cell/interrupt.c |    2 +-
 arch/powerpc/platforms/ps3/smp.c        |    2 +-
 6 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 9f7356b..ff51046 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE      1
-#define PPC_MSG_UNUSED		2
+#define PPC_MSG_TICK_BROADCAST	2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index c1f2676..1d428e6 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -28,6 +28,7 @@ extern struct clock_event_device decrementer_clockevent;
 struct rtc_time;
 extern void to_tm(int tim, struct rtc_time * tm);
 extern void GregorianDay(struct rtc_time *tm);
+extern void tick_broadcast_ipi_handler(void);
 
 extern void generic_calibrate_decr(void);
 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7d76b..6f06f05 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -35,6 +35,7 @@
 #include <asm/ptrace.h>
 #include <linux/atomic.h>
 #include <asm/irq.h>
+#include <asm/hw_irq.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/prom.h>
@@ -145,9 +146,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
-static irqreturn_t unused_action(int irq, void *data)
+static irqreturn_t tick_broadcast_ipi_action(int irq, void *data)
 {
-	/* This slot is unused and hence available for use, if needed */
+	tick_broadcast_ipi_handler();
 	return IRQ_HANDLED;
 }
 
@@ -168,14 +169,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
 	[PPC_MSG_CALL_FUNCTION] =  call_function_action,
 	[PPC_MSG_RESCHEDULE] = reschedule_action,
-	[PPC_MSG_UNUSED] = unused_action,
+	[PPC_MSG_TICK_BROADCAST] = tick_broadcast_ipi_action,
 	[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };
 
 const char *smp_ipi_name[] = {
 	[PPC_MSG_CALL_FUNCTION] =  "ipi call function",
 	[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-	[PPC_MSG_UNUSED] = "ipi unused",
+	[PPC_MSG_TICK_BROADCAST] = "ipi tick-broadcast",
 	[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };
 
@@ -251,6 +252,8 @@ irqreturn_t smp_ipi_demux(void)
 			generic_smp_call_function_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
 			scheduler_ipi();
+		if (all & IPI_MESSAGE(PPC_MSG_TICK_BROADCAST))
+			tick_broadcast_ipi_handler();
 		if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
 			debug_ipi_action(0, NULL);
 	} while (info->messages);
@@ -289,6 +292,14 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 		do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
+void tick_broadcast(const struct cpumask *mask)
+{
+	unsigned int cpu;
+
+	for_each_cpu(cpu, mask)
+		do_message_pass(cpu, PPC_MSG_TICK_BROADCAST);
+}
+
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 void smp_send_debugger_break(void)
 {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3dab20..3ff97db 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -825,6 +825,11 @@ static void decrementer_set_mode(enum clock_event_mode mode,
 		decrementer_set_next_event(DECREMENTER_MAX, dev);
 }
 
+/* Interrupt handler for the timer broadcast IPI */
+void tick_broadcast_ipi_handler(void)
+{
+}
+
 static void register_decrementer_clockevent(int cpu)
 {
 	struct clock_event_device *dec = &per_cpu(decrementers, cpu);
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index adf3726..8a106b4 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -215,7 +215,7 @@ void iic_request_IPIs(void)
 {
 	iic_request_ipi(PPC_MSG_CALL_FUNCTION);
 	iic_request_ipi(PPC_MSG_RESCHEDULE);
-	iic_request_ipi(PPC_MSG_UNUSED);
+	iic_request_ipi(PPC_MSG_TICK_BROADCAST);
 	iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
 }
 
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 00d1a7c..b358bec 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -76,7 +76,7 @@ static int __init ps3_smp_probe(void)
 
 		BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION    != 0);
 		BUILD_BUG_ON(PPC_MSG_RESCHEDULE       != 1);
-		BUILD_BUG_ON(PPC_MSG_UNUSED	      != 2);
+		BUILD_BUG_ON(PPC_MSG_TICK_BROADCAST   != 2);
 		BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK   != 3);
 
 		for (i = 0; i < MSG_COUNT; i++) {


* [RESEND PATCH V5 1/8] powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
From: Preeti U Murthy @ 2014-01-22  7:08 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

The IPI handlers for both PPC_MSG_CALL_FUNCTION and PPC_MSG_CALL_FUNC_SINGLE map
to a common implementation, generic_smp_call_function_single_interrupt(). So
we can consolidate them and save one of the IPI message slots (which are
precious on powerpc, since only 4 of those slots are available).

So, implement the functionality of PPC_MSG_CALL_FUNC_SINGLE using
PPC_MSG_CALL_FUNCTION itself and release its IPI message slot, so that it can be
used for something else in the future, if desired.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Geoff Levand <geoff@infradead.org> [For the PS3 part]
---

 arch/powerpc/include/asm/smp.h          |    2 +-
 arch/powerpc/kernel/smp.c               |   12 +++++-------
 arch/powerpc/platforms/cell/interrupt.c |    2 +-
 arch/powerpc/platforms/ps3/smp.c        |    2 +-
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 084e080..9f7356b 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE      1
-#define PPC_MSG_CALL_FUNC_SINGLE	2
+#define PPC_MSG_UNUSED		2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ac2621a..ee7d76b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -145,9 +145,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
-static irqreturn_t call_function_single_action(int irq, void *data)
+static irqreturn_t unused_action(int irq, void *data)
 {
-	generic_smp_call_function_single_interrupt();
+	/* This slot is unused and hence available for use, if needed */
 	return IRQ_HANDLED;
 }
 
@@ -168,14 +168,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
 	[PPC_MSG_CALL_FUNCTION] =  call_function_action,
 	[PPC_MSG_RESCHEDULE] = reschedule_action,
-	[PPC_MSG_CALL_FUNC_SINGLE] = call_function_single_action,
+	[PPC_MSG_UNUSED] = unused_action,
 	[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };
 
 const char *smp_ipi_name[] = {
 	[PPC_MSG_CALL_FUNCTION] =  "ipi call function",
 	[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-	[PPC_MSG_CALL_FUNC_SINGLE] = "ipi call function single",
+	[PPC_MSG_UNUSED] = "ipi unused",
 	[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };
 
@@ -251,8 +251,6 @@ irqreturn_t smp_ipi_demux(void)
 			generic_smp_call_function_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
 			scheduler_ipi();
-		if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNC_SINGLE))
-			generic_smp_call_function_single_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
 			debug_ipi_action(0, NULL);
 	} while (info->messages);
@@ -280,7 +278,7 @@ EXPORT_SYMBOL_GPL(smp_send_reschedule);
 
 void arch_send_call_function_single_ipi(int cpu)
 {
-	do_message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
+	do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index 2d42f3b..adf3726 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -215,7 +215,7 @@ void iic_request_IPIs(void)
 {
 	iic_request_ipi(PPC_MSG_CALL_FUNCTION);
 	iic_request_ipi(PPC_MSG_RESCHEDULE);
-	iic_request_ipi(PPC_MSG_CALL_FUNC_SINGLE);
+	iic_request_ipi(PPC_MSG_UNUSED);
 	iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
 }
 
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 4b35166..00d1a7c 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -76,7 +76,7 @@ static int __init ps3_smp_probe(void)
 
 		BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION    != 0);
 		BUILD_BUG_ON(PPC_MSG_RESCHEDULE       != 1);
-		BUILD_BUG_ON(PPC_MSG_CALL_FUNC_SINGLE != 2);
+		BUILD_BUG_ON(PPC_MSG_UNUSED	      != 2);
 		BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK   != 3);
 
 		for (i = 0; i < MSG_COUNT; i++) {


* [RESEND PATCH V5 0/8] cpuidle/ppc: Enable deep idle states on PowerNV
From: Preeti U Murthy @ 2014-01-22  7:07 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev

On PowerPC, when CPUs enter certain deep idle states, the local timers stop
and the time base could go out of sync with the rest of the cores in the system.

This patchset adds support to wake up CPUs in such idle states by
broadcasting IPIs to them at their next timer events using the tick broadcast
framework in the Linux kernel. We refer to these IPIs as the tick
broadcast IPIs in this patchset.

However, the tick broadcast framework as it exists today relies on an external
clock device to wake up CPUs in such idle states, and not all implementations of
PowerPC provide such an external clock device.

Hence Patch[6/8]:
[time/cpuidle: Support in tick broadcast framework for archs without external
clock device] adds support in the tick broadcast framework for such
use cases by queuing a hrtimer on one of the CPUs which is meant to handle the wakeup
of CPUs in deep idle states.
This patch was posted separately at: https://lkml.org/lkml/2013/12/12/687.
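
The bookkeeping behind that idea can be sketched in user space as follows.
This is only a simplified model under stated assumptions, not the code in
Patch[6/8]: struct sim_bc and the sim_bc_* functions are hypothetical, time
is a plain integer, and the timer callback clears the sleeping bit itself,
whereas in the kernel the woken CPU does that on its exit path.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NR_CPUS  8
#define SIM_TIME_MAX INT64_MAX

/* Hypothetical broadcast state kept by the nominated CPU. */
struct sim_bc {
	int64_t next_event[SIM_NR_CPUS]; /* per-CPU next timer event */
	uint32_t sleeping;               /* bitmask of CPUs in deep idle */
	int64_t next_wakeup;             /* when the broadcast timer fires */
};

static void sim_bc_init(struct sim_bc *b)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		b->next_event[cpu] = SIM_TIME_MAX;
	b->sleeping = 0;
	b->next_wakeup = SIM_TIME_MAX;
}

/* A CPU enters deep idle: record its next event and pull the
 * broadcast timer earlier if this event is the soonest one. */
static void sim_bc_enter(struct sim_bc *b, int cpu, int64_t next_event)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	b->sleeping |= 1u << cpu;
	b->next_event[cpu] = next_event;
	if (next_event < b->next_wakeup)
		b->next_wakeup = next_event;
}

/* Broadcast timer callback: return the mask of CPUs that need a
 * wakeup IPI and reprogram next_wakeup to the earliest remaining
 * event (SIM_TIME_MAX if none is left). */
static uint32_t sim_bc_fire(struct sim_bc *b, int64_t now)
{
	uint32_t wake = 0;
	int64_t next = SIM_TIME_MAX;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (!(b->sleeping & (1u << cpu)))
			continue;
		if (b->next_event[cpu] <= now) {
			wake |= 1u << cpu;
			b->sleeping &= ~(1u << cpu);
			b->next_event[cpu] = SIM_TIME_MAX;
		} else if (b->next_event[cpu] < next) {
			next = b->next_event[cpu];
		}
	}
	b->next_wakeup = next;
	return wake;
}
```

A caller would invoke sim_bc_enter() when a CPU goes into deep idle and
sim_bc_fire() from the timer callback, sending an IPI to every CPU in the
returned mask and re-arming the timer only if next_wakeup is not
SIM_TIME_MAX.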

Patches 1-3 add support in powerpc to hook onto the tick broadcast framework.

The patchset also includes support for resyncing of time base with the rest of the
cores in the system and context management for fast sleep. PATCH[4/8] and
PATCH[5/8] address these issues.

With the required support for deep idle states thus in place, the
patchset adds "Fast-Sleep" idle state into cpuidle (Patches 7 and 8). "Fast-Sleep"
is a deep idle state on Power8 in which the above mentioned challenges
exist. Fast-Sleep can yield us significantly more power
savings than the idle states that we have in cpuidle so far.

This patchset is based on Ben's ppc next branch at commit fac515db45207718
[Merge remote-tracking branch 'scott/next' into next],  and the
cpuidle driver for powernv posted by Deepthi Dharwar:
https://lkml.org/lkml/2014/1/14/172. The same patchset, without the merge
conflict resolution against Ben's ppc next branch, was posted earlier
at http://lkml.org/lkml/2014/1/15/70. This repost resolves those merge
conflicts. In addition, the earlier post was based on and tested against a
mainline commit that was quite old.

However, the patchset posted earlier at http://lkml.org/lkml/2014/1/15/70,
along with Deepthi's patches for the powernv cpuidle driver, applies cleanly
on the mainline kernel at commit 85ce70fdf48aa290b484531
dated Jan 16 2014 and has been tested on it at the time of this repost.


Changes in V5: The primary change in this version is in Patch[6/8].
As per the discussions in V4 posting of this patchset, it was decided to
refine handling the wakeup of CPUs in fast-sleep by doing the following:

1. In V4, a polling mechanism was used by the CPU handling broadcast to
find out the time of next wakeup of the CPUs in deep idle states. V5 avoids
polling, as described under PATCH[6/8] in this patchset.

2. The mechanism of broadcast handling of CPUs in deep idle in the absence of an
external wakeup device should be generic and not arch specific code. Hence in this
version this functionality has been integrated into the tick broadcast framework in
the kernel unlike before where it was handled in powerpc specific code.

3. It was suggested that the "broadcast cpu" can be the time keeping cpu
itself. However this has challenges of its own:

 a. The time keeping cpu need not exist when all cpus are idle. Hence there
are phases in time when time keeping cpu is absent. But for the use case that
this patchset is trying to address we rely on the presence of a broadcast cpu
all the time.

 b. The nomination and un-assignment of the time keeping cpu is not protected
by a lock today, and does not need to be, given how it is used in the
kernel. However, we would need locks if we doubled up the time keeping cpu as the
broadcast cpu.

Hence the broadcast cpu is independent of the time-keeping cpu. However PATCH[6/8]
proposes a simpler solution to pick a broadcast cpu in this version.



Changes in V4: https://lkml.org/lkml/2013/11/29/97

1. Add Fast Sleep CPU idle state on PowerNV.

2. Add the required context management for Fast Sleep and the call to OPAL
to synchronize time base after wakeup from fast sleep.

4. Add parsing of CPU idle states from the device tree to populate the
cpuidle state table.

5. Rename ambiguous functions in the code around waking up of CPUs from fast
sleep.

6. Fixed a bug in re-programming of the hrtimer that is queued to wakeup the
CPUs in fast sleep and modified Changelogs.

7. Added the ARCH_HAS_TICK_BROADCAST option. This signifies that we have an
arch-specific function to perform broadcast.


Changes in V3:
http://thread.gmane.org/gmane.linux.power-management.general/38113

1. Fix the way in which a broadcast ipi is handled on the idling cpus. Timer
handling on a broadcast ipi is being done now without missing out any timer
stats generation.

2. Fix a bug in the programming of the hrtimer meant to do broadcast. Program
it to trigger at the earlier of a "broadcast period", and the next wakeup
event. By introducing the "broadcast period" as the maximum period after
which the broadcast hrtimer can fire, we ensure that we do not miss
wakeups in corner cases.

3. On hotplug of a broadcast cpu, trigger the hrtimer meant to do broadcast
to fire immediately on the new broadcast cpu. This ensures that we do not miss
a broadcast that is pending in the near future.

4. Change the type of allocation from GFP_KERNEL to GFP_NOWAIT while
initializing bc_hrtimer since we are in an atomic context and cannot sleep.

5. Use the broadcast ipi to wakeup the newly nominated broadcast cpu on
hotplug of the old instead of smp_call_function_single(). This is because we
are interrupt disabled at this point and should not be using
smp_call_function_single or its children in this context to send an ipi.

6. Move GENERIC_CLOCKEVENTS_BROADCAST to arch/powerpc/Kconfig.

7. Fix coding style issues.
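
As a rough illustration of item 2 above (not the code from the patch), the
expiry of the broadcast hrtimer is the earlier of "now + broadcast period" and
the next known wakeup event. The helper name and the plain 64-bit nanosecond
values are assumptions made for this sketch; the kernel's hrtimer API works
with ktime_t instead, and overflow of "now + bc_period" is ignored here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: choose when the broadcast
 * hrtimer should fire next. All values are in nanoseconds.
 */
static uint64_t bc_next_expiry(uint64_t now, uint64_t bc_period,
			       uint64_t next_wakeup)
{
	uint64_t cap;

	assert(bc_period > 0);
	cap = now + bc_period;	/* latest point at which the timer may fire */

	/* Fire at the earlier of the cap and the next known wakeup. */
	return next_wakeup < cap ? next_wakeup : cap;
}
```

With now = 1000 and a period of 500, a wakeup at 1200 is honoured as is, while
a wakeup far in the future is capped at 1500, so a wakeup that gets queued in
between is delayed by at most one broadcast period.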


Changes in V2: https://lkml.org/lkml/2013/8/14/239

1. Dynamically pick a broadcast CPU, instead of having a dedicated one.
2. Remove the constraint of having to disable tickless idle on the broadcast
CPU by queueing a hrtimer dedicated to do broadcast.



V1 posting: https://lkml.org/lkml/2013/7/25/740.

1. Added the infrastructure to wakeup CPUs in deep idle states in which the
local timers stop.

---

Preeti U Murthy (5):
      cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines
      powermgt: Add OPAL call to resync timebase on wakeup
      time/cpuidle: Support in tick broadcast framework in the absence of external clock device
      cpuidle/powernv: Add "Fast-Sleep" CPU idle state
      cpuidle/powernv: Parse device tree to setup idle states

Srivatsa S. Bhat (2):
      powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
      powerpc: Implement tick broadcast IPI as a fixed IPI message

Vaidyanathan Srinivasan (1):
      powernv/cpuidle: Add context management for Fast Sleep


 arch/powerpc/Kconfig                           |    2 
 arch/powerpc/include/asm/opal.h                |    2 
 arch/powerpc/include/asm/processor.h           |    1 
 arch/powerpc/include/asm/smp.h                 |    2 
 arch/powerpc/include/asm/time.h                |    1 
 arch/powerpc/kernel/exceptions-64s.S           |   10 +
 arch/powerpc/kernel/idle_power7.S              |   90 +++++++++--
 arch/powerpc/kernel/smp.c                      |   23 ++-
 arch/powerpc/kernel/time.c                     |   88 +++++++----
 arch/powerpc/platforms/cell/interrupt.c        |    2 
 arch/powerpc/platforms/powernv/opal-wrappers.S |    1 
 arch/powerpc/platforms/ps3/smp.c               |    2 
 drivers/cpuidle/cpuidle-powernv.c              |  109 ++++++++++++--
 include/linux/clockchips.h                     |    4 -
 kernel/time/clockevents.c                      |    9 +
 kernel/time/tick-broadcast.c                   |  192 ++++++++++++++++++++++--
 kernel/time/tick-internal.h                    |    8 +
 17 files changed, 442 insertions(+), 104 deletions(-)

^ permalink raw reply

* Re: [PATCH RFC 00/73] tree-wide: clean up some no longer required #include <linux/init.h>
From: Stephen Rothwell @ 2014-01-22  7:00 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-arch, linux-mips, linux-m68k, rusty, linux-ia64, kvm,
	linux-s390, netdev, x86, linux-kernel, torvalds, gregkh,
	linux-alpha, sparclinux, akpm, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1390339396-3479-1-git-send-email-paul.gortmaker@windriver.com>

[-- Attachment #1: Type: text/plain, Size: 2351 bytes --]

Hi Paul,

On Tue, 21 Jan 2014 16:22:03 -0500 Paul Gortmaker <paul.gortmaker@windriver.com> wrote:
>
> Where: This work exists as a queue of patches that I apply to
> linux-next; since the changes are fixing some things that currently
> can only be found there.  The patch series can be found at:
> 
>    http://git.kernel.org/cgit/linux/kernel/git/paulg/init.git
>    git://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git
> 
> I've avoided annoying Stephen with another queue of patches for
> linux-next while the development content was in flux, but now that
> the merge window has opened, and new additions are fewer, perhaps he
> wouldn't mind tacking it on the end...  Stephen?

OK, I have added this to the end of linux-next today - we will see how we
go.  It is called "init".

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgment of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
     * submitted under GPL v2 (or later) and include the Contributor's
	Signed-off-by,
     * posted to the relevant mailing list,
     * reviewed by you (or another maintainer of your subsystem tree),
     * successfully unit tested, and 
     * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
sfr@canb.auug.org.au

Legal Stuff:
By participating in linux-next, your subsystem tree contributions are
public and will be included in the linux-next trees.  You may be sent
e-mail messages indicating errors or other issues when the
patches/commits from your subsystem tree are merged and tested in
linux-next.  These messages may also be cross-posted to the linux-next
mailing list, the linux-kernel mailing list, etc.  The linux-next tree
project and IBM (my employer) make no warranties regarding the linux-next
project, the testing procedures, the results, the e-mails, etc.  If you
don't agree to these ground rules, let me know and I'll remove your tree
from participation in linux-next.


^ permalink raw reply

* Re: [PATCH 0/4] powernv: kvm: numa fault improvement
From: Aneesh Kumar K.V @ 2014-01-22  5:18 UTC (permalink / raw)
  To: Paul Mackerras, Alexander Graf; +Cc: linuxppc-dev, kvm-ppc, Liu ping fan
In-Reply-To: <20140121112204.GE8265@iris.ozlabs.ibm.com>

Paul Mackerras <paulus@samba.org> writes:

> On Mon, Jan 20, 2014 at 03:48:36PM +0100, Alexander Graf wrote:
>> 
>> On 15.01.2014, at 07:36, Liu ping fan <kernelfans@gmail.com> wrote:
>> 
>> > On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf <agraf@suse.de> wrote:
>> >> 
>> >> On 11.12.2013, at 09:47, Liu Ping Fan <kernelfans@gmail.com> wrote:
>> >> 
>> >>> This series is based on Aneesh's series  "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64"
>> >>> 
>> >>> For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA"
>> >>> (for which, I still try to get a machine to show nums)
>> >>> 
>> >>> But for this series, I think that I have a good justification -- the fact of heavy cost when switching context between guest and host,
>> >>> which is  well known.
>> >> 
>> >> This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve what you're trying and convince your readers that it's a good idea to do it the way you do it.
>> >> 
>> > Sorry for the unclear message. After introducing the _PAGE_NUMA,
>> > kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
>> > should rely on host's kvmppc_book3s_hv_page_fault() to call
>> > do_numa_page() to do the numa fault check. This incurs the overhead
>> > when exiting from rmode to vmode.  My idea is that in
>> > kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
>> > there is no need to exit to vmode (i.e saving htab, slab switching)
>> > 
>> >>> If my suppose is correct, will CCing kvm@vger.kernel.org from next version.
>> >> 
>> >> This translates to me as "This is an RFC"?
>> >> 
>> > Yes, I am not quite sure about it. I have no bare-metal to verify it.
>> > So I hope at least, from the theory, it is correct.
>> 
>> Paul, could you please give this some thought and maybe benchmark it?
>
> OK, once I get Aneesh to tell me how I get to have ptes with
> _PAGE_NUMA set in the first place. :)
>

I guess we want patch 2, which Liu has sent separately and I have
reviewed. http://article.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/8619
I am not sure about the rest of the patches in the series.
We definitely don't want to numa migrate on henter. We may want to do
that on fault. But even there, IMHO, we should let the host take the
fault and do the numa migration instead of doing this in guest context.

-aneesh

^ permalink raw reply

* Re: [PATCH] powerpc: fix hw breakpoints on !HAVE_HW_BREAKPOINT configurations
From: Michael Neuling @ 2014-01-22  4:46 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linuxppc-dev, imunsie
In-Reply-To: <8761pdht31.fsf_-_@igel.home>

[-- Attachment #1: Type: text/plain, Size: 1232 bytes --]

I'm not near my machine to test but looks good.

Thanks,
Mikey
On 22 Jan 2014 08:56, "Andreas Schwab" <schwab@linux-m68k.org> wrote:

> This fixes a logic error that caused a failure to update the hw breakpoint
> registers when not using the hw-breakpoint interface.
>
> Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
> ---
>  arch/powerpc/kernel/process.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 4a96556..7714950 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -690,7 +690,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
>   * schedule DABR
>   */
>  #ifndef CONFIG_HAVE_HW_BREAKPOINT
> -       if (unlikely(hw_brk_match(&__get_cpu_var(current_brk), &new->thread.hw_brk)))
> +       if (unlikely(!hw_brk_match(&__get_cpu_var(current_brk), &new->thread.hw_brk)))
>                 set_breakpoint(&new->thread.hw_brk);
>  #endif /* CONFIG_HAVE_HW_BREAKPOINT */
>  #endif
> --
> 1.8.5.3
>
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
>
>


^ permalink raw reply

* Re: [PATCH 10/73] powerpc: use device_initcall for registering rtc devices
From: Paul Gortmaker @ 2014-01-22  2:26 UTC (permalink / raw)
  To: Geoff Levand; +Cc: linux-arch, Paul Mackerras, linuxppc-dev, LKML
In-Reply-To: <1390348085.5027.18.camel@smoke>

On Tue, Jan 21, 2014 at 6:48 PM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Paul,
>
> On Tue, 2014-01-21 at 16:22 -0500, Paul Gortmaker wrote:
>> Currently these two RTC devices are in core platform code
>> where it is not possible for them to be modular.  It will
>> never be modular, so using module_init as an alias for
>> __initcall can be somewhat misleading.
>>
>>  arch/powerpc/kernel/time.c        | 2 +-
>>  arch/powerpc/platforms/ps3/time.c | 3 +--
>>  2 files changed, 2 insertions(+), 3 deletions(-)
>
> I tested the PS3 part of this patch and it seems to work OK.
>
> Acked-by: Geoff Levand <geoff@infradead.org>

Thanks Geoff for the review and testing; I'll add the ack.

Paul.
--


^ permalink raw reply

* Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
From: Michael Ellerman @ 2014-01-22  1:32 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Linux PPC
In-Reply-To: <1389916434-2288-1-git-send-email-cody@linux.vnet.ibm.com>

On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote:
> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
> performance counters: gpci ("get performance counter info") and 24x7.
> 
> The counters supplied by these interfaces are continually counting and never
> need to be (and cannot be) disabled or enabled. They additionally do not
> generate any interrupts. This makes them in some regards similar to software
> counters, and as a result their implementation shares some common code (which
> an initial patch exposes) with the sw counters.

Hi Cody,

Can you please add some more explanation of this series.

In particular why do we need two new PMUs, and how do they relate to each
other?

And can you add an example of how I'd actually use them using perf.

cheers

^ permalink raw reply

* Re: [PATCH 10/73] powerpc: use device_initcall for registering rtc devices
From: Geoff Levand @ 2014-01-21 23:48 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: linux-arch, linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <1390339396-3479-11-git-send-email-paul.gortmaker@windriver.com>

Hi Paul,

On Tue, 2014-01-21 at 16:22 -0500, Paul Gortmaker wrote:
> Currently these two RTC devices are in core platform code
> where it is not possible for them to be modular.  It will
> never be modular, so using module_init as an alias for
> __initcall can be somewhat misleading.
> 
>  arch/powerpc/kernel/time.c        | 2 +-
>  arch/powerpc/platforms/ps3/time.c | 3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)

I tested the PS3 part of this patch and it seems to work OK.

Acked-by: Geoff Levand <geoff@infradead.org>

^ permalink raw reply

* [PATCH 3/3] powerpc/pseries: Report in kernel device tree update to drmgr
From: Tyrel Datwyler @ 2014-01-21 22:55 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: nfont
In-Reply-To: <1390344949-3983-1-git-send-email-tyreld@linux.vnet.ibm.com>

Traditionally it has been drmgr's responsibility to update the device tree
through the /proc/ppc64/ofdt interface after a suspend/resume operation.
This patchset, however, has modified the suspend/resume ops to perform that update
entirely in the kernel during the resume. Therefore, a mechanism is required
for drmgr to determine who is responsible for the update. This patch adds a
show function to the "hibernate" attribute that returns 1 if the kernel
updates the device tree after the resume and 0 if drmgr is responsible.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
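For illustration only, a userspace consumer such as drmgr could interpret the
attribute roughly as sketched below. This is not drmgr code. The function
names are made up for the example, and the sysfs path is an assumption derived
from the "power" subsystem name and the "hibernate" attribute; the patch
itself does not spell out the full path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed location of the attribute; not stated in the patch. */
#define HIBERNATE_ATTR "/sys/devices/system/power/hibernate"

/*
 * Hypothetical helper: returns 1 if the kernel updates the device tree,
 * 0 if drmgr must do it, and -1 if the text is not a recognised value.
 */
static int parse_dt_update_owner(const char *text)
{
	char *end;
	long val;

	assert(text != NULL);
	val = strtol(text, &end, 10);
	if (end == text)
		return -1;	/* no digits at all */
	if (val == 0 || val == 1)
		return (int)val;
	return -1;
}

/* Hypothetical helper: read the attribute file and interpret it. */
static int read_dt_update_owner(const char *path)
{
	char buf[16];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;	/* attribute absent or not readable */
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;	/* e.g. a kernel without a show function */
	}
	fclose(f);
	return parse_dt_update_owner(buf);
}
```

A caller would pass HIBERNATE_ATTR and treat -1 the same as 0, i.e. fall back
to updating the device tree from user space when the kernel does not report
that it handles the update.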
 arch/powerpc/platforms/pseries/suspend.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 16a2552..723115d 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -174,7 +174,30 @@ out:
 	return rc;
 }
 
-static DEVICE_ATTR(hibernate, S_IWUSR, NULL, store_hibernate);
+#define USER_DT_UPDATE	0
+#define KERN_DT_UPDATE	1
+
+/**
+ * show_hibernate - Report device tree update responsibility
+ * @dev:		subsys root device
+ * @attr:		device attribute struct
+ * @buf:		buffer
+ *
+ * Report whether a device tree update is performed by the kernel after a
+ * resume, or if drmgr must coordinate the update from user space.
+ *
+ * Return value:
+ *	0 if drmgr is to initiate update, and 1 otherwise
+ **/
+static ssize_t show_hibernate(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	return sprintf(buf, "%d\n", KERN_DT_UPDATE);
+}
+
+static DEVICE_ATTR(hibernate, S_IWUSR | S_IRUGO,
+		   show_hibernate, store_hibernate);
 
 static struct bus_type suspend_subsys = {
 	.name = "power",
-- 
1.7.12.4

^ permalink raw reply related

* [PATCH 2/3] powerpc/pseries: Update dynamic cache nodes for suspend/resume operation
From: Tyrel Datwyler @ 2014-01-21 22:55 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: nfont
In-Reply-To: <1390344949-3983-1-git-send-email-tyreld@linux.vnet.ibm.com>

pHyp can change the cache nodes across a suspend/resume operation. The current
code updates the device tree after all nonboot CPUs are enabled, so the cache
lists are not rebuilt from the latest cache nodes. Also, we do not remove the
cache entries for the primary CPU.

This patch removes the cache list for the boot CPU, updates the device tree
before enabling nonboot CPUs, and then re-adds the cache list for the boot CPU.

Signed-off-by: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
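The ordering described in the changelog can be modelled with a small
user-space mock, shown here only to make the reasoning explicit. Every type
and function below is a stand-in invented for the sketch; none of them are the
kernel's cacheinfo or mobility implementations.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock device tree: the counter is bumped when cache nodes change. */
struct mock_dt {
	int cache_node_gen;
};

/* Mock per-CPU cache list, remembering which tree state it was built from. */
struct mock_cache_list {
	bool valid;
	int built_from_gen;
};

/* Stand-in for cacheinfo_cpu_offline(): drop the stale list. */
static void mock_cacheinfo_cpu_offline(struct mock_cache_list *cl)
{
	cl->valid = false;
}

/* Stand-in for post_mobility_fixup(): firmware changed the cache nodes. */
static void mock_post_mobility_fixup(struct mock_dt *dt)
{
	dt->cache_node_gen++;
}

/* Stand-in for cacheinfo_cpu_online(): rebuild from the current tree. */
static void mock_cacheinfo_cpu_online(const struct mock_dt *dt,
				      struct mock_cache_list *cl)
{
	cl->built_from_gen = dt->cache_node_gen;
	cl->valid = true;
}

/* Same three steps, in the same order, as update_dynamic_configuration(). */
static void mock_update_dynamic_configuration(struct mock_dt *dt,
					      struct mock_cache_list *cl)
{
	assert(dt && cl);
	mock_cacheinfo_cpu_offline(cl);
	mock_post_mobility_fixup(dt);
	mock_cacheinfo_cpu_online(dt, cl);
}
```

Because the list is rebuilt only after the tree update, it always reflects the
post-resume cache nodes. Rebuilding before the update would leave it pointing
at the pre-suspend state.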
 arch/powerpc/include/asm/rtas.h |  4 ++++
 arch/powerpc/kernel/rtas.c      | 17 +++++++++++++++++
 arch/powerpc/kernel/time.c      |  6 ++++++
 3 files changed, 27 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9bd52c6..da9d733 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -283,6 +283,10 @@ extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 
 #ifdef CONFIG_PPC_PSERIES
 extern int pseries_devicetree_update(s32 scope);
+extern void post_mobility_fixup(void);
+extern void update_dynamic_configuration(void);
+#else /* !CONFIG_PPC_PSERIES */
+static inline void update_dynamic_configuration(void) { }
 #endif
 
 #ifdef CONFIG_PPC_RTAS_DAEMON
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 4cf674d..8249eb2 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -43,6 +43,7 @@
 #include <asm/time.h>
 #include <asm/mmu.h>
 #include <asm/topology.h>
+#include "cacheinfo.h"
 
 struct rtas_t rtas = {
 	.lock = __ARCH_SPIN_LOCK_UNLOCKED
@@ -972,6 +973,22 @@ out:
 	free_cpumask_var(offline_mask);
 	return atomic_read(&data.error);
 }
+
+/*
+ * The device tree cache nodes can be modified during suspend/resume.
+ * So delete all cache entries and recreate them after the device tree
+ * update.
+ * We already deleted cache entries for nonboot CPUs before suspend. So delete
+ * entries for the primary CPU, then recreate them after the device tree update.
+ * Entries for nonboot CPUs are created when those CPUs are enabled later.
+ */
+
+void update_dynamic_configuration(void)
+{
+	cacheinfo_cpu_offline(smp_processor_id());
+	post_mobility_fixup();
+	cacheinfo_cpu_online(smp_processor_id());
+}
 #else /* CONFIG_PPC_PSERIES */
 int rtas_ibm_suspend_me(struct rtas_args *args)
 {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3b1441..5f1ca28 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -69,6 +69,7 @@
 #include <asm/vdso_datapage.h>
 #include <asm/firmware.h>
 #include <asm/cputime.h>
+#include <asm/rtas.h>
 
 /* powerpc clocksource/clockevent code */
 
@@ -592,6 +593,11 @@ void arch_suspend_enable_irqs(void)
 	generic_suspend_enable_irqs();
 	if (ppc_md.suspend_enable_irqs)
 		ppc_md.suspend_enable_irqs();
+	/*
+	 * Update configuration which can be modified based on devicetree
+	 * changes during resume.
+	 */
+	update_dynamic_configuration();
 }
 #endif
 
-- 
1.7.12.4

^ permalink raw reply related

* [PATCH 1/3] powerpc/pseries: Device tree should only be updated once after suspend/migrate
From: Tyrel Datwyler @ 2014-01-21 22:55 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: nfont
In-Reply-To: <1390344949-3983-1-git-send-email-tyreld@linux.vnet.ibm.com>

The current code makes rtas calls for update-nodes, activate-firmware and then
update-nodes again. The FW provides the same data for both update-nodes calls.
As a result, a "proc entry exists" error is reported for the second update while
adding device nodes.

This patch makes a single rtas call for update-nodes after activating the FW.
It also adds an rtas_busy_delay() retry loop around the activate-firmware rtas call.

Signed-off-by: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index cde4e0a..bde7eba 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -290,13 +290,6 @@ void post_mobility_fixup(void)
 	int rc;
 	int activate_fw_token;
 
-	rc = pseries_devicetree_update(MIGRATION_SCOPE);
-	if (rc) {
-		printk(KERN_ERR "Initial post-mobility device tree update "
-		       "failed: %d\n", rc);
-		return;
-	}
-
 	activate_fw_token = rtas_token("ibm,activate-firmware");
 	if (activate_fw_token == RTAS_UNKNOWN_SERVICE) {
 		printk(KERN_ERR "Could not make post-mobility "
@@ -304,16 +297,17 @@ void post_mobility_fixup(void)
 		return;
 	}
 
-	rc = rtas_call(activate_fw_token, 0, 1, NULL);
-	if (!rc) {
-		rc = pseries_devicetree_update(MIGRATION_SCOPE);
-		if (rc)
-			printk(KERN_ERR "Secondary post-mobility device tree "
-			       "update failed: %d\n", rc);
-	} else {
+	do {
+		rc = rtas_call(activate_fw_token, 0, 1, NULL);
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
 		printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
-		return;
-	}
+
+	rc = pseries_devicetree_update(MIGRATION_SCOPE);
+	if (rc)
+		printk(KERN_ERR "Post-mobility device tree update "
+			"failed: %d\n", rc);
 
 	return;
 }
-- 
1.7.12.4

^ permalink raw reply related

* [PATCH 0/3] powerpc/pseries: fix issues in suspend/resume code
From: Tyrel Datwyler @ 2014-01-21 22:55 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: nfont

This patchset fixes a few issues encountered in the suspend/resume code
base. First, when using the in-kernel device tree update code, update-nodes is
unnecessarily called more than once. Second, the cpu cache lists are not
updated after a suspend/resume, which under certain conditions may cause a
panic. Finally, since the cache list fix utilizes the in-kernel device tree update
code, a means for telling drmgr not to perform a device tree update from
userspace is required.

Tyrel Datwyler (3):
  powerpc/pseries: Device tree should only be updated once after
    suspend/migrate
  powerpc/pseries: Update dynamic cache nodes for suspend/resume
    operation
  powerpc/pseries: Report in kernel device tree update to drmgr

 arch/powerpc/include/asm/rtas.h           |  4 ++++
 arch/powerpc/kernel/rtas.c                | 17 +++++++++++++++++
 arch/powerpc/kernel/time.c                |  6 ++++++
 arch/powerpc/platforms/pseries/mobility.c | 26 ++++++++++----------------
 arch/powerpc/platforms/pseries/suspend.c  | 25 ++++++++++++++++++++++++-
 5 files changed, 61 insertions(+), 17 deletions(-)

-- 
1.7.12.4

^ permalink raw reply

* [PATCH] powerpc: fix hw breakpoints on !HAVE_HW_BREAKPOINT configurations
From: Andreas Schwab @ 2014-01-21 22:24 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Ian Munsie
In-Reply-To: <12813.1357794092__45363.9676016339$1357794149$gmane$org@ale.ozlabs.ibm.com>

This fixes a logic error that caused a failure to update the hw breakpoint
registers when not using the hw-breakpoint interface.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
---
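To spell out the logic error: the breakpoint registers need to be rewritten
when the incoming task's breakpoint does NOT match what the CPU currently
holds. The user-space model below is only a sketch; the struct layout and the
helper names are invented for the example and are not the kernel's
definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified breakpoint description for this sketch. */
struct mock_brk {
	unsigned long address;
	unsigned long type;
};

/* Mock CPU: the currently programmed breakpoint and a write counter. */
struct mock_cpu {
	struct mock_brk current_brk;
	int hw_writes;
};

/* True when both breakpoints describe the same setting. */
static bool mock_brk_match(const struct mock_brk *a, const struct mock_brk *b)
{
	return a->address == b->address && a->type == b->type;
}

/* Pretend to program the hardware registers. */
static void mock_set_breakpoint(struct mock_cpu *cpu,
				const struct mock_brk *brk)
{
	cpu->current_brk = *brk;
	cpu->hw_writes++;
}

/* Context-switch step: reprogram only if the setting actually differs. */
static void mock_switch_brk(struct mock_cpu *cpu, const struct mock_brk *next)
{
	assert(cpu && next);
	if (!mock_brk_match(&cpu->current_brk, next))
		mock_set_breakpoint(cpu, next);
}
```

Without the negation, the registers would be rewritten only when they already
held the right value, and a task with a different breakpoint would never get
it installed.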
 arch/powerpc/kernel/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4a96556..7714950 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -690,7 +690,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
  * schedule DABR
  */
 #ifndef CONFIG_HAVE_HW_BREAKPOINT
-	if (unlikely(hw_brk_match(&__get_cpu_var(current_brk), &new->thread.hw_brk)))
+	if (unlikely(!hw_brk_match(&__get_cpu_var(current_brk), &new->thread.hw_brk)))
 		set_breakpoint(&new->thread.hw_brk);
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 #endif
-- 
1.8.5.3


-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply related

* Re: [PATCH 12/73] powerpc: kvm e500/44x is not modular, so don't use module_init
From: Paul Gortmaker @ 2014-01-21 22:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, kvm, Gleb Natapov, Alexander Graf, kvm-ppc,
	Paul Gortmaker, Paul Mackerras, Paolo Bonzini, linuxppc-dev
In-Reply-To: <1390339396-3479-13-git-send-email-paul.gortmaker@windriver.com>

On 14-01-21 04:22 PM, Paul Gortmaker wrote:
> In powerpc, CONFIG_KVM is bool, and  so are these three subarch
> options, for the 44x and e500 variants.  This means that any
> module_exit() calls and functions used by them such as the
> kvmppc_booke_exit() are dead code.  Here we remove them.
> 
> In addition, rather than use module_init, which is just
> __initcall for non-modules, we update those as well.
> 
> Note that direct use of __initcall is discouraged, vs. one
> of the priority categorized subgroups.  As __initcall gets
> mapped onto device_initcall, our use of subsys_initcall (which
> seems to make sense for netfilter code) will thus change this

I've fixed the above --  s/netfilter/PPC KVM/

The risks of recycling commit logs...

Paul.
--

> registration from level 6-device to level 4-subsys (i.e. slightly
> earlier).
> 
> However no impact of that small difference is expected,
> since the arch independent kvm code doesn't trigger any init;
> it is the arch initcalls here which actually call kvm_init.
> 
> Cc: Gleb Natapov <gleb@kernel.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Alexander Graf <agraf@suse.de>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: kvm@vger.kernel.org
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  arch/powerpc/include/asm/kvm_ppc.h |  1 -
>  arch/powerpc/kvm/44x.c             | 10 +---------
>  arch/powerpc/kvm/booke.c           |  6 ------
>  arch/powerpc/kvm/e500.c            | 10 +---------
>  arch/powerpc/kvm/e500mc.c          | 10 +---------
>  5 files changed, 3 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index c8317fb..8466df5 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -109,7 +109,6 @@ extern void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu);
>  extern int kvmppc_core_check_requests(struct kvm_vcpu *vcpu);
>  
>  extern int kvmppc_booke_init(void);
> -extern void kvmppc_booke_exit(void);
>  
>  extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
>  extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
> diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
> index 93221e8..2129fc1 100644
> --- a/arch/powerpc/kvm/44x.c
> +++ b/arch/powerpc/kvm/44x.c
> @@ -222,12 +222,4 @@ static int __init kvmppc_44x_init(void)
>  err_out:
>  	return r;
>  }
> -
> -static void __exit kvmppc_44x_exit(void)
> -{
> -	kvmppc_pr_ops = NULL;
> -	kvmppc_booke_exit();
> -}
> -
> -module_init(kvmppc_44x_init);
> -module_exit(kvmppc_44x_exit);
> +subsys_initcall(kvmppc_44x_init);
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 0591e05..49dffa2 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -1995,9 +1995,3 @@ int __init kvmppc_booke_init(void)
>  #endif /* !BOOKE_HV */
>  	return 0;
>  }
> -
> -void __exit kvmppc_booke_exit(void)
> -{
> -	free_pages(kvmppc_booke_handlers, VCPU_SIZE_ORDER);
> -	kvm_exit();
> -}
> diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
> index 497b142..115ef12 100644
> --- a/arch/powerpc/kvm/e500.c
> +++ b/arch/powerpc/kvm/e500.c
> @@ -564,12 +564,4 @@ static int __init kvmppc_e500_init(void)
>  err_out:
>  	return r;
>  }
> -
> -static void __exit kvmppc_e500_exit(void)
> -{
> -	kvmppc_pr_ops = NULL;
> -	kvmppc_booke_exit();
> -}
> -
> -module_init(kvmppc_e500_init);
> -module_exit(kvmppc_e500_exit);
> +subsys_initcall(kvmppc_e500_init);
> diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
> index 4132cd2..612c216 100644
> --- a/arch/powerpc/kvm/e500mc.c
> +++ b/arch/powerpc/kvm/e500mc.c
> @@ -382,12 +382,4 @@ static int __init kvmppc_e500mc_init(void)
>  err_out:
>  	return r;
>  }
> -
> -static void __exit kvmppc_e500mc_exit(void)
> -{
> -	kvmppc_pr_ops = NULL;
> -	kvmppc_booke_exit();
> -}
> -
> -module_init(kvmppc_e500mc_init);
> -module_exit(kvmppc_e500mc_exit);
> +subsys_initcall(kvmppc_e500mc_init);
> 

^ permalink raw reply

* Re: [PATCH] powerpc: set the correct ksp_limit on ppc32 when switching to irq stack
From: Benjamin Herrenschmidt @ 2014-01-21 21:48 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linuxppc, Kevin Hao
In-Reply-To: <20140121161450.GA3311@roeck-us.net>

On Tue, 2014-01-21 at 08:14 -0800, Guenter Roeck wrote:
> On Fri, Jan 17, 2014 at 12:25:28PM +0800, Kevin Hao wrote:
> > Guenter Roeck got the following call trace on a p2020 board:
> >   Kernel stack overflow in process eb3e5a00, r1=eb79df90
> >   CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
> >   task: eb3e5a00 ti: c0616000 task.ti: ef440000
> >   NIP: c003a420 LR: c003a410 CTR: c0017518
> >   REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
> >   MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
> >   GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
> >   GPR08: 00000000 020b8000 00000000 00000000 44008442
> >   NIP [c003a420] __do_softirq+0x94/0x1ec
> >   LR [c003a410] __do_softirq+0x84/0x1ec
> >   Call Trace:
> >   [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
> >   [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
> >   [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
> >   [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
> >   [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
> >   --- Exception: 501 at 0xfcda524
> >       LR = 0x10024900
> >   Instruction dump:
> >   7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
> >   5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
> >   Kernel panic - not syncing: kernel stack overflow
> >   CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
> >   Call Trace:
> > 
> > The cause is that the wrong register was used to calculate the
> > ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64).
> > Fix it.
> > 
> > As suggested by Benjamin Herrenschmidt, also add the C prototype of the
> > function in a comment to avoid this kind of error in the future.
> > 
> Was this patch accepted, or are there any problems with it?
> I didn't see any comments, and it is still neither upstream nor in linux-next.

It will be merged when I come back from vacation. It was too late for
3.13 so I'll send it to Linus next week and will CC -stable.

Cheers,
Ben.


* [PATCH] powerpc: Fix endian issues in kexec and crash dump code
From: Anton Blanchard @ 2014-01-21 21:40 UTC (permalink / raw)
  To: benh, paulus; +Cc: linuxppc-dev


We expose a number of OF properties in the kexec and crash dump code
and these need to be big endian.

Cc: stable@vger.kernel.org # v3.13
Signed-off-by: Anton Blanchard <anton@samba.org>
---

diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
index 75d4f73..015ae55 100644
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -196,7 +196,9 @@ int overlaps_crashkernel(unsigned long start, unsigned long size)
 
 /* Values we need to export to the second kernel via the device tree. */
 static phys_addr_t kernel_end;
+static phys_addr_t crashk_base;
 static phys_addr_t crashk_size;
+static unsigned long long mem_limit;
 
 static struct property kernel_end_prop = {
 	.name = "linux,kernel-end",
@@ -207,7 +209,7 @@ static struct property kernel_end_prop = {
 static struct property crashk_base_prop = {
 	.name = "linux,crashkernel-base",
 	.length = sizeof(phys_addr_t),
-	.value = &crashk_res.start,
+	.value = &crashk_base,
 };
 
 static struct property crashk_size_prop = {
@@ -219,9 +221,11 @@ static struct property crashk_size_prop = {
 static struct property memory_limit_prop = {
 	.name = "linux,memory-limit",
 	.length = sizeof(unsigned long long),
-	.value = &memory_limit,
+	.value = &mem_limit,
 };
 
+#define cpu_to_be_ulong	__PASTE(cpu_to_be, BITS_PER_LONG)
+
 static void __init export_crashk_values(struct device_node *node)
 {
 	struct property *prop;
@@ -237,8 +241,9 @@ static void __init export_crashk_values(struct device_node *node)
 		of_remove_property(node, prop);
 
 	if (crashk_res.start != 0) {
+		crashk_base = cpu_to_be_ulong(crashk_res.start);
 		of_add_property(node, &crashk_base_prop);
-		crashk_size = resource_size(&crashk_res);
+		crashk_size = cpu_to_be_ulong(resource_size(&crashk_res));
 		of_add_property(node, &crashk_size_prop);
 	}
 
@@ -246,6 +251,7 @@ static void __init export_crashk_values(struct device_node *node)
 	 * memory_limit is required by the kexec-tools to limit the
 	 * crash regions to the actual memory used.
 	 */
+	mem_limit = cpu_to_be_ulong(memory_limit);
 	of_update_property(node, &memory_limit_prop);
 }
 
@@ -264,7 +270,7 @@ static int __init kexec_setup(void)
 		of_remove_property(node, prop);
 
 	/* information needed by userspace when using default_machine_kexec */
-	kernel_end = __pa(_end);
+	kernel_end = cpu_to_be_ulong(__pa(_end));
 	of_add_property(node, &kernel_end_prop);
 
 	export_crashk_values(node);
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index be4e6d6..59d229a 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -369,6 +369,7 @@ void default_machine_kexec(struct kimage *image)
 
 /* Values we need to export to the second kernel via the device tree. */
 static unsigned long htab_base;
+static unsigned long htab_size;
 
 static struct property htab_base_prop = {
 	.name = "linux,htab-base",
@@ -379,7 +380,7 @@ static struct property htab_base_prop = {
 static struct property htab_size_prop = {
 	.name = "linux,htab-size",
 	.length = sizeof(unsigned long),
-	.value = &htab_size_bytes,
+	.value = &htab_size,
 };
 
 static int __init export_htab_values(void)
@@ -403,8 +404,9 @@ static int __init export_htab_values(void)
 	if (prop)
 		of_remove_property(node, prop);
 
-	htab_base = __pa(htab_address);
+	htab_base = cpu_to_be64(__pa(htab_address));
 	of_add_property(node, &htab_base_prop);
+	htab_size = cpu_to_be64(htab_size_bytes);
 	of_add_property(node, &htab_size_prop);
 
 	of_node_put(node);

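For readers following the cpu_to_be_ulong() trick in the patch above, here
is a minimal userspace sketch of the same idea. It is not kernel code: the
PASTE/PASTE_ macros, the BITS_PER_LONG derivation from ULONG_MAX, the
mapping of cpu_to_be32/cpu_to_be64 onto htobe32()/htobe64() from <endian.h>
(a glibc/musl assumption), and the helper prop_byte() are all stand-ins
introduced only for illustration.

```c
#define _DEFAULT_SOURCE		/* expose htobe32()/htobe64() on glibc */
#include <assert.h>
#include <endian.h>
#include <limits.h>
#include <string.h>

/* Two-level paste so BITS_PER_LONG is expanded before ## is applied. */
#define PASTE_(a, b)	a##b
#define PASTE(a, b)	PASTE_(a, b)

/* Userspace stand-ins for the kernel byte-order helpers (assumption). */
#define cpu_to_be32(x)	htobe32(x)
#define cpu_to_be64(x)	htobe64(x)

#if ULONG_MAX == 0xffffffffUL
#define BITS_PER_LONG	32
#else
#define BITS_PER_LONG	64
#endif

/* Resolves to cpu_to_be32 or cpu_to_be64 depending on the width of long. */
#define cpu_to_be_ulong	PASTE(cpu_to_be, BITS_PER_LONG)

/*
 * Hypothetical helper: return byte i of the buffer that a property
 * .value pointer would reference after the conversion, i.e. what a
 * device tree reader sees in memory.
 */
unsigned char prop_byte(unsigned long cpu_value, size_t i)
{
	unsigned long be = cpu_to_be_ulong(cpu_value);
	unsigned char raw[sizeof(be)];

	assert(i < sizeof(be));
	memcpy(raw, &be, sizeof(be));
	return raw[i];
}
```

With this, the last byte in memory is always the least significant one,
whatever the host byte order: prop_byte(0x12345678UL, sizeof(unsigned long) - 1)
yields 0x78. That is why the patch keeps separate static copies
(crashk_base, mem_limit, htab_size) for the properties instead of pointing
.value at the CPU-endian kernel variables.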

