LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Kirill A. Shutemov @ 2014-01-13 22:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, linuxppc-dev,
	kirill.shutemov
In-Reply-To: <20140113141748.0b851e1573e41bf26de7c0ae@linux-foundation.org>

On Mon, Jan 13, 2014 at 02:17:48PM -0800, Andrew Morton wrote:
> On Thu, 2 Jan 2014 04:19:51 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Wed, Jan 01, 2014 at 09:29:05PM +1100, Benjamin Herrenschmidt wrote:
> > > On Wed, 2014-01-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> > > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > > > 
> > > > This patch fix the below crash
> > > > 
> > > > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> > > > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > ...
> > > > Call Trace:
> > > > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> > > > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> > > > 
> > > > On ppc64 we use the pgtable for storing the hpte slot information and
> > > > store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> > > > pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> > > > the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> > > > from new pmd.
> > > > 
> > > > We also want to move the withdraw and deposit before the set_pmd so
> > > > that, when page fault find the pmd as trans huge we can be sure that
> > > > pgtable can be located at the offset.
> > > > 
> 
> Did this get fixed?

New version: http://thread.gmane.org/gmane.linux.kernel.mm/111809

-- 
 Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Benjamin Herrenschmidt @ 2014-01-14  4:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, Kirill A. Shutemov,
	linuxppc-dev, kirill.shutemov
In-Reply-To: <20140113141748.0b851e1573e41bf26de7c0ae@linux-foundation.org>

On Mon, 2014-01-13 at 14:17 -0800, Andrew Morton wrote:

> Did this get fixed?

Any chance you can Ack the patch on that thread ?

http://thread.gmane.org/gmane.linux.kernel.mm/111809

So I can put it in powerpc -next with a CC stable ? Or if you tell me
tat Kirill Ack is sufficient then I'll go for it.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH mmotm/next] powerpc: fix powernv boot breakage on G5???
From: Benjamin Herrenschmidt @ 2014-01-14  4:17 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Mahesh Salgaonkar, linuxppc-dev
In-Reply-To: <alpine.LSU.2.11.1401120043210.1092@eggly.anvils>

On Sun, 2014-01-12 at 00:46 -0800, Hugh Dickins wrote:
> My PowerMac G5 cannot boot mmotm these days: different symptoms
> (starting /sbin/init failed? or ATA errors and hang?), with unrelated
> bugs adding to the confusion; but a bisection led to b5ff4211a829
> "powerpc/book3s: Queue up and process delayed MCE events".  Since that
> series seems to be mostly about powernv, I tried changing BOOK3S_64
> to POWERNV in entry_64.S, which has got it back to working for me.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> just in case this happens to be right, but it's well beyond me!
> ---

Do that help instead ?

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..9820d36 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -187,6 +187,7 @@ syscall_exit:
 #ifdef CONFIG_PPC_BOOK3S_64
 BEGIN_FTR_SECTION
 	bl	.machine_check_process_queued_event
+	ld	r3,RESULT(r1)
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 #endif
 	CURRENT_THREAD_INFO(r12, r1)

Cheers,
Ben.

> 
>  arch/powerpc/kernel/entry_64.S |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- mmotm/arch/powerpc/kernel/entry_64.S	2014-01-10 18:24:56.940448828 -0800
> +++ linux/arch/powerpc/kernel/entry_64.S	2014-01-10 18:29:24.276455182 -0800
> @@ -184,7 +184,7 @@ syscall_exit:
>  	bl	.do_show_syscall_exit
>  	ld	r3,RESULT(r1)
>  #endif
> -#ifdef CONFIG_PPC_BOOK3S_64
> +#ifdef CONFIG_PPC_POWERNV
>  BEGIN_FTR_SECTION
>  	bl	.machine_check_process_queued_event
>  END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)

^ permalink raw reply related

* [PATCH] Move precessing of MCE queued event out from syscall exit path.
From: Mahesh J Salgaonkar @ 2014-01-14  4:26 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt; +Cc: Hugh Dickins

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Huge Dickins reported an issue that b5ff4211a829
"powerpc/book3s: Queue up and process delayed MCE events" breaks the
PowerMac G5 boot. This patch fixes it by moving the mce even processing
away from syscall exit, which was wrong to do that in first place, and
implements a different mechanism to deal with it using a paca flag and
decrementer interrupt to process the event.

Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mce.h  |    3 +++
 arch/powerpc/include/asm/paca.h |    3 +++
 arch/powerpc/kernel/entry_64.S  |    5 -----
 arch/powerpc/kernel/irq.c       |   11 ++++++++++-
 arch/powerpc/kernel/mce.c       |    7 +++++++
 arch/powerpc/kernel/time.c      |    9 +++++++++
 6 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 2257d1e..225e678 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -186,6 +186,9 @@ struct mce_error_info {
 #define MCE_EVENT_RELEASE	true
 #define MCE_EVENT_DONTRELEASE	false
 
+/* MCE bit flags (paca.mce_flags) */
+#define MCE_EVENT_PENDING	0x0001
+
 extern void save_mce_event(struct pt_regs *regs, long handled,
 			   struct mce_error_info *mce_err, uint64_t nip,
 			   uint64_t addr);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index c3523d1..f9aa521 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -141,6 +141,9 @@ struct paca_struct {
 	u8 io_sync;			/* writel() needs spin_unlock sync */
 	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
 	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
+#ifdef CONFIG_PPC_BOOK3S_64
+	u8 mce_flags;			/* MCE bit flags. */
+#endif
 	u64 sprg3;			/* Saved user-visible sprg */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	u64 tm_scratch;                 /* TM scratch area for reclaim */
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..bbfb029 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -184,11 +184,6 @@ syscall_exit:
 	bl	.do_show_syscall_exit
 	ld	r3,RESULT(r1)
 #endif
-#ifdef CONFIG_PPC_BOOK3S_64
-BEGIN_FTR_SECTION
-	bl	.machine_check_process_queued_event
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-#endif
 	CURRENT_THREAD_INFO(r12, r1)
 
 	ld	r8,_MSR(r1)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ba01656..e22f591 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -67,6 +67,7 @@
 #include <asm/udbg.h>
 #include <asm/smp.h>
 #include <asm/debug.h>
+#include <asm/mce.h>
 
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
@@ -158,9 +159,17 @@ notrace unsigned int __check_irq_replay(void)
 	 * We may have missed a decrementer interrupt. We check the
 	 * decrementer itself rather than the paca irq_happened field
 	 * in case we also had a rollover while hard disabled
+	 * Also check if any MCE event is queued up that requires
+	 * processing. Machine check handler would set paca->mce_flags
+	 * and then call set_dec(1) to trigger a decrementer interrupt
+	 * from NMI.
 	 */
 	local_paca->irq_happened &= ~PACA_IRQ_DEC;
-	if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
+	if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow()
+#ifdef CONFIG_PPC_BOOK3S_64
+		|| local_paca->mce_flags & MCE_EVENT_PENDING
+#endif
+		)
 		return 0x900;
 
 	/* Finally check if an external interrupt happened */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index d6edf2b..7bab827 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -185,6 +185,13 @@ void machine_check_queue_event(void)
 		return;
 	}
 	__get_cpu_var(mce_event_queue[index]) = evt;
+
+	/*
+	 * Set the event pending flag and raise an decrementer interrupt
+	 * to process the queued event later.
+	 */
+	local_paca->mce_flags |= MCE_EVENT_PENDING;
+	set_dec(1);
 }
 
 /*
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3b1441..87ccf92 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -69,6 +69,7 @@
 #include <asm/vdso_datapage.h>
 #include <asm/firmware.h>
 #include <asm/cputime.h>
+#include <asm/mce.h>
 
 /* powerpc clocksource/clockevent code */
 
@@ -505,6 +506,14 @@ void timer_interrupt(struct pt_regs * regs)
 		return;
 	}
 
+#ifdef CONFIG_PPC_BOOK3S_64
+	/* Check if we have MCE event pending for processing. */
+	if (local_paca->mce_flags & MCE_EVENT_PENDING) {
+		local_paca->mce_flags &= ~MCE_EVENT_PENDING;
+		machine_check_process_queued_event();
+	}
+#endif
+
 	/* Conditionally hard-enable interrupts now that the DEC has been
 	 * bumped to its maximum value
 	 */

^ permalink raw reply related

* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Andrew Morton @ 2014-01-14  4:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, Kirill A. Shutemov,
	linuxppc-dev, kirill.shutemov
In-Reply-To: <1389672810.6933.0.camel@pasglop>

On Tue, 14 Jan 2014 15:13:30 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Mon, 2014-01-13 at 14:17 -0800, Andrew Morton wrote:
> 
> > Did this get fixed?
> 
> Any chance you can Ack the patch on that thread ?
> 
> http://thread.gmane.org/gmane.linux.kernel.mm/111809
> 
> So I can put it in powerpc -next with a CC stable ? Or if you tell me
> tat Kirill Ack is sufficient then I'll go for it.

yup, it looks OK to me from a non-ppc perspective.  Please proceed as
described.

^ permalink raw reply

* [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Preeti U Murthy @ 2014-01-14  6:05 UTC (permalink / raw)
  To: deepthi, paulmck, linux-pm, benh, daniel.lezcano, rjw,
	linux-kernel, srivatsa.bhat, svaidy, linuxppc-dev,
	tuukka.tikkanen

On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
Inspite of this it was observed that the idle state count of the shallowest
idle state, snooze, was increasing.

This is because the governor returns the idle state index as 0 even in
scenarios when no idle state can be chosen. These scenarios could be when the
latency requirement is 0 or as mentioned above when the user wants to disable
certain cpu idle states at runtime. In the latter case, its possible that no
cpu idle state is valid because the suitable states were disabled
and the rest did not match the menu governor criteria to be chosen as the
next idle state.

This patch adds the code to indicate that a valid cpu idle state could not be
chosen by the menu governor and reports back to arch so that it can take some
default action.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 drivers/cpuidle/cpuidle.c        |    6 +++++-
 drivers/cpuidle/governors/menu.c |    7 ++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a55e68f..5bf06bb 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
 
 	/* ask the governor for the next state */
 	next_state = cpuidle_curr_governor->select(drv, dev);
+
+	dev->last_residency = 0;
 	if (need_resched()) {
-		dev->last_residency = 0;
 		/* give the governor an opportunity to reflect on the outcome */
 		if (cpuidle_curr_governor->reflect)
 			cpuidle_curr_governor->reflect(dev, next_state);
@@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
 		return 0;
 	}
 
+	if (next_state < 0)
+		return -EINVAL;
+
 	trace_cpu_idle_rcuidle(next_state, dev->cpu);
 
 	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..6921543 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -283,6 +283,7 @@ again:
  * menu_select - selects the next idle state to enter
  * @drv: cpuidle driver containing state data
  * @dev: the CPU
+ * Returns -1 when no idle state is suitable
  */
 static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
@@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	int multiplier;
 	struct timespec t;
 
-	if (data->needs_update) {
+	if (data->last_state_idx >= 0 && data->needs_update) {
 		menu_update(drv, dev);
 		data->needs_update = 0;
 	}
 
-	data->last_state_idx = 0;
+	data->last_state_idx = -1;
 	data->exit_us = 0;
 
 	/* Special case when user has set very strict latency requirement */
 	if (unlikely(latency_req == 0))
-		return 0;
+		return data->last_state_idx;
 
 	/* determine the expected residency time, round up */
 	t = ktime_to_timespec(tick_nohz_get_sleep_length());

^ permalink raw reply related

* [PATCH] powerpc: Fix races with irq_work
From: Benjamin Herrenschmidt @ 2014-01-14  6:11 UTC (permalink / raw)
  To: linuxppc-dev list

If we set irq_work on a processor and immediately afterward, before the
irq work has a chance to be processed, we change the decrementer value,
we can seriously delay the handling of that irq_work.

Fix it by checking in a few places for pending irq work, first before
changing the decrementer in decrementer_set_next_event() and after
changing it in the same function and in timer_interrupt().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index afb1b56..b3dab20 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -536,6 +536,9 @@ void timer_interrupt(struct pt_regs * regs)
 		now = *next_tb - now;
 		if (now <= DECREMENTER_MAX)
 			set_dec((int)now);
+		/* We may have raced with new irq work */
+		if (test_irq_work_pending())
+			set_dec(1);
 		__get_cpu_var(irq_stat).timer_irqs_others++;
 	}
 
@@ -802,8 +805,16 @@ static void __init clocksource_init(void)
 static int decrementer_set_next_event(unsigned long evt,
 				      struct clock_event_device *dev)
 {
+	/* Don't adjust the decrementer if some irq work is pending */
+	if (test_irq_work_pending())
+		return 0;
 	__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
 	set_dec(evt);
+
+	/* We may have raced with new irq work */
+	if (test_irq_work_pending())
+		set_dec(1);
+
 	return 0;
 }
 

^ permalink raw reply related

* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Deepthi Dharwar @ 2014-01-14  6:16 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: linux-pm, daniel.lezcano, rjw, linux-kernel, srivatsa.bhat,
	paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140114060516.6109.14901.stgit@preeti.in.ibm.com>

On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
> Inspite of this it was observed that the idle state count of the shallowest
> idle state, snooze, was increasing.
> 
> This is because the governor returns the idle state index as 0 even in
> scenarios when no idle state can be chosen. These scenarios could be when the
> latency requirement is 0 or as mentioned above when the user wants to disable
> certain cpu idle states at runtime. In the latter case, its possible that no
> cpu idle state is valid because the suitable states were disabled
> and the rest did not match the menu governor criteria to be chosen as the
> next idle state.
> 
> This patch adds the code to indicate that a valid cpu idle state could not be
> chosen by the menu governor and reports back to arch so that it can take some
> default action.
> 
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---

Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>

> 
>  drivers/cpuidle/cpuidle.c        |    6 +++++-
>  drivers/cpuidle/governors/menu.c |    7 ++++---
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..5bf06bb 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
> 
>  	/* ask the governor for the next state */
>  	next_state = cpuidle_curr_governor->select(drv, dev);
> +
> +	dev->last_residency = 0;
>  	if (need_resched()) {
> -		dev->last_residency = 0;
>  		/* give the governor an opportunity to reflect on the outcome */
>  		if (cpuidle_curr_governor->reflect)
>  			cpuidle_curr_governor->reflect(dev, next_state);
> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>  		return 0;
>  	}
> 
> +	if (next_state < 0)
> +		return -EINVAL;
> +
>  	trace_cpu_idle_rcuidle(next_state, dev->cpu);
> 
>  	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..6921543 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -283,6 +283,7 @@ again:
>   * menu_select - selects the next idle state to enter
>   * @drv: cpuidle driver containing state data
>   * @dev: the CPU
> + * Returns -1 when no idle state is suitable
>   */
>  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  {
> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	int multiplier;
>  	struct timespec t;
> 
> -	if (data->needs_update) {
> +	if (data->last_state_idx >= 0 && data->needs_update) {
>  		menu_update(drv, dev);
>  		data->needs_update = 0;
>  	}
> 
> -	data->last_state_idx = 0;
> +	data->last_state_idx = -1;
>  	data->exit_us = 0;
> 
>  	/* Special case when user has set very strict latency requirement */
>  	if (unlikely(latency_req == 0))
> -		return 0;
> +		return data->last_state_idx;
> 
>  	/* determine the expected residency time, round up */
>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
> 

^ permalink raw reply

* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Srivatsa S. Bhat @ 2014-01-14  7:00 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
	linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140114060516.6109.14901.stgit@preeti.in.ibm.com>

On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
> Inspite of this it was observed that the idle state count of the shallowest
> idle state, snooze, was increasing.
> 
> This is because the governor returns the idle state index as 0 even in
> scenarios when no idle state can be chosen. These scenarios could be when the
> latency requirement is 0 or as mentioned above when the user wants to disable
> certain cpu idle states at runtime. In the latter case, its possible that no
> cpu idle state is valid because the suitable states were disabled
> and the rest did not match the menu governor criteria to be chosen as the
> next idle state.
> 
> This patch adds the code to indicate that a valid cpu idle state could not be
> chosen by the menu governor and reports back to arch so that it can take some
> default action.
> 

That sounds fair enough. However, the "default" action of pseries idle loop
(pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
to 0 hoping to prevent the CPUs from going to deep idle states, but then the
machine would still end up going to Cede, even though that wont get reflected
in the idle state counts. IMHO that scenario needs some thought as well...

> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---
> 
>  drivers/cpuidle/cpuidle.c        |    6 +++++-
>  drivers/cpuidle/governors/menu.c |    7 ++++---
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..5bf06bb 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
> 
>  	/* ask the governor for the next state */
>  	next_state = cpuidle_curr_governor->select(drv, dev);
> +
> +	dev->last_residency = 0;
>  	if (need_resched()) {
> -		dev->last_residency = 0;
>  		/* give the governor an opportunity to reflect on the outcome */
>  		if (cpuidle_curr_governor->reflect)
>  			cpuidle_curr_governor->reflect(dev, next_state);

The comments on top of the .reflect() routines of the governors say that the
second parameter is the index of the actual state entered. But after this patch,
next_state can be negative, indicating an invalid index. So those comments need
to be updated accordingly.

> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>  		return 0;
>  	}
> 
> +	if (next_state < 0)
> +		return -EINVAL;

The exit path above (due to need_resched) returns with irqs enabled, but the new
one you are adding (next_state < 0) returns with irqs disabled. This is correct,
because in the latter case, "idle" is still in progress and the arch will choose
a default handler to execute (unlike the former case where "idle" is over and
hence its time to enable interrupts).

IMHO it would be good to add comments around this code to explain this subtle
difference. We can never be too careful with these things... ;-)

> +
>  	trace_cpu_idle_rcuidle(next_state, dev->cpu);
> 
>  	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..6921543 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -283,6 +283,7 @@ again:
>   * menu_select - selects the next idle state to enter
>   * @drv: cpuidle driver containing state data
>   * @dev: the CPU
> + * Returns -1 when no idle state is suitable
>   */
>  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  {
> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	int multiplier;
>  	struct timespec t;
> 
> -	if (data->needs_update) {
> +	if (data->last_state_idx >= 0 && data->needs_update) {
               ^^^^^
Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
only when index >= 0.

>  		menu_update(drv, dev);
>  		data->needs_update = 0;
>  	}
> 
> -	data->last_state_idx = 0;
> +	data->last_state_idx = -1;
>  	data->exit_us = 0;
> 
>  	/* Special case when user has set very strict latency requirement */
>  	if (unlikely(latency_req == 0))
> -		return 0;
> +		return data->last_state_idx;
> 
>  	/* determine the expected residency time, round up */
>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
> 

What about the ladder governor? I know its not used that much in practice,
but I think it would be good to update that as well, just to keep it
consistent.

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Srivatsa S. Bhat @ 2014-01-14  7:37 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
	linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>

On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> Inspite of this it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, its possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
> 
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that wont get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...
> 

I checked the git history and found that the default idle was changed (on purpose)
to cede the processor, in order to speed up booting.. Hmm..

commit 363edbe2614aa90df706c0f19ccfa2a6c06af0be
Author: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Date:   Fri Sep 6 00:25:06 2013 +0530

    powerpc: Default arch idle could cede processor on pseries


Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [PATCH] Move precessing of MCE queued event out from syscall exit path.
From: Hugh Dickins @ 2014-01-14  7:47 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20140114042611.13145.6551.stgit@mars.in.ibm.com>

On Tue, 14 Jan 2014, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Huge Dickins reported an issue that b5ff4211a829
> "powerpc/book3s: Queue up and process delayed MCE events" breaks the
> PowerMac G5 boot. This patch fixes it by moving the mce even processing
> away from syscall exit, which was wrong to do that in first place, and
> implements a different mechanism to deal with it using a paca flag and
> decrementer interrupt to process the event.
> 
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Good work guys: I can happily report that both this rework,
and Ben's one-liner, fix the issue for me on the G5: thank you.

(Irrelevant not-so-happy detail: I very nearly mailed you an hour or
so earlier to report that neither fixed it; but retried my original
CONFIG_PPC_POWERNV hack after, and found that now equally useless.

I did write of changing behaviour and ATA errors: it now appears that's
an independent but intermittent issue on the G5 in 3.13-rc7-mm1, which
coincidentally happened to trigger when I tested rc7-mm1 without fixes,
but not when I tested with my hack, until today.

I've gone back to testing on rc6-mm1, the previous week's mmotm,
which showed failure to run /sbin/init: rc6-mm1 has no trouble mounting
root, and it runs properly with your new patch, and with Ben's patch.

And I may be quite wrong to point a finger at ATA errors: perhaps
they're always shown, and quickly cleared off screen in successful boots,
but left visible when root cannot be mounted for some other reason.

I don't know, and won't have time to investigate further - bisecting
intermittents is not much fun!  I'll just have to hope that it's
sorted out before it reaches 3.14-rc, or else bite the bullet and
investigate on that.)

Hugh

> ---
>  arch/powerpc/include/asm/mce.h  |    3 +++
>  arch/powerpc/include/asm/paca.h |    3 +++
>  arch/powerpc/kernel/entry_64.S  |    5 -----
>  arch/powerpc/kernel/irq.c       |   11 ++++++++++-
>  arch/powerpc/kernel/mce.c       |    7 +++++++
>  arch/powerpc/kernel/time.c      |    9 +++++++++
>  6 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
> index 2257d1e..225e678 100644
> --- a/arch/powerpc/include/asm/mce.h
> +++ b/arch/powerpc/include/asm/mce.h
> @@ -186,6 +186,9 @@ struct mce_error_info {
>  #define MCE_EVENT_RELEASE	true
>  #define MCE_EVENT_DONTRELEASE	false
>  
> +/* MCE bit flags (paca.mce_flags) */
> +#define MCE_EVENT_PENDING	0x0001
> +
>  extern void save_mce_event(struct pt_regs *regs, long handled,
>  			   struct mce_error_info *mce_err, uint64_t nip,
>  			   uint64_t addr);
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index c3523d1..f9aa521 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -141,6 +141,9 @@ struct paca_struct {
>  	u8 io_sync;			/* writel() needs spin_unlock sync */
>  	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
>  	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	u8 mce_flags;			/* MCE bit flags. */
> +#endif
>  	u64 sprg3;			/* Saved user-visible sprg */
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  	u64 tm_scratch;                 /* TM scratch area for reclaim */
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 770d6d6..bbfb029 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -184,11 +184,6 @@ syscall_exit:
>  	bl	.do_show_syscall_exit
>  	ld	r3,RESULT(r1)
>  #endif
> -#ifdef CONFIG_PPC_BOOK3S_64
> -BEGIN_FTR_SECTION
> -	bl	.machine_check_process_queued_event
> -END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
> -#endif
>  	CURRENT_THREAD_INFO(r12, r1)
>  
>  	ld	r8,_MSR(r1)
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index ba01656..e22f591 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -67,6 +67,7 @@
>  #include <asm/udbg.h>
>  #include <asm/smp.h>
>  #include <asm/debug.h>
> +#include <asm/mce.h>
>  
>  #ifdef CONFIG_PPC64
>  #include <asm/paca.h>
> @@ -158,9 +159,17 @@ notrace unsigned int __check_irq_replay(void)
>  	 * We may have missed a decrementer interrupt. We check the
>  	 * decrementer itself rather than the paca irq_happened field
>  	 * in case we also had a rollover while hard disabled
> +	 * Also check if any MCE event is queued up that requires
> +	 * processing. Machine check handler would set paca->mce_flags
> +	 * and then call set_dec(1) to trigger a decrementer interrupt
> +	 * from NMI.
>  	 */
>  	local_paca->irq_happened &= ~PACA_IRQ_DEC;
> -	if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
> +	if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow()
> +#ifdef CONFIG_PPC_BOOK3S_64
> +		|| local_paca->mce_flags & MCE_EVENT_PENDING
> +#endif
> +		)
>  		return 0x900;
>  
>  	/* Finally check if an external interrupt happened */
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index d6edf2b..7bab827 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -185,6 +185,13 @@ void machine_check_queue_event(void)
>  		return;
>  	}
>  	__get_cpu_var(mce_event_queue[index]) = evt;
> +
> +	/*
> +	 * Set the event pending flag and raise an decrementer interrupt
> +	 * to process the queued event later.
> +	 */
> +	local_paca->mce_flags |= MCE_EVENT_PENDING;
> +	set_dec(1);
>  }
>  
>  /*
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index b3b1441..87ccf92 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -69,6 +69,7 @@
>  #include <asm/vdso_datapage.h>
>  #include <asm/firmware.h>
>  #include <asm/cputime.h>
> +#include <asm/mce.h>
>  
>  /* powerpc clocksource/clockevent code */
>  
> @@ -505,6 +506,14 @@ void timer_interrupt(struct pt_regs * regs)
>  		return;
>  	}
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	/* Check if we have MCE event pending for processing. */
> +	if (local_paca->mce_flags & MCE_EVENT_PENDING) {
> +		local_paca->mce_flags &= ~MCE_EVENT_PENDING;
> +		machine_check_process_queued_event();
> +	}
> +#endif
> +
>  	/* Conditionally hard-enable interrupts now that the DEC has been
>  	 * bumped to its maximum value
>  	 */

^ permalink raw reply

* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Deepthi Dharwar @ 2014-01-14  8:00 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-pm, daniel.lezcano, rjw, linux-kernel, Preeti U Murthy,
	paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>

On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> Inspite of this it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, its possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
> 
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that wont get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...

It was the snooze loop earlier but later we changed it to cede in commit
363edbe2614 powerpc: Default arch idle will cede the processor on
pseries to address the following regressions:

>>snippet from the patch.
When adding cpuidle support to pSeries, we introduced two
    regressions:

      - The new cpuidle backend driver only works under hypervisors
        supporting the "SLPLAR" option, which isn't the case of the
        old POWER4 hypervisor and the HV "light" used on js2x blades

      - The cpuidle driver registers fairly late, meaning that for
        a significant portion of the boot process, we end up having
        all threads spinning. This slows down the boot process and
        increases the overall resource usage if the hypervisor has
        shared processors.

    This fixes both by implementing a "default" idle that will cede
    to the hypervisor when possible, in a very simple way without
    all the bells and whisles of cpuidle.

Regards,
Deepthi


>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> ---
>>
>>  drivers/cpuidle/cpuidle.c        |    6 +++++-
>>  drivers/cpuidle/governors/menu.c |    7 ++++---
>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..5bf06bb 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>>  	/* ask the governor for the next state */
>>  	next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> +	dev->last_residency = 0;
>>  	if (need_resched()) {
>> -		dev->last_residency = 0;
>>  		/* give the governor an opportunity to reflect on the outcome */
>>  		if (cpuidle_curr_governor->reflect)
>>  			cpuidle_curr_governor->reflect(dev, next_state);
> 
> The comments on top of the .reflect() routines of the governors say that the
> second parameter is the index of the actual state entered. But after this patch,
> next_state can be negative, indicating an invalid index. So those comments need
> to be updated accordingly.
> 
>> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>>  		return 0;
>>  	}
>>
>> +	if (next_state < 0)
>> +		return -EINVAL;
> 
> The exit path above (due to need_resched) returns with irqs enabled, but the new
> one you are adding (next_state < 0) returns with irqs disabled. This is correct,
> because in the latter case, "idle" is still in progress and the arch will choose
> a default handler to execute (unlike the former case where "idle" is over and
> hence its time to enable interrupts).
> 
> IMHO it would be good to add comments around this code to explain this subtle
> difference. We can never be too careful with these things... ;-)
> 
>> +
>>  	trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>
>>  	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..6921543 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -283,6 +283,7 @@ again:
>>   * menu_select - selects the next idle state to enter
>>   * @drv: cpuidle driver containing state data
>>   * @dev: the CPU
>> + * Returns -1 when no idle state is suitable
>>   */
>>  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>  {
>> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>  	int multiplier;
>>  	struct timespec t;
>>
>> -	if (data->needs_update) {
>> +	if (data->last_state_idx >= 0 && data->needs_update) {
>                ^^^^^
> Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
> only when index >= 0.
> 
>>  		menu_update(drv, dev);
>>  		data->needs_update = 0;
>>  	}
>>
>> -	data->last_state_idx = 0;
>> +	data->last_state_idx = -1;
>>  	data->exit_us = 0;
>>
>>  	/* Special case when user has set very strict latency requirement */
>>  	if (unlikely(latency_req == 0))
>> -		return 0;
>> +		return data->last_state_idx;
>>
>>  	/* determine the expected residency time, round up */
>>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
>>
> 
> What about the ladder governor? I know its not used that much in practice,
> but I think it would be good to update that as well, just to keep it
> consistent.
> 
> Regards,
> Srivatsa S. Bhat
> 

^ permalink raw reply

* [PATCH 1/3] powerpc/fsl: add E500MC and E5500 PVR define
From: Dongsheng Wang @ 2014-01-14  7:59 UTC (permalink / raw)
  To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng

From: Wang Dongsheng <dongsheng.wang@freescale.com>

E500MC and E5500 PVR will be used in subsequent save/restore core
state patches.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 62b114e..cd7b630 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1075,6 +1075,8 @@
 #define PVR_8560	0x80200000
 #define PVR_VER_E500V1	0x8020
 #define PVR_VER_E500V2	0x8021
+#define PVR_VER_E500MC	0x8023
+#define PVR_VER_E5500	0x8024
 #define PVR_VER_E6500	0x8040
 
 /*
-- 
1.8.5

^ permalink raw reply related

* [PATCH 2/3] powerpc/85xx: Provide two functions to save/restore the core registers
From: Dongsheng Wang @ 2014-01-14  7:59 UTC (permalink / raw)
  To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
In-Reply-To: <1389686397-46555-1-git-send-email-dongsheng.wang@freescale.com>

From: Wang Dongsheng <dongsheng.wang@freescale.com>

Add fsl_cpu_state_save/fsl_cpu_state_restore functions, used for deep
sleep and hibernation to save/restore core registers. We abstract out
save/restore code for use in various modules, to make them don't need
to maintain.

Currently supported processors type are E6500, E5500, E500MC, E500v2 and
E500v1.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/include/asm/fsl_sleep.h b/arch/powerpc/include/asm/fsl_sleep.h
new file mode 100644
index 0000000..31c8a9b
--- /dev/null
+++ b/arch/powerpc/include/asm/fsl_sleep.h
@@ -0,0 +1,98 @@
+/*
+ * Freescale 85xx Power management set
+ *
+ * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_FSL_SLEEP_H
+#define __ASM_FSL_SLEEP_H
+
+/*
+ * Freescale 85xx Core registers set, core register map definition
+ * Address base on r3, we need to compatible with both 32-bit and 64-bit, so
+ * the data width is 64-bit(double word).
+ *
+ * Acronyms:
+ *	dw(data width)	0x08
+ *
+ * Map:
+ * General-Purpose Registers
+ *	GPR1(sp)		0
+ *	GPR2			0x8		(dw * 1)
+ *	GPR13 - GPR31		0x10 ~ 0xa0	(dw * 2 ~ dw * 20)
+ * Foating-point registers
+ *	FPR14 - FPR31		0xa8 ~ 0x130	(dw * 21 ~ dw * 38)
+ * Registers for Branch Operations
+ *	CR			0x138		(dw * 39)
+ *	LR			0x140		(dw * 40)
+ * Processor Control Registers
+ *	MSR			0x148		(dw * 41)
+ *	EPCR			0x150		(dw * 42)
+ *
+ *	Only e500, e500v2 need to save HID0 - HID1
+ *	HID0 - HID1		0x158 ~ 0x160 (dw * 43 ~ dw * 44)
+ * Timer Registers
+ *	TCR			0x168		(dw * 45)
+ *	TB(64bit)		0x170		(dw * 46)
+ *	TBU(32bit)		0x178		(dw * 47)
+ *	TBL(32bit)		0x180		(dw * 48)
+ * Interrupt Registers
+ *	IVPR			0x188		(dw * 49)
+ *	IVOR0 - IVOR15		0x190 ~ 0x208	(dw * 50 ~ dw * 65)
+ *	IVOR32 - IVOR41		0x210 ~ 0x258	(dw * 66 ~ dw * 75)
+ * Software-Use Registers
+ *	SPRG1			0x260		(dw * 76), 64-bit need to save.
+ *	SPRG3			0x268		(dw * 77), 32-bit need to save.
+ * MMU Registers
+ *	PID0 - PID2		0x270 ~ 0x280	(dw * 78 ~ dw * 80)
+ * Debug Registers
+ *	DBCR0 - DBCR2		0x288 ~ 0x298	(dw * 81 ~ dw * 83)
+ *	IAC1 - IAC4		0x2a0 ~ 0x2b8	(dw * 84 ~ dw * 87)
+ *	DAC1 - DAC2		0x2c0 ~ 0x2c8	(dw * 88 ~ dw * 89)
+ *
+ */
+
+#define SR_GPR1			0x000
+#define SR_GPR2			0x008
+#define SR_GPR13		0x010
+#define SR_FPR14		0x0a8
+#define SR_CR			0x138
+#define SR_LR			0x140
+#define SR_MSR			0x148
+#define SR_EPCR			0x150
+#define SR_HID0			0x158
+#define SR_TCR			0x168
+#define SR_TB			0x170
+#define SR_TBU			0x178
+#define SR_TBL			0x180
+#define SR_IVPR			0x188
+#define SR_IVOR0		0x190
+#define SR_IVOR32		0x210
+#define SR_SPRG1		0x260
+#define SR_SPRG3		0x268
+#define SR_PID0			0x270
+#define SR_DBCR0		0x288
+#define SR_IAC1			0x2a0
+#define SR_DAC1			0x2c0
+#define FSL_CPU_SR_SIZE		(SR_DAC1 + 0x10)
+
+#ifndef __ASSEMBLY__
+
+enum core_save_type {
+	BASE_SAVE = 0,
+	ALL_SAVE = 1,
+};
+
+extern int fsl_cpu_state_save(void *save_page, enum core_save_type type);
+extern int fsl_cpu_state_restore(void *restore_page, enum core_save_type type);
+
+#endif
+
+#endif
+
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 25cebe7..650a01c 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -4,6 +4,7 @@
 obj-$(CONFIG_SMP) += smp.o
 
 obj-y += common.o
+obj-y += save-core.o
 
 obj-$(CONFIG_BSC9131_RDB) += bsc913x_rdb.o
 obj-$(CONFIG_C293_PCIE)   += c293pcie.o
diff --git a/arch/powerpc/platforms/85xx/save-core.S b/arch/powerpc/platforms/85xx/save-core.S
new file mode 100644
index 0000000..a6b93b8
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/save-core.S
@@ -0,0 +1,497 @@
+/*
+ * Freescale Power Management, Save/Restore core state
+ *
+ * Copyright 2014 Freescale Semiconductor, Inc.
+ * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/fsl_sleep.h>
+
+/*
+ * Freescale 85xx Cores
+ * Support Core List:
+ * E500v1, E500v2, E500MC, E5500, E6500.
+ */
+
+ /*
+ * Save/Restore register operation define
+ */
+#define LOAD_SAVE_ADDRESS	\
+	mr	r10, r3
+
+#ifdef CONFIG_PPC64
+#define PPC_STD(sreg, offset, areg) \
+	std	sreg, (offset)(areg)
+#define PPC_LD(lreg, offset, areg) \
+	ld	lreg, (offset)(areg)
+
+#define PPC_STFD(sreg, offset, areg) \
+	stfd	sreg, (offset)(areg)
+#define PPC_LFD(lreg, offset, areg) \
+	lfd	lreg, (offset)(areg)
+#else
+#define PPC_STD(sreg, offset, areg) \
+	stw	sreg, (offset)(areg)
+#define PPC_LD(lreg, offset, areg) \
+	lwz	lreg, (offset)(areg)
+
+#define PPC_STFD(sreg, offset, areg) \
+	stfs	sreg, (offset)(areg)
+#define PPC_LFD(lreg, offset, areg) \
+	lfs	lreg, (offset)(areg)
+#endif
+
+#define do_save_gpr_reg(reg, addr) \
+	mr	r0, reg ;\
+	PPC_STD(r0, addr, r10)
+
+#define do_restore_gpr_reg(reg, addr) \
+	PPC_LD(r0, addr, r10) ;\
+	mr	reg, r0
+
+#define do_save_fpr_reg(reg, addr) \
+	fmr	fr0, reg ;\
+	PPC_STFD(fr0, addr, r10)
+
+#define do_restore_fpr_reg(reg, addr) \
+	PPC_LFD(fr0, addr, r10) ;\
+	fmr	reg, fr0
+
+#define do_save_spr_reg(reg, addr) \
+	mfspr	r0, SPRN_##reg ;\
+	PPC_STD(r0, addr, r10)
+
+#define do_restore_spr_reg(reg, addr) \
+	PPC_LD(r0, addr, r10) ;\
+	mtspr	SPRN_##reg, r0
+
+#define do_save_special_reg(special, addr) \
+	mf##special	r0 ;\
+	PPC_STD(r0, addr, r10)
+#define do_restore_special_reg(special, addr) \
+	PPC_LD(r0, addr, r10) ;\
+	mt##special	r0
+
+#define do_sr_general_gpr_regs(func) \
+	do_##func##_gpr_reg(r1, SR_GPR1) ;\
+	do_##func##_gpr_reg(r2, SR_GPR2) ;\
+	do_##func##_gpr_reg(r13, SR_GPR13 + 0x00) ;\
+	do_##func##_gpr_reg(r14, SR_GPR13 + 0x08) ;\
+	do_##func##_gpr_reg(r15, SR_GPR13 + 0x10) ;\
+	do_##func##_gpr_reg(r16, SR_GPR13 + 0x18) ;\
+	do_##func##_gpr_reg(r17, SR_GPR13 + 0x20) ;\
+	do_##func##_gpr_reg(r18, SR_GPR13 + 0x28) ;\
+	do_##func##_gpr_reg(r19, SR_GPR13 + 0x30) ;\
+	do_##func##_gpr_reg(r20, SR_GPR13 + 0x38) ;\
+	do_##func##_gpr_reg(r21, SR_GPR13 + 0x40) ;\
+	do_##func##_gpr_reg(r22, SR_GPR13 + 0x48) ;\
+	do_##func##_gpr_reg(r23, SR_GPR13 + 0x50) ;\
+	do_##func##_gpr_reg(r24, SR_GPR13 + 0x58) ;\
+	do_##func##_gpr_reg(r25, SR_GPR13 + 0x60) ;\
+	do_##func##_gpr_reg(r26, SR_GPR13 + 0x68) ;\
+	do_##func##_gpr_reg(r27, SR_GPR13 + 0x70) ;\
+	do_##func##_gpr_reg(r28, SR_GPR13 + 0x78) ;\
+	do_##func##_gpr_reg(r29, SR_GPR13 + 0x80) ;\
+	do_##func##_gpr_reg(r30, SR_GPR13 + 0x88) ;\
+	do_##func##_gpr_reg(r31, SR_GPR13 + 0x90)
+
+#define do_sr_fpr_regs(func) \
+	do_##func##_fpr_reg(fr14, SR_FPR14 + 0x00) ;\
+	do_##func##_fpr_reg(fr15, SR_FPR14 + 0x08) ;\
+	do_##func##_fpr_reg(fr16, SR_FPR14 + 0x10) ;\
+	do_##func##_fpr_reg(fr17, SR_FPR14 + 0x18) ;\
+	do_##func##_fpr_reg(fr18, SR_FPR14 + 0x20) ;\
+	do_##func##_fpr_reg(fr19, SR_FPR14 + 0x28) ;\
+	do_##func##_fpr_reg(fr20, SR_FPR14 + 0x30) ;\
+	do_##func##_fpr_reg(fr21, SR_FPR14 + 0x38) ;\
+	do_##func##_fpr_reg(fr22, SR_FPR14 + 0x40) ;\
+	do_##func##_fpr_reg(fr23, SR_FPR14 + 0x48) ;\
+	do_##func##_fpr_reg(fr24, SR_FPR14 + 0x50) ;\
+	do_##func##_fpr_reg(fr25, SR_FPR14 + 0x58) ;\
+	do_##func##_fpr_reg(fr26, SR_FPR14 + 0x60) ;\
+	do_##func##_fpr_reg(fr27, SR_FPR14 + 0x68) ;\
+	do_##func##_fpr_reg(fr28, SR_FPR14 + 0x70) ;\
+	do_##func##_fpr_reg(fr29, SR_FPR14 + 0x78) ;\
+	do_##func##_fpr_reg(fr30, SR_FPR14 + 0x80) ;\
+	do_##func##_fpr_reg(fr31, SR_FPR14 + 0x88)
+
+#define do_sr_general_branch_regs(func) \
+	do_##func##_special_reg(CR, SR_CR)
+
+#define do_sr_general_pcr_regs(func) \
+	do_##func##_special_reg(MSR, SR_MSR) ;\
+	do_##func##_spr_reg(EPCR, SR_EPCR) ;\
+	do_##func##_spr_reg(HID0, SR_HID0 + 0x00)
+
+#define do_sr_e500_pcr_regs(func) \
+	do_##func##_spr_reg(HID1, SR_HID0 + 0x08)
+
+#define do_sr_save_tb_regs \
+	do_save_spr_reg(TBRU, SR_TBU) ;\
+	do_save_spr_reg(TBRL, SR_TBL)
+
+#define do_sr_restore_tb_regs \
+	do_restore_spr_reg(TBWU, SR_TBU) ;\
+	do_restore_spr_reg(TBWL, SR_TBL)
+
+#define do_sr_general_time_regs(func) \
+	do_sr_##func##_tb_regs	;\
+	do_##func##_spr_reg(TCR, SR_TCR)
+
+#define do_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVPR, SR_IVPR) ;\
+	do_##func##_spr_reg(IVOR0, SR_IVOR0 + 0x00) ;\
+	do_##func##_spr_reg(IVOR1, SR_IVOR0 + 0x08) ;\
+	do_##func##_spr_reg(IVOR2, SR_IVOR0 + 0x10) ;\
+	do_##func##_spr_reg(IVOR3, SR_IVOR0 + 0x18) ;\
+	do_##func##_spr_reg(IVOR4, SR_IVOR0 + 0x20) ;\
+	do_##func##_spr_reg(IVOR5, SR_IVOR0 + 0x28) ;\
+	do_##func##_spr_reg(IVOR6, SR_IVOR0 + 0x30) ;\
+	do_##func##_spr_reg(IVOR7, SR_IVOR0 + 0x38) ;\
+	do_##func##_spr_reg(IVOR8, SR_IVOR0 + 0x40) ;\
+	do_##func##_spr_reg(IVOR10, SR_IVOR0 + 0x50) ;\
+	do_##func##_spr_reg(IVOR11, SR_IVOR0 + 0x58) ;\
+	do_##func##_spr_reg(IVOR12, SR_IVOR0 + 0x60) ;\
+	do_##func##_spr_reg(IVOR13, SR_IVOR0 + 0x68) ;\
+	do_##func##_spr_reg(IVOR14, SR_IVOR0 + 0x70) ;\
+	do_##func##_spr_reg(IVOR15, SR_IVOR0 + 0x78)
+
+#define do_e6500_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+	do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
+	do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
+	do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+	do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+	do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+	do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+	do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+	do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+	do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_e5500_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+	do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+	do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+	do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+	do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+	do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+	do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+	do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_e500_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
+	do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
+	do_##func##_spr_reg(IVOR34, SR_IVOR32 + 0x10)
+
+#define do_e500mc_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+	do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+	do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+	do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+	do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+	do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+	do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+	do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_sr_general_software_regs(func) \
+	do_##func##_spr_reg(SPRG1, SR_SPRG1) ;\
+	do_##func##_spr_reg(SPRG3, SR_SPRG3)
+
+#define do_sr_general_mmu_regs(func) \
+	do_##func##_spr_reg(PID0, SR_PID0 + 0x00)
+
+#define do_sr_e500_mmu_regs(func) \
+	do_##func##_spr_reg(PID1, SR_PID0 + 0x08) ;\
+	do_##func##_spr_reg(PID2, SR_PID0 + 0x10)
+
+#define do_sr_debug_regs(func) \
+	do_##func##_spr_reg(DBCR0, SR_DBCR0 + 0x00) ;\
+	do_##func##_spr_reg(DBCR1, SR_DBCR0 + 0x08) ;\
+	do_##func##_spr_reg(DBCR2, SR_DBCR0 + 0x10) ;\
+	do_##func##_spr_reg(IAC1, SR_IAC1 + 0x00) ;\
+	do_##func##_spr_reg(IAC2, SR_IAC1 + 0x08) ;\
+	do_##func##_spr_reg(DAC1, SR_DAC1 + 0x00) ;\
+	do_##func##_spr_reg(DAC2, SR_DAC1 + 0x08)
+
+#define do_e6500_sr_debug_regs(func) \
+	do_##func##_spr_reg(IAC3, SR_IAC1 + 0x10) ;\
+	do_##func##_spr_reg(IAC4, SR_IAC1 + 0x18)
+
+/*
+ * Freescale 85xx Cores, Save/Restore core registers.
+ */
+_GLOBAL(core_registers_save_area)
+	.space FSL_CPU_SR_SIZE
+
+	.section .text
+	.align	5
+_GLOBAL(fsl_cpu_base_save)
+	do_sr_general_gpr_regs(save)
+	do_sr_general_branch_regs(save)
+	do_sr_general_pcr_regs(save)
+	do_sr_general_software_regs(save)
+	do_sr_general_mmu_regs(save)
+
+	/*
+	 * Need to save float-point registers if MSR[FP] = 1.
+	 */
+	mfmsr	r12
+	andi.	r12, r12, MSR_FP
+	beq	1f
+	do_sr_fpr_regs(save)
+
+1:
+	mfspr	r5, SPRN_TBRU
+	do_sr_general_time_regs(save)
+	mfspr	r6, SPRN_TBRU
+	cmpw	r5, r6
+	bne	1b
+
+	blr
+
+_GLOBAL(fsl_cpu_base_restore)
+	do_sr_general_gpr_regs(restore)
+	do_sr_general_branch_regs(restore)
+	do_sr_general_pcr_regs(restore)
+	do_sr_general_software_regs(restore)
+	do_sr_general_mmu_regs(restore)
+
+	isync
+
+	/*
+	 * Need to restore float-point registers if MSR[FP] = 1.
+	 */
+	mfmsr	r12
+	andi.	r12, r12, MSR_FP
+	beq	1f
+	do_sr_fpr_regs(restore)
+
+1:
+	/* Restore Time registers */
+	/* clear tb lower to avoid wrap */
+	li	r0, 0
+	mtspr	SPRN_TBWL, r0
+	do_sr_general_time_regs(restore)
+
+	lis	r0, (TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS)@h
+	mtspr	SPRN_TSR, r0
+
+	/* Kick decrementer */
+	li	r0, 1
+	mtdec	r0
+
+	blr
+
+/* Base registers, e500v1, e500v2 need to do some special save/restore */
+_GLOBAL(e500_base_special_save)
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V1@l
+	cmpw	r11, r12
+	beq	500f
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V2@l
+	cmpw	r11, r12
+	bne	1f
+
+500:
+	do_sr_e500_pcr_regs(save)
+	do_sr_e500_mmu_regs(save)
+1:
+	blr
+
+_GLOBAL(e500_base_special_restore)
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V1@l
+	cmpw	r11, r12
+	beq	500f
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V2@l
+	cmpw	r11, r12
+	bne	1f
+
+500:
+	do_sr_e500_pcr_regs(save)
+	do_sr_e500_mmu_regs(save)
+1:
+	blr
+
+_GLOBAL(fsl_cpu_append_save)
+	mfspr	r0, SPRN_PVR
+	rlwinm	r11, r0, 16, 16, 31
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E6500@l
+	cmpw	r11, r12
+	beq	e6500_append_save
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E5500@l
+	cmpw	r11, r12
+	beq	e5500_append_save
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500MC@l
+	cmpw	r11, r12
+	beq	e500mc_append_save
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V2@l
+	cmpw	r11, r12
+	beq	e500v2_append_save
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V1@l
+	cmpw	r11, r12
+	beq	e500v1_append_save
+
+	b	1f
+
+e6500_append_save:
+	do_e6500_sr_interrupt_regs(save)
+	do_e6500_sr_debug_regs(save)
+	b	1f
+
+e5500_append_save:
+	do_e5500_sr_interrupt_regs(save)
+	b	1f
+
+e500mc_append_save:
+	do_e500mc_sr_interrupt_regs(save)
+	b	1f
+
+e500v2_append_save:
+e500v1_append_save:
+	do_e500_sr_interrupt_regs(save)
+
+1:
+	do_sr_interrupt_regs(save)
+	do_sr_debug_regs(save)
+
+	blr
+
+_GLOBAL(fsl_cpu_append_restore)
+	mfspr	r0, SPRN_PVR
+	rlwinm	r11, r0, 16, 16, 31
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E6500@l
+	cmpw	r11, r12
+	beq	e6500_append_restore
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E5500@l
+	cmpw	r11, r12
+	beq	e5500_append_restore
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500MC@l
+	cmpw	r11, r12
+	beq	e500mc_append_restore
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V2@l
+	cmpw	r11, r12
+	beq	e500v2_append_restore
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V1@l
+	cmpw	r11, r12
+	beq	e500v1_append_restore
+
+	b	1f
+
+e6500_append_restore:
+	do_e6500_sr_interrupt_regs(restore)
+	do_e6500_sr_debug_regs(restore)
+	b	1f
+
+e5500_append_restore:
+	do_e5500_sr_interrupt_regs(restore)
+	b	1f
+
+e500mc_append_restore:
+	do_e500mc_sr_interrupt_regs(restore)
+	b	1f
+
+e500v2_append_restore:
+e500v1_append_restore:
+	do_e500_sr_interrupt_regs(restore)
+
+1:
+	do_sr_interrupt_regs(restore)
+	do_sr_debug_regs(restore)
+
+	sync
+
+	blr
+
+/*
+ * r3 = the virtual address of buffer
+ * r4 = suspend type, 0-BASE_SAVE, 1-ALL_SAVE
+ */
+_GLOBAL(fsl_cpu_state_save)
+	mflr	r9
+	LOAD_SAVE_ADDRESS
+
+	/* save the return address to SR_LR */
+	do_save_gpr_reg(r9, SR_LR)
+
+	/* if core_save_type is BASE_SAVE, goto 1f */
+	cmpwi	r4, 0
+	beq	1f
+
+	bl	fsl_cpu_append_save
+
+1:
+	bl	e500_base_special_save
+
+	bl	fsl_cpu_base_save
+
+	li	r3, 0
+	mtlr	r9
+	blr
+
+/*
+ * r3 = the virtual address of buffer
+ * r4 = suspend type, 0-BASE_SAVE, 1-ALL_SAVE
+ */
+_GLOBAL(fsl_cpu_state_restore)
+	mflr	r9
+	LOAD_SAVE_ADDRESS
+
+	/*
+	 * Disable machine checks and critical exceptions,
+	 * if core_save_type is ALL_SAVE, we will restore interrupt
+	 * IVORs registers.
+	 */
+	mfmsr	r5
+	rlwinm	r5, r5, 0, ~MSR_CE
+	rlwinm	r5, r5, 0, ~MSR_ME
+	mtmsr	r5
+	isync
+
+	/* if core_save_type is BASE_SAVE, goto 1f */
+	cmpwi	r4, 0
+	beq	1f
+
+	bl	fsl_cpu_append_restore
+
+1:
+	bl	e500_base_special_restore
+
+	bl	fsl_cpu_base_restore
+
+	/* return the return address of the save time */
+	do_restore_gpr_reg(r9, SR_LR)
+
+	li	r3, 0
+	mtlr	r9
+	blr
-- 
1.8.5

^ permalink raw reply related

* [PATCH 3/3] powerpc/fsl: Use the new interface to save or restore registers
From: Dongsheng Wang @ 2014-01-14  7:59 UTC (permalink / raw)
  To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
In-Reply-To: <1389686397-46555-1-git-send-email-dongsheng.wang@freescale.com>

From: Wang Dongsheng <dongsheng.wang@freescale.com>

Use fsl_cpu_state_save/fsl_cpu_state_restore to save/restore registers.
Use the functions to save/restore registers, so we don't need to
maintain the code.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/kernel/swsusp_booke.S b/arch/powerpc/kernel/swsusp_booke.S
index 553c140..b5992db 100644
--- a/arch/powerpc/kernel/swsusp_booke.S
+++ b/arch/powerpc/kernel/swsusp_booke.S
@@ -4,92 +4,28 @@
  * Copyright (c) 2009-2010 MontaVista Software, LLC.
  */
 
-#include <linux/threads.h>
-#include <asm/processor.h>
 #include <asm/page.h>
-#include <asm/cputable.h>
-#include <asm/thread_info.h>
 #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
 #include <asm/mmu.h>
-
-/*
- * Structure for storing CPU registers on the save area.
- */
-#define SL_SP		0
-#define SL_PC		4
-#define SL_MSR		8
-#define SL_TCR		0xc
-#define SL_SPRG0	0x10
-#define SL_SPRG1	0x14
-#define SL_SPRG2	0x18
-#define SL_SPRG3	0x1c
-#define SL_SPRG4	0x20
-#define SL_SPRG5	0x24
-#define SL_SPRG6	0x28
-#define SL_SPRG7	0x2c
-#define SL_TBU		0x30
-#define SL_TBL		0x34
-#define SL_R2		0x38
-#define SL_CR		0x3c
-#define SL_LR		0x40
-#define SL_R12		0x44	/* r12 to r31 */
-#define SL_SIZE		(SL_R12 + 80)
-
-	.section .data
-	.align	5
-
-_GLOBAL(swsusp_save_area)
-	.space	SL_SIZE
-
+#include <asm/fsl_sleep.h>
 
 	.section .text
 	.align	5
 
 _GLOBAL(swsusp_arch_suspend)
-	lis	r11,swsusp_save_area@h
-	ori	r11,r11,swsusp_save_area@l
-
-	mflr	r0
-	stw	r0,SL_LR(r11)
-	mfcr	r0
-	stw	r0,SL_CR(r11)
-	stw	r1,SL_SP(r11)
-	stw	r2,SL_R2(r11)
-	stmw	r12,SL_R12(r11)
-
-	/* Save MSR & TCR */
-	mfmsr	r4
-	stw	r4,SL_MSR(r11)
-	mfspr	r4,SPRN_TCR
-	stw	r4,SL_TCR(r11)
-
-	/* Get a stable timebase and save it */
-1:	mfspr	r4,SPRN_TBRU
-	stw	r4,SL_TBU(r11)
-	mfspr	r5,SPRN_TBRL
-	stw	r5,SL_TBL(r11)
-	mfspr	r3,SPRN_TBRU
-	cmpw	r3,r4
-	bne	1b
+	mflr	r15
+	lis	r3, core_registers_save_area@h
+	ori	r3, r3, core_registers_save_area@l
+
+	/* Save base register */
+	li	r4, 0
+	bl	fsl_cpu_state_save
 
-	/* Save SPRGs */
-	mfspr	r4,SPRN_SPRG0
-	stw	r4,SL_SPRG0(r11)
-	mfspr	r4,SPRN_SPRG1
-	stw	r4,SL_SPRG1(r11)
-	mfspr	r4,SPRN_SPRG2
-	stw	r4,SL_SPRG2(r11)
-	mfspr	r4,SPRN_SPRG3
-	stw	r4,SL_SPRG3(r11)
-	mfspr	r4,SPRN_SPRG4
-	stw	r4,SL_SPRG4(r11)
-	mfspr	r4,SPRN_SPRG5
-	stw	r4,SL_SPRG5(r11)
-	mfspr	r4,SPRN_SPRG6
-	stw	r4,SL_SPRG6(r11)
-	mfspr	r4,SPRN_SPRG7
-	stw	r4,SL_SPRG7(r11)
+	/* Save LR */
+	lis	r3, core_registers_save_area@h
+	ori	r3, r3, core_registers_save_area@l
+	stw	r15, SR_LR(r3)
 
 	/* Call the low level suspend stuff (we should probably have made
 	 * a stackframe...
@@ -97,11 +33,12 @@ _GLOBAL(swsusp_arch_suspend)
 	bl	swsusp_save
 
 	/* Restore LR from the save area */
-	lis	r11,swsusp_save_area@h
-	ori	r11,r11,swsusp_save_area@l
-	lwz	r0,SL_LR(r11)
-	mtlr	r0
+	lis	r3, core_registers_save_area@h
+	ori	r3, r3, core_registers_save_area@l
+	lwz	r15, SR_LR(r3)
+	mtlr	r15
 
+	li	r3, 0
 	blr
 
 _GLOBAL(swsusp_arch_resume)
@@ -138,9 +75,6 @@ _GLOBAL(swsusp_arch_resume)
 	bl flush_dcache_L1
 	bl flush_instruction_cache
 
-	lis	r11,swsusp_save_area@h
-	ori	r11,r11,swsusp_save_area@l
-
 	/*
 	 * Mappings from virtual addresses to physical addresses may be
 	 * different than they were prior to restoring hibernation state. 
@@ -149,53 +83,12 @@ _GLOBAL(swsusp_arch_resume)
 	 */
 	bl	_tlbil_all
 
-	lwz	r4,SL_SPRG0(r11)
-	mtspr	SPRN_SPRG0,r4
-	lwz	r4,SL_SPRG1(r11)
-	mtspr	SPRN_SPRG1,r4
-	lwz	r4,SL_SPRG2(r11)
-	mtspr	SPRN_SPRG2,r4
-	lwz	r4,SL_SPRG3(r11)
-	mtspr	SPRN_SPRG3,r4
-	lwz	r4,SL_SPRG4(r11)
-	mtspr	SPRN_SPRG4,r4
-	lwz	r4,SL_SPRG5(r11)
-	mtspr	SPRN_SPRG5,r4
-	lwz	r4,SL_SPRG6(r11)
-	mtspr	SPRN_SPRG6,r4
-	lwz	r4,SL_SPRG7(r11)
-	mtspr	SPRN_SPRG7,r4
-
-	/* restore the MSR */
-	lwz	r3,SL_MSR(r11)
-	mtmsr	r3
-
-	/* Restore TB */
-	li	r3,0
-	mtspr	SPRN_TBWL,r3
-	lwz	r3,SL_TBU(r11)
-	lwz	r4,SL_TBL(r11)
-	mtspr	SPRN_TBWU,r3
-	mtspr	SPRN_TBWL,r4
-
-	/* Restore TCR and clear any pending bits in TSR. */
-	lwz	r4,SL_TCR(r11)
-	mtspr	SPRN_TCR,r4
-	lis	r4, (TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS)@h
-	mtspr	SPRN_TSR,r4
-
-	/* Kick decrementer */
-	li	r0,1
-	mtdec	r0
-
-	/* Restore the callee-saved registers and return */
-	lwz	r0,SL_CR(r11)
-	mtcr	r0
-	lwz	r2,SL_R2(r11)
-	lmw	r12,SL_R12(r11)
-	lwz	r1,SL_SP(r11)
-	lwz	r0,SL_LR(r11)
-	mtlr	r0
+	lis	r3, core_registers_save_area@h
+	ori	r3, r3, core_registers_save_area@l
+
+	/* Restore base register */
+	li	r4, 0
+	bl	fsl_cpu_state_restore
 
 	li	r3,0
 	blr
-- 
1.8.5

^ permalink raw reply related

* Re: [PATCH] Move precessing of MCE queued event out from syscall exit path.
From: Benjamin Herrenschmidt @ 2014-01-14  8:20 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Mahesh J Salgaonkar, linuxppc-dev
In-Reply-To: <alpine.LSU.2.11.1401132314380.3222@eggly.anvils>

On Mon, 2014-01-13 at 23:47 -0800, Hugh Dickins wrote:
> 
> And I may be quite wrong to point a finger at ATA errors: perhaps
> they're always shown, and quickly cleared off screen in successful
> boots,
> but left visible when root cannot be mounted for some other reason.

dmesg would tell...

> I don't know, and won't have time to investigate further - bisecting
> intermittents is not much fun!  I'll just have to hope that it's
> sorted out before it reaches 3.14-rc, or else bite the bullet and
> investigate on that.)

Right :-) Oh well, I still use a G5 as a desktop so I might eventually
stumble upon them !

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Preeti U Murthy @ 2014-01-14  8:25 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
	linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>

Hi Srivatsa,

On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> Inspite of this it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, its possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
> 
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that wont get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...

Yes I did see this, but since the patch intends to only communicate
whether the cpuidle governor was successful in choosing an idle state on
its part, I wished to address the default action of pseries idle loop
separately. You are right we will need to understand the patch which
introduced this action. I will take a look at it.

> 
>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> ---
>>
>>  drivers/cpuidle/cpuidle.c        |    6 +++++-
>>  drivers/cpuidle/governors/menu.c |    7 ++++---
>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..5bf06bb 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>>  	/* ask the governor for the next state */
>>  	next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> +	dev->last_residency = 0;
>>  	if (need_resched()) {
>> -		dev->last_residency = 0;
>>  		/* give the governor an opportunity to reflect on the outcome */
>>  		if (cpuidle_curr_governor->reflect)
>>  			cpuidle_curr_governor->reflect(dev, next_state);
> 
> The comments on top of the .reflect() routines of the governors say that the
> second parameter is the index of the actual state entered. But after this patch,
> next_state can be negative, indicating an invalid index. So those comments need
> to be updated accordingly.

Right, I will take care of the comment in the next post.
> 
>> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>>  		return 0;
>>  	}
>>
>> +	if (next_state < 0)
>> +		return -EINVAL;
> 
> The exit path above (due to need_resched) returns with irqs enabled, but the new
> one you are adding (next_state < 0) returns with irqs disabled. This is correct,
> because in the latter case, "idle" is still in progress and the arch will choose
> a default handler to execute (unlike the former case where "idle" is over and
> hence its time to enable interrupts).

Correct.
> 
> IMHO it would be good to add comments around this code to explain this subtle
> difference. We can never be too careful with these things... ;-)

Ok, will do so.
> 
>> +
>>  	trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>
>>  	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..6921543 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -283,6 +283,7 @@ again:
>>   * menu_select - selects the next idle state to enter
>>   * @drv: cpuidle driver containing state data
>>   * @dev: the CPU
>> + * Returns -1 when no idle state is suitable
>>   */
>>  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>  {
>> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>  	int multiplier;
>>  	struct timespec t;
>>
>> -	if (data->needs_update) {
>> +	if (data->last_state_idx >= 0 && data->needs_update) {
>                ^^^^^
> Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
> only when index >= 0.

Right we do not need this check. I was assuming that needs_update would
be consistent with the index >= 0 only in the need_resched() case. But
needs_update will get unset each time the governor is invoked to be set
only if index >= 0 thereafter.

> 
>>  		menu_update(drv, dev);
>>  		data->needs_update = 0;
>>  	}
>>
>> -	data->last_state_idx = 0;
>> +	data->last_state_idx = -1;
>>  	data->exit_us = 0;
>>
>>  	/* Special case when user has set very strict latency requirement */
>>  	if (unlikely(latency_req == 0))
>> -		return 0;
>> +		return data->last_state_idx;
>>
>>  	/* determine the expected residency time, round up */
>>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
>>
> 
> What about the ladder governor? I know its not used that much in practice,
> but I think it would be good to update that as well, just to keep it
> consistent.

Yes this needs to be updated as well. But the ladder governor has a few
other details to take care of in addition to what is taken care of in
the menu governor by this patch. Hence I will be posting that separately.

Thanks

Regards
Preeti U Murthy
> 
> Regards,
> Srivatsa S. Bhat
> 

^ permalink raw reply

* Re: [PATCH v3] watchdog: mpc8xxx_wdt convert to watchdog core
From: Wim Van Sebroeck @ 2014-01-14  8:32 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: scottwood, linuxppc-dev, linux-kernel, Guenter Roeck,
	linux-watchdog
In-Reply-To: <20131204063214.D1DB01A2BEA@localhost.localdomain>

Hi Christophe,

> Convert mpc8xxx_wdt.c to the new watchdog API.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

This patch has been added to linux-watchdog-next.

Kind regards,
Wim.

^ permalink raw reply

* [PATCH] powerpc: dma-mapping: Return dma_direct_ops variable when dev == NULL
From: Chunhe Lan @ 2014-01-14  9:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Chunhe Lan

Without this patch, kind of below error will be dumped if
'insmod ixgbevf.ko' is executed:

    ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function
             Network Driver - version 2.7.12-k
    ixgbevf: Copyright (c) 2009 - 2012 Intel Corporation.
    ixgbevf 0000:01:10.0: enabling device (0000 -> 0002)
    ixgbevf 0000:01:10.0: No usable DMA configuration, aborting
    ixgbevf: probe of 0000:01:10.0 failed with error -5
    ......
    ......

Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Chunhe Lan <Chunhe.Lan@freescale.com>
---
 arch/powerpc/include/asm/dma-mapping.h |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index e27e9ad..b8c10de 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -84,10 +84,15 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 	 * only ISA DMA device we support is the floppy and we have a hack
 	 * in the floppy driver directly to get a device for us.
 	 */
-	if (unlikely(dev == NULL))
-		return NULL;
-
-	return dev->archdata.dma_ops;
+	if (dev && dev->archdata.dma_ops)
+		return dev->archdata.dma_ops;
+	/*
+	 * In some cases (for example, use the Intel(R) 10 Gigabit PCI
+	 * expression Virtual Function Network Driver -- ixgbevf.ko),
+	 * their value of dev is the NULL. If return NULL, the driver is
+	 * aborting. So return dma_direct_ops variable when dev == NULL.
+	 */
+	return &dma_direct_ops;
 }
 
 static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
-- 
1.7.6.5

^ permalink raw reply related

* Re: [PATCH] powerpc: dma-mapping: Return dma_direct_ops variable when dev == NULL
From: Benjamin Herrenschmidt @ 2014-01-14 10:14 UTC (permalink / raw)
  To: Chunhe Lan; +Cc: linux-pci, linuxppc-dev
In-Reply-To: <1389692656-27758-1-git-send-email-Chunhe.Lan@freescale.com>

On Tue, 2014-01-14 at 17:44 +0800, Chunhe Lan wrote:
> Without this patch, kind of below error will be dumped if
> 'insmod ixgbevf.ko' is executed:
> 
>     ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function
>              Network Driver - version 2.7.12-k
>     ixgbevf: Copyright (c) 2009 - 2012 Intel Corporation.
>     ixgbevf 0000:01:10.0: enabling device (0000 -> 0002)
>     ixgbevf 0000:01:10.0: No usable DMA configuration, aborting
>     ixgbevf: probe of 0000:01:10.0 failed with error -5
>     ......
>     ......

That's not right. The DMA ops must be set properly for the VF somewhere
in the arch code instead. When creating VFs, is there a hook allowing
the arch to fix things up ?

(Also adding linux-pci on CC)

Ben.

> Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Tested-by: Chunhe Lan <Chunhe.Lan@freescale.com>
> ---
>  arch/powerpc/include/asm/dma-mapping.h |   13 +++++++++----
>  1 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index e27e9ad..b8c10de 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -84,10 +84,15 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
>  	 * only ISA DMA device we support is the floppy and we have a hack
>  	 * in the floppy driver directly to get a device for us.
>  	 */
> -	if (unlikely(dev == NULL))
> -		return NULL;
> -
> -	return dev->archdata.dma_ops;
> +	if (dev && dev->archdata.dma_ops)
> +		return dev->archdata.dma_ops;
> +	/*
> +	 * In some cases (for example, use the Intel(R) 10 Gigabit PCI
> +	 * expression Virtual Function Network Driver -- ixgbevf.ko),
> +	 * their value of dev is the NULL. If return NULL, the driver is
> +	 * aborting. So return dma_direct_ops variable when dev == NULL.
> +	 */
> +	return &dma_direct_ops;
>  }
>  
>  static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)

^ permalink raw reply

* [PATCH v2] Move precessing of MCE queued event out from syscall exit path.
From: Mahesh J Salgaonkar @ 2014-01-14 10:15 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt; +Cc: Hugh Dickins

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Huge Dickins reported an issue that b5ff4211a829
"powerpc/book3s: Queue up and process delayed MCE events" breaks the
PowerMac G5 boot. This patch fixes it by moving the mce even processing
away from syscall exit, which was wrong to do that in first place, and
using irq work framework to delay processing of mce event.

Reported-by: Hugh Dickins <hughd@google.com
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mce.h |    1 -
 arch/powerpc/kernel/entry_64.S |    5 -----
 arch/powerpc/kernel/mce.c      |   13 ++++++++++---
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 2257d1e..f97d8cb 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -192,7 +192,6 @@ extern void save_mce_event(struct pt_regs *regs, long handled,
 extern int get_mce_event(struct machine_check_event *mce, bool release);
 extern void release_mce_event(void);
 extern void machine_check_queue_event(void);
-extern void machine_check_process_queued_event(void);
 extern void machine_check_print_event_info(struct machine_check_event *evt);
 extern uint64_t get_mce_fault_addr(struct machine_check_event *evt);
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..bbfb029 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -184,11 +184,6 @@ syscall_exit:
 	bl	.do_show_syscall_exit
 	ld	r3,RESULT(r1)
 #endif
-#ifdef CONFIG_PPC_BOOK3S_64
-BEGIN_FTR_SECTION
-	bl	.machine_check_process_queued_event
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-#endif
 	CURRENT_THREAD_INFO(r12, r1)
 
 	ld	r8,_MSR(r1)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index d6edf2b..a7fd4cb 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -26,6 +26,7 @@
 #include <linux/ptrace.h>
 #include <linux/percpu.h>
 #include <linux/export.h>
+#include <linux/irq_work.h>
 #include <asm/mce.h>
 
 static DEFINE_PER_CPU(int, mce_nest_count);
@@ -35,6 +36,11 @@ static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event);
 static DEFINE_PER_CPU(int, mce_queue_count);
 static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event_queue);
 
+static void machine_check_process_queued_event(struct irq_work *work);
+struct irq_work mce_event_process_work = {
+        .func = machine_check_process_queued_event,
+};
+
 static void mce_set_error_info(struct machine_check_event *mce,
 			       struct mce_error_info *mce_err)
 {
@@ -185,17 +191,19 @@ void machine_check_queue_event(void)
 		return;
 	}
 	__get_cpu_var(mce_event_queue[index]) = evt;
+
+	/* Queue irq work to process this event later. */
+	irq_work_queue(&mce_event_process_work);
 }
 
 /*
  * process pending MCE event from the mce event queue. This function will be
  * called during syscall exit.
  */
-void machine_check_process_queued_event(void)
+static void machine_check_process_queued_event(struct irq_work *work)
 {
 	int index;
 
-	preempt_disable();
 	/*
 	 * For now just print it to console.
 	 * TODO: log this error event to FSP or nvram.
@@ -206,7 +214,6 @@ void machine_check_process_queued_event(void)
 				&__get_cpu_var(mce_event_queue[index]));
 		__get_cpu_var(mce_queue_count)--;
 	}
-	preempt_enable();
 }
 
 void machine_check_print_event_info(struct machine_check_event *evt)

^ permalink raw reply related

* [PATCH v1 0/6] pseries/cpuidle: pseries cpuidle backend driver clean-ups.
From: Deepthi Dharwar @ 2014-01-14 10:55 UTC (permalink / raw)
  To: linux-pm, benh, daniel.lezcano, linux-kernel, srivatsa.bhat,
	preeti, svaidy, linuxppc-dev

The following patch series includes a bunch of clean-ups for the
pseries cpuidle backend driver. This includes,
moving the driver from arch/powerpc/platforms/pseries to
driver/cpuidle, refactoring of the code, making it a non-module,
removing smt-snooze-delay update and dead code around it.

After a number of attempts to consolidate the backend cpuidle driver 
for pSeries and powernv platforms, it seems best to have separate 
idle drivers for both the platforms as any kind of  code duplication 
seizes to exist beyond snooze loop and a hot plug notifier.  
Also going further, with addition of device tree parsing to setup
idle states and related changes just for powernv platform, it is
best to keep these drivers separate without adding complexity 
and thus improving readabilty for both the platform drivers.  

The clean-up undertaken here was posted earlier as part of
generic powerpc cpuidle driver clean-up.

V1 -> http://lkml.org/lkml/2013/7/23/143
V2 -> https://lkml.org/lkml/2013/7/30/872
V3 -> http://comments.gmane.org/gmane.linux.ports.ppc.embedded/63093
V4 -> https://lkml.org/lkml/2013/8/22/25
V5 -> http://lkml.org/lkml/2013/8/22/184
V6 -> https://lkml.org/lkml/2013/8/27/432
V7 -> https://lkml.org/lkml/2013/10/29/216
V8 -> https://lkml.org/lkml/2013/11/11/29

 Deepthi Dharwar (5):
      pseries/cpuidle: Move processor_idle.c to drivers/cpuidle.
      pseries/cpuidle: Use cpuidle_register() for initialisation.
      pseries/cpuidle: Make cpuidle-pseries backend driver a non-module.
      pseries/cpuidle: Remove MAX_IDLE_STATE macro.
      pseries/cpuidle: smt-snooze-delay cleanup.

Preeti U Murthy (1):
      pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines

 arch/powerpc/include/asm/processor.h            |    7 
 arch/powerpc/kernel/sysfs.c                     |    2 
 arch/powerpc/platforms/pseries/Kconfig          |    9 -
 arch/powerpc/platforms/pseries/Makefile         |    1 
 arch/powerpc/platforms/pseries/processor_idle.c |  364 -----------------------
 drivers/cpuidle/Kconfig                         |    5 
 drivers/cpuidle/Kconfig.powerpc                 |   11 +
 drivers/cpuidle/Makefile                        |    4 
 drivers/cpuidle/cpuidle-pseries.c               |  267 +++++++++++++++++
 9 files changed, 287 insertions(+), 383 deletions(-)
 delete mode 100644 arch/powerpc/platforms/pseries/processor_idle.c
 create mode 100644 drivers/cpuidle/Kconfig.powerpc
 create mode 100644 drivers/cpuidle/cpuidle-pseries.c

-- Deepthi

^ permalink raw reply

* [PATCH v1 1/6] pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
From: Deepthi Dharwar @ 2014-01-14 10:55 UTC (permalink / raw)
  To: linux-pm, benh, daniel.lezcano, linux-kernel, srivatsa.bhat,
	preeti, svaidy, linuxppc-dev
In-Reply-To: <20140114105525.3064.52013.stgit@deepthi.in.ibm.com>

From: Preeti U Murthy <preeti@linux.vnet.ibm.com>

Commit fbd7740fdfdf9475f(powerpc: Simplify pSeries idle loop) switched pseries
cpu idle handling from complete idle loops to ppc_md.powersave functions.
Earlier to this switch, ppc64_runlatch_off() had to be called in each of the
idle routines. But after the switch, this call is handled in arch_cpu_idle(),
just before the call to ppc_md.powersave, where platform specific idle
routines are called.

As a consequence, the call to ppc64_runlatch_off() got duplicated in the
arch_cpu_idle() routine as well as in the some of the idle routines in
pseries and commit fbd7740fdfdf9475f missed to get rid of these redundant
calls. These calls were carried over subsequent enhancements to the pseries
cpuidle routines.

Although multiple calls to ppc64_runlatch_off() is harmless, there is still some
overhead due to it. Besides that, these calls could also make way for a
misunderstanding that it is *necessary* to call ppc64_runlatch_off() multiple
times, when that is not the case. Hence this patch takes care of eliminating
this redundancy.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/processor_idle.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index a166e38..09e4f56 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -17,7 +17,6 @@
 #include <asm/reg.h>
 #include <asm/machdep.h>
 #include <asm/firmware.h>
-#include <asm/runlatch.h>
 #include <asm/plpar_wrappers.h>

 struct cpuidle_driver pseries_idle_driver = {
@@ -63,7 +62,6 @@ static int snooze_loop(struct cpuidle_device *dev,
 	set_thread_flag(TIF_POLLING_NRFLAG);

 	while ((!need_resched()) && cpu_online(cpu)) {
-		ppc64_runlatch_off();
 		HMT_low();
 		HMT_very_low();
 	}
@@ -103,7 +101,6 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 	idle_loop_prolog(&in_purr);
 	get_lppaca()->donate_dedicated_cpu = 1;

-	ppc64_runlatch_off();
 	HMT_medium();
 	check_and_cede_processor();

^ permalink raw reply related

* [PATCH v1 2/6] pseries/cpuidle: Move processor_idle.c to drivers/cpuidle.
From: Deepthi Dharwar @ 2014-01-14 10:56 UTC (permalink / raw)
  To: linux-pm, benh, daniel.lezcano, linux-kernel, srivatsa.bhat,
	preeti, svaidy, linuxppc-dev
In-Reply-To: <20140114105525.3064.52013.stgit@deepthi.in.ibm.com>

Move the file from arch specific pseries/processor_idle.c
to drivers/cpuidle/cpuidle-pseries.c
Make the relevant Makefile and Kconfig changes.
Also, introduce Kconfig.powerpc in drivers/cpuidle
for all powerpc cpuidle drivers.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/processor.h            |    2 
 arch/powerpc/platforms/pseries/Kconfig          |    9 -
 arch/powerpc/platforms/pseries/Makefile         |    1 
 arch/powerpc/platforms/pseries/processor_idle.c |  361 -----------------------
 drivers/cpuidle/Kconfig                         |    5 
 drivers/cpuidle/Kconfig.powerpc                 |   11 +
 drivers/cpuidle/Makefile                        |    4 
 drivers/cpuidle/cpuidle-pseries.c               |  361 +++++++++++++++++++++++
 8 files changed, 382 insertions(+), 372 deletions(-)
 delete mode 100644 arch/powerpc/platforms/pseries/processor_idle.c
 create mode 100644 drivers/cpuidle/Kconfig.powerpc
 create mode 100644 drivers/cpuidle/cpuidle-pseries.c

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index fc14a38..fa98fdf 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -445,7 +445,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 extern int powersave_nap;	/* set if nap mode can be used in idle loop */
 extern void power7_nap(void);
 
-#ifdef CONFIG_PSERIES_IDLE
+#ifdef CONFIG_PSERIES_CPUIDLE
 extern void update_smt_snooze_delay(int cpu, int residency);
 #else
 static inline void update_smt_snooze_delay(int cpu, int residency) {}
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 62b4f80..bb59bb0 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -119,12 +119,3 @@ config DTL
 	  which are accessible through a debugfs file.
 
 	  Say N if you are unsure.
-
-config PSERIES_IDLE
-	bool "Cpuidle driver for pSeries platforms"
-	depends on CPU_IDLE
-	depends on PPC_PSERIES
-	default y
-	help
-	  Select this option to enable processor idle state management
-	  through cpuidle subsystem.
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index fbccac9..0348079 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -21,7 +21,6 @@ obj-$(CONFIG_HCALL_STATS)	+= hvCall_inst.o
 obj-$(CONFIG_CMM)		+= cmm.o
 obj-$(CONFIG_DTL)		+= dtl.o
 obj-$(CONFIG_IO_EVENT_IRQ)	+= io_event_irq.o
-obj-$(CONFIG_PSERIES_IDLE)	+= processor_idle.o
 obj-$(CONFIG_LPARCFG)		+= lparcfg.o
 
 ifeq ($(CONFIG_PPC_PSERIES),y)
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
deleted file mode 100644
index 09e4f56..0000000
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ /dev/null
@@ -1,361 +0,0 @@
-/*
- *  processor_idle - idle state cpuidle driver.
- *  Adapted from drivers/idle/intel_idle.c and
- *  drivers/acpi/processor_idle.c
- *
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/moduleparam.h>
-#include <linux/cpuidle.h>
-#include <linux/cpu.h>
-#include <linux/notifier.h>
-
-#include <asm/paca.h>
-#include <asm/reg.h>
-#include <asm/machdep.h>
-#include <asm/firmware.h>
-#include <asm/plpar_wrappers.h>
-
-struct cpuidle_driver pseries_idle_driver = {
-	.name             = "pseries_idle",
-	.owner            = THIS_MODULE,
-};
-
-#define MAX_IDLE_STATE_COUNT	2
-
-static int max_idle_state = MAX_IDLE_STATE_COUNT - 1;
-static struct cpuidle_device __percpu *pseries_cpuidle_devices;
-static struct cpuidle_state *cpuidle_state_table;
-
-static inline void idle_loop_prolog(unsigned long *in_purr)
-{
-	*in_purr = mfspr(SPRN_PURR);
-	/*
-	 * Indicate to the HV that we are idle. Now would be
-	 * a good time to find other work to dispatch.
-	 */
-	get_lppaca()->idle = 1;
-}
-
-static inline void idle_loop_epilog(unsigned long in_purr)
-{
-	u64 wait_cycles;
-
-	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
-	wait_cycles += mfspr(SPRN_PURR) - in_purr;
-	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
-	get_lppaca()->idle = 0;
-}
-
-static int snooze_loop(struct cpuidle_device *dev,
-			struct cpuidle_driver *drv,
-			int index)
-{
-	unsigned long in_purr;
-	int cpu = dev->cpu;
-
-	idle_loop_prolog(&in_purr);
-	local_irq_enable();
-	set_thread_flag(TIF_POLLING_NRFLAG);
-
-	while ((!need_resched()) && cpu_online(cpu)) {
-		HMT_low();
-		HMT_very_low();
-	}
-
-	HMT_medium();
-	clear_thread_flag(TIF_POLLING_NRFLAG);
-	smp_mb();
-
-	idle_loop_epilog(in_purr);
-
-	return index;
-}
-
-static void check_and_cede_processor(void)
-{
-	/*
-	 * Ensure our interrupt state is properly tracked,
-	 * also checks if no interrupt has occurred while we
-	 * were soft-disabled
-	 */
-	if (prep_irq_for_idle()) {
-		cede_processor();
-#ifdef CONFIG_TRACE_IRQFLAGS
-		/* Ensure that H_CEDE returns with IRQs on */
-		if (WARN_ON(!(mfmsr() & MSR_EE)))
-			__hard_irq_enable();
-#endif
-	}
-}
-
-static int dedicated_cede_loop(struct cpuidle_device *dev,
-				struct cpuidle_driver *drv,
-				int index)
-{
-	unsigned long in_purr;
-
-	idle_loop_prolog(&in_purr);
-	get_lppaca()->donate_dedicated_cpu = 1;
-
-	HMT_medium();
-	check_and_cede_processor();
-
-	get_lppaca()->donate_dedicated_cpu = 0;
-
-	idle_loop_epilog(in_purr);
-
-	return index;
-}
-
-static int shared_cede_loop(struct cpuidle_device *dev,
-			struct cpuidle_driver *drv,
-			int index)
-{
-	unsigned long in_purr;
-
-	idle_loop_prolog(&in_purr);
-
-	/*
-	 * Yield the processor to the hypervisor.  We return if
-	 * an external interrupt occurs (which are driven prior
-	 * to returning here) or if a prod occurs from another
-	 * processor. When returning here, external interrupts
-	 * are enabled.
-	 */
-	check_and_cede_processor();
-
-	idle_loop_epilog(in_purr);
-
-	return index;
-}
-
-/*
- * States for dedicated partition case.
- */
-static struct cpuidle_state dedicated_states[MAX_IDLE_STATE_COUNT] = {
-	{ /* Snooze */
-		.name = "snooze",
-		.desc = "snooze",
-		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 0,
-		.target_residency = 0,
-		.enter = &snooze_loop },
-	{ /* CEDE */
-		.name = "CEDE",
-		.desc = "CEDE",
-		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
-		.enter = &dedicated_cede_loop },
-};
-
-/*
- * States for shared partition case.
- */
-static struct cpuidle_state shared_states[MAX_IDLE_STATE_COUNT] = {
-	{ /* Shared Cede */
-		.name = "Shared Cede",
-		.desc = "Shared Cede",
-		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 0,
-		.target_residency = 0,
-		.enter = &shared_cede_loop },
-};
-
-void update_smt_snooze_delay(int cpu, int residency)
-{
-	struct cpuidle_driver *drv = cpuidle_get_driver();
-	struct cpuidle_device *dev = per_cpu(cpuidle_devices, cpu);
-
-	if (cpuidle_state_table != dedicated_states)
-		return;
-
-	if (residency < 0) {
-		/* Disable the Nap state on that cpu */
-		if (dev)
-			dev->states_usage[1].disable = 1;
-	} else
-		if (drv)
-			drv->states[1].target_residency = residency;
-}
-
-static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
-			unsigned long action, void *hcpu)
-{
-	int hotcpu = (unsigned long)hcpu;
-	struct cpuidle_device *dev =
-			per_cpu_ptr(pseries_cpuidle_devices, hotcpu);
-
-	if (dev && cpuidle_get_driver()) {
-		switch (action) {
-		case CPU_ONLINE:
-		case CPU_ONLINE_FROZEN:
-			cpuidle_pause_and_lock();
-			cpuidle_enable_device(dev);
-			cpuidle_resume_and_unlock();
-			break;
-
-		case CPU_DEAD:
-		case CPU_DEAD_FROZEN:
-			cpuidle_pause_and_lock();
-			cpuidle_disable_device(dev);
-			cpuidle_resume_and_unlock();
-			break;
-
-		default:
-			return NOTIFY_DONE;
-		}
-	}
-	return NOTIFY_OK;
-}
-
-static struct notifier_block setup_hotplug_notifier = {
-	.notifier_call = pseries_cpuidle_add_cpu_notifier,
-};
-
-/*
- * pseries_cpuidle_driver_init()
- */
-static int pseries_cpuidle_driver_init(void)
-{
-	int idle_state;
-	struct cpuidle_driver *drv = &pseries_idle_driver;
-
-	drv->state_count = 0;
-
-	for (idle_state = 0; idle_state < MAX_IDLE_STATE_COUNT; ++idle_state) {
-
-		if (idle_state > max_idle_state)
-			break;
-
-		/* is the state not enabled? */
-		if (cpuidle_state_table[idle_state].enter == NULL)
-			continue;
-
-		drv->states[drv->state_count] =	/* structure copy */
-			cpuidle_state_table[idle_state];
-
-		drv->state_count += 1;
-	}
-
-	return 0;
-}
-
-/* pseries_idle_devices_uninit(void)
- * unregister cpuidle devices and de-allocate memory
- */
-static void pseries_idle_devices_uninit(void)
-{
-	int i;
-	struct cpuidle_device *dev;
-
-	for_each_possible_cpu(i) {
-		dev = per_cpu_ptr(pseries_cpuidle_devices, i);
-		cpuidle_unregister_device(dev);
-	}
-
-	free_percpu(pseries_cpuidle_devices);
-	return;
-}
-
-/* pseries_idle_devices_init()
- * allocate, initialize and register cpuidle device
- */
-static int pseries_idle_devices_init(void)
-{
-	int i;
-	struct cpuidle_driver *drv = &pseries_idle_driver;
-	struct cpuidle_device *dev;
-
-	pseries_cpuidle_devices = alloc_percpu(struct cpuidle_device);
-	if (pseries_cpuidle_devices == NULL)
-		return -ENOMEM;
-
-	for_each_possible_cpu(i) {
-		dev = per_cpu_ptr(pseries_cpuidle_devices, i);
-		dev->state_count = drv->state_count;
-		dev->cpu = i;
-		if (cpuidle_register_device(dev)) {
-			printk(KERN_DEBUG \
-				"cpuidle_register_device %d failed!\n", i);
-			return -EIO;
-		}
-	}
-
-	return 0;
-}
-
-/*
- * pseries_idle_probe()
- * Choose state table for shared versus dedicated partition
- */
-static int pseries_idle_probe(void)
-{
-
-	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
-		return -ENODEV;
-
-	if (cpuidle_disable != IDLE_NO_OVERRIDE)
-		return -ENODEV;
-
-	if (max_idle_state == 0) {
-		printk(KERN_DEBUG "pseries processor idle disabled.\n");
-		return -EPERM;
-	}
-
-	if (lppaca_shared_proc(get_lppaca()))
-		cpuidle_state_table = shared_states;
-	else
-		cpuidle_state_table = dedicated_states;
-
-	return 0;
-}
-
-static int __init pseries_processor_idle_init(void)
-{
-	int retval;
-
-	retval = pseries_idle_probe();
-	if (retval)
-		return retval;
-
-	pseries_cpuidle_driver_init();
-	retval = cpuidle_register_driver(&pseries_idle_driver);
-	if (retval) {
-		printk(KERN_DEBUG "Registration of pseries driver failed.\n");
-		return retval;
-	}
-
-	retval = pseries_idle_devices_init();
-	if (retval) {
-		pseries_idle_devices_uninit();
-		cpuidle_unregister_driver(&pseries_idle_driver);
-		return retval;
-	}
-
-	register_cpu_notifier(&setup_hotplug_notifier);
-	printk(KERN_DEBUG "pseries_idle_driver registered\n");
-
-	return 0;
-}
-
-static void __exit pseries_processor_idle_exit(void)
-{
-
-	unregister_cpu_notifier(&setup_hotplug_notifier);
-	pseries_idle_devices_uninit();
-	cpuidle_unregister_driver(&pseries_idle_driver);
-
-	return;
-}
-
-module_init(pseries_processor_idle_init);
-module_exit(pseries_processor_idle_exit);
-
-MODULE_AUTHOR("Deepthi Dharwar <deepthi@linux.vnet.ibm.com>");
-MODULE_DESCRIPTION("Cpuidle driver for POWER");
-MODULE_LICENSE("GPL");
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index b3fb81d..f04e25f 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -35,6 +35,11 @@ depends on ARM
 source "drivers/cpuidle/Kconfig.arm"
 endmenu
 
+menu "POWERPC CPU Idle Drivers"
+depends on PPC
+source "drivers/cpuidle/Kconfig.powerpc"
+endmenu
+
 endif
 
 config ARCH_NEEDS_CPU_IDLE_COUPLED
diff --git a/drivers/cpuidle/Kconfig.powerpc b/drivers/cpuidle/Kconfig.powerpc
new file mode 100644
index 0000000..8147de5
--- /dev/null
+++ b/drivers/cpuidle/Kconfig.powerpc
@@ -0,0 +1,11 @@
+#
+# POWERPC CPU Idle Drivers
+#
+config PSERIES_CPUIDLE
+	bool "Cpuidle driver for pSeries platforms"
+	depends on CPU_IDLE
+	depends on PPC_PSERIES
+	default y
+	help
+	  Select this option to enable processor idle state management
+	  through cpuidle subsystem.
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 527be28..a6331ad 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -13,3 +13,7 @@ obj-$(CONFIG_ARM_KIRKWOOD_CPUIDLE)	+= cpuidle-kirkwood.o
 obj-$(CONFIG_ARM_ZYNQ_CPUIDLE)		+= cpuidle-zynq.o
 obj-$(CONFIG_ARM_U8500_CPUIDLE)         += cpuidle-ux500.o
 obj-$(CONFIG_ARM_AT91_CPUIDLE)          += cpuidle-at91.o
+
+###############################################################################
+# POWERPC drivers
+obj-$(CONFIG_PSERIES_CPUIDLE)		+= cpuidle-pseries.o
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
new file mode 100644
index 0000000..2115478
--- /dev/null
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -0,0 +1,361 @@
+/*
+ *  cpuidle-pseries - idle state cpuidle driver.
+ *  Adapted from drivers/idle/intel_idle.c and
+ *  drivers/acpi/processor_idle.c
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/cpuidle.h>
+#include <linux/cpu.h>
+#include <linux/notifier.h>
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <asm/machdep.h>
+#include <asm/firmware.h>
+#include <asm/plpar_wrappers.h>
+
+struct cpuidle_driver pseries_idle_driver = {
+	.name             = "pseries_idle",
+	.owner            = THIS_MODULE,
+};
+
+#define MAX_IDLE_STATE_COUNT	2
+
+static int max_idle_state = MAX_IDLE_STATE_COUNT - 1;
+static struct cpuidle_device __percpu *pseries_cpuidle_devices;
+static struct cpuidle_state *cpuidle_state_table;
+
+static inline void idle_loop_prolog(unsigned long *in_purr)
+{
+	*in_purr = mfspr(SPRN_PURR);
+	/*
+	 * Indicate to the HV that we are idle. Now would be
+	 * a good time to find other work to dispatch.
+	 */
+	get_lppaca()->idle = 1;
+}
+
+static inline void idle_loop_epilog(unsigned long in_purr)
+{
+	u64 wait_cycles;
+
+	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
+	wait_cycles += mfspr(SPRN_PURR) - in_purr;
+	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
+	get_lppaca()->idle = 0;
+}
+
+static int snooze_loop(struct cpuidle_device *dev,
+			struct cpuidle_driver *drv,
+			int index)
+{
+	unsigned long in_purr;
+	int cpu = dev->cpu;
+
+	idle_loop_prolog(&in_purr);
+	local_irq_enable();
+	set_thread_flag(TIF_POLLING_NRFLAG);
+
+	while ((!need_resched()) && cpu_online(cpu)) {
+		HMT_low();
+		HMT_very_low();
+	}
+
+	HMT_medium();
+	clear_thread_flag(TIF_POLLING_NRFLAG);
+	smp_mb();
+
+	idle_loop_epilog(in_purr);
+
+	return index;
+}
+
+static void check_and_cede_processor(void)
+{
+	/*
+	 * Ensure our interrupt state is properly tracked,
+	 * also checks if no interrupt has occurred while we
+	 * were soft-disabled
+	 */
+	if (prep_irq_for_idle()) {
+		cede_processor();
+#ifdef CONFIG_TRACE_IRQFLAGS
+		/* Ensure that H_CEDE returns with IRQs on */
+		if (WARN_ON(!(mfmsr() & MSR_EE)))
+			__hard_irq_enable();
+#endif
+	}
+}
+
+static int dedicated_cede_loop(struct cpuidle_device *dev,
+				struct cpuidle_driver *drv,
+				int index)
+{
+	unsigned long in_purr;
+
+	idle_loop_prolog(&in_purr);
+	get_lppaca()->donate_dedicated_cpu = 1;
+
+	HMT_medium();
+	check_and_cede_processor();
+
+	get_lppaca()->donate_dedicated_cpu = 0;
+
+	idle_loop_epilog(in_purr);
+
+	return index;
+}
+
+static int shared_cede_loop(struct cpuidle_device *dev,
+			struct cpuidle_driver *drv,
+			int index)
+{
+	unsigned long in_purr;
+
+	idle_loop_prolog(&in_purr);
+
+	/*
+	 * Yield the processor to the hypervisor.  We return if
+	 * an external interrupt occurs (which are driven prior
+	 * to returning here) or if a prod occurs from another
+	 * processor. When returning here, external interrupts
+	 * are enabled.
+	 */
+	check_and_cede_processor();
+
+	idle_loop_epilog(in_purr);
+
+	return index;
+}
+
+/*
+ * States for dedicated partition case.
+ */
+static struct cpuidle_state dedicated_states[MAX_IDLE_STATE_COUNT] = {
+	{ /* Snooze */
+		.name = "snooze",
+		.desc = "snooze",
+		.flags = CPUIDLE_FLAG_TIME_VALID,
+		.exit_latency = 0,
+		.target_residency = 0,
+		.enter = &snooze_loop },
+	{ /* CEDE */
+		.name = "CEDE",
+		.desc = "CEDE",
+		.flags = CPUIDLE_FLAG_TIME_VALID,
+		.exit_latency = 10,
+		.target_residency = 100,
+		.enter = &dedicated_cede_loop },
+};
+
+/*
+ * States for shared partition case.
+ */
+static struct cpuidle_state shared_states[MAX_IDLE_STATE_COUNT] = {
+	{ /* Shared Cede */
+		.name = "Shared Cede",
+		.desc = "Shared Cede",
+		.flags = CPUIDLE_FLAG_TIME_VALID,
+		.exit_latency = 0,
+		.target_residency = 0,
+		.enter = &shared_cede_loop },
+};
+
+void update_smt_snooze_delay(int cpu, int residency)
+{
+	struct cpuidle_driver *drv = cpuidle_get_driver();
+	struct cpuidle_device *dev = per_cpu(cpuidle_devices, cpu);
+
+	if (cpuidle_state_table != dedicated_states)
+		return;
+
+	if (residency < 0) {
+		/* Disable the Nap state on that cpu */
+		if (dev)
+			dev->states_usage[1].disable = 1;
+	} else
+		if (drv)
+			drv->states[1].target_residency = residency;
+}
+
+static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
+			unsigned long action, void *hcpu)
+{
+	int hotcpu = (unsigned long)hcpu;
+	struct cpuidle_device *dev =
+			per_cpu_ptr(pseries_cpuidle_devices, hotcpu);
+
+	if (dev && cpuidle_get_driver()) {
+		switch (action) {
+		case CPU_ONLINE:
+		case CPU_ONLINE_FROZEN:
+			cpuidle_pause_and_lock();
+			cpuidle_enable_device(dev);
+			cpuidle_resume_and_unlock();
+			break;
+
+		case CPU_DEAD:
+		case CPU_DEAD_FROZEN:
+			cpuidle_pause_and_lock();
+			cpuidle_disable_device(dev);
+			cpuidle_resume_and_unlock();
+			break;
+
+		default:
+			return NOTIFY_DONE;
+		}
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block setup_hotplug_notifier = {
+	.notifier_call = pseries_cpuidle_add_cpu_notifier,
+};
+
+/*
+ * pseries_cpuidle_driver_init()
+ */
+static int pseries_cpuidle_driver_init(void)
+{
+	int idle_state;
+	struct cpuidle_driver *drv = &pseries_idle_driver;
+
+	drv->state_count = 0;
+
+	for (idle_state = 0; idle_state < MAX_IDLE_STATE_COUNT; ++idle_state) {
+
+		if (idle_state > max_idle_state)
+			break;
+
+		/* is the state not enabled? */
+		if (cpuidle_state_table[idle_state].enter == NULL)
+			continue;
+
+		drv->states[drv->state_count] =	/* structure copy */
+			cpuidle_state_table[idle_state];
+
+		drv->state_count += 1;
+	}
+
+	return 0;
+}
+
+/* pseries_idle_devices_uninit(void)
+ * unregister cpuidle devices and de-allocate memory
+ */
+static void pseries_idle_devices_uninit(void)
+{
+	int i;
+	struct cpuidle_device *dev;
+
+	for_each_possible_cpu(i) {
+		dev = per_cpu_ptr(pseries_cpuidle_devices, i);
+		cpuidle_unregister_device(dev);
+	}
+
+	free_percpu(pseries_cpuidle_devices);
+	return;
+}
+
+/* pseries_idle_devices_init()
+ * allocate, initialize and register cpuidle device
+ */
+static int pseries_idle_devices_init(void)
+{
+	int i;
+	struct cpuidle_driver *drv = &pseries_idle_driver;
+	struct cpuidle_device *dev;
+
+	pseries_cpuidle_devices = alloc_percpu(struct cpuidle_device);
+	if (pseries_cpuidle_devices == NULL)
+		return -ENOMEM;
+
+	for_each_possible_cpu(i) {
+		dev = per_cpu_ptr(pseries_cpuidle_devices, i);
+		dev->state_count = drv->state_count;
+		dev->cpu = i;
+		if (cpuidle_register_device(dev)) {
+			printk(KERN_DEBUG \
+				"cpuidle_register_device %d failed!\n", i);
+			return -EIO;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * pseries_idle_probe()
+ * Choose state table for shared versus dedicated partition
+ */
+static int pseries_idle_probe(void)
+{
+
+	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+		return -ENODEV;
+
+	if (cpuidle_disable != IDLE_NO_OVERRIDE)
+		return -ENODEV;
+
+	if (max_idle_state == 0) {
+		printk(KERN_DEBUG "pseries processor idle disabled.\n");
+		return -EPERM;
+	}
+
+	if (lppaca_shared_proc(get_lppaca()))
+		cpuidle_state_table = shared_states;
+	else
+		cpuidle_state_table = dedicated_states;
+
+	return 0;
+}
+
+static int __init pseries_processor_idle_init(void)
+{
+	int retval;
+
+	retval = pseries_idle_probe();
+	if (retval)
+		return retval;
+
+	pseries_cpuidle_driver_init();
+	retval = cpuidle_register_driver(&pseries_idle_driver);
+	if (retval) {
+		printk(KERN_DEBUG "Registration of pseries driver failed.\n");
+		return retval;
+	}
+
+	retval = pseries_idle_devices_init();
+	if (retval) {
+		pseries_idle_devices_uninit();
+		cpuidle_unregister_driver(&pseries_idle_driver);
+		return retval;
+	}
+
+	register_cpu_notifier(&setup_hotplug_notifier);
+	printk(KERN_DEBUG "pseries_idle_driver registered\n");
+
+	return 0;
+}
+
+static void __exit pseries_processor_idle_exit(void)
+{
+
+	unregister_cpu_notifier(&setup_hotplug_notifier);
+	pseries_idle_devices_uninit();
+	cpuidle_unregister_driver(&pseries_idle_driver);
+
+	return;
+}
+
+module_init(pseries_processor_idle_init);
+module_exit(pseries_processor_idle_exit);
+
+MODULE_AUTHOR("Deepthi Dharwar <deepthi@linux.vnet.ibm.com>");
+MODULE_DESCRIPTION("Cpuidle driver for POWER");
+MODULE_LICENSE("GPL");

^ permalink raw reply related

* [PATCH v1 3/6] pseries/cpuidle: Use cpuidle_register() for initialisation.
From: Deepthi Dharwar @ 2014-01-14 10:56 UTC (permalink / raw)
  To: linux-pm, benh, daniel.lezcano, linux-kernel, srivatsa.bhat,
	preeti, svaidy, linuxppc-dev
In-Reply-To: <20140114105525.3064.52013.stgit@deepthi.in.ibm.com>

This patch replaces the cpuidle driver and devices initialisation
calls with a single generic cpuidle_register() call
and also includes minor refactoring of the code around it.

Remove the cpu online check in snooze loop, as this code can
only locally run on a cpu only if it is online. Therefore,
this check is not required.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle-pseries.c |   78 +++++--------------------------------
 1 file changed, 11 insertions(+), 67 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 2115478..32d86bc 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -27,7 +27,6 @@ struct cpuidle_driver pseries_idle_driver = {
 #define MAX_IDLE_STATE_COUNT	2
 
 static int max_idle_state = MAX_IDLE_STATE_COUNT - 1;
-static struct cpuidle_device __percpu *pseries_cpuidle_devices;
 static struct cpuidle_state *cpuidle_state_table;
 
 static inline void idle_loop_prolog(unsigned long *in_purr)
@@ -55,13 +54,12 @@ static int snooze_loop(struct cpuidle_device *dev,
 			int index)
 {
 	unsigned long in_purr;
-	int cpu = dev->cpu;
 
 	idle_loop_prolog(&in_purr);
 	local_irq_enable();
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
-	while ((!need_resched()) && cpu_online(cpu)) {
+	while (!need_resched()) {
 		HMT_low();
 		HMT_very_low();
 	}
@@ -188,7 +186,7 @@ static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
 {
 	int hotcpu = (unsigned long)hcpu;
 	struct cpuidle_device *dev =
-			per_cpu_ptr(pseries_cpuidle_devices, hotcpu);
+				per_cpu(cpuidle_devices, hotcpu);
 
 	if (dev && cpuidle_get_driver()) {
 		switch (action) {
@@ -245,50 +243,6 @@ static int pseries_cpuidle_driver_init(void)
 	return 0;
 }
 
-/* pseries_idle_devices_uninit(void)
- * unregister cpuidle devices and de-allocate memory
- */
-static void pseries_idle_devices_uninit(void)
-{
-	int i;
-	struct cpuidle_device *dev;
-
-	for_each_possible_cpu(i) {
-		dev = per_cpu_ptr(pseries_cpuidle_devices, i);
-		cpuidle_unregister_device(dev);
-	}
-
-	free_percpu(pseries_cpuidle_devices);
-	return;
-}
-
-/* pseries_idle_devices_init()
- * allocate, initialize and register cpuidle device
- */
-static int pseries_idle_devices_init(void)
-{
-	int i;
-	struct cpuidle_driver *drv = &pseries_idle_driver;
-	struct cpuidle_device *dev;
-
-	pseries_cpuidle_devices = alloc_percpu(struct cpuidle_device);
-	if (pseries_cpuidle_devices == NULL)
-		return -ENOMEM;
-
-	for_each_possible_cpu(i) {
-		dev = per_cpu_ptr(pseries_cpuidle_devices, i);
-		dev->state_count = drv->state_count;
-		dev->cpu = i;
-		if (cpuidle_register_device(dev)) {
-			printk(KERN_DEBUG \
-				"cpuidle_register_device %d failed!\n", i);
-			return -EIO;
-		}
-	}
-
-	return 0;
-}
-
 /*
  * pseries_idle_probe()
  * Choose state table for shared versus dedicated partition
@@ -296,9 +250,6 @@ static int pseries_idle_devices_init(void)
 static int pseries_idle_probe(void)
 {
 
-	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
-		return -ENODEV;
-
 	if (cpuidle_disable != IDLE_NO_OVERRIDE)
 		return -ENODEV;
 
@@ -307,10 +258,13 @@ static int pseries_idle_probe(void)
 		return -EPERM;
 	}
 
-	if (lppaca_shared_proc(get_lppaca()))
-		cpuidle_state_table = shared_states;
-	else
-		cpuidle_state_table = dedicated_states;
+	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+		if (lppaca_shared_proc(get_lppaca()))
+			cpuidle_state_table = shared_states;
+		else
+			cpuidle_state_table = dedicated_states;
+	} else
+		return -ENODEV;
 
 	return 0;
 }
@@ -324,22 +278,14 @@ static int __init pseries_processor_idle_init(void)
 		return retval;
 
 	pseries_cpuidle_driver_init();
-	retval = cpuidle_register_driver(&pseries_idle_driver);
+	retval = cpuidle_register(&pseries_idle_driver, NULL);
 	if (retval) {
 		printk(KERN_DEBUG "Registration of pseries driver failed.\n");
 		return retval;
 	}
 
-	retval = pseries_idle_devices_init();
-	if (retval) {
-		pseries_idle_devices_uninit();
-		cpuidle_unregister_driver(&pseries_idle_driver);
-		return retval;
-	}
-
 	register_cpu_notifier(&setup_hotplug_notifier);
 	printk(KERN_DEBUG "pseries_idle_driver registered\n");
-
 	return 0;
 }
 
@@ -347,9 +293,7 @@ static void __exit pseries_processor_idle_exit(void)
 {
 
 	unregister_cpu_notifier(&setup_hotplug_notifier);
-	pseries_idle_devices_uninit();
-	cpuidle_unregister_driver(&pseries_idle_driver);
-
+	cpuidle_unregister(&pseries_idle_driver);
 	return;
 }
 

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox