* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Kirill A. Shutemov @ 2014-01-13 22:30 UTC (permalink / raw)
To: Andrew Morton
Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, linuxppc-dev,
kirill.shutemov
In-Reply-To: <20140113141748.0b851e1573e41bf26de7c0ae@linux-foundation.org>
On Mon, Jan 13, 2014 at 02:17:48PM -0800, Andrew Morton wrote:
> On Thu, 2 Jan 2014 04:19:51 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>
> > On Wed, Jan 01, 2014 at 09:29:05PM +1100, Benjamin Herrenschmidt wrote:
> > > On Wed, 2014-01-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> > > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > > >
> > > > This patch fix the below crash
> > > >
> > > > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> > > > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > ...
> > > > Call Trace:
> > > > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> > > > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> > > >
> > > > On ppc64 we use the pgtable for storing the hpte slot information and
> > > > store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> > > > pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> > > > the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> > > > from new pmd.
> > > >
> > > > We also want to move the withdraw and deposit before the set_pmd so
> > > > that, when page fault find the pmd as trans huge we can be sure that
> > > > pgtable can be located at the offset.
> > > >
>
> Did this get fixed?
New version: http://thread.gmane.org/gmane.linux.kernel.mm/111809
--
Kirill A. Shutemov
^ permalink raw reply
* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Benjamin Herrenschmidt @ 2014-01-14 4:13 UTC (permalink / raw)
To: Andrew Morton
Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, kirill.shutemov
In-Reply-To: <20140113141748.0b851e1573e41bf26de7c0ae@linux-foundation.org>
On Mon, 2014-01-13 at 14:17 -0800, Andrew Morton wrote:
> Did this get fixed?
Any chance you can Ack the patch on that thread ?
http://thread.gmane.org/gmane.linux.kernel.mm/111809
So I can put it in powerpc -next with a CC stable ? Or if you tell me
tat Kirill Ack is sufficient then I'll go for it.
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH mmotm/next] powerpc: fix powernv boot breakage on G5???
From: Benjamin Herrenschmidt @ 2014-01-14 4:17 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Mahesh Salgaonkar, linuxppc-dev
In-Reply-To: <alpine.LSU.2.11.1401120043210.1092@eggly.anvils>
On Sun, 2014-01-12 at 00:46 -0800, Hugh Dickins wrote:
> My PowerMac G5 cannot boot mmotm these days: different symptoms
> (starting /sbin/init failed? or ATA errors and hang?), with unrelated
> bugs adding to the confusion; but a bisection led to b5ff4211a829
> "powerpc/book3s: Queue up and process delayed MCE events". Since that
> series seems to be mostly about powernv, I tried changing BOOK3S_64
> to POWERNV in entry_64.S, which has got it back to working for me.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> just in case this happens to be right, but it's well beyond me!
> ---
Do that help instead ?
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..9820d36 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -187,6 +187,7 @@ syscall_exit:
#ifdef CONFIG_PPC_BOOK3S_64
BEGIN_FTR_SECTION
bl .machine_check_process_queued_event
+ ld r3,RESULT(r1)
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
#endif
CURRENT_THREAD_INFO(r12, r1)
Cheers,
Ben.
>
> arch/powerpc/kernel/entry_64.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- mmotm/arch/powerpc/kernel/entry_64.S 2014-01-10 18:24:56.940448828 -0800
> +++ linux/arch/powerpc/kernel/entry_64.S 2014-01-10 18:29:24.276455182 -0800
> @@ -184,7 +184,7 @@ syscall_exit:
> bl .do_show_syscall_exit
> ld r3,RESULT(r1)
> #endif
> -#ifdef CONFIG_PPC_BOOK3S_64
> +#ifdef CONFIG_PPC_POWERNV
> BEGIN_FTR_SECTION
> bl .machine_check_process_queued_event
> END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
^ permalink raw reply related
* [PATCH] Move precessing of MCE queued event out from syscall exit path.
From: Mahesh J Salgaonkar @ 2014-01-14 4:26 UTC (permalink / raw)
To: linuxppc-dev, Benjamin Herrenschmidt; +Cc: Hugh Dickins
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Huge Dickins reported an issue that b5ff4211a829
"powerpc/book3s: Queue up and process delayed MCE events" breaks the
PowerMac G5 boot. This patch fixes it by moving the mce even processing
away from syscall exit, which was wrong to do that in first place, and
implements a different mechanism to deal with it using a paca flag and
decrementer interrupt to process the event.
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/mce.h | 3 +++
arch/powerpc/include/asm/paca.h | 3 +++
arch/powerpc/kernel/entry_64.S | 5 -----
arch/powerpc/kernel/irq.c | 11 ++++++++++-
arch/powerpc/kernel/mce.c | 7 +++++++
arch/powerpc/kernel/time.c | 9 +++++++++
6 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 2257d1e..225e678 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -186,6 +186,9 @@ struct mce_error_info {
#define MCE_EVENT_RELEASE true
#define MCE_EVENT_DONTRELEASE false
+/* MCE bit flags (paca.mce_flags) */
+#define MCE_EVENT_PENDING 0x0001
+
extern void save_mce_event(struct pt_regs *regs, long handled,
struct mce_error_info *mce_err, uint64_t nip,
uint64_t addr);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index c3523d1..f9aa521 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -141,6 +141,9 @@ struct paca_struct {
u8 io_sync; /* writel() needs spin_unlock sync */
u8 irq_work_pending; /* IRQ_WORK interrupt while soft-disable */
u8 nap_state_lost; /* NV GPR values lost in power7_idle */
+#ifdef CONFIG_PPC_BOOK3S_64
+ u8 mce_flags; /* MCE bit flags. */
+#endif
u64 sprg3; /* Saved user-visible sprg */
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
u64 tm_scratch; /* TM scratch area for reclaim */
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..bbfb029 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -184,11 +184,6 @@ syscall_exit:
bl .do_show_syscall_exit
ld r3,RESULT(r1)
#endif
-#ifdef CONFIG_PPC_BOOK3S_64
-BEGIN_FTR_SECTION
- bl .machine_check_process_queued_event
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-#endif
CURRENT_THREAD_INFO(r12, r1)
ld r8,_MSR(r1)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ba01656..e22f591 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -67,6 +67,7 @@
#include <asm/udbg.h>
#include <asm/smp.h>
#include <asm/debug.h>
+#include <asm/mce.h>
#ifdef CONFIG_PPC64
#include <asm/paca.h>
@@ -158,9 +159,17 @@ notrace unsigned int __check_irq_replay(void)
* We may have missed a decrementer interrupt. We check the
* decrementer itself rather than the paca irq_happened field
* in case we also had a rollover while hard disabled
+ * Also check if any MCE event is queued up that requires
+ * processing. Machine check handler would set paca->mce_flags
+ * and then call set_dec(1) to trigger a decrementer interrupt
+ * from NMI.
*/
local_paca->irq_happened &= ~PACA_IRQ_DEC;
- if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
+ if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow()
+#ifdef CONFIG_PPC_BOOK3S_64
+ || local_paca->mce_flags & MCE_EVENT_PENDING
+#endif
+ )
return 0x900;
/* Finally check if an external interrupt happened */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index d6edf2b..7bab827 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -185,6 +185,13 @@ void machine_check_queue_event(void)
return;
}
__get_cpu_var(mce_event_queue[index]) = evt;
+
+ /*
+ * Set the event pending flag and raise an decrementer interrupt
+ * to process the queued event later.
+ */
+ local_paca->mce_flags |= MCE_EVENT_PENDING;
+ set_dec(1);
}
/*
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3b1441..87ccf92 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -69,6 +69,7 @@
#include <asm/vdso_datapage.h>
#include <asm/firmware.h>
#include <asm/cputime.h>
+#include <asm/mce.h>
/* powerpc clocksource/clockevent code */
@@ -505,6 +506,14 @@ void timer_interrupt(struct pt_regs * regs)
return;
}
+#ifdef CONFIG_PPC_BOOK3S_64
+ /* Check if we have MCE event pending for processing. */
+ if (local_paca->mce_flags & MCE_EVENT_PENDING) {
+ local_paca->mce_flags &= ~MCE_EVENT_PENDING;
+ machine_check_process_queued_event();
+ }
+#endif
+
/* Conditionally hard-enable interrupts now that the DEC has been
* bumped to its maximum value
*/
^ permalink raw reply related
* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Andrew Morton @ 2014-01-14 4:32 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, kirill.shutemov
In-Reply-To: <1389672810.6933.0.camel@pasglop>
On Tue, 14 Jan 2014 15:13:30 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Mon, 2014-01-13 at 14:17 -0800, Andrew Morton wrote:
>
> > Did this get fixed?
>
> Any chance you can Ack the patch on that thread ?
>
> http://thread.gmane.org/gmane.linux.kernel.mm/111809
>
> So I can put it in powerpc -next with a CC stable ? Or if you tell me
> tat Kirill Ack is sufficient then I'll go for it.
yup, it looks OK to me from a non-ppc perspective. Please proceed as
described.
^ permalink raw reply
* [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Preeti U Murthy @ 2014-01-14 6:05 UTC (permalink / raw)
To: deepthi, paulmck, linux-pm, benh, daniel.lezcano, rjw,
linux-kernel, srivatsa.bhat, svaidy, linuxppc-dev,
tuukka.tikkanen
On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
Inspite of this it was observed that the idle state count of the shallowest
idle state, snooze, was increasing.
This is because the governor returns the idle state index as 0 even in
scenarios when no idle state can be chosen. These scenarios could be when the
latency requirement is 0 or as mentioned above when the user wants to disable
certain cpu idle states at runtime. In the latter case, its possible that no
cpu idle state is valid because the suitable states were disabled
and the rest did not match the menu governor criteria to be chosen as the
next idle state.
This patch adds the code to indicate that a valid cpu idle state could not be
chosen by the menu governor and reports back to arch so that it can take some
default action.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
drivers/cpuidle/cpuidle.c | 6 +++++-
| 7 ++++---
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a55e68f..5bf06bb 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
/* ask the governor for the next state */
next_state = cpuidle_curr_governor->select(drv, dev);
+
+ dev->last_residency = 0;
if (need_resched()) {
- dev->last_residency = 0;
/* give the governor an opportunity to reflect on the outcome */
if (cpuidle_curr_governor->reflect)
cpuidle_curr_governor->reflect(dev, next_state);
@@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
return 0;
}
+ if (next_state < 0)
+ return -EINVAL;
+
trace_cpu_idle_rcuidle(next_state, dev->cpu);
broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
--git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..6921543 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -283,6 +283,7 @@ again:
* menu_select - selects the next idle state to enter
* @drv: cpuidle driver containing state data
* @dev: the CPU
+ * Returns -1 when no idle state is suitable
*/
static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
@@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
int multiplier;
struct timespec t;
- if (data->needs_update) {
+ if (data->last_state_idx >= 0 && data->needs_update) {
menu_update(drv, dev);
data->needs_update = 0;
}
- data->last_state_idx = 0;
+ data->last_state_idx = -1;
data->exit_us = 0;
/* Special case when user has set very strict latency requirement */
if (unlikely(latency_req == 0))
- return 0;
+ return data->last_state_idx;
/* determine the expected residency time, round up */
t = ktime_to_timespec(tick_nohz_get_sleep_length());
^ permalink raw reply related
* [PATCH] powerpc: Fix races with irq_work
From: Benjamin Herrenschmidt @ 2014-01-14 6:11 UTC (permalink / raw)
To: linuxppc-dev list
If we set irq_work on a processor and immediately afterward, before the
irq work has a chance to be processed, we change the decrementer value,
we can seriously delay the handling of that irq_work.
Fix it by checking in a few places for pending irq work, first before
changing the decrementer in decrementer_set_next_event() and after
changing it in the same function and in timer_interrupt().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index afb1b56..b3dab20 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -536,6 +536,9 @@ void timer_interrupt(struct pt_regs * regs)
now = *next_tb - now;
if (now <= DECREMENTER_MAX)
set_dec((int)now);
+ /* We may have raced with new irq work */
+ if (test_irq_work_pending())
+ set_dec(1);
__get_cpu_var(irq_stat).timer_irqs_others++;
}
@@ -802,8 +805,16 @@ static void __init clocksource_init(void)
static int decrementer_set_next_event(unsigned long evt,
struct clock_event_device *dev)
{
+ /* Don't adjust the decrementer if some irq work is pending */
+ if (test_irq_work_pending())
+ return 0;
__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
set_dec(evt);
+
+ /* We may have raced with new irq work */
+ if (test_irq_work_pending())
+ set_dec(1);
+
return 0;
}
^ permalink raw reply related
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Deepthi Dharwar @ 2014-01-14 6:16 UTC (permalink / raw)
To: Preeti U Murthy
Cc: linux-pm, daniel.lezcano, rjw, linux-kernel, srivatsa.bhat,
paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140114060516.6109.14901.stgit@preeti.in.ibm.com>
On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
> Inspite of this it was observed that the idle state count of the shallowest
> idle state, snooze, was increasing.
>
> This is because the governor returns the idle state index as 0 even in
> scenarios when no idle state can be chosen. These scenarios could be when the
> latency requirement is 0 or as mentioned above when the user wants to disable
> certain cpu idle states at runtime. In the latter case, its possible that no
> cpu idle state is valid because the suitable states were disabled
> and the rest did not match the menu governor criteria to be chosen as the
> next idle state.
>
> This patch adds the code to indicate that a valid cpu idle state could not be
> chosen by the menu governor and reports back to arch so that it can take some
> default action.
>
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
>
> drivers/cpuidle/cpuidle.c | 6 +++++-
> drivers/cpuidle/governors/menu.c | 7 ++++---
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..5bf06bb 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>
> /* ask the governor for the next state */
> next_state = cpuidle_curr_governor->select(drv, dev);
> +
> + dev->last_residency = 0;
> if (need_resched()) {
> - dev->last_residency = 0;
> /* give the governor an opportunity to reflect on the outcome */
> if (cpuidle_curr_governor->reflect)
> cpuidle_curr_governor->reflect(dev, next_state);
> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
> return 0;
> }
>
> + if (next_state < 0)
> + return -EINVAL;
> +
> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>
> broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..6921543 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -283,6 +283,7 @@ again:
> * menu_select - selects the next idle state to enter
> * @drv: cpuidle driver containing state data
> * @dev: the CPU
> + * Returns -1 when no idle state is suitable
> */
> static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> {
> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> int multiplier;
> struct timespec t;
>
> - if (data->needs_update) {
> + if (data->last_state_idx >= 0 && data->needs_update) {
> menu_update(drv, dev);
> data->needs_update = 0;
> }
>
> - data->last_state_idx = 0;
> + data->last_state_idx = -1;
> data->exit_us = 0;
>
> /* Special case when user has set very strict latency requirement */
> if (unlikely(latency_req == 0))
> - return 0;
> + return data->last_state_idx;
>
> /* determine the expected residency time, round up */
> t = ktime_to_timespec(tick_nohz_get_sleep_length());
>
^ permalink raw reply
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Srivatsa S. Bhat @ 2014-01-14 7:00 UTC (permalink / raw)
To: Preeti U Murthy
Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140114060516.6109.14901.stgit@preeti.in.ibm.com>
On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
> Inspite of this it was observed that the idle state count of the shallowest
> idle state, snooze, was increasing.
>
> This is because the governor returns the idle state index as 0 even in
> scenarios when no idle state can be chosen. These scenarios could be when the
> latency requirement is 0 or as mentioned above when the user wants to disable
> certain cpu idle states at runtime. In the latter case, its possible that no
> cpu idle state is valid because the suitable states were disabled
> and the rest did not match the menu governor criteria to be chosen as the
> next idle state.
>
> This patch adds the code to indicate that a valid cpu idle state could not be
> chosen by the menu governor and reports back to arch so that it can take some
> default action.
>
That sounds fair enough. However, the "default" action of pseries idle loop
(pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
to 0 hoping to prevent the CPUs from going to deep idle states, but then the
machine would still end up going to Cede, even though that wont get reflected
in the idle state counts. IMHO that scenario needs some thought as well...
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---
>
> drivers/cpuidle/cpuidle.c | 6 +++++-
> drivers/cpuidle/governors/menu.c | 7 ++++---
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..5bf06bb 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>
> /* ask the governor for the next state */
> next_state = cpuidle_curr_governor->select(drv, dev);
> +
> + dev->last_residency = 0;
> if (need_resched()) {
> - dev->last_residency = 0;
> /* give the governor an opportunity to reflect on the outcome */
> if (cpuidle_curr_governor->reflect)
> cpuidle_curr_governor->reflect(dev, next_state);
The comments on top of the .reflect() routines of the governors say that the
second parameter is the index of the actual state entered. But after this patch,
next_state can be negative, indicating an invalid index. So those comments need
to be updated accordingly.
> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
> return 0;
> }
>
> + if (next_state < 0)
> + return -EINVAL;
The exit path above (due to need_resched) returns with irqs enabled, but the new
one you are adding (next_state < 0) returns with irqs disabled. This is correct,
because in the latter case, "idle" is still in progress and the arch will choose
a default handler to execute (unlike the former case where "idle" is over and
hence its time to enable interrupts).
IMHO it would be good to add comments around this code to explain this subtle
difference. We can never be too careful with these things... ;-)
> +
> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>
> broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..6921543 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -283,6 +283,7 @@ again:
> * menu_select - selects the next idle state to enter
> * @drv: cpuidle driver containing state data
> * @dev: the CPU
> + * Returns -1 when no idle state is suitable
> */
> static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> {
> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> int multiplier;
> struct timespec t;
>
> - if (data->needs_update) {
> + if (data->last_state_idx >= 0 && data->needs_update) {
^^^^^
Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
only when index >= 0.
> menu_update(drv, dev);
> data->needs_update = 0;
> }
>
> - data->last_state_idx = 0;
> + data->last_state_idx = -1;
> data->exit_us = 0;
>
> /* Special case when user has set very strict latency requirement */
> if (unlikely(latency_req == 0))
> - return 0;
> + return data->last_state_idx;
>
> /* determine the expected residency time, round up */
> t = ktime_to_timespec(tick_nohz_get_sleep_length());
>
What about the ladder governor? I know its not used that much in practice,
but I think it would be good to update that as well, just to keep it
consistent.
Regards,
Srivatsa S. Bhat
^ permalink raw reply
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Srivatsa S. Bhat @ 2014-01-14 7:37 UTC (permalink / raw)
To: Preeti U Murthy
Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>
On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> Inspite of this it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, its possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
>
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that wont get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...
>
I checked the git history and found that the default idle was changed (on purpose)
to cede the processor, in order to speed up booting.. Hmm..
commit 363edbe2614aa90df706c0f19ccfa2a6c06af0be
Author: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Date: Fri Sep 6 00:25:06 2013 +0530
powerpc: Default arch idle could cede processor on pseries
Regards,
Srivatsa S. Bhat
^ permalink raw reply
* Re: [PATCH] Move precessing of MCE queued event out from syscall exit path.
From: Hugh Dickins @ 2014-01-14 7:47 UTC (permalink / raw)
To: Mahesh J Salgaonkar, Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20140114042611.13145.6551.stgit@mars.in.ibm.com>
On Tue, 14 Jan 2014, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> Huge Dickins reported an issue that b5ff4211a829
> "powerpc/book3s: Queue up and process delayed MCE events" breaks the
> PowerMac G5 boot. This patch fixes it by moving the mce even processing
> away from syscall exit, which was wrong to do that in first place, and
> implements a different mechanism to deal with it using a paca flag and
> decrementer interrupt to process the event.
>
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Good work guys: I can happily report that both this rework,
and Ben's one-liner, fix the issue for me on the G5: thank you.
(Irrelevant not-so-happy detail: I very nearly mailed you an hour or
so earlier to report that neither fixed it; but retried my original
CONFIG_PPC_POWERNV hack after, and found that now equally useless.
I did write of changing behaviour and ATA errors: it now appears that's
an independent but intermittent issue on the G5 in 3.13-rc7-mm1, which
coincidentally happened to trigger when I tested rc7-mm1 without fixes,
but not when I tested with my hack, until today.
I've gone back to testing on rc6-mm1, the previous week's mmotm,
which showed failure to run /sbin/init: rc6-mm1 has no trouble mounting
root, and it runs properly with your new patch, and with Ben's patch.
And I may be quite wrong to point a finger at ATA errors: perhaps
they're always shown, and quickly cleared off screen in successful boots,
but left visible when root cannot be mounted for some other reason.
I don't know, and won't have time to investigate further - bisecting
intermittents is not much fun! I'll just have to hope that it's
sorted out before it reaches 3.14-rc, or else bite the bullet and
investigate on that.)
Hugh
> ---
> arch/powerpc/include/asm/mce.h | 3 +++
> arch/powerpc/include/asm/paca.h | 3 +++
> arch/powerpc/kernel/entry_64.S | 5 -----
> arch/powerpc/kernel/irq.c | 11 ++++++++++-
> arch/powerpc/kernel/mce.c | 7 +++++++
> arch/powerpc/kernel/time.c | 9 +++++++++
> 6 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
> index 2257d1e..225e678 100644
> --- a/arch/powerpc/include/asm/mce.h
> +++ b/arch/powerpc/include/asm/mce.h
> @@ -186,6 +186,9 @@ struct mce_error_info {
> #define MCE_EVENT_RELEASE true
> #define MCE_EVENT_DONTRELEASE false
>
> +/* MCE bit flags (paca.mce_flags) */
> +#define MCE_EVENT_PENDING 0x0001
> +
> extern void save_mce_event(struct pt_regs *regs, long handled,
> struct mce_error_info *mce_err, uint64_t nip,
> uint64_t addr);
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index c3523d1..f9aa521 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -141,6 +141,9 @@ struct paca_struct {
> u8 io_sync; /* writel() needs spin_unlock sync */
> u8 irq_work_pending; /* IRQ_WORK interrupt while soft-disable */
> u8 nap_state_lost; /* NV GPR values lost in power7_idle */
> +#ifdef CONFIG_PPC_BOOK3S_64
> + u8 mce_flags; /* MCE bit flags. */
> +#endif
> u64 sprg3; /* Saved user-visible sprg */
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> u64 tm_scratch; /* TM scratch area for reclaim */
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 770d6d6..bbfb029 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -184,11 +184,6 @@ syscall_exit:
> bl .do_show_syscall_exit
> ld r3,RESULT(r1)
> #endif
> -#ifdef CONFIG_PPC_BOOK3S_64
> -BEGIN_FTR_SECTION
> - bl .machine_check_process_queued_event
> -END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
> -#endif
> CURRENT_THREAD_INFO(r12, r1)
>
> ld r8,_MSR(r1)
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index ba01656..e22f591 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -67,6 +67,7 @@
> #include <asm/udbg.h>
> #include <asm/smp.h>
> #include <asm/debug.h>
> +#include <asm/mce.h>
>
> #ifdef CONFIG_PPC64
> #include <asm/paca.h>
> @@ -158,9 +159,17 @@ notrace unsigned int __check_irq_replay(void)
> * We may have missed a decrementer interrupt. We check the
> * decrementer itself rather than the paca irq_happened field
> * in case we also had a rollover while hard disabled
> + * Also check if any MCE event is queued up that requires
> + * processing. Machine check handler would set paca->mce_flags
> + * and then call set_dec(1) to trigger a decrementer interrupt
> + * from NMI.
> */
> local_paca->irq_happened &= ~PACA_IRQ_DEC;
> - if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
> + if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow()
> +#ifdef CONFIG_PPC_BOOK3S_64
> + || local_paca->mce_flags & MCE_EVENT_PENDING
> +#endif
> + )
> return 0x900;
>
> /* Finally check if an external interrupt happened */
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index d6edf2b..7bab827 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -185,6 +185,13 @@ void machine_check_queue_event(void)
> return;
> }
> __get_cpu_var(mce_event_queue[index]) = evt;
> +
> + /*
> + * Set the event pending flag and raise an decrementer interrupt
> + * to process the queued event later.
> + */
> + local_paca->mce_flags |= MCE_EVENT_PENDING;
> + set_dec(1);
> }
>
> /*
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index b3b1441..87ccf92 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -69,6 +69,7 @@
> #include <asm/vdso_datapage.h>
> #include <asm/firmware.h>
> #include <asm/cputime.h>
> +#include <asm/mce.h>
>
> /* powerpc clocksource/clockevent code */
>
> @@ -505,6 +506,14 @@ void timer_interrupt(struct pt_regs * regs)
> return;
> }
>
> +#ifdef CONFIG_PPC_BOOK3S_64
> + /* Check if we have MCE event pending for processing. */
> + if (local_paca->mce_flags & MCE_EVENT_PENDING) {
> + local_paca->mce_flags &= ~MCE_EVENT_PENDING;
> + machine_check_process_queued_event();
> + }
> +#endif
> +
> /* Conditionally hard-enable interrupts now that the DEC has been
> * bumped to its maximum value
> */
^ permalink raw reply
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Deepthi Dharwar @ 2014-01-14 8:00 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: linux-pm, daniel.lezcano, rjw, linux-kernel, Preeti U Murthy,
paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>
On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> Inspite of this it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, its possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
>
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that wont get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...
It was the snooze loop earlier but later we changed it to cede in commit
363edbe2614 powerpc: Default arch idle will cede the processor on
pseries to address the following regressions:
>>snippet from the patch.
When adding cpuidle support to pSeries, we introduced two
regressions:
- The new cpuidle backend driver only works under hypervisors
supporting the "SLPLAR" option, which isn't the case of the
old POWER4 hypervisor and the HV "light" used on js2x blades
- The cpuidle driver registers fairly late, meaning that for
a significant portion of the boot process, we end up having
all threads spinning. This slows down the boot process and
increases the overall resource usage if the hypervisor has
shared processors.
This fixes both by implementing a "default" idle that will cede
to the hypervisor when possible, in a very simple way without
all the bells and whisles of cpuidle.
Regards,
Deepthi
>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> ---
>>
>> drivers/cpuidle/cpuidle.c | 6 +++++-
>> drivers/cpuidle/governors/menu.c | 7 ++++---
>> 2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..5bf06bb 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>> /* ask the governor for the next state */
>> next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> + dev->last_residency = 0;
>> if (need_resched()) {
>> - dev->last_residency = 0;
>> /* give the governor an opportunity to reflect on the outcome */
>> if (cpuidle_curr_governor->reflect)
>> cpuidle_curr_governor->reflect(dev, next_state);
>
> The comments on top of the .reflect() routines of the governors say that the
> second parameter is the index of the actual state entered. But after this patch,
> next_state can be negative, indicating an invalid index. So those comments need
> to be updated accordingly.
>
>> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>> return 0;
>> }
>>
>> + if (next_state < 0)
>> + return -EINVAL;
>
> The exit path above (due to need_resched) returns with irqs enabled, but the new
> one you are adding (next_state < 0) returns with irqs disabled. This is correct,
> because in the latter case, "idle" is still in progress and the arch will choose
> a default handler to execute (unlike the former case where "idle" is over and
> hence its time to enable interrupts).
>
> IMHO it would be good to add comments around this code to explain this subtle
> difference. We can never be too careful with these things... ;-)
>
>> +
>> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>
>> broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..6921543 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -283,6 +283,7 @@ again:
>> * menu_select - selects the next idle state to enter
>> * @drv: cpuidle driver containing state data
>> * @dev: the CPU
>> + * Returns -1 when no idle state is suitable
>> */
>> static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> {
>> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> int multiplier;
>> struct timespec t;
>>
>> - if (data->needs_update) {
>> + if (data->last_state_idx >= 0 && data->needs_update) {
> ^^^^^
> Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
> only when index >= 0.
>
>> menu_update(drv, dev);
>> data->needs_update = 0;
>> }
>>
>> - data->last_state_idx = 0;
>> + data->last_state_idx = -1;
>> data->exit_us = 0;
>>
>> /* Special case when user has set very strict latency requirement */
>> if (unlikely(latency_req == 0))
>> - return 0;
>> + return data->last_state_idx;
>>
>> /* determine the expected residency time, round up */
>> t = ktime_to_timespec(tick_nohz_get_sleep_length());
>>
>
> What about the ladder governor? I know its not used that much in practice,
> but I think it would be good to update that as well, just to keep it
> consistent.
>
> Regards,
> Srivatsa S. Bhat
>
^ permalink raw reply
* [PATCH 1/3] powerpc/fsl: add E500MC and E5500 PVR define
From: Dongsheng Wang @ 2014-01-14 7:59 UTC (permalink / raw)
To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
From: Wang Dongsheng <dongsheng.wang@freescale.com>
E500MC and E5500 PVR will be used in subsequent save/restore core
state patches.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 62b114e..cd7b630 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1075,6 +1075,8 @@
#define PVR_8560 0x80200000
#define PVR_VER_E500V1 0x8020
#define PVR_VER_E500V2 0x8021
+#define PVR_VER_E500MC 0x8023
+#define PVR_VER_E5500 0x8024
#define PVR_VER_E6500 0x8040
/*
--
1.8.5
^ permalink raw reply related
* [PATCH 2/3] powerpc/85xx: Provide two functions to save/restore the core registers
From: Dongsheng Wang @ 2014-01-14 7:59 UTC (permalink / raw)
To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
In-Reply-To: <1389686397-46555-1-git-send-email-dongsheng.wang@freescale.com>
From: Wang Dongsheng <dongsheng.wang@freescale.com>
Add fsl_cpu_state_save/fsl_cpu_state_restore functions, used for deep
sleep and hibernation to save/restore core registers. We abstract out
save/restore code for use in various modules, to make them don't need
to maintain.
Currently supported processors type are E6500, E5500, E500MC, E500v2 and
E500v1.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
diff --git a/arch/powerpc/include/asm/fsl_sleep.h b/arch/powerpc/include/asm/fsl_sleep.h
new file mode 100644
index 0000000..31c8a9b
--- /dev/null
+++ b/arch/powerpc/include/asm/fsl_sleep.h
@@ -0,0 +1,98 @@
+/*
+ * Freescale 85xx Power management set
+ *
+ * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_FSL_SLEEP_H
+#define __ASM_FSL_SLEEP_H
+
+/*
+ * Freescale 85xx Core registers set, core register map definition
+ * Address base on r3, we need to compatible with both 32-bit and 64-bit, so
+ * the data width is 64-bit(double word).
+ *
+ * Acronyms:
+ * dw(data width) 0x08
+ *
+ * Map:
+ * General-Purpose Registers
+ * GPR1(sp) 0
+ * GPR2 0x8 (dw * 1)
+ * GPR13 - GPR31 0x10 ~ 0xa0 (dw * 2 ~ dw * 20)
+ * Foating-point registers
+ * FPR14 - FPR31 0xa8 ~ 0x130 (dw * 21 ~ dw * 38)
+ * Registers for Branch Operations
+ * CR 0x138 (dw * 39)
+ * LR 0x140 (dw * 40)
+ * Processor Control Registers
+ * MSR 0x148 (dw * 41)
+ * EPCR 0x150 (dw * 42)
+ *
+ * Only e500, e500v2 need to save HID0 - HID1
+ * HID0 - HID1 0x158 ~ 0x160 (dw * 43 ~ dw * 44)
+ * Timer Registers
+ * TCR 0x168 (dw * 45)
+ * TB(64bit) 0x170 (dw * 46)
+ * TBU(32bit) 0x178 (dw * 47)
+ * TBL(32bit) 0x180 (dw * 48)
+ * Interrupt Registers
+ * IVPR 0x188 (dw * 49)
+ * IVOR0 - IVOR15 0x190 ~ 0x208 (dw * 50 ~ dw * 65)
+ * IVOR32 - IVOR41 0x210 ~ 0x258 (dw * 66 ~ dw * 75)
+ * Software-Use Registers
+ * SPRG1 0x260 (dw * 76), 64-bit need to save.
+ * SPRG3 0x268 (dw * 77), 32-bit need to save.
+ * MMU Registers
+ * PID0 - PID2 0x270 ~ 0x280 (dw * 78 ~ dw * 80)
+ * Debug Registers
+ * DBCR0 - DBCR2 0x288 ~ 0x298 (dw * 81 ~ dw * 83)
+ * IAC1 - IAC4 0x2a0 ~ 0x2b8 (dw * 84 ~ dw * 87)
+ * DAC1 - DAC2 0x2c0 ~ 0x2c8 (dw * 88 ~ dw * 89)
+ *
+ */
+
+#define SR_GPR1 0x000
+#define SR_GPR2 0x008
+#define SR_GPR13 0x010
+#define SR_FPR14 0x0a8
+#define SR_CR 0x138
+#define SR_LR 0x140
+#define SR_MSR 0x148
+#define SR_EPCR 0x150
+#define SR_HID0 0x158
+#define SR_TCR 0x168
+#define SR_TB 0x170
+#define SR_TBU 0x178
+#define SR_TBL 0x180
+#define SR_IVPR 0x188
+#define SR_IVOR0 0x190
+#define SR_IVOR32 0x210
+#define SR_SPRG1 0x260
+#define SR_SPRG3 0x268
+#define SR_PID0 0x270
+#define SR_DBCR0 0x288
+#define SR_IAC1 0x2a0
+#define SR_DAC1 0x2c0
+#define FSL_CPU_SR_SIZE (SR_DAC1 + 0x10)
+
+#ifndef __ASSEMBLY__
+
+enum core_save_type {
+ BASE_SAVE = 0,
+ ALL_SAVE = 1,
+};
+
+extern int fsl_cpu_state_save(void *save_page, enum core_save_type type);
+extern int fsl_cpu_state_restore(void *restore_page, enum core_save_type type);
+
+#endif
+
+#endif
+
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 25cebe7..650a01c 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -4,6 +4,7 @@
obj-$(CONFIG_SMP) += smp.o
obj-y += common.o
+obj-y += save-core.o
obj-$(CONFIG_BSC9131_RDB) += bsc913x_rdb.o
obj-$(CONFIG_C293_PCIE) += c293pcie.o
diff --git a/arch/powerpc/platforms/85xx/save-core.S b/arch/powerpc/platforms/85xx/save-core.S
new file mode 100644
index 0000000..a6b93b8
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/save-core.S
@@ -0,0 +1,497 @@
+/*
+ * Freescale Power Management, Save/Restore core state
+ *
+ * Copyright 2014 Freescale Semiconductor, Inc.
+ * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/fsl_sleep.h>
+
+/*
+ * Freescale 85xx Cores
+ * Support Core List:
+ * E500v1, E500v2, E500MC, E5500, E6500.
+ */
+
+ /*
+ * Save/Restore register operation define
+ */
+#define LOAD_SAVE_ADDRESS \
+ mr r10, r3
+
+#ifdef CONFIG_PPC64
+#define PPC_STD(sreg, offset, areg) \
+ std sreg, (offset)(areg)
+#define PPC_LD(lreg, offset, areg) \
+ ld lreg, (offset)(areg)
+
+#define PPC_STFD(sreg, offset, areg) \
+ stfd sreg, (offset)(areg)
+#define PPC_LFD(lreg, offset, areg) \
+ lfd lreg, (offset)(areg)
+#else
+#define PPC_STD(sreg, offset, areg) \
+ stw sreg, (offset)(areg)
+#define PPC_LD(lreg, offset, areg) \
+ lwz lreg, (offset)(areg)
+
+#define PPC_STFD(sreg, offset, areg) \
+ stfs sreg, (offset)(areg)
+#define PPC_LFD(lreg, offset, areg) \
+ lfs lreg, (offset)(areg)
+#endif
+
+#define do_save_gpr_reg(reg, addr) \
+ mr r0, reg ;\
+ PPC_STD(r0, addr, r10)
+
+#define do_restore_gpr_reg(reg, addr) \
+ PPC_LD(r0, addr, r10) ;\
+ mr reg, r0
+
+#define do_save_fpr_reg(reg, addr) \
+ fmr fr0, reg ;\
+ PPC_STFD(fr0, addr, r10)
+
+#define do_restore_fpr_reg(reg, addr) \
+ PPC_LFD(fr0, addr, r10) ;\
+ fmr reg, fr0
+
+#define do_save_spr_reg(reg, addr) \
+ mfspr r0, SPRN_##reg ;\
+ PPC_STD(r0, addr, r10)
+
+#define do_restore_spr_reg(reg, addr) \
+ PPC_LD(r0, addr, r10) ;\
+ mtspr SPRN_##reg, r0
+
+#define do_save_special_reg(special, addr) \
+ mf##special r0 ;\
+ PPC_STD(r0, addr, r10)
+#define do_restore_special_reg(special, addr) \
+ PPC_LD(r0, addr, r10) ;\
+ mt##special r0
+
+#define do_sr_general_gpr_regs(func) \
+ do_##func##_gpr_reg(r1, SR_GPR1) ;\
+ do_##func##_gpr_reg(r2, SR_GPR2) ;\
+ do_##func##_gpr_reg(r13, SR_GPR13 + 0x00) ;\
+ do_##func##_gpr_reg(r14, SR_GPR13 + 0x08) ;\
+ do_##func##_gpr_reg(r15, SR_GPR13 + 0x10) ;\
+ do_##func##_gpr_reg(r16, SR_GPR13 + 0x18) ;\
+ do_##func##_gpr_reg(r17, SR_GPR13 + 0x20) ;\
+ do_##func##_gpr_reg(r18, SR_GPR13 + 0x28) ;\
+ do_##func##_gpr_reg(r19, SR_GPR13 + 0x30) ;\
+ do_##func##_gpr_reg(r20, SR_GPR13 + 0x38) ;\
+ do_##func##_gpr_reg(r21, SR_GPR13 + 0x40) ;\
+ do_##func##_gpr_reg(r22, SR_GPR13 + 0x48) ;\
+ do_##func##_gpr_reg(r23, SR_GPR13 + 0x50) ;\
+ do_##func##_gpr_reg(r24, SR_GPR13 + 0x58) ;\
+ do_##func##_gpr_reg(r25, SR_GPR13 + 0x60) ;\
+ do_##func##_gpr_reg(r26, SR_GPR13 + 0x68) ;\
+ do_##func##_gpr_reg(r27, SR_GPR13 + 0x70) ;\
+ do_##func##_gpr_reg(r28, SR_GPR13 + 0x78) ;\
+ do_##func##_gpr_reg(r29, SR_GPR13 + 0x80) ;\
+ do_##func##_gpr_reg(r30, SR_GPR13 + 0x88) ;\
+ do_##func##_gpr_reg(r31, SR_GPR13 + 0x90)
+
+#define do_sr_fpr_regs(func) \
+ do_##func##_fpr_reg(fr14, SR_FPR14 + 0x00) ;\
+ do_##func##_fpr_reg(fr15, SR_FPR14 + 0x08) ;\
+ do_##func##_fpr_reg(fr16, SR_FPR14 + 0x10) ;\
+ do_##func##_fpr_reg(fr17, SR_FPR14 + 0x18) ;\
+ do_##func##_fpr_reg(fr18, SR_FPR14 + 0x20) ;\
+ do_##func##_fpr_reg(fr19, SR_FPR14 + 0x28) ;\
+ do_##func##_fpr_reg(fr20, SR_FPR14 + 0x30) ;\
+ do_##func##_fpr_reg(fr21, SR_FPR14 + 0x38) ;\
+ do_##func##_fpr_reg(fr22, SR_FPR14 + 0x40) ;\
+ do_##func##_fpr_reg(fr23, SR_FPR14 + 0x48) ;\
+ do_##func##_fpr_reg(fr24, SR_FPR14 + 0x50) ;\
+ do_##func##_fpr_reg(fr25, SR_FPR14 + 0x58) ;\
+ do_##func##_fpr_reg(fr26, SR_FPR14 + 0x60) ;\
+ do_##func##_fpr_reg(fr27, SR_FPR14 + 0x68) ;\
+ do_##func##_fpr_reg(fr28, SR_FPR14 + 0x70) ;\
+ do_##func##_fpr_reg(fr29, SR_FPR14 + 0x78) ;\
+ do_##func##_fpr_reg(fr30, SR_FPR14 + 0x80) ;\
+ do_##func##_fpr_reg(fr31, SR_FPR14 + 0x88)
+
+#define do_sr_general_branch_regs(func) \
+ do_##func##_special_reg(CR, SR_CR)
+
+#define do_sr_general_pcr_regs(func) \
+ do_##func##_special_reg(MSR, SR_MSR) ;\
+ do_##func##_spr_reg(EPCR, SR_EPCR) ;\
+ do_##func##_spr_reg(HID0, SR_HID0 + 0x00)
+
+#define do_sr_e500_pcr_regs(func) \
+ do_##func##_spr_reg(HID1, SR_HID0 + 0x08)
+
+#define do_sr_save_tb_regs \
+ do_save_spr_reg(TBRU, SR_TBU) ;\
+ do_save_spr_reg(TBRL, SR_TBL)
+
+#define do_sr_restore_tb_regs \
+ do_restore_spr_reg(TBWU, SR_TBU) ;\
+ do_restore_spr_reg(TBWL, SR_TBL)
+
+#define do_sr_general_time_regs(func) \
+ do_sr_##func##_tb_regs ;\
+ do_##func##_spr_reg(TCR, SR_TCR)
+
+#define do_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVPR, SR_IVPR) ;\
+ do_##func##_spr_reg(IVOR0, SR_IVOR0 + 0x00) ;\
+ do_##func##_spr_reg(IVOR1, SR_IVOR0 + 0x08) ;\
+ do_##func##_spr_reg(IVOR2, SR_IVOR0 + 0x10) ;\
+ do_##func##_spr_reg(IVOR3, SR_IVOR0 + 0x18) ;\
+ do_##func##_spr_reg(IVOR4, SR_IVOR0 + 0x20) ;\
+ do_##func##_spr_reg(IVOR5, SR_IVOR0 + 0x28) ;\
+ do_##func##_spr_reg(IVOR6, SR_IVOR0 + 0x30) ;\
+ do_##func##_spr_reg(IVOR7, SR_IVOR0 + 0x38) ;\
+ do_##func##_spr_reg(IVOR8, SR_IVOR0 + 0x40) ;\
+ do_##func##_spr_reg(IVOR10, SR_IVOR0 + 0x50) ;\
+ do_##func##_spr_reg(IVOR11, SR_IVOR0 + 0x58) ;\
+ do_##func##_spr_reg(IVOR12, SR_IVOR0 + 0x60) ;\
+ do_##func##_spr_reg(IVOR13, SR_IVOR0 + 0x68) ;\
+ do_##func##_spr_reg(IVOR14, SR_IVOR0 + 0x70) ;\
+ do_##func##_spr_reg(IVOR15, SR_IVOR0 + 0x78)
+
+#define do_e6500_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+ do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
+ do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
+ do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+ do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+ do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+ do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+ do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+ do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+ do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_e5500_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+ do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+ do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+ do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+ do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+ do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+ do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+ do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_e500_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
+ do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
+ do_##func##_spr_reg(IVOR34, SR_IVOR32 + 0x10)
+
+#define do_e500mc_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+ do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+ do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+ do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+ do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+ do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+ do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+ do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_sr_general_software_regs(func) \
+ do_##func##_spr_reg(SPRG1, SR_SPRG1) ;\
+ do_##func##_spr_reg(SPRG3, SR_SPRG3)
+
+#define do_sr_general_mmu_regs(func) \
+ do_##func##_spr_reg(PID0, SR_PID0 + 0x00)
+
+#define do_sr_e500_mmu_regs(func) \
+ do_##func##_spr_reg(PID1, SR_PID0 + 0x08) ;\
+ do_##func##_spr_reg(PID2, SR_PID0 + 0x10)
+
+#define do_sr_debug_regs(func) \
+ do_##func##_spr_reg(DBCR0, SR_DBCR0 + 0x00) ;\
+ do_##func##_spr_reg(DBCR1, SR_DBCR0 + 0x08) ;\
+ do_##func##_spr_reg(DBCR2, SR_DBCR0 + 0x10) ;\
+ do_##func##_spr_reg(IAC1, SR_IAC1 + 0x00) ;\
+ do_##func##_spr_reg(IAC2, SR_IAC1 + 0x08) ;\
+ do_##func##_spr_reg(DAC1, SR_DAC1 + 0x00) ;\
+ do_##func##_spr_reg(DAC2, SR_DAC1 + 0x08)
+
+#define do_e6500_sr_debug_regs(func) \
+ do_##func##_spr_reg(IAC3, SR_IAC1 + 0x10) ;\
+ do_##func##_spr_reg(IAC4, SR_IAC1 + 0x18)
+
+/*
+ * Freescale 85xx Cores, Save/Restore core registers.
+ */
+_GLOBAL(core_registers_save_area)
+ .space FSL_CPU_SR_SIZE
+
+ .section .text
+ .align 5
+_GLOBAL(fsl_cpu_base_save)
+ do_sr_general_gpr_regs(save)
+ do_sr_general_branch_regs(save)
+ do_sr_general_pcr_regs(save)
+ do_sr_general_software_regs(save)
+ do_sr_general_mmu_regs(save)
+
+ /*
+ * Need to save float-point registers if MSR[FP] = 1.
+ */
+ mfmsr r12
+ andi. r12, r12, MSR_FP
+ beq 1f
+ do_sr_fpr_regs(save)
+
+1:
+ mfspr r5, SPRN_TBRU
+ do_sr_general_time_regs(save)
+ mfspr r6, SPRN_TBRU
+ cmpw r5, r6
+ bne 1b
+
+ blr
+
+_GLOBAL(fsl_cpu_base_restore)
+ do_sr_general_gpr_regs(restore)
+ do_sr_general_branch_regs(restore)
+ do_sr_general_pcr_regs(restore)
+ do_sr_general_software_regs(restore)
+ do_sr_general_mmu_regs(restore)
+
+ isync
+
+ /*
+ * Need to restore float-point registers if MSR[FP] = 1.
+ */
+ mfmsr r12
+ andi. r12, r12, MSR_FP
+ beq 1f
+ do_sr_fpr_regs(restore)
+
+1:
+ /* Restore Time registers */
+ /* clear tb lower to avoid wrap */
+ li r0, 0
+ mtspr SPRN_TBWL, r0
+ do_sr_general_time_regs(restore)
+
+ lis r0, (TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS)@h
+ mtspr SPRN_TSR, r0
+
+ /* Kick decrementer */
+ li r0, 1
+ mtdec r0
+
+ blr
+
+/* Base registers, e500v1, e500v2 need to do some special save/restore */
+_GLOBAL(e500_base_special_save)
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V1@l
+ cmpw r11, r12
+ beq 500f
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V2@l
+ cmpw r11, r12
+ bne 1f
+
+500:
+ do_sr_e500_pcr_regs(save)
+ do_sr_e500_mmu_regs(save)
+1:
+ blr
+
+_GLOBAL(e500_base_special_restore)
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V1@l
+ cmpw r11, r12
+ beq 500f
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V2@l
+ cmpw r11, r12
+ bne 1f
+
+500:
+ do_sr_e500_pcr_regs(save)
+ do_sr_e500_mmu_regs(save)
+1:
+ blr
+
+_GLOBAL(fsl_cpu_append_save)
+ mfspr r0, SPRN_PVR
+ rlwinm r11, r0, 16, 16, 31
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E6500@l
+ cmpw r11, r12
+ beq e6500_append_save
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E5500@l
+ cmpw r11, r12
+ beq e5500_append_save
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500MC@l
+ cmpw r11, r12
+ beq e500mc_append_save
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V2@l
+ cmpw r11, r12
+ beq e500v2_append_save
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V1@l
+ cmpw r11, r12
+ beq e500v1_append_save
+
+ b 1f
+
+e6500_append_save:
+ do_e6500_sr_interrupt_regs(save)
+ do_e6500_sr_debug_regs(save)
+ b 1f
+
+e5500_append_save:
+ do_e5500_sr_interrupt_regs(save)
+ b 1f
+
+e500mc_append_save:
+ do_e500mc_sr_interrupt_regs(save)
+ b 1f
+
+e500v2_append_save:
+e500v1_append_save:
+ do_e500_sr_interrupt_regs(save)
+
+1:
+ do_sr_interrupt_regs(save)
+ do_sr_debug_regs(save)
+
+ blr
+
+_GLOBAL(fsl_cpu_append_restore)
+ mfspr r0, SPRN_PVR
+ rlwinm r11, r0, 16, 16, 31
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E6500@l
+ cmpw r11, r12
+ beq e6500_append_restore
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E5500@l
+ cmpw r11, r12
+ beq e5500_append_restore
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500MC@l
+ cmpw r11, r12
+ beq e500mc_append_restore
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V2@l
+ cmpw r11, r12
+ beq e500v2_append_restore
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V1@l
+ cmpw r11, r12
+ beq e500v1_append_restore
+
+ b 1f
+
+e6500_append_restore:
+ do_e6500_sr_interrupt_regs(restore)
+ do_e6500_sr_debug_regs(restore)
+ b 1f
+
+e5500_append_restore:
+ do_e5500_sr_interrupt_regs(restore)
+ b 1f
+
+e500mc_append_restore:
+ do_e500mc_sr_interrupt_regs(restore)
+ b 1f
+
+e500v2_append_restore:
+e500v1_append_restore:
+ do_e500_sr_interrupt_regs(restore)
+
+1:
+ do_sr_interrupt_regs(restore)
+ do_sr_debug_regs(restore)
+
+ sync
+
+ blr
+
+/*
+ * r3 = the virtual address of buffer
+ * r4 = suspend type, 0-BASE_SAVE, 1-ALL_SAVE
+ */
+_GLOBAL(fsl_cpu_state_save)
+ mflr r9
+ LOAD_SAVE_ADDRESS
+
+ /* save the return address to SR_LR */
+ do_save_gpr_reg(r9, SR_LR)
+
+ /* if core_save_type is BASE_SAVE, goto 1f */
+ cmpwi r4, 0
+ beq 1f
+
+ bl fsl_cpu_append_save
+
+1:
+ bl e500_base_special_save
+
+ bl fsl_cpu_base_save
+
+ li r3, 0
+ mtlr r9
+ blr
+
+/*
+ * r3 = the virtual address of buffer
+ * r4 = suspend type, 0-BASE_SAVE, 1-ALL_SAVE
+ */
+_GLOBAL(fsl_cpu_state_restore)
+ mflr r9
+ LOAD_SAVE_ADDRESS
+
+ /*
+ * Disable machine checks and critical exceptions,
+ * if core_save_type is ALL_SAVE, we will restore interrupt
+ * IVORs registers.
+ */
+ mfmsr r5
+ rlwinm r5, r5, 0, ~MSR_CE
+ rlwinm r5, r5, 0, ~MSR_ME
+ mtmsr r5
+ isync
+
+ /* if core_save_type is BASE_SAVE, goto 1f */
+ cmpwi r4, 0
+ beq 1f
+
+ bl fsl_cpu_append_restore
+
+1:
+ bl e500_base_special_restore
+
+ bl fsl_cpu_base_restore
+
+ /* return the return address of the save time */
+ do_restore_gpr_reg(r9, SR_LR)
+
+ li r3, 0
+ mtlr r9
+ blr
--
1.8.5
^ permalink raw reply related
* [PATCH 3/3] powerpc/fsl: Use the new interface to save or restore registers
From: Dongsheng Wang @ 2014-01-14 7:59 UTC (permalink / raw)
To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
In-Reply-To: <1389686397-46555-1-git-send-email-dongsheng.wang@freescale.com>
From: Wang Dongsheng <dongsheng.wang@freescale.com>
Use fsl_cpu_state_save/fsl_cpu_state_restore to save/restore registers.
Use the functions to save/restore registers, so we don't need to
maintain the code.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
diff --git a/arch/powerpc/kernel/swsusp_booke.S b/arch/powerpc/kernel/swsusp_booke.S
index 553c140..b5992db 100644
--- a/arch/powerpc/kernel/swsusp_booke.S
+++ b/arch/powerpc/kernel/swsusp_booke.S
@@ -4,92 +4,28 @@
* Copyright (c) 2009-2010 MontaVista Software, LLC.
*/
-#include <linux/threads.h>
-#include <asm/processor.h>
#include <asm/page.h>
-#include <asm/cputable.h>
-#include <asm/thread_info.h>
#include <asm/ppc_asm.h>
#include <asm/asm-offsets.h>
#include <asm/mmu.h>
-
-/*
- * Structure for storing CPU registers on the save area.
- */
-#define SL_SP 0
-#define SL_PC 4
-#define SL_MSR 8
-#define SL_TCR 0xc
-#define SL_SPRG0 0x10
-#define SL_SPRG1 0x14
-#define SL_SPRG2 0x18
-#define SL_SPRG3 0x1c
-#define SL_SPRG4 0x20
-#define SL_SPRG5 0x24
-#define SL_SPRG6 0x28
-#define SL_SPRG7 0x2c
-#define SL_TBU 0x30
-#define SL_TBL 0x34
-#define SL_R2 0x38
-#define SL_CR 0x3c
-#define SL_LR 0x40
-#define SL_R12 0x44 /* r12 to r31 */
-#define SL_SIZE (SL_R12 + 80)
-
- .section .data
- .align 5
-
-_GLOBAL(swsusp_save_area)
- .space SL_SIZE
-
+#include <asm/fsl_sleep.h>
.section .text
.align 5
_GLOBAL(swsusp_arch_suspend)
- lis r11,swsusp_save_area@h
- ori r11,r11,swsusp_save_area@l
-
- mflr r0
- stw r0,SL_LR(r11)
- mfcr r0
- stw r0,SL_CR(r11)
- stw r1,SL_SP(r11)
- stw r2,SL_R2(r11)
- stmw r12,SL_R12(r11)
-
- /* Save MSR & TCR */
- mfmsr r4
- stw r4,SL_MSR(r11)
- mfspr r4,SPRN_TCR
- stw r4,SL_TCR(r11)
-
- /* Get a stable timebase and save it */
-1: mfspr r4,SPRN_TBRU
- stw r4,SL_TBU(r11)
- mfspr r5,SPRN_TBRL
- stw r5,SL_TBL(r11)
- mfspr r3,SPRN_TBRU
- cmpw r3,r4
- bne 1b
+ mflr r15
+ lis r3, core_registers_save_area@h
+ ori r3, r3, core_registers_save_area@l
+
+ /* Save base register */
+ li r4, 0
+ bl fsl_cpu_state_save
- /* Save SPRGs */
- mfspr r4,SPRN_SPRG0
- stw r4,SL_SPRG0(r11)
- mfspr r4,SPRN_SPRG1
- stw r4,SL_SPRG1(r11)
- mfspr r4,SPRN_SPRG2
- stw r4,SL_SPRG2(r11)
- mfspr r4,SPRN_SPRG3
- stw r4,SL_SPRG3(r11)
- mfspr r4,SPRN_SPRG4
- stw r4,SL_SPRG4(r11)
- mfspr r4,SPRN_SPRG5
- stw r4,SL_SPRG5(r11)
- mfspr r4,SPRN_SPRG6
- stw r4,SL_SPRG6(r11)
- mfspr r4,SPRN_SPRG7
- stw r4,SL_SPRG7(r11)
+ /* Save LR */
+ lis r3, core_registers_save_area@h
+ ori r3, r3, core_registers_save_area@l
+ stw r15, SR_LR(r3)
/* Call the low level suspend stuff (we should probably have made
* a stackframe...
@@ -97,11 +33,12 @@ _GLOBAL(swsusp_arch_suspend)
bl swsusp_save
/* Restore LR from the save area */
- lis r11,swsusp_save_area@h
- ori r11,r11,swsusp_save_area@l
- lwz r0,SL_LR(r11)
- mtlr r0
+ lis r3, core_registers_save_area@h
+ ori r3, r3, core_registers_save_area@l
+ lwz r15, SR_LR(r3)
+ mtlr r15
+ li r3, 0
blr
_GLOBAL(swsusp_arch_resume)
@@ -138,9 +75,6 @@ _GLOBAL(swsusp_arch_resume)
bl flush_dcache_L1
bl flush_instruction_cache
- lis r11,swsusp_save_area@h
- ori r11,r11,swsusp_save_area@l
-
/*
* Mappings from virtual addresses to physical addresses may be
* different than they were prior to restoring hibernation state.
@@ -149,53 +83,12 @@ _GLOBAL(swsusp_arch_resume)
*/
bl _tlbil_all
- lwz r4,SL_SPRG0(r11)
- mtspr SPRN_SPRG0,r4
- lwz r4,SL_SPRG1(r11)
- mtspr SPRN_SPRG1,r4
- lwz r4,SL_SPRG2(r11)
- mtspr SPRN_SPRG2,r4
- lwz r4,SL_SPRG3(r11)
- mtspr SPRN_SPRG3,r4
- lwz r4,SL_SPRG4(r11)
- mtspr SPRN_SPRG4,r4
- lwz r4,SL_SPRG5(r11)
- mtspr SPRN_SPRG5,r4
- lwz r4,SL_SPRG6(r11)
- mtspr SPRN_SPRG6,r4
- lwz r4,SL_SPRG7(r11)
- mtspr SPRN_SPRG7,r4
-
- /* restore the MSR */
- lwz r3,SL_MSR(r11)
- mtmsr r3
-
- /* Restore TB */
- li r3,0
- mtspr SPRN_TBWL,r3
- lwz r3,SL_TBU(r11)
- lwz r4,SL_TBL(r11)
- mtspr SPRN_TBWU,r3
- mtspr SPRN_TBWL,r4
-
- /* Restore TCR and clear any pending bits in TSR. */
- lwz r4,SL_TCR(r11)
- mtspr SPRN_TCR,r4
- lis r4, (TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS)@h
- mtspr SPRN_TSR,r4
-
- /* Kick decrementer */
- li r0,1
- mtdec r0
-
- /* Restore the callee-saved registers and return */
- lwz r0,SL_CR(r11)
- mtcr r0
- lwz r2,SL_R2(r11)
- lmw r12,SL_R12(r11)
- lwz r1,SL_SP(r11)
- lwz r0,SL_LR(r11)
- mtlr r0
+ lis r3, core_registers_save_area@h
+ ori r3, r3, core_registers_save_area@l
+
+ /* Restore base register */
+ li r4, 0
+ bl fsl_cpu_state_restore
li r3,0
blr
--
1.8.5
^ permalink raw reply related
* Re: [PATCH] Move precessing of MCE queued event out from syscall exit path.
From: Benjamin Herrenschmidt @ 2014-01-14 8:20 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Mahesh J Salgaonkar, linuxppc-dev
In-Reply-To: <alpine.LSU.2.11.1401132314380.3222@eggly.anvils>
On Mon, 2014-01-13 at 23:47 -0800, Hugh Dickins wrote:
>
> And I may be quite wrong to point a finger at ATA errors: perhaps
> they're always shown, and quickly cleared off screen in successful
> boots,
> but left visible when root cannot be mounted for some other reason.
dmesg would tell...
> I don't know, and won't have time to investigate further - bisecting
> intermittents is not much fun! I'll just have to hope that it's
> sorted out before it reaches 3.14-rc, or else bite the bullet and
> investigate on that.)
Right :-) Oh well, I still use a G5 as a desktop so I might eventually
stumble upon them !
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Preeti U Murthy @ 2014-01-14 8:25 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>
Hi Srivatsa,
On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> Inspite of this it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, its possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
>
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that wont get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...
Yes I did see this, but since the patch intends to only communicate
whether the cpuidle governor was successful in choosing an idle state on
its part, I wished to address the default action of pseries idle loop
separately. You are right we will need to understand the patch which
introduced this action. I will take a look at it.
>
>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> ---
>>
>> drivers/cpuidle/cpuidle.c | 6 +++++-
>> drivers/cpuidle/governors/menu.c | 7 ++++---
>> 2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..5bf06bb 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>> /* ask the governor for the next state */
>> next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> + dev->last_residency = 0;
>> if (need_resched()) {
>> - dev->last_residency = 0;
>> /* give the governor an opportunity to reflect on the outcome */
>> if (cpuidle_curr_governor->reflect)
>> cpuidle_curr_governor->reflect(dev, next_state);
>
> The comments on top of the .reflect() routines of the governors say that the
> second parameter is the index of the actual state entered. But after this patch,
> next_state can be negative, indicating an invalid index. So those comments need
> to be updated accordingly.
Right, I will take care of the comment in the next post.
>
>> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>> return 0;
>> }
>>
>> + if (next_state < 0)
>> + return -EINVAL;
>
> The exit path above (due to need_resched) returns with irqs enabled, but the new
> one you are adding (next_state < 0) returns with irqs disabled. This is correct,
> because in the latter case, "idle" is still in progress and the arch will choose
> a default handler to execute (unlike the former case where "idle" is over and
> hence its time to enable interrupts).
Correct.
>
> IMHO it would be good to add comments around this code to explain this subtle
> difference. We can never be too careful with these things... ;-)
Ok, will do so.
>
>> +
>> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>
>> broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..6921543 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -283,6 +283,7 @@ again:
>> * menu_select - selects the next idle state to enter
>> * @drv: cpuidle driver containing state data
>> * @dev: the CPU
>> + * Returns -1 when no idle state is suitable
>> */
>> static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> {
>> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> int multiplier;
>> struct timespec t;
>>
>> - if (data->needs_update) {
>> + if (data->last_state_idx >= 0 && data->needs_update) {
> ^^^^^
> Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
> only when index >= 0.
Right we do not need this check. I was assuming that needs_update would
be consistent with the index >= 0 only in the need_resched() case. But
needs_update will get unset each time the governor is invoked to be set
only if index >= 0 thereafter.
>
>> menu_update(drv, dev);
>> data->needs_update = 0;
>> }
>>
>> - data->last_state_idx = 0;
>> + data->last_state_idx = -1;
>> data->exit_us = 0;
>>
>> /* Special case when user has set very strict latency requirement */
>> if (unlikely(latency_req == 0))
>> - return 0;
>> + return data->last_state_idx;
>>
>> /* determine the expected residency time, round up */
>> t = ktime_to_timespec(tick_nohz_get_sleep_length());
>>
>
> What about the ladder governor? I know its not used that much in practice,
> but I think it would be good to update that as well, just to keep it
> consistent.
Yes this needs to be updated as well. But the ladder governor has a few
other details to take care of in addition to what is taken care of in
the menu governor by this patch. Hence I will be posting that separately.
Thanks
Regards
Preeti U Murthy
>
> Regards,
> Srivatsa S. Bhat
>
^ permalink raw reply
* Re: [PATCH v3] watchdog: mpc8xxx_wdt convert to watchdog core
From: Wim Van Sebroeck @ 2014-01-14 8:32 UTC (permalink / raw)
To: Christophe Leroy
Cc: scottwood, linuxppc-dev, linux-kernel, Guenter Roeck,
linux-watchdog
In-Reply-To: <20131204063214.D1DB01A2BEA@localhost.localdomain>
Hi Christophe,
> Convert mpc8xxx_wdt.c to the new watchdog API.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
This patch has been added to linux-watchdog-next.
Kind regards,
Wim.
^ permalink raw reply
* [PATCH] powerpc: dma-mapping: Return dma_direct_ops variable when dev == NULL
From: Chunhe Lan @ 2014-01-14 9:44 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Chunhe Lan
Without this patch, kind of below error will be dumped if
'insmod ixgbevf.ko' is executed:
ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function
Network Driver - version 2.7.12-k
ixgbevf: Copyright (c) 2009 - 2012 Intel Corporation.
ixgbevf 0000:01:10.0: enabling device (0000 -> 0002)
ixgbevf 0000:01:10.0: No usable DMA configuration, aborting
ixgbevf: probe of 0000:01:10.0 failed with error -5
......
......
Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Chunhe Lan <Chunhe.Lan@freescale.com>
---
arch/powerpc/include/asm/dma-mapping.h | 13 +++++++++----
1 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index e27e9ad..b8c10de 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -84,10 +84,15 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
* only ISA DMA device we support is the floppy and we have a hack
* in the floppy driver directly to get a device for us.
*/
- if (unlikely(dev == NULL))
- return NULL;
-
- return dev->archdata.dma_ops;
+ if (dev && dev->archdata.dma_ops)
+ return dev->archdata.dma_ops;
+ /*
+ * In some cases (for example, use the Intel(R) 10 Gigabit PCI
+ * expression Virtual Function Network Driver -- ixgbevf.ko),
+ * their value of dev is the NULL. If return NULL, the driver is
+ * aborting. So return dma_direct_ops variable when dev == NULL.
+ */
+ return &dma_direct_ops;
}
static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
--
1.7.6.5
^ permalink raw reply related
* Re: [PATCH] powerpc: dma-mapping: Return dma_direct_ops variable when dev == NULL
From: Benjamin Herrenschmidt @ 2014-01-14 10:14 UTC (permalink / raw)
To: Chunhe Lan; +Cc: linux-pci, linuxppc-dev
In-Reply-To: <1389692656-27758-1-git-send-email-Chunhe.Lan@freescale.com>
On Tue, 2014-01-14 at 17:44 +0800, Chunhe Lan wrote:
> Without this patch, kind of below error will be dumped if
> 'insmod ixgbevf.ko' is executed:
>
> ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function
> Network Driver - version 2.7.12-k
> ixgbevf: Copyright (c) 2009 - 2012 Intel Corporation.
> ixgbevf 0000:01:10.0: enabling device (0000 -> 0002)
> ixgbevf 0000:01:10.0: No usable DMA configuration, aborting
> ixgbevf: probe of 0000:01:10.0 failed with error -5
> ......
> ......
That's not right. The DMA ops must be set properly for the VF somewhere
in the arch code instead. When creating VFs, is there a hook allowing
the arch to fix things up ?
(Also adding linux-pci on CC)
Ben.
> Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Tested-by: Chunhe Lan <Chunhe.Lan@freescale.com>
> ---
> arch/powerpc/include/asm/dma-mapping.h | 13 +++++++++----
> 1 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index e27e9ad..b8c10de 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -84,10 +84,15 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> * only ISA DMA device we support is the floppy and we have a hack
> * in the floppy driver directly to get a device for us.
> */
> - if (unlikely(dev == NULL))
> - return NULL;
> -
> - return dev->archdata.dma_ops;
> + if (dev && dev->archdata.dma_ops)
> + return dev->archdata.dma_ops;
> + /*
> + * In some cases (for example, use the Intel(R) 10 Gigabit PCI
> + * expression Virtual Function Network Driver -- ixgbevf.ko),
> + * their value of dev is the NULL. If return NULL, the driver is
> + * aborting. So return dma_direct_ops variable when dev == NULL.
> + */
> + return &dma_direct_ops;
> }
>
> static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
^ permalink raw reply
* [PATCH v2] Move precessing of MCE queued event out from syscall exit path.
From: Mahesh J Salgaonkar @ 2014-01-14 10:15 UTC (permalink / raw)
To: linuxppc-dev, Benjamin Herrenschmidt; +Cc: Hugh Dickins
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Huge Dickins reported an issue that b5ff4211a829
"powerpc/book3s: Queue up and process delayed MCE events" breaks the
PowerMac G5 boot. This patch fixes it by moving the mce even processing
away from syscall exit, which was wrong to do that in first place, and
using irq work framework to delay processing of mce event.
Reported-by: Hugh Dickins <hughd@google.com
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/mce.h | 1 -
arch/powerpc/kernel/entry_64.S | 5 -----
arch/powerpc/kernel/mce.c | 13 ++++++++++---
3 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 2257d1e..f97d8cb 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -192,7 +192,6 @@ extern void save_mce_event(struct pt_regs *regs, long handled,
extern int get_mce_event(struct machine_check_event *mce, bool release);
extern void release_mce_event(void);
extern void machine_check_queue_event(void);
-extern void machine_check_process_queued_event(void);
extern void machine_check_print_event_info(struct machine_check_event *evt);
extern uint64_t get_mce_fault_addr(struct machine_check_event *evt);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..bbfb029 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -184,11 +184,6 @@ syscall_exit:
bl .do_show_syscall_exit
ld r3,RESULT(r1)
#endif
-#ifdef CONFIG_PPC_BOOK3S_64
-BEGIN_FTR_SECTION
- bl .machine_check_process_queued_event
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-#endif
CURRENT_THREAD_INFO(r12, r1)
ld r8,_MSR(r1)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index d6edf2b..a7fd4cb 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -26,6 +26,7 @@
#include <linux/ptrace.h>
#include <linux/percpu.h>
#include <linux/export.h>
+#include <linux/irq_work.h>
#include <asm/mce.h>
static DEFINE_PER_CPU(int, mce_nest_count);
@@ -35,6 +36,11 @@ static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event);
static DEFINE_PER_CPU(int, mce_queue_count);
static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event_queue);
+static void machine_check_process_queued_event(struct irq_work *work);
+struct irq_work mce_event_process_work = {
+ .func = machine_check_process_queued_event,
+};
+
static void mce_set_error_info(struct machine_check_event *mce,
struct mce_error_info *mce_err)
{
@@ -185,17 +191,19 @@ void machine_check_queue_event(void)
return;
}
__get_cpu_var(mce_event_queue[index]) = evt;
+
+ /* Queue irq work to process this event later. */
+ irq_work_queue(&mce_event_process_work);
}
/*
* process pending MCE event from the mce event queue. This function will be
* called during syscall exit.
*/
-void machine_check_process_queued_event(void)
+static void machine_check_process_queued_event(struct irq_work *work)
{
int index;
- preempt_disable();
/*
* For now just print it to console.
* TODO: log this error event to FSP or nvram.
@@ -206,7 +214,6 @@ void machine_check_process_queued_event(void)
&__get_cpu_var(mce_event_queue[index]));
__get_cpu_var(mce_queue_count)--;
}
- preempt_enable();
}
void machine_check_print_event_info(struct machine_check_event *evt)
^ permalink raw reply related
* [PATCH v1 0/6] pseries/cpuidle: pseries cpuidle backend driver clean-ups.
From: Deepthi Dharwar @ 2014-01-14 10:55 UTC (permalink / raw)
To: linux-pm, benh, daniel.lezcano, linux-kernel, srivatsa.bhat,
preeti, svaidy, linuxppc-dev
The following patch series includes a bunch of clean-ups for the
pseries cpuidle backend driver. This includes,
moving the driver from arch/powerpc/platforms/pseries to
driver/cpuidle, refactoring of the code, making it a non-module,
removing smt-snooze-delay update and dead code around it.
After a number of attempts to consolidate the backend cpuidle driver
for pSeries and powernv platforms, it seems best to have separate
idle drivers for both the platforms as any kind of code duplication
seizes to exist beyond snooze loop and a hot plug notifier.
Also going further, with addition of device tree parsing to setup
idle states and related changes just for powernv platform, it is
best to keep these drivers separate without adding complexity
and thus improving readabilty for both the platform drivers.
The clean-up undertaken here was posted earlier as part of
generic powerpc cpuidle driver clean-up.
V1 -> http://lkml.org/lkml/2013/7/23/143
V2 -> https://lkml.org/lkml/2013/7/30/872
V3 -> http://comments.gmane.org/gmane.linux.ports.ppc.embedded/63093
V4 -> https://lkml.org/lkml/2013/8/22/25
V5 -> http://lkml.org/lkml/2013/8/22/184
V6 -> https://lkml.org/lkml/2013/8/27/432
V7 -> https://lkml.org/lkml/2013/10/29/216
V8 -> https://lkml.org/lkml/2013/11/11/29
Deepthi Dharwar (5):
pseries/cpuidle: Move processor_idle.c to drivers/cpuidle.
pseries/cpuidle: Use cpuidle_register() for initialisation.
pseries/cpuidle: Make cpuidle-pseries backend driver a non-module.
pseries/cpuidle: Remove MAX_IDLE_STATE macro.
pseries/cpuidle: smt-snooze-delay cleanup.
Preeti U Murthy (1):
pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
arch/powerpc/include/asm/processor.h | 7
arch/powerpc/kernel/sysfs.c | 2
arch/powerpc/platforms/pseries/Kconfig | 9 -
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 364 -----------------------
drivers/cpuidle/Kconfig | 5
drivers/cpuidle/Kconfig.powerpc | 11 +
drivers/cpuidle/Makefile | 4
drivers/cpuidle/cpuidle-pseries.c | 267 +++++++++++++++++
9 files changed, 287 insertions(+), 383 deletions(-)
delete mode 100644 arch/powerpc/platforms/pseries/processor_idle.c
create mode 100644 drivers/cpuidle/Kconfig.powerpc
create mode 100644 drivers/cpuidle/cpuidle-pseries.c
-- Deepthi
^ permalink raw reply
* [PATCH v1 1/6] pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
From: Deepthi Dharwar @ 2014-01-14 10:55 UTC (permalink / raw)
To: linux-pm, benh, daniel.lezcano, linux-kernel, srivatsa.bhat,
preeti, svaidy, linuxppc-dev
In-Reply-To: <20140114105525.3064.52013.stgit@deepthi.in.ibm.com>
From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Commit fbd7740fdfdf9475f(powerpc: Simplify pSeries idle loop) switched pseries
cpu idle handling from complete idle loops to ppc_md.powersave functions.
Earlier to this switch, ppc64_runlatch_off() had to be called in each of the
idle routines. But after the switch, this call is handled in arch_cpu_idle(),
just before the call to ppc_md.powersave, where platform specific idle
routines are called.
As a consequence, the call to ppc64_runlatch_off() got duplicated in the
arch_cpu_idle() routine as well as in the some of the idle routines in
pseries and commit fbd7740fdfdf9475f missed to get rid of these redundant
calls. These calls were carried over subsequent enhancements to the pseries
cpuidle routines.
Although multiple calls to ppc64_runlatch_off() is harmless, there is still some
overhead due to it. Besides that, these calls could also make way for a
misunderstanding that it is *necessary* to call ppc64_runlatch_off() multiple
times, when that is not the case. Hence this patch takes care of eliminating
this redundancy.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/processor_idle.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index a166e38..09e4f56 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -17,7 +17,6 @@
#include <asm/reg.h>
#include <asm/machdep.h>
#include <asm/firmware.h>
-#include <asm/runlatch.h>
#include <asm/plpar_wrappers.h>
struct cpuidle_driver pseries_idle_driver = {
@@ -63,7 +62,6 @@ static int snooze_loop(struct cpuidle_device *dev,
set_thread_flag(TIF_POLLING_NRFLAG);
while ((!need_resched()) && cpu_online(cpu)) {
- ppc64_runlatch_off();
HMT_low();
HMT_very_low();
}
@@ -103,7 +101,6 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
idle_loop_prolog(&in_purr);
get_lppaca()->donate_dedicated_cpu = 1;
- ppc64_runlatch_off();
HMT_medium();
check_and_cede_processor();
^ permalink raw reply related
* [PATCH v1 2/6] pseries/cpuidle: Move processor_idle.c to drivers/cpuidle.
From: Deepthi Dharwar @ 2014-01-14 10:56 UTC (permalink / raw)
To: linux-pm, benh, daniel.lezcano, linux-kernel, srivatsa.bhat,
preeti, svaidy, linuxppc-dev
In-Reply-To: <20140114105525.3064.52013.stgit@deepthi.in.ibm.com>
Move the file from arch specific pseries/processor_idle.c
to drivers/cpuidle/cpuidle-pseries.c
Make the relevant Makefile and Kconfig changes.
Also, introduce Kconfig.powerpc in drivers/cpuidle
for all powerpc cpuidle drivers.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/processor.h | 2
arch/powerpc/platforms/pseries/Kconfig | 9 -
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 361 -----------------------
drivers/cpuidle/Kconfig | 5
drivers/cpuidle/Kconfig.powerpc | 11 +
drivers/cpuidle/Makefile | 4
drivers/cpuidle/cpuidle-pseries.c | 361 +++++++++++++++++++++++
8 files changed, 382 insertions(+), 372 deletions(-)
delete mode 100644 arch/powerpc/platforms/pseries/processor_idle.c
create mode 100644 drivers/cpuidle/Kconfig.powerpc
create mode 100644 drivers/cpuidle/cpuidle-pseries.c
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index fc14a38..fa98fdf 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -445,7 +445,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
extern int powersave_nap; /* set if nap mode can be used in idle loop */
extern void power7_nap(void);
-#ifdef CONFIG_PSERIES_IDLE
+#ifdef CONFIG_PSERIES_CPUIDLE
extern void update_smt_snooze_delay(int cpu, int residency);
#else
static inline void update_smt_snooze_delay(int cpu, int residency) {}
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 62b4f80..bb59bb0 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -119,12 +119,3 @@ config DTL
which are accessible through a debugfs file.
Say N if you are unsure.
-
-config PSERIES_IDLE
- bool "Cpuidle driver for pSeries platforms"
- depends on CPU_IDLE
- depends on PPC_PSERIES
- default y
- help
- Select this option to enable processor idle state management
- through cpuidle subsystem.
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index fbccac9..0348079 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -21,7 +21,6 @@ obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o
obj-$(CONFIG_CMM) += cmm.o
obj-$(CONFIG_DTL) += dtl.o
obj-$(CONFIG_IO_EVENT_IRQ) += io_event_irq.o
-obj-$(CONFIG_PSERIES_IDLE) += processor_idle.o
obj-$(CONFIG_LPARCFG) += lparcfg.o
ifeq ($(CONFIG_PPC_PSERIES),y)
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
deleted file mode 100644
index 09e4f56..0000000
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ /dev/null
@@ -1,361 +0,0 @@
-/*
- * processor_idle - idle state cpuidle driver.
- * Adapted from drivers/idle/intel_idle.c and
- * drivers/acpi/processor_idle.c
- *
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/moduleparam.h>
-#include <linux/cpuidle.h>
-#include <linux/cpu.h>
-#include <linux/notifier.h>
-
-#include <asm/paca.h>
-#include <asm/reg.h>
-#include <asm/machdep.h>
-#include <asm/firmware.h>
-#include <asm/plpar_wrappers.h>
-
-struct cpuidle_driver pseries_idle_driver = {
- .name = "pseries_idle",
- .owner = THIS_MODULE,
-};
-
-#define MAX_IDLE_STATE_COUNT 2
-
-static int max_idle_state = MAX_IDLE_STATE_COUNT - 1;
-static struct cpuidle_device __percpu *pseries_cpuidle_devices;
-static struct cpuidle_state *cpuidle_state_table;
-
-static inline void idle_loop_prolog(unsigned long *in_purr)
-{
- *in_purr = mfspr(SPRN_PURR);
- /*
- * Indicate to the HV that we are idle. Now would be
- * a good time to find other work to dispatch.
- */
- get_lppaca()->idle = 1;
-}
-
-static inline void idle_loop_epilog(unsigned long in_purr)
-{
- u64 wait_cycles;
-
- wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
- wait_cycles += mfspr(SPRN_PURR) - in_purr;
- get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
- get_lppaca()->idle = 0;
-}
-
-static int snooze_loop(struct cpuidle_device *dev,
- struct cpuidle_driver *drv,
- int index)
-{
- unsigned long in_purr;
- int cpu = dev->cpu;
-
- idle_loop_prolog(&in_purr);
- local_irq_enable();
- set_thread_flag(TIF_POLLING_NRFLAG);
-
- while ((!need_resched()) && cpu_online(cpu)) {
- HMT_low();
- HMT_very_low();
- }
-
- HMT_medium();
- clear_thread_flag(TIF_POLLING_NRFLAG);
- smp_mb();
-
- idle_loop_epilog(in_purr);
-
- return index;
-}
-
-static void check_and_cede_processor(void)
-{
- /*
- * Ensure our interrupt state is properly tracked,
- * also checks if no interrupt has occurred while we
- * were soft-disabled
- */
- if (prep_irq_for_idle()) {
- cede_processor();
-#ifdef CONFIG_TRACE_IRQFLAGS
- /* Ensure that H_CEDE returns with IRQs on */
- if (WARN_ON(!(mfmsr() & MSR_EE)))
- __hard_irq_enable();
-#endif
- }
-}
-
-static int dedicated_cede_loop(struct cpuidle_device *dev,
- struct cpuidle_driver *drv,
- int index)
-{
- unsigned long in_purr;
-
- idle_loop_prolog(&in_purr);
- get_lppaca()->donate_dedicated_cpu = 1;
-
- HMT_medium();
- check_and_cede_processor();
-
- get_lppaca()->donate_dedicated_cpu = 0;
-
- idle_loop_epilog(in_purr);
-
- return index;
-}
-
-static int shared_cede_loop(struct cpuidle_device *dev,
- struct cpuidle_driver *drv,
- int index)
-{
- unsigned long in_purr;
-
- idle_loop_prolog(&in_purr);
-
- /*
- * Yield the processor to the hypervisor. We return if
- * an external interrupt occurs (which are driven prior
- * to returning here) or if a prod occurs from another
- * processor. When returning here, external interrupts
- * are enabled.
- */
- check_and_cede_processor();
-
- idle_loop_epilog(in_purr);
-
- return index;
-}
-
-/*
- * States for dedicated partition case.
- */
-static struct cpuidle_state dedicated_states[MAX_IDLE_STATE_COUNT] = {
- { /* Snooze */
- .name = "snooze",
- .desc = "snooze",
- .flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 0,
- .target_residency = 0,
- .enter = &snooze_loop },
- { /* CEDE */
- .name = "CEDE",
- .desc = "CEDE",
- .flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 10,
- .target_residency = 100,
- .enter = &dedicated_cede_loop },
-};
-
-/*
- * States for shared partition case.
- */
-static struct cpuidle_state shared_states[MAX_IDLE_STATE_COUNT] = {
- { /* Shared Cede */
- .name = "Shared Cede",
- .desc = "Shared Cede",
- .flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 0,
- .target_residency = 0,
- .enter = &shared_cede_loop },
-};
-
-void update_smt_snooze_delay(int cpu, int residency)
-{
- struct cpuidle_driver *drv = cpuidle_get_driver();
- struct cpuidle_device *dev = per_cpu(cpuidle_devices, cpu);
-
- if (cpuidle_state_table != dedicated_states)
- return;
-
- if (residency < 0) {
- /* Disable the Nap state on that cpu */
- if (dev)
- dev->states_usage[1].disable = 1;
- } else
- if (drv)
- drv->states[1].target_residency = residency;
-}
-
-static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
- unsigned long action, void *hcpu)
-{
- int hotcpu = (unsigned long)hcpu;
- struct cpuidle_device *dev =
- per_cpu_ptr(pseries_cpuidle_devices, hotcpu);
-
- if (dev && cpuidle_get_driver()) {
- switch (action) {
- case CPU_ONLINE:
- case CPU_ONLINE_FROZEN:
- cpuidle_pause_and_lock();
- cpuidle_enable_device(dev);
- cpuidle_resume_and_unlock();
- break;
-
- case CPU_DEAD:
- case CPU_DEAD_FROZEN:
- cpuidle_pause_and_lock();
- cpuidle_disable_device(dev);
- cpuidle_resume_and_unlock();
- break;
-
- default:
- return NOTIFY_DONE;
- }
- }
- return NOTIFY_OK;
-}
-
-static struct notifier_block setup_hotplug_notifier = {
- .notifier_call = pseries_cpuidle_add_cpu_notifier,
-};
-
-/*
- * pseries_cpuidle_driver_init()
- */
-static int pseries_cpuidle_driver_init(void)
-{
- int idle_state;
- struct cpuidle_driver *drv = &pseries_idle_driver;
-
- drv->state_count = 0;
-
- for (idle_state = 0; idle_state < MAX_IDLE_STATE_COUNT; ++idle_state) {
-
- if (idle_state > max_idle_state)
- break;
-
- /* is the state not enabled? */
- if (cpuidle_state_table[idle_state].enter == NULL)
- continue;
-
- drv->states[drv->state_count] = /* structure copy */
- cpuidle_state_table[idle_state];
-
- drv->state_count += 1;
- }
-
- return 0;
-}
-
-/* pseries_idle_devices_uninit(void)
- * unregister cpuidle devices and de-allocate memory
- */
-static void pseries_idle_devices_uninit(void)
-{
- int i;
- struct cpuidle_device *dev;
-
- for_each_possible_cpu(i) {
- dev = per_cpu_ptr(pseries_cpuidle_devices, i);
- cpuidle_unregister_device(dev);
- }
-
- free_percpu(pseries_cpuidle_devices);
- return;
-}
-
-/* pseries_idle_devices_init()
- * allocate, initialize and register cpuidle device
- */
-static int pseries_idle_devices_init(void)
-{
- int i;
- struct cpuidle_driver *drv = &pseries_idle_driver;
- struct cpuidle_device *dev;
-
- pseries_cpuidle_devices = alloc_percpu(struct cpuidle_device);
- if (pseries_cpuidle_devices == NULL)
- return -ENOMEM;
-
- for_each_possible_cpu(i) {
- dev = per_cpu_ptr(pseries_cpuidle_devices, i);
- dev->state_count = drv->state_count;
- dev->cpu = i;
- if (cpuidle_register_device(dev)) {
- printk(KERN_DEBUG \
- "cpuidle_register_device %d failed!\n", i);
- return -EIO;
- }
- }
-
- return 0;
-}
-
-/*
- * pseries_idle_probe()
- * Choose state table for shared versus dedicated partition
- */
-static int pseries_idle_probe(void)
-{
-
- if (!firmware_has_feature(FW_FEATURE_SPLPAR))
- return -ENODEV;
-
- if (cpuidle_disable != IDLE_NO_OVERRIDE)
- return -ENODEV;
-
- if (max_idle_state == 0) {
- printk(KERN_DEBUG "pseries processor idle disabled.\n");
- return -EPERM;
- }
-
- if (lppaca_shared_proc(get_lppaca()))
- cpuidle_state_table = shared_states;
- else
- cpuidle_state_table = dedicated_states;
-
- return 0;
-}
-
-static int __init pseries_processor_idle_init(void)
-{
- int retval;
-
- retval = pseries_idle_probe();
- if (retval)
- return retval;
-
- pseries_cpuidle_driver_init();
- retval = cpuidle_register_driver(&pseries_idle_driver);
- if (retval) {
- printk(KERN_DEBUG "Registration of pseries driver failed.\n");
- return retval;
- }
-
- retval = pseries_idle_devices_init();
- if (retval) {
- pseries_idle_devices_uninit();
- cpuidle_unregister_driver(&pseries_idle_driver);
- return retval;
- }
-
- register_cpu_notifier(&setup_hotplug_notifier);
- printk(KERN_DEBUG "pseries_idle_driver registered\n");
-
- return 0;
-}
-
-static void __exit pseries_processor_idle_exit(void)
-{
-
- unregister_cpu_notifier(&setup_hotplug_notifier);
- pseries_idle_devices_uninit();
- cpuidle_unregister_driver(&pseries_idle_driver);
-
- return;
-}
-
-module_init(pseries_processor_idle_init);
-module_exit(pseries_processor_idle_exit);
-
-MODULE_AUTHOR("Deepthi Dharwar <deepthi@linux.vnet.ibm.com>");
-MODULE_DESCRIPTION("Cpuidle driver for POWER");
-MODULE_LICENSE("GPL");
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index b3fb81d..f04e25f 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -35,6 +35,11 @@ depends on ARM
source "drivers/cpuidle/Kconfig.arm"
endmenu
+menu "POWERPC CPU Idle Drivers"
+depends on PPC
+source "drivers/cpuidle/Kconfig.powerpc"
+endmenu
+
endif
config ARCH_NEEDS_CPU_IDLE_COUPLED
diff --git a/drivers/cpuidle/Kconfig.powerpc b/drivers/cpuidle/Kconfig.powerpc
new file mode 100644
index 0000000..8147de5
--- /dev/null
+++ b/drivers/cpuidle/Kconfig.powerpc
@@ -0,0 +1,11 @@
+#
+# POWERPC CPU Idle Drivers
+#
+config PSERIES_CPUIDLE
+ bool "Cpuidle driver for pSeries platforms"
+ depends on CPU_IDLE
+ depends on PPC_PSERIES
+ default y
+ help
+ Select this option to enable processor idle state management
+ through cpuidle subsystem.
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 527be28..a6331ad 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -13,3 +13,7 @@ obj-$(CONFIG_ARM_KIRKWOOD_CPUIDLE) += cpuidle-kirkwood.o
obj-$(CONFIG_ARM_ZYNQ_CPUIDLE) += cpuidle-zynq.o
obj-$(CONFIG_ARM_U8500_CPUIDLE) += cpuidle-ux500.o
obj-$(CONFIG_ARM_AT91_CPUIDLE) += cpuidle-at91.o
+
+###############################################################################
+# POWERPC drivers
+obj-$(CONFIG_PSERIES_CPUIDLE) += cpuidle-pseries.o
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
new file mode 100644
index 0000000..2115478
--- /dev/null
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -0,0 +1,361 @@
+/*
+ * cpuidle-pseries - idle state cpuidle driver.
+ * Adapted from drivers/idle/intel_idle.c and
+ * drivers/acpi/processor_idle.c
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/cpuidle.h>
+#include <linux/cpu.h>
+#include <linux/notifier.h>
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <asm/machdep.h>
+#include <asm/firmware.h>
+#include <asm/plpar_wrappers.h>
+
+struct cpuidle_driver pseries_idle_driver = {
+ .name = "pseries_idle",
+ .owner = THIS_MODULE,
+};
+
+#define MAX_IDLE_STATE_COUNT 2
+
+static int max_idle_state = MAX_IDLE_STATE_COUNT - 1;
+static struct cpuidle_device __percpu *pseries_cpuidle_devices;
+static struct cpuidle_state *cpuidle_state_table;
+
+static inline void idle_loop_prolog(unsigned long *in_purr)
+{
+ *in_purr = mfspr(SPRN_PURR);
+ /*
+ * Indicate to the HV that we are idle. Now would be
+ * a good time to find other work to dispatch.
+ */
+ get_lppaca()->idle = 1;
+}
+
+static inline void idle_loop_epilog(unsigned long in_purr)
+{
+ u64 wait_cycles;
+
+ wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
+ wait_cycles += mfspr(SPRN_PURR) - in_purr;
+ get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
+ get_lppaca()->idle = 0;
+}
+
+static int snooze_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ unsigned long in_purr;
+ int cpu = dev->cpu;
+
+ idle_loop_prolog(&in_purr);
+ local_irq_enable();
+ set_thread_flag(TIF_POLLING_NRFLAG);
+
+ while ((!need_resched()) && cpu_online(cpu)) {
+ HMT_low();
+ HMT_very_low();
+ }
+
+ HMT_medium();
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb();
+
+ idle_loop_epilog(in_purr);
+
+ return index;
+}
+
+static void check_and_cede_processor(void)
+{
+ /*
+ * Ensure our interrupt state is properly tracked,
+ * also checks if no interrupt has occurred while we
+ * were soft-disabled
+ */
+ if (prep_irq_for_idle()) {
+ cede_processor();
+#ifdef CONFIG_TRACE_IRQFLAGS
+ /* Ensure that H_CEDE returns with IRQs on */
+ if (WARN_ON(!(mfmsr() & MSR_EE)))
+ __hard_irq_enable();
+#endif
+ }
+}
+
+static int dedicated_cede_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ unsigned long in_purr;
+
+ idle_loop_prolog(&in_purr);
+ get_lppaca()->donate_dedicated_cpu = 1;
+
+ HMT_medium();
+ check_and_cede_processor();
+
+ get_lppaca()->donate_dedicated_cpu = 0;
+
+ idle_loop_epilog(in_purr);
+
+ return index;
+}
+
+static int shared_cede_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ unsigned long in_purr;
+
+ idle_loop_prolog(&in_purr);
+
+ /*
+ * Yield the processor to the hypervisor. We return if
+ * an external interrupt occurs (which are driven prior
+ * to returning here) or if a prod occurs from another
+ * processor. When returning here, external interrupts
+ * are enabled.
+ */
+ check_and_cede_processor();
+
+ idle_loop_epilog(in_purr);
+
+ return index;
+}
+
+/*
+ * States for dedicated partition case.
+ */
+static struct cpuidle_state dedicated_states[MAX_IDLE_STATE_COUNT] = {
+ { /* Snooze */
+ .name = "snooze",
+ .desc = "snooze",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 0,
+ .target_residency = 0,
+ .enter = &snooze_loop },
+ { /* CEDE */
+ .name = "CEDE",
+ .desc = "CEDE",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 10,
+ .target_residency = 100,
+ .enter = &dedicated_cede_loop },
+};
+
+/*
+ * States for shared partition case.
+ */
+static struct cpuidle_state shared_states[MAX_IDLE_STATE_COUNT] = {
+ { /* Shared Cede */
+ .name = "Shared Cede",
+ .desc = "Shared Cede",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 0,
+ .target_residency = 0,
+ .enter = &shared_cede_loop },
+};
+
+void update_smt_snooze_delay(int cpu, int residency)
+{
+ struct cpuidle_driver *drv = cpuidle_get_driver();
+ struct cpuidle_device *dev = per_cpu(cpuidle_devices, cpu);
+
+ if (cpuidle_state_table != dedicated_states)
+ return;
+
+ if (residency < 0) {
+ /* Disable the Nap state on that cpu */
+ if (dev)
+ dev->states_usage[1].disable = 1;
+ } else
+ if (drv)
+ drv->states[1].target_residency = residency;
+}
+
+static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
+ unsigned long action, void *hcpu)
+{
+ int hotcpu = (unsigned long)hcpu;
+ struct cpuidle_device *dev =
+ per_cpu_ptr(pseries_cpuidle_devices, hotcpu);
+
+ if (dev && cpuidle_get_driver()) {
+ switch (action) {
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ cpuidle_pause_and_lock();
+ cpuidle_enable_device(dev);
+ cpuidle_resume_and_unlock();
+ break;
+
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ cpuidle_pause_and_lock();
+ cpuidle_disable_device(dev);
+ cpuidle_resume_and_unlock();
+ break;
+
+ default:
+ return NOTIFY_DONE;
+ }
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block setup_hotplug_notifier = {
+ .notifier_call = pseries_cpuidle_add_cpu_notifier,
+};
+
+/*
+ * pseries_cpuidle_driver_init()
+ */
+static int pseries_cpuidle_driver_init(void)
+{
+ int idle_state;
+ struct cpuidle_driver *drv = &pseries_idle_driver;
+
+ drv->state_count = 0;
+
+ for (idle_state = 0; idle_state < MAX_IDLE_STATE_COUNT; ++idle_state) {
+
+ if (idle_state > max_idle_state)
+ break;
+
+ /* is the state not enabled? */
+ if (cpuidle_state_table[idle_state].enter == NULL)
+ continue;
+
+ drv->states[drv->state_count] = /* structure copy */
+ cpuidle_state_table[idle_state];
+
+ drv->state_count += 1;
+ }
+
+ return 0;
+}
+
+/* pseries_idle_devices_uninit(void)
+ * unregister cpuidle devices and de-allocate memory
+ */
+static void pseries_idle_devices_uninit(void)
+{
+ int i;
+ struct cpuidle_device *dev;
+
+ for_each_possible_cpu(i) {
+ dev = per_cpu_ptr(pseries_cpuidle_devices, i);
+ cpuidle_unregister_device(dev);
+ }
+
+ free_percpu(pseries_cpuidle_devices);
+ return;
+}
+
+/* pseries_idle_devices_init()
+ * allocate, initialize and register cpuidle device
+ */
+static int pseries_idle_devices_init(void)
+{
+ int i;
+ struct cpuidle_driver *drv = &pseries_idle_driver;
+ struct cpuidle_device *dev;
+
+ pseries_cpuidle_devices = alloc_percpu(struct cpuidle_device);
+ if (pseries_cpuidle_devices == NULL)
+ return -ENOMEM;
+
+ for_each_possible_cpu(i) {
+ dev = per_cpu_ptr(pseries_cpuidle_devices, i);
+ dev->state_count = drv->state_count;
+ dev->cpu = i;
+ if (cpuidle_register_device(dev)) {
+ printk(KERN_DEBUG \
+ "cpuidle_register_device %d failed!\n", i);
+ return -EIO;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * pseries_idle_probe()
+ * Choose state table for shared versus dedicated partition
+ */
+static int pseries_idle_probe(void)
+{
+
+ if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+ return -ENODEV;
+
+ if (cpuidle_disable != IDLE_NO_OVERRIDE)
+ return -ENODEV;
+
+ if (max_idle_state == 0) {
+ printk(KERN_DEBUG "pseries processor idle disabled.\n");
+ return -EPERM;
+ }
+
+ if (lppaca_shared_proc(get_lppaca()))
+ cpuidle_state_table = shared_states;
+ else
+ cpuidle_state_table = dedicated_states;
+
+ return 0;
+}
+
+static int __init pseries_processor_idle_init(void)
+{
+ int retval;
+
+ retval = pseries_idle_probe();
+ if (retval)
+ return retval;
+
+ pseries_cpuidle_driver_init();
+ retval = cpuidle_register_driver(&pseries_idle_driver);
+ if (retval) {
+ printk(KERN_DEBUG "Registration of pseries driver failed.\n");
+ return retval;
+ }
+
+ retval = pseries_idle_devices_init();
+ if (retval) {
+ pseries_idle_devices_uninit();
+ cpuidle_unregister_driver(&pseries_idle_driver);
+ return retval;
+ }
+
+ register_cpu_notifier(&setup_hotplug_notifier);
+ printk(KERN_DEBUG "pseries_idle_driver registered\n");
+
+ return 0;
+}
+
+static void __exit pseries_processor_idle_exit(void)
+{
+
+ unregister_cpu_notifier(&setup_hotplug_notifier);
+ pseries_idle_devices_uninit();
+ cpuidle_unregister_driver(&pseries_idle_driver);
+
+ return;
+}
+
+module_init(pseries_processor_idle_init);
+module_exit(pseries_processor_idle_exit);
+
+MODULE_AUTHOR("Deepthi Dharwar <deepthi@linux.vnet.ibm.com>");
+MODULE_DESCRIPTION("Cpuidle driver for POWER");
+MODULE_LICENSE("GPL");
^ permalink raw reply related
* [PATCH v1 3/6] pseries/cpuidle: Use cpuidle_register() for initialisation.
From: Deepthi Dharwar @ 2014-01-14 10:56 UTC (permalink / raw)
To: linux-pm, benh, daniel.lezcano, linux-kernel, srivatsa.bhat,
preeti, svaidy, linuxppc-dev
In-Reply-To: <20140114105525.3064.52013.stgit@deepthi.in.ibm.com>
This patch replaces the cpuidle driver and devices initialisation
calls with a single generic cpuidle_register() call
and also includes minor refactoring of the code around it.
Remove the cpu online check in snooze loop, as this code can
only locally run on a cpu only if it is online. Therefore,
this check is not required.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
---
drivers/cpuidle/cpuidle-pseries.c | 78 +++++--------------------------------
1 file changed, 11 insertions(+), 67 deletions(-)
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 2115478..32d86bc 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -27,7 +27,6 @@ struct cpuidle_driver pseries_idle_driver = {
#define MAX_IDLE_STATE_COUNT 2
static int max_idle_state = MAX_IDLE_STATE_COUNT - 1;
-static struct cpuidle_device __percpu *pseries_cpuidle_devices;
static struct cpuidle_state *cpuidle_state_table;
static inline void idle_loop_prolog(unsigned long *in_purr)
@@ -55,13 +54,12 @@ static int snooze_loop(struct cpuidle_device *dev,
int index)
{
unsigned long in_purr;
- int cpu = dev->cpu;
idle_loop_prolog(&in_purr);
local_irq_enable();
set_thread_flag(TIF_POLLING_NRFLAG);
- while ((!need_resched()) && cpu_online(cpu)) {
+ while (!need_resched()) {
HMT_low();
HMT_very_low();
}
@@ -188,7 +186,7 @@ static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
{
int hotcpu = (unsigned long)hcpu;
struct cpuidle_device *dev =
- per_cpu_ptr(pseries_cpuidle_devices, hotcpu);
+ per_cpu(cpuidle_devices, hotcpu);
if (dev && cpuidle_get_driver()) {
switch (action) {
@@ -245,50 +243,6 @@ static int pseries_cpuidle_driver_init(void)
return 0;
}
-/* pseries_idle_devices_uninit(void)
- * unregister cpuidle devices and de-allocate memory
- */
-static void pseries_idle_devices_uninit(void)
-{
- int i;
- struct cpuidle_device *dev;
-
- for_each_possible_cpu(i) {
- dev = per_cpu_ptr(pseries_cpuidle_devices, i);
- cpuidle_unregister_device(dev);
- }
-
- free_percpu(pseries_cpuidle_devices);
- return;
-}
-
-/* pseries_idle_devices_init()
- * allocate, initialize and register cpuidle device
- */
-static int pseries_idle_devices_init(void)
-{
- int i;
- struct cpuidle_driver *drv = &pseries_idle_driver;
- struct cpuidle_device *dev;
-
- pseries_cpuidle_devices = alloc_percpu(struct cpuidle_device);
- if (pseries_cpuidle_devices == NULL)
- return -ENOMEM;
-
- for_each_possible_cpu(i) {
- dev = per_cpu_ptr(pseries_cpuidle_devices, i);
- dev->state_count = drv->state_count;
- dev->cpu = i;
- if (cpuidle_register_device(dev)) {
- printk(KERN_DEBUG \
- "cpuidle_register_device %d failed!\n", i);
- return -EIO;
- }
- }
-
- return 0;
-}
-
/*
* pseries_idle_probe()
* Choose state table for shared versus dedicated partition
@@ -296,9 +250,6 @@ static int pseries_idle_devices_init(void)
static int pseries_idle_probe(void)
{
- if (!firmware_has_feature(FW_FEATURE_SPLPAR))
- return -ENODEV;
-
if (cpuidle_disable != IDLE_NO_OVERRIDE)
return -ENODEV;
@@ -307,10 +258,13 @@ static int pseries_idle_probe(void)
return -EPERM;
}
- if (lppaca_shared_proc(get_lppaca()))
- cpuidle_state_table = shared_states;
- else
- cpuidle_state_table = dedicated_states;
+ if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+ if (lppaca_shared_proc(get_lppaca()))
+ cpuidle_state_table = shared_states;
+ else
+ cpuidle_state_table = dedicated_states;
+ } else
+ return -ENODEV;
return 0;
}
@@ -324,22 +278,14 @@ static int __init pseries_processor_idle_init(void)
return retval;
pseries_cpuidle_driver_init();
- retval = cpuidle_register_driver(&pseries_idle_driver);
+ retval = cpuidle_register(&pseries_idle_driver, NULL);
if (retval) {
printk(KERN_DEBUG "Registration of pseries driver failed.\n");
return retval;
}
- retval = pseries_idle_devices_init();
- if (retval) {
- pseries_idle_devices_uninit();
- cpuidle_unregister_driver(&pseries_idle_driver);
- return retval;
- }
-
register_cpu_notifier(&setup_hotplug_notifier);
printk(KERN_DEBUG "pseries_idle_driver registered\n");
-
return 0;
}
@@ -347,9 +293,7 @@ static void __exit pseries_processor_idle_exit(void)
{
unregister_cpu_notifier(&setup_hotplug_notifier);
- pseries_idle_devices_uninit();
- cpuidle_unregister_driver(&pseries_idle_driver);
-
+ cpuidle_unregister(&pseries_idle_driver);
return;
}
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox