LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [rtc-linux] [PATCH] rtc/ds3232: Enable ds3232 to work as wakeup source
From: Andrew Morton @ 2014-02-25 22:07 UTC (permalink / raw)
  To: rtc-linux; +Cc: a.zummo, linuxppc-dev, Dongsheng Wang, chenhui.zhao
In-Reply-To: <1390281891-9632-1-git-send-email-dongsheng.wang@freescale.com>

On Tue, 21 Jan 2014 13:24:51 +0800 Dongsheng Wang <dongsheng.wang@freescale.com> wrote:

> From: Wang Dongsheng <dongsheng.wang@freescale.com>
> 
> Add suspend/resume and device_init_wakeup to enable ds3232 as
> wakeup source, /sys/class/rtc/rtcX/wakealarm for set wakeup alarm.
> 
> ...
> 
> @@ -411,23 +424,21 @@ static int ds3232_probe(struct i2c_client *client,
>  	if (ret)
>  		return ret;
>  
> -	ds3232->rtc = devm_rtc_device_register(&client->dev, client->name,
> -					  &ds3232_rtc_ops, THIS_MODULE);
> -	if (IS_ERR(ds3232->rtc)) {
> -		dev_err(&client->dev, "unable to register the class device\n");
> -		return PTR_ERR(ds3232->rtc);
> -	}
> -
> -	if (client->irq >= 0) {
> +	if (client->irq != NO_IRQ) {

x86_64 allmodconfig:

drivers/rtc/rtc-ds3232.c: In function 'ds3232_probe':
drivers/rtc/rtc-ds3232.c:427: error: 'NO_IRQ' undeclared (first use in this function)
drivers/rtc/rtc-ds3232.c:427: error: (Each undeclared identifier is reported only once
drivers/rtc/rtc-ds3232.c:427: error: for each function it appears in.)

Not all architectures implement NO_IRQ.

I think this should be 

	if (client->irq > 0) {

but I'm not sure - iirc, x86 (at least) treats zero as "not an IRQ". 
But I think some architectures permit IRQ 0.  There was discussion many
years ago but I don't think anything got resolved.


Help!  I think some ppc people will know what to do here?

^ permalink raw reply

* Re: [PATCH v2 01/11] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
From: Cody P Schafer @ 2014-02-25 22:19 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC, Arnaldo Carvalho de Melo,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra
  Cc: LKML
In-Reply-To: <530CFE30.70803@linux.vnet.ibm.com>

On 02/25/2014 12:33 PM, Cody P Schafer wrote:
> On 02/24/2014 07:33 PM, Michael Ellerman wrote:
>> On Fri, 2014-14-02 at 22:02:05 UTC, Cody P Schafer wrote:
>>> Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
>>> generate functions to extract the relevent bits from
>>> event->attr.config{,1,2} for use by sw-like pmus where the
>>> 'config{,1,2}' values don't map directly to hardware registers.
>>>
>>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>>> ---
>>>   include/linux/perf_event.h | 17 +++++++++++++++++
>>>   1 file changed, 17 insertions(+)
>>>
>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>> index e56b07f..2702e91 100644
>>> --- a/include/linux/perf_event.h
>>> +++ b/include/linux/perf_event.h
>>> @@ -871,4 +871,21 @@ _name##_show(struct device
>>> *dev,                    \
>>>                                       \
>>>   static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
>>>
>>> +#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)        \
>>> +PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);        \
>>> +PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
>>> +
>>> +#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)        \
>>> +static u64 event_get_##name##_max(void)                    \
>>> +{                                    \
>>> +    int bits = (bit_end) - (bit_start) + 1;                \
>>> +    return ((0x1ULL << (bits - 1ULL)) - 1ULL) |            \
>>> +        (0xFULL << (bits - 4ULL));                \
>>> +}                                    \
>>> +static u64 event_get_##name(struct perf_event *event)            \
>>> +{                                    \
>>> +    return (event->attr.attr_var >> (bit_start)) &            \
>>> +        event_get_##name##_max();                \
>>> +}
>>
>> I still don't like the names.
>>
>> EVENT_GETTER_AND_FORMAT()
>
> EVENT_RANGE()
>
> I'd prefer to describe the intended usage rather than what is generated
> both in case we change some of the specifics later, and to provide
> additional information to the developers beyond what a simple code
> reading gives.
>
>> EVENT_RESERVED()
>
> Sure. The PMU_* naming was just based on the PMU_FORMAT_ATTR() naming,
> so I kept it for continuity with the existing API. Maybe
> EVENT_RANGE_RESERVED() would be more appropriate?
>

Thinking about this a bit more, EVENT_RANGE() and EVENT_RANGE_RESERVED() 
aren't quite ideal either. The "EVENT" name collides with the files we 
put in the event/ dir, which these macros generate files for the format/ 
dir. Maybe:

FORMAT_RANGE() and FORMAT_RANGE_RESERVED()
or
PMU_FORMAT_RANGE(), PMU_FORMAT_RANGE_RESERVED()

^ permalink raw reply

* Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
From: Benjamin Herrenschmidt @ 2014-02-25 22:40 UTC (permalink / raw)
  To: Deepthi Dharwar
  Cc: Madhavan Srinivasan, linuxppc-dev, Wang Dongsheng, linux-kernel,
	Paul Gortmaker, Paul Mackerras, Olof Johansson, Cody P Schafer
In-Reply-To: <530C4D57.1030905@linux.vnet.ibm.com>

On Tue, 2014-02-25 at 13:29 +0530, Deepthi Dharwar wrote:
> We currently do not use smt-snooze-delay in the kernel.
> The sysfs entries needs to  be retained until we do a clean up
> ppc64_cpu
> util that uses these entries to determine SMT,
> clean up patch for this has already been posted out by Prerna.
> Once, we have the ppc64_cpu changes in, we can look to clean up these
> parts from the kernel.

We generally shouldn't change user visible interfaces.

People still have old versions of ppc64_cpu, we must not break them

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
From: Cody P Schafer @ 2014-02-25 22:47 UTC (permalink / raw)
  To: Madhavan Srinivasan, Benjamin Herrenschmidt, Olof Johansson,
	Paul Gortmaker, Wang Dongsheng
  Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <530C21E6.5020106@linux.vnet.ibm.com>

On 02/24/2014 08:53 PM, Madhavan Srinivasan wrote:
> On Saturday 22 February 2014 05:44 AM, Cody P Schafer wrote:
>> /sys/devices/system/cpu/cpu*/smt-snooze-delay was converted into a NOP
>> in commit 3fa8cad82b94d0bed002571bd246f2299ffc876b, and now does
>> nothing. Add a pr_warn() to convince any users that they should stop
>> using it.
>>
>> The commit message from the removing commit notes that this
>> functionality should move into the cpuidle driver, essentially by
>
> Would prefer to cleanup the code since the functionality is moved,
> instead of adding to it.

We'd still want users of the interface to use an attribute wired up 
under the cpuidle/ dir, so a warning (to update their software) is still 
needed. As deepthi has noted, cpuidle right now doesn't support changing 
this on a per-cpu basis, so a "cleanup" isn't a simple matter.

^ permalink raw reply

* Re: [PATCH] powerpc/powernv: Read opal error log and export it through sysfs interface.
From: Stewart Smith @ 2014-02-25 23:19 UTC (permalink / raw)
  To: Mahesh Jagannath Salgaonkar, linuxppc-dev
In-Reply-To: <530C23D3.6020203@linux.vnet.ibm.com>

Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>>  I think we could provide a better interface with instead having a file
>>  per log message appear in sysfs. We're never going to have more than 128
>>  of these at any one time on the Linux side, so it's not going to bee too
>>  many files.
>
> It is not just about 128 files, we may be adding/removing sysfs node for
> every new log id that gets informed to kernel and ack-ed. In worst case,
> when we have flood of elog errors with user daemon consuming it and
> ack-ing back to get ready for next log in a tight poll, we may
> continuously add/remove the sysfs node for each new <id>.

Do we ever get a storm of hundreds/thousands of them though? If many
come it at once userspace may just be woken up one or two times, as it
would just select() and wait for events.

>>  I've seen some conflicting things on this - is it 2kb or 16kb?
>
> We choose 16kb because we want to pull all the log data and not
> partial.

So the max log size for any one entry is in fact 16kb?

>>  This means we constantly use 128 * sizeof(struct opal_err_log) which
>>  equates to somewhere north of 2MB of memory (due to list overhead).
>> 
>>  I don't think we need to statically allocate this, we can probably just
>>  allocate on-demand as in a typical system you're probably quite
>>  unlikely to have too many of these sitting around (besides, if for
>>  whatever reason we cannot allocate memory at some point, that's okay
>>  because we can read it again later).
>
> The reason we choose to go for static allocation is, we can not afford
> to drop or delay a critical error log due to memory allocation failure.
> OR we can keep static allocations for critical errors and follow dynamic
> allocation for informative error logs.  What do you say?

Userspace is probably going to have to do IO to get the log and ack it,
so it's probably not a huge problem - if we can't allocate a few kb in a
couple of attempts then we likely have bigger problems.

If we were going to have a sustained amount of hundreds/thousands of
these per second then perhaps we'd have other issues, but from what I
understand we're probably only going to have a handful per year on a
typical system? (I am, of course, not talking about our dev systems,
which are rather atypical :)

I'll likely have a patch today that shows kind of what I mean.

^ permalink raw reply

* [PATCH 0/7] cpuidle/powernv: Enable Fast-Sleep on PowerNV
From: Preeti U Murthy @ 2014-02-26  0:07 UTC (permalink / raw)
  To: linux-pm, geoff, fweisbec, daniel.lezcano, srivatsa.bhat, benh,
	tglx, svaidy, linuxppc-dev, mingo
  Cc: paulmck, rafael.j.wysocki

This series is based on tip/timers/core ontop of commit
849401b66d305:tick: Fixup more fallout from hrtimer broadcast mode.

Fast sleep is one of the deep idle states on Power8 in which local timers of
CPUs stop. On PowerPC we do not have an external clock device which can
handle wakeup of such CPUs. Now that we have the support in the tick
broadcast framework for archs that do not sport such a device soon to go
upstream, add fast sleep as one of the idle states on PowerNV along with
related arch specific support.

The earlier versions of this patchset included support in the tick broadcast
framework for such idle states. Now that the support in the broadcast
framework has been pulled into tip separately, this series is posted
independently and as a new patchset altogether. This series depends in
particular on the following commits in tip/timers/core:

1.da7e6f45c3:time: Change the return type of clockevents_notify() to integer
2.ba8f20c2eb:cpuidle: Handle clockevents_notify(BROADCAST_ENTER) failure
3.5d1638acb9f62fa:tick: Introduce hrtimer based broadcast
4.f1689bb7abec8e2e6:time: Fixup fallout from recent clockevent/tick changes
5.849401b66d305f3feb75:Fixup more fallout from hrtimer broadcast mode

---

Preeti U Murthy (3):
      cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines
      cpuidle/powernv: Add "Fast-Sleep" CPU idle state
      cpuidle/powernv: Parse device tree to setup idle states

Srivatsa S. Bhat (2):
      powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
      powerpc: Implement tick broadcast IPI as a fixed IPI message

Vaidyanathan Srinivasan (2):
      powernv/cpuidle: Add context management for Fast Sleep
      powermgt: Add OPAL call to resync timebase on wakeup


 arch/powerpc/Kconfig                           |    2 
 arch/powerpc/include/asm/opal.h                |    2 
 arch/powerpc/include/asm/processor.h           |    1 
 arch/powerpc/include/asm/smp.h                 |    2 
 arch/powerpc/include/asm/time.h                |    1 
 arch/powerpc/kernel/exceptions-64s.S           |   10 ++
 arch/powerpc/kernel/idle_power7.S              |   90 +++++++++++++++++----
 arch/powerpc/kernel/smp.c                      |   25 ++++--
 arch/powerpc/kernel/time.c                     |   90 +++++++++++++--------
 arch/powerpc/platforms/cell/interrupt.c        |    2 
 arch/powerpc/platforms/powernv/opal-wrappers.S |    1 
 arch/powerpc/platforms/ps3/smp.c               |    2 
 drivers/cpuidle/cpuidle-powernv.c              |  102 ++++++++++++++++++++++--
 13 files changed, 253 insertions(+), 77 deletions(-)

-- 
Signature

^ permalink raw reply

* [PATCH 1/7] powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
From: Preeti U Murthy @ 2014-02-26  0:07 UTC (permalink / raw)
  To: linux-pm, geoff, fweisbec, daniel.lezcano, srivatsa.bhat, benh,
	tglx, svaidy, linuxppc-dev, mingo
  Cc: paulmck, rafael.j.wysocki
In-Reply-To: <20140226000310.17879.67295.stgit@preeti>

From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

The IPI handlers for both PPC_MSG_CALL_FUNC and PPC_MSG_CALL_FUNC_SINGLE map
to a common implementation - generic_smp_call_function_single_interrupt(). So,
we can consolidate them and save one of the IPI message slots, (which are
precious on powerpc, since only 4 of those slots are available).

So, implement the functionality of PPC_MSG_CALL_FUNC_SINGLE using
PPC_MSG_CALL_FUNC itself and release its IPI message slot, so that it can be
used for something else in the future, if desired.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Geoff Levand <geoff@infradead.org> [For the PS3 part]
---

 arch/powerpc/include/asm/smp.h          |    2 +-
 arch/powerpc/kernel/smp.c               |   12 +++++-------
 arch/powerpc/platforms/cell/interrupt.c |    2 +-
 arch/powerpc/platforms/ps3/smp.c        |    2 +-
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 084e080..9f7356b 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE      1
-#define PPC_MSG_CALL_FUNC_SINGLE	2
+#define PPC_MSG_UNUSED		2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ac2621a..ee7d76b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -145,9 +145,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
-static irqreturn_t call_function_single_action(int irq, void *data)
+static irqreturn_t unused_action(int irq, void *data)
 {
-	generic_smp_call_function_single_interrupt();
+	/* This slot is unused and hence available for use, if needed */
 	return IRQ_HANDLED;
 }
 
@@ -168,14 +168,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
 	[PPC_MSG_CALL_FUNCTION] =  call_function_action,
 	[PPC_MSG_RESCHEDULE] = reschedule_action,
-	[PPC_MSG_CALL_FUNC_SINGLE] = call_function_single_action,
+	[PPC_MSG_UNUSED] = unused_action,
 	[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };
 
 const char *smp_ipi_name[] = {
 	[PPC_MSG_CALL_FUNCTION] =  "ipi call function",
 	[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-	[PPC_MSG_CALL_FUNC_SINGLE] = "ipi call function single",
+	[PPC_MSG_UNUSED] = "ipi unused",
 	[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };
 
@@ -251,8 +251,6 @@ irqreturn_t smp_ipi_demux(void)
 			generic_smp_call_function_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
 			scheduler_ipi();
-		if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNC_SINGLE))
-			generic_smp_call_function_single_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
 			debug_ipi_action(0, NULL);
 	} while (info->messages);
@@ -280,7 +278,7 @@ EXPORT_SYMBOL_GPL(smp_send_reschedule);
 
 void arch_send_call_function_single_ipi(int cpu)
 {
-	do_message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
+	do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index 2d42f3b..adf3726 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -215,7 +215,7 @@ void iic_request_IPIs(void)
 {
 	iic_request_ipi(PPC_MSG_CALL_FUNCTION);
 	iic_request_ipi(PPC_MSG_RESCHEDULE);
-	iic_request_ipi(PPC_MSG_CALL_FUNC_SINGLE);
+	iic_request_ipi(PPC_MSG_UNUSED);
 	iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
 }
 
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 4b35166..00d1a7c 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -76,7 +76,7 @@ static int __init ps3_smp_probe(void)
 
 		BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION    != 0);
 		BUILD_BUG_ON(PPC_MSG_RESCHEDULE       != 1);
-		BUILD_BUG_ON(PPC_MSG_CALL_FUNC_SINGLE != 2);
+		BUILD_BUG_ON(PPC_MSG_UNUSED	      != 2);
 		BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK   != 3);
 
 		for (i = 0; i < MSG_COUNT; i++) {

^ permalink raw reply related

* [PATCH 2/7] powerpc: Implement tick broadcast IPI as a fixed IPI message
From: Preeti U Murthy @ 2014-02-26  0:07 UTC (permalink / raw)
  To: linux-pm, geoff, fweisbec, daniel.lezcano, srivatsa.bhat, benh,
	tglx, svaidy, linuxppc-dev, mingo
  Cc: paulmck, rafael.j.wysocki
In-Reply-To: <20140226000310.17879.67295.stgit@preeti>

From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

For scalability and performance reasons, we want the tick broadcast IPIs
to be handled as efficiently as possible. Fixed IPI messages
are one of the most efficient mechanisms available - they are faster than
the smp_call_function mechanism because the IPI handlers are fixed and hence
they don't involve costly operations such as adding IPI handlers to the target
CPU's function queue, acquiring locks for synchronization etc.

Luckily we have an unused IPI message slot, so use that to implement
tick broadcast IPIs efficiently.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
[Functions renamed to tick_broadcast* and Changelog modified by
 Preeti U. Murthy<preeti@linux.vnet.ibm.com>]
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Geoff Levand <geoff@infradead.org> [For the PS3 part]
---

 arch/powerpc/include/asm/smp.h          |    2 +-
 arch/powerpc/include/asm/time.h         |    1 +
 arch/powerpc/kernel/smp.c               |   21 +++++++++++++++++----
 arch/powerpc/kernel/time.c              |    5 +++++
 arch/powerpc/platforms/cell/interrupt.c |    2 +-
 arch/powerpc/platforms/ps3/smp.c        |    2 +-
 6 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 9f7356b..ff51046 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE      1
-#define PPC_MSG_UNUSED		2
+#define PPC_MSG_TICK_BROADCAST	2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index c1f2676..1d428e6 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -28,6 +28,7 @@ extern struct clock_event_device decrementer_clockevent;
 struct rtc_time;
 extern void to_tm(int tim, struct rtc_time * tm);
 extern void GregorianDay(struct rtc_time *tm);
+extern void tick_broadcast_ipi_handler(void);
 
 extern void generic_calibrate_decr(void);
 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7d76b..e2a4232 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -35,6 +35,7 @@
 #include <asm/ptrace.h>
 #include <linux/atomic.h>
 #include <asm/irq.h>
+#include <asm/hw_irq.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/prom.h>
@@ -145,9 +146,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
-static irqreturn_t unused_action(int irq, void *data)
+static irqreturn_t tick_broadcast_ipi_action(int irq, void *data)
 {
-	/* This slot is unused and hence available for use, if needed */
+	tick_broadcast_ipi_handler();
 	return IRQ_HANDLED;
 }
 
@@ -168,14 +169,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
 	[PPC_MSG_CALL_FUNCTION] =  call_function_action,
 	[PPC_MSG_RESCHEDULE] = reschedule_action,
-	[PPC_MSG_UNUSED] = unused_action,
+	[PPC_MSG_TICK_BROADCAST] = tick_broadcast_ipi_action,
 	[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };
 
 const char *smp_ipi_name[] = {
 	[PPC_MSG_CALL_FUNCTION] =  "ipi call function",
 	[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-	[PPC_MSG_UNUSED] = "ipi unused",
+	[PPC_MSG_TICK_BROADCAST] = "ipi tick-broadcast",
 	[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };
 
@@ -251,6 +252,8 @@ irqreturn_t smp_ipi_demux(void)
 			generic_smp_call_function_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
 			scheduler_ipi();
+		if (all & IPI_MESSAGE(PPC_MSG_TICK_BROADCAST))
+			tick_broadcast_ipi_handler();
 		if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
 			debug_ipi_action(0, NULL);
 	} while (info->messages);
@@ -289,6 +292,16 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 		do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
+#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
+void tick_broadcast(const struct cpumask *mask)
+{
+	unsigned int cpu;
+
+	for_each_cpu(cpu, mask)
+		do_message_pass(cpu, PPC_MSG_TICK_BROADCAST);
+}
+#endif
+
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 void smp_send_debugger_break(void)
 {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3dab20..3ff97db 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -825,6 +825,11 @@ static void decrementer_set_mode(enum clock_event_mode mode,
 		decrementer_set_next_event(DECREMENTER_MAX, dev);
 }
 
+/* Interrupt handler for the timer broadcast IPI */
+void tick_broadcast_ipi_handler(void)
+{
+}
+
 static void register_decrementer_clockevent(int cpu)
 {
 	struct clock_event_device *dec = &per_cpu(decrementers, cpu);
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index adf3726..8a106b4 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -215,7 +215,7 @@ void iic_request_IPIs(void)
 {
 	iic_request_ipi(PPC_MSG_CALL_FUNCTION);
 	iic_request_ipi(PPC_MSG_RESCHEDULE);
-	iic_request_ipi(PPC_MSG_UNUSED);
+	iic_request_ipi(PPC_MSG_TICK_BROADCAST);
 	iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
 }
 
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 00d1a7c..b358bec 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -76,7 +76,7 @@ static int __init ps3_smp_probe(void)
 
 		BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION    != 0);
 		BUILD_BUG_ON(PPC_MSG_RESCHEDULE       != 1);
-		BUILD_BUG_ON(PPC_MSG_UNUSED	      != 2);
+		BUILD_BUG_ON(PPC_MSG_TICK_BROADCAST   != 2);
 		BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK   != 3);
 
 		for (i = 0; i < MSG_COUNT; i++) {

^ permalink raw reply related

* [PATCH 3/7] cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines
From: Preeti U Murthy @ 2014-02-26  0:08 UTC (permalink / raw)
  To: linux-pm, geoff, fweisbec, daniel.lezcano, srivatsa.bhat, benh,
	tglx, svaidy, linuxppc-dev, mingo
  Cc: paulmck, rafael.j.wysocki
In-Reply-To: <20140226000310.17879.67295.stgit@preeti>

Split timer_interrupt(), which is the local timer interrupt handler on ppc
into routines called during regular interrupt handling and __timer_interrupt(),
which takes care of running local timers and collecting time related stats.

This will enable callers interested only in running expired local timers to
directly call into __timer_interupt(). One of the use cases of this is the
tick broadcast IPI handling in which the sleeping CPUs need to handle the local
timers that have expired.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/kernel/time.c |   81 +++++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3ff97db..df2989b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -478,6 +478,47 @@ void arch_irq_work_raise(void)
 
 #endif /* CONFIG_IRQ_WORK */
 
+void __timer_interrupt(void)
+{
+	struct pt_regs *regs = get_irq_regs();
+	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+	struct clock_event_device *evt = &__get_cpu_var(decrementers);
+	u64 now;
+
+	trace_timer_interrupt_entry(regs);
+
+	if (test_irq_work_pending()) {
+		clear_irq_work_pending();
+		irq_work_run();
+	}
+
+	now = get_tb_or_rtc();
+	if (now >= *next_tb) {
+		*next_tb = ~(u64)0;
+		if (evt->event_handler)
+			evt->event_handler(evt);
+		__get_cpu_var(irq_stat).timer_irqs_event++;
+	} else {
+		now = *next_tb - now;
+		if (now <= DECREMENTER_MAX)
+			set_dec((int)now);
+		/* We may have raced with new irq work */
+		if (test_irq_work_pending())
+			set_dec(1);
+		__get_cpu_var(irq_stat).timer_irqs_others++;
+	}
+
+#ifdef CONFIG_PPC64
+	/* collect purr register values often, for accurate calculations */
+	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+		struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
+		cu->current_tb = mfspr(SPRN_PURR);
+	}
+#endif
+
+	trace_timer_interrupt_exit(regs);
+}
+
 /*
  * timer_interrupt - gets called when the decrementer overflows,
  * with interrupts disabled.
@@ -486,8 +527,6 @@ void timer_interrupt(struct pt_regs * regs)
 {
 	struct pt_regs *old_regs;
 	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
-	struct clock_event_device *evt = &__get_cpu_var(decrementers);
-	u64 now;
 
 	/* Ensure a positive value is written to the decrementer, or else
 	 * some CPUs will continue to take decrementer exceptions.
@@ -519,39 +558,7 @@ void timer_interrupt(struct pt_regs * regs)
 	old_regs = set_irq_regs(regs);
 	irq_enter();
 
-	trace_timer_interrupt_entry(regs);
-
-	if (test_irq_work_pending()) {
-		clear_irq_work_pending();
-		irq_work_run();
-	}
-
-	now = get_tb_or_rtc();
-	if (now >= *next_tb) {
-		*next_tb = ~(u64)0;
-		if (evt->event_handler)
-			evt->event_handler(evt);
-		__get_cpu_var(irq_stat).timer_irqs_event++;
-	} else {
-		now = *next_tb - now;
-		if (now <= DECREMENTER_MAX)
-			set_dec((int)now);
-		/* We may have raced with new irq work */
-		if (test_irq_work_pending())
-			set_dec(1);
-		__get_cpu_var(irq_stat).timer_irqs_others++;
-	}
-
-#ifdef CONFIG_PPC64
-	/* collect purr register values often, for accurate calculations */
-	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-		struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
-		cu->current_tb = mfspr(SPRN_PURR);
-	}
-#endif
-
-	trace_timer_interrupt_exit(regs);
-
+	__timer_interrupt();
 	irq_exit();
 	set_irq_regs(old_regs);
 }
@@ -828,6 +835,10 @@ static void decrementer_set_mode(enum clock_event_mode mode,
 /* Interrupt handler for the timer broadcast IPI */
 void tick_broadcast_ipi_handler(void)
 {
+	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+
+	*next_tb = get_tb_or_rtc();
+	__timer_interrupt();
 }
 
 static void register_decrementer_clockevent(int cpu)

^ permalink raw reply related

* [PATCH 4/7] powernv/cpuidle: Add context management for Fast Sleep
From: Preeti U Murthy @ 2014-02-26  0:08 UTC (permalink / raw)
  To: linux-pm, geoff, fweisbec, daniel.lezcano, srivatsa.bhat, benh,
	tglx, svaidy, linuxppc-dev, mingo
  Cc: paulmck, rafael.j.wysocki
In-Reply-To: <20140226000310.17879.67295.stgit@preeti>

From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

Before adding Fast-Sleep into the cpuidle framework, some low level
support needs to be added to enable it. This includes saving and
restoring of certain registers at entry and exit time of this state
respectively just like we do in the NAP idle state.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[Changelog modified by Preeti U. Murthy <preeti@linux.vnet.ibm.com>]
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/include/asm/processor.h |    1 +
 arch/powerpc/kernel/exceptions-64s.S |   10 ++++-
 arch/powerpc/kernel/idle_power7.S    |   63 ++++++++++++++++++++++++----------
 3 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index b62de43..d660dc3 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -450,6 +450,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 extern int powersave_nap;	/* set if nap mode can be used in idle loop */
 extern void power7_nap(void);
+extern void power7_sleep(void);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 38d5073..b01a9cb 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -121,9 +121,10 @@ BEGIN_FTR_SECTION
 	cmpwi	cr1,r13,2
 	/* Total loss of HV state is fatal, we could try to use the
 	 * PIR to locate a PACA, then use an emergency stack etc...
-	 * but for now, let's just stay stuck here
+	 * OPAL v3 based powernv platforms have new idle states
+	 * which fall in this catagory.
 	 */
-	bgt	cr1,.
+	bgt	cr1,8f
 	GET_PACA(r13)
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -141,6 +142,11 @@ BEGIN_FTR_SECTION
 	beq	cr1,2f
 	b	.power7_wakeup_noloss
 2:	b	.power7_wakeup_loss
+
+	/* Fast Sleep wakeup on PowerNV */
+8:	GET_PACA(r13)
+	b 	.power7_wakeup_loss
+
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #endif /* CONFIG_PPC_P7_NAP */
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 3fdef0f..14f78be 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -20,17 +20,27 @@
 
 #undef DEBUG
 
-	.text
+/* Idle state entry routines */
 
-_GLOBAL(power7_idle)
-	/* Now check if user or arch enabled NAP mode */
-	LOAD_REG_ADDRBASE(r3,powersave_nap)
-	lwz	r4,ADDROFF(powersave_nap)(r3)
-	cmpwi	0,r4,0
-	beqlr
-	/* fall through */
+#define	IDLE_STATE_ENTER_SEQ(IDLE_INST)				\
+	/* Magic NAP/SLEEP/WINKLE mode enter sequence */	\
+	std	r0,0(r1);					\
+	ptesync;						\
+	ld	r0,0(r1);					\
+1:	cmp	cr0,r0,r0;					\
+	bne	1b;						\
+	IDLE_INST;						\
+	b	.
 
-_GLOBAL(power7_nap)
+	.text
+
+/*
+ * Pass requested state in r3:
+ * 	0 - nap
+ * 	1 - sleep
+ */
+_GLOBAL(power7_powersave_common)
+	/* Use r3 to pass state nap/sleep/winkle */
 	/* NAP is a state loss, we create a regs frame on the
 	 * stack, fill it up with the state we care about and
 	 * stick a pointer to it in PACAR1. We really only
@@ -79,8 +89,8 @@ _GLOBAL(power7_nap)
 	/* Continue saving state */
 	SAVE_GPR(2, r1)
 	SAVE_NVGPRS(r1)
-	mfcr	r3
-	std	r3,_CCR(r1)
+	mfcr	r4
+	std	r4,_CCR(r1)
 	std	r9,_MSR(r1)
 	std	r1,PACAR1(r13)
 
@@ -90,15 +100,30 @@ _GLOBAL(power7_enter_nap_mode)
 	li	r4,KVM_HWTHREAD_IN_NAP
 	stb	r4,HSTATE_HWTHREAD_STATE(r13)
 #endif
+	cmpwi	cr0,r3,1
+	beq	2f
+	IDLE_STATE_ENTER_SEQ(PPC_NAP)
+	/* No return */
+2:	IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
+	/* No return */
 
-	/* Magic NAP mode enter sequence */
-	std	r0,0(r1)
-	ptesync
-	ld	r0,0(r1)
-1:	cmp	cr0,r0,r0
-	bne	1b
-	PPC_NAP
-	b	.
+_GLOBAL(power7_idle)
+	/* Now check if user or arch enabled NAP mode */
+	LOAD_REG_ADDRBASE(r3,powersave_nap)
+	lwz	r4,ADDROFF(powersave_nap)(r3)
+	cmpwi	0,r4,0
+	beqlr
+	/* fall through */
+
+_GLOBAL(power7_nap)
+	li	r3,0
+	b	power7_powersave_common
+	/* No return */
+
+_GLOBAL(power7_sleep)
+	li	r3,1
+	b	power7_powersave_common
+	/* No return */
 
 _GLOBAL(power7_wakeup_loss)
 	ld	r1,PACAR1(r13)

^ permalink raw reply related

* [PATCH 5/7] powermgt: Add OPAL call to resync timebase on wakeup
From: Preeti U Murthy @ 2014-02-26  0:08 UTC (permalink / raw)
  To: linux-pm, geoff, fweisbec, daniel.lezcano, srivatsa.bhat, benh,
	tglx, svaidy, linuxppc-dev, mingo
  Cc: paulmck, rafael.j.wysocki
In-Reply-To: <20140226000310.17879.67295.stgit@preeti>

From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

During "Fast-sleep" and deeper power savings state, decrementer and
timebase could be stopped making it out of sync with rest
of the cores in the system.

Add a firmware call to request platform to resync timebase
using low level platform methods.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/include/asm/opal.h                |    2 ++
 arch/powerpc/kernel/exceptions-64s.S           |    2 +-
 arch/powerpc/kernel/idle_power7.S              |   27 ++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |    1 +
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 40157e2..c71c72e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -154,6 +154,7 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_FLASH_VALIDATE			76
 #define OPAL_FLASH_MANAGE			77
 #define OPAL_FLASH_UPDATE			78
+#define OPAL_RESYNC_TIMEBASE			79
 #define OPAL_GET_MSG				85
 #define OPAL_CHECK_ASYNC_COMPLETION		86
 #define OPAL_SYNC_HOST_REBOOT			87
@@ -865,6 +866,7 @@ extern void opal_flash_init(void);
 extern int opal_machine_check(struct pt_regs *regs);
 
 extern void opal_shutdown(void);
+extern int opal_resync_timebase(void);
 
 extern void opal_lpc_init(void);
 
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b01a9cb..9533d7a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -145,7 +145,7 @@ BEGIN_FTR_SECTION
 
 	/* Fast Sleep wakeup on PowerNV */
 8:	GET_PACA(r13)
-	b 	.power7_wakeup_loss
+	b 	.power7_wakeup_tb_loss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 14f78be..c3ab869 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -17,6 +17,7 @@
 #include <asm/ppc-opcode.h>
 #include <asm/hw_irq.h>
 #include <asm/kvm_book3s_asm.h>
+#include <asm/opal.h>
 
 #undef DEBUG
 
@@ -125,6 +126,32 @@ _GLOBAL(power7_sleep)
 	b	power7_powersave_common
 	/* No return */
 
+_GLOBAL(power7_wakeup_tb_loss)
+	ld	r2,PACATOC(r13);
+	ld	r1,PACAR1(r13)
+
+	/* Time base re-sync */
+	li	r0,OPAL_RESYNC_TIMEBASE
+	LOAD_REG_ADDR(r11,opal);
+	ld	r12,8(r11);
+	ld	r2,0(r11);
+	mtctr	r12
+	bctrl
+
+	/* TODO: Check r3 for failure */
+
+	REST_NVGPRS(r1)
+	REST_GPR(2, r1)
+	ld	r3,_CCR(r1)
+	ld	r4,_MSR(r1)
+	ld	r5,_NIP(r1)
+	addi	r1,r1,INT_FRAME_SIZE
+	mtcr	r3
+	mfspr	r3,SPRN_SRR1		/* Return SRR1 */
+	mtspr	SPRN_SRR1,r4
+	mtspr	SPRN_SRR0,r5
+	rfid
+
 _GLOBAL(power7_wakeup_loss)
 	ld	r1,PACAR1(r13)
 	REST_NVGPRS(r1)
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 3e8829c..aab54b6 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -126,6 +126,7 @@ OPAL_CALL(opal_return_cpu,			OPAL_RETURN_CPU);
 OPAL_CALL(opal_validate_flash,			OPAL_FLASH_VALIDATE);
 OPAL_CALL(opal_manage_flash,			OPAL_FLASH_MANAGE);
 OPAL_CALL(opal_update_flash,			OPAL_FLASH_UPDATE);
+OPAL_CALL(opal_resync_timebase,			OPAL_RESYNC_TIMEBASE);
 OPAL_CALL(opal_get_msg,				OPAL_GET_MSG);
 OPAL_CALL(opal_check_completion,		OPAL_CHECK_ASYNC_COMPLETION);
 OPAL_CALL(opal_sync_host_reboot,		OPAL_SYNC_HOST_REBOOT);

^ permalink raw reply related

* [PATCH 6/7] cpuidle/powernv: Add "Fast-Sleep" CPU idle state
From: Preeti U Murthy @ 2014-02-26  0:09 UTC (permalink / raw)
  To: linux-pm, geoff, fweisbec, daniel.lezcano, srivatsa.bhat, benh,
	tglx, svaidy, linuxppc-dev, mingo
  Cc: paulmck, rafael.j.wysocki
In-Reply-To: <20140226000310.17879.67295.stgit@preeti>

Fast sleep is one of the deep idle states on Power8 in which local timers of
CPUs stop. On PowerPC we do not have an external clock device which can
handle wakeup of such CPUs. Now that we have the support in the tick broadcast
framework for archs that do not sport such a device and the low level support
for fast sleep, enable it in the cpuidle framework on PowerNV.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/Kconfig              |    2 ++
 arch/powerpc/kernel/time.c        |    4 +++-
 drivers/cpuidle/cpuidle-powernv.c |   34 ++++++++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 957bf34..b841420 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -130,6 +130,8 @@ config PPC
 	select GENERIC_CMOS_UPDATE
 	select GENERIC_TIME_VSYSCALL_OLD
 	select GENERIC_CLOCKEVENTS
+	select GENERIC_CLOCKEVENTS_BROADCAST if SMP
+	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
 	select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index df2989b..122a580 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -42,6 +42,7 @@
 #include <linux/timex.h>
 #include <linux/kernel_stat.h>
 #include <linux/time.h>
+#include <linux/clockchips.h>
 #include <linux/init.h>
 #include <linux/profile.h>
 #include <linux/cpu.h>
@@ -106,7 +107,7 @@ struct clock_event_device decrementer_clockevent = {
 	.irq            = 0,
 	.set_next_event = decrementer_set_next_event,
 	.set_mode       = decrementer_set_mode,
-	.features       = CLOCK_EVT_FEAT_ONESHOT,
+	.features       = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
 };
 EXPORT_SYMBOL(decrementer_clockevent);
 
@@ -944,6 +945,7 @@ void __init time_init(void)
 	clocksource_init();
 
 	init_decrementer_clockevent();
+	tick_setup_hrtimer_broadcast();
 }
 
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 78fd174..4fb97ce 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -11,6 +11,7 @@
 #include <linux/cpuidle.h>
 #include <linux/cpu.h>
 #include <linux/notifier.h>
+#include <linux/clockchips.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
@@ -49,6 +50,32 @@ static int nap_loop(struct cpuidle_device *dev,
 	return index;
 }
 
+static int fastsleep_loop(struct cpuidle_device *dev,
+				struct cpuidle_driver *drv,
+				int index)
+{
+	unsigned long old_lpcr = mfspr(SPRN_LPCR);
+	unsigned long new_lpcr;
+
+	if (unlikely(system_state < SYSTEM_RUNNING))
+		return index;
+
+	new_lpcr = old_lpcr;
+	new_lpcr &= ~(LPCR_MER | LPCR_PECE); /* lpcr[mer] must be 0 */
+
+	/* exit powersave upon external interrupt, but not decrementer
+	 * interrupt.
+	 */
+	new_lpcr |= LPCR_PECE0;
+
+	mtspr(SPRN_LPCR, new_lpcr);
+	power7_sleep();
+
+	mtspr(SPRN_LPCR, old_lpcr);
+
+	return index;
+}
+
 /*
  * States for dedicated partition case.
  */
@@ -67,6 +94,13 @@ static struct cpuidle_state powernv_states[] = {
 		.exit_latency = 10,
 		.target_residency = 100,
 		.enter = &nap_loop },
+	 { /* Fastsleep */
+		.name = "fastsleep",
+		.desc = "fastsleep",
+		.flags = CPUIDLE_FLAG_TIME_VALID,
+		.exit_latency = 10,
+		.target_residency = 100,
+		.enter = &fastsleep_loop },
 };
 
 static int powernv_cpuidle_add_cpu_notifier(struct notifier_block *n,

^ permalink raw reply related

* [PATCH 7/7] cpuidle/powernv: Parse device tree to setup idle states
From: Preeti U Murthy @ 2014-02-26  0:09 UTC (permalink / raw)
  To: linux-pm, geoff, fweisbec, daniel.lezcano, srivatsa.bhat, benh,
	tglx, svaidy, linuxppc-dev, mingo
  Cc: paulmck, rafael.j.wysocki
In-Reply-To: <20140226000310.17879.67295.stgit@preeti>

Add deep idle states such as nap and fast sleep to the cpuidle state table
only if they are discovered from the device tree during cpuidle initialization.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 drivers/cpuidle/cpuidle-powernv.c |   82 +++++++++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 17 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 4fb97ce..fdae1c4 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -12,10 +12,17 @@
 #include <linux/cpu.h>
 #include <linux/notifier.h>
 #include <linux/clockchips.h>
+#include <linux/of.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 
+/* Flags and constants used in PowerNV platform */
+
+#define MAX_POWERNV_IDLE_STATES	8
+#define IDLE_USE_INST_NAP	0x00010000 /* Use nap instruction */
+#define IDLE_USE_INST_SLEEP	0x00020000 /* Use sleep instruction */
+
 struct cpuidle_driver powernv_idle_driver = {
 	.name             = "powernv_idle",
 	.owner            = THIS_MODULE,
@@ -79,7 +86,7 @@ static int fastsleep_loop(struct cpuidle_device *dev,
 /*
  * States for dedicated partition case.
  */
-static struct cpuidle_state powernv_states[] = {
+static struct cpuidle_state powernv_states[MAX_POWERNV_IDLE_STATES] = {
 	{ /* Snooze */
 		.name = "snooze",
 		.desc = "snooze",
@@ -87,20 +94,6 @@ static struct cpuidle_state powernv_states[] = {
 		.exit_latency = 0,
 		.target_residency = 0,
 		.enter = &snooze_loop },
-	{ /* NAP */
-		.name = "NAP",
-		.desc = "NAP",
-		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
-		.enter = &nap_loop },
-	 { /* Fastsleep */
-		.name = "fastsleep",
-		.desc = "fastsleep",
-		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
-		.enter = &fastsleep_loop },
 };
 
 static int powernv_cpuidle_add_cpu_notifier(struct notifier_block *n,
@@ -161,19 +154,74 @@ static int powernv_cpuidle_driver_init(void)
 	return 0;
 }
 
+static int powernv_add_idle_states(void)
+{
+	struct device_node *power_mgt;
+	struct property *prop;
+	int nr_idle_states = 1; /* Snooze */
+	int dt_idle_states;
+	u32 *flags;
+	int i;
+
+	/* Currently we have snooze statically defined */
+
+	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+	if (!power_mgt) {
+		pr_warn("opal: PowerMgmt Node not found\n");
+		return nr_idle_states;
+	}
+
+	prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
+	if (!prop) {
+		pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+		return nr_idle_states;
+	}
+
+	dt_idle_states = prop->length / sizeof(u32);
+	flags = (u32 *) prop->value;
+
+	for (i = 0; i < dt_idle_states; i++) {
+
+		if (flags[i] & IDLE_USE_INST_NAP) {
+			/* Add NAP state */
+			strcpy(powernv_states[nr_idle_states].name, "Nap");
+			strcpy(powernv_states[nr_idle_states].desc, "Nap");
+			powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID;
+			powernv_states[nr_idle_states].exit_latency = 10;
+			powernv_states[nr_idle_states].target_residency = 100;
+			powernv_states[nr_idle_states].enter = &nap_loop;
+			nr_idle_states++;
+		}
+
+		if (flags[i] & IDLE_USE_INST_SLEEP) {
+			/* Add FASTSLEEP state */
+			strcpy(powernv_states[nr_idle_states].name, "FastSleep");
+			strcpy(powernv_states[nr_idle_states].desc, "FastSleep");
+			powernv_states[nr_idle_states].flags =
+				CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TIMER_STOP;
+			powernv_states[nr_idle_states].exit_latency = 300;
+			powernv_states[nr_idle_states].target_residency = 1000000;
+			powernv_states[nr_idle_states].enter = &fastsleep_loop;
+			nr_idle_states++;
+		}
+	}
+
+	return nr_idle_states;
+}
+
 /*
  * powernv_idle_probe()
  * Choose state table for shared versus dedicated partition
  */
 static int powernv_idle_probe(void)
 {
-
 	if (cpuidle_disable != IDLE_NO_OVERRIDE)
 		return -ENODEV;
 
 	if (firmware_has_feature(FW_FEATURE_OPALv3)) {
 		cpuidle_state_table = powernv_states;
-		max_idle_state = ARRAY_SIZE(powernv_states);
+		/* Device tree can indicate more idle states */
+		max_idle_state = powernv_add_idle_states();
  	} else
  		return -ENODEV;
 

^ permalink raw reply related

* [PATCH] powerpc: ftrace: bugfix for test_24bit_addr
From: Liu Ping Fan @ 2014-02-26  2:23 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Paul Mackerras

The branch target should be the func addr, not the addr of func_descr_t.
So using ppc_function_entry() to generate the right target addr.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
This bug will make ftrace fail to work. It can be triggered when the kernel
size grows up.
---
 arch/powerpc/kernel/ftrace.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index 9b27b29..b0ded97 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -74,6 +74,7 @@ ftrace_modify_code(unsigned long ip, unsigned int old, unsigned int new)
  */
 static int test_24bit_addr(unsigned long ip, unsigned long addr)
 {
+	addr = ppc_function_entry((void *)addr);
 
 	/* use the create_branch to verify that this offset can be branched */
 	return create_branch((unsigned int *)ip, addr, 0);
-- 
1.8.1.4

^ permalink raw reply related

* RE: [rtc-linux] [PATCH] rtc/ds3232: Enable ds3232 to work as wakeup source
From: Dongsheng.Wang @ 2014-02-26  3:09 UTC (permalink / raw)
  To: Andrew Morton, rtc-linux@googlegroups.com,
	benh@kernel.crashing.org
  Cc: Scott Wood, a.zummo@towertech.it, linuxppc-dev@lists.ozlabs.org,
	chenhui.zhao@freescale.com
In-Reply-To: <20140225140705.fe9f7038bbffbfbde899e0f7@linux-foundation.org>



> -----Original Message-----
> From: Andrew Morton [mailto:akpm@linux-foundation.org]
> Sent: Wednesday, February 26, 2014 6:07 AM
> To: rtc-linux@googlegroups.com
> Cc: Wang Dongsheng-B40534; a.zummo@towertech.it; Zhao Chenhui-B35336; lin=
uxppc-
> dev@lists.ozlabs.org
> Subject: Re: [rtc-linux] [PATCH] rtc/ds3232: Enable ds3232 to work as wak=
eup
> source
>=20
> On Tue, 21 Jan 2014 13:24:51 +0800 Dongsheng Wang <dongsheng.wang@freesca=
le.com>
> wrote:
>=20
> > From: Wang Dongsheng <dongsheng.wang@freescale.com>
> >
> > Add suspend/resume and device_init_wakeup to enable ds3232 as
> > wakeup source, /sys/class/rtc/rtcX/wakealarm for set wakeup alarm.
> >
> > ...
> >
> > @@ -411,23 +424,21 @@ static int ds3232_probe(struct i2c_client *client=
,
> >  	if (ret)
> >  		return ret;
> >
> > -	ds3232->rtc =3D devm_rtc_device_register(&client->dev, client->name,
> > -					  &ds3232_rtc_ops, THIS_MODULE);
> > -	if (IS_ERR(ds3232->rtc)) {
> > -		dev_err(&client->dev, "unable to register the class device\n");
> > -		return PTR_ERR(ds3232->rtc);
> > -	}
> > -
> > -	if (client->irq >=3D 0) {
> > +	if (client->irq !=3D NO_IRQ) {
>=20
> x86_64 allmodconfig:
>=20
> drivers/rtc/rtc-ds3232.c: In function 'ds3232_probe':
> drivers/rtc/rtc-ds3232.c:427: error: 'NO_IRQ' undeclared (first use in th=
is
> function)
> drivers/rtc/rtc-ds3232.c:427: error: (Each undeclared identifier is repor=
ted
> only once
> drivers/rtc/rtc-ds3232.c:427: error: for each function it appears in.)
>=20
> Not all architectures implement NO_IRQ.
>=20
> I think this should be
>=20
> 	if (client->irq > 0) {
>=20
> but I'm not sure - iirc, x86 (at least) treats zero as "not an IRQ".
> But I think some architectures permit IRQ 0.  There was discussion many
> years ago but I don't think anything got resolved.
>=20
I think this is why NO_IRQ is defined in kernel, that should be resolved th=
is issue.

Sorry, I don't know why some architectures didn't define this macro?


Hi Ben,

Did you have some suggestion?

Thanks,
-Dongsheng

>=20
> Help!  I think some ppc people will know what to do here?
>=20

^ permalink raw reply

* Re: [PATCH 0/4] powernv: kvm: numa fault improvement
From: Liu ping fan @ 2014-02-26  3:09 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Paul Mackerras, linuxppc-dev, Aneesh Kumar K.V, kvm-ppc
In-Reply-To: <62DB3340-5AF7-4DA4-A790-77EE00696F57@suse.de>

Sorry to update lately. It takes a long time to apply for test machine
and then, I hit a series of other bugs which I could not resolve
easily. And for now, I have some high priority task, and will come
back to this topic when time is available.
Besides this, I had do some basic test for numa-fault and no
numa-fault test for HV guest, it shows that 10% drop in performance
when  numa-fault is on. (Test with $pg_access_random 60 4 200, and
guest has 10GB mlocked pages ).
I thought this is caused based on the following factors: cache-miss,
tlb-miss, guest->host exit and hw-thread cooperate to exit from guest
state.  Hope my patches to be helpful to reduce the cost of
guest->host exit and hw-thread cooperate to exit.

My test case launches 4 threads on guest( as 4 hw-threads ), and each
of them has random access to PAGE_ALIGN area.
Hope from some suggestion about the test case, so when I had time, I
could improve and finish the test.

Thanks,
Fan

--- test case: usage: pg_random_access  secs  fork_num  mem_size---
#include <ctype.h>
#include <errno.h>
#include <libgen.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/timerfd.h>
#include <time.h>
#include <stdint.h>        /* Definition of uint64_t */
#include <poll.h>


/* */
#define CMD_STOP 0x1234
#define SHM_FNAME "/numafault_shm"
#define PAGE_SIZE (1<<12)

/* the protocol defined on the shm */
#define SHM_CMD_OFF 0x0
#define SHM_CNT_OFF 0x1
#define SHM_MESSAGE_OFF 0x2

#define handle_error(msg) \
        do { perror(msg); exit(EXIT_FAILURE); } while (0)


void __inline__ random_access(void *region_start, int len)
{
        int *p;
        int num;

        num = random();
        num &= ~(PAGE_SIZE - 1);
        num &= (len - 1);
        p = region_start + num;
        *p = 0x654321;
}

static int numafault_body(int size_MB)
{
        /* since MB is always align on PAGE_SIZE, so it is ok to test
fault on page */
        int size = size_MB*1024*1024;
        void *region_start = malloc(size);
        unsigned long *pmap;
        int shm_fid;
        unsigned long cnt = 0;
        pid_t pid = getpid();
        char *dst;
        char buf[128];

        shm_fid = shm_open(SHM_FNAME, O_RDWR, S_IRUSR | S_IWUSR);
        ftruncate(shm_fid, 2*sizeof(long));
        pmap = mmap(NULL, 2*sizeof(long), PROT_WRITE | PROT_READ,
MAP_SHARED, shm_fid, 0);
        if (!pmap) {
                printf("child fail to setup mmap of shm\n");
                return -1;
        }

        while (*(pmap+SHM_CMD_OFF) != CMD_STOP){
                random_access(region_start, size);
                cnt++;
        }

        __atomic_fetch_add((pmap+SHM_CNT_OFF), cnt, __ATOMIC_SEQ_CST);
        dst = (char *)(pmap+SHM_MESSAGE_OFF);
        //tofix, need lock
        sprintf(buf, "child [%i] cnt=%u\n\0", pid, cnt);
        strcat(dst, buf);

        munmap(pmap, 2*sizeof(long));
        shm_unlink(SHM_FNAME);
        fprintf(stdout, "[%s] cnt=%lu\n", pid, cnt);
        fflush(stdout);
        exit(0);

}

int main(int argc, char **argv)
{
        int i;
        pid_t pid;
        int shm_fid;
        unsigned long *pmap;
        int fork_num;
        int size;
        char *dst_info;

        struct itimerspec new_value;
        int fd;
        struct timespec now;
        uint64_t exp, tot_exp;
        ssize_t s;
        struct pollfd pfd;
        int elapsed;

        if (argc != 4){
            fprintf(stderr, "%s wait-secs [secs elapsed before parent
asks the children to exit]\n \
                    fork-num [child num]\n \
                    size [memory region covered by each child in MB]\n",
                    argv[0]);
            exit(EXIT_FAILURE);
        }
        elapsed = atoi(argv[1]);
        fork_num = atoi(argv[2]);
        size = atoi(argv[3]);
        printf("fork %i child process to test mem %i MB for a period: %i sec\n",
                fork_num, size, elapsed);

        fd = timerfd_create(CLOCK_REALTIME, 0);
        if (fd == -1)
            handle_error("timerfd_create");


        shm_fid = shm_open(SHM_FNAME, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
        ftruncate(shm_fid, PAGE_SIZE);
        pmap = mmap(NULL, PAGE_SIZE, PROT_WRITE | PROT_READ,
MAP_SHARED, shm_fid, 0);
        if (!pmap) {
                printf("fail to setup mmap of shm\n");
                return -1;
        }
        memset(pmap, 0, 2*sizeof(long));
        //wmb();

        for (i = 0; i < fork_num; i++){
                switch (pid = fork())
                {
                case 0:            /* child */
                        numafault_body(size);
                        exit(0);
                case -1:           /* error */
                        err (stderr, "fork failed: %s\n", strerror (errno));
                        break;
                default:           /* parent */
                        printf("fork child [%i]\n", pid);
                }
        }

        if (clock_gettime(CLOCK_REALTIME, &now) == -1)
                handle_error("clock_gettime");

        /* Create a CLOCK_REALTIME absolute timer with initial
expiration and interval as specified in command line */

        new_value.it_value.tv_sec = now.tv_sec + elapsed;
        new_value.it_value.tv_nsec = now.tv_nsec;
        new_value.it_interval.tv_sec = 0;
        new_value.it_interval.tv_nsec = 0;

        if (timerfd_settime(fd, TFD_TIMER_ABSTIME, &new_value, NULL) == -1)
                handle_error("timerfd_settime");

        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        /* -1: infinite wait */
        poll(&pfd, 1, -1);



        /* ask children to stop and get back cnt */

        *(pmap + SHM_CMD_OFF) = CMD_STOP;

        wait(NULL);
        dst_info = (char *)(pmap + SHM_MESSAGE_OFF);
        printf(dst_info);
        printf("total cnt:%lu\n", *(pmap + SHM_CNT_OFF));

        munmap(pmap, PAGE_SIZE);
        shm_unlink(SHM_FNAME);
}




On Mon, Jan 20, 2014 at 10:48 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 15.01.2014, at 07:36, Liu ping fan <kernelfans@gmail.com> wrote:
>
>> On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf <agraf@suse.de> wrote:
>>>
>>> On 11.12.2013, at 09:47, Liu Ping Fan <kernelfans@gmail.com> wrote:
>>>
>>>> This series is based on Aneesh's series  "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64"
>>>>
>>>> For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA"
>>>> (for which, I still try to get a machine to show nums)
>>>>
>>>> But for this series, I think that I have a good justification -- the fact of heavy cost when switching context between guest and host,
>>>> which is  well known.
>>>
>>> This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve what you're trying and convince your readers that it's a good idea to do it the way you do it.
>>>
>> Sorry for the unclear message. After introducing the _PAGE_NUMA,
>> kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
>> should rely on host's kvmppc_book3s_hv_page_fault() to call
>> do_numa_page() to do the numa fault check. This incurs the overhead
>> when exiting from rmode to vmode.  My idea is that in
>> kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
>> there is no need to exit to vmode (i.e saving htab, slab switching)
>>
>>>> If my suppose is correct, will CCing kvm@vger.kernel.org from next version.
>>>
>>> This translates to me as "This is an RFC"?
>>>
>> Yes, I am not quite sure about it. I have no bare-metal to verify it.
>> So I hope at least, from the theory, it is correct.
>
> Paul, could you please give this some thought and maybe benchmark it?
>
>
> Alex
>

^ permalink raw reply

* Re: [rtc-linux] [PATCH] rtc/ds3232: Enable ds3232 to work as wakeup source
From: Scott Wood @ 2014-02-26  3:20 UTC (permalink / raw)
  To: Wang Dongsheng-B40534
  Cc: a.zummo@towertech.it, Zhao Chenhui-B35336,
	rtc-linux@googlegroups.com, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <3eba8b896e624b698ab8291399c5a1b0@BN1PR03MB188.namprd03.prod.outlook.com>

On Tue, 2014-02-25 at 21:09 -0600, Wang Dongsheng-B40534 wrote:
> 
> > -----Original Message-----
> > From: Andrew Morton [mailto:akpm@linux-foundation.org]
> > Sent: Wednesday, February 26, 2014 6:07 AM
> > To: rtc-linux@googlegroups.com
> > Cc: Wang Dongsheng-B40534; a.zummo@towertech.it; Zhao Chenhui-B35336; linuxppc-
> > dev@lists.ozlabs.org
> > Subject: Re: [rtc-linux] [PATCH] rtc/ds3232: Enable ds3232 to work as wakeup
> > source
> > 
> > On Tue, 21 Jan 2014 13:24:51 +0800 Dongsheng Wang <dongsheng.wang@freescale.com>
> > wrote:
> > 
> > > +	if (client->irq != NO_IRQ) {
> > 
> > x86_64 allmodconfig:
> > 
> > drivers/rtc/rtc-ds3232.c: In function 'ds3232_probe':
> > drivers/rtc/rtc-ds3232.c:427: error: 'NO_IRQ' undeclared (first use in this
> > function)
> > drivers/rtc/rtc-ds3232.c:427: error: (Each undeclared identifier is reported
> > only once
> > drivers/rtc/rtc-ds3232.c:427: error: for each function it appears in.)
> > 
> > Not all architectures implement NO_IRQ.
> > 
> > I think this should be
> > 
> > 	if (client->irq > 0) {
> > 
> > but I'm not sure - iirc, x86 (at least) treats zero as "not an IRQ".
> > But I think some architectures permit IRQ 0.  There was discussion many
> > years ago but I don't think anything got resolved.
> > 
> I think this is why NO_IRQ is defined in kernel, that should be resolved this issue.
> 
> Sorry, I don't know why some architectures didn't define this macro?

NO_IRQ is deprecated (see "git log -SNO_IRQ" for the trend of removing
uses of it, as well as situations where it gives the wrong results).
"if (client->irq > 0)" is correct.

-Scott

^ permalink raw reply

* RE: [rtc-linux] [PATCH] rtc/ds3232: Enable ds3232 to work as wakeup source
From: Dongsheng.Wang @ 2014-02-26  3:26 UTC (permalink / raw)
  To: Scott Wood
  Cc: a.zummo@towertech.it, chenhui.zhao@freescale.com,
	rtc-linux@googlegroups.com, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1393384830.6733.987.camel@snotra.buserror.net>

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogV29vZCBTY290dC1CMDc0
MjENCj4gU2VudDogV2VkbmVzZGF5LCBGZWJydWFyeSAyNiwgMjAxNCAxMToyMSBBTQ0KPiBUbzog
V2FuZyBEb25nc2hlbmctQjQwNTM0DQo+IENjOiBBbmRyZXcgTW9ydG9uOyBydGMtbGludXhAZ29v
Z2xlZ3JvdXBzLmNvbTsgYmVuaEBrZXJuZWwuY3Jhc2hpbmcub3JnOw0KPiBhLnp1bW1vQHRvd2Vy
dGVjaC5pdDsgWmhhbyBDaGVuaHVpLUIzNTMzNjsgbGludXhwcGMtZGV2QGxpc3RzLm96bGFicy5v
cmcNCj4gU3ViamVjdDogUmU6IFtydGMtbGludXhdIFtQQVRDSF0gcnRjL2RzMzIzMjogRW5hYmxl
IGRzMzIzMiB0byB3b3JrIGFzIHdha2V1cA0KPiBzb3VyY2UNCj4gDQo+IE9uIFR1ZSwgMjAxNC0w
Mi0yNSBhdCAyMTowOSAtMDYwMCwgV2FuZyBEb25nc2hlbmctQjQwNTM0IHdyb3RlOg0KPiA+DQo+
ID4gPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+ID4gRnJvbTogQW5kcmV3IE1vcnRv
biBbbWFpbHRvOmFrcG1AbGludXgtZm91bmRhdGlvbi5vcmddDQo+ID4gPiBTZW50OiBXZWRuZXNk
YXksIEZlYnJ1YXJ5IDI2LCAyMDE0IDY6MDcgQU0NCj4gPiA+IFRvOiBydGMtbGludXhAZ29vZ2xl
Z3JvdXBzLmNvbQ0KPiA+ID4gQ2M6IFdhbmcgRG9uZ3NoZW5nLUI0MDUzNDsgYS56dW1tb0B0b3dl
cnRlY2guaXQ7IFpoYW8gQ2hlbmh1aS1CMzUzMzY7DQo+IGxpbnV4cHBjLQ0KPiA+ID4gZGV2QGxp
c3RzLm96bGFicy5vcmcNCj4gPiA+IFN1YmplY3Q6IFJlOiBbcnRjLWxpbnV4XSBbUEFUQ0hdIHJ0
Yy9kczMyMzI6IEVuYWJsZSBkczMyMzIgdG8gd29yayBhcyB3YWtldXANCj4gPiA+IHNvdXJjZQ0K
PiA+ID4NCj4gPiA+IE9uIFR1ZSwgMjEgSmFuIDIwMTQgMTM6MjQ6NTEgKzA4MDAgRG9uZ3NoZW5n
IFdhbmcNCj4gPGRvbmdzaGVuZy53YW5nQGZyZWVzY2FsZS5jb20+DQo+ID4gPiB3cm90ZToNCj4g
PiA+DQo+ID4gPiA+ICsJaWYgKGNsaWVudC0+aXJxICE9IE5PX0lSUSkgew0KPiA+ID4NCj4gPiA+
IHg4Nl82NCBhbGxtb2Rjb25maWc6DQo+ID4gPg0KPiA+ID4gZHJpdmVycy9ydGMvcnRjLWRzMzIz
Mi5jOiBJbiBmdW5jdGlvbiAnZHMzMjMyX3Byb2JlJzoNCj4gPiA+IGRyaXZlcnMvcnRjL3J0Yy1k
czMyMzIuYzo0Mjc6IGVycm9yOiAnTk9fSVJRJyB1bmRlY2xhcmVkIChmaXJzdCB1c2UgaW4gdGhp
cw0KPiA+ID4gZnVuY3Rpb24pDQo+ID4gPiBkcml2ZXJzL3J0Yy9ydGMtZHMzMjMyLmM6NDI3OiBl
cnJvcjogKEVhY2ggdW5kZWNsYXJlZCBpZGVudGlmaWVyIGlzIHJlcG9ydGVkDQo+ID4gPiBvbmx5
IG9uY2UNCj4gPiA+IGRyaXZlcnMvcnRjL3J0Yy1kczMyMzIuYzo0Mjc6IGVycm9yOiBmb3IgZWFj
aCBmdW5jdGlvbiBpdCBhcHBlYXJzIGluLikNCj4gPiA+DQo+ID4gPiBOb3QgYWxsIGFyY2hpdGVj
dHVyZXMgaW1wbGVtZW50IE5PX0lSUS4NCj4gPiA+DQo+ID4gPiBJIHRoaW5rIHRoaXMgc2hvdWxk
IGJlDQo+ID4gPg0KPiA+ID4gCWlmIChjbGllbnQtPmlycSA+IDApIHsNCj4gPiA+DQo+ID4gPiBi
dXQgSSdtIG5vdCBzdXJlIC0gaWlyYywgeDg2IChhdCBsZWFzdCkgdHJlYXRzIHplcm8gYXMgIm5v
dCBhbiBJUlEiLg0KPiA+ID4gQnV0IEkgdGhpbmsgc29tZSBhcmNoaXRlY3R1cmVzIHBlcm1pdCBJ
UlEgMC4gIFRoZXJlIHdhcyBkaXNjdXNzaW9uIG1hbnkNCj4gPiA+IHllYXJzIGFnbyBidXQgSSBk
b24ndCB0aGluayBhbnl0aGluZyBnb3QgcmVzb2x2ZWQuDQo+ID4gPg0KPiA+IEkgdGhpbmsgdGhp
cyBpcyB3aHkgTk9fSVJRIGlzIGRlZmluZWQgaW4ga2VybmVsLCB0aGF0IHNob3VsZCBiZSByZXNv
bHZlZCB0aGlzDQo+IGlzc3VlLg0KPiA+DQo+ID4gU29ycnksIEkgZG9uJ3Qga25vdyB3aHkgc29t
ZSBhcmNoaXRlY3R1cmVzIGRpZG4ndCBkZWZpbmUgdGhpcyBtYWNybz8NCj4gDQo+IE5PX0lSUSBp
cyBkZXByZWNhdGVkIChzZWUgImdpdCBsb2cgLVNOT19JUlEiIGZvciB0aGUgdHJlbmQgb2YgcmVt
b3ZpbmcNCj4gdXNlcyBvZiBpdCwgYXMgd2VsbCBhcyBzaXR1YXRpb25zIHdoZXJlIGl0IGdpdmVz
IHRoZSB3cm9uZyByZXN1bHRzKS4NCj4gImlmIChjbGllbnQtPmlycSA+IDApIiBpcyBjb3JyZWN0
Lg0KPiANClRoYW5rcy4NCg0KLURvbmdzaGVuZw0KDQo+IC1TY290dA0KPiANCg0K

^ permalink raw reply

* Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
From: Michael Ellerman @ 2014-02-26  3:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Deepthi Dharwar, Madhavan Srinivasan, Cody P Schafer,
	Wang Dongsheng, linux-kernel, Paul Gortmaker, Paul Mackerras,
	Olof Johansson, linuxppc-dev
In-Reply-To: <1393368002.6314.10.camel@pasglop>

On Wed, 2014-02-26 at 09:40 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2014-02-25 at 13:29 +0530, Deepthi Dharwar wrote:
> > We currently do not use smt-snooze-delay in the kernel.
> > The sysfs entries needs to  be retained until we do a clean up
> > ppc64_cpu
> > util that uses these entries to determine SMT,
> > clean up patch for this has already been posted out by Prerna.
> > Once, we have the ppc64_cpu changes in, we can look to clean up these
> > parts from the kernel.
> 
> We generally shouldn't change user visible interfaces.
> 
> People still have old versions of ppc64_cpu, we must not break them

Yeah we can't remove the file entirely, at least for a few more years.

ppc64_cpu should never have used that file to determine if a cpu existed, but
it did, so we're stuck with it.

What we can do is remove the unused percpu, and just leave the file in sysfs,
and have it print a warning when anyone touches it.

cheers

^ permalink raw reply

* Re: [PATCH v2 10/11] powerpc/perf: add kconfig option for hypervisor provided counters
From: Michael Ellerman @ 2014-02-26  3:48 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: Paul Bolle, Peter Zijlstra, LKML, Tang Yuantian, Ingo Molnar,
	Paul Mackerras, Aneesh Kumar K.V, Arnaldo Carvalho de Melo,
	Priyanka Jain, Scott Wood, Lijun Pan, Anshuman Khandual,
	Linux PPC, Anton Blanchard
In-Reply-To: <530D0BA1.90609@linux.vnet.ibm.com>

On Tue, 2014-02-25 at 13:31 -0800, Cody P Schafer wrote:
> On 02/24/2014 07:33 PM, Michael Ellerman wrote:
> > On Fri, 2014-14-02 at 22:02:14 UTC, Cody P Schafer wrote:
> >> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> >> ---
> >>   arch/powerpc/perf/Makefile             | 2 ++
> >>   arch/powerpc/platforms/Kconfig.cputype | 6 ++++++
> >>   2 files changed, 8 insertions(+)
> >>
> >> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> >> index 60d71ee..f9c083a 100644
> >> --- a/arch/powerpc/perf/Makefile
> >> +++ b/arch/powerpc/perf/Makefile
> >> @@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
> >>   obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
> >>   obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
> >>
> >> +obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
> >> +
> >>   obj-$(CONFIG_PPC64)		+= $(obj64-y)
> >>   obj-$(CONFIG_PPC32)		+= $(obj32-y)
> >> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> >> index 434fda3..dcc67cd 100644
> >> --- a/arch/powerpc/platforms/Kconfig.cputype
> >> +++ b/arch/powerpc/platforms/Kconfig.cputype
> >> @@ -364,6 +364,12 @@ config PPC_PERF_CTRS
> >>          help
> >>            This enables the powerpc-specific perf_event back-end.
> >>
> >> +config HV_PERF_CTRS
> >> +       def_bool y
> >
> > This was bool, why did you change it?
> 
> No, it wasn't. v1 also had def_bool. https://lkml.org/lkml/2014/1/16/518
> Maybe you're confusing v2.1 and v2 of this patch?

Er yes. Point releases of a patch series confuse me :)

> >> +       depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT
> >
> > Should be:
> >
> > 	depends on PERF_EVENTS && PPC_PSERIES
> >
> >> +       help
> >> +         Enable access to perf counters provided by the hypervisor
> >> +
> 
> Yep, the v2.1 patch (which I bungled and labeled as 9/11) already 
> changes both of these.
> It'll end up rolled into v3.

Yes please.

cheers

^ permalink raw reply

* [prefix=PATCH v5 2/3] powerpc/pseries: Update dynamic cache nodes for suspend/resume operation
From: Tyrel Datwyler @ 2014-02-26  4:02 UTC (permalink / raw)
  To: benh; +Cc: nfont, linuxppc-dev, Tyrel Datwyler
In-Reply-To: <1392843415-17153-3-git-send-email-tyreld@linux.vnet.ibm.com>

From: Haren Myneni <hbabu@us.ibm.com>

pHyp can change cache nodes for suspend/resume operation. Currently the
device tree is updated by drmgr in userspace after all non boot CPUs are
enabled. Hence, we do not modify the cache list based on the latest cache
nodes. Also we do not remove cache entries for the primary CPU.

This patch removes the cache list for the boot CPU, updates the device tree
before enabling nonboot CPUs and adds cache list for the boot cpu.

This patch also has the side effect that older versions of drmgr will
perform a second device tree update from userspace. While this is a
redundant waste of a couple cycles it is harmless since firmware returns the
same data for the subsequent update-nodes/properties rtas calls.

Signed-off-by: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---

Changes from v4:
- fixes build break for !SMP configs.
 

 arch/powerpc/include/asm/rtas.h          |  1 +
 arch/powerpc/kernel/cacheinfo.c          |  7 +++++--
 arch/powerpc/platforms/pseries/suspend.c | 19 +++++++++++++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9bd52c6..a0e1add 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -283,6 +283,7 @@ extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 
 #ifdef CONFIG_PPC_PSERIES
 extern int pseries_devicetree_update(s32 scope);
+extern void post_mobility_fixup(void);
 #endif
 
 #ifdef CONFIG_PPC_RTAS_DAEMON
diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 2912b87..40198d5 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -756,7 +756,10 @@ void cacheinfo_cpu_online(unsigned int cpu_id)
 	cacheinfo_sysfs_populate(cpu_id, cache);
 }
 
-#ifdef CONFIG_HOTPLUG_CPU /* functions needed for cpu offline */
+/* functions needed to remove cache entry for cpu offline or suspend/resume */
+
+#if (defined(CONFIG_PPC_PSERIES) && defined(CONFIG_SUSPEND)) || \
+    defined(CONFIG_HOTPLUG_CPU)
 
 static struct cache *cache_lookup_by_cpu(unsigned int cpu_id)
 {
@@ -843,4 +846,4 @@ void cacheinfo_cpu_offline(unsigned int cpu_id)
 	if (cache)
 		cache_cpu_clear(cache, cpu_id);
 }
-#endif /* CONFIG_HOTPLUG_CPU */
+#endif /* (CONFIG_PPC_PSERIES && CONFIG_SUSPEND) || CONFIG_HOTPLUG_CPU */
diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 16a2552..1d9c580 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -26,6 +26,7 @@
 #include <asm/mmu.h>
 #include <asm/rtas.h>
 #include <asm/topology.h>
+#include "../../kernel/cacheinfo.h"
 
 static u64 stream_id;
 static struct device suspend_dev;
@@ -79,6 +80,23 @@ static int pseries_suspend_cpu(void)
 }
 
 /**
+ * pseries_suspend_enable_irqs
+ *
+ * Post suspend configuration updates
+ *
+ **/
+static void pseries_suspend_enable_irqs(void)
+{
+	/*
+	 * Update configuration which can be modified based on device tree
+	 * changes during resume.
+	 */
+	cacheinfo_cpu_offline(smp_processor_id());
+	post_mobility_fixup();
+	cacheinfo_cpu_online(smp_processor_id());
+}
+
+/**
  * pseries_suspend_enter - Final phase of hibernation
  *
  * Return value:
@@ -235,6 +253,7 @@ static int __init pseries_suspend_init(void)
 		return rc;
 
 	ppc_md.suspend_disable_cpu = pseries_suspend_cpu;
+	ppc_md.suspend_enable_irqs = pseries_suspend_enable_irqs;
 	suspend_set_ops(&pseries_suspend_ops);
 	return 0;
 }
-- 
1.7.12.2

^ permalink raw reply related

* Re: [PATCH v4 0/3] powerpc/pseries: fix issues in suspend/resume code
From: Tyrel Datwyler @ 2014-02-26  4:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: nfont, linuxppc-dev
In-Reply-To: <1393206791.6771.157.camel@pasglop>

On 02/23/2014 05:53 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-02-19 at 12:56 -0800, Tyrel Datwyler wrote:
>> This patchset fixes a couple of issues encountered in the suspend/resume code
>> base. First when using the kernel device tree update code update-nodes is
>> unnecessarily called more than once. Second the cpu cache lists are not
>> updated after a suspend/resume which under certain conditions may cause a
>> panic. Finally, since the cache list fix utilzes in kernel device tree update
>> code a means for telling drmgr that updating the device tree from userspace
>> is unnecessary.
> 
> This series breaks the !SMP build.
> 
> Cheers,
> Ben.

The second patch in this series created the build breakage. I've replied
to that patch with a new version that fixes the break.

-Tyrel

> 
>> Changes from v3:
>> - Updated patch descriptions to better reflect the behavior/interface changes
>>   and why they are acceptable per Ben's concerns.
>>
>> Changes from v2:
>> - Moved dynamic configuration update code into pseries specific routine
>>   per Nathan's suggestion.
>>
>> Changes from v1:
>> - Fixed several commit message typos
>> - Fixed authorship of first two patches
>>
>> Haren Myneni (2):
>>   powerpc/pseries: Device tree should only be updated once after
>>     suspend/migrate
>>   powerpc/pseries: Update dynamic cache nodes for suspend/resume
>>     operation
>>
>> Tyrel Datwyler (1):
>>   powerpc/pseries: Report in kernel device tree update to drmgr
>>
>>  arch/powerpc/include/asm/rtas.h           |  1 +
>>  arch/powerpc/platforms/pseries/mobility.c | 26 +++++++-----------
>>  arch/powerpc/platforms/pseries/suspend.c  | 44 ++++++++++++++++++++++++++++++-
>>  3 files changed, 54 insertions(+), 17 deletions(-)
>>
> 
> 

^ permalink raw reply

* Re: [PATCH] powerpc: ftrace: bugfix for test_24bit_addr
From: Ananth N Mavinakayanahalli @ 2014-02-26  4:35 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: Paul Mackerras, linuxppc-dev, Anton Blanchard
In-Reply-To: <1393381381-30179-1-git-send-email-pingfank@linux.vnet.ibm.com>

On Wed, Feb 26, 2014 at 10:23:01AM +0800, Liu Ping Fan wrote:
> The branch target should be the func addr, not the addr of func_descr_t.
> So using ppc_function_entry() to generate the right target addr.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
> This bug will make ftrace fail to work. It can be triggered when the kernel
> size grows up.
> ---
>  arch/powerpc/kernel/ftrace.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
> index 9b27b29..b0ded97 100644
> --- a/arch/powerpc/kernel/ftrace.c
> +++ b/arch/powerpc/kernel/ftrace.c
> @@ -74,6 +74,7 @@ ftrace_modify_code(unsigned long ip, unsigned int old, unsigned int new)
>   */
>  static int test_24bit_addr(unsigned long ip, unsigned long addr)
>  {
> +	addr = ppc_function_entry((void *)addr);

Won't this break on LE?

^ permalink raw reply

* [PATCH v2] powerpc/powernv Platform dump interface
From: Stewart Smith @ 2014-02-26  5:42 UTC (permalink / raw)
  To: benh, linuxppc-dev, Vasant Hegde; +Cc: Stewart Smith
In-Reply-To: <1393293485-6018-1-git-send-email-stewart@linux.vnet.ibm.com>

This enables support for userspace to fetch and initiate FSP and
Platform dumps from the service processor (via firmware) through sysfs.

Based on original patch from Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

Flow:
  - We register for OPAL notification events.
  - OPAL sends new dump available notification.
  - We make information on dump available via sysfs
  - Userspace requests dump contents
  - We retrieve the dump via OPAL interface
  - User copies the dump data
  - userspace sends ack for dump
  - We send ACK to OPAL.

sysfs files:
  - We add the /sys/firmware/opal/dump directory
  - echoing 1 (well, anything, but in future we may support
    different dump types) to /sys/firmware/opal/dump/initiate_dump
    will initiate a dump.
  - Each dump that we've been notified of gets a directory
    in /sys/firmware/opal/dump/ with a name of the dump type and ID (in hex,
    as this is what's used elsewhere to identify the dump).
  - Each dump has files: id, type, dump and acknowledge
    dump is binary and is the dump itself.
    echoing 'ack' to acknowledge (currently any string will do) will
    acknowledge the dump and it will soon after disappear from sysfs.

OPAL APIs:
  - opal_dump_init()
  - opal_dump_info()
  - opal_dump_read()
  - opal_dump_ack()
  - opal_dump_resend_notification()

Currently we are only ever notified for one dump at a time (until
the user explicitly acks the current dump, then we get a notification
of the next dump), but this kernel code should "just work" when OPAL
starts notifying us of all the dumps present.

Changes since v1:
 - Add support for getting dump type from OPAL through new OPAL call
   (falling back to old OPAL_DUMP_INFO call if OPAL_DUMP_INFO2 isn't
    supported)
 - use dump type in directory name for dump

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
---
 Documentation/ABI/stable/sysfs-firmware-opal-dump |   41 ++
 arch/powerpc/include/asm/opal.h                   |   14 +
 arch/powerpc/platforms/powernv/Makefile           |    2 +-
 arch/powerpc/platforms/powernv/opal-dump.c        |  530 +++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S    |    6 +
 arch/powerpc/platforms/powernv/opal.c             |    2 +
 6 files changed, 594 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/stable/sysfs-firmware-opal-dump
 create mode 100644 arch/powerpc/platforms/powernv/opal-dump.c

diff --git a/Documentation/ABI/stable/sysfs-firmware-opal-dump b/Documentation/ABI/stable/sysfs-firmware-opal-dump
new file mode 100644
index 0000000..32fe7f5
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-firmware-opal-dump
@@ -0,0 +1,41 @@
+What:		/sys/firmware/opal/dump
+Date:		Feb 2014
+Contact:	Stewart Smith <stewart@linux.vnet.ibm.com>
+Description:
+		This directory exposes interfaces for interacting with
+		the FSP and platform dumps through OPAL firmware interface.
+
+		This is only for the powerpc/powernv platform.
+
+		initiate_dump:	When '1' is written to it,
+				we will initiate a dump.
+				Read this file for supported commands.
+
+		0xXX-0xYYYY:	A directory for dump of type 0xXX and
+				id 0xYYYY (in hex). The name of this
+				directory should not be relied upon to
+				be in this format, only that it's unique
+				among all dumps. For determining the type
+				and ID of the dump, use the id and type files.
+				Do not rely on any particular size of dump
+				type or dump id.
+
+		Each dump has the following files:
+		id:		An ASCII representation of the dump ID
+				in hex (e.g. '0x01')
+		type:		An ASCII representation of the type of
+				dump in the format "0x%x %s" with the ID
+				in hex and a description of the dump type
+				(or 'unknown').
+				Type '0xffffffff unknown' is used when
+				we could not get the type from firmware.
+				e.g. '0x02 System/Platform Dump'
+		dump:		A binary file containing the dump.
+				The size of the dump is the size of this file.
+		acknowledge:	When 'ack' is written to this, we will
+				acknowledge that we've retrieved the
+				dump to the service processor. It will
+				then remove it, making the dump
+				inaccessible.
+				Reading this file will get a list of
+				supported actions.
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 40157e2..89c840c 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -154,9 +154,15 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_FLASH_VALIDATE			76
 #define OPAL_FLASH_MANAGE			77
 #define OPAL_FLASH_UPDATE			78
+#define OPAL_DUMP_INIT				81
+#define OPAL_DUMP_INFO				82
+#define OPAL_DUMP_READ				83
+#define OPAL_DUMP_ACK				84
 #define OPAL_GET_MSG				85
 #define OPAL_CHECK_ASYNC_COMPLETION		86
+#define OPAL_DUMP_RESEND			91
 #define OPAL_SYNC_HOST_REBOOT			87
+#define OPAL_DUMP_INFO2				94
 
 #ifndef __ASSEMBLY__
 
@@ -237,6 +243,7 @@ enum OpalPendingState {
 	OPAL_EVENT_EPOW			= 0x80,
 	OPAL_EVENT_LED_STATUS		= 0x100,
 	OPAL_EVENT_PCI_ERROR		= 0x200,
+	OPAL_EVENT_DUMP_AVAIL		= 0x400,
 	OPAL_EVENT_MSG_PENDING		= 0x800,
 };
 
@@ -826,6 +833,12 @@ int64_t opal_lpc_read(uint32_t chip_id, enum OpalLPCAddressType addr_type,
 int64_t opal_validate_flash(uint64_t buffer, uint32_t *size, uint32_t *result);
 int64_t opal_manage_flash(uint8_t op);
 int64_t opal_update_flash(uint64_t blk_list);
+int64_t opal_dump_init(uint8_t dump_type);
+int64_t opal_dump_info(uint32_t *dump_id, uint32_t *dump_size);
+int64_t opal_dump_info2(uint32_t *dump_id, uint32_t *dump_size, uint32_t *dump_type);
+int64_t opal_dump_read(uint32_t dump_id, uint64_t buffer);
+int64_t opal_dump_ack(uint32_t dump_id);
+int64_t opal_dump_resend_notification(void);
 
 int64_t opal_get_msg(uint64_t buffer, size_t size);
 int64_t opal_check_completion(uint64_t buffer, size_t size, uint64_t token);
@@ -861,6 +874,7 @@ extern void opal_get_rtc_time(struct rtc_time *tm);
 extern unsigned long opal_get_boot_time(void);
 extern void opal_nvram_init(void);
 extern void opal_flash_init(void);
+extern void opal_platform_dump_init(void);
 
 extern int opal_machine_check(struct pt_regs *regs);
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 8d767fd..3528c11 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,6 +1,6 @@
 obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o
 obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
-obj-y			+= rng.o
+obj-y			+= rng.o opal-dump.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
new file mode 100644
index 0000000..11abc0a
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-dump.c
@@ -0,0 +1,530 @@
+/*
+ * PowerNV OPAL Dump Interface
+ *
+ * Copyright 2013,2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kobject.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/pagemap.h>
+#include <linux/delay.h>
+
+#include <asm/opal.h>
+
+#define DUMP_TYPE_FSP	0x01
+
+struct dump_obj {
+	struct kobject  kobj;
+	struct bin_attribute dump_attr;
+	uint32_t	id;  /* becomes object name */
+	uint32_t	type;
+	uint32_t	size;
+	char		*buffer;
+};
+#define to_dump_obj(x) container_of(x, struct dump_obj, kobj)
+
+struct dump_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct dump_obj *dump, struct dump_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct dump_obj *dump, struct dump_attribute *attr,
+			 const char *buf, size_t count);
+};
+#define to_dump_attr(x) container_of(x, struct dump_attribute, attr)
+
+static ssize_t dump_id_show(struct dump_obj *dump_obj,
+			    struct dump_attribute *attr,
+			    char *buf)
+{
+	return sprintf(buf, "0x%x\n", dump_obj->id);
+}
+
+static const char* dump_type_to_string(uint32_t type)
+{
+	switch (type) {
+	case 0x01: return "SP Dump";
+	case 0x02: return "System/Platform Dump";
+	case 0x03: return "SMA Dump";
+	default: return "unknown";
+	}
+}
+
+static ssize_t dump_type_show(struct dump_obj *dump_obj,
+			      struct dump_attribute *attr,
+			      char *buf)
+{
+	
+	return sprintf(buf, "0x%x %s\n", dump_obj->type,
+		       dump_type_to_string(dump_obj->type));
+}
+
+static ssize_t dump_ack_show(struct dump_obj *dump_obj,
+			     struct dump_attribute *attr,
+			     char *buf)
+{
+	return sprintf(buf, "ack - acknowledge dump\n");
+}
+
+/*
+ * Send acknowledgement to OPAL
+ */
+static int64_t dump_send_ack(uint32_t dump_id)
+{
+	int rc;
+
+	rc = opal_dump_ack(dump_id);
+	if (rc)
+		pr_warn("%s: Failed to send ack to Dump ID 0x%x (%d)\n",
+			__func__, dump_id, rc);
+	return rc;
+}
+
+static void delay_release_kobj(void *kobj)
+{
+	kobject_put((struct kobject *)kobj);
+}
+
+static ssize_t dump_ack_store(struct dump_obj *dump_obj,
+			      struct dump_attribute *attr,
+			      const char *buf,
+			      size_t count)
+{
+	dump_send_ack(dump_obj->id);
+	sysfs_schedule_callback(&dump_obj->kobj, delay_release_kobj,
+				&dump_obj->kobj, THIS_MODULE);
+	return count;
+}
+
+/* Attributes of a dump
+ * The binary attribute of the dump itself is dynamic
+ * due to the dynamic size of the dump
+ */
+static struct dump_attribute id_attribute =
+	__ATTR(id, 0666, dump_id_show, NULL);
+static struct dump_attribute type_attribute =
+	__ATTR(type, 0666, dump_type_show, NULL);
+static struct dump_attribute ack_attribute =
+	__ATTR(acknowledge, 0660, dump_ack_show, dump_ack_store);
+
+static ssize_t init_dump_show(struct dump_obj *dump_obj,
+			      struct dump_attribute *attr,
+			      char *buf)
+{
+	return sprintf(buf, "1 - initiate dump\n");
+}
+
+static int64_t dump_fips_init(uint8_t type)
+{
+	int rc;
+
+	rc = opal_dump_init(type);
+	if (rc)
+		pr_warn("%s: Failed to initiate FipS dump (%d)\n",
+			__func__, rc);
+	return rc;
+}
+
+static ssize_t init_dump_store(struct dump_obj *dump_obj,
+			       struct dump_attribute *attr,
+			       const char *buf,
+			       size_t count)
+{
+	dump_fips_init(DUMP_TYPE_FSP);
+	pr_info("%s: Initiated FSP dump\n", __func__);
+	return count;
+}
+
+static struct dump_attribute initiate_attribute =
+	__ATTR(initiate_dump, 0600, init_dump_show, init_dump_store);
+
+static struct attribute *initiate_attrs[] = {
+	&initiate_attribute.attr,
+	NULL,
+};
+
+static struct attribute_group initiate_attr_group = {
+	.attrs = initiate_attrs,
+};
+
+static struct kset *dump_kset;
+
+static ssize_t dump_attr_show(struct kobject *kobj,
+			      struct attribute *attr,
+			      char *buf)
+{
+	struct dump_attribute *attribute;
+	struct dump_obj *dump;
+
+	attribute = to_dump_attr(attr);
+	dump = to_dump_obj(kobj);
+
+	if (!attribute->show)
+		return -EIO;
+
+	return attribute->show(dump, attribute, buf);
+}
+
+static ssize_t dump_attr_store(struct kobject *kobj,
+			       struct attribute *attr,
+			       const char *buf, size_t len)
+{
+	struct dump_attribute *attribute;
+	struct dump_obj *dump;
+
+	attribute = to_dump_attr(attr);
+	dump = to_dump_obj(kobj);
+
+	if (!attribute->store)
+		return -EIO;
+
+	return attribute->store(dump, attribute, buf, len);
+}
+
+static const struct sysfs_ops dump_sysfs_ops = {
+	.show = dump_attr_show,
+	.store = dump_attr_store,
+};
+
+static void dump_release(struct kobject *kobj)
+{
+	struct dump_obj *dump;
+
+	dump = to_dump_obj(kobj);
+	vfree(dump->buffer);
+	kfree(dump);
+}
+
+static struct attribute *dump_default_attrs[] = {
+	&id_attribute.attr,
+	&type_attribute.attr,
+	&ack_attribute.attr,
+	NULL,
+};
+
+static struct kobj_type dump_ktype = {
+	.sysfs_ops = &dump_sysfs_ops,
+	.release = &dump_release,
+	.default_attrs = dump_default_attrs,
+};
+
+static void free_dump_sg_list(struct opal_sg_list *list)
+{
+	struct opal_sg_list *sg1;
+	while (list) {
+		sg1 = list->next;
+		kfree(list);
+		list = sg1;
+	}
+	list = NULL;
+}
+
+static struct opal_sg_list *dump_data_to_sglist(struct dump_obj *dump)
+{
+	struct opal_sg_list *sg1, *list = NULL;
+	void *addr;
+	int64_t size;
+
+	addr = dump->buffer;
+	size = dump->size;
+
+	sg1 = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!sg1)
+		goto nomem;
+
+	list = sg1;
+	sg1->num_entries = 0;
+	while (size > 0) {
+		/* Translate virtual address to physical address */
+		sg1->entry[sg1->num_entries].data =
+			(void *)(vmalloc_to_pfn(addr) << PAGE_SHIFT);
+
+		if (size > PAGE_SIZE)
+			sg1->entry[sg1->num_entries].length = PAGE_SIZE;
+		else
+			sg1->entry[sg1->num_entries].length = size;
+
+		sg1->num_entries++;
+		if (sg1->num_entries >= SG_ENTRIES_PER_NODE) {
+			sg1->next = kzalloc(PAGE_SIZE, GFP_KERNEL);
+			if (!sg1->next)
+				goto nomem;
+
+			sg1 = sg1->next;
+			sg1->num_entries = 0;
+		}
+		addr += PAGE_SIZE;
+		size -= PAGE_SIZE;
+	}
+	return list;
+
+nomem:
+	pr_err("%s : Failed to allocate memory\n", __func__);
+	free_dump_sg_list(list);
+	return NULL;
+}
+
+static void sglist_to_phy_addr(struct opal_sg_list *list)
+{
+	struct opal_sg_list *sg, *next;
+
+	for (sg = list; sg; sg = next) {
+		next = sg->next;
+		/* Don't translate NULL pointer for last entry */
+		if (sg->next)
+			sg->next = (struct opal_sg_list *)__pa(sg->next);
+		else
+			sg->next = NULL;
+
+		/* Convert num_entries to length */
+		sg->num_entries =
+			sg->num_entries * sizeof(struct opal_sg_entry) + 16;
+	}
+}
+
+static int64_t dump_read_info(uint32_t *id, uint32_t *size, uint32_t *type)
+{
+	int rc;
+	*type = 0xffffffff;
+
+	rc = opal_dump_info2(id, size, type);
+
+	if (rc == OPAL_PARAMETER)
+		rc = opal_dump_info(id, size);
+
+	if (rc)
+		pr_warn("%s: Failed to get dump info (%d)\n",
+			__func__, rc);
+	return rc;
+}
+
+static int64_t dump_read_data(struct dump_obj *dump)
+{
+	struct opal_sg_list *list;
+	uint64_t addr;
+	int64_t rc;
+
+	/* Allocate memory */
+	dump->buffer = vzalloc(PAGE_ALIGN(dump->size));
+	if (!dump->buffer) {
+		pr_err("%s : Failed to allocate memory\n", __func__);
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	/* Generate SG list */
+	list = dump_data_to_sglist(dump);
+	if (!list) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	/* Translate sg list addr to real address */
+	sglist_to_phy_addr(list);
+
+	/* First entry address */
+	addr = __pa(list);
+
+	/* Fetch data */
+	rc = OPAL_BUSY_EVENT;
+	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
+		rc = opal_dump_read(dump->id, addr);
+		if (rc == OPAL_BUSY_EVENT) {
+			opal_poll_events(NULL);
+			msleep(20);
+		}
+	}
+
+	if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL)
+		pr_warn("%s: Extract dump failed for ID 0x%x\n",
+			__func__, dump->id);
+
+	/* Free SG list */
+	free_dump_sg_list(list);
+
+out:
+	return rc;
+}
+
+static ssize_t dump_attr_read(struct file *filep, struct kobject *kobj,
+			      struct bin_attribute *bin_attr,
+			      char *buffer, loff_t pos, size_t count)
+{
+	ssize_t rc;
+
+	struct dump_obj *dump = to_dump_obj(kobj);
+
+	if (!dump->buffer) {
+		rc = dump_read_data(dump);
+
+		if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
+			vfree(dump->buffer);
+			dump->buffer = NULL;
+
+			return -EIO;
+		}
+		if (rc == OPAL_PARTIAL) {
+			/* On a partial read, we just return EIO
+			 * and rely on userspace to ask us to try
+			 * again.
+			 */
+			pr_info("%s: Platform dump partially read.ID = 0x%x\n",
+				__func__, dump->id);
+			return -EIO;
+		}
+	}
+
+	memcpy(buffer, dump->buffer + pos, count);
+
+	/* If we're done reading, let's free the buffer as we
+	 * probably won't need it around anymore (and can always
+	 * re-fetch it from firmware.
+	*/
+	if ((pos+count) >= dump->size) {
+		pr_info("%s: Freeing Platform dump Id = 0x%x\n",
+			__func__, dump->id);
+		vfree(dump->buffer);
+		dump->buffer = NULL;
+	}
+
+	return count;
+}
+
+static struct dump_obj *create_dump_obj(uint32_t id, size_t size,
+					uint32_t type)
+{
+	struct dump_obj *dump;
+	int rc;
+
+	dump = kzalloc(sizeof(*dump), GFP_KERNEL);
+	if (!dump)
+		return NULL;
+
+	dump->kobj.kset = dump_kset;
+
+	kobject_init(&dump->kobj, &dump_ktype);
+
+	sysfs_bin_attr_init(&dump->dump_attr);
+
+	dump->dump_attr.attr.name = "dump";
+	dump->dump_attr.attr.mode = 0400;
+	dump->dump_attr.size = size;
+	dump->dump_attr.read = dump_attr_read;
+
+	dump->id = id;
+	dump->size = size;
+	dump->type = type;
+
+	rc = kobject_add(&dump->kobj, NULL, "0x%x-0x%x", type, id);
+	if (rc) {
+		kobject_put(&dump->kobj);
+		return NULL;
+	}
+
+	rc = sysfs_create_bin_file(&dump->kobj, &dump->dump_attr);
+	if (rc) {
+		kobject_put(&dump->kobj);
+		return NULL;
+	}
+
+	pr_info("%s: New platform dump. ID = 0x%x Size %u\n",
+		__func__, dump->id, dump->size);
+
+	kobject_uevent(&dump->kobj, KOBJ_ADD);
+
+	return dump;
+}
+
+static int process_dump(void)
+{
+	int rc;
+	uint32_t dump_id, dump_size, dump_type;
+	struct dump_obj *dump;
+	char name[22];
+
+	rc = dump_read_info(&dump_id, &dump_size, &dump_type);
+	if (rc != OPAL_SUCCESS)
+		return rc;
+
+	sprintf(name, "0x%x-0x%x", dump_type, dump_id);
+
+	/* we may get notified twice, let's handle
+	 * that gracefully and not create two conflicting
+	 * entries.
+	 */
+	if (kset_find_obj(dump_kset, name))
+		return 0;
+
+	dump = create_dump_obj(dump_id, dump_size, dump_type);
+	if (!dump)
+		return -1;
+
+	return 0;
+}
+
+static void dump_work_fn(struct work_struct *work)
+{
+	process_dump();
+}
+
+static DECLARE_WORK(dump_work, dump_work_fn);
+
+static void schedule_process_dump(void)
+{
+	schedule_work(&dump_work);
+}
+
+/*
+ * New dump available notification
+ *
+ * Once we get notification, we add sysfs entries for it.
+ * We only fetch the dump on demand, and create sysfs asynchronously.
+ */
+static int dump_event(struct notifier_block *nb,
+		      unsigned long events, void *change)
+{
+	if (events & OPAL_EVENT_DUMP_AVAIL)
+		schedule_process_dump();
+
+	return 0;
+}
+
+static struct notifier_block dump_nb = {
+	.notifier_call  = dump_event,
+	.next           = NULL,
+	.priority       = 0
+};
+
+void __init opal_platform_dump_init(void)
+{
+	int rc;
+
+	dump_kset = kset_create_and_add("dump", NULL, opal_kobj);
+	if (!dump_kset) {
+		pr_warn("%s: Failed to create dump kset\n", __func__);
+		return;
+	}
+
+	rc = sysfs_create_group(&dump_kset->kobj, &initiate_attr_group);
+	if (rc) {
+		pr_warn("%s: Failed to create initiate dump attr group\n",
+			__func__);
+		kobject_put(&dump_kset->kobj);
+		return;
+	}
+
+	rc = opal_notifier_register(&dump_nb);
+	if (rc) {
+		pr_warn("%s: Can't register OPAL event notifier (%d)\n",
+			__func__, rc);
+		return;
+	}
+
+	opal_dump_resend_notification();
+}
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 3e8829c..3e02783 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -126,6 +126,12 @@ OPAL_CALL(opal_return_cpu,			OPAL_RETURN_CPU);
 OPAL_CALL(opal_validate_flash,			OPAL_FLASH_VALIDATE);
 OPAL_CALL(opal_manage_flash,			OPAL_FLASH_MANAGE);
 OPAL_CALL(opal_update_flash,			OPAL_FLASH_UPDATE);
+OPAL_CALL(opal_dump_init,			OPAL_DUMP_INIT);
+OPAL_CALL(opal_dump_info,			OPAL_DUMP_INFO);
+OPAL_CALL(opal_dump_info2,			OPAL_DUMP_INFO2);
+OPAL_CALL(opal_dump_read,			OPAL_DUMP_READ);
+OPAL_CALL(opal_dump_ack,			OPAL_DUMP_ACK);
 OPAL_CALL(opal_get_msg,				OPAL_GET_MSG);
 OPAL_CALL(opal_check_completion,		OPAL_CHECK_ASYNC_COMPLETION);
+OPAL_CALL(opal_dump_resend_notification,	OPAL_DUMP_RESEND);
 OPAL_CALL(opal_sync_host_reboot,		OPAL_SYNC_HOST_REBOOT);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 65499ad..262cd1a 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -474,6 +474,8 @@ static int __init opal_init(void)
 	if (rc == 0) {
 		/* Setup code update interface */
 		opal_flash_init();
+		/* Setup platform dump extract interface */
+		opal_platform_dump_init();
 	}
 
 	return 0;
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] powerpc: ftrace: bugfix for test_24bit_addr
From: Liu ping fan @ 2014-02-26  6:06 UTC (permalink / raw)
  To: ananth; +Cc: Paul Mackerras, linuxppc-dev, Anton Blanchard
In-Reply-To: <20140226043552.GA14672@in.ibm.com>

On Wed, Feb 26, 2014 at 12:35 PM, Ananth N Mavinakayanahalli
<ananth@in.ibm.com> wrote:
> On Wed, Feb 26, 2014 at 10:23:01AM +0800, Liu Ping Fan wrote:
>> The branch target should be the func addr, not the addr of func_descr_t.
>> So using ppc_function_entry() to generate the right target addr.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>> This bug will make ftrace fail to work. It can be triggered when the kernel
>> size grows up.
>> ---
>>  arch/powerpc/kernel/ftrace.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
>> index 9b27b29..b0ded97 100644
>> --- a/arch/powerpc/kernel/ftrace.c
>> +++ b/arch/powerpc/kernel/ftrace.c
>> @@ -74,6 +74,7 @@ ftrace_modify_code(unsigned long ip, unsigned int old, unsigned int new)
>>   */
>>  static int test_24bit_addr(unsigned long ip, unsigned long addr)
>>  {
>> +     addr = ppc_function_entry((void *)addr);
>
> Won't this break on LE?
>
How? I can not figure out it. Anyway, ppc_function_entry() is already
used in other places with LE.

Thx,
Fan

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox