LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH] mtd: m25p80: Make the name of mtd_info fixed
From: Brian Norris @ 2014-01-23  2:12 UTC
  To: Hou Zhiqiang
  Cc: scottwood, linuxppc-dev, mingkai.hu, linux-mtd, Ezequiel Garcia
In-Reply-To: <1388990069-27066-1-git-send-email-b48286@freescale.com>

Hi Hou,

On Mon, Jan 06, 2014 at 02:34:29PM +0800, Hou Zhiqiang wrote:
> To describe the SPI flash layout with "mtdparts=..." on the command line, we
> must give mtd_info a fixed name, because the cmdlinepart parser matches the
> name given on the command line against the mtd_info name.
> 
> Now, when an OF node is used, mtd_info's name will be dev_name(&spi->dev). It
> includes spi_master->bus_num, and spi_master->bus_num may be
> dynamically allocated.
> So, give the mtd_info a new fixed name "name.cs", where "name" is the name of
> the spi_device_id and "cs" is the chip select of the spi_device.
> 
> Signed-off-by: Hou Zhiqiang <b48286@freescale.com>
> ---
>  drivers/mtd/devices/m25p80.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
> index eb558e8..d1ed480 100644
> --- a/drivers/mtd/devices/m25p80.c
> +++ b/drivers/mtd/devices/m25p80.c
> @@ -1012,7 +1012,8 @@ static int m25p_probe(struct spi_device *spi)
>  	if (data && data->name)
>  		flash->mtd.name = data->name;
>  	else
> -		flash->mtd.name = dev_name(&spi->dev);
> +		flash->mtd.name = kasprintf(GFP_KERNEL, "%s.%d",
> +				id->name, spi->chip_select);

Changing the mtd.name may have far-reaching consequences for users who
already have mtdparts= command lines. But your concern is probably valid
for dynamically-determined bus numbers. Perhaps you can edit this patch
to only change the name when the busnum is dynamically-allocated?

This also needs a NULL check (for OOM), and you leak memory on device
removal.

>  
>  	flash->mtd.type = MTD_NORFLASH;
>  	flash->mtd.writesize = 1;

Brian
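Brian's three points (rename only when the bus number is dynamic, check the allocation for NULL, and free the name on removal) can be modelled in userspace C. This is a hypothetical sketch, not the m25p80 driver: `struct fake_flash`, `fake_probe()` and `fake_remove()` are illustrative names, and `malloc()`/`free()` stand in for the kernel's `kasprintf()`/`kfree()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fields of flash->mtd used here. */
struct fake_flash {
	const char *name;  /* plays the role of flash->mtd.name */
	char *owned_name;  /* non-NULL only if probe allocated the name */
};

/* Build "<id_name>.<cs>"; returns NULL if the allocation fails. */
static char *make_fixed_name(const char *id_name, int chip_select)
{
	int len;
	char *buf;

	assert(id_name != NULL);
	len = snprintf(NULL, 0, "%s.%d", id_name, chip_select);
	if (len < 0)
		return NULL;
	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;
	snprintf(buf, (size_t)len + 1, "%s.%d", id_name, chip_select);
	return buf;
}

/*
 * Keep the existing dev_name()-style name when the bus number is static,
 * so existing mtdparts= command lines still match; only build a fixed
 * name when the bus number was allocated dynamically.
 */
static int fake_probe(struct fake_flash *f, const char *dev_name,
		      int bus_num_is_dynamic, const char *id_name, int cs)
{
	f->owned_name = NULL;
	if (!bus_num_is_dynamic) {
		f->name = dev_name;
		return 0;
	}
	f->owned_name = make_fixed_name(id_name, cs);
	if (!f->owned_name)
		return -ENOMEM;  /* the OOM check requested in review */
	f->name = f->owned_name;
	return 0;
}

/* Release the name on device removal; free(NULL) is a harmless no-op. */
static void fake_remove(struct fake_flash *f)
{
	free(f->owned_name);
	f->owned_name = NULL;
	f->name = NULL;
}
```

Tracking ownership in a separate pointer lets the remove path free unconditionally, whichever naming branch probe took.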


* Re: [PATCH 2/3] powerpc/85xx: Provide two functions to save/restore the core registers
From: Scott Wood @ 2014-01-23  0:50 UTC
  To: Wang Dongsheng-B40534
  Cc: anton@enomsg.org, linuxppc-dev@lists.ozlabs.org,
	Zhao Chenhui-B35336
In-Reply-To: <7db8f36932f84bc5bdcb8a7777c55383@BN1PR03MB188.namprd03.prod.outlook.com>

On Mon, 2014-01-20 at 20:43 -0600, Wang Dongsheng-B40534 wrote:
> 
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Tuesday, January 21, 2014 9:06 AM
> > To: Wang Dongsheng-B40534
> > Cc: benh@kernel.crashing.org; Zhao Chenhui-B35336; anton@enomsg.org; linuxppc-
> > dev@lists.ozlabs.org
> > Subject: Re: [PATCH 2/3] powerpc/85xx: Provide two functions to save/restore the
> > core registers
> > 
> > On Mon, 2014-01-20 at 00:03 -0600, Wang Dongsheng-B40534 wrote:
> > > > > > > +	/*
> > > > > > > +	 * Need to save floating-point registers if MSR[FP] = 1.
> > > > > > > +	 */
> > > > > > > +	mfmsr	r12
> > > > > > > +	andi.	r12, r12, MSR_FP
> > > > > > > +	beq	1f
> > > > > > > +	do_sr_fpr_regs(save)
> > > > > >
> > > > > > C code should have already ensured that MSR[FP] is not 1 (and thus the
> > > > > > FP context has been saved).
> > > > > >
> > > > >
> > > > > Yes, right. But I mean that if FP is still used in the core save flow, we
> > > > > need to save it. In this process, I don't care what other code does; we
> > > > > need to focus on not losing valuable data.
> > > >
> > > > It is not allowed to use FP at that point.
> > > >
> > > If MSR[FP] is not active, FP is not allowed to be used.
> > > But this is an ordinary check: if MSR[FP] is active, the floating-point
> > > unit is in use. What I provide is an interface function, and we don't know
> > > where the function will be called. Because this function may be called in
> > > an uncertain context, we need this check to ensure that no data is lost.
> > 
> > The whole point of calling enable_kernel_fp() in C code before
> > suspending is to ensure that the FP state gets saved.  If FP is used
> > after that point it is a bug.  If you're worried about such bugs, then
> > clear MSR[FP] after calling enable_kernel_fp(), rather than adding
> > redundant state saving.
> > 
> 
> enable_kernel_fp() is called in the MEM suspend flow.
> Hibernation is different from MEM suspend, and I'm not sure where this
> interface will be called, so we need to ensure the integrity of the core state
> save. I don't think this code is *redundant*. I trust that the kernel can keep
> the FP-related operations; that's why the check is here. :)

For hibernation, save_processor_state() is called first, which does
flush_fp_to_thread() which has a similar effect (though I wonder if it's
being called on the correct task for non-SMP).

-Scott
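For readers less used to PowerPC assembly, the quoted `mfmsr`/`andi.`/`beq` sequence amounts to the C test below. This is a userspace model only: the MSR value is passed in rather than read with `mfmsr`, the FPRs are modelled as a plain array (FPSCR is ignored), `maybe_save_fprs()` is a hypothetical name standing in for the `do_sr_fpr_regs(save)` path, and `MSR_FP` is defined here as bit 13 (0x2000), matching `MSR_FP_LG` in the kernel's `reg.h`.

```c
#include <assert.h>
#include <string.h>

#define MSR_FP   (1UL << 13)  /* floating-point available (MSR_FP_LG = 13) */
#define NUM_FPRS 32

/*
 * Copy the modelled live FPRs into save_area only when MSR[FP] is set.
 * Returns 1 if the registers were saved, 0 if the save was skipped.
 */
static int maybe_save_fprs(unsigned long msr, const double live_fprs[NUM_FPRS],
			   double save_area[NUM_FPRS])
{
	assert(live_fprs != NULL && save_area != NULL);

	/* mfmsr r12 ; andi. r12, r12, MSR_FP ; beq 1f */
	if (!(msr & MSR_FP))
		return 0;

	/* do_sr_fpr_regs(save) */
	memcpy(save_area, live_fprs, NUM_FPRS * sizeof(double));
	return 1;
}
```

Scott's position in the thread is that by this point the FP state has already been saved (by `enable_kernel_fp()` for suspend, or `flush_fp_to_thread()` via `save_processor_state()` for hibernation), so the copy is redundant; he suggests clearing MSR[FP] instead if stray FP use is a concern.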


* Re: [PATCH] clk: corenet: Update the clock bindings
From: Scott Wood @ 2014-01-23  0:44 UTC
  To: Tang Yuantian; +Cc: b07421, b32579, linuxppc-dev, devicetree
In-Reply-To: <1390269732-22798-1-git-send-email-Yuantian.Tang@freescale.com>

On Tue, 2014-01-21 at 10:02 +0800, Tang Yuantian wrote:
> From: Tang Yuantian <yuantian.tang@freescale.com>
> 
> Main changes include:
> 	- Clarified the clock nodes' version number
> 	- Fixed an issue in the example
> 
> Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
> ---
>  Documentation/devicetree/bindings/clock/corenet-clock.txt | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/clock/corenet-clock.txt b/Documentation/devicetree/bindings/clock/corenet-clock.txt
> index 24711af..d6cadef 100644
> --- a/Documentation/devicetree/bindings/clock/corenet-clock.txt
> +++ b/Documentation/devicetree/bindings/clock/corenet-clock.txt
> @@ -54,6 +54,8 @@ Required properties:
>  		It takes parent's clock-frequency as its clock.
>  	* "fsl,qoriq-sysclk-2.0": for input system clock (v2.0).
>  		It takes parent's clock-frequency as its clock.
> +	Note: v1.0 and v2.0 are clock version which should align to
> +	clockgen node's they belong to which is chassis version.

Instead, how about a note like this near the top of the file:

All references to "1.0" and "2.0" refer to the QorIQ chassis version to
which the chip complies.

Chassis Version		Example Chips
---------------		-------------
1.0			p4080, p5020, p5040
2.0			t4240, b4860, t1040


BTW, this binding and the associated driver really should be called
"qoriq-clock", not "corenet-clock".  This would match the compatible
string, and it doesn't really have much to do with corenet (which is
part of the QorIQ chassis v1 and v2, but not *this* part).  Do you know
if the chassis v3 clock interface will be similar enough to share a
driver?

-Scott
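Under the note Scott proposes, the "1.0"/"2.0" suffix in compatible strings such as "fsl,qoriq-sysclk-2.0" denotes the QorIQ chassis version. A minimal sketch of recovering it in C, assuming the version is always the final "-<major>.<minor>" component; `qoriq_compat_version()` is a hypothetical helper, not part of the corenet clock driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the trailing "-<major>.<minor>" version from a QorIQ clock
 * compatible string such as "fsl,qoriq-sysclk-2.0".
 * Returns 0 on success, -1 if no well-formed version suffix is found.
 */
static int qoriq_compat_version(const char *compat, int *major, int *minor)
{
	const char *dash;
	char tail;

	assert(compat != NULL && major != NULL && minor != NULL);
	dash = strrchr(compat, '-');
	if (!dash)
		return -1;
	/* Exactly two conversions: a third (%c) means trailing junk. */
	if (sscanf(dash + 1, "%d.%d%c", major, minor, &tail) != 2)
		return -1;
	return 0;
}
```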


* Re: [PATCH RFC 00/73] tree-wide: clean up some no longer required #include <linux/init.h>
From: Paul Gortmaker @ 2014-01-23  0:38 UTC
  To: Stephen Rothwell
  Cc: linux-arch, linux-mips, linux-m68k, rusty, linux-ia64, kvm,
	linux-s390, netdev, x86, linux-kernel, torvalds, gregkh,
	linux-alpha, sparclinux, akpm, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20140122180023.dd90d34cba38d9f9ac516349@canb.auug.org.au>

[Re: [PATCH RFC 00/73] tree-wide: clean up some no longer required #include <linux/init.h>] On 22/01/2014 (Wed 18:00) Stephen Rothwell wrote:

> Hi Paul,
>
> On Tue, 21 Jan 2014 16:22:03 -0500 Paul Gortmaker <paul.gortmaker@windriver.com> wrote:
> >
> > Where: This work exists as a queue of patches that I apply to
> > linux-next; since the changes are fixing some things that currently
> > can only be found there.  The patch series can be found at:
> >
> >    http://git.kernel.org/cgit/linux/kernel/git/paulg/init.git
> >    git://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git
> >
> > I've avoided annoying Stephen with another queue of patches for
> > linux-next while the development content was in flux, but now that
> > the merge window has opened, and new additions are fewer, perhaps he
> > wouldn't mind tacking it on the end...  Stephen?
>
> OK, I have added this to the end of linux-next today - we will see how we
> go.  It is called "init".

Thanks, it was a great help as it uncovered a few issues in fringe arch
that I didn't have toolchains for, and I've fixed all of those up.

I've noticed that powerpc has been un-buildable for a while now; I have
used this hack patch locally so I could run the ppc defconfigs to check
that I didn't break anything.  Maybe useful for linux-next in the
interim?  It is a hack patch -- Not-Signed-off-by: Paul Gortmaker.  :)

Paul.
--

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d27960c89a71..d0f070a2b395 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 
 #define pmd_move_must_withdraw pmd_move_must_withdraw
-typedef struct spinlock spinlock_t;
-static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
-					 spinlock_t *old_pmd_ptl)
+struct spinlock;
+static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
+					 struct spinlock *old_pmd_ptl)
 {
 	/*
 	 * Archs like ppc64 use pgtable to store per pmd
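The hack works because a prototype only needs an incomplete struct type for pointer parameters, so the header can say `struct spinlock;` instead of repeating `typedef struct spinlock spinlock_t;`. Presumably the repeated typedef is what broke the build, since redefining a typedef is only permitted from C11 on and older compilers reject it. The self-contained sketch below shows the forward-declaration pattern in isolation; `locks_differ()` and the struct layout are hypothetical.

```c
#include <assert.h>

/* Header side: declare only the struct tag; no typedef, no layout needed. */
struct spinlock;

/* Hypothetical helper: pointer parameters work with an incomplete type. */
static inline int locks_differ(struct spinlock *new_ptl,
			       struct spinlock *old_ptl)
{
	/* Comparing pointers does not require the full struct definition. */
	return new_ptl != old_ptl;
}

/*
 * Elsewhere (in the kernel, the spinlock headers): the complete type
 * and the single typedef.
 */
struct spinlock {
	int raw_lock;  /* placeholder field for illustration */
};
typedef struct spinlock spinlock_t;
```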


* Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
From: Cody P Schafer @ 2014-01-23  0:11 UTC
  To: Michael Ellerman
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Linux PPC
In-Reply-To: <1390354379.11104.3.camel@concordia>

On 01/21/2014 05:32 PM, Michael Ellerman wrote:
> On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote:
>> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
>> performance counters: gpci ("get performance counter info") and 24x7.
>>
>> The counters supplied by these interfaces are continually counting and never
>> need to be (and cannot be) disabled or enabled. They additionally do not
>> generate any interrupts. This makes them in some regards similar to software
>> counters, and as a result their implementation shares some common code (which
>> an initial patch exposes) with the sw counters.
> 
> Hi Cody,
> 
> Can you please add some more explanation of this series.
Sure.
> In particular why do we need two new PMUs, and how do they relate to each
> other?
These two PMUs provide access to some CPU-, core-, and chip-level counters that are
not exposed via other interfaces, and additionally allow monitoring the performance
of other LPARs (guests) on the same host system. Because they provide access to
core- and chip-level counters, this pair of PMUs can be thought of as powerpc's
counterpart to x86's uncore events.

As an example, "processor_bus_utilization_abc" and "processor_bus_utilization_wxyz"
(in hv_gpci.h) allow retrieval of total cycles and idle cycles for various
inter-chip buses.

GPCI is an interface that already exists on some POWER7 machines (depending on the
firmware version), but it is rather inflexible, and adding counters to it requires a
lot of code. The 24x7 interfaces are currently designed to coexist with the gpci
interface while replacing most of gpci's functionality on newer systems. Right now,
the 24x7 code I've submitted uses the gpci calls to check whether it has permission
to access certain classes of counters.

> And can you add an example of how I'd actually use them using perf.

# For gpci (formed from reading hv_gpci.h), gets "processor_time_in_timebase_cycles"
perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0xffffffff,request=0x10/' -r 0 -a -x ' ' sleep 0.1

# For 24x7, assuming access to hw+fw that supports it, gets a yet-to-be identified counter:
perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0xffffffff/' -r 0 -C 0 -x ' ' sleep 0.1
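Both commands pass a PMU name followed by comma-separated `term=value` pairs between slashes; perf then places each value into the event's config fields according to the format files the PMU exports in sysfs. The sketch below only shows how such a string decomposes. It is a simplified, hypothetical parser (`parse_event_terms()`), not perf's own, and unlike perf it requires every term to carry an explicit value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TERMS 16
#define MAX_NAME  32

struct event_term {
	char name[MAX_NAME];
	unsigned long long value;
};

/*
 * Parse "pmu/term=val,term=val/" into pmu_out and terms[].
 * Values may be decimal or 0x-prefixed hex (strtoull base 0).
 * Returns the number of terms parsed, or -1 on malformed input.
 */
static int parse_event_terms(const char *event, char *pmu_out, size_t pmu_len,
			     struct event_term *terms, int max_terms)
{
	const char *slash = strchr(event, '/');
	const char *end, *p;
	int n = 0;

	assert(terms != NULL);
	if (!slash || (size_t)(slash - event) >= pmu_len)
		return -1;
	memcpy(pmu_out, event, slash - event);
	pmu_out[slash - event] = '\0';

	end = strrchr(event, '/');
	if (end == slash)
		return -1;  /* missing closing '/' */

	for (p = slash + 1; p < end; ) {
		const char *eq = memchr(p, '=', end - p);
		const char *comma = memchr(p, ',', end - p);
		char *num_end;

		if (!comma)
			comma = end;
		if (!eq || eq > comma || n == max_terms || eq - p >= MAX_NAME)
			return -1;
		memcpy(terms[n].name, p, eq - p);
		terms[n].name[eq - p] = '\0';
		terms[n].value = strtoull(eq + 1, &num_end, 0);
		if (num_end != comma)
			return -1;  /* empty value or trailing junk */
		n++;
		p = comma + 1;
	}
	return n;
}
```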


* RE: [02/12,v3] pci: fsl: add structure fsl_pci
From: Roy Zang @ 2014-01-22 23:38 UTC
  To: Scott Wood, Minghuan.Lian@freescale.com
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20140103221923.GB22546@home.buserror.net>



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Friday, January 03, 2014 4:19 PM
> To: Lian Minghuan-B31939
> Cc: linuxppc-dev@lists.ozlabs.org; linux-pci@vger.kernel.org; Zang Roy-
> R61911; Bjorn Helgaas
> Subject: Re: [02/12,v3] pci: fsl: add structure fsl_pci
>
> On Wed, Oct 23, 2013 at 06:41:24PM +0800, Minghuan Lian wrote:
> > PowerPC uses structure pci_controller to describe PCI controller, but
> > ARM uses structure pci_sys_data. In order to support PowerPC and ARM
> > simultaneously, the patch adds a structure fsl_pci that contains most
> > of the members of the pci_controller and pci_sys_data.
> > Meanwhile, it defines a interface fsl_arch_sys_to_pci() which should
> > be implemented in architecture-specific PCI controller driver to
> > convert pci_controller or pci_sys_data to fsl_pci.
> >
> > Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
> >
> > ---
> > change log:
> > v1-v3:
> > Derived from http://patchwork.ozlabs.org/patch/278965/
> >
> > Based on upstream master.
> > Based on the discussion of RFC version here
> > http://patchwork.ozlabs.org/patch/274487/
> >
> >  include/linux/fsl/pci-common.h | 41 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 41 insertions(+)
> >
> > diff --git a/include/linux/fsl/pci-common.h b/include/linux/fsl/pci-common.h
> > index 5e4f683..e56a040 100644
> > --- a/include/linux/fsl/pci-common.h
> > +++ b/include/linux/fsl/pci-common.h
> > @@ -102,5 +102,46 @@ struct ccsr_pci {
> >
> >  };
> >
> > +/*
> > + * Structure of a PCI controller (host bridge)
> > + */
> > +struct fsl_pci {
> > +	struct list_head node;
> > +	bool is_pcie;
> > +	struct device_node *dn;
> > +	struct device *dev;
> > +
> > +	int first_busno;
> > +	int last_busno;
> > +	int self_busno;
> > +	struct resource busn;
> > +
> > +	struct pci_ops *ops;
> > +	struct ccsr_pci __iomem *regs;
> > +
> > +	u32 indirect_type;
> > +
> > +	struct resource io_resource;
> > +	resource_size_t io_base_phys;
> > +	resource_size_t pci_io_size;
> > +
> > +	struct resource mem_resources[3];
> > +	resource_size_t mem_offset[3];
> > +
> > +	int global_number;	/* PCI domain number */
> > +
> > +	resource_size_t dma_window_base_cur;
> > +	resource_size_t dma_window_size;
> > +
> > +	void *sys;
> > +};
>
> I don't like the extent to which this duplicates (not moves) PPC's struct
> pci_controller.  Also this leaves some fields like "indirect_type"
> unexplained (PPC_INDIRECT_TYPE_xxx is only in the PPC header).
The indirect type is for configuration-space access; I do not think it is specific to the PPC header.
It would be good to put it in the Freescale PCI common code.
Roy
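For context on the field being discussed: "indirect" configuration access means writing an address word to a config-address register and then accessing a data register. The sketch below builds that address word in the style of PowerPC's indirect config code. `INDIRECT_TYPE_EXT_REG` and its value are hypothetical stand-ins for the `PPC_INDIRECT_TYPE_*` flags, Type 0/Type 1 selection is omitted, and the extended-register placement (offset bits 11:8 moved to bits 27:24) is an assumption to verify against `arch/powerpc/sysdev/indirect_pci.c`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag standing in for one of the PPC_INDIRECT_TYPE_* bits. */
#define INDIRECT_TYPE_EXT_REG 0x1u

/*
 * Compose the value written to the config-address register.
 * bus: 8-bit bus number; devfn: device (5 bits) and function (3 bits);
 * offset: config-space register offset (up to 4 KiB for PCIe).
 */
static uint32_t indirect_cfg_addr(uint32_t indirect_type, unsigned int bus,
				  unsigned int devfn, unsigned int offset)
{
	uint32_t reg;

	if (indirect_type & INDIRECT_TYPE_EXT_REG)
		/* Extended registers: offset bits 11:8 go to bits 27:24. */
		reg = ((offset & 0xf00u) << 16) | (offset & 0xfcu);
	else
		/* Conventional PCI: only the dword-aligned low byte fits. */
		reg = offset & 0xfcu;

	/* Bit 31 enables the access; bus in bits 23:16, devfn in 15:8. */
	return 0x80000000u | ((uint32_t)(bus & 0xffu) << 16) |
	       ((uint32_t)(devfn & 0xffu) << 8) | reg;
}
```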


* Re: [PATCH 3/3] powerpc/fsl: Use the new interface to save or restore registers
From: Scott Wood @ 2014-01-22 20:34 UTC
  To: Wang Dongsheng-B40534
  Cc: anton@enomsg.org, linuxppc-dev@lists.ozlabs.org,
	Zhao Chenhui-B35336
In-Reply-To: <18d3109ac114477a805618829632b463@BN1PR03MB188.namprd03.prod.outlook.com>

On Sun, 2014-01-19 at 23:57 -0600, Wang Dongsheng-B40534 wrote:
> > > > > Use fsl_cpu_state_save/fsl_cpu_state_restore to save/restore registers.
> > > > > Use the functions to save/restore registers, so we don't need to
> > > > > maintain the code.
> > > > >
> > > > > Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
> > > >
> > > > Is there any functional change with this patchset (e.g. suspend
> > > > supported on chips where it wasn't before), or is it just cleanup?  A
> > > > cover letter would be useful to describe the purpose of the overall
> > > > patchset when it isn't obvious.
> > > >
> > >
> > > Yes, just cleanup..
> > 
> > It seems to be introducing complexity rather than removing it.  Is this
> > cleanup needed to prepare for adding new functionality?
> > 
> > Plus, I'm skeptical that this is functionally equivalent.  It looks like
> > the new code saves a lot more than the old code does.  Why?
> > 
> 
> Actually, I want to use a practical example to push the save/restore patches.
> This is also reasonable for 32-bit hibernation, and the code is cleaner. :)
> I think I need to change the description of the patch.
> 
> > > > > +
> > > > > +	/* Restore base register */
> > > > > +	li	r4, 0
> > > > > +	bl	fsl_cpu_state_restore
> > > >
> > > > Why are you calling anything with "fsl" in the name from code that is
> > > > supposed to be for all booke?
> > > >
> > > E200 and E300 are not supported.
> > > E500, E500v2, E500MC, E5500 and E6500 are supported.
> > >
> > > Do you have any suggestions about this?
> > 
> > What about non-FSL booke such as 44x?
> > 
> > Or if this file never supported 44x, rename it appropriately.
> > 
> It is currently not supported. OK, I'll change the name first; if support is
> added later, the function can be renamed again.
> 
> How about 85xx_cpu_state_restore?

Symbols can't begin with numbers.  booke_cpu_state_restore would be
better (it would still provide a place for 44x to be added if somebody
actually cared about doing so).

I'm still not convinced that asm code is the place to do this, though.

-Scott


* Re: [PATCH RFC] powerpc/mpc85xx: add support for the kmp204x reference board
From: Scott Wood @ 2014-01-22 20:33 UTC
  To: Valentin Longchamp; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <52DFF413.7060806@keymile.com>

On Wed, 2014-01-22 at 17:38 +0100, Valentin Longchamp wrote:
> On 01/21/2014 06:01 PM, Scott Wood wrote:
> > On Tue, 2014-01-21 at 17:34 +0100, Valentin Longchamp wrote:
> >> Can you please explicitly tell me how I should build this node ? What other
> >> comments ? Must I be more generic with the name ?
> >>
> >> Something like :
> >>
> >> spi@1 {
> >> 	compatible = "zarlink,30343", "spidev";
> > 
> > Remove "spidev".  Any nodes under the SPI controller node will be SPI
> > devices, right?  So it doesn't add anything regarding hardware
> > description.
> >  
> 
> OK.
> 
> Thank you for the feedback, I will then send a revised patch as soon as I have time.

Oh, and ideally the node name should describe the function of the device
-- "spi" as a node name usually means a SPI controller.

Maybe "ptp_clock@1"?

Also, zarlink should be added to
Documentation/devicetree/bindings/vendor-prefixes.txt

-Scott


* [PATCH v2 3/3] powerpc/pseries: Report in kernel device tree update to drmgr
From: Tyrel Datwyler @ 2014-01-22 19:58 UTC
  To: linuxppc-dev; +Cc: nfont, Tyrel Datwyler
In-Reply-To: <1390420717-23907-1-git-send-email-tyreld@linux.vnet.ibm.com>

Traditionally it has been drmgr's responsibility to update the device tree
through the /proc/ppc64/ofdt interface after a suspend/resume operation.
This patchset, however, has modified the suspend/resume ops to perform that
update entirely in the kernel during the resume. Therefore, a mechanism is
required for drmgr to determine who is responsible for the update. This patch
adds a show function to the "hibernate" attribute that returns 1 if the kernel
updates the device tree after the resume and 0 if drmgr is responsible.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/suspend.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 16a2552..723115d 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -174,7 +174,30 @@ out:
 	return rc;
 }
 
-static DEVICE_ATTR(hibernate, S_IWUSR, NULL, store_hibernate);
+#define USER_DT_UPDATE	0
+#define KERN_DT_UPDATE	1
+
+/**
+ * show_hibernate - Report device tree update responsibility
+ * @dev:		subsys root device
+ * @attr:		device attribute struct
+ * @buf:		buffer
+ *
+ * Report whether a device tree update is performed by the kernel after a
+ * resume, or if drmgr must coordinate the update from user space.
+ *
+ * Return value:
+ *	0 if drmgr is to initiate update, and 1 otherwise
+ **/
+static ssize_t show_hibernate(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	return sprintf(buf, "%d\n", KERN_DT_UPDATE);
+}
+
+static DEVICE_ATTR(hibernate, S_IWUSR | S_IRUGO,
+		   show_hibernate, store_hibernate);
 
 static struct bus_type suspend_subsys = {
 	.name = "power",
-- 
1.7.12.4
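On the user-space side, drmgr (or any other tool) would read the new attribute and branch on its value. A minimal sketch, assuming the attribute appears as `/sys/devices/system/power/hibernate` (the "power" subsystem registered by `suspend.c`); `parse_hibernate_attr()` and `query_dt_update_owner()` are hypothetical helpers, and the parsing is split out so it can be exercised without the sysfs file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Values mirror USER_DT_UPDATE / KERN_DT_UPDATE in the patch. */
enum dt_update_owner { DT_UPDATE_BY_DRMGR = 0, DT_UPDATE_BY_KERNEL = 1 };

/* Interpret the attribute text ("0\n" or "1\n"); anything else means drmgr. */
static enum dt_update_owner parse_hibernate_attr(const char *text)
{
	char *end;
	long v = strtol(text, &end, 10);

	if (end != text && (*end == '\n' || *end == '\0') && v == 1)
		return DT_UPDATE_BY_KERNEL;
	return DT_UPDATE_BY_DRMGR;
}

/*
 * Read the attribute; if it is missing or unreadable (for example on an
 * older kernel where the attribute is write-only), assume drmgr must
 * perform the update itself.
 */
static enum dt_update_owner query_dt_update_owner(const char *path)
{
	char buf[16];
	FILE *f = fopen(path, "r");

	if (!f)
		return DT_UPDATE_BY_DRMGR;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return DT_UPDATE_BY_DRMGR;
	}
	fclose(f);
	return parse_hibernate_attr(buf);
}
```

A caller would use `query_dt_update_owner("/sys/devices/system/power/hibernate")` after resume and skip its own /proc/ppc64/ofdt update when the result is `DT_UPDATE_BY_KERNEL`.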


* [PATCH v2 2/3] powerpc/pseries: Update dynamic cache nodes for suspend/resume operation
From: Tyrel Datwyler @ 2014-01-22 19:58 UTC
  To: linuxppc-dev; +Cc: nfont, Tyrel Datwyler
In-Reply-To: <1390420717-23907-1-git-send-email-tyreld@linux.vnet.ibm.com>

From: Haren Myneni <hbabu@us.ibm.com>

pHyp can change cache nodes during a suspend/resume operation. The current
code updates the device tree after all nonboot CPUs are enabled. Hence, we do
not modify the cache list based on the latest cache nodes. Also, we do not
remove cache entries for the primary CPU.

This patch removes the cache list for the boot CPU, updates the device tree
before enabling nonboot CPUs, and then re-adds the cache list for the boot CPU.

Signed-off-by: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |  4 ++++
 arch/powerpc/kernel/rtas.c      | 17 +++++++++++++++++
 arch/powerpc/kernel/time.c      |  6 ++++++
 3 files changed, 27 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9bd52c6..da9d733 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -283,6 +283,10 @@ extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 
 #ifdef CONFIG_PPC_PSERIES
 extern int pseries_devicetree_update(s32 scope);
+extern void post_mobility_fixup(void);
+extern void update_dynamic_configuration(void);
+#else /* !CONFIG_PPC_PSERIES */
+static inline void update_dynamic_configuration(void) { }
 #endif
 
 #ifdef CONFIG_PPC_RTAS_DAEMON
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 4cf674d..8249eb2 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -43,6 +43,7 @@
 #include <asm/time.h>
 #include <asm/mmu.h>
 #include <asm/topology.h>
+#include "cacheinfo.h"
 
 struct rtas_t rtas = {
 	.lock = __ARCH_SPIN_LOCK_UNLOCKED
@@ -972,6 +973,22 @@ out:
 	free_cpumask_var(offline_mask);
 	return atomic_read(&data.error);
 }
+
+/*
+ * The device tree cache nodes can be modified during suspend/resume,
+ * so delete all cache entries and recreate them after the device tree
+ * update.
+ * Cache entries for nonboot CPUs were already deleted before suspend, so
+ * delete the entries for the primary CPU and recreate them after the device
+ * tree update. Entries for nonboot CPUs are created when they are enabled later.
+ */
+
+void update_dynamic_configuration(void)
+{
+	cacheinfo_cpu_offline(smp_processor_id());
+	post_mobility_fixup();
+	cacheinfo_cpu_online(smp_processor_id());
+}
 #else /* CONFIG_PPC_PSERIES */
 int rtas_ibm_suspend_me(struct rtas_args *args)
 {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3b1441..5f1ca28 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -69,6 +69,7 @@
 #include <asm/vdso_datapage.h>
 #include <asm/firmware.h>
 #include <asm/cputime.h>
+#include <asm/rtas.h>
 
 /* powerpc clocksource/clockevent code */
 
@@ -592,6 +593,11 @@ void arch_suspend_enable_irqs(void)
 	generic_suspend_enable_irqs();
 	if (ppc_md.suspend_enable_irqs)
 		ppc_md.suspend_enable_irqs();
+	/*
+	 * Update configuration which can be modified based on devicetree
+	 * changes during resume.
+	 */
+	update_dynamic_configuration();
 }
 #endif
 
-- 
1.7.12.4
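The rtas.h hunk provides a no-op stub for `update_dynamic_configuration()` when `CONFIG_PPC_PSERIES` is off. For a stub defined in a header, the usual kernel idiom is `static inline`: a plain external definition would be emitted by every file that includes the header and could collide at link time. A self-contained sketch of the pattern, using a hypothetical `CONFIG_FAKE_PSERIES` switch and an illustrative counter:

```c
#include <assert.h>

/* Hypothetical config switch: leave undefined to get the stub. */
/* #define CONFIG_FAKE_PSERIES 1 */

static int dt_updates;  /* counts "real" updates, for illustration only */

#ifdef CONFIG_FAKE_PSERIES
/*
 * In a real header this branch would be an extern declaration, with the
 * definition living in a single .c file.
 */
static void update_dynamic_configuration(void)
{
	dt_updates++;  /* real work would go here */
}
#else
/*
 * static inline: each includer gets its own private (usually inlined)
 * copy, so no duplicate external symbol reaches the linker.
 */
static inline void update_dynamic_configuration(void) { }
#endif
```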


* [PATCH v2 0/3] powerpc/pseries: fix issues in suspend/resume code
From: Tyrel Datwyler @ 2014-01-22 19:58 UTC
  To: linuxppc-dev; +Cc: nfont, Tyrel Datwyler

This patchset fixes a couple of issues encountered in the suspend/resume code
base. First, when using the in-kernel device tree update code, update-nodes is
unnecessarily called more than once. Second, the CPU cache lists are not
updated after a suspend/resume, which under certain conditions may cause a
panic. Finally, since the cache list fix utilizes the in-kernel device tree
update code, a means of telling drmgr not to perform a device tree update from
userspace is required.

Changes from v1:
- Fixed several commit message typos
- Fixed authorship of first two patches

Haren Myneni (2):
  powerpc/pseries: Device tree should only be updated once after
    suspend/migrate
  powerpc/pseries: Update dynamic cache nodes for suspend/resume
    operation

Tyrel Datwyler (1):
  powerpc/pseries: Report in kernel device tree update to drmgr

 arch/powerpc/include/asm/rtas.h           |  4 ++++
 arch/powerpc/kernel/rtas.c                | 17 +++++++++++++++++
 arch/powerpc/kernel/time.c                |  6 ++++++
 arch/powerpc/platforms/pseries/mobility.c | 26 ++++++++++----------------
 arch/powerpc/platforms/pseries/suspend.c  | 25 ++++++++++++++++++++++++-
 5 files changed, 61 insertions(+), 17 deletions(-)

-- 
1.7.12.4


* [PATCH v2 1/3] powerpc/pseries: Device tree should only be updated once after suspend/migrate
From: Tyrel Datwyler @ 2014-01-22 19:58 UTC
  To: linuxppc-dev; +Cc: nfont, Tyrel Datwyler
In-Reply-To: <1390420717-23907-1-git-send-email-tyreld@linux.vnet.ibm.com>

From: Haren Myneni <hbabu@us.ibm.com>

The current code makes RTAS calls for update-nodes, activate-firmware and then
update-nodes again. The FW provides the same data for both update-nodes calls.
As a result, a "proc entry exists" error is reported for the second update
while adding device nodes.

This patch makes a single RTAS call for update-nodes after activating the FW.
It also adds an rtas_busy_delay() retry loop for the activate-firmware RTAS call.

Signed-off-by: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index cde4e0a..bde7eba 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -290,13 +290,6 @@ void post_mobility_fixup(void)
 	int rc;
 	int activate_fw_token;
 
-	rc = pseries_devicetree_update(MIGRATION_SCOPE);
-	if (rc) {
-		printk(KERN_ERR "Initial post-mobility device tree update "
-		       "failed: %d\n", rc);
-		return;
-	}
-
 	activate_fw_token = rtas_token("ibm,activate-firmware");
 	if (activate_fw_token == RTAS_UNKNOWN_SERVICE) {
 		printk(KERN_ERR "Could not make post-mobility "
@@ -304,16 +297,17 @@ void post_mobility_fixup(void)
 		return;
 	}
 
-	rc = rtas_call(activate_fw_token, 0, 1, NULL);
-	if (!rc) {
-		rc = pseries_devicetree_update(MIGRATION_SCOPE);
-		if (rc)
-			printk(KERN_ERR "Secondary post-mobility device tree "
-			       "update failed: %d\n", rc);
-	} else {
+	do {
+		rc = rtas_call(activate_fw_token, 0, 1, NULL);
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
 		printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
-		return;
-	}
+
+	rc = pseries_devicetree_update(MIGRATION_SCOPE);
+	if (rc)
+		printk(KERN_ERR "Post-mobility device tree update "
+			"failed: %d\n", rc);
 
 	return;
 }
-- 
1.7.12.4
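The new `do { ... } while (rtas_busy_delay(rc));` loop simply retries the call for as long as firmware reports "busy". A userspace model of that retry logic, assuming the RTAS conventions of -2 for busy and 9900 to 9905 for extended-delay hints of 10^(rc - 9900) ms; `fake_busy_delay_ms()`, `call_until_not_busy()` and the scripted firmware are hypothetical, and the sketch accumulates the suggested delay instead of sleeping.

```c
#include <assert.h>

#define RTAS_BUSY               (-2)
#define RTAS_EXTENDED_DELAY_MIN 9900
#define RTAS_EXTENDED_DELAY_MAX 9905

/* Suggested delay in ms for a busy status, or 0 if rc is not "busy". */
static unsigned int fake_busy_delay_ms(int rc)
{
	unsigned int ms = 1;
	int order;

	if (rc == RTAS_BUSY)
		return 1;  /* assumption: plain busy maps to a 1 ms wait */
	if (rc < RTAS_EXTENDED_DELAY_MIN || rc > RTAS_EXTENDED_DELAY_MAX)
		return 0;
	for (order = rc - RTAS_EXTENDED_DELAY_MIN; order > 0; order--)
		ms *= 10;
	return ms;
}

/*
 * Call fn(ctx) until it stops returning a busy status, accumulating the
 * suggested delay in *waited_ms.  Returns the final (non-busy) status.
 */
static int call_until_not_busy(int (*fn)(void *), void *ctx,
			       unsigned int *waited_ms)
{
	unsigned int ms;
	int rc;

	*waited_ms = 0;
	do {
		rc = fn(ctx);
		ms = fake_busy_delay_ms(rc);
		*waited_ms += ms;
	} while (ms != 0);
	return rc;
}

/* Hypothetical scripted firmware: returns statuses[i] on the i-th call. */
struct scripted_fw {
	const int *statuses;
	int n, calls;
};

static int scripted_call(void *ctx)
{
	struct scripted_fw *fw = ctx;
	int rc = fw->statuses[fw->calls < fw->n ? fw->calls : fw->n - 1];

	fw->calls++;
	return rc;
}
```

Note that, as in the patch, a non-busy error status ends the loop immediately and is returned to the caller.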


* Re: [PATCH RFC] powerpc/mpc85xx: add support for the kmp204x reference board
From: Valentin Longchamp @ 2014-01-22 16:38 UTC
  To: Scott Wood; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1390323689.24905.484.camel@snotra.buserror.net>

On 01/21/2014 06:01 PM, Scott Wood wrote:
> On Tue, 2014-01-21 at 17:34 +0100, Valentin Longchamp wrote:
>> On 01/20/2014 11:37 PM, Scott Wood wrote:
>>> On Mon, 2014-01-20 at 17:38 +0100, Valentin Longchamp wrote:
>>>> On 01/17/2014 10:48 PM, Scott Wood wrote:
>>>>> Why isn't the compatible "keymile,kmcoge4", like the model?
>>>>
>>>> Because kmcoge4 is the board that is based on the kmp204x architecture/design.
>>>> We expect other boards (kmcoge7 for instance) based on the same kmp204x design.
>>>
>>> The top-level compatible isn't for the "architecture" or the "design".
>>> It's for the board.  Surely there's something different about kmcoge7
>>> versus kmcoge4 -- is it visible to software?
>>
>> There should only be a few differences in the dts between the two boards.
>>
>> Reading the ePAPR, my understanding was that compatible is the "programming
>> model", which is what I called design/architecture above, while model is the
>> exact model of the device, in this case the exact board name.
> 
> In practice, model is more for human consumption (e.g. there may be many
> variants that all look identical to software).  The "programming model"
> for an entire board includes everything on it.
>  
>>>> Would you prefer that I make model and compatible strictly the same and add
>>>> any future board to the compatible boards[] in corenet_generic?
>>>
>>> That's how it's usually done.  Or, at least provide the board
>>> architecture name as a secondary compatible after the board name.
>>>
>>>> If possible I would like to be able to see the boards that are based on a
>>>> similar design, that's what I wanted to achieve with this kmp204x name.
>>>
>>> Is "kmp204x" an official name of the architecture, rather than a
>>> generalization of "kmp2040" and "kmp2041"?  If there were a p2042, and
>>> you made a board for it, is there any chance it would be called kmp204x
>>> even if it were very different from the p2040/p2041 board?
>>
>> It's the name we have picked up, but it's not official. We also use km83xx,
>> km82xx and it was derived from that.
>>
>> If the hypothetical p2042 board was different it would then have another name.
> 
> In that case, I don't object to it being listed in compatible, though
> the specific board name should come first.

OK then to sum up both points we would have:

	model = "keymile,kmcoge4";
	compatible = "keymile,kmcoge4", "keymile,kmp204x";

And I would add "keymile,kmcoge4" into the boards[] table.

> 
>>>>> The device tree describes the hardware, not what driver you want to use.
>>>>>
>>>>> Plus, I don't see any driver that matches "gen,spidev" nor any binding
>>>>> for it, and "gen" doesn't make sense as a vendor prefix.  The only
>>>>> instance of that string I can find in the Linux tree is in mgcoge.dts.
>>>>
>>>> Well it comes from mgcoge and that's why I have used this
>>>>
>>>> It's for usage with the spidev driver (driver/spi/spidev.c). I agree that the
>>>> gen brings nothing. Would
>>>>
>>>> spidev@1 {
>>>> 	compatible = "spidev";
>>>>
>>>> make more sense ?
>>>
>>> It doesn't address any of the other comments.
>>
>> Can you please explicitly tell me how I should build this node ? What other
>> comments ? Must I be more generic with the name ?
>>
>> Something like :
>>
>> spi@1 {
>> 	compatible = "zarlink,30343", "spidev";
> 
> Remove "spidev".  Any nodes under the SPI controller node will be SPI
> devices, right?  So it doesn't add anything regarding hardware
> description.
>  

OK.

Thank you for the feedback, I will then send a revised patch as soon as I have time.

Valentin
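Listing the board name first and the design name second works because platform probing compares the root node's compatible list against a table of known strings. A minimal sketch of that matching, assuming a hypothetical `boards[]` table and `board_matches()` helper (the real corenet_generic probe goes through the kernel's flattened-device-tree matching helpers instead):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform table, NULL-terminated like the kernel's boards[]. */
static const char *const boards[] = {
	"keymile,kmcoge4",
	NULL,
};

/*
 * Return 1 if any entry of the root compatible list (most specific
 * first) appears in the table, else 0.
 */
static int board_matches(const char *const *compat, size_t ncompat,
			 const char *const *table)
{
	size_t i;
	const char *const *t;

	assert(compat != NULL && table != NULL);
	for (i = 0; i < ncompat; i++)
		for (t = table; *t; t++)
			if (strcmp(compat[i], *t) == 0)
				return 1;
	return 0;
}
```

With only "keymile,kmcoge4" in the table, a future kmcoge7 would need its own entry; keeping the specific board name first still lets software tell the boards apart even if the shared "keymile,kmp204x" string were added to the table later.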


* Re: [PATCH V5 6/8] time/cpuidle: Support in tick broadcast framework in the absence of external clock device
From: Thomas Gleixner @ 2014-01-22 13:27 UTC
  To: Preeti U Murthy
  Cc: daniel.lezcano, peterz, fweisbec, paul.gortmaker, paulus, mingo,
	mikey, shangw, rafael.j.wysocki, agraf, paulmck, arnd, linux-pm,
	rostedt, michael, john.stultz, anton, chenhui.zhao, deepthi,
	r58472, geoff, linux-kernel, srivatsa.bhat, schwidefsky,
	linuxppc-dev
In-Reply-To: <20140115080947.20446.30305.stgit@preeti.in.ibm.com>

On Wed, 15 Jan 2014, Preeti U Murthy wrote:
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index 086ad60..d61404e 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -524,12 +524,13 @@ void clockevents_resume(void)
>  #ifdef CONFIG_GENERIC_CLOCKEVENTS
>  /**
>   * clockevents_notify - notification about relevant events
> + * Returns non zero on error.
>   */
> -void clockevents_notify(unsigned long reason, void *arg)
> +int clockevents_notify(unsigned long reason, void *arg)
>  {

The interface change of clockevents_notify wants to be a separate
patch.

> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index 9532690..1c23912 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -20,6 +20,7 @@
>  #include <linux/sched.h>
>  #include <linux/smp.h>
>  #include <linux/module.h>
> +#include <linux/slab.h>
>  
>  #include "tick-internal.h"
>  
> @@ -35,6 +36,15 @@ static cpumask_var_t tmpmask;
>  static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
>  static int tick_broadcast_force;
>  
> +/*
> + * Helper variables for handling broadcast in the absence of a
> + * tick_broadcast_device.
> + * */
> +static struct hrtimer *bc_hrtimer;
> +static int bc_cpu = -1;
> +static ktime_t bc_next_wakeup;

Why do you need another variable to store the expiry time? The
broadcast code already knows it and the hrtimer expiry value gives you
the same information for free.

> +static int hrtimer_initialized = 0;

What's the point of this hrtimer_initialized dance? Why not simply
make the hrtimer static and avoid that altogether? Also, adding the
initialization to tick_broadcast_oneshot_available() is
braindamaged. Why not add it to tick_broadcast_init(), which is
the proper place to do it?

Aside from that, you are making this hrtimer mode unconditional, which
might break existing systems that are not aware of the hrtimer
implications.

What you really want is a pseudo clock event device which has the
proper functions for handling the timer and which you can register from
your architecture code. The broadcast core code needs a few tweaks to
avoid the shutdown of the cpu-local clock event device, but aside from
that the whole thing just falls into place. So architectures can use
this if they want to, provided their low-level idle code knows about
the return value of clockevents_notify() that prevents deep idle. Once
that works, you can register the hrtimer-based broadcast device and a
real hardware broadcast device with a higher rating. It just works.
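The "higher rating wins" rule can be modelled in userspace. When several broadcast-capable devices are registered, the one with the highest rating is used. A pseudo device registered with rating 0 is therefore only picked when no real hardware broadcast device exists. The struct and function below are simplified, hypothetical models. They are not the kernel's `clock_event_device` or its actual selection code, which also checks features such as oneshot support.

```c
#include <stddef.h>

/* Simplified, hypothetical model of a broadcast clock event device. */
struct bc_dev {
	const char *name;
	int rating;
};

/*
 * Return the registered device with the highest rating, or NULL if
 * none is registered. On a tie the earlier registration is kept.
 */
static const struct bc_dev *pick_broadcast(const struct bc_dev *devs, size_t n)
{
	const struct bc_dev *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (best == NULL || devs[i].rating > best->rating)
			best = &devs[i];
	return best;
}
```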

Find an incomplete and nonfunctional concept patch below. It should be
simple to make it work for real.

Thanks,

	tglx
----
Index: linux-2.6/include/linux/clockchips.h
===================================================================
--- linux-2.6.orig/include/linux/clockchips.h
+++ linux-2.6/include/linux/clockchips.h
@@ -62,6 +62,11 @@ enum clock_event_mode {
 #define CLOCK_EVT_FEAT_DYNIRQ		0x000020
 #define CLOCK_EVT_FEAT_PERCPU		0x000040
 
+/*
+ * Clockevent device is based on a hrtimer for broadcast
+ */
+#define CLOCK_EVT_FEAT_HRTIMER		0x000080
+
 /**
  * struct clock_event_device - clock event device descriptor
  * @event_handler:	Assigned by the framework to be called by the low
@@ -83,6 +88,7 @@ enum clock_event_mode {
  * @name:		ptr to clock event name
  * @rating:		variable to rate clock event devices
  * @irq:		IRQ number (only for non CPU local devices)
+ * @bound_on:		Bound on CPU
  * @cpumask:		cpumask to indicate for which CPUs this device works
  * @list:		list head for the management code
  * @owner:		module reference
@@ -113,6 +119,7 @@ struct clock_event_device {
 	const char		*name;
 	int			rating;
 	int			irq;
+	int			bound_on;
 	const struct cpumask	*cpumask;
 	struct list_head	list;
 	struct module		*owner;
Index: linux-2.6/kernel/time/tick-broadcast-hrtimer.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/time/tick-broadcast-hrtimer.c
@@ -0,0 +1,77 @@
+
+static struct hrtimer bctimer;
+
+static void bc_set_mode(enum clock_event_mode mode,
+			struct clock_event_device *bc)
+{
+	switch (mode) {
+	case CLOCK_EVT_MODE_SHUTDOWN:
+		/*
+		 * Note, we cannot cancel the timer here as we might
+		 * run into the following live lock scenario:
+		 *
+		 * cpu 0		cpu1
+		 * lock(broadcast_lock);
+		 *			hrtimer_interrupt()
+		 *			bc_handler()
+		 *			   tick_handle_oneshot_broadcast();
+		 *			    lock(broadcast_lock);
+		 * hrtimer_cancel()
+		 *  wait_for_callback()
+		 */
+		hrtimer_try_to_cancel(&bctimer);
+		break;
+	default:
+		break;
+	}
+}
+
+/*
+ * This is called from the guts of the broadcast code when the cpu
+ * which is about to enter idle has the earliest broadcast timer event.
+ */
+static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
+{
+	/*
+	 * We try to cancel the timer first. If the callback is in
+	 * flight on some other cpu then we let it handle it. If we
+	 * were able to cancel the timer nothing can rearm it as we
+	 * own broadcast_lock.
+	 */
+	if (hrtimer_try_to_cancel(&bctimer) >= 0) {
+		hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED);
+		/* Bind the "device" to the cpu */
+		bc->bound_on = smp_processor_id();
+	}
+	return 0;
+}
+
+static struct clock_event_device ce_broadcast_hrtimer = {
+	.set_mode		= bc_set_mode,
+	.set_next_ktime		= bc_set_next,
+	.features		= CLOCK_EVT_FEAT_ONESHOT |
+				  CLOCK_EVT_FEAT_KTIME |
+				  CLOCK_EVT_FEAT_HRTIMER,
+	.rating			= 0,
+	.bound_on		= -1,
+	.min_delta_ns		= 1,
+	.max_delta_ns		= KTIME_MAX,
+	.min_delta_ticks	= 1,
+	.max_delta_ticks	= KTIME_MAX,
+	.mult			= 1,
+	.shift			= 0,
+	.cpumask		= cpu_all_mask,
+};
+
+static enum hrtimer_restart bc_handler(struct hrtimer *t)
+{
+	ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer);
+	return HRTIMER_NORESTART;
+}
+
+void tick_setup_hrtimer_broadcast(void)
+{
+	hrtimer_init(&bctimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	bctimer.function = bc_handler;
+	clockevents_register_device(&ce_broadcast_hrtimer);
+}
Index: linux-2.6/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-broadcast.c
+++ linux-2.6/kernel/time/tick-broadcast.c
@@ -630,6 +630,42 @@ again:
 	raw_spin_unlock(&tick_broadcast_lock);
 }
 
+static int broadcast_needs_cpu(struct clock_event_device *bc, int cpu)
+{
+	if (!(bc->features & CLOCK_EVT_FEAT_HRTIMER))
+		return 0;
+	if (bc->next_event.tv64 == KTIME_MAX)
+		return 0;
+	return bc->bound_on == cpu ? -EBUSY : 0;
+}
+
+static void broadcast_shutdown_local(struct clock_event_device *bc,
+				     struct clock_event_device *dev)
+{
+	/*
+	 * For hrtimer based broadcasting we cannot shutdown the cpu
+	 * local device if our own event is the first one to expire or
+	 * if we own the broadcast timer.
+	 */
+	if (bc->features & CLOCK_EVT_FEAT_HRTIMER) {
+		if (broadcast_needs_cpu(bc, smp_processor_id()))
+			return;
+		if (dev->next_event.tv64 < bc->next_event.tv64)
+			return;
+	}
+	clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
+}
+
+static void broadcast_move_bc(int deadcpu)
+{
+	struct clock_event_device *bc = tick_broadcast_device.evtdev;
+
+	if (!bc || !broadcast_needs_cpu(bc, deadcpu))
+		return;
+	/* This moves the broadcast assignment to this cpu */
+	clockevents_program_event(bc, bc->next_event, 1);
+}
+
 /*
  * Powerstate information: The system enters/leaves a state, where
  * affected devices might stop
@@ -639,8 +675,8 @@ void tick_broadcast_oneshot_control(unsi
 	struct clock_event_device *bc, *dev;
 	struct tick_device *td;
 	unsigned long flags;
+	int cpu, ret = 0;
 	ktime_t now;
-	int cpu;
 
 	/*
 	 * Periodic mode does not care about the enter/exit of power
@@ -666,7 +702,7 @@ void tick_broadcast_oneshot_control(unsi
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
 		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
-			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
+			broadcast_shutdown_local(bc, dev);
 			/*
 			 * We only reprogram the broadcast timer if we
 			 * did not mark ourself in the force mask and
@@ -679,6 +715,11 @@ void tick_broadcast_oneshot_control(unsi
 			    dev->next_event.tv64 < bc->next_event.tv64)
 				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
 		}
+		/*
+		 * If the current CPU owns the hrtimer broadcast
+		 * mechanism, it cannot go deep idle.
+		 */
+		ret = broadcast_needs_cpu(bc, cpu);
 	} else {
 		if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
@@ -851,6 +892,8 @@ void tick_shutdown_broadcast_oneshot(uns
 	cpumask_clear_cpu(cpu, tick_broadcast_pending_mask);
 	cpumask_clear_cpu(cpu, tick_broadcast_force_mask);
 
+	broadcast_move_bc(cpu);
+
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 

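The decision logic in `broadcast_needs_cpu()` and `broadcast_shutdown_local()` above can be exercised in userspace with a reduced model. This is only a sketch, and it makes several simplifying assumptions. The struct is hypothetical. `FEAT_HRTIMER` copies the patch's `CLOCK_EVT_FEAT_HRTIMER` value. `INT64_MAX` stands in for `KTIME_MAX`. The CPU is passed explicitly instead of coming from `smp_processor_id()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEAT_HRTIMER 0x80       /* mirrors CLOCK_EVT_FEAT_HRTIMER */
#define NO_EVENT     INT64_MAX  /* stands in for KTIME_MAX */

/* Hypothetical, reduced model of the fields the concept patch uses. */
struct model_bc {
	unsigned int features;
	int64_t next_event;  /* earliest pending broadcast expiry, ns */
	int bound_on;        /* CPU the hrtimer is queued on, or -1 */
};

/* Model of broadcast_needs_cpu(): does @cpu own an armed hrtimer broadcast? */
static bool model_needs_cpu(const struct model_bc *bc, int cpu)
{
	if (!(bc->features & FEAT_HRTIMER))
		return false;
	if (bc->next_event == NO_EVENT)
		return false;
	return bc->bound_on == cpu;
}

/*
 * Model of broadcast_shutdown_local(): may @cpu stop its local device,
 * whose next event is @local_next?
 */
static bool model_may_shutdown_local(const struct model_bc *bc, int cpu,
				     int64_t local_next)
{
	if (bc->features & FEAT_HRTIMER) {
		if (model_needs_cpu(bc, cpu))
			return false;  /* this CPU carries the broadcast hrtimer */
		if (local_next < bc->next_event)
			return false;  /* own event expires first */
	}
	return true;
}
```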
^ permalink raw reply

* Re: [PATCH 8/8] powerpc: Fix endian issues in crash dump code
From: Anton Blanchard @ 2014-01-22 10:42 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: paulus, linuxppc-dev, Ulrich.Weigand
In-Reply-To: <1387341927.20735.1.camel@concordia>


Hi Michael,
 
> Not my favourite colour :D  What about this instead?
> 
> We could also add of_property_read_u32(), with an implied index of
> zero?
> 
> I don't like the rc handling, but couldn't come up with anything I
> liked better.

Thanks for pointing that out, I didn't realise we had so many
of_property_read_* helpers. I'll be sure to use them from here on :)

Anton
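Helpers such as `of_property_read_u32()` matter here because device tree cells are stored big-endian. Open-coded dereferences of a property buffer therefore give wrong values on little-endian kernels. The sketch below shows the conversion in userspace, assuming a raw property buffer. `read_be32_cell()` is a hypothetical name, not a kernel API, and its -1 error return is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read cell @index of a big-endian u32 array property of @len bytes.
 * Returns 0 on success, -1 if the cell lies outside the buffer.
 * Gives the same result on big- and little-endian hosts.
 */
static int read_be32_cell(const void *prop, size_t len, size_t index,
			  uint32_t *out)
{
	const uint8_t *p = prop;

	if (index >= len / 4)
		return -1;
	p += index * 4;
	*out = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
	return 0;
}
```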

^ permalink raw reply

* [Bug 67811] PASEMI: Kernel 3.13.0 doesn't boot with a PA6T cpu
From: Christian Zigotzky @ 2014-01-22 10:18 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <52D6E928.7050307@xenosoft.de>

[-- Attachment #1: Type: text/plain, Size: 5215 bytes --]

Hi All,

Thanks a lot for your effort to solve the boot problems. Unfortunately, 
this patch doesn't work for the Nemo board. I need the patch created by 
Olof Johansson.

diff -rupN linux-3.13/arch/powerpc/kernel/head_64.S linux-3.13-nemo/arch/powerpc/kernel/head_64.S
--- linux-3.13/arch/powerpc/kernel/head_64.S	2014-01-05 00:12:14.000000000 +0100
+++ linux-3.13-nemo/arch/powerpc/kernel/head_64.S	2014-01-05 23:06:13.001618802 +0100
@@ -69,6 +69,13 @@ _GLOBAL(__start)
 	/* NOP this out unconditionally */
 BEGIN_FTR_SECTION
 	FIXUP_ENDIAN
+/* Hack for PWRficient platforms: Due to CFE(?) bug, the 64-bit
+ * word at 0x8 needs to be set to 0. Patch it up here once we're
+ * done executing it (we can be lazy and avoid invalidating
+ * icache)
+ */
+li	r0,0
+std	0,8(0)
 	b	.__start_initialization_multiplatform
 END_FTR_SECTION(0, 1)

Is it possible to integrate Olof's patch into the kernel sources?

All the best,

Christian

On 15.01.14 21:01, Christian Zigotzky wrote:
> author     Linus Torvalds <torvalds@linux-foundation.org>  2014-01-13 03:59:05 (GMT)
> committer  Linus Torvalds <torvalds@linux-foundation.org>  2014-01-13 03:59:05 (GMT)
> commit     a6da83f98267bc8ee4e34aa899169991eb0ceb93
>            <https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a6da83f98267bc8ee4e34aa899169991eb0ceb93>
> tree       84c228e0a87475dbdb0f72621c137cce8253131b
> parent     061f49ec2d722f485237870f04544d8bec15a778
> parent     10348f5976830e5d8f74e8abb04a9a057a5e8478
>
> Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
>
> Pull powerpc fix from Ben Herrenschmidt:
>  "Here's one regression fix for 3.13 that I would appreciate if you could
>   still pull in. It was an "interesting" one to debug, basically it's an
>   old bug that got somewhat "exposed" by new code breaking the boot on PA
>   Semi boards (yes, it does appear that some people are still using
>   these!)"
>
> * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
>   powerpc: Check return value of instance-to-package OF call
>
>  arch/powerpc/kernel/prom_init.c | 22
>  1 file changed, 13 insertions, 9 deletions
>
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index cb64a6e..078145a 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -1986,19 +1986,23 @@ static void __init prom_init_stdout(void)
>  	/* Get the full OF pathname of the stdout device */
>  	memset(path, 0, 256);
>  	call_prom("instance-to-path", 3, 1, prom.stdout, path, 255);
> -	stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
> -	val = cpu_to_be32(stdout_node);
> -	prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
> -		     &val, sizeof(val));
>  	prom_printf("OF stdout device is: %s\n", of_stdout_device);
>  	prom_setprop(prom.chosen, "/chosen", "linux,stdout-path",
>  		     path, strlen(path) + 1);
> -	/* If it's a display, note it */
> -	memset(type, 0, sizeof(type));
> -	prom_getprop(stdout_node, "device_type", type, sizeof(type));
> -	if (strcmp(type, "display") == 0)
> -		prom_setprop(stdout_node, path, "linux,boot-display", NULL, 0);
> +	/* instance-to-package fails on PA-Semi */
> +	stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
> +	if (stdout_node != PROM_ERROR) {
> +		val = cpu_to_be32(stdout_node);
> +		prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
> +			     &val, sizeof(val));
> +
> +		/* If it's a display, note it */
> +		memset(type, 0, sizeof(type));
> +		prom_getprop(stdout_node, "device_type", type, sizeof(type));
> +		if (strcmp(type, "display") == 0)
> +			prom_setprop(stdout_node, path, "linux,boot-display", NULL, 0);
> +	}
>  }
>  static int __init prom_find_machine_type(void)
>


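The quoted fix follows a simple pattern. It compares the firmware call's return value against the error sentinel before using the result as a package handle. The userspace sketch below stubs out the firmware call. `fake_instance_to_package()` is hypothetical. The `PROM_ERROR` value of `(-1u)` is assumed here for illustration, since prom_init.c has its own definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROM_ERROR (-1u)  /* assumed error sentinel: 32-bit all-ones */

typedef uint32_t phandle_t;

/* Hypothetical stub: broken firmware cannot map the instance to a package. */
static phandle_t fake_instance_to_package(uint32_t instance, bool broken_fw)
{
	return broken_fw ? PROM_ERROR : instance + 0x100;
}

/*
 * Mirror of the fixed control flow. The stdout package (and its
 * device_type probe) is only recorded when the call succeeded.
 * Returns true if the package-dependent properties would be set.
 */
static bool note_stdout_package(uint32_t instance, bool broken_fw,
				phandle_t *pkg)
{
	phandle_t node = fake_instance_to_package(instance, broken_fw);

	if (node == PROM_ERROR)
		return false;  /* skip the optional properties, keep booting */
	*pkg = node;
	return true;
}
```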
[-- Attachment #2: Type: text/html, Size: 10171 bytes --]

^ permalink raw reply

* Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
From: Anshuman Khandual @ 2014-01-22  9:37 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Peter Zijlstra, Linux PPC, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer
In-Reply-To: <1390354379.11104.3.camel@concordia>

On 01/22/2014 07:02 AM, Michael Ellerman wrote:
> On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote:
>> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
>> performance counters: gpci ("get performance counter info") and 24x7.
>>
>> The counters supplied by these interfaces are continually counting and never
>> need to be (and cannot be) disabled or enabled. They additionally do not
>> generate any interrupts. This makes them in some regards similar to software
>> counters, and as a result their implementation shares some common code (which
>> an initial patch exposes) with the sw counters.
> 
> Hi Cody,
> 
> Can you please add some more explanation of this series.
> 
> In particular why do we need two new PMUs, and how do they relate to each
> other?
> 
> And can you add an example of how I'd actually use them using perf.
> 

Yeah, agreed.
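These hypervisor counters are free-running and never raise interrupts. A PMU driver for them can therefore only sample the raw value and accumulate the difference between successive reads, which resembles how software counters are accounted. The sketch below shows that delta accounting. The struct and function names are hypothetical, and the actual series may structure this differently.

```c
#include <stdint.h>

/* Hypothetical per-event state for a free-running, read-only counter. */
struct fr_event {
	uint64_t prev_raw;  /* raw counter value at the last read */
	uint64_t count;     /* total accumulated since the event started */
};

/* Start: snapshot the counter; nothing is enabled in hardware. */
static void fr_event_start(struct fr_event *ev, uint64_t raw_now)
{
	ev->prev_raw = raw_now;
	ev->count = 0;
}

/*
 * Update: add the delta since the previous read. Unsigned subtraction
 * also handles a counter that wrapped past 2^64 once between reads.
 */
static void fr_event_update(struct fr_event *ev, uint64_t raw_now)
{
	ev->count += raw_now - ev->prev_raw;
	ev->prev_raw = raw_now;
}
```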

^ permalink raw reply

* Re: [PATCH 0/4] powernv: kvm: numa fault improvement
From: Liu ping fan @ 2014-01-22  8:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc
In-Reply-To: <87zjmoiogp.fsf@linux.vnet.ibm.com>

On Wed, Jan 22, 2014 at 1:18 PM, Aneesh Kumar K.V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> Paul Mackerras <paulus@samba.org> writes:
>
>> On Mon, Jan 20, 2014 at 03:48:36PM +0100, Alexander Graf wrote:
>>>
>>> On 15.01.2014, at 07:36, Liu ping fan <kernelfans@gmail.com> wrote:
>>>
>>> > On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf <agraf@suse.de> wrote:
>>> >>
>>> >> On 11.12.2013, at 09:47, Liu Ping Fan <kernelfans@gmail.com> wrote:
>>> >>
>>> >>> This series is based on Aneesh's series  "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64"
>>> >>>
>>> >>> For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA"
>>> >>> (for which, I still try to get a machine to show nums)
>>> >>>
>>> >>> But for this series, I think that I have a good justification -- the fact of heavy cost when switching context between guest and host,
>>> >>> which is  well known.
>>> >>
>>> >> This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve what you're trying and convince your readers that it's a good idea to do it the way you do it.
>>> >>
>>> > Sorry for the unclear message. After introducing the _PAGE_NUMA,
>>> > kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
>>> > should rely on host's kvmppc_book3s_hv_page_fault() to call
>>> > do_numa_page() to do the numa fault check. This incurs the overhead
>>> > when exiting from rmode to vmode.  My idea is that in
>>> > kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
>>> > there is no need to exit to vmode (i.e saving htab, slab switching)
>>> >
>>> >>> If my suppose is correct, will CCing kvm@vger.kernel.org from next version.
>>> >>
>>> >> This translates to me as "This is an RFC"?
>>> >>
>>> > Yes, I am not quite sure about it. I have no bare-metal to verify it.
>>> > So I hope at least, from the theory, it is correct.
>>>
>>> Paul, could you please give this some thought and maybe benchmark it?
>>
>> OK, once I get Aneesh to tell me how I get to have ptes with
>> _PAGE_NUMA set in the first place. :)
>>
>
> I guess we want patch 2, which Liu has sent separately and I have
> reviewed. http://article.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/8619
> I am not sure about the rest of the patches in the series.
> We definitely don't want to numa migrate on henter. We may want to do
> that on fault. But even there, IMHO, we should let the host take the
> fault and do the numa migration instead of doing this in guest context.
>
My patch does NOT do the numa migration in guest context (h_enter).
Instead, it just does a pre-check to see whether numa migration is
needed. If it is, the host will take the fault and do the numa
migration as it currently does. Otherwise, h_enter can directly set up
the hpte without HPTE_V_ABSENT.
And since pte_mknuma() is called system-wide periodically, the guest is
all the more likely to suffer from HPTE_V_ABSENT. (As in my previous
reply, I think we should also place the quick check in
kvmppc_hpte_hv_fault.)

Thx,
Fan

> -aneesh
>

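The proposal reduces to a cheap placement check before deciding whether the guest HPTE must be left absent. The sketch below models only that decision. The struct, its fields, and the function are hypothetical. They do not reflect how KVM actually reads `_PAGE_NUMA` or node information in real mode.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a host PTE for the decision below. */
struct model_pte {
	bool numa_hint;  /* stands in for a _PAGE_NUMA-marked PTE */
	int page_node;   /* NUMA node the backing page lives on */
};

/*
 * Return true if the guest HPTE could be installed directly, or false
 * if it must stay absent so the host takes the fault (and possibly
 * migrates the page) as it does today.
 */
static bool can_install_directly(const struct model_pte *pte, int cpu_node)
{
	if (!pte->numa_hint)
		return true;                     /* not a NUMA hinting PTE */
	return pte->page_node == cpu_node;       /* already well placed */
}
```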
^ permalink raw reply

* Re: [PATCH V2] cpuidle/governors: Fix logic in selection of idle states
From: Daniel Lezcano @ 2014-01-22  8:29 UTC (permalink / raw)
  To: Preeti U Murthy, svaidy, linux-pm, benh, rjw, linux-kernel,
	srivatsa.bhat, paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140117043351.21531.14192.stgit@preeti.in.ibm.com>

On 01/17/2014 05:33 AM, Preeti U Murthy wrote:
> The cpuidle governors today are not handling scenarios where no idle state
> can be chosen. Such scenarios could arise if the user has disabled all the
> idle states at runtime or the latency requirement from the cpus is very strict.
>
> The menu governor returns 0th index of the idle state table when no other
> idle state is suitable. This is even when the idle state corresponding to this
> index is disabled or the latency requirement is strict and the exit_latency
> of the lowest idle state is also not acceptable. Hence this patch
> fixes this logic in the menu governor by defaulting to an idle state index
> of -1 unless any other state is suitable.
>
> The ladder governor needs a few more fixes in addition to that required in the
> menu governor. When the ladder governor decides to demote the idle state of a
> CPU, it does not check if the lower idle states are enabled. Add this logic
> in addition to the logic where it chooses an index of -1 if it can neither
> promote nor demote the idle state of a cpu, nor choose the current idle
> state.
>
> The cpuidle_idle_call() will return back if the governor decides upon not
> entering any idle state. However it cannot return an error code because all
> archs have the logic today that if the call to cpuidle_idle_call() fails, it
> means that the cpuidle driver failed to *function*; for instance due to
> errors during registration. As a result they end up deciding upon a
> default idle state on their own, which could very well be a deep idle state.
> This is incorrect in cases where no idle state is suitable.
>
> Besides, in the scenario that this patch is addressing, the call actually
> succeeds. It's just that no idle state is thought to be suitable by the governors.
> Under such a circumstance return success code without entering any idle
> state.
>
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>
> Changes from V1:https://lkml.org/lkml/2014/1/14/26
>
> 1. Change the return code to success from -EINVAL due to the reason mentioned
> in the changelog.
> 2. Add logic that the patch is addressing in the ladder governor as well.
> 3. Added relevant comments and removed redundant logic as suggested in the
> above thread.
> ---
>
>   drivers/cpuidle/cpuidle.c          |   15 +++++-
>   drivers/cpuidle/governors/ladder.c |   98 ++++++++++++++++++++++++++----------
>   drivers/cpuidle/governors/menu.c   |    7 +--
>   3 files changed, 89 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..831b664 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>
>   	/* ask the governor for the next state */
>   	next_state = cpuidle_curr_governor->select(drv, dev);
> +
> +	dev->last_residency = 0;
>   	if (need_resched()) {
> -		dev->last_residency = 0;

Why do you need to do this change ? ^^^^^

>   		/* give the governor an opportunity to reflect on the outcome */
>   		if (cpuidle_curr_governor->reflect)
>   			cpuidle_curr_governor->reflect(dev, next_state);
> @@ -140,6 +141,18 @@ int cpuidle_idle_call(void)
>   		return 0;
>   	}
>
> +	/* Unlike in the need_resched() case, we return here because the
> +	 * governor did not find a suitable idle state. However idle is still
> +	 * in progress as we are not asked to reschedule. Hence we return
> +	 * without enabling interrupts.

That will lead to a WARN.

> +	 * NOTE: The return code should still be success, since the verdict of this
> +	 * call is "do not enter any idle state" and not a failed call due to
> +	 * errors.
> +	 */
> +	if (next_state < 0)
> +		return 0;
> +

Returning from here breaks the symmetry of the trace.

>   	trace_cpu_idle_rcuidle(next_state, dev->cpu);
>
>   	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index 9f08e8c..f495f57 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -58,6 +58,36 @@ static inline void ladder_do_selection(struct ladder_device *ldev,
>   	ldev->last_state_idx = new_idx;
>   }
>
> +static int can_promote(struct ladder_device *ldev, int last_idx,
> +				int last_residency)
> +{
> +	struct ladder_device_state *last_state;
> +
> +	last_state = &ldev->states[last_idx];
> +	if (last_residency > last_state->threshold.promotion_time) {
> +		last_state->stats.promotion_count++;
> +		last_state->stats.demotion_count = 0;
> +		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static int can_demote(struct ladder_device *ldev, int last_idx,
> +			int last_residency)
> +{
> +	struct ladder_device_state *last_state;
> +
> +	last_state = &ldev->states[last_idx];
> +	if (last_residency < last_state->threshold.demotion_time) {
> +		last_state->stats.demotion_count++;
> +		last_state->stats.promotion_count = 0;
> +		if (last_state->stats.demotion_count >= last_state->threshold.demotion_count)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
>   /**
>    * ladder_select_state - selects the next state to enter
>    * @drv: cpuidle driver
> @@ -73,29 +103,33 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>
>   	/* Special case when user has set very strict latency requirement */
>   	if (unlikely(latency_req == 0)) {
> -		ladder_do_selection(ldev, last_idx, 0);
> -		return 0;
> +		if (last_idx >= 0)
> +			ladder_do_selection(ldev, last_idx, -1);
> +		goto out;
>   	}
>
> -	last_state = &ldev->states[last_idx];
> -
> -	if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
> -		last_residency = cpuidle_get_last_residency(dev) - \
> -					 drv->states[last_idx].exit_latency;
> +	if (last_idx >= 0) {
> +		if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
> +			last_residency = cpuidle_get_last_residency(dev) - \
> +						 drv->states[last_idx].exit_latency;
> +		} else {
> +			last_state = &ldev->states[last_idx];
> +			last_residency = last_state->threshold.promotion_time + 1;
> +		}
>   	}
> -	else
> -		last_residency = last_state->threshold.promotion_time + 1;
>
>   	/* consider promotion */
>   	if (last_idx < drv->state_count - 1 &&
>   	    !drv->states[last_idx + 1].disabled &&
>   	    !dev->states_usage[last_idx + 1].disable &&
> -	    last_residency > last_state->threshold.promotion_time &&
>   	    drv->states[last_idx + 1].exit_latency <= latency_req) {
> -		last_state->stats.promotion_count++;
> -		last_state->stats.demotion_count = 0;
> -		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
> -			ladder_do_selection(ldev, last_idx, last_idx + 1);
> +		if (last_idx >= 0) {
> +			if (can_promote(ldev, last_idx, last_residency)) {
> +				ladder_do_selection(ldev, last_idx, last_idx + 1);
> +				return last_idx + 1;
> +			}
> +		} else {
> +			ldev->last_state_idx = last_idx + 1;
>   			return last_idx + 1;
>   		}
>   	}
> @@ -107,26 +141,36 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>   	    drv->states[last_idx].exit_latency > latency_req)) {
>   		int i;
>
> -		for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
> -			if (drv->states[i].exit_latency <= latency_req)
> +		for (i = last_idx - 1; i >= CPUIDLE_DRIVER_STATE_START; i--) {
> +			if (drv->states[i].exit_latency <= latency_req &&
> +				!(drv->states[i].disabled || dev->states_usage[i].disable))
>   				break;
>   		}
> -		ladder_do_selection(ldev, last_idx, i);
> -		return i;
> +		if (i >= 0) {
> +			ladder_do_selection(ldev, last_idx, i);
> +			return i;
> +		}
> +		goto out;
>   	}
>
> -	if (last_idx > CPUIDLE_DRIVER_STATE_START &&
> -	    last_residency < last_state->threshold.demotion_time) {
> -		last_state->stats.demotion_count++;
> -		last_state->stats.promotion_count = 0;
> -		if (last_state->stats.demotion_count >= last_state->threshold.demotion_count) {
> -			ladder_do_selection(ldev, last_idx, last_idx - 1);
> -			return last_idx - 1;
> +	if (last_idx > CPUIDLE_DRIVER_STATE_START) {
> +		int i = last_idx - 1;
> +
> +		if (can_demote(ldev, last_idx, last_residency) &&
> +			!(drv->states[i].disabled || dev->states_usage[i].disable)) {
> +			ladder_do_selection(ldev, last_idx, i);
> +			return i;
>   		}
> +		/* We come here when the last_idx is still a suitable idle state, just that
> +		 * promotion or demotion is not ideal.
> +		 */
> +		ldev->last_state_idx = last_idx;
> +		return last_idx;
>   	}
>
> -	/* otherwise remain at the current state */
> -	return last_idx;
> +	/* we come here if no idle state is suitable */
> +out:	ldev->last_state_idx = -1;
> +	return ldev->last_state_idx;
>   }
>
>   /**
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..e9f17ce 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -297,12 +297,12 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   		data->needs_update = 0;
>   	}
>
> -	data->last_state_idx = 0;
> +	data->last_state_idx = -1;
>   	data->exit_us = 0;
>
>   	/* Special case when user has set very strict latency requirement */
>   	if (unlikely(latency_req == 0))
> -		return 0;
> +		return data->last_state_idx;
>
>   	/* determine the expected residency time, round up */
>   	t = ktime_to_timespec(tick_nohz_get_sleep_length());
> @@ -368,7 +368,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   /**
>    * menu_reflect - records that data structures need update
>    * @dev: the CPU
> - * @index: the index of actual entered state
> + * @index: the index of actual entered state or -1 if no idle state is
> + * suitable.
>    *
>    * NOTE: it's important to be fast here because this operation will add to
>    *       the overall exit latency.
>

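The patch's `can_promote()` and `can_demote()` helpers implement a hysteresis. A state change happens only after enough consecutive residencies cross the threshold, and an observation in the opposite direction resets the other counter. The standalone userspace model below reproduces that counting. It uses a flattened, hypothetical struct in place of `struct ladder_device_state`, and the thresholds used to exercise it are illustrative.

```c
#include <stdbool.h>

/* Flattened, hypothetical copy of the per-state ladder bookkeeping. */
struct ladder_state_model {
	int promotion_time, promotion_count;  /* thresholds */
	int demotion_time, demotion_count;    /* thresholds */
	int promo_seen, demo_seen;            /* running stats */
};

/* Same counting rule as the patch's can_promote(). */
static bool model_can_promote(struct ladder_state_model *s, int residency)
{
	if (residency > s->promotion_time) {
		s->promo_seen++;
		s->demo_seen = 0;
		if (s->promo_seen >= s->promotion_count)
			return true;
	}
	return false;
}

/* Same counting rule as the patch's can_demote(). */
static bool model_can_demote(struct ladder_state_model *s, int residency)
{
	if (residency < s->demotion_time) {
		s->demo_seen++;
		s->promo_seen = 0;
		if (s->demo_seen >= s->demotion_count)
			return true;
	}
	return false;
}
```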

-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

^ permalink raw reply

* [RESEND PATCH V5 8/8] cpuidle/powernv: Parse device tree to setup idle states
From: Preeti U Murthy @ 2014-01-22  7:09 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, daniel.lezcano, benh, paulmck,
	agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

Add deep idle states such as nap and fast sleep to the cpuidle state table
only if they are discovered from the device tree during cpuidle initialization.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

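The discovery step walks the `ibm,cpu-idle-state-flags` cells and adds a state for each recognised instruction bit. The userspace sketch below rests on several assumptions. The flag values are copied from the patch below. The cells are decoded as big-endian (device tree byte order), unlike the direct `flags[i]` dereference in the patch, which relies on a big-endian kernel. `count_dt_idle_states()` is a hypothetical helper. Counting one state per cell is an assumption, because the rest of the patch is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_POWERNV_IDLE_STATES 8
#define IDLE_USE_INST_NAP       0x00010000  /* Use nap instruction */
#define IDLE_USE_INST_SLEEP     0x00020000  /* Use sleep instruction */

/* Decode one big-endian 32-bit cell. */
static uint32_t be32_cell(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Count the idle states that would be registered. Snooze is always
 * present, and each cell asking for nap or sleep adds one state. The
 * total is capped at the size of the static state table.
 */
static int count_dt_idle_states(const uint8_t *prop, size_t len)
{
	int nr = 1;  /* snooze */

	for (size_t i = 0; i + 4 <= len; i += 4) {
		uint32_t flags = be32_cell(prop + i);

		if (nr >= MAX_POWERNV_IDLE_STATES)
			break;
		if (flags & (IDLE_USE_INST_NAP | IDLE_USE_INST_SLEEP))
			nr++;
	}
	return nr;
}
```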
 drivers/cpuidle/cpuidle-powernv.c |   81 +++++++++++++++++++++++++++++--------
 1 file changed, 64 insertions(+), 17 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 90f0c2b..b3face5 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -12,10 +12,17 @@
 #include <linux/cpu.h>
 #include <linux/notifier.h>
 #include <linux/clockchips.h>
+#include <linux/of.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 
+/* Flags and constants used in PowerNV platform */
+
+#define MAX_POWERNV_IDLE_STATES	8
+#define IDLE_USE_INST_NAP	0x00010000 /* Use nap instruction */
+#define IDLE_USE_INST_SLEEP	0x00020000 /* Use sleep instruction */
+
 struct cpuidle_driver powernv_idle_driver = {
 	.name             = "powernv_idle",
 	.owner            = THIS_MODULE,
@@ -87,7 +94,7 @@ static int fastsleep_loop(struct cpuidle_device *dev,
 /*
  * States for dedicated partition case.
  */
-static struct cpuidle_state powernv_states[] = {
+static struct cpuidle_state powernv_states[MAX_POWERNV_IDLE_STATES] = {
 	{ /* Snooze */
 		.name = "snooze",
 		.desc = "snooze",
@@ -95,20 +102,6 @@ static struct cpuidle_state powernv_states[] = {
 		.exit_latency = 0,
 		.target_residency = 0,
 		.enter = &snooze_loop },
-	{ /* NAP */
-		.name = "NAP",
-		.desc = "NAP",
-		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
-		.enter = &nap_loop },
-	 { /* Fastsleep */
-		.name = "fastsleep",
-		.desc = "fastsleep",
-		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
-		.enter = &fastsleep_loop },
 };
 
 static int powernv_cpuidle_add_cpu_notifier(struct notifier_block *n,
@@ -169,19 +162,73 @@ static int powernv_cpuidle_driver_init(void)
 	return 0;
 }
 
+static int powernv_add_idle_states(void)
+{
+	struct device_node *power_mgt;
+	struct property *prop;
+	int nr_idle_states = 1; /* Snooze */
+	int dt_idle_states;
+	u32 *flags;
+	int i;
+
+	/* Currently we have snooze statically defined */
+
+	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+	if (!power_mgt) {
+		pr_warn("opal: PowerMgmt Node not found\n");
+		return nr_idle_states;
+	}
+
+	prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
+	if (!prop) {
+		pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+		return nr_idle_states;
+	}
+
+	dt_idle_states = prop->length / sizeof(u32);
+	flags = (u32 *) prop->value;
+
+	for (i = 0; i < dt_idle_states; i++) {
+
+		if (flags[i] & IDLE_USE_INST_NAP) {
+			/* Add NAP state */
+			strcpy(powernv_states[nr_idle_states].name, "Nap");
+			strcpy(powernv_states[nr_idle_states].desc, "Nap");
+			powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID;
+			powernv_states[nr_idle_states].exit_latency = 10;
+			powernv_states[nr_idle_states].target_residency = 100;
+			powernv_states[nr_idle_states].enter = &nap_loop;
+			nr_idle_states++;
+		}
+
+		if (flags[i] & IDLE_USE_INST_SLEEP) {
+			/* Add FASTSLEEP state */
+			strcpy(powernv_states[nr_idle_states].name, "FastSleep");
+			strcpy(powernv_states[nr_idle_states].desc, "FastSleep");
+			powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIME_VALID;
+			powernv_states[nr_idle_states].exit_latency = 300;
+			powernv_states[nr_idle_states].target_residency = 1000000;
+			powernv_states[nr_idle_states].enter = &fastsleep_loop;
+			nr_idle_states++;
+		}
+	}
+
+	return nr_idle_states;
+}
+
 /*
  * powernv_idle_probe()
  * Choose state table for shared versus dedicated partition
  */
 static int powernv_idle_probe(void)
 {
-
 	if (cpuidle_disable != IDLE_NO_OVERRIDE)
 		return -ENODEV;
 
 	if (firmware_has_feature(FW_FEATURE_OPALv3)) {
 		cpuidle_state_table = powernv_states;
-		max_idle_state = ARRAY_SIZE(powernv_states);
+		/* Device tree can indicate more idle states */
+		max_idle_state = powernv_add_idle_states();
  	} else
  		return -ENODEV;
 

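The parsing loop in powernv_add_idle_states() can be modelled in user space as below. This is a hedged sketch under stated assumptions, not the kernel code: it treats the "ibm,cpu-idle-state-flags" value as an array of big-endian 32-bit cells (device tree property values are big-endian), reuses the IDLE_USE_INST_* values and latencies from the patch, and adds a bound against MAX_POWERNV_IDLE_STATES that the patch itself does not check. The names struct idle_state, be32_decode(), add_state() and add_idle_states() are hypothetical. In-kernel code would more typically fetch the cells with of_property_read_u32_array(), which performs the endianness conversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_POWERNV_IDLE_STATES	8
#define IDLE_USE_INST_NAP	0x00010000 /* Use nap instruction */
#define IDLE_USE_INST_SLEEP	0x00020000 /* Use sleep instruction */

/* Hypothetical, cut-down stand-in for struct cpuidle_state. */
struct idle_state {
	char name[16];
	unsigned int exit_latency;	/* us */
	unsigned int target_residency;	/* us */
};

/* Device tree cells are big-endian: decode one explicitly. */
static uint32_t be32_decode(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Append a state unless the table is already full; return the new count. */
static int add_state(struct idle_state *t, int n, const char *name,
		     unsigned int exit_latency, unsigned int residency)
{
	if (n >= MAX_POWERNV_IDLE_STATES)
		return n;
	snprintf(t[n].name, sizeof(t[n].name), "%s", name);
	t[n].exit_latency = exit_latency;
	t[n].target_residency = residency;
	return n + 1;
}

/*
 * Model of powernv_add_idle_states(): slot 0 (snooze) is static, and each
 * flags cell may add a Nap and/or a FastSleep entry after it.
 */
int add_idle_states(struct idle_state *t, const uint8_t *prop, size_t len)
{
	int n = 1; /* Snooze */
	size_t off;

	for (off = 0; off + 4 <= len; off += 4) {
		uint32_t flags = be32_decode(prop + off);

		if (flags & IDLE_USE_INST_NAP)
			n = add_state(t, n, "Nap", 10, 100);
		if (flags & IDLE_USE_INST_SLEEP)
			n = add_state(t, n, "FastSleep", 300, 1000000);
	}
	return n;
}
```

For example, a two-cell property {0x00010000, 0x00020000} yields three states (snooze, Nap, FastSleep), and a property with many cells never grows the table past MAX_POWERNV_IDLE_STATES.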

* [RESEND PATCH V5 7/8] cpuidle/powernv: Add "Fast-Sleep" CPU idle state
From: Preeti U Murthy @ 2014-01-22  7:09 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, daniel.lezcano, benh, paulmck,
	agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

Fast sleep is one of the deep idle states on Power8 in which the local timers of
CPUs stop. On PowerPC there is no external clock device that can handle the
wakeup of such CPUs. Now that the tick broadcast framework supports archs that
lack such a device, and the low-level support for fast sleep is in place,
enable fast sleep in the cpuidle framework on PowerNV.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/Kconfig              |    2 ++
 arch/powerpc/kernel/time.c        |    2 +-
 drivers/cpuidle/cpuidle-powernv.c |   42 +++++++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fa39517..ec91584 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,6 +129,8 @@ config PPC
 	select GENERIC_CMOS_UPDATE
 	select GENERIC_TIME_VSYSCALL_OLD
 	select GENERIC_CLOCKEVENTS
+	select GENERIC_CLOCKEVENTS_BROADCAST
+	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
 	select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index df2989b..95fa5ce 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -106,7 +106,7 @@ struct clock_event_device decrementer_clockevent = {
 	.irq            = 0,
 	.set_next_event = decrementer_set_next_event,
 	.set_mode       = decrementer_set_mode,
-	.features       = CLOCK_EVT_FEAT_ONESHOT,
+	.features       = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
 };
 EXPORT_SYMBOL(decrementer_clockevent);
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 78fd174..90f0c2b 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -11,6 +11,7 @@
 #include <linux/cpuidle.h>
 #include <linux/cpu.h>
 #include <linux/notifier.h>
+#include <linux/clockchips.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
@@ -49,6 +50,40 @@ static int nap_loop(struct cpuidle_device *dev,
 	return index;
 }
 
+static int fastsleep_loop(struct cpuidle_device *dev,
+				struct cpuidle_driver *drv,
+				int index)
+{
+	int cpu = dev->cpu;
+	unsigned long old_lpcr = mfspr(SPRN_LPCR);
+	unsigned long new_lpcr;
+
+	if (unlikely(system_state < SYSTEM_RUNNING))
+		return index;
+
+	new_lpcr = old_lpcr;
+	new_lpcr &= ~(LPCR_MER | LPCR_PECE); /* lpcr[mer] must be 0 */
+
+	/* Exit powersave upon external interrupt, but not decrementer
+	 * interrupt. This emulates sleep.
+	 */
+	new_lpcr |= LPCR_PECE0;
+
+	if (clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu)) {
+		new_lpcr |= LPCR_PECE1;
+		mtspr(SPRN_LPCR, new_lpcr);
+		power7_nap();
+	} else {
+		mtspr(SPRN_LPCR, new_lpcr);
+		power7_sleep();
+	}
+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
+
+	mtspr(SPRN_LPCR, old_lpcr);
+
+	return index;
+}
+
 /*
  * States for dedicated partition case.
  */
@@ -67,6 +102,13 @@ static struct cpuidle_state powernv_states[] = {
 		.exit_latency = 10,
 		.target_residency = 100,
 		.enter = &nap_loop },
+	 { /* Fastsleep */
+		.name = "fastsleep",
+		.desc = "fastsleep",
+		.flags = CPUIDLE_FLAG_TIME_VALID,
+		.exit_latency = 10,
+		.target_residency = 100,
+		.enter = &fastsleep_loop },
 };
 
 static int powernv_cpuidle_add_cpu_notifier(struct notifier_block *n,

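The LPCR handling in fastsleep_loop() can be summarised by the user-space model below. It is a sketch under stated assumptions: the LPCR_* bit values are illustrative placeholders (check asm/reg.h for the real definitions), and the broadcast_enter_failed argument stands in for a non-zero return from clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu). fastsleep_lpcr() and enum idle_inst are hypothetical names. It shows the two outcomes: a CPU whose broadcast entry failed (because it is the bc_cpu) naps with decrementer wakeup (PECE1) enabled, while any other CPU enters sleep with only external-interrupt wakeup (PECE0).

```c
#include <stdbool.h>

/* Illustrative bit values only; not taken from asm/reg.h. */
#define LPCR_MER	(1UL << 11)	/* mediated external exception request */
#define LPCR_PECE2	(1UL << 12)
#define LPCR_PECE1	(1UL << 13)	/* wake on decrementer */
#define LPCR_PECE0	(1UL << 14)	/* wake on external interrupt */
#define LPCR_PECE	(LPCR_PECE0 | LPCR_PECE1 | LPCR_PECE2)

enum idle_inst { INST_NAP, INST_SLEEP };

/*
 * Model of the decision in fastsleep_loop(): compute the LPCR value to
 * program before idling and report which instruction would be executed.
 * Bits outside MER/PECE are preserved from old_lpcr.
 */
unsigned long fastsleep_lpcr(unsigned long old_lpcr,
			     bool broadcast_enter_failed,
			     enum idle_inst *inst)
{
	unsigned long new_lpcr = old_lpcr & ~(LPCR_MER | LPCR_PECE);

	new_lpcr |= LPCR_PECE0;		/* always wake on external irq */
	if (broadcast_enter_failed) {
		/* This CPU is the bc_cpu: its decrementer must keep working. */
		new_lpcr |= LPCR_PECE1;
		*inst = INST_NAP;
	} else {
		*inst = INST_SLEEP;
	}
	return new_lpcr;
}
```

Keeping the bc_cpu in nap rather than sleep is what lets it service its own broadcast hrtimer while the other CPUs sleep.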

* [RESEND PATCH V5 6/8] time/cpuidle: Support in tick broadcast framework in the absence of external clock device
From: Preeti U Murthy @ 2014-01-22  7:09 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, daniel.lezcano, benh, paulmck,
	agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

On some architectures, the local timers stop in certain deep CPU idle states.
An external clock device is used to wake up these CPUs. The kernel supports
the wakeup of these CPUs through the tick broadcast framework, which uses the
external clock device as the wakeup source.

However, not all architecture implementations provide such an external clock
device; some PowerPC implementations, for example, do not. This patch adds
support in the broadcast framework for waking up CPUs in deep idle states on
such systems by queuing an hrtimer on one of the CPUs, which is then
responsible for the wakeup of CPUs in deep idle states. This CPU is identified
as the bc_cpu.

Each time the hrtimer expires, it handles the broadcast and is then
reprogrammed for the next wakeup of the CPUs in deep idle states. However,
when a CPU is about to enter a deep idle state with a wakeup time earlier than
the time at which the hrtimer is currently programmed, it *becomes the new
bc_cpu* and restarts the hrtimer on itself. In this way, the broadcast duty is
handed to whichever CPU asks for the earliest wakeup just before entering a
deep idle state. This is consistent with what happens when an external clock
device is present, where the SMP affinity of the clock device is set to the
CPU with the earliest wakeup.

The important point here is that the bc_cpu cannot enter a deep idle state,
since it has an hrtimer queued to wake up the other CPUs in deep idle and
therefore cannot have its local timer stopped. Hence, for such a CPU the
BROADCAST_ENTER notification has to fail, indicating that it cannot enter the
deep idle state. On architectures where an external clock device is present,
all CPUs can enter deep idle.

During hotplug of the bc_cpu, the broadcast duty is assigned to the first CPU
in the broadcast mask. This newly nominated bc_cpu is woken up by an IPI so
that it queues the above-mentioned hrtimer on itself.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 include/linux/clockchips.h   |    4 -
 kernel/time/clockevents.c    |    9 +-
 kernel/time/tick-broadcast.c |  192 ++++++++++++++++++++++++++++++++++++++----
 kernel/time/tick-internal.h  |    8 +-
 4 files changed, 186 insertions(+), 27 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 493aa02..bbda37b 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -186,9 +186,9 @@ static inline int tick_check_broadcast_expired(void) { return 0; }
 #endif
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
-extern void clockevents_notify(unsigned long reason, void *arg);
+extern int clockevents_notify(unsigned long reason, void *arg);
 #else
-static inline void clockevents_notify(unsigned long reason, void *arg) {}
+static inline int clockevents_notify(unsigned long reason, void *arg) { return 0; }
 #endif
 
 #else /* CONFIG_GENERIC_CLOCKEVENTS_BUILD */
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 086ad60..d61404e 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -524,12 +524,13 @@ void clockevents_resume(void)
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 /**
  * clockevents_notify - notification about relevant events
+ * Returns non-zero on error.
  */
-void clockevents_notify(unsigned long reason, void *arg)
+int clockevents_notify(unsigned long reason, void *arg)
 {
 	struct clock_event_device *dev, *tmp;
 	unsigned long flags;
-	int cpu;
+	int cpu, ret = 0;
 
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
 
@@ -542,11 +543,12 @@ void clockevents_notify(unsigned long reason, void *arg)
 
 	case CLOCK_EVT_NOTIFY_BROADCAST_ENTER:
 	case CLOCK_EVT_NOTIFY_BROADCAST_EXIT:
-		tick_broadcast_oneshot_control(reason);
+		ret = tick_broadcast_oneshot_control(reason);
 		break;
 
 	case CLOCK_EVT_NOTIFY_CPU_DYING:
 		tick_handover_do_timer(arg);
+		tick_handover_broadcast_cpu(arg);
 		break;
 
 	case CLOCK_EVT_NOTIFY_SUSPEND:
@@ -585,6 +587,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 		break;
 	}
 	raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(clockevents_notify);
 
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 9532690..1c23912 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -20,6 +20,7 @@
 #include <linux/sched.h>
 #include <linux/smp.h>
 #include <linux/module.h>
+#include <linux/slab.h>
 
 #include "tick-internal.h"
 
@@ -35,6 +36,15 @@ static cpumask_var_t tmpmask;
 static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
 static int tick_broadcast_force;
 
+/*
+ * Helper variables for handling broadcast in the absence of a
+ * tick_broadcast_device.
+ */
+static struct hrtimer *bc_hrtimer;
+static int bc_cpu = -1;
+static ktime_t bc_next_wakeup;
+static int hrtimer_initialized = 0;
+
 #ifdef CONFIG_TICK_ONESHOT
 static void tick_broadcast_clear_oneshot(int cpu);
 #else
@@ -528,6 +538,20 @@ static int tick_broadcast_set_event(struct clock_event_device *bc, int cpu,
 	return ret;
 }
 
+static void tick_broadcast_set_next_wakeup(int cpu, ktime_t expires, int force)
+{
+	struct clock_event_device *bc;
+
+	bc = tick_broadcast_device.evtdev;
+
+	if (bc) {
+		tick_broadcast_set_event(bc, cpu, expires, force);
+	} else {
+		hrtimer_start(bc_hrtimer, expires, HRTIMER_MODE_ABS_PINNED);
+		bc_cpu = cpu;
+	}
+}
+
 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 {
 	clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
@@ -558,15 +582,13 @@ void tick_check_oneshot_broadcast(int cpu)
 /*
  * Handle oneshot mode broadcasting
  */
-static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
+static int tick_oneshot_broadcast(void)
 {
 	struct tick_device *td;
 	ktime_t now, next_event;
 	int cpu, next_cpu = 0;
 
-	raw_spin_lock(&tick_broadcast_lock);
-again:
-	dev->next_event.tv64 = KTIME_MAX;
+	bc_next_wakeup.tv64 = KTIME_MAX;
 	next_event.tv64 = KTIME_MAX;
 	cpumask_clear(tmpmask);
 	now = ktime_get();
@@ -620,34 +642,95 @@ again:
 	 * in the event mask
 	 */
 	if (next_event.tv64 != KTIME_MAX) {
-		/*
-		 * Rearm the broadcast device. If event expired,
-		 * repeat the above
-		 */
-		if (tick_broadcast_set_event(dev, next_cpu, next_event, 0))
+		bc_next_wakeup = next_event;
+	}
+
+	return next_cpu;
+}
+
+/*
+ * Handler in oneshot mode for the external clock device
+ */
+static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
+{
+	int next_cpu;
+
+	raw_spin_lock(&tick_broadcast_lock);
+
+again:	next_cpu = tick_oneshot_broadcast();
+	/*
+	 * Rearm the broadcast device. If event expired,
+	 * repeat the above
+	 */
+	if (bc_next_wakeup.tv64 != KTIME_MAX)
+		if (tick_broadcast_set_event(dev, next_cpu, bc_next_wakeup, 0))
 			goto again;
+
+	raw_spin_unlock(&tick_broadcast_lock);
+}
+
+/*
+ * Handler in oneshot mode for the hrtimer queued when there is no external
+ * clock device.
+ */
+static enum hrtimer_restart handle_broadcast(struct hrtimer *hrtmr)
+{
+	ktime_t now, interval;
+
+	raw_spin_lock(&tick_broadcast_lock);
+	tick_oneshot_broadcast();
+
+	now = ktime_get();
+
+	if (bc_next_wakeup.tv64 != KTIME_MAX) {
+		interval = ktime_sub(bc_next_wakeup, now);
+		hrtimer_forward_now(bc_hrtimer, interval);
+		raw_spin_unlock(&tick_broadcast_lock);
+		return HRTIMER_RESTART;
 	}
 	raw_spin_unlock(&tick_broadcast_lock);
+	return HRTIMER_NORESTART;
+}
+
+/* The CPU could be asked to take over from the previous bc_cpu,
+ * if it is being hotplugged out.
+ */
+static void tick_broadcast_exit_check(int cpu)
+{
+	if (cpu == bc_cpu)
+		hrtimer_start(bc_hrtimer, bc_next_wakeup,
+				HRTIMER_MODE_ABS_PINNED);
+}
+
+static int can_enter_broadcast(int cpu)
+{
+	return cpu != bc_cpu;
 }
 
 /*
  * Powerstate information: The system enters/leaves a state, where
  * affected devices might stop
+ *
+ * Returns a non-zero value if entry into the broadcast framework failed.
+ * This can happen on implementations of architectures which do
+ * not have an external clock device to do the broadcast: one of the
+ * CPUs then gets nominated to handle broadcasting, and such a CPU
+ * cannot enter a state in which its tick device can stop.
  */
-void tick_broadcast_oneshot_control(unsigned long reason)
+int tick_broadcast_oneshot_control(unsigned long reason)
 {
-	struct clock_event_device *bc, *dev;
+	struct clock_event_device *dev;
 	struct tick_device *td;
 	unsigned long flags;
 	ktime_t now;
-	int cpu;
+	int cpu, ret = 0;
 
 	/*
 	 * Periodic mode does not care about the enter/exit of power
 	 * states
 	 */
 	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
-		return;
+		return ret;
 
 	/*
 	 * We are called with preemtion disabled from the depth of the
@@ -658,9 +741,8 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	dev = td->evtdev;
 
 	if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
-		return;
+		return ret;
 
-	bc = tick_broadcast_device.evtdev;
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
@@ -676,12 +758,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 			 * woken by the IPI right away.
 			 */
 			if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
-			    dev->next_event.tv64 < bc->next_event.tv64)
-				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
+			    dev->next_event.tv64 < bc_next_wakeup.tv64) {
+				bc_next_wakeup = dev->next_event;
+				tick_broadcast_set_next_wakeup(cpu, dev->next_event, 1);
+			}
+
+			if (!can_enter_broadcast(cpu)) {
+				cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
+				clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
+				ret = 1;
+			}
 		}
 	} else {
 		if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
+
+			tick_broadcast_exit_check(cpu);
 			/*
 			 * The cpu which was handling the broadcast
 			 * timer marked this cpu in the broadcast
@@ -746,6 +838,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	}
 out:
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+	return ret;
 }
 
 /*
@@ -821,17 +914,57 @@ void tick_broadcast_switch_to_oneshot(void)
 {
 	struct clock_event_device *bc;
 	unsigned long flags;
+	int cpu = smp_processor_id();
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 
+	bc_next_wakeup.tv64 = KTIME_MAX;
+
 	tick_broadcast_device.mode = TICKDEV_MODE_ONESHOT;
 	bc = tick_broadcast_device.evtdev;
-	if (bc)
+	if (bc) {
 		tick_broadcast_setup_oneshot(bc);
+		bc_next_wakeup = bc->next_event;
+	} else if (hrtimer_initialized) {
+
+		/*
+		 * There may be CPUs waiting for periodic broadcast. We need
+		 * to set the oneshot bits for those and program the hrtimer
+		 * to fire at the next tick period.
+		 */
+		cpumask_copy(tmpmask, tick_broadcast_mask);
+		cpumask_clear_cpu(cpu, tmpmask);
+		cpumask_or(tick_broadcast_oneshot_mask,
+			   tick_broadcast_oneshot_mask, tmpmask);
+
+		if (!cpumask_empty(tmpmask)) {
+			tick_broadcast_init_next_event(tmpmask,
+						       tick_next_period);
+			hrtimer_start(bc_hrtimer, tick_next_period, HRTIMER_MODE_ABS_PINNED);
+			bc_next_wakeup = tick_next_period;
+		}
+	}
 
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 
+/*
+ * Use the broadcast function itself to wake up the new broadcast cpu
+ */
+void tick_handover_broadcast_cpu(int *cpup)
+{
+	struct tick_device *td;
+
+	if (*cpup == bc_cpu) {
+		int cpu = cpumask_first(tick_broadcast_oneshot_mask);
+
+		bc_cpu = (cpu < nr_cpu_ids) ? cpu : -1;
+		if (bc_cpu != -1) {
+			td = &per_cpu(tick_cpu_device, bc_cpu);
+			td->evtdev->broadcast(cpumask_of(bc_cpu));
+		}
+	}
+}
 
 /*
  * Remove a dead CPU from broadcasting
@@ -868,8 +1001,29 @@ int tick_broadcast_oneshot_active(void)
 bool tick_broadcast_oneshot_available(void)
 {
 	struct clock_event_device *bc = tick_broadcast_device.evtdev;
+	bool ret = true;
+	unsigned long flags;
 
-	return bc ? bc->features & CLOCK_EVT_FEAT_ONESHOT : false;
+	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
+
+	if (bc) {
+		ret = bc->features & CLOCK_EVT_FEAT_ONESHOT;
+	} else if (!hrtimer_initialized) {
+		/* An alternative to tick_broadcast_device on archs which do not have
+		 * an external device
+		 */
+		bc_hrtimer = kmalloc(sizeof(*bc_hrtimer), GFP_NOWAIT);
+		if (!bc_hrtimer) {
+			ret = false;
+			goto out;
+		}
+		hrtimer_init(bc_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
+		bc_hrtimer->function = handle_broadcast;
+		hrtimer_initialized = 1;
+	}
+
+out:	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+	return ret;
 }
 
 #endif
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 18e71f7..9e42177 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -46,23 +46,25 @@ extern int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *));
 extern void tick_resume_oneshot(void);
 # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 extern void tick_broadcast_setup_oneshot(struct clock_event_device *bc);
-extern void tick_broadcast_oneshot_control(unsigned long reason);
+extern int tick_broadcast_oneshot_control(unsigned long reason);
 extern void tick_broadcast_switch_to_oneshot(void);
 extern void tick_shutdown_broadcast_oneshot(unsigned int *cpup);
 extern int tick_resume_broadcast_oneshot(struct clock_event_device *bc);
 extern int tick_broadcast_oneshot_active(void);
 extern void tick_check_oneshot_broadcast(int cpu);
+extern void tick_handover_broadcast_cpu(int *cpup);
 bool tick_broadcast_oneshot_available(void);
 # else /* BROADCAST */
 static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
 	BUG();
 }
-static inline void tick_broadcast_oneshot_control(unsigned long reason) { }
+static inline int tick_broadcast_oneshot_control(unsigned long reason) { return 0; }
 static inline void tick_broadcast_switch_to_oneshot(void) { }
 static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
 static inline int tick_broadcast_oneshot_active(void) { return 0; }
 static inline void tick_check_oneshot_broadcast(int cpu) { }
+static inline void tick_handover_broadcast_cpu(int *cpup) {}
 static inline bool tick_broadcast_oneshot_available(void) { return true; }
 # endif /* !BROADCAST */
 
@@ -87,7 +89,7 @@ static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
 	BUG();
 }
-static inline void tick_broadcast_oneshot_control(unsigned long reason) { }
+static inline int tick_broadcast_oneshot_control(unsigned long reason) { return 0; }
 static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
 static inline int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 {

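The nomination rules described in the changelog (the CPU asking for the earliest wakeup becomes the bc_cpu and its BROADCAST_ENTER fails; on hotplug of the bc_cpu, the first CPU in the broadcast mask takes over) can be modelled in user space as follows. This is a simplified sketch, not the kernel implementation: it ignores tick_broadcast_force_mask, locking, the hrtimer itself and the IPI, and all names (struct bc_model, bc_init(), bc_enter(), bc_handover(), NR_CPUS_MODEL) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL	8
#define KTIME_MAX_MODEL	INT64_MAX

/* Hypothetical stand-ins for bc_cpu, bc_next_wakeup and the oneshot mask. */
struct bc_model {
	int bc_cpu;			/* -1: no CPU owns the broadcast hrtimer */
	int64_t bc_next_wakeup;		/* expiry of the broadcast hrtimer */
	bool in_deep_idle[NR_CPUS_MODEL];
};

void bc_init(struct bc_model *m)
{
	m->bc_cpu = -1;
	m->bc_next_wakeup = KTIME_MAX_MODEL;
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		m->in_deep_idle[i] = false;
}

/*
 * BROADCAST_ENTER without an external clock device.  Returns 0 if @cpu may
 * enter deep idle, or 1 if it is the bc_cpu and must keep its local timer.
 */
int bc_enter(struct bc_model *m, int cpu, int64_t next_event)
{
	m->in_deep_idle[cpu] = true;
	if (next_event < m->bc_next_wakeup) {
		/* Earliest wakeup so far: this CPU takes over the hrtimer. */
		m->bc_next_wakeup = next_event;
		m->bc_cpu = cpu;
	}
	if (cpu == m->bc_cpu) {
		m->in_deep_idle[cpu] = false;	/* entry fails */
		return 1;
	}
	return 0;
}

/* Hotplug of the bc_cpu: hand the duty to the first CPU in deep idle. */
void bc_handover(struct bc_model *m, int dying_cpu)
{
	if (dying_cpu != m->bc_cpu)
		return;
	m->bc_cpu = -1;
	for (int i = 0; i < NR_CPUS_MODEL; i++) {
		if (m->in_deep_idle[i]) {
			m->bc_cpu = i;	/* the kernel would IPI this CPU */
			break;
		}
	}
}
```

For example, if CPU 1 asks to sleep until t=500 it becomes the bc_cpu and must stay in a shallower state; CPU 2 (t=800) can then enter deep idle, but CPU 3 (t=300) takes the broadcast duty over. If CPU 3 is then hotplugged out, CPU 2 is nominated.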

* [RESEND PATCH V5 5/8] powermgt: Add OPAL call to resync timebase on wakeup
From: Preeti U Murthy @ 2014-01-22  7:09 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, daniel.lezcano, benh, paulmck,
	agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

During Fast-Sleep and deeper power-saving states, the decrementer and
timebase may stop, leaving the timebase out of sync with the rest
of the cores in the system.

Add a firmware call that asks the platform to resync the timebase
using low-level platform methods.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/include/asm/opal.h                |    2 ++
 arch/powerpc/kernel/exceptions-64s.S           |    2 +-
 arch/powerpc/kernel/idle_power7.S              |   27 ++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |    1 +
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9a87b44..8c4829f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -154,6 +154,7 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_FLASH_VALIDATE			76
 #define OPAL_FLASH_MANAGE			77
 #define OPAL_FLASH_UPDATE			78
+#define OPAL_RESYNC_TIMEBASE			79
 #define OPAL_GET_MSG				85
 #define OPAL_CHECK_ASYNC_COMPLETION		86
 
@@ -863,6 +864,7 @@ extern void opal_flash_init(void);
 extern int opal_machine_check(struct pt_regs *regs);
 
 extern void opal_shutdown(void);
+extern int opal_resync_timebase(void);
 
 extern void opal_lpc_init(void);
 
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b01a9cb..9533d7a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -145,7 +145,7 @@ BEGIN_FTR_SECTION
 
 	/* Fast Sleep wakeup on PowerNV */
 8:	GET_PACA(r13)
-	b 	.power7_wakeup_loss
+	b 	.power7_wakeup_tb_loss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 14f78be..c3ab869 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -17,6 +17,7 @@
 #include <asm/ppc-opcode.h>
 #include <asm/hw_irq.h>
 #include <asm/kvm_book3s_asm.h>
+#include <asm/opal.h>
 
 #undef DEBUG
 
@@ -125,6 +126,32 @@ _GLOBAL(power7_sleep)
 	b	power7_powersave_common
 	/* No return */
 
+_GLOBAL(power7_wakeup_tb_loss)
+	ld	r2,PACATOC(r13);
+	ld	r1,PACAR1(r13)
+
+	/* Time base re-sync */
+	li	r0,OPAL_RESYNC_TIMEBASE
+	LOAD_REG_ADDR(r11,opal);
+	ld	r12,8(r11);
+	ld	r2,0(r11);
+	mtctr	r12
+	bctrl
+
+	/* TODO: Check r3 for failure */
+
+	REST_NVGPRS(r1)
+	REST_GPR(2, r1)
+	ld	r3,_CCR(r1)
+	ld	r4,_MSR(r1)
+	ld	r5,_NIP(r1)
+	addi	r1,r1,INT_FRAME_SIZE
+	mtcr	r3
+	mfspr	r3,SPRN_SRR1		/* Return SRR1 */
+	mtspr	SPRN_SRR1,r4
+	mtspr	SPRN_SRR0,r5
+	rfid
+
 _GLOBAL(power7_wakeup_loss)
 	ld	r1,PACAR1(r13)
 	REST_NVGPRS(r1)
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 719aa5c..a11a87c 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -126,5 +126,6 @@ OPAL_CALL(opal_return_cpu,			OPAL_RETURN_CPU);
 OPAL_CALL(opal_validate_flash,			OPAL_FLASH_VALIDATE);
 OPAL_CALL(opal_manage_flash,			OPAL_FLASH_MANAGE);
 OPAL_CALL(opal_update_flash,			OPAL_FLASH_UPDATE);
+OPAL_CALL(opal_resync_timebase,			OPAL_RESYNC_TIMEBASE);
 OPAL_CALL(opal_get_msg,				OPAL_GET_MSG);
 OPAL_CALL(opal_check_completion,		OPAL_CHECK_ASYNC_COMPLETION);


* [RESEND PATCH V5 4/8] powernv/cpuidle: Add context management for Fast Sleep
From: Preeti U Murthy @ 2014-01-22  7:08 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, =daniel.lezcano, benh, paulmck,
	--to=agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

Before Fast-Sleep can be added to the cpuidle framework, some low-level
support is needed to enable it. This includes saving certain registers
on entry to this state and restoring them on exit, just as is done for
the NAP idle state.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[Changelog modified by Preeti U. Murthy <preeti@linux.vnet.ibm.com>]
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/include/asm/processor.h |    1 +
 arch/powerpc/kernel/exceptions-64s.S |   10 ++++-
 arch/powerpc/kernel/idle_power7.S    |   63 ++++++++++++++++++++++++----------
 3 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index b62de43..d660dc3 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -450,6 +450,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 extern int powersave_nap;	/* set if nap mode can be used in idle loop */
 extern void power7_nap(void);
+extern void power7_sleep(void);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 38d5073..b01a9cb 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -121,9 +121,10 @@ BEGIN_FTR_SECTION
 	cmpwi	cr1,r13,2
 	/* Total loss of HV state is fatal, we could try to use the
 	 * PIR to locate a PACA, then use an emergency stack etc...
-	 * but for now, let's just stay stuck here
+	 * OPAL v3 based powernv platforms have new idle states
+	 * which fall in this category.
 	 */
-	bgt	cr1,.
+	bgt	cr1,8f
 	GET_PACA(r13)
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -141,6 +142,11 @@ BEGIN_FTR_SECTION
 	beq	cr1,2f
 	b	.power7_wakeup_noloss
 2:	b	.power7_wakeup_loss
+
+	/* Fast Sleep wakeup on PowerNV */
+8:	GET_PACA(r13)
+	b 	.power7_wakeup_loss
+
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #endif /* CONFIG_PPC_P7_NAP */
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 3fdef0f..14f78be 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -20,17 +20,27 @@
 
 #undef DEBUG
 
-	.text
+/* Idle state entry routines */
 
-_GLOBAL(power7_idle)
-	/* Now check if user or arch enabled NAP mode */
-	LOAD_REG_ADDRBASE(r3,powersave_nap)
-	lwz	r4,ADDROFF(powersave_nap)(r3)
-	cmpwi	0,r4,0
-	beqlr
-	/* fall through */
+#define	IDLE_STATE_ENTER_SEQ(IDLE_INST)				\
+	/* Magic NAP/SLEEP/WINKLE mode enter sequence */	\
+	std	r0,0(r1);					\
+	ptesync;						\
+	ld	r0,0(r1);					\
+1:	cmp	cr0,r0,r0;					\
+	bne	1b;						\
+	IDLE_INST;						\
+	b	.
 
-_GLOBAL(power7_nap)
+	.text
+
+/*
+ * Pass requested state in r3:
+ * 	0 - nap
+ * 	1 - sleep
+ */
+_GLOBAL(power7_powersave_common)
+	/* Use r3 to pass state nap/sleep/winkle */
 	/* NAP is a state loss, we create a regs frame on the
 	 * stack, fill it up with the state we care about and
 	 * stick a pointer to it in PACAR1. We really only
@@ -79,8 +89,8 @@ _GLOBAL(power7_nap)
 	/* Continue saving state */
 	SAVE_GPR(2, r1)
 	SAVE_NVGPRS(r1)
-	mfcr	r3
-	std	r3,_CCR(r1)
+	mfcr	r4
+	std	r4,_CCR(r1)
 	std	r9,_MSR(r1)
 	std	r1,PACAR1(r13)
 
@@ -90,15 +100,30 @@ _GLOBAL(power7_enter_nap_mode)
 	li	r4,KVM_HWTHREAD_IN_NAP
 	stb	r4,HSTATE_HWTHREAD_STATE(r13)
 #endif
+	cmpwi	cr0,r3,1
+	beq	2f
+	IDLE_STATE_ENTER_SEQ(PPC_NAP)
+	/* No return */
+2:	IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
+	/* No return */
 
-	/* Magic NAP mode enter sequence */
-	std	r0,0(r1)
-	ptesync
-	ld	r0,0(r1)
-1:	cmp	cr0,r0,r0
-	bne	1b
-	PPC_NAP
-	b	.
+_GLOBAL(power7_idle)
+	/* Now check if user or arch enabled NAP mode */
+	LOAD_REG_ADDRBASE(r3,powersave_nap)
+	lwz	r4,ADDROFF(powersave_nap)(r3)
+	cmpwi	0,r4,0
+	beqlr
+	/* fall through */
+
+_GLOBAL(power7_nap)
+	li	r3,0
+	b	power7_powersave_common
+	/* No return */
+
+_GLOBAL(power7_sleep)
+	li	r3,1
+	b	power7_powersave_common
+	/* No return */
 
 _GLOBAL(power7_wakeup_loss)
 	ld	r1,PACAR1(r13)

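The branch logic in the exceptions-64s.S hunk can be read as the C model below. It assumes, as in the surrounding code that this hunk does not show (an `rlwinm.` on SRR1 just above it), that the wakeup state is the 2-bit field SRR1[46:47] in IBM bit numbering; it ignores the KVM check in the middle of the sequence, and the enum and function names are hypothetical labels for the branch targets. Value 3 (loss of hypervisor state) previously hung the CPU with `bgt cr1,.`; with this patch it is routed to the new `8:` path.

```c
#include <stdint.h>

/* Hypothetical labels for the targets in the exceptions-64s.S hunk. */
enum wake_path {
	NOT_POWERSAVE_WAKEUP,	/* "9:" - normal exception entry */
	WAKEUP_NOLOSS,		/* .power7_wakeup_noloss */
	WAKEUP_LOSS,		/* "2:" .power7_wakeup_loss */
	WAKEUP_HV_LOSS,		/* "8:" GET_PACA, then .power7_wakeup_loss */
};

/*
 * SRR1[46:47] in IBM numbering are bits 17:16 counting from the
 * least-significant bit, so the field is (srr1 >> 16) & 3.
 */
enum wake_path classify_wakeup(uint64_t srr1)
{
	unsigned int state = (unsigned int)(srr1 >> 16) & 0x3;

	switch (state) {
	case 0:
		return NOT_POWERSAVE_WAKEUP;
	case 1:
		return WAKEUP_NOLOSS;
	case 2:
		return WAKEUP_LOSS;
	default:
		return WAKEUP_HV_LOSS;	/* was "bgt cr1,." before this patch */
	}
}
```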

* [RESEND PATCH V5 3/8] cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines
From: Preeti U Murthy @ 2014-01-22  7:08 UTC (permalink / raw)
  To: peterz, fweisbec, paul.gortmaker, paulus, mingo, mikey, shangw,
	rafael.j.wysocki, galak, daniel.lezcano, benh, paulmck,
	agraf, arnd, linux-pm, rostedt, michael, john.stultz, anton,
	tglx, chenhui.zhao, deepthi, r58472, geoff, linux-kernel,
	srivatsa.bhat, schwidefsky, svaidy, linuxppc-dev
In-Reply-To: <20140122065918.30650.22437.stgit@preeti.in.ibm.com>

Split timer_interrupt(), the local timer interrupt handler on ppc, into
the routines called during regular interrupt handling and __timer_interrupt(),
which takes care of running local timers and collecting time-related stats.

This enables callers interested only in running expired local timers to
call __timer_interrupt() directly. One such use case is tick broadcast
IPI handling, in which the sleeping CPUs need to handle the local timers
that have expired.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 arch/powerpc/kernel/time.c |   81 +++++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3ff97db..df2989b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -478,6 +478,47 @@ void arch_irq_work_raise(void)
 
 #endif /* CONFIG_IRQ_WORK */
 
+void __timer_interrupt(void)
+{
+	struct pt_regs *regs = get_irq_regs();
+	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+	struct clock_event_device *evt = &__get_cpu_var(decrementers);
+	u64 now;
+
+	trace_timer_interrupt_entry(regs);
+
+	if (test_irq_work_pending()) {
+		clear_irq_work_pending();
+		irq_work_run();
+	}
+
+	now = get_tb_or_rtc();
+	if (now >= *next_tb) {
+		*next_tb = ~(u64)0;
+		if (evt->event_handler)
+			evt->event_handler(evt);
+		__get_cpu_var(irq_stat).timer_irqs_event++;
+	} else {
+		now = *next_tb - now;
+		if (now <= DECREMENTER_MAX)
+			set_dec((int)now);
+		/* We may have raced with new irq work */
+		if (test_irq_work_pending())
+			set_dec(1);
+		__get_cpu_var(irq_stat).timer_irqs_others++;
+	}
+
+#ifdef CONFIG_PPC64
+	/* collect purr register values often, for accurate calculations */
+	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+		struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
+		cu->current_tb = mfspr(SPRN_PURR);
+	}
+#endif
+
+	trace_timer_interrupt_exit(regs);
+}
+
 /*
  * timer_interrupt - gets called when the decrementer overflows,
  * with interrupts disabled.
@@ -486,8 +527,6 @@ void timer_interrupt(struct pt_regs * regs)
 {
 	struct pt_regs *old_regs;
 	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
-	struct clock_event_device *evt = &__get_cpu_var(decrementers);
-	u64 now;
 
 	/* Ensure a positive value is written to the decrementer, or else
 	 * some CPUs will continue to take decrementer exceptions.
@@ -519,39 +558,7 @@ void timer_interrupt(struct pt_regs * regs)
 	old_regs = set_irq_regs(regs);
 	irq_enter();
 
-	trace_timer_interrupt_entry(regs);
-
-	if (test_irq_work_pending()) {
-		clear_irq_work_pending();
-		irq_work_run();
-	}
-
-	now = get_tb_or_rtc();
-	if (now >= *next_tb) {
-		*next_tb = ~(u64)0;
-		if (evt->event_handler)
-			evt->event_handler(evt);
-		__get_cpu_var(irq_stat).timer_irqs_event++;
-	} else {
-		now = *next_tb - now;
-		if (now <= DECREMENTER_MAX)
-			set_dec((int)now);
-		/* We may have raced with new irq work */
-		if (test_irq_work_pending())
-			set_dec(1);
-		__get_cpu_var(irq_stat).timer_irqs_others++;
-	}
-
-#ifdef CONFIG_PPC64
-	/* collect purr register values often, for accurate calculations */
-	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-		struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
-		cu->current_tb = mfspr(SPRN_PURR);
-	}
-#endif
-
-	trace_timer_interrupt_exit(regs);
-
+	__timer_interrupt();
 	irq_exit();
 	set_irq_regs(old_regs);
 }
@@ -828,6 +835,10 @@ static void decrementer_set_mode(enum clock_event_mode mode,
 /* Interrupt handler for the timer broadcast IPI */
 void tick_broadcast_ipi_handler(void)
 {
+	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+
+	*next_tb = get_tb_or_rtc();
+	__timer_interrupt();
 }
 
 static void register_decrementer_clockevent(int cpu)

^ permalink raw reply related
