LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate
From: Avi Kivity @ 2011-06-30  8:34 UTC (permalink / raw)
  To: Josh Boyer; +Cc: kvm-ppc, linuxppc-dev, Paul Mackerras, Alexander Graf, kvm
In-Reply-To: <20110629115857.GB17551@zod.rchland.ibm.com>

On 06/29/2011 02:58 PM, Josh Boyer wrote:
> >>  This makes me wonder if a similar thing might eventually be usable for
> >>  running an i686 or x32 guest on an x86_64 KVM host.  I have no idea if
> >>  that is even theoretically possible, but if it is it might be better to
> >>  rename the ioctl to be architecture agnostic.
> >
> >On x86 this is not required unless we want to "virtualize" pre-CPUID CPUs. Everything as of Pentium has a full bitmap of feature capabilities that KVM gets from user space, including information such as "Can we do 64-bit mode?".
>
> Ah.  Thank you for the explanation.

To clarify a bit further, running an i686 guest on an x86_64 host is not 
only theoretically possible, but is done regularly.  First, x86_64 is 
backwards compatible with i686 (so you can install a 32-bit OS on 64-bit 
hardware), and second, you can impersonate 32-bit guest hardware on a 
64-bit host.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: linux-next: build failure after merge of the final tree (powerpc tree related)
From: Benjamin Herrenschmidt @ 2011-06-30  7:09 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linuxppc-dev, linux-next, Paul Mackerras, linux-kernel
In-Reply-To: <20110630163659.374db7f9.sfr@canb.auug.org.au>

On Thu, 2011-06-30 at 16:36 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> drivers/tty/hvc/hvsi.c:701:12: error: conflicting types for 'hvsi_put_chars'
> arch/powerpc/include/asm/hvsi.h:92:12: note: previous declaration of 'hvsi_put_chars' was here
> drivers/tty/hvc/hvsi.c:736:12: error: conflicting types for 'hvsi_open'
> arch/powerpc/include/asm/hvsi.h:86:12: note: previous declaration of 'hvsi_open' was here
> drivers/tty/hvc/hvsi.c:802:13: error: conflicting types for 'hvsi_close'
> arch/powerpc/include/asm/hvsi.h:87:13: note: previous declaration of 'hvsi_close' was here
> drivers/tty/hvc/hvsi.c:1083:19: error: conflicting types for 'hvsi_init'
> arch/powerpc/include/asm/hvsi.h:81:13: note: previous declaration of 'hvsi_init' was here
> 
> Caused by commit 17bdc6c0e979 ("powerpc/pseries: Move hvsi support into a
> library").
> 
> I have reverted that commit for today.

Ooops, that seems to be entirely my fault, not sure what happened, I
might have merged the wrong patch version (I had some problem locally
with screwing up the git repo where those patches were originally).

I'll push a fix tomorrow.

Cheers,
Ben.

^ permalink raw reply

* Re: [BUG?]3.0-rc4+ftrace+kprobe: set kprobe at instruction 'stwu' lead to system crash/freeze
From: Yong Zhang @ 2011-06-30  7:08 UTC (permalink / raw)
  To: ananth
  Cc: Jim Keniston, linux-kernel, Steven Rostedt, paulus,
	yrl.pp-manager.tt, Masami Hiramatsu, linuxppc-dev
In-Reply-To: <20110629064635.GB678@in.ibm.com>

On Wed, Jun 29, 2011 at 2:46 PM, Ananth N Mavinakayanahalli
<ananth@in.ibm.com> wrote:
>
> Certain functions are off limits for probing -- look for __kprobe

Yup.

> annotations in the kernel. Some such functions are arch specific, but
> show_interrupts() would definitely not be one of them. It works fine on
> my (64bit) test box.
>
> At this time, I think your best bet is to work with the eldk folks to
> narrow down the problem.

I'll give a try :)

> Given the current set of data, I am inclined to
> think it could be an eldk bug, not a kernel one.

Maybe, but the fact is if I don't use kprobe, things works
very well.

I'll be back if there is any update :)

Thanks,
Yong

-- 
Only stand for myself

^ permalink raw reply

* [PATCH] powerpc: software invalidate TCEs on pseries
From: Michael Neuling @ 2011-06-30  6:58 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, miltonm

From: Milton Miller <miltonm@bga.com>

Some pseries IOMMUs cache TCEs but don't snoop when the TCEs are changed
in memory, hence we need manually invalidate in software.

This adds code to do the invalidate.  It keys off a device tree property
to say where the to do the MMIO for the invalidate and some information
on what the format of the invalidate including some magic routing info.

it_busno get overloaded with this magic routing info and it_index with
the MMIO address for the invalidate command.

This then gets hooked into the building and freeing of TCEs.

This is only useful on bare metal pseries.  pHyp takes care of this when
virtualised.

Based on patch from Milton with cleanups from Mikey.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/tce.h         |    8 ++--
 arch/powerpc/platforms/pseries/iommu.c |   61 ++++++++++++++++++++++++++++++---
 2 files changed, 61 insertions(+), 8 deletions(-)

Index: linux-ozlabs/arch/powerpc/include/asm/tce.h
===================================================================
--- linux-ozlabs.orig/arch/powerpc/include/asm/tce.h
+++ linux-ozlabs/arch/powerpc/include/asm/tce.h
@@ -26,10 +26,12 @@
 
 /*
  * Tces come in two formats, one for the virtual bus and a different
- * format for PCI
+ * format for PCI.  PCI TCEs can have hardware or software maintianed
+ * coherency.
  */
-#define TCE_VB  0
-#define TCE_PCI 1
+#define TCE_VB			0
+#define TCE_PCI			1
+#define TCE_PCI_SW_INVAL	2
 
 /* TCE page size is 4096 bytes (1 << 12) */
 
Index: linux-ozlabs/arch/powerpc/platforms/pseries/iommu.c
===================================================================
--- linux-ozlabs.orig/arch/powerpc/platforms/pseries/iommu.c
+++ linux-ozlabs/arch/powerpc/platforms/pseries/iommu.c
@@ -51,13 +51,42 @@
 #include "plpar_wrappers.h"
 
 
+static void tce_invalidate_pSeries_sw(struct iommu_table *tbl,
+				      u64 *startp, u64 *endp)
+{
+	u64 __iomem *invalidate = (u64 __iomem *)tbl->it_index;
+	unsigned long start, end, inc;
+
+	start = __pa(startp);
+	end = __pa(endp);
+	inc = L1_CACHE_BYTES; /* invalidate a cacheline of TCEs at a time */
+
+	/* If this is non-zero, change the format.  We shift the
+	 * address and or in the magic from the device tree. */
+	if (tbl->it_busno) {
+		start <<= 12;
+		end <<= 12;
+		inc <<= 12;
+		start |= tbl->it_busno;
+		end |= tbl->it_busno;
+	}
+
+	end |= inc - 1; /* round up end to be different than start */
+
+	mb(); /* Make sure TCEs in memory are written */
+	while (start <= end) {
+		out_be64(invalidate, start);
+		start += inc;
+	}
+}
+
 static int tce_build_pSeries(struct iommu_table *tbl, long index,
 			      long npages, unsigned long uaddr,
 			      enum dma_data_direction direction,
 			      struct dma_attrs *attrs)
 {
 	u64 proto_tce;
-	u64 *tcep;
+	u64 *tcep, *tces;
 	u64 rpn;
 
 	proto_tce = TCE_PCI_READ; // Read allowed
@@ -65,7 +94,7 @@ static int tce_build_pSeries(struct iomm
 	if (direction != DMA_TO_DEVICE)
 		proto_tce |= TCE_PCI_WRITE;
 
-	tcep = ((u64 *)tbl->it_base) + index;
+	tces = tcep = ((u64 *)tbl->it_base) + index;
 
 	while (npages--) {
 		/* can't move this out since we might cross MEMBLOCK boundary */
@@ -75,18 +104,24 @@ static int tce_build_pSeries(struct iomm
 		uaddr += TCE_PAGE_SIZE;
 		tcep++;
 	}
+
+	if (tbl->it_type == TCE_PCI_SW_INVAL)
+		tce_invalidate_pSeries_sw(tbl, tces, tcep - 1);
 	return 0;
 }
 
 
 static void tce_free_pSeries(struct iommu_table *tbl, long index, long npages)
 {
-	u64 *tcep;
+	u64 *tcep, *tces;
 
-	tcep = ((u64 *)tbl->it_base) + index;
+	tces = tcep = ((u64 *)tbl->it_base) + index;
 
 	while (npages--)
 		*(tcep++) = 0;
+
+	if (tbl->it_type == TCE_PCI_SW_INVAL)
+		tce_invalidate_pSeries_sw(tbl, tces, tcep - 1);
 }
 
 static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
@@ -424,7 +459,7 @@ static void iommu_table_setparms(struct
 				 struct iommu_table *tbl)
 {
 	struct device_node *node;
-	const unsigned long *basep;
+	const unsigned long *basep, *sw_inval;
 	const u32 *sizep;
 
 	node = phb->dn;
@@ -461,6 +496,22 @@ static void iommu_table_setparms(struct
 	tbl->it_index = 0;
 	tbl->it_blocksize = 16;
 	tbl->it_type = TCE_PCI;
+
+	sw_inval = of_get_property(node, "linux,tce-sw-invalidate-info", NULL);
+	if (sw_inval) {
+		/*
+		 * This property contains information on how to
+		 * invalidate the TCE entry.  The first property is
+		 * the base MMIO address used to invalidate entries.
+		 * The second property tells us the format of the TCE
+		 * invalidate (whether it needs to be shifted) and
+		 * some magic routing info to add to our invalidate
+		 * command.
+		 */
+		tbl->it_index = (unsigned long) ioremap(sw_inval[0], 8);
+		tbl->it_busno = sw_inval[1]; /* overload this with magic */
+		tbl->it_type = TCE_PCI_SW_INVAL;
+	}
 }
 
 /*

^ permalink raw reply

* linux-next: build failure after merge of the final tree (powerpc tree related)
From: Stephen Rothwell @ 2011-06-30  6:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev
  Cc: linux-next, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1018 bytes --]

Hi all,

After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:

drivers/tty/hvc/hvsi.c:701:12: error: conflicting types for 'hvsi_put_chars'
arch/powerpc/include/asm/hvsi.h:92:12: note: previous declaration of 'hvsi_put_chars' was here
drivers/tty/hvc/hvsi.c:736:12: error: conflicting types for 'hvsi_open'
arch/powerpc/include/asm/hvsi.h:86:12: note: previous declaration of 'hvsi_open' was here
drivers/tty/hvc/hvsi.c:802:13: error: conflicting types for 'hvsi_close'
arch/powerpc/include/asm/hvsi.h:87:13: note: previous declaration of 'hvsi_close' was here
drivers/tty/hvc/hvsi.c:1083:19: error: conflicting types for 'hvsi_init'
arch/powerpc/include/asm/hvsi.h:81:13: note: previous declaration of 'hvsi_init' was here

Caused by commit 17bdc6c0e979 ("powerpc/pseries: Move hvsi support into a
library").

I have reverted that commit for today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* [PATCH v2] powerpc: Add jump label support
From: Benjamin Herrenschmidt @ 2011-06-30  5:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Steven Rostedt

From: Michael Ellerman <michael@ellerman.id.au>

This patch adds support for the new "jump label" feature.

Unlike x86 and sparc we just merrily patch the code with no locks etc,
as far as I know this is safe, but I'm not really sure what the x86/sparc
code is protecting against so maybe it's not.

I also don't see any reason for us to implement the poke_early() routine,
even though sparc does.

[BenH: Updated the patch to upstream generic changes]

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

If no objection I'll stick that in powerpc -next tomorrow, it's been around
for a while, and it seems to just work so ...

 arch/powerpc/Kconfig                  |    1 +
 arch/powerpc/include/asm/jump_label.h |   47 +++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/Makefile          |    1 +
 arch/powerpc/kernel/jump_label.c      |   23 ++++++++++++++++
 4 files changed, 72 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/jump_label.h
 create mode 100644 arch/powerpc/kernel/jump_label.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..c15f2e6 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -134,6 +134,7 @@ config PPC
 	select GENERIC_IRQ_SHOW_LEVEL
 	select HAVE_RCU_TABLE_FREE if SMP
 	select HAVE_SYSCALL_TRACEPOINTS
+	select HAVE_ARCH_JUMP_LABEL
 
 config EARLY_PRINTK
 	bool
diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h
new file mode 100644
index 0000000..1f780b9
--- /dev/null
+++ b/arch/powerpc/include/asm/jump_label.h
@@ -0,0 +1,47 @@
+#ifndef _ASM_POWERPC_JUMP_LABEL_H
+#define _ASM_POWERPC_JUMP_LABEL_H
+
+/*
+ * Copyright 2010 Michael Ellerman, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/types.h>
+
+#include <asm/feature-fixups.h>
+
+#define JUMP_ENTRY_TYPE		stringify_in_c(FTR_ENTRY_LONG)
+#define JUMP_LABEL_NOP_SIZE	4
+
+static __always_inline bool arch_static_branch(struct jump_label_key *key)
+{
+	asm goto("1:\n\t"
+		 "nop\n\t"
+		 ".pushsection __jump_table,  \"aw\"\n\t"
+		 ".align 4\n\t"
+		 JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
+		 ".popsection \n\t"
+		 : :  "i" (key) : : l_yes);
+	return false;
+l_yes:
+	return true;
+}
+
+#ifdef CONFIG_PPC64
+typedef u64 jump_label_t;
+#else
+typedef u32 jump_label_t;
+#endif
+
+struct jump_entry {
+	jump_label_t code;
+	jump_label_t target;
+	jump_label_t key;
+	jump_label_t pad;
+};
+
+#endif /* _ASM_POWERPC_JUMP_LABEL_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index e8b9818..ce4f7f1 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -76,6 +76,7 @@ obj-$(CONFIG_MODULES)		+= module.o module_$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_44x)		+= cpu_setup_44x.o
 obj-$(CONFIG_PPC_FSL_BOOK3E)	+= cpu_setup_fsl_booke.o dbell.o
 obj-$(CONFIG_PPC_BOOK3E_64)	+= dbell.o
+obj-$(CONFIG_JUMP_LABEL)	+= jump_label.o
 
 extra-y				:= head_$(CONFIG_WORD_SIZE).o
 extra-$(CONFIG_40x)		:= head_40x.o
diff --git a/arch/powerpc/kernel/jump_label.c b/arch/powerpc/kernel/jump_label.c
new file mode 100644
index 0000000..368d158
--- /dev/null
+++ b/arch/powerpc/kernel/jump_label.c
@@ -0,0 +1,23 @@
+/*
+ * Copyright 2010 Michael Ellerman, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/jump_label.h>
+#include <asm/code-patching.h>
+
+void arch_jump_label_transform(struct jump_entry *entry,
+			       enum jump_label_type type)
+{
+	u32 *addr = (u32 *)(unsigned long)entry->code;
+
+	if (type == JUMP_LABEL_ENABLE)
+		patch_branch(addr, entry->target, 0);
+	else
+		patch_instruction(addr, PPC_INST_NOP);
+}

^ permalink raw reply related

* Re: [openmcapi-dev] Re: [PATCH v3 2/2] powerpc: add support for MPIC message register API
From: Benjamin Herrenschmidt @ 2011-06-30  4:19 UTC (permalink / raw)
  To: Meador Inge
  Cc: openmcapi-dev, Hollis Blanchard, devicetree-discuss, linuxppc-dev
In-Reply-To: <4E0BE7B4.1020901@mentor.com>

On Wed, 2011-06-29 at 22:04 -0500, Meador Inge wrote:
> I posted a more detailed response a few days back:
> http://patchwork.ozlabs.org/patch/98075/.  In
> that response, I tried to put forth the rationale
> for allocating the registers statically due to
> the AMP use case.  With that in mind, do you
> still disagree with the design?  If so, do
> you have any suggestions for how it could be
> better? 

No, not really, please repost with my other comments addressed.

Cheers,
Ben.

^ permalink raw reply

* Re: [openmcapi-dev] Re: [PATCH v3 2/2] powerpc: add support for MPIC message register API
From: Meador Inge @ 2011-06-30  3:04 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: openmcapi-dev, Hollis Blanchard, devicetree-discuss, linuxppc-dev
In-Reply-To: <1308288784.32158.30.camel@pasglop>

On 06/17/2011 12:33 AM, Benjamin Herrenschmidt wrote:

> On Tue, 2011-05-31 at 14:19 -0500, Meador Inge wrote:
>> Some MPIC implementations contain one or more blocks of message registers
>> that are used to send messages between cores via IPIs.  A simple API has
>> been added to access (get/put, read, write, etc ...) these message registers.
>> The available message registers are initially discovered via nodes in the
>> device tree.  A separate commit contains a binding for the message register
>> nodes.

<snip>

> 
> Ok, so we have another scheme of:
> 
>  - Count all devices in the system of a given type
>  - Assign them numbers
>  - API uses number
> 
> That sucks... unless you have an allocator. And even then.
> 
> I'd rather clients use something like struct mpic_msgr (or msg_reg or
> message_reg) as the "handle" to one of these things.
> 
> It can be obtained via an allocator or a device tree parsing routine if
> there's a fixed relationship between clients and registers, I don't
> really know how that stuff is to be used, but in any case, the whole
> thing stinks as it is.

Ben,

I posted a more detailed response a few days back:
http://patchwork.ozlabs.org/patch/98075/.  In
that response, I tried to put forth the rationale
for allocating the registers statically due to
the AMP use case.  With that in mind, do you
still disagree with the design?  If so, do
you have any suggestions for how it could be
better?

-- 
Meador Inge
CodeSourcery / Mentor Embedded
http://www.mentor.com/embedded-software

^ permalink raw reply

* Re: [PATCH] powerpc/timebase_read: don't return time older than cycle_last
From: Benjamin Herrenschmidt @ 2011-06-30  0:29 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110629141940.2c930411@schlenkerla.am.freescale.net>

On Wed, 2011-06-29 at 14:19 -0500, Scott Wood wrote:
> On Wed, 29 Jun 2011 11:06:36 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > I don't think we ever want to "fix" userspace... how would you "fix" the
> > vDSO gettimeofday implementation for example since the vDSO has no
> > storage ?
> 
> Hmm... I guess you could at least make the libc interface monotonic, though
> that wouldn't help if multiple processes are doing some shared memory
> communication.
> 
> Ideally, anything that uses timestamps should be robust against small jumps
> backwards (i.e. not blow it up into some huge jump forward or similar
> breakage), because reality just isn't perfect.  Especially if you're writing
> userspace code that could run on all sorts of hardware.

Well, powerpc hardware so far has been "perfect" in that regard and I'm
unwilling to bend on that one without throwing a lot of shame at your
face for screwing up :-)

Point is, it's an assumption made by Linux, but also a pile of userspace
code, and I wouldn't be surprised if it extends way beyond just glibc.

So even if it's not spelled out in the PPC_AS (tho it is in PAPR), we
can define it as a Linux requirement :-)
 
> > We base this assumption on what I believe is an architectural
> > requirement tho of course it's not worded very explicitely, and probably
> > just "derived" from the architecture statement that the timebase can
> > always be used as a monotonic source of time.
> 
> Is it making any statement about monotonicity across CPUs?

I think PAPR does actually, I suspect it's been mostly implied and never
written down from a more generic arch perspective.

> > smp-tbsync.c is and has always been a "workaround" for broken HW.
> 
> As is this (or for broken firmware, or broken emulation).
> 
> > Anybody with half a clue should follow the recommendation of the
> > architecture (this one is actually spelled out, but as a recommendation
> > only) to have a TB enable pin and use it to perform a perfect sync at
> > boot time.
> 
> Where in the ISA is this?

In the TB chapter of 2.06

> Closest I see from a quick scan is "There must be a method for getting all
> Time Bases in the system to start incrementing with values that are
> identical or almost identical." (7.2 S, 9.2.1 E).  Note the "almost".
> 
> And while we do provide this beginning with the e500mc-based chips, e500v2
> isn't dead yet.  Kexec is also currently breaking the boot sync, requiring
> smp-tbsync -- though ideally kexec could be reworked to not need to
> physically reset the core.

Right, fixable :-)

> > > We had a bug in U-Boot's timebase sync where the boot core would sometimes
> > > be one tick faster than the other cores.
> > 
> > It's scary to think that your cores TBs seem to be soured from different
> > clock sources...
> 
> They're not.  We use the boot core's timebase for a while, then disable it,
> reset it to zero, and enable all the timebases at once.  The bug was a
> missing readback to ensure that it was stopped before it was reset, so
> sometimes it would tick up to 1 on the boot core after reset, before being
> enabled again.  This resulted in the boot core's timebase always reading
> one greater than the other cores'.
> 
> It can also be an issue when running on simulated hardware.

Ok.

> > ie even if you fix uBoot, can you guarantee they won't
> > drift ? I hope so ...
> 
> It shouldn't drift even with the old U-boot -- it's just a constant offset.

That's easier to fix.

> > I would consider that an unfixable architecture
> > violation and I am not at this stage keen on implementing the necessary
> > "workarounds" in Linux (the userspace case is nasty, really nasty).
> > 
> > PowerPC always prided itself on having a "sane" time base mechanism
> > unlike x86, please don't tell me that you guys are now breaking that
> > assumption.
> 
> If you mean "are we introducing new chips that have timebase problems", no.
> 
> I'm questioning whether the assumption was ever fully valid under all
> circumstances.

I think it was, and regardless, the majority of code out there was
written with that assumption.

> > > It's been fixed, but there are
> > > probably people still running the old U-Boot.  It seems like the kind of
> > > thing where defensive robustness is called for, like timing out instead of
> > > hanging if a hardware register never flips the bit we're waiting for.
> > 
> > No, you'll just "hide" the problem from the kernel and horrible &
> > unexplainable things will happen in userspace. At the VERY LEAST you
> > must warn very loudly if you detect this is happening.
> 
> A warning message is OK.
> 
> The current situation hides it as well, since it appears to work fine
> until you hit the race, and suddenly things get stuck and it takes a lot of
> digging to find out why.

Yes but you can't hide it completely, so you can't "contain" the damage.

> > > > We make hard assumptions here and in various places actually.
> > > 
> > > Are there any in the kernel that this doesn't cover?
> > 
> > Check gtod implementation, I'm not sure whether that's enough at this
> > stage or not for it, and then there's the vDSO of course.
> 
> I think the in-kernel gtod is OK after this patch (it breaks if it reads a
> timestamp that is less than cycle_last).
> 
> The 32-bit vdso looks OK as is since it doesn't convert to nsec until after
> adding to xtime.  Userspace will still see time go backwards a bit if the
> timebase does, but it shouldn't get a wildly wrong answer.

But it might calculate one. There's no such thing as a "small problem"
here. If userspace sees the time go backward, even by 1ns, horrible
unpredictable things will happen.

It's either fully correct or not correct at all, and our userspace makes
the assumption that it's always fully correct.

>   The 64-bit vdso
> uses srdi instead of sradi, and does so before adding to the upper
> half of xtime, so if the race is hit the returned time will be too high by
> 2^32 seconds.
> 
> The problem in the in-kernel gettimeofday that causes it to be sensitive to
> times less than cycles_last appears to be the same thing --
> clocksource_cyc2ns() does an unsigned shift, even though it claims to be
> returning a signed result.
> 
> > Not sure what's up with sched_clock() and whether that has similar constraints.
> 
> sched_clock() should be OK as long as the timestamp is always greater than
> boot_tb, and hopefully any timebase skew is less than the delay from when
> boot_tb is set to when secondary cores start up.
> 
> Delay loops could end early if they use unsigned comparisons, but only if
> the timebase skew is greater than the time it takes to move a thread from
> one cpu to another.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH v2] powerpc/book3e-64: use a separate TLB handler when linear map is bolted
From: Benjamin Herrenschmidt @ 2011-06-29 22:20 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110629144010.1fe5df24@schlenkerla.am.freescale.net>

On Wed, 2011-06-29 at 14:40 -0500, Scott Wood wrote:
> What is the "weird page table format" referred to by the normal miss
> handler?

Not sure :-) Probably the fact that we allocate 64K for PTE pages but
only use 32K of them ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc/mm: add devmem_is_allowed() for STRICT_DEVMEM checking
From: Sukadev Bhattiprolu @ 2011-06-29 22:01 UTC (permalink / raw)
  To: Scott Wood; +Cc: Nathan Lynch, linuxppc-dev, Steve Best
In-Reply-To: <20110614140431.31ae4357__48367.5367352136$1308078343$gmane$org@schlenkerla.am.freescale.net>

On 06/14/2011 12:04 PM, Scott Wood wrote:
> On Tue, 14 Jun 2011 14:17:01 -0400
> Steve Best<sfbest@us.ibm.com>  wrote:
>
>> On Tue, 2011-06-14 at 12:30 -0500, Nathan Lynch wrote:
>>> Hi Steve,
>>>
>>> On Tue, 2011-06-14 at 12:58 -0400, Steve Best wrote:
>>>> +/*
>>>> + * devmem_is_allowed() checks to see if /dev/mem access to a certain address
>>>> + * is valid. The argument is a physical page number.
>>>> + *
>>>> + * On PowerPC, access has to be given to data regions used by X. We have to
>>>> + * disallow access to device-exclusive MMIO regions and system RAM.
>>>> + */
>>>> +int devmem_is_allowed(unsigned long pfn)
>>>> +{
>>>> +        if ((pfn>= 57360 || pfn<= 57392))
>>>> +                return 1;
>>>
>>> That seems... fragile.  Where do these numbers come from, and are they
>>> appropriate for all platforms and configurations?
>>
>> This is the range I got from testing pseries blades and servers. maybe
>> there is a better way to get this range anyone know of a way?
>
> Use iomem_is_exclusive(), as other architectures (e.g. x86, arm) do.
>
> Anything else is both platform-specific, and inappropriate hardcoding of
> policy.

x86 allows access to the first 256 pages. Are there other regions that 
we should allow in power besides the !iomem_is_exclusive() region ?

>
> -Scott

^ permalink raw reply

* Re: [PATCH 2/2] Add cpufreq driver for Momentum Maple boards
From: Dmitry Eremin-Solenikov @ 2011-06-29 20:58 UTC (permalink / raw)
  To: kevin diggs; +Cc: Paul Mackerras, linuxppc-dev, cpufreq, Dave Jones
In-Reply-To: <BANLkTikD_x3QA8_q8+444QhA39SEW6hVfg@mail.gmail.com>

On Wed, Jun 29, 2011 at 10:09 PM, kevin diggs <diggskevin38@gmail.com> wrote:
> Hi,
>
> On Tue, Jun 28, 2011 at 10:28 PM, Benjamin Herrenschmidt >
>>
>> If we're going to have a Kconfig.powerpc, should we maybe just have a
>> powerpc subdirectory instead with the driver in it ?
>>
> Where would the powerpc subdirectory be? under drivers/cpufreq? Or
> somewhere under arch/powerpc where it belongs (and I put my 750GX
> stuff)?

drivers/cpufreq/powerpc. However my current version (as suggested by Ben)
goes directly to drivers/cpufreq

-- 
With best wishes
Dmitry

^ permalink raw reply

* Re: [PATCH v2] powerpc/book3e-64: use a separate TLB handler when linear map is bolted
From: Scott Wood @ 2011-06-29 19:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1309333828.14501.65.camel@pasglop>

On Wed, 29 Jun 2011 17:50:28 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Wed, 2011-06-22 at 16:25 -0500, Scott Wood wrote:
> > On MMUs such as FSL where we can guarantee the entire linear mapping is
> > bolted, we don't need to worry about linear TLB misses.  If on top of
> > that we do a full table walk, we get rid of all recursive TLB faults, and
> > can dispense with some state saving.  This gains a few percent on
> > TLB-miss-heavy workloads, and around 50% on a benchmark that had a high
> > rate of virtual page table faults under the normal handler.
> > 
> > While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and
> > EX_TLB_SRR1 as they're not used.
> 
> I merged that into -next, but it was breaking 64K pages on WSP, I had to
> add an ifdef in there to skip the PUD level when walking the page tables
> (PUD_SHIFT isn't defined for asm when doing 64K pages).
> 
> Please check I didn't break anything.

Looks good, though I wonder if all the bolted stuff should be under the
ifdef, at least for now.

What is the "weird page table format" referred to by the normal miss
handler?

-Scott

^ permalink raw reply

* Re: perf_event_open system call support in powerpc
From: Scott Wood @ 2011-06-29 19:24 UTC (permalink / raw)
  To: ashwath narasimhan; +Cc: linuxppc-dev
In-Reply-To: <BANLkTi=j1FwEv2tBmPck3GWHWgcYGchcew@mail.gmail.com>

On Tue, 28 Jun 2011 20:03:10 -0700
ashwath narasimhan <ashwath.narasimhan@oneconvergence.com> wrote:

> Hello,
> 
>  I am new to the powerpc architecture and I am trying to use
> perf_event_open() system call for power pc architecture (e500mc) using
> 2.6.32 kernel distribution. Is this system call number supported for power
> pc architecture? 

perf event support for e500mc wasn't added until 2.6.34.

-Scott

^ permalink raw reply

* Re: [PATCH] powerpc/timebase_read: don't return time older than cycle_last
From: Scott Wood @ 2011-06-29 19:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1309309596.32158.487.camel@pasglop>

On Wed, 29 Jun 2011 11:06:36 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> I don't think we ever want to "fix" userspace... how would you "fix" the
> vDSO gettimeofday implementation for example since the vDSO has no
> storage ?

Hmm... I guess you could at least make the libc interface monotonic, though
that wouldn't help if multiple processes are doing some shared memory
communication.

Ideally, anything that uses timestamps should be robust against small jumps
backwards (i.e. not blow it up into some huge jump forward or similar
breakage), because reality just isn't perfect.  Especially if you're writing
userspace code that could run on all sorts of hardware.

> We base this assumption on what I believe is an architectural
> requirement tho of course it's not worded very explicitely, and probably
> just "derived" from the architecture statement that the timebase can
> always be used as a monotonic source of time.

Is it making any statement about monotonicity across CPUs?

> smp-tbsync.c is and has always been a "workaround" for broken HW.

As is this (or for broken firmware, or broken emulation).

> Anybody with half a clue should follow the recommendation of the
> architecture (this one is actually spelled out, but as a recommendation
> only) to have a TB enable pin and use it to perform a perfect sync at
> boot time.

Where in the ISA is this?

Closest I see from a quick scan is "There must be a method for getting all
Time Bases in the system to start incrementing with values that are
identical or almost identical." (7.2 S, 9.2.1 E).  Note the "almost".

And while we do provide this beginning with the e500mc-based chips, e500v2
isn't dead yet.  Kexec is also currently breaking the boot sync, requiring
smp-tbsync -- though ideally kexec could be reworked to not need to
physically reset the core.

> > We had a bug in U-Boot's timebase sync where the boot core would sometimes
> > be one tick faster than the other cores.
> 
> It's scary to think that your cores TBs seem to be soured from different
> clock sources...

They're not.  We use the boot core's timebase for a while, then disable it,
reset it to zero, and enable all the timebases at once.  The bug was a
missing readback to ensure that it was stopped before it was reset, so
sometimes it would tick up to 1 on the boot core after reset, before being
enabled again.  This resulted in the boot core's timebase always reading
one greater than the other cores'.

It can also be an issue when running on simulated hardware.

> ie even if you fix uBoot, can you guarantee they won't
> drift ? I hope so ...

It shouldn't drift even with the old U-boot -- it's just a constant offset.

> I would consider that an unfixable architecture
> violation and I am not at this stage keen on implementing the necessary
> "workarounds" in Linux (the userspace case is nasty, really nasty).
> 
> PowerPC always prided itself on having a "sane" time base mechanism
> unlike x86, please don't tell me that you guys are now breaking that
> assumption.

If you mean "are we introducing new chips that have timebase problems", no.

I'm questioning whether the assumption was ever fully valid under all
circumstances.

> > It's been fixed, but there are
> > probably people still running the old U-Boot.  It seems like the kind of
> > thing where defensive robustness is called for, like timing out instead of
> > hanging if a hardware register never flips the bit we're waiting for.
> 
> No, you'll just "hide" the problem from the kernel and horrible &
> unexplainable things will happen in userspace. At the VERY LEAST you
> must warn very loudly if you detect this is happening.

A warning message is OK.

The current situation hides it as well, since it appears to work fine
until you hit the race, and suddenly things get stuck and it takes a lot of
digging to find out why.

> > > We make hard assumptions here and in various places actually.
> > 
> > Are there any in the kernel that this doesn't cover?
> 
> Check gtod implementation, I'm not sure whether that's enough at this
> stage or not for it, and then there's the vDSO of course.

I think the in-kernel gtod is OK after this patch (it breaks if it reads a
timestamp that is less than cycle_last).

The 32-bit vdso looks OK as is since it doesn't convert to nsec until after
adding to xtime.  Userspace will still see time go backwards a bit if the
timebase does, but it shouldn't get a wildly wrong answer.  The 64-bit vdso
uses srdi instead of sradi, and does so before adding to the upper
half of xtime, so if the race is hit the returned time will be too high by
2^32 seconds.

The problem in the in-kernel gettimeofday that causes it to be sensitive to
times less than cycles_last appears to be the same thing --
clocksource_cyc2ns() does an unsigned shift, even though it claims to be
returning a signed result.

> Not sure what's up with sched_clock() and whether that has similar constraints.

sched_clock() should be OK as long as the timestamp is always greater than
boot_tb, and hopefully any timebase skew is less than the delay from when
boot_tb is set to when secondary cores start up.

Delay loops could end early if they use unsigned comparisons, but only if
the timebase skew is greater than the time it takes to move a thread from
one cpu to another.

-Scott

^ permalink raw reply

* Re: perf_event_open system call support in powerpc
From: ashwath narasimhan @ 2011-06-29 18:35 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <BANLkTi=j1FwEv2tBmPck3GWHWgcYGchcew@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 754 bytes --]

Yep e500mc supports performance counters in hardware. Found the mapping for
the kernel call in /arch/powerpc/include/asm/systbl.h

On Tue, Jun 28, 2011 at 8:03 PM, ashwath narasimhan <
ashwath.narasimhan@oneconvergence.com> wrote:

> Hello,
>
>  I am new to the powerpc architecture and I am trying to use
> perf_event_open() system call for power pc architecture (e500mc) using
> 2.6.32 kernel distribution. Is this system call number supported for power
> pc architecture? If yes, is there something similar to
>  arch/x86/kernel/syscall_table_32.S  listing for powerpc that indicates the
> number for the above system call?
>
> Thanks in advance for assisting me. Please email me at
> ashwath.narasimhan@oneconvergence.com
>
> --
> -Ash
>



-- 
-Ash

[-- Attachment #2: Type: text/html, Size: 1158 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] Add cpufreq driver for Momentum Maple boards
From: kevin diggs @ 2011-06-29 18:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Dmitry Eremin-Solenikov, Dave Jones, Paul Mackerras, linuxppc-dev,
	cpufreq
In-Reply-To: <1309337670.14501.69.camel@pasglop>

Hi,

Try this one more time ...

On Wed, Jun 29, 2011 at 3:54 AM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Wed, 2011-06-29 at 12:40 +0400, Dmitry Eremin-Solenikov wrote:
>
> If you feel like it :-) The powermac one has quite a bit more plumbing
> for voltage control etc... but it does make sense in the long run.
>
On my G5 (PowerMac7,3?), a dual 970FX @ 2.5G, I don't think the
voltage scaling works correctly. If someone else with one of these
(preferably someone who is NOT swamped (and named Ben)) could run some
experiments. I would like to know whether the G5 I bought on ebay is
some "FrankenG5" and the others actually work correctly.

To summarize, if I disable frequency scaling and look at the cpu core
voltages it runs at the LOW voltage at full (i.e. 2.5 GHz) speed. With
frequency scaling enabled, it runs the low speed at the same voltage
it runs at 2.5 GHz without frequency scaling enabled. At the full
speed it switches to a higher voltage. It WILL overheat if allowed to
'do stuff'. Temps above 110 are observed for cpu 1 (the second cpu in
the serial (i.e. cpu 1 is heated by cpu 0) cooling setup - DUH!!!).
The two voltages are like ~1.23 and ~1.35.

Back when this beast had MacOS X, I think it exhibited similar
behavior based on the fan noise.

kevin

^ permalink raw reply

* Re: powerpc/4xx: Regression failed on sil24 (and other) drivers
From: Ayman El-Khashab @ 2011-06-29 18:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: cam, linuxppc-dev
In-Reply-To: <1309311723.32158.514.camel@pasglop>

On Wed, Jun 29, 2011 at 11:42:03AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2011-06-27 at 06:31 -0500, Ayman El-Khashab wrote:
> > On Mon, Jun 27, 2011 at 08:19:56PM +1000, Benjamin Herrenschmidt wrote:
> > > On Sat, 2011-06-25 at 18:52 -0500, Ayman El-Khashab wrote:
> > > > I noticed during a recent development with the 460SX that a
> > > > simple device that once worked stopped.  I did a bisect to
> > > > find the offending commit and it turns out to be this one:
> > > > 
> > > > 0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad
> > > > commit
> > > > commit 0e52247a2ed1f211f0c4f682dc999610a368903f
> > > > Author: Cam Macdonell <cam@cs.ualberta.ca>
> > > > Date:   Tue Sep 7 17:25:20 2010 -0700
> > > > 
> > > >     PCI: fix pci_resource_alignment prototype
> > > > 
> 
....
> 
> I suspect you don't have CONFIG_PCI_QUIRKS enabled... I think that's the
> cause of your problem.
> 
> It looks like this config option controls both compiling the "generic"
> quirks in from drivers/pci/quirk.c, and the actually mechanism for
> having quirks in the first place (pci_fixup_device() goes away without
> that config option).
> 
> I think we probably want to unconditionally select that if CONFIG_PCI is
> enabled in arch/powerpc...
> 
> Can you try changing it and tell us if that helps ?

Yes, that fixed our problem, thanks for your time.  I am
going to try to get the MSI to work.

Ayman

^ permalink raw reply

* Re: [PATCH 2/2] Add cpufreq driver for Momentum Maple boards
From: kevin diggs @ 2011-06-29 18:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Dmitry Eremin-Solenikov, Dave Jones, Paul Mackerras, linuxppc-dev,
	cpufreq
In-Reply-To: <1309318110.32158.520.camel@pasglop>

Hi,

On Tue, Jun 28, 2011 at 10:28 PM, Benjamin Herrenschmidt >
>
> If we're going to have a Kconfig.powerpc, should we maybe just have a
> powerpc subdirectory instead with the driver in it ?
>
Where would the powerpc subdirectory be? under drivers/cpufreq? Or
somewhere under arch/powerpc where it belongs (and I put my 750GX
stuff)?

>
>> + =A0 =A0 printk(KERN_INFO "Frequency method: SCOM, Voltage method: none=
\n");
>
> This is useless.
>
Why?

> Cheers,
> Ben.
>
kevin

^ permalink raw reply

* Re: [PATCH 1/2] mtd/nand : don't free the global data fsl_lbc_ctrl_dev->nand in fsl_elbc_chip_remove()
From: Scott Wood @ 2011-06-29 16:45 UTC (permalink / raw)
  To: dedekind1; +Cc: linuxppc-dev, b35362, dwmw2, linux-mtd
In-Reply-To: <1309328435.23597.104.camel@sauron>

On Wed, 29 Jun 2011 09:20:25 +0300
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Tue, 2011-06-28 at 09:50 +0800, b35362@freescale.com wrote:
> > From: Liu Shuo <b35362@freescale.com>
> > 
> > The global data fsl_lbc_ctrl_dev->nand don't have to be freed in
> > fsl_elbc_chip_remove(). The right place to do that is in fsl_elbc_nand_remove()
> > if elbc_fcm_ctrl->counter is zero.
> > 
> > Signed-off-by: Liu Shuo <b35362@freescale.com>
> > ---
> >  drivers/mtd/nand/fsl_elbc_nand.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
> > index 0bb254c..a212116 100644
> > --- a/drivers/mtd/nand/fsl_elbc_nand.c
> > +++ b/drivers/mtd/nand/fsl_elbc_nand.c
> > @@ -829,7 +829,6 @@ static int fsl_elbc_chip_remove(struct fsl_elbc_mtd *priv)
> >  
> >  	elbc_fcm_ctrl->chips[priv->bank] = NULL;
> >  	kfree(priv);
> > -	kfree(elbc_fcm_ctrl);
> >  	return 0;
> >  }
> 
> Do we have to assign fsl_lbc_ctrl_dev->nand to NULL in
> fsl_elbc_nand_remove() then? I think that assignment can be killed then.
> 
>         if (!elbc_fcm_ctrl->counter) {
>                 fsl_lbc_ctrl_dev->nand = NULL;
>                 kfree(elbc_fcm_ctrl);
>         }
> 

If we're freeing fsl_lbc_ctrl, we'd better get rid of references to it...

-Scott

^ permalink raw reply

* Re: [PATCH 2/2] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Scott Wood @ 2011-06-29 16:43 UTC (permalink / raw)
  To: dedekind1; +Cc: linuxppc-dev, b35362, dwmw2, linux-mtd
In-Reply-To: <1309328529.23597.106.camel@sauron>

On Wed, 29 Jun 2011 09:22:04 +0300
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Tue, 2011-06-28 at 09:50 +0800, b35362@freescale.com wrote:
> > +	/* Hack for supporting the flash chip whose writesize is
> > +	 * larger than 2K bytes.
> > +	 */
> 
> Please, use proper kernel multi-line comments. Please, make sure
> checkpatch.pl does not generate 13 errors with this patch.

Most of the checkpatch complaints are about existing style in the file --
particularly, the use of tabs only for indentation, with spaces used for
alignment beyond the indentation point.

-Scott

^ permalink raw reply

* Re: [PATCH v4]PPC4xx: Adding PCI(E) MSI support
From: Ayman El-Khashab @ 2011-06-29 15:22 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Rupjyoti Sarmah, rsarmah, linux-kernel
In-Reply-To: <1309302928.32158.470.camel@pasglop>

On Wed, Jun 29, 2011 at 09:15:28AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2011-06-28 at 17:31 -0500, Ayman El-Khashab wrote:
> > > > +static int ppc4xx_setup_pcieh_hw(struct platform_device *dev,
> > > > +                            struct resource res, struct
> > ppc4xx_msi *msi)
> > > > +{
> > > > +
> > 
> > <snip>
> > 
> > > > +
> > > > +   msi->msi_dev = of_find_node_by_name(NULL, "ppc4xx-msi");
> > > > +   if (msi->msi_dev)
> > > > +           return -ENODEV;
> > 
> > This does not look correct. I guess it should probably read 
> > 
> > if (!msi->msi_dev) .....
> 
> Indeed, that looks bogus. Rupjyoti, please test and send fixes if
> necessary, obviously this code has not been tested.
> 
> This is not part of the bits I fixed up so I looks to me like the
> original patch was wrong (and thus obviously untested !!!)
> 

Looking back through the mailing list, there have been
various incarnations of this patch to add MSI support to the
44x.  Every one that I looked at had this same line of code
in it so I am not sure they worked.  In any case I am trying
to make it work on my system (which is how I found the bug).
When I enable the "sdr-base" line in the MSI section of my
dts, it just reboots continuosly right after "Loading Device
Tree ....".  I tried renaming it to "msi-sdr-base" just in
case there was a conflict (since it is reading through the
entire tree) but that did not help.  If I understand
correctly, the ppc4xx_msi_probe function must be executing
very early since I suspect something in setup_pcieh_hw is 
what causes it to fail.  

ayman

^ permalink raw reply

* [PATCH V4 2/2] Add cpufreq driver for Momentum Maple boards
From: Dmitry Eremin-Solenikov @ 2011-06-29 15:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Paul Mackerras, cpufreq, Dave Jones
In-Reply-To: <1309360076-22579-1-git-send-email-dbaryshkov@gmail.com>

Add simple cpufreq driver for Maple-based boards (ppc970fx evaluation
kit and others). Driver is based on a cpufreq driver for 64-bit powermac
boxes with all pmac-dependant features removed and simple cleanup
applied.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
---
 drivers/cpufreq/Kconfig         |    5 +
 drivers/cpufreq/Kconfig.powerpc |    7 +
 drivers/cpufreq/Makefile        |    5 +
 drivers/cpufreq/maple-cpufreq.c |  309 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 326 insertions(+), 0 deletions(-)
 create mode 100644 drivers/cpufreq/Kconfig.powerpc
 create mode 100644 drivers/cpufreq/maple-cpufreq.c

diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index 9fb8485..61ae639 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -184,5 +184,10 @@ depends on X86
 source "drivers/cpufreq/Kconfig.x86"
 endmenu
 
+menu "PowerPC CPU frequency scaling drivers"
+depends on PPC32 || PPC64
+source "drivers/cpufreq/Kconfig.powerpc"
+endmenu
+
 endif
 endmenu
diff --git a/drivers/cpufreq/Kconfig.powerpc b/drivers/cpufreq/Kconfig.powerpc
new file mode 100644
index 0000000..e76992f
--- /dev/null
+++ b/drivers/cpufreq/Kconfig.powerpc
@@ -0,0 +1,7 @@
+config CPU_FREQ_MAPLE
+	bool "Support for Maple 970FX Evaluation Board"
+	depends on PPC_MAPLE
+	select CPU_FREQ_TABLE
+	help
+	  This adds support for frequency switching on Maple 970FX
+	  Evaluation Board and compatible boards (IBM JS2x blades).
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index e2fc2d2..ca3796d 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -41,3 +41,8 @@ obj-$(CONFIG_X86_CPUFREQ_NFORCE2)	+= cpufreq-nforce2.o
 
 # ARM SoC drivers
 obj-$(CONFIG_UX500_SOC_DB8500)		+= db8500-cpufreq.o
+
+
+##################################################################################d
+# PowerPC platform drivers
+obj-$(CONFIG_CPU_FREQ_MAPLE)		+= maple-cpufreq.o
diff --git a/drivers/cpufreq/maple-cpufreq.c b/drivers/cpufreq/maple-cpufreq.c
new file mode 100644
index 0000000..89b178a
--- /dev/null
+++ b/drivers/cpufreq/maple-cpufreq.c
@@ -0,0 +1,309 @@
+/*
+ *  Copyright (C) 2011 Dmitry Eremin-Solenikov
+ *  Copyright (C) 2002 - 2005 Benjamin Herrenschmidt <benh@kernel.crashing.org>
+ *  and                       Markus Demleitner <msdemlei@cl.uni-heidelberg.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This driver adds basic cpufreq support for SMU & 970FX based G5 Macs,
+ * that is iMac G5 and latest single CPU desktop.
+ */
+
+#undef DEBUG
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/sched.h>
+#include <linux/cpufreq.h>
+#include <linux/init.h>
+#include <linux/completion.h>
+#include <linux/mutex.h>
+#include <linux/time.h>
+#include <linux/of.h>
+
+#define DBG(fmt...) pr_debug(fmt)
+
+/* see 970FX user manual */
+
+#define SCOM_PCR 0x0aa001			/* PCR scom addr */
+
+#define PCR_HILO_SELECT		0x80000000U	/* 1 = PCR, 0 = PCRH */
+#define PCR_SPEED_FULL		0x00000000U	/* 1:1 speed value */
+#define PCR_SPEED_HALF		0x00020000U	/* 1:2 speed value */
+#define PCR_SPEED_QUARTER	0x00040000U	/* 1:4 speed value */
+#define PCR_SPEED_MASK		0x000e0000U	/* speed mask */
+#define PCR_SPEED_SHIFT		17
+#define PCR_FREQ_REQ_VALID	0x00010000U	/* freq request valid */
+#define PCR_VOLT_REQ_VALID	0x00008000U	/* volt request valid */
+#define PCR_TARGET_TIME_MASK	0x00006000U	/* target time */
+#define PCR_STATLAT_MASK	0x00001f00U	/* STATLAT value */
+#define PCR_SNOOPLAT_MASK	0x000000f0U	/* SNOOPLAT value */
+#define PCR_SNOOPACC_MASK	0x0000000fU	/* SNOOPACC value */
+
+#define SCOM_PSR 0x408001			/* PSR scom addr */
+/* warning: PSR is a 64 bits register */
+#define PSR_CMD_RECEIVED	0x2000000000000000U   /* command received */
+#define PSR_CMD_COMPLETED	0x1000000000000000U   /* command completed */
+#define PSR_CUR_SPEED_MASK	0x0300000000000000U   /* current speed */
+#define PSR_CUR_SPEED_SHIFT	(56)
+
+/*
+ * The G5 only supports two frequencies (Quarter speed is not supported)
+ */
+#define CPUFREQ_HIGH                  0
+#define CPUFREQ_LOW                   1
+
+static struct cpufreq_frequency_table maple_cpu_freqs[] = {
+	{CPUFREQ_HIGH,		0},
+	{CPUFREQ_LOW,		0},
+	{0,			CPUFREQ_TABLE_END},
+};
+
+static struct freq_attr *maple_cpu_freqs_attr[] = {
+	&cpufreq_freq_attr_scaling_available_freqs,
+	NULL,
+};
+
+/* Power mode data is an array of the 32 bits PCR values to use for
+ * the various frequencies, retrieved from the device-tree
+ */
+static int maple_pmode_cur;
+
+static DEFINE_MUTEX(maple_switch_mutex);
+
+static const u32 *maple_pmode_data;
+static int maple_pmode_max;
+
+/*
+ * SCOM based frequency switching for 970FX rev3
+ */
+static int maple_scom_switch_freq(int speed_mode)
+{
+	unsigned long flags;
+	int to;
+
+	local_irq_save(flags);
+
+	/* Clear PCR high */
+	scom970_write(SCOM_PCR, 0);
+	/* Clear PCR low */
+	scom970_write(SCOM_PCR, PCR_HILO_SELECT | 0);
+	/* Set PCR low */
+	scom970_write(SCOM_PCR, PCR_HILO_SELECT |
+		      maple_pmode_data[speed_mode]);
+
+	/* Wait for completion */
+	for (to = 0; to < 10; to++) {
+		unsigned long psr = scom970_read(SCOM_PSR);
+
+		if ((psr & PSR_CMD_RECEIVED) == 0 &&
+		    (((psr >> PSR_CUR_SPEED_SHIFT) ^
+		      (maple_pmode_data[speed_mode] >> PCR_SPEED_SHIFT)) & 0x3)
+		    == 0)
+			break;
+		if (psr & PSR_CMD_COMPLETED)
+			break;
+		udelay(100);
+	}
+
+	local_irq_restore(flags);
+
+	maple_pmode_cur = speed_mode;
+	ppc_proc_freq = maple_cpu_freqs[speed_mode].frequency * 1000ul;
+
+	return 0;
+}
+
+static int maple_scom_query_freq(void)
+{
+	unsigned long psr = scom970_read(SCOM_PSR);
+	int i;
+
+	for (i = 0; i <= maple_pmode_max; i++)
+		if ((((psr >> PSR_CUR_SPEED_SHIFT) ^
+		      (maple_pmode_data[i] >> PCR_SPEED_SHIFT)) & 0x3) == 0)
+			break;
+	return i;
+}
+
+/*
+ * Common interface to the cpufreq core
+ */
+
+static int maple_cpufreq_verify(struct cpufreq_policy *policy)
+{
+	return cpufreq_frequency_table_verify(policy, maple_cpu_freqs);
+}
+
+static int maple_cpufreq_target(struct cpufreq_policy *policy,
+	unsigned int target_freq, unsigned int relation)
+{
+	unsigned int newstate = 0;
+	struct cpufreq_freqs freqs;
+	int rc;
+
+	if (cpufreq_frequency_table_target(policy, maple_cpu_freqs,
+			target_freq, relation, &newstate))
+		return -EINVAL;
+
+	if (maple_pmode_cur == newstate)
+		return 0;
+
+	mutex_lock(&maple_switch_mutex);
+
+	freqs.old = maple_cpu_freqs[maple_pmode_cur].frequency;
+	freqs.new = maple_cpu_freqs[newstate].frequency;
+	freqs.cpu = 0;
+
+	cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+	rc = maple_scom_switch_freq(newstate);
+	cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+
+	mutex_unlock(&maple_switch_mutex);
+
+	return rc;
+}
+
+static unsigned int maple_cpufreq_get_speed(unsigned int cpu)
+{
+	return maple_cpu_freqs[maple_pmode_cur].frequency;
+}
+
+static int maple_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+	policy->cpuinfo.transition_latency = 12000;
+	policy->cur = maple_cpu_freqs[maple_scom_query_freq()].frequency;
+	/* secondary CPUs are tied to the primary one by the
+	 * cpufreq core if in the secondary policy we tell it that
+	 * it actually must be one policy together with all others. */
+	cpumask_copy(policy->cpus, cpu_online_mask);
+	cpufreq_frequency_table_get_attr(maple_cpu_freqs, policy->cpu);
+
+	return cpufreq_frequency_table_cpuinfo(policy,
+		maple_cpu_freqs);
+}
+
+
+static struct cpufreq_driver maple_cpufreq_driver = {
+	.name		= "maple",
+	.owner		= THIS_MODULE,
+	.flags		= CPUFREQ_CONST_LOOPS,
+	.init		= maple_cpufreq_cpu_init,
+	.verify		= maple_cpufreq_verify,
+	.target		= maple_cpufreq_target,
+	.get		= maple_cpufreq_get_speed,
+	.attr		= maple_cpu_freqs_attr,
+};
+
+static int __init maple_cpufreq_init(void)
+{
+	struct device_node *cpus;
+	struct device_node *cpunode;
+	unsigned int psize;
+	unsigned long max_freq;
+	const u32 *valp;
+	u32 pvr_hi;
+	int rc = -ENODEV;
+
+	/*
+	 * Behave here like powermac driver which checks machine compatibility
+	 * to ease merging of two drivers in future.
+	 */
+	if (!of_machine_is_compatible("Momentum,Maple") &&
+	    !of_machine_is_compatible("Momentum,Apache"))
+		return 0;
+
+	cpus = of_find_node_by_path("/cpus");
+	if (cpus == NULL) {
+		DBG("No /cpus node !\n");
+		return -ENODEV;
+	}
+
+	/* Get first CPU node */
+	for (cpunode = NULL;
+	     (cpunode = of_get_next_child(cpus, cpunode)) != NULL;) {
+		const u32 *reg = of_get_property(cpunode, "reg", NULL);
+		if (reg == NULL || (*reg) != 0)
+			continue;
+		if (!strcmp(cpunode->type, "cpu"))
+			break;
+	}
+	if (cpunode == NULL) {
+		printk(KERN_ERR "cpufreq: Can't find any CPU 0 node\n");
+		goto bail_cpus;
+	}
+
+	/* Check 970FX for now */
+	/* we actually don't care on which CPU to access PVR */
+	pvr_hi = PVR_VER(mfspr(SPRN_PVR));
+	if (pvr_hi != 0x3c && pvr_hi != 0x44) {
+		printk(KERN_ERR "cpufreq: Unsupported CPU version (%x)\n",
+				pvr_hi);
+		goto bail_noprops;
+	}
+
+	/* Look for the powertune data in the device-tree */
+	/*
+	 * On Maple this property is provided by PIBS in dual-processor config,
+	 * not provided by PIBS in CPU0 config and also not provided by SLOF,
+	 * so YMMV
+	 */
+	maple_pmode_data = of_get_property(cpunode, "power-mode-data", &psize);
+	if (!maple_pmode_data) {
+		DBG("No power-mode-data !\n");
+		goto bail_noprops;
+	}
+	maple_pmode_max = psize / sizeof(u32) - 1;
+
+	/*
+	 * From what I see, clock-frequency is always the maximal frequency.
+	 * The current driver can not slew sysclk yet, so we really only deal
+	 * with powertune steps for now. We also only implement full freq and
+	 * half freq in this version. So far, I haven't yet seen a machine
+	 * supporting anything else.
+	 */
+	valp = of_get_property(cpunode, "clock-frequency", NULL);
+	if (!valp)
+		return -ENODEV;
+	max_freq = (*valp)/1000;
+	maple_cpu_freqs[0].frequency = max_freq;
+	maple_cpu_freqs[1].frequency = max_freq/2;
+
+	/* Force apply current frequency to make sure everything is in
+	 * sync (voltage is right for example). Firmware may leave us with
+	 * a strange setting ...
+	 */
+	msleep(10);
+	maple_pmode_cur = -1;
+	maple_scom_switch_freq(maple_scom_query_freq());
+
+	printk(KERN_INFO "Registering Maple CPU frequency driver\n");
+	printk(KERN_INFO "Low: %d Mhz, High: %d Mhz, Cur: %d MHz\n",
+		maple_cpu_freqs[1].frequency/1000,
+		maple_cpu_freqs[0].frequency/1000,
+		maple_cpu_freqs[maple_pmode_cur].frequency/1000);
+
+	rc = cpufreq_register_driver(&maple_cpufreq_driver);
+
+	of_node_put(cpunode);
+	of_node_put(cpus);
+
+	return rc;
+
+bail_noprops:
+	of_node_put(cpunode);
+bail_cpus:
+	of_node_put(cpus);
+
+	return rc;
+}
+
+module_init(maple_cpufreq_init);
+
+
+MODULE_LICENSE("GPL");
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH V4 1/2] ppc: enable scom access functions on Maple
From: Dmitry Eremin-Solenikov @ 2011-06-29 15:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Paul Mackerras, cpufreq, Dave Jones
In-Reply-To: <1309360076-22579-1-git-send-email-dbaryshkov@gmail.com>

Enable functions used to access SCOM if PPC_MAPLE is defined: they are
used by cpufreq driver to control hardware.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
---
 arch/powerpc/kernel/misc_64.S |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index e89df59..616921e 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -339,7 +339,7 @@ _GLOBAL(real_205_writeb)
 #endif /* CONFIG_PPC_PASEMI */
 
 
-#ifdef CONFIG_CPU_FREQ_PMAC64
+#if defined(CONFIG_CPU_FREQ_PMAC64) || defined(CONFIG_CPU_FREQ_MAPLE)
 /*
  * SCOM access functions for 970 (FX only for now)
  *
@@ -408,7 +408,7 @@ _GLOBAL(scom970_write)
 	/* restore interrupts */
 	mtmsrd	r5,1
 	blr
-#endif /* CONFIG_CPU_FREQ_PMAC64 */
+#endif /* CONFIG_CPU_FREQ_PMAC64 || CONFIG_CPU_FREQ_MAPLE */
 
 
 /*
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH V4 0/2] Add cpufreq driver for Momentum Maple platform
From: Dmitry Eremin-Solenikov @ 2011-06-29 15:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Paul Mackerras, cpufreq, Dave Jones

Please merge this patchset, adding a cpufreq driver for Momentum Maple
platform. Changes since V3:

* Add comment regarding power-mode-data
* Adjusted kernel output a bit
* Tied the driver to compatible platforms only (as per
  arch/powerpc/platforms/maple/setup.c)

Dmitry Eremin-Solenikov (2):
      ppc: enable scom access functions on Maple
      Add cpufreq driver for Momentum Maple boards

 arch/powerpc/kernel/misc_64.S   |    4 +-
 drivers/cpufreq/Kconfig         |    5 +
 drivers/cpufreq/Kconfig.powerpc |    7 +
 drivers/cpufreq/Makefile        |    5 +
 drivers/cpufreq/maple-cpufreq.c |  309 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 328 insertions(+), 2 deletions(-)
 create mode 100644 drivers/cpufreq/Kconfig.powerpc
 create mode 100644 drivers/cpufreq/maple-cpufreq.c

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox