LinuxPPC-Dev Archive on lore.kernel.org
* Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
From: Benjamin Herrenschmidt @ 2009-08-06  3:52 UTC (permalink / raw)
  To: Sachin Sant; +Cc: Stephen Rothwell, linux-next, linuxppc-dev
In-Reply-To: <4A796237.6070302@in.ibm.com>

On Wed, 2009-08-05 at 16:13 +0530, Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
> > Thanks. I'll have a look next week. I think when I changed the indices
> > I may have forgotten to update something.
> >   
> Ben,
> 
> I can recreate this issue with today's next.
> Let me know if i can help in any way to fix this issue.

Does this patch fix it?

[PATCH] powerpc/mm: Fix encoding of page table cache numbers

The mask used to encode the page table cache number in the
batch when freeing page tables was too small for the new
possible values of MMU page sizes. This increases it along
with a comment explaining the constraints.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/pgalloc.h |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
index 34b0806..f2e812d 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -28,7 +28,12 @@ typedef struct pgtable_free {
 	unsigned long val;
 } pgtable_free_t;
 
-#define PGF_CACHENUM_MASK	0x7
+/* This needs to be big enough to allow for MMU_PAGE_COUNT + 2 to be stored
+ * and small enough to fit in the low bits of any naturally aligned page
+ * table cache entry. Arbitrarily set to 0x1f, that should give us some
+ * room to grow
+ */
+#define PGF_CACHENUM_MASK	0x1f
 
 static inline pgtable_free_t pgtable_free_cache(void *p, int cachenum,
 						unsigned long mask)
-- 
1.6.0.4
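
A minimal sketch of the encoding the patch description refers to, assuming the cache number is stored in the low bits of a naturally aligned page table address. The helper names are hypothetical and are not the kernel's; only the two mask values come from the patch. It illustrates why a cache number above the mask cannot be stored.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values from the patch; the names are stand-ins for this sketch. */
#define OLD_CACHENUM_MASK 0x7UL
#define NEW_CACHENUM_MASK 0x1fUL

/* Pack a cache number into the low bits of an aligned table address.
 * Returns 0 when the number does not fit in the mask; the kernel
 * instead hits BUG_ON(cachenum > PGF_CACHENUM_MASK) at this point. */
static uintptr_t pack_cachenum(uintptr_t table, unsigned long cachenum,
                               unsigned long mask)
{
    if (cachenum > mask)
        return 0;
    /* The table must be aligned to at least mask + 1 bytes. */
    assert((table & mask) == 0);
    return table | cachenum;
}

/* Recover the cache number from the packed value. */
static unsigned long unpack_cachenum(uintptr_t val, unsigned long mask)
{
    return val & mask;
}

/* Recover the table address from the packed value. */
static uintptr_t unpack_table(uintptr_t val, unsigned long mask)
{
    return val & ~(uintptr_t)mask;
}
```

With the old mask of 0x7 a cache number of 8 is rejected, while 0x1f accepts values up to 31, provided the table is aligned to at least 32 bytes.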


> Thanks
> -Sachin
> 
> >> : ------------[ cut here ]------------
> >> cpu 0x0: Vector: 700 (Program Check) at [c000000038923560]
> >>     pc: c0000000000486d4: .free_hugepte_range+0x68/0xa0
> >>     lr: c000000000048954: .hugetlb_free_pgd_range+0x248/0x38c
> >>     sp: c0000000389237e0
> >>    msr: 8000000000029032
> >>   current = 0xc00000003b1d7780
> >>   paca    = 0xc000000001002400
> >>     pid   = 2839, comm = readback
> >> kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
> >> enter ? for help
> >> [c000000038923880] c000000000048954 .hugetlb_free_pgd_range+0x248/0x38c
> >> [c000000038923970] c000000000165a48 .free_pgtables+0xa0/0x154
> >> [c000000038923a30] c000000000167f78 .exit_mmap+0x13c/0x1cc
> >> [c000000038923ae0] c0000000000997ec .mmput+0x68/0x14c
> >> [c000000038923b70] c00000000009f1d4 .exit_mm+0x190/0x1b8
> >> [c000000038923c20] c0000000000a16e8 .do_exit+0x214/0x784
> >> [c000000038923d00] c0000000000a1d1c .do_group_exit+0xc4/0xf8
> >> [c000000038923da0] c0000000000a1d7c .SyS_exit_group+0x2c/0x48
> >> [c000000038923e30] c0000000000085b4 syscall_exit+0x0/0x40
> >> --- Exception: c01 (System Call) at 000000000fe15038
> >> SP (ffb8e030) is in userspace
> >> 0:mon> e
> >> cpu 0x0: Vector: 700 (Program Check) at [c000000038923560]
> >>     pc: c0000000000486d4: .free_hugepte_range+0x68/0xa0
> >>     lr: c000000000048954: .hugetlb_free_pgd_range+0x248/0x38c
> >>     sp: c0000000389237e0
> >>    msr: 8000000000029032
> >>   current = 0xc00000003b1d7780
> >>   paca    = 0xc000000001002400
> >>     pid   = 2839, comm = readback
> >> kernel BUG at /home/linux-2.6.31-rc4/arch/powerpc/include/asm/pgalloc.h:36!
> >> 0:mon> r
> >> R00 = 0000000000000001   R16 = 0000000000000000
> >> R01 = c0000000389237e0   R17 = 0000000000000001
> >> R02 = c000000000f165a8   R18 = 000000003fffffff
> >> R03 = c0000000014504d0   R19 = 0000000000000000
> >> R04 = c000000039390001   R20 = 0000000000000000
> >> R05 = 0000000000000007   R21 = 0000010000000000
> >> R06 = 0000000000000000   R22 = 0000000040000000
> >> R07 = 0000000040000000   R23 = c0000000014504d0
> >> R08 = c00000003d708188   R24 = 000000003fffffff
> >> R09 = c00000003eb40000   R25 = 0000000000000007
> >> R10 = c00000003d708188   R26 = c00000003ebd41b8
> >> R11 = 0000000000000018   R27 = c0000000014504d0
> >> R12 = 0000000040000448   R28 = c00000003eb40018
> >> R13 = c000000001002400   R29 = 0000000000000008
> >> R14 = 00000000ffffffff   R30 = 0000000040000000
> >> R15 = 00000000ffffffff   R31 = c0000000389237e0
> >> pc  = c0000000000486d4 .free_hugepte_range+0x68/0xa0
> >> lr  = c000000000048954 .hugetlb_free_pgd_range+0x248/0x38c
> >> msr = 8000000000029032   cr  = 20042444
> >> ctr = 800000000000b6f4   xer = 0000000000000001   trap =  700
> >> 0:mon> 
> >>
> >> Line 36 of arch/powerpc/include/asm/pgalloc.h corresponds to
> >>
> >> BUG_ON(cachenum > PGF_CACHENUM_MASK);
> >>
> >> May be something to do with number of elements in huge_pgtable_cache_name ??
> >>
> >> Thanks
> >> -Sachin


* Re: ftrace scripts and make V=1
From: Ingo Molnar @ 2009-08-06  3:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Dave Airlie, LKML, Sam Ravnborg, linuxppc-dev
In-Reply-To: <alpine.DEB.2.00.0908052011590.5010@gandalf.stny.rr.com>


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Well we tracked it down and it is powerpc64 specific.
> 
> Seems that in drivers/hwmon/lm93.c there's a function called:
> 
>    LM93_IN_FROM_REG()
> 
> But PPC64 has function descriptors and the real function names (the ones 
> you see in objdump) start with a '.'. Thus in objdump you have:
> 
>  Disassembly of section .text:
> 
>  0000000000000000 <.LM93_IN_FROM_REG>:
>        0:       7c 08 02 a6     mflr    r0
>        4:       fb 81 ff e0     std     r28,-32(r1)
> 
> 
> The function name used is .LM93_IN_FROM_REG. But gcc considers 
> symbols that start with ".L" as a special symbol that is used 
> inside the assembly stage.
> 
> The nm passed into recordmcount uses the --synthetic option which 
> shows the ".L" symbols (my runs outside of the build did not 
> include the --synthetic option, so my older patch worked). We see 
> the function as a local.
> 
> Now to capture all the locations that use "mcount" we need to have 
> a reference to link into the object file a list of mcount callers. 
> We need a reference that will not disappear. We try to use a 
> global function and if that does not work, we use a local function 
> as a reference. But to relink the section back into the object, we 
> need to make it global. In this case, we run objcopy using 
> --globalize-symbol and --localize-symbol to convert the symbol 
> into a global symbol, link the mcount list, then convert it back 
> to a local symbol.
> 
> This works great except for this case. .L* symbols can not be 
> converted into a global symbol, and the mcount section referencing 
> it will remain unresolved.
> 
> Try this patch and see if it fixes your issue.
> 
> Thanks!
> 
> -- Steve
> 
> diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
> index d29baa2..4889c44 100755
> --- a/scripts/recordmcount.pl
> +++ b/scripts/recordmcount.pl
> @@ -414,7 +414,10 @@ while (<IN>) {
>  	    $offset = hex $1;
>  	} else {
>  	    # if we already have a function, and this is weak, skip it
> -	    if (!defined($ref_func) && !defined($weak{$text})) {
> +	    if (!defined($ref_func) && !defined($weak{$text}) &&
> +		 # PPC64 can have symbols that start with .L and
> +		 # gcc considers these special. Don't use them!
> +		 $text !~ /^\.L/) {
>  		$ref_func = $text;
>  		$offset = hex $1;
>  	    }

Ah, indeed. I'm wondering whether also emitting a build warning 
would be useful - just in the (admittedly unlikely) case of someone 
wondering about why LM93_IN_FROM_REG does not show up in function 
traces.

	Ingo


* Re: [PATCH 0/3] cpu: idle state framework for offline CPUs.
From: Shaohua Li @ 2009-08-06  1:58 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Brown, Len, Peter Zijlstra, linux-kernel@vger.kernel.org,
	Pallipadi, Venkatesh, Ingo Molnar, linuxppc-dev@lists.ozlabs.org,
	Darrick J. Wong
In-Reply-To: <20090805142311.553.78286.stgit@sofia.in.ibm.com>

Hi,

On Wed, Aug 05, 2009 at 10:25:53PM +0800, Gautham R Shenoy wrote:
> In this patch-series, we propose to extend the CPU-Hotplug infrastructure
> and allow the system administrator to choose the desired state the CPU should
> go to when it is offlined. We think this approach addresses the concerns about
> determinism as well as transparency, since CPU-Hotplug already provides
> notification mechanism which the userspace can listen to for any change
> in the configuration and correspondingly readjust any previously set
> cpu-affinities.
Peter dislikes any approach (including CPU hotplug) that breaks userspace policy,
even if userspace can get a notification.

> Also, approaches such as [1] can make use of this
> extended infrastructure instead of putting the CPU to an arbitrary C-state
> when it is offlined, thereby providing the system administrator a rope to hang
> himself with should he feel the need to do so.
I don't see why the administrator needs to know which state an offline CPU
should stay in. I don't know about the powerpc side, but on the x86 side it
appears the deepest C-state is already preferred.

Thanks,
Shaohua


* Re: ftrace scripts and make V=1
From: Steven Rostedt @ 2009-08-06  2:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Dave Airlie, LKML, Sam Ravnborg, linuxppc-dev
In-Reply-To: <20090805072952.GC19322@elte.hu>


On Wed, 5 Aug 2009, Ingo Molnar wrote:

> 
> * Dave Airlie <airlied@gmail.com> wrote:
> 
> > Hey,
> > 
> > So I spent 3-4 hrs today (I'm stupid yes) tracking down a .o 
> > breakage by blaming rawhide gcc/binutils as I was using make 
> > V=1 and seeing only the compiler chain running,
> 
> Hm, is this that powerpc related build bug you just reported?

Well we tracked it down and it is powerpc64 specific.

Seems that in drivers/hwmon/lm93.c there's a function called:

   LM93_IN_FROM_REG()

But PPC64 has function descriptors and the real function names (the ones 
you see in objdump) start with a '.'. Thus in objdump you have:

 Disassembly of section .text:

 0000000000000000 <.LM93_IN_FROM_REG>:
       0:       7c 08 02 a6     mflr    r0
       4:       fb 81 ff e0     std     r28,-32(r1)


The function name used is .LM93_IN_FROM_REG. But gcc considers symbols 
that start with ".L" as a special symbol that is used inside the assembly 
stage.

The nm passed into recordmcount uses the --synthetic option which shows 
the ".L" symbols (my runs outside of the build did not include the 
--synthetic option, so my older patch worked). We see the function as a 
local.

Now to capture all the locations that use "mcount" we need to have a 
reference to link into the object file a list of mcount callers. We need a 
reference that will not disappear. We try to use a global function and if 
that does not work, we use a local function as a reference. But to relink 
the section back into the object, we need to make it global. In this case, 
we run objcopy using --globalize-symbol and --localize-symbol to convert 
the symbol into a global symbol, link the mcount list, then convert it 
back to a local symbol.

This works great except for this case. .L* symbols can not be converted 
into a global symbol, and the mcount section referencing it will remain 
unresolved.

Try this patch and see if it fixes your issue.

Thanks!

-- Steve

diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index d29baa2..4889c44 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -414,7 +414,10 @@ while (<IN>) {
 	    $offset = hex $1;
 	} else {
 	    # if we already have a function, and this is weak, skip it
-	    if (!defined($ref_func) && !defined($weak{$text})) {
+	    if (!defined($ref_func) && !defined($weak{$text}) &&
+		 # PPC64 can have symbols that start with .L and
+		 # gcc considers these special. Don't use them!
+		 $text !~ /^\.L/) {
 		$ref_func = $text;
 		$offset = hex $1;
 	    }
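
A minimal sketch of the selection rule the patch adds, written in C rather than the script's Perl. The function name is hypothetical; it only mirrors the two conditions visible in the hunk above (the symbol is not weak, and its name does not start with ".L").

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper mirroring the test added to recordmcount.pl:
 * a symbol may serve as the mcount reference only if it is not weak
 * and its name does not start with ".L", which the toolchain treats
 * as an assembler-local label. */
static bool usable_as_ref(const char *name, bool is_weak)
{
    if (is_weak)
        return false;
    return strncmp(name, ".L", 2) != 0;
}
```

Under this rule .LM93_IN_FROM_REG is skipped, and the next suitable function in the section is used as the reference instead.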


* Re: kexec on e300 core / mpc5121
From: Kenneth Johansson @ 2009-08-06  0:20 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Sebastian Andrzej Siewior
In-Reply-To: <20090805234737.GA26183@b07421-ec1.am.freescale.net>

On Wed, 2009-08-05 at 18:47 -0500, Scott Wood wrote:
> On Thu, Aug 06, 2009 at 12:49:45AM +0200, Kenneth Johansson wrote:
> > On Wed, 2009-08-05 at 01:06 +0200, Sebastian Andrzej Siewior wrote:
> > > I've tried kexec on e300 core which should be easy since it is possible
> > > to disable the MMU on that core. However it does not work.
> > 
> > Is it not possible to disable the mmu on all cpu's that have one ?? 
> 
> No, on e500 for example it is always on.  You can use large pages with
> identity maps to make it seem like it's off, though.

why do something like that ? 

> > Before you turn off the cache you need to flush out all dirty data. best
> > done by simply reading in 32kb of crap from somewhere. otherwise you are
> > sure to lose at least the stack and you do not want that.
> 
> 32KiB is usually not sufficient -- depending on the initial state, an
> 8-way 32KiB cache with PLRU (such as in e300) can require up to 52KiB of
> data (13 loads per set) to fully flush if you simply load+dcbf (in
> separate passes) an arbitrary chunk of data which may already be in the
> cache.

if you have 
int crap[1024*32/4] __attribute__((aligned(32)))

What will happen with the cache if you just load data into a register
from that array? Won't it force out everything else in the cache to
make room for the crap?
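
A minimal sketch of the read loop being asked about, assuming a 32-byte cache line and a 4-byte int. The function name is hypothetical. It only shows the access pattern (one load per cache line across the 32 KiB buffer); whether that is enough to push out every dirty line depends on the replacement policy, which is the point raised in the quoted reply above.

```c
#include <stddef.h>

#define LINE_BYTES 32               /* assumed cache line size */
#define BUF_WORDS  (1024 * 32 / 4)  /* 32 KiB of ints, as in the question */

static int crap[BUF_WORDS] __attribute__((aligned(32)));

/* Read one word from every cache line of the buffer.
 * Returns the number of loads issued. */
static size_t touch_every_line(void)
{
    const volatile int *p = crap;   /* volatile so the loads are not optimized away */
    size_t step = LINE_BYTES / sizeof(int);
    size_t loads = 0;

    for (size_t i = 0; i < BUF_WORDS; i += step) {
        (void)p[i];
        loads++;
    }
    return loads;
}
```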

 


* Re: kexec on e300 core / mpc5121
From: Scott Wood @ 2009-08-05 23:47 UTC (permalink / raw)
  To: Kenneth Johansson; +Cc: linuxppc-dev, Sebastian Andrzej Siewior
In-Reply-To: <1249512585.13069.9.camel@localhost>

On Thu, Aug 06, 2009 at 12:49:45AM +0200, Kenneth Johansson wrote:
> On Wed, 2009-08-05 at 01:06 +0200, Sebastian Andrzej Siewior wrote:
> > I've tried kexec on e300 core which should be easy since it is possible
> > to disable the MMU on that core. However it does not work.
> 
> Is it not possible to disable the mmu on all cpu's that have one ?? 

No, on e500 for example it is always on.  You can use large pages with
identity maps to make it seem like it's off, though.

> Before you turn off the cache you need to flush out all dirty data. best
> done by simply reading in 32kb of crap from somewhere. otherwise you are
> sure to lose at least the stack and you do not want that.

32KiB is usually not sufficient -- depending on the initial state, an
8-way 32KiB cache with PLRU (such as in e300) can require up to 52KiB of
data (13 loads per set) to fully flush if you simply load+dcbf (in
separate passes) an arbitrary chunk of data which may already be in the
cache.

If instead you load+dcbf something that you know is not already in the
cache, or if you have a flush-assist mode that does not choose vacant
cache lines when available (instead sticking strictly to the PLRU), the
maximum is 48KiB.

If you have flush-assist *and* you guarantee no hit on the flush data,
then you can get away with only 32KiB.
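
The three figures above can be reproduced with a little arithmetic. This is a sketch that assumes a 32-byte cache line (so a 32KiB 8-way cache has 128 sets); the function name is hypothetical. The 13 loads per set is stated above, while 12 and 8 loads per set are inferred from the 48KiB and 32KiB totals.

```c
#include <stddef.h>

/* Bytes that must be loaded to displace every line of a set-associative
 * cache, given the worst-case number of loads needed per set. */
static size_t flush_bytes(size_t cache_bytes, size_t ways, size_t line_bytes,
                          size_t loads_per_set)
{
    size_t sets = cache_bytes / (ways * line_bytes);

    return sets * loads_per_set * line_bytes;
}
```

For example, flush_bytes(32 * 1024, 8, 32, 13) gives 128 sets * 13 loads * 32 bytes = 53248 bytes, which is 52KiB.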

-Scott


* Re: kexec on e300 core / mpc5121
From: Kenneth Johansson @ 2009-08-05 22:49 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linuxppc-dev
In-Reply-To: <20090804230605.GA28753@Chamillionaire.breakpoint.cc>

On Wed, 2009-08-05 at 01:06 +0200, Sebastian Andrzej Siewior wrote:
> I've tried kexec on e300 core which should be easy since it is possible
> to disable the MMU on that core. However it does not work.

Is it not possible to disable the mmu on all cpu's that have one ?? 

> Once I disable the MMU, I can't access my MBAR and print chars on the
> serial port. Is this "normal" or do I have still some caches on?

Yes, cache and MMU are separate. The 5121 is not cache coherent and does not
limit caching to memory regions, so accessing the serial port or any
memory-mapped register is a no-no unless you have the cache off, or the cache
on and the MMU on with a correct setting for which address ranges to cache.

Before you turn off the cache you need to flush out all dirty data. This is
best done by simply reading in 32 KB of crap from somewhere. Otherwise you are
sure to lose at least the stack and you do not want that.


* [PATCH] powerpc: switch to asm-generic/hardirq.h
From: Christoph Hellwig @ 2009-08-05 22:24 UTC (permalink / raw)
  To: linuxppc-dev

hardirq.h on powerpc defines a __last_jiffy_stamp field, but it's not
actually used anywhere.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/arch/powerpc/include/asm/hardirq.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/hardirq.h	2009-08-05 19:19:57.391342973 -0300
+++ linux-2.6/arch/powerpc/include/asm/hardirq.h	2009-08-05 19:20:38.658365423 -0300
@@ -1,29 +1 @@
-#ifndef _ASM_POWERPC_HARDIRQ_H
-#define _ASM_POWERPC_HARDIRQ_H
-#ifdef __KERNEL__
-
-#include <asm/irq.h>
-#include <asm/bug.h>
-
-/* The __last_jiffy_stamp field is needed to ensure that no decrementer
- * interrupt is lost on SMP machines. Since on most CPUs it is in the same
- * cache line as local_irq_count, it is cheap to access and is also used on UP
- * for uniformity.
- */
-typedef struct {
-	unsigned int __softirq_pending;	/* set_bit is used on this */
-	unsigned int __last_jiffy_stamp;
-} ____cacheline_aligned irq_cpustat_t;
-
-#include <linux/irq_cpustat.h>	/* Standard mappings for irq_cpustat_t above */
-
-#define last_jiffy_stamp(cpu) __IRQ_STAT((cpu), __last_jiffy_stamp)
-
-static inline void ack_bad_irq(int irq)
-{
-	printk(KERN_CRIT "illegal vector %d received!\n", irq);
-	BUG();
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ASM_POWERPC_HARDIRQ_H */
+#include <asm-generic/hardirq.h>


* Re: kexec on e300 core / mpc5121
From: Sebastian Andrzej Siewior @ 2009-08-05 21:33 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20090804230605.GA28753@Chamillionaire.breakpoint.cc>

* Sebastian Andrzej Siewior | 2009-08-05 01:06:06 [+0200]:
[long mail]
>Does someone have an idea?
Issue fixed, it was the ipic. The missing data while disabling the
caches was my fault because I invalidated the caches before I flushed
them.

Sebastian


* Re: [PATCH] powerpc/ipic: unmask all interrupt sources
From: Sebastian Andrzej Siewior @ 2009-08-05 21:29 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev
In-Reply-To: <92176A7D-89C7-484F-B4C8-8E510271A512@kernel.crashing.org>

* Kumar Gala | 2009-08-05 15:04:16 [-0500]:
>
> looks good.. I'll pick this up for .32 since it doesn't seem to be a bug 
> until we have kexec.
Well, the code for the non-mmu variant is there and is working. It was
just the ipic thing which was holding me back. However, I'm fine with .32;
I'm stuck here with .26.
The bigger issue is the user space for ppc32, which is non-working in
current upstream since it is pre-device tree and game cube only.
Once you have time to review the kernel interface (the part where we
jump to the new kernel / kernel wrapper) I could try to rebase my
patches and repost them :)

> - k

Sebastian


* Re: [PATCH] Fix perfctr oops on ppc32
From: David Woodhouse @ 2009-08-05 21:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Paul Mackerras, linuxppc-dev
In-Reply-To: <1249506153.18245.61.camel@pasglop>

On Thu, 2009-08-06 at 07:02 +1000, Benjamin Herrenschmidt wrote:
> 
> Argh, ignore my Acked-by, I think the patch isn't right...

Hm, good point. Doh.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


* Re: [PATCH] Fix perfctr oops on ppc32
From: Benjamin Herrenschmidt @ 2009-08-05 21:02 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Paul Mackerras, linuxppc-dev
In-Reply-To: <1249133370.24204.3.camel@macbook.infradead.org>

On Sat, 2009-08-01 at 14:29 +0100, David Woodhouse wrote:
> This seems to be the reason why the Fedora rawhide 2.6.31-rc kernel
> doesn't boot. With some CPUs, cur_cpu_spec->oprofile_cpu_type can be
> NULL -- which makes strcmp() unhappy.
> 
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> ---
> At first glance, it looks like there are a bunch of other places which
> use cur_cpu_spec->oprofile_cpu_type without first checking if it's
> non-NULL, but maybe those are all 64-bit and all 64-bit cpu types have
> it set?

Argh, ignore my Acked-by, I think the patch isn't right...

> diff --git a/arch/powerpc/kernel/mpc7450-pmu.c b/arch/powerpc/kernel/mpc7450-pmu.c
> index 75ff47f..ea383c1 100644
> --- a/arch/powerpc/kernel/mpc7450-pmu.c
> +++ b/arch/powerpc/kernel/mpc7450-pmu.c
> @@ -408,7 +408,8 @@ struct power_pmu mpc7450_pmu = {
>  
>  static int init_mpc7450_pmu(void)
>  {
> -	if (strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/7450"))
> +	if (cur_cpu_spec->oprofile_cpu_type &&
> +	    strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/7450"))
>  		return -ENODEV;

That means that if we have no oprofile_cpu_type, we will enable the 7450
PMCs, which doesn't sound right. Shouldn't it instead be:

	if (!cur_cpu_spec->oprofile_cpu_type ||
	    strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/7450"))
 		return -ENODEV;
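
A minimal sketch comparing the two conditions as standalone predicates (hypothetical function names; each returns true where init_mpc7450_pmu() would return -ENODEV). It shows that the two forms differ only for a NULL oprofile_cpu_type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Posted patch: a NULL type skips strcmp() and falls through to
 * registering the 7450 PMU. */
static bool rejects_posted(const char *type)
{
    return type && strcmp(type, "ppc/7450") != 0;
}

/* Suggested form: a NULL type is rejected before strcmp() is reached. */
static bool rejects_suggested(const char *type)
{
    return !type || strcmp(type, "ppc/7450") != 0;
}
```

Both forms avoid calling strcmp() on NULL, but only the second one returns -ENODEV for a CPU that has no oprofile_cpu_type set.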

Cheers,
Ben.


>  	return register_power_pmu(&mpc7450_pmu);
> 
> 


* Re: [PATCH] powerpc/ipic: unmask all interrupt sources
From: Kumar Gala @ 2009-08-05 20:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linuxppc-dev
In-Reply-To: <20090805194112.GA18204@www.tglx.de>


On Aug 5, 2009, at 2:41 PM, Sebastian Andrzej Siewior wrote:

> If the interrupt controller was used in an earlier life, it is possible
> that some of its sources were used and are still unmasked.
> If an (unmasked) device is active and is creating interrupts (or an
> interrupt has been pending since interrupts were disabled), then the boot
> process "ends" very soon. Once external interrupts are enabled, we land in
> -> do_IRQ
>  -> call ppc_md.get_irq()
>     -> ipic_read() gets the source number
>     -> irq_linear_revmap(source)
>        -> revmap[source] == NO_IRQ
>           -> irq_find_mapping(source) returns NO_IRQ because no source
>              is registered
>  -> source is NO_IRQ, ppc_spurious_interrupts gets incremented, no
>     further action.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> This solves the kexec problem I had earlier. I could disable the device
> in the ->shutdown path but the device in question could have been used in
> the boot loader. Usually one gets the "nobody cared" message for unhandled
> interrupts but in this (rare) case nothing happens and the box stands still.
>
> arch/powerpc/sysdev/ipic.c |    3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)

looks good.. I'll pick this up for .32 since it doesn't seem to be a  
bug until we have kexec.

- k


* [PATCH] powerpc/ipic: unmask all interrupt sources
From: Sebastian Andrzej Siewior @ 2009-08-05 19:41 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

If the interrupt controller was used in an earlier life, it is possible
that some of its sources were used and are still unmasked.
If an (unmasked) device is active and is creating interrupts (or an
interrupt has been pending since interrupts were disabled), then the boot
process "ends" very soon. Once external interrupts are enabled, we land in
-> do_IRQ
  -> call ppc_md.get_irq()
     -> ipic_read() gets the source number
     -> irq_linear_revmap(source)
        -> revmap[source] == NO_IRQ
           -> irq_find_mapping(source) returns NO_IRQ because no source
              is registered
  -> source is NO_IRQ, ppc_spurious_interrupts gets incremented, no
     further action.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
This solves the kexec problem I had earlier. I could disable the device
in the ->shutdown path but the device in question could have been used in
the boot loader. Usually one gets the "nobody cared" message for unhandled
interrupts but in this (rare) case nothing happens and the box stands still.

 arch/powerpc/sysdev/ipic.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/sysdev/ipic.c b/arch/powerpc/sysdev/ipic.c
index 69e2630..ebb7e58 100644
--- a/arch/powerpc/sysdev/ipic.c
+++ b/arch/powerpc/sysdev/ipic.c
@@ -781,6 +781,9 @@ struct ipic * __init ipic_init(struct device_node *node, unsigned int flags)
 	primary_ipic = ipic;
 	irq_set_default_host(primary_ipic->irqhost);
 
+	ipic_write(ipic->regs, IPIC_SIMSR_H, 0);
+	ipic_write(ipic->regs, IPIC_SIMSR_L, 0);
+
 	printk ("IPIC (%d IRQ sources) at %p\n", NR_IPIC_INTS,
 			primary_ipic->regs);
 
-- 
1.6.2.5
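
A minimal sketch of what the two added writes do, using a plain struct in place of the memory-mapped registers. The struct and function names are hypothetical, and the sketch assumes that a set bit in a mask register leaves the corresponding source enabled, so that writing zero disables every source.

```c
#include <stdint.h>

/* Hypothetical model of the two system internal interrupt mask registers.
 * Assumption: a set bit leaves the corresponding source enabled. */
struct fake_ipic {
    uint32_t simsr_h;
    uint32_t simsr_l;
};

/* Clear both mask registers so that no source left enabled by an earlier
 * kernel or boot loader can raise an interrupt before a handler exists. */
static void fake_ipic_mask_all(struct fake_ipic *ipic)
{
    ipic->simsr_h = 0;
    ipic->simsr_l = 0;
}

/* Report whether any source is still enabled. */
static int fake_ipic_any_enabled(const struct fake_ipic *ipic)
{
    return (ipic->simsr_h | ipic->simsr_l) != 0;
}
```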


* Re: [PATCH] Do not inline putprops function
From: M. Mohan Kumar @ 2009-08-05 16:49 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Neil Horman, Simon Horman, kexec, miltonm
In-Reply-To: <20090803054919.GA19594@in.ibm.com>

Hi,

When I align the dtstruct variable to 8 bytes, I am able to invoke kdump.

When the line
	static unsigned dtstruct[TREEWORDS], *dt;
is changed to
	static unsigned dtstruct[TREEWORDS] __attribute__ ((aligned (8))), *dt;

kexec-tools works.
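
A minimal sketch for checking the effect of the attribute, assuming GCC or a compatible compiler. The array name, the TREEWORDS value and the helper are placeholders for this example and are not taken from kexec-tools.

```c
#include <stdint.h>

#define TREEWORDS 65536  /* placeholder size for this sketch */

/* Same declaration style as the changed line, with 8-byte alignment. */
static unsigned dtstruct_aligned[TREEWORDS] __attribute__ ((aligned (8)));

/* Return non-zero when p sits on a multiple of boundary bytes. */
static int is_aligned(const void *p, uintptr_t boundary)
{
    return ((uintptr_t)p % boundary) == 0;
}
```

Without the attribute an unsigned array is only guaranteed 4-byte alignment on typical ABIs, so whether it happens to land on an 8-byte boundary can change with the compiler and the surrounding variables.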

Regards,
M. Mohan Kumar

On Mon, Aug 03, 2009 at 11:19:19AM +0530, M. Mohan Kumar wrote:
> On Wed, Jun 24, 2009 at 10:27:43AM +1000, Michael Ellerman wrote:
> > On Tue, 2009-06-23 at 09:56 -0400, Neil Horman wrote:
> > > On Tue, Jun 23, 2009 at 06:25:34PM +0530, M. Mohan Kumar wrote:
> > > > 
> > > Well it definitely looks like removing that variable had some code changes.
> > > It'll take some time to match it up to source, but most interesting I think is
> > > the variance in putprops around address f34.  Looks like it's doing some string
> > > manipulation in a reversed order, using a huge offset.  Might be worthwhile to
> > > check to see if there are any string overruns in this code.
> > 
> > Yeah I still suspect it's just a bug in the code that's being exposed
> > now.
> > 
> Hi,
> 
> The same code works with gcc-3.4.
> 
> > Mohan, can you try running it under valgrind?
> 
> Still I am not able to use valgrind to debug kexec-tools
> 
> Regards,
> M. Mohan Kumar.


* Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
From: Sachin Sant @ 2009-08-05 15:33 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Stephen Rothwell, linux-next, linuxppc-dev
In-Reply-To: <E0A64FA0-C835-4361-A203-E9CB5377599C@kernel.crashing.org>

Kumar Gala wrote:
>
> On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:
>
>> While executing hugetlb tests against today's Next tree on
>> a Power 6 box came across following OOPS.
>
> out of interest what tests are you running for hugetlb?
The one maintained at : http://libhugetlbfs.ozlabs.org/ which points
to the sourceforge libhugetlbfs project.

Latest release can be downloaded from sourceforge using
http://sourceforge.net/projects/libhugetlbfs/files/

I am using version 2.5

Thanks
-Sachin



-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


* RE: Spansion S25FL128-Flash and MTD
From: EXTERNAL Lange Matthias (AA-DGW/ENG1) @ 2009-08-05 14:59 UTC (permalink / raw)
  To: linuxppc-dev@ozlabs.org
In-Reply-To: <44C5CFA72BC0E242A53B64DC1E67DEB80FB989659B@SI-MBX10.de.bosch.com>

I solved part of my problem. The flash now gets detected during boot up. I
had to change the compatible property from "amd,s25sl12800", "jedec-flash"
to "amd,m25p80", "jedec-flash".

What remains is creating the mtd partitions which are specified in the
device tree as follows

partition@0 {
        label = "firmware";
        reg = <0x0 0x800000>;
};
partition@800000 {
        label = "rootfs";
        reg = <0x800000 0x800000>;
};

During boot no partitions are created and "cat /proc/mtd" just gives me

dev:    size   erasesize  name
mtd0: 01000000 00010000 "spi32766.0"

Any ideas?

Regards,
Matthias.

> -----Original Message-----
> From: linuxppc-dev-bounces+matthias.lange=beissbarth.com@lists.ozlabs.org
> [mailto:linuxppc-dev-bounces+matthias.lange=beissbarth.com@lists.ozlabs.org]
> On Behalf Of EXTERNAL Lange Matthias (AA-DGW/ENG1)
> Sent: Wednesday, August 05, 2009 9:17 AM
> To: linuxppc-dev@ozlabs.org
> Subject: Spansion S25FL128-Flash and MTD
>
> Hi,
>
> I am trying to get a MTD running on my embedded PowerPC
> board. I am using a Xilinx Virtex4 with an embedded PowerPC
> 405. In the FPGA there is a Xilinx SPI controller implemented
> to which a Spansion S25FL128 SPI-flash (16MB) is connected.
>
> The problem is that with my setup the flash chip does not get
> detected and the MTD partitions are not set up.
>
> In the kernel config I have configured the following options:
>
> In the device drivers section:
> [x] SPI support
>         [x] Xilinx SPI controller
> [x] Memory Technology Device (MTD) support
>         [x] MTD partitioning support
>         [x] Flash partition map based on OF description
>         RAM/ROM/Flash chip drivers
>                 [x] Detect non-CFI AMD/JEDEC-compatible flash chips
>         Self-contained MTD device drivers
>                 [x] Support most SPI Flash chips (AT26DF, M25P, W25X, ...)
>
> As filesystem I have configured JFFS2.
>
> In my device tree I have declared the SPI controller and the
> connected flash as follows:
>
> SPI_Flash: xps-spi@83400000 {
>         compatible = "xlnx,xps-spi-2.00.b";
>         interrupt-parent = <&int_ctrl>;
>         interrupts = < 3 2 >;
>         reg = < 0x83400000 0x10000 >;
>         xlnx,family = "virtex4";
>         xlnx,fifo-exist = <0x0>;
>         xlnx,num-offchip-ss-bits = <0x1>;
>         xlnx,num-ss-bits = <0x1>;
>         xlnx,sck-ratio = <0x20>;
>         #address-cells = <1>;
>         #size-cells = <1>;
>         nor_flash@0,1000000 {
>                 compatible = "amd,s25sl12800", "jedec-flash";
>                 reg = <0x0 0x1000000>;
>                 spi-max-frequency = <25000000>;
>                 bank-width = <1>;
>                 device-width = <1>;
>                 #address-cells = <1>;
>                 #size-cells = <1>;
>                 rootfs@800000 {
>                         label = "rootfs";
>                         reg = <0x800000 0x800000>;
>                 };
>         };
> };
>
> When the kernel boots I can see from the console that the
> drivers for JFFS2 and the SPI controller get successfully loaded:
>
> ...
> JFFS2 version 2.2. (NAND) (c) 2001-2006 Red Hat, Inc.
> ...
> xilinx-xps-spi 83400000.xps-spi: at 0x83400000 mapped to 0xC9060000, irq=17
> ...
>
> While debugging the boot process I couldn't see any calls to
> jedec_probe() or m25p_probe() from m25p80.c which, to my
> understanding, should be called as a result of a successful
> match of my compatible property (jedec-flash).
>
> The question is: What am I missing? Is there a problem with
> my device tree definition? What else am I doing wrong?
>
> Any help is appreciated. Regards,
> Matthias.


* Re: Next July 29 : Hugetlb test failure (OOPS free_hugepte_range)
From: Kumar Gala @ 2009-08-05 14:35 UTC (permalink / raw)
  To: Sachin Sant; +Cc: Stephen Rothwell, linux-next, linuxppc-dev
In-Reply-To: <4A706504.6040704@in.ibm.com>


On Jul 29, 2009, at 10:04 AM, Sachin Sant wrote:

> While executing hugetlb tests against today's Next tree on
> a Power 6 box came across following OOPS.

out of interest what tests are you running for hugetlb?

- k

^ permalink raw reply

* [PATCH 3/3] pSeries: cpu: Cede CPU during a deactivate-offline
From: Gautham R Shenoy @ 2009-08-05 14:26 UTC (permalink / raw)
  To: Joel Schopp, len.brown, Peter Zijlstra, Balbir Singh,
	Venkatesh Pallipadi, Benjamin Herrenschmidt, shaohua.li,
	Ingo Molnar, Vaidyanathan Srinivasan, Dipankar Sarma,
	Darrick J. Wong
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20090805142311.553.78286.stgit@sofia.in.ibm.com>

Implements the pSeries specific code bits to put the CPU into
rtas_stop_self() state or H_CEDE state depending on the
preferred_offline_state value for that CPU.
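
The control flow described above can be modelled in user space. What follows
is a minimal single-threaded sketch and not the kernel code: sim_cpu,
sim_cede() and sim_cpu_die() are hypothetical names, and the wake_after
counter is an assumption that stands in for the H_PROD another CPU would send.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the states used by the patch. */
enum sim_state { SIM_DEACTIVATE, SIM_DEALLOCATE, SIM_ONLINE };
enum sim_result { SIM_STOPPED, SIM_RESTARTED };

struct sim_cpu {
	enum sim_state preferred;	/* models preferred_offline_state */
	enum sim_state current;		/* models cpu_current_state */
	int offline_ack;		/* models cpu_offline_ack */
	int cede_calls;			/* how many times the CPU ceded */
	int wake_after;			/* assumption: prod arrives after N cedes */
};

/* Models cede_processor(): it returns, possibly because of a prod. */
void sim_cede(struct sim_cpu *c)
{
	c->cede_calls++;
	if (c->cede_calls >= c->wake_after)
		c->current = SIM_ONLINE;	/* the waker flipped the state */
}

/* Models the branch a CPU takes when it goes offline. */
enum sim_result sim_cpu_die(struct sim_cpu *c)
{
	assert(c != NULL);
	c->current = c->preferred;
	if (c->preferred != SIM_DEACTIVATE)
		return SIM_STOPPED;		/* rtas_stop_self() path */

	c->offline_ack = 1;			/* let the waiter know we got here */
	while (c->current == SIM_DEACTIVATE)
		sim_cede(c);			/* early wakeups just cede again */
	return SIM_RESTARTED;			/* start_secondary() path */
}
```

The loop around sim_cede() is the point of the sketch: a return from cede
does not by itself mean the CPU was asked to come back online, so the state
is checked again each time.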

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |   70 +++++++++++++++++++++--
 arch/powerpc/platforms/pseries/offline_driver.h |    3 +
 arch/powerpc/platforms/pseries/plpar_wrappers.h |    6 ++
 arch/powerpc/platforms/pseries/smp.c            |   18 ++++++
 4 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index f15de99..5b47d6c 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -41,6 +41,8 @@ struct cpu_offline_state {
 };
 
 DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) = CPU_DEALLOCATE;
+DEFINE_PER_CPU(enum cpu_state_vals, cpu_current_state);
+DEFINE_PER_CPU(int, cpu_offline_ack);
 
 ssize_t pSeries_show_available_states(struct sys_device *dev,
 			struct sysdev_attribute *attr, char *buf)
@@ -148,8 +150,47 @@ static void pseries_mach_cpu_die(void)
 	local_irq_disable();
 	idle_task_exit();
 	xics_teardown_cpu();
-	unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
-	rtas_stop_self();
+	if (__get_cpu_var(preferred_offline_state) == CPU_DEACTIVATE) {
+
+		__get_cpu_var(cpu_offline_ack) = 1;
+		get_lppaca()->idle = 1;
+		if (!get_lppaca()->shared_proc)
+			get_lppaca()->donate_dedicated_cpu = 1;
+
+		printk(KERN_INFO "cpu %u (hwid %u) ceding\n",
+			smp_processor_id(), hard_smp_processor_id());
+
+		while (__get_cpu_var(cpu_current_state)	== CPU_DEACTIVATE) {
+			cede_processor();
+			printk(KERN_INFO "cpu %u (hwid %u) Returned from cede. "
+			       "Decrementer value: %x. Timebase value: %llx\n",
+			       smp_processor_id(), hard_smp_processor_id(),
+			       get_dec(), get_tb());
+		}
+
+		printk(KERN_INFO "cpu %u (hwid %u) Received online PROD \n",
+		       smp_processor_id(), hard_smp_processor_id());
+
+		if (!get_lppaca()->shared_proc)
+			get_lppaca()->donate_dedicated_cpu = 0;
+		get_lppaca()->idle = 0;
+
+		unregister_slb_shadow(hard_smp_processor_id(),
+						__pa(get_slb_shadow()));
+
+		/**
+		 * NOTE: Calling start_secondary() here, is not a very nice
+		 * way of beginning a new context.
+		 *
+		 * We need to reset the stack-pointer.
+		 * Find a cleaner way to do this.
+		 */
+		start_secondary(NULL);
+	} else {
+		unregister_slb_shadow(hard_smp_processor_id(),
+						__pa(get_slb_shadow()));
+		rtas_stop_self();
+	}
 	/* Should never get here... */
 	BUG();
 	for(;;);
@@ -192,6 +233,10 @@ static int pseries_cpu_disable(void)
 
 	/* FIXME: abstract this to not be platform specific later on */
 	xics_migrate_irqs_away();
+	__get_cpu_var(cpu_current_state) =
+		__get_cpu_var(preferred_offline_state);
+
+	__get_cpu_var(cpu_offline_ack) = 0;
 	return 0;
 }
 
@@ -201,11 +246,22 @@ static void pseries_cpu_die(unsigned int cpu)
 	int cpu_status;
 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
 
-	for (tries = 0; tries < 25; tries++) {
-		cpu_status = query_cpu_stopped(pcpu);
-		if (cpu_status == 0 || cpu_status == -1)
-			break;
-		cpu_relax();
+	if (per_cpu(preferred_offline_state, cpu) == CPU_DEACTIVATE) {
+		/* Wait for some Ack */
+		for (tries = 0; tries < 10000; tries++) {
+			cpu_status = !per_cpu(cpu_offline_ack, cpu);
+			if (!cpu_status)
+				break;
+			cpu_relax();
+		}
+
+	} else {
+		for (tries = 0; tries < 25; tries++) {
+			cpu_status = query_cpu_stopped(pcpu);
+			if (cpu_status == 0 || cpu_status == -1)
+				break;
+			cpu_relax();
+		}
 	}
 	if (cpu_status != 0) {
 		printk("Querying DEAD? cpu %i (%i) shows %i\n",
diff --git a/arch/powerpc/platforms/pseries/offline_driver.h b/arch/powerpc/platforms/pseries/offline_driver.h
index bdae76a..571e085 100644
--- a/arch/powerpc/platforms/pseries/offline_driver.h
+++ b/arch/powerpc/platforms/pseries/offline_driver.h
@@ -12,5 +12,6 @@ enum cpu_state_vals {
 };
 
 DECLARE_PER_CPU(enum cpu_state_vals, preferred_offline_state);
-
+DECLARE_PER_CPU(enum cpu_state_vals, cpu_current_state);
+extern int start_secondary(void *unused);
 #endif
diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h b/arch/powerpc/platforms/pseries/plpar_wrappers.h
index a24a6b2..ab7f5ab 100644
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -14,6 +14,12 @@ static inline long cede_processor(void)
 	return plpar_hcall_norets(H_CEDE);
 }
 
+static inline long prod_processor(unsigned long cpu)
+{
+	unsigned long hcpuid = get_hard_smp_processor_id(cpu);
+	return plpar_hcall_norets(H_PROD, hcpuid);
+}
+
 static inline long vpa_call(unsigned long flags, unsigned long cpu,
 		unsigned long vpa)
 {
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 1f8f6cf..15d96a2 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -48,6 +48,7 @@
 #include "plpar_wrappers.h"
 #include "pseries.h"
 #include "xics.h"
+#include "offline_driver.h"
 
 
 /*
@@ -86,7 +87,10 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 	/* Fixup atomic count: it exited inside IRQ handler. */
 	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
 
-	/* 
+	if (per_cpu(preferred_offline_state, lcpu) == CPU_DEACTIVATE)
+		goto out;
+
+	/*
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.
 	 */
@@ -100,6 +104,7 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 		return 0;
 	}
 
+out:
 	return 1;
 }
 
@@ -119,6 +124,7 @@ static void __devinit smp_xics_setup_cpu(int cpu)
 
 static void __devinit smp_pSeries_kick_cpu(int nr)
 {
+	long rc;
 	BUG_ON(nr < 0 || nr >= NR_CPUS);
 
 	if (!smp_startup_cpu(nr))
@@ -130,6 +136,16 @@ static void __devinit smp_pSeries_kick_cpu(int nr)
 	 * the processor will continue on to secondary_start
 	 */
 	paca[nr].cpu_start = 1;
+
+	per_cpu(cpu_current_state, nr) = CPU_STATE_ONLINE;
+	if (per_cpu(preferred_offline_state, nr) == CPU_DEACTIVATE) {
+
+		printk(KERN_INFO "Prodding processor %d to go online\n", nr);
+		rc = prod_processor(nr);
+		if (rc != H_SUCCESS)
+			panic("Prod to wake up processor %d returned "
+			      "with error code: %ld. Dying\n", nr, rc);
+	}
 }
 
 static int smp_pSeries_cpu_bootable(unsigned int nr)

^ permalink raw reply related

* [PATCH 1/3] cpu: Offline state Framework.
From: Gautham R Shenoy @ 2009-08-05 14:25 UTC (permalink / raw)
  To: Joel Schopp, len.brown, Peter Zijlstra, Balbir Singh,
	Venkatesh Pallipadi, Benjamin Herrenschmidt, shaohua.li,
	Ingo Molnar, Vaidyanathan Srinivasan, Dipankar Sarma,
	Darrick J. Wong
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20090805142311.553.78286.stgit@sofia.in.ibm.com>

Provide an interface by which the system administrator can decide which state
the CPU should go to when it is offlined.

To query the available offline states, one needs to perform a read on:
/sys/devices/system/cpu/cpu<number>/available_offline_states

To query or set the preferred offline state for a particular CPU, one needs to
use the sysfs interface

/sys/devices/system/cpu/cpu<number>/preferred_offline_state

This patch implements the architecture independent bits of the
cpu-offline-state framework.

The architecture specific bits are expected to register the actual code which
implements the callbacks when the above mentioned sysfs interfaces are read or
written into. Thus the values provided by reading available_offline_states are
expected to vary with the architecture.
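
The registration contract can be sketched in user space as follows. This is
only an illustration under stated assumptions: the names ending in _sketch
are hypothetical, the callback signatures are simplified, and the locking
that the real code takes via cpu_maps_update_begin() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical version of struct cpu_offline_driver. */
struct offline_driver_sketch {
	int (*show_available_states)(char *buf, size_t len);
	int (*show_preferred_state)(unsigned int cpu, char *buf, size_t len);
	int (*store_preferred_state)(unsigned int cpu, const char *buf);
};

/* At most one architecture driver may be registered at a time. */
static const struct offline_driver_sketch *registered_sketch;

int register_sketch(const struct offline_driver_sketch *drv)
{
	if (registered_sketch)
		return -EEXIST;		/* somebody else got there first */

	/* All three callbacks are mandatory. */
	if (!drv || !drv->show_available_states ||
	    !drv->show_preferred_state || !drv->store_preferred_state)
		return -EINVAL;

	registered_sketch = drv;
	return 0;
}

void unregister_sketch(void)
{
	registered_sketch = NULL;
}

/* Do-nothing callbacks, present only so the sketch can be exercised. */
int stub_show_states(char *buf, size_t len)
{
	(void)buf; (void)len;
	return 0;
}

int stub_show_pref(unsigned int cpu, char *buf, size_t len)
{
	(void)cpu; (void)buf; (void)len;
	return 0;
}

int stub_store_pref(unsigned int cpu, const char *buf)
{
	(void)cpu; (void)buf;
	return 0;
}
```

A second registration is refused with -EEXIST and an incomplete driver with
-EINVAL, which mirrors the two error paths in the patch below.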

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 drivers/base/cpu.c  |  111 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cpu.h |   15 +++++++
 2 files changed, 126 insertions(+), 0 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..1a63de0 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -56,26 +56,137 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
 }
 static SYSDEV_ATTR(online, 0644, show_online, store_online);
 
+static struct cpu_offline_driver *cpu_offline_driver;
+static SYSDEV_ATTR(available_offline_states, 0444, NULL, NULL);
+static SYSDEV_ATTR(preferred_offline_state, 0644, NULL, NULL);
+
+/* Should be called with cpu_add_remove_lock held */
+void cpu_offline_driver_add_cpu(struct sys_device *cpu_sys_dev)
+{
+	if (!cpu_offline_driver)
+		return;
+
+	sysdev_create_file(cpu_sys_dev, &attr_available_offline_states);
+	sysdev_create_file(cpu_sys_dev, &attr_preferred_offline_state);
+}
+
+/* Should be called with cpu_add_remove_lock held */
+void cpu_offline_driver_remove_cpu(struct sys_device *cpu_sys_dev)
+{
+	if (!cpu_offline_driver)
+		return;
+
+	sysdev_remove_file(cpu_sys_dev, &attr_available_offline_states);
+	sysdev_remove_file(cpu_sys_dev, &attr_preferred_offline_state);
+
+}
+
 static void __cpuinit register_cpu_control(struct cpu *cpu)
 {
 	sysdev_create_file(&cpu->sysdev, &attr_online);
+	cpu_offline_driver_add_cpu(&cpu->sysdev);
 }
+
 void unregister_cpu(struct cpu *cpu)
 {
 	int logical_cpu = cpu->sysdev.id;
 
 	unregister_cpu_under_node(logical_cpu, cpu_to_node(logical_cpu));
 
+	cpu_offline_driver_remove_cpu(&cpu->sysdev);
 	sysdev_remove_file(&cpu->sysdev, &attr_online);
 
 	sysdev_unregister(&cpu->sysdev);
 	per_cpu(cpu_sys_devices, logical_cpu) = NULL;
 	return;
 }
+
+static int __cpuinit
+cpu_driver_callback(struct notifier_block *nfb, unsigned long action,
+								void *hcpu)
+{
+	struct sys_device *cpu_sysdev = per_cpu(cpu_sys_devices,
+						(unsigned long)(hcpu));
+
+	switch (action) {
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		cpu_offline_driver_remove_cpu(cpu_sysdev);
+		break;
+
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		cpu_offline_driver_add_cpu(cpu_sysdev);
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata cpu_driver_notifier = {
+	.notifier_call = cpu_driver_callback,
+	.priority = 0
+};
+
+int register_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
+{
+	int ret = 0;
+	cpu_maps_update_begin();
+
+	if (cpu_offline_driver != NULL) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
+	if (!(arch_cpu_driver->show_available_states &&
+	      arch_cpu_driver->show_preferred_state &&
+	      arch_cpu_driver->store_preferred_state)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	attr_available_offline_states.show =
+		arch_cpu_driver->show_available_states;
+	attr_preferred_offline_state.show =
+		arch_cpu_driver->show_preferred_state;
+	attr_preferred_offline_state.store =
+		arch_cpu_driver->store_preferred_state;
+
+	cpu_offline_driver = arch_cpu_driver;
+
+out_unlock:
+	cpu_maps_update_done();
+	if (!ret)
+		register_cpu_notifier(&cpu_driver_notifier);
+	return ret;
+}
+
+void unregister_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
+{
+	cpu_maps_update_begin();
+
+	if (!cpu_offline_driver) {
+		WARN_ON(1);
+		cpu_maps_update_done();
+		return;
+	}
+
+	cpu_offline_driver = NULL;
+	attr_available_offline_states.show = NULL;
+	attr_preferred_offline_state.show = NULL;
+	attr_preferred_offline_state.store = NULL;
+
+	cpu_maps_update_done();
+	unregister_cpu_notifier(&cpu_driver_notifier);
+}
+
 #else /* ... !CONFIG_HOTPLUG_CPU */
 static inline void register_cpu_control(struct cpu *cpu)
 {
 }
+
 #endif /* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_KEXEC
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4d668e0..e2150be 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -51,6 +51,21 @@ struct notifier_block;
 #ifdef CONFIG_HOTPLUG_CPU
 extern int register_cpu_notifier(struct notifier_block *nb);
 extern void unregister_cpu_notifier(struct notifier_block *nb);
+
+struct cpu_offline_driver {
+	ssize_t (*show_available_states)(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf);
+	ssize_t (*show_preferred_state)(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf);
+
+	ssize_t (*store_preferred_state)(struct sys_device *dev,
+			struct sysdev_attribute *attr,
+			const char *buf, size_t count);
+};
+
+extern int register_cpu_offline_driver(struct cpu_offline_driver *driver);
+extern void unregister_cpu_offline_driver(struct cpu_offline_driver *driver);
+
 #else
 
 #ifndef MODULE

^ permalink raw reply related

* [PATCH 2/3] cpu: Implement cpu-offline-state callbacks for pSeries.
From: Gautham R Shenoy @ 2009-08-05 14:26 UTC (permalink / raw)
  To: Joel Schopp, len.brown, Peter Zijlstra, Balbir Singh,
	Venkatesh Pallipadi, Benjamin Herrenschmidt, shaohua.li,
	Ingo Molnar, Vaidyanathan Srinivasan, Dipankar Sarma,
	Darrick J. Wong
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20090805142311.553.78286.stgit@sofia.in.ibm.com>

This patch implements the callbacks to handle the reads/writes into the sysfs
interfaces

/sys/devices/system/cpu/cpu<number>/available_offline_states
and
/sys/devices/system/cpu/cpu<number>/preferred_offline_state

Currently, the patch defines two states which the processor can go to when it
is offlined. They are

- deallocate: The current behaviour when the cpu is offlined.
  The CPU makes an rtas_stop_self() call and hands the CPU back to
  the resource pool, thereby effectively deallocating that vCPU from
  the LPAR.

- deactivate: This is expected to cede the processor to the hypervisor, so
  that on processors which support appropriate low-power states, they can
  be exploited. This can be considered as an extended tickless idle state.

The patch only implements the callbacks which will display the available
states, and record the preferred states. The code that calls
rtas_stop_self() or H_CEDE, depending on preferred_offline_state, is
implemented in the next patch.
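
The name-to-state lookup behind the store callback can be illustrated in user
space. This is a hedged sketch, not the patch code: the sketch_ names are
hypothetical, and POSIX strcasecmp() is used here as an assumed stand-in for
the kernel's case-insensitive strnicmp().

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/* Hypothetical mirror of the two offline states defined by the patch. */
enum sketch_state { SKETCH_DEACTIVATE, SKETCH_DEALLOCATE };

static const struct {
	enum sketch_state val;
	const char *name;
} sketch_states[] = {
	{ SKETCH_DEACTIVATE, "deactivate" },
	{ SKETCH_DEALLOCATE, "deallocate" },
};

/*
 * Returns the state value for `name`, or -1 if the name is unknown.
 * The loop bound is the table size, so the table is never over-read.
 */
int sketch_lookup_state(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(sketch_states) / sizeof(sketch_states[0]); i++)
		if (strcasecmp(name, sketch_states[i].name) == 0)
			return (int)sketch_states[i].val;
	return -1;
}
```

Bounding the loop by the number of table entries, rather than by the last
enum value, keeps the lookup safe if the enum gains values that have no
table entry.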

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |   90 +++++++++++++++++++++++
 arch/powerpc/platforms/pseries/offline_driver.h |   16 ++++
 2 files changed, 106 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a20ead8..f15de99 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,95 @@
 #include <asm/pSeries_reconfig.h>
 #include "xics.h"
 #include "plpar_wrappers.h"
+#include "offline_driver.h"
+
+struct cpu_offline_state {
+	enum cpu_state_vals state_val;
+	const char *state_name;
+} pSeries_cpu_offline_states[] = {
+	{CPU_DEACTIVATE, "deactivate"},
+	{CPU_DEALLOCATE, "deallocate"},
+};
+
+DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) = CPU_DEALLOCATE;
+
+ssize_t pSeries_show_available_states(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf)
+{
+	int state;
+	ssize_t ret = 0;
+
+	for (state = CPU_DEACTIVATE; state < CPU_MAX_OFFLINE_STATES; state++) {
+		if (state == CPU_STATE_ONLINE)
+			continue;
+
+		if (ret >= (ssize_t) ((PAGE_SIZE / sizeof(char))
+					- (CPU_STATES_LEN + 2)))
+			goto out;
+		ret += scnprintf(&buf[ret], CPU_STATES_LEN, "%s ",
+				pSeries_cpu_offline_states[state].state_name);
+	}
+
+out:
+	ret += sprintf(&buf[ret], "\n");
+	return ret;
+}
+
+ssize_t pSeries_show_preferred_state(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+	int state = per_cpu(preferred_offline_state, cpu->sysdev.id);
+
+	return scnprintf(buf, CPU_STATES_LEN, "%s\n",
+			pSeries_cpu_offline_states[state].state_name);
+}
+
+ssize_t pSeries_store_preferred_state(struct sys_device *dev,
+			struct sysdev_attribute *attr,
+			const char *buf, size_t count)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+	int ret = -EINVAL;
+	char state_name[CPU_STATES_LEN];
+	int i;
+	cpu_maps_update_begin();
+	ret = sscanf(buf, "%15s", state_name);
+
+	if (ret != 1) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(pSeries_cpu_offline_states); i++)
+		if (!strnicmp(state_name,
+				pSeries_cpu_offline_states[i].state_name,
+				CPU_STATES_LEN))
+			break;
+
+	if (i == ARRAY_SIZE(pSeries_cpu_offline_states)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	per_cpu(preferred_offline_state, cpu->sysdev.id) =
+				pSeries_cpu_offline_states[i].state_val;
+	ret = 0;
+
+out_unlock:
+	cpu_maps_update_done();
+
+	if (ret)
+		return ret;
+	else
+		return count;
+}
+
+struct cpu_offline_driver pSeries_offline_driver = {
+	.show_available_states = pSeries_show_available_states,
+	.show_preferred_state = pSeries_show_preferred_state,
+	.store_preferred_state = pSeries_store_preferred_state,
+};
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -281,6 +370,7 @@ static int __init pseries_cpu_hotplug_init(void)
 	ppc_md.cpu_die = pseries_mach_cpu_die;
 	smp_ops->cpu_disable = pseries_cpu_disable;
 	smp_ops->cpu_die = pseries_cpu_die;
+	register_cpu_offline_driver(&pSeries_offline_driver);
 
 	/* Processors can be added/removed only on LPAR */
 	if (firmware_has_feature(FW_FEATURE_LPAR))
diff --git a/arch/powerpc/platforms/pseries/offline_driver.h b/arch/powerpc/platforms/pseries/offline_driver.h
new file mode 100644
index 0000000..bdae76a
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_driver.h
@@ -0,0 +1,16 @@
+#ifndef _OFFLINE_DRIVER_H_
+#define _OFFLINE_DRIVER_H_
+
+#define CPU_STATES_LEN	16
+
+/* Cpu offline states go here */
+enum cpu_state_vals {
+	CPU_DEACTIVATE,
+	CPU_DEALLOCATE,
+	CPU_STATE_ONLINE,
+	CPU_MAX_OFFLINE_STATES
+};
+
+DECLARE_PER_CPU(enum cpu_state_vals, preferred_offline_state);
+
+#endif

^ permalink raw reply related

* [PATCH 0/3] cpu: idle state framework for offline CPUs.
From: Gautham R Shenoy @ 2009-08-05 14:25 UTC (permalink / raw)
  To: Joel Schopp, len.brown, Peter Zijlstra, Balbir Singh,
	Venkatesh Pallipadi, Benjamin Herrenschmidt, shaohua.li,
	Ingo Molnar, Vaidyanathan Srinivasan, Dipankar Sarma,
	Darrick J. Wong
  Cc: linuxppc-dev, linux-kernel

Hi,

**** RFC not for inclusion ****

When we perform a CPU-Offline operation today, we do not put the CPU
into the most energy efficient state. On x86, it loops in hlt as opposed to
going to one of the low-power C-states. On pSeries, we call rtas_stop_self()
and hand over the vCPU back to the resource pool, thereby deallocating
the vCPU.

Thus, when applications or platforms desire to put a particular CPU
to an extended low-power state for a short while, currently they have to
piggy-back on scheduler heuristics such as sched_mc_powersavings or play with
exclusive Cpusets. The former does a good job based on the workload, but fails
to provide any guarantee that the CPU won't be used for the next <> seconds,
while the latter might conflict with the existing CPUsets configurations.

There were efforts to alleviate these problems and various proposals have been
put forth. They include putting the CPU to the deepest possible idle-state
when offlined [1], removing the desired CPU from the topmost-cpuset [2],
a driver which forces a high-priority idle thread to run on the desired CPU
thereby putting it to idle [3].

In this patch-series, we propose to extend the CPU-Hotplug infrastructure
and allow the system administrator to choose the desired state the CPU should
go to when it is offlined. We think this approach addresses the concerns about
determinism as well as transparency, since CPU-Hotplug already provides
notification mechanism which the userspace can listen to for any change
in the configuration and correspondingly readjust any previously set
cpu-affinities. Also, approaches such as [1] can make use of this
extended infrastructure instead of putting the CPU to an arbitrary C-state
when it is offlined, thereby providing the system administrator a rope to hang
himself with should he feel the need to do so.

This patch-series tries to achieve this by implementing an architecture
independent framework that exposes sysfs tunables to allow the
system-administrator to choose the offline-state of a CPU.

	/sys/devices/system/cpu/cpu<number>/available_offline_states
and
	/sys/devices/system/cpu/cpu<number>/preferred_offline_state

For the purpose of proof-of-concept, we've implemented the backend for
pSeries. For pSeries, we define two available_offline_states. They are:

	deallocate: This is the default behaviour which on an offline, deallocates
	the vCPU by invoking rtas_stop_self() and hands it back to
	the resource pool.

	deactivate: This calls H_CEDE, which will request the hypervisor to
	idle the vCPU in the lowest power mode and give it back as soon as
	we need it.
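
From user space, the two sysfs files could be driven roughly as follows. This
is a minimal sketch under stated assumptions: the helper names are
hypothetical, the attribute name preferred_offline_state is the one used by
the patches in this series, and the space-separated format of
available_offline_states is taken from the pSeries show callback.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if `wanted` is a whole whitespace-separated token of `line`
 * (the text read from available_offline_states), 0 otherwise.
 */
int offline_state_available(const char *line, const char *wanted)
{
	size_t wlen;
	const char *p;

	assert(line != NULL && wanted != NULL);
	wlen = strlen(wanted);
	p = line;
	while (*p) {
		size_t len;

		p += strspn(p, " \t\n");	/* skip separators */
		len = strcspn(p, " \t\n");	/* length of this token */
		if (len > 0 && len == wlen && strncmp(p, wanted, len) == 0)
			return 1;
		p += len;
	}
	return 0;
}

/* Builds the per-CPU sysfs path; returns 0 on success, -1 if it won't fit. */
int offline_state_path(char *buf, size_t size, unsigned int cpu,
		       const char *attr)
{
	int n = snprintf(buf, size, "/sys/devices/system/cpu/cpu%u/%s",
			 cpu, attr);

	return (n > 0 && (size_t)n < size) ? 0 : -1;
}

/* Writes `state` to preferred_offline_state; returns 0 on success. */
int set_preferred_offline_state(unsigned int cpu, const char *state)
{
	char path[128];
	FILE *f;
	int ok;

	if (offline_state_path(path, sizeof(path), cpu,
			       "preferred_offline_state"))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(state, f) >= 0;
	if (fclose(f) != 0)		/* sysfs write errors surface here */
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would typically read available_offline_states first, check the
desired state with offline_state_available(), write it with
set_preferred_offline_state(), and only then offline the CPU through the
existing online file.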


Any feedback on the patchset will be immensely valuable.

References:
-----------
[1] Pallipadi, Venkatesh: x86: Make offline cpus to go to deepest idle state
using mwait (URL: http://lkml.org/lkml/2009/5/22/431)

[2] Li, Shaohua: cpuset: add new API to change cpuset top group's cpus
(URL: http://lkml.org/lkml/2009/5/19/54)

[3] Li, Shaohua: new ACPI processor driver to force CPUs idle
(URL: http://www.spinics.net/lists/linux-acpi/msg22863.html)


Changelog:
---

Gautham R Shenoy (3):
      pSeries: cpu: Cede CPU during a deactivate-offline
      cpu: Implement cpu-offline-state callbacks for pSeries.
      cpu: Offline state Framework.


 arch/powerpc/platforms/pseries/hotplug-cpu.c    |  160 ++++++++++++++++++++++-
 arch/powerpc/platforms/pseries/offline_driver.h |   17 ++
 arch/powerpc/platforms/pseries/plpar_wrappers.h |    6 +
 arch/powerpc/platforms/pseries/smp.c            |   18 ++-
 drivers/base/cpu.c                              |  111 ++++++++++++++++
 include/linux/cpu.h                             |   15 ++
 6 files changed, 319 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

-- 
Thanks and Regards
gautham.

^ permalink raw reply

* Re: linux-next: Tree for August 5
From: Boaz Harrosh @ 2009-08-05 13:28 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-next, ppc-dev, LKML
In-Reply-To: <20090805201336.fd2de2ad.sfr@canb.auug.org.au>

On 08/05/2009 01:13 PM, Stephen Rothwell wrote:
> Hi Boaz,
> 
> On Wed, 05 Aug 2009 11:11:20 +0300 Boaz Harrosh <bharrosh@panasas.com> wrote:
>>
>> On 08/05/2009 10:23 AM, Stephen Rothwell wrote:
>>>
>>> This tree fails to build for powerpc allyesconfig (final link problem).
>>
>> Above is reported for a long time. Is it related to this:
>> http://www.spinics.net/lists/kernel/msg921978.html
> 
> No, it is this:
> 
> powerpc-linux-ld: TOC section size exceeds 64k
> 
> It is powerpc specific and the only way we can think of fixing it
> involves stopping building the built-in.o files and linking the entire
> kernel in one go at the end.  We just haven't had time (or the energy) to
> try to fix it properly while it only affects the final link of the
> allyesconfig kernel. 
> 

Ha, OK allyesconfig. Last time I attempted an allyesconfig for i386 I got
me the OOM killer attack on my KDE, half way into the final link.

That's why I use allmodconfig when I need "try to compile everything ++".
I get just as much compilation coverage, if not more, and it is faster.
It won't show symbol conflicts if any exist, though.

Thanks
Boaz

^ permalink raw reply

* [PATCH 4/4] drivers/serial/mpc52xx_uart.c: Use UPIO_MEM rather than SERIAL_IO_MEM
From: Julia Lawall @ 2009-08-05 13:25 UTC (permalink / raw)
  To: grant.likely, linuxppc-dev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

As in the commit 9b4a1617772d6d5ab5eeda0cd95302fae119e359, use UPIO_MEM
rather than SERIAL_IO_MEM.  Both have the same value.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@has_sc@
@@

#include <linux/serial_core.h>

@depends on has_sc@
@@

- SERIAL_IO_MEM
+ UPIO_MEM
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/serial/mpc52xx_uart.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -u -p a/drivers/serial/mpc52xx_uart.c b/drivers/serial/mpc52xx_uart.c
--- a/drivers/serial/mpc52xx_uart.c
+++ b/drivers/serial/mpc52xx_uart.c
@@ -705,7 +705,7 @@ mpc52xx_uart_verify_port(struct uart_por
 		return -EINVAL;
 
 	if ((ser->irq != port->irq) ||
-	    (ser->io_type != SERIAL_IO_MEM) ||
+	    (ser->io_type != UPIO_MEM) ||
 	    (ser->baud_base != port->uartclk)  ||
 	    (ser->iomem_base != (void *)port->mapbase) ||
 	    (ser->hub6 != 0))

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Fix switch_mmu_context to iterate of the proper list of cpus
From: Josh Boyer @ 2009-08-05 13:16 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <1249476742.5607.2.camel@concordia>

On Wed, Aug 05, 2009 at 10:52:22PM +1000, Michael Ellerman wrote:
>On Wed, 2009-08-05 at 07:32 -0400, Josh Boyer wrote:
>> On Tue, Aug 04, 2009 at 10:33:32PM -0500, Kumar Gala wrote:
>> >Introduced a temporary variable into our iterating over the list cpus
>> >that are threads on the same core.  For some reason Ben forgot how for
>> >loops work.
>> 
>> Have the powerpoint demons corrupted him already??
>
>No I think one of his kids gave him swine flu.

Whew.  You can recover from that at least ;)

josh

^ permalink raw reply

