LinuxPPC-Dev Archive on lore.kernel.org
* Re: Oops in IDE probing on ppc_440 when PCI is enabled in strapping
From: Benjamin Herrenschmidt @ 2009-09-15  0:42 UTC
  To: Ludo Van Put; +Cc: linuxppc-dev
In-Reply-To: <5edaeed70909140608m3ddcda33y7f92b2dfd18ca92e@mail.gmail.com>

On Mon, 2009-09-14 at 15:08 +0200, Ludo Van Put wrote:
> 2009/9/14 Josh Boyer <jwboyer@linux.vnet.ibm.com>:
> > On Mon, Sep 14, 2009 at 02:36:15PM +0200, Ludo Van Put wrote:
> >>Hi,
> >>
> >>we're working with a PPC440GX on a board that has a.o. a compact flash slot.
> >>We had the PCI subsystem of the ppc disabled in strapping for quite a while,
> >>until we wanted to start using it.
> >>However, when we enabled PCI in the strapping and in the (patched 2.6.10)
> >
> > 2.6.10?  Really?  If that is truly the case, you probably aren't going to get
> > a whole lot of help from the list, since that kernel is pretty ancient.
> >
> I can only acknowledge that, but we're stuck with that kernel for now...
> 
> >>kernel configuration, we triggered an oops when probing for IDE devices (to
> >>read out the first 512 bytes of the CF). I can see that the ioremap64 call
> >>in the driver code for our CF returns a different address (compared to PCI
> >>disabled in strapping), but using this address later on for accessing the CF
> >>goes wrong.
> >
> > Posting the oops output would perhaps help.  Or maybe not.
> >
> > josh
> >
> 
> Here it goes, you never know:
> 
> Oops: kernel access of bad area, sig: 11 [#1]
> PREEMPT
> NIP: C0148050 LR: C013BC64 SP: C07CFEA0 REGS: c07cfdf0 TRAP: 0300    Not tainted
> MSR: 00021000 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 00
> DAR: E3093000, DSISR: 00000000
> TASK = c07cdb70[1] 'swapper' THREAD: c07ce000
> Last syscall: 120
> GPR00: 00000000 C07CFEA0 C07CDB70 E3093000 DFE829FE 00000100 C01184E8 C021B270
> GPR08: C0220000 C02D0F60 C07CDEF8 C07CDEF8 00000000 70000000 1FFF6400 00000001
> GPR16: 00000001 FFFFFFFF 1FFF06C0 00000000 00000001 C0220000 C0280000 00029000
> GPR24: 00000000 C02D0F60 C01F0000 C0148040 00000080 00000000 DFE82A00 C02D0FF0
> NIP [c0148050] ide_insw+0x10/0x24
> LR [c013bc64] ata_input_data+0x74/0x114
> Call backtrace:
>  c013e6a4 try_to_identify+0x2ec/0x5ec
>  c013eaa8 do_probe+0x104/0x304
>  c013f0c4 probe_hwif+0x358/0x6c4
>  c0140068 ideprobe_init+0xa8/0x1a0
>  c02a4ef8 ide_generic_init+0x10/0x28
>  c0001324 init+0xc4/0x244
>  c0004254 kernel_thread+0x44/0x60
> Kernel panic - not syncing: Attempted to kill init!
>  <0>Rebooting in 180 seconds..
> 
> 
> ide_insw is an asm routine that reads in 16-bit words and swaps them. Copied
> from arch/ppc/kernel/misc.S. It works fine when PCI is disabled.

Probably because ide_insw uses insw, which offsets everything from
_IO_BASE, and _IO_BASE changes value when you have a PCI bus with an IO space.
If your IDE isn't based in PCI IO space, you shouldn't use ide_insw but the
MMIO variants instead.
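A minimal user-space model of the difference described above, assuming a variable `io_base` that stands in for `_IO_BASE` (all names here are hypothetical, not kernel APIs). A port-I/O access adds the "port" to `io_base`, so an `ioremap()` cookie passed in as a port only lands on the right address while `io_base` is 0; an MMIO access uses the cookie unchanged. The last function sketches a byte-swapping 16-bit string read, like `ide_insw`, done MMIO-style.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's _IO_BASE; 0 models "no PCI IO space". */
static uint64_t io_base;

/* Port-I/O style (insw): the "port" is an offset added to io_base. */
static uint64_t pio_effective_addr(uint64_t port)
{
	return io_base + port;
}

/* MMIO style: the cookie returned by ioremap() is used as-is. */
static uint64_t mmio_effective_addr(uint64_t cookie)
{
	return cookie;
}

/* Read count 16-bit words from one data register and byte-swap each,
 * addressing the register directly (no io_base offset). */
static void mmio_insw_swapped(const volatile uint16_t *reg,
			      uint16_t *buf, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		uint16_t v = *reg;
		buf[i] = (uint16_t)((v << 8) | (v >> 8));
	}
}
```

With `io_base` at 0 both paths agree, which matches "works fine when PCI is disabled"; once `io_base` becomes non-zero only the MMIO path still reaches the device.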

Ben.

> KR, Ludo
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

* Re: [PATCH] powerpc: Fix bug where perf_counters breaks oprofile
From: Benjamin Herrenschmidt @ 2009-09-15  0:56 UTC
  To: Josh Boyer; +Cc: Paul Mackerras, Maynard Johnson, linuxppc-dev
In-Reply-To: <20090914204348.GE12372@zod.rchland.ibm.com>

On Mon, 2009-09-14 at 16:43 -0400, Josh Boyer wrote:
> On Mon, Sep 14, 2009 at 03:14:02PM -0500, Maynard Johnson wrote:
> >Maynard Johnson wrote:
> >> Paul Mackerras wrote:
> >>> Currently there is a bug where if you use oprofile on a pSeries
> >>> machine, then use perf_counters, then use oprofile again, oprofile
> >>> will not work correctly; it will lose the PMU configuration the next
> >>> time the hypervisor does a partition context switch, and thereafter
> >>> won't count anything.
> >Ben,
> >Is there any way to get this bug fix into 2.6.31 or is the window closed?  Once the problem occurs, you can't get oprofile to work again without a reboot.  Really would be nice (for many reasons) to get this fixed in .31.
> 
> .31 is already released.  Could probably get it into a -stable .31 kernel
> though.

The fix is already upstream as well, no?

Ben.

> josh

* Re: [PATCH] powerpc: Fix bug where perf_counters breaks oprofile
From: Josh Boyer @ 2009-09-15  2:28 UTC
  To: Benjamin Herrenschmidt; +Cc: Paul Mackerras, Maynard Johnson, linuxppc-dev
In-Reply-To: <1252976208.8375.181.camel@pasglop>

On Tue, Sep 15, 2009 at 10:56:47AM +1000, Benjamin Herrenschmidt wrote:
>On Mon, 2009-09-14 at 16:43 -0400, Josh Boyer wrote:
>> On Mon, Sep 14, 2009 at 03:14:02PM -0500, Maynard Johnson wrote:
>> >Maynard Johnson wrote:
>> >> Paul Mackerras wrote:
>> >>> Currently there is a bug where if you use oprofile on a pSeries
>> >>> machine, then use perf_counters, then use oprofile again, oprofile
>> >>> will not work correctly; it will lose the PMU configuration the next
>> >>> time the hypervisor does a partition context switch, and thereafter
>> >>> won't count anything.
>> >Ben,
>> >Is there any way to get this bug fix into 2.6.31 or is the window closed?  Once the problem occurs, you can't get oprofile to work again without a reboot.  Really would be nice (for many reasons) to get this fixed in .31.
>> 
>> .31 is already released.  Could probably get it into a -stable .31 kernel
>> though.
>
>The fix is already upstream as well, no?

Sort of.  It's in your next branch, but not in Linus' tree.

Anyway, some people like to have particular versions work so -stable is good
regardless :).

josh

* Re: [PATCH] mpc5200: support for the MAN mpc5200 based board uc101
From: Heiko Schocher @ 2009-09-15  5:06 UTC
  To: Wolfgang Grandegger; +Cc: linuxppc-dev
In-Reply-To: <4AAE9686.4060709@grandegger.com>

Hello Wolfgang,

Wolfgang Grandegger wrote:
> Heiko Schocher wrote:
>> Hello Grant,
>>
>> Grant Likely wrote:
>>> Thanks for the patch.  Comments below.
>>>
>>> g.
>>>
>>> On Mon, Sep 14, 2009 at 2:05 AM, Heiko Schocher <hs@denx.de> wrote:
>>>> - serial Console on PSC1
>>>> - 64MB SDRAM
>>>> - MTD CFI Flash
>>>> - Ethernet FEC
>>>> - I2C with PCF8563 and Temp. Sensor ADM9240
>>>> - IDE support
>>>>
>>>> Signed-off-by: Heiko Schocher <hs@denx.de>
> ...snip....
> 
>>>> +               i2c@3d40 {
>>>> +                       #address-cells = <1>;
>>>> +                       #size-cells = <0>;
>>>> +                       compatible = "fsl,mpc5200-i2c","fsl-i2c";
>>>> +                       reg = <0x3d40 0x40>;
>>>> +                       interrupts = <2 16 0>;
>>>> +                       fsl5200-clocking;
>>> I believe fsl5200-clocking is no longer required.  There is a patch
>>> pending which removes this property from the other .dts files.
> 
> Right, it's obsolete.

OK, I'll remove it.

>> OK, I'll fix this.
> 
> As it is, the I2C controller will use a fixed low-speed fdt/dfsr
> setting. You have two other options:
> 
>   fsl,preserve-clocking;
>   clock-frequency = <400000>;
> 
> See also
> http://lxr.linux.no/#linux+v2.6.31/Documentation/powerpc/dts-bindings/fsl/i2c.txt.

Ah, OK, thanks for this info. I'll try this.

>>>> +
>>>> +                       hwmon@2c {
>>>> +                               compatible = "ad,adm9240";
>>>> +                               reg = <0x2c>;
>>>> +                       };
>>>> +                       rtc@51 {
>>>> +                               compatible = "rtc,pcf8563";
> 
> rtc is not a proper vendor name. Should be nxp, IIRC.

OK, I'll fix it.

Thanks for reviewing

bye
Heiko
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

* Re: MPC8323 USB Host
From: spa_kk @ 2009-09-15  5:45 UTC
  To: linuxppc-dev
In-Reply-To: <20090914135900.GA18534@oksana.dev.rtsoft.ru>


Hi,
Thanks for the suggestion.
The pin and clock configuration seems to be OK.
I need help understanding which MPC8323 register is mapped to
fhci->vroot_hub->port.wPortChange and
fhci->vroot_hub->port.wPortStatus,
as these are the variables that are checked while polling.

File: "drivers/usb/host/fhci.h"
struct virtual_root_hub {
    int dev_num;    /* USB address of the root hub */
    u32 feature;    /* indicates what feature has been set */
    struct usb_hub_status hub;
    struct usb_port_status port;
};

File: "drivers/usb/core/hub.h"
/*
 * Hub Status and Hub Change results
 * See USB 2.0 spec Table 11-19 and Table 11-20
 */
struct usb_port_status {
    __le16 wPortStatus;
    __le16 wPortChange;
} __attribute__ ((packed));

struct usb_hub_status {
    __le16 wHubStatus;
    __le16 wHubChange;
} __attribute__ ((packed));

I also need help understanding where virtual_root_hub.hub and
virtual_root_hub.port are initialized.
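Judging from the definitions quoted above, wPortStatus and wPortChange are fields of the driver's own struct virtual_root_hub (ordinary memory, hence "virtual"), so a poller only sees them change when driver code writes them. A minimal model of that, assuming a hypothetical event hook `vhub_port_connect_event()`; the bit values follow the USB 2.0 hub-class meaning of bit 0 (connect status / connect status change) and bit 1 (port enabled).

```c
#include <stdbool.h>
#include <stdint.h>

/* USB 2.0 hub class bits (local copies for this sketch). */
#define PORT_STAT_CONNECTION   0x0001	/* wPortStatus: device present */
#define PORT_STAT_ENABLE       0x0002	/* wPortStatus: port enabled */
#define PORT_STAT_C_CONNECTION 0x0001	/* wPortChange: connect status changed */

/* Simplified copy of struct usb_port_status. The kernel uses __le16 and
 * cpu_to_le16()/le16_to_cpu(); host byte order is used here. */
struct port_status_model {
	uint16_t wPortStatus;
	uint16_t wPortChange;
};

/* Hypothetical hook: what driver code has to do when it detects a
 * connect or disconnect (e.g. from an interrupt or a line-state check). */
static void vhub_port_connect_event(struct port_status_model *p, bool connected)
{
	if (connected)
		p->wPortStatus |= PORT_STAT_CONNECTION;
	else
		p->wPortStatus &= (uint16_t)~(PORT_STAT_CONNECTION | PORT_STAT_ENABLE);
	p->wPortChange |= PORT_STAT_C_CONNECTION;
}

/* What a polling hub_status_data()-style routine would test. */
static bool vhub_port_changed(const struct port_status_model *p)
{
	return p->wPortChange != 0;
}
```

If nothing in the driver calls the equivalent of the event hook, the polled values stay constant regardless of what happens on the USB pins.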

Thanks and best regards,
Krishna


Anton Vorontsov-2 wrote:
> 
> Hello,
> 
> On Mon, Sep 14, 2009 at 05:49:01AM -0700, spa_kk wrote:
>> 
>> Hello,
> [...]
>> The values in this variable do not change even after connecting and
>> disconnecting the device.
> 
> Could be pins multiplexing or USB clocks misconfiguration.
> 
>> We have manually checked the USBRXP pin; this pin is
>> HIGH => when device is connected
>> LOW => when device is not connected.
>>  
>> Kindly let us know how we should debug/proceed.
> 
> You might find this useful:
> 
> http://lkml.org/lkml/2009/4/1/481
> 
> -- 
> Anton Vorontsov
> email: cbouatmailru@gmail.com
> irc://irc.freenode.net/bd2

* [PATCH] powerpc: Check for unsupported relocs when using CONFIG_RELOCATABLE
From: Benjamin Herrenschmidt @ 2009-09-15  5:57 UTC
  To: linuxppc-dev list; +Cc: Linux Kernel list

From: Tony Breeds <tony@bakeyournoodle.com>

When using CONFIG_RELOCATABLE, we build the kernel as a position
independent executable. The kernel then uses a little bit of relocation
code to relocate itself. That code only deals with R_PPC64_RELATIVE
relocations though. If for some reason you use assembly constructs
such as LOAD_REG_IMMEDIATE() to load the address of a symbol, you'll
generate different kinds of relocations that won't be processed properly
and bad things will happen. (We have 2 such bugs today).
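A simplified model of self-relocation that only understands RELATIVE relocations, assuming the image is linked at address 0 so each fix-up is `load_addr + r_addend`. The struct and function names are hypothetical; the type numbers are the 64-bit PowerPC ELF ABI values. Any other relocation type is skipped and counted, which is exactly the silent failure the patch wants to catch at build time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as an ELF64 RELA entry (Elf64_Rela). */
struct rela64 {
	uint64_t r_offset;	/* where to patch, relative to the image start */
	uint64_t r_info;	/* symbol index (high 32 bits) | type (low 32 bits) */
	int64_t  r_addend;
};

#define RELA_TYPE(info)		((uint32_t)(info))
#define R_PPC64_NONE		0
#define R_PPC64_RELATIVE	22

/* Apply RELATIVE relocations only: *(image + r_offset) = load_addr + r_addend.
 * Returns the number of entries of any other type (left unprocessed). */
static size_t apply_relative_relocs(unsigned char *image, uint64_t load_addr,
				    const struct rela64 *rela, size_t n)
{
	size_t unsupported = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t type = RELA_TYPE(rela[i].r_info);

		if (type == R_PPC64_RELATIVE) {
			uint64_t val = load_addr + (uint64_t)rela[i].r_addend;
			memcpy(image + rela[i].r_offset, &val, sizeof(val));
		} else if (type != R_PPC64_NONE) {
			/* e.g. a symbol-based relocation from LOAD_REG_IMMEDIATE() */
			unsupported++;
		}
	}
	return unsupported;
}
```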

The Perl script filters out the relocation types known to be safe and
reports everything else. It's possible that it will flag some harmless
relocations, for example for a weak function that nobody implements;
we'll see if we get false positives and fix them then.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 952a396..0101e0c 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -166,6 +166,17 @@ PHONY += $(BOOT_TARGETS)
 
 boot := arch/$(ARCH)/boot
 
+ifeq ($(CONFIG_RELOCATABLE),y)
+quiet_cmd_relocs_check = CALL    $<
+      cmd_relocs_check = perl $< "$(OBJDUMP)" "$(obj)/vmlinux"
+
+PHONY += relocs_check
+relocs_check: arch/powerpc/relocs_check.pl vmlinux
+	$(call cmd,relocs_check)
+
+zImage: relocs_check
+endif
+
 $(BOOT_TARGETS): vmlinux
 	$(Q)$(MAKE) ARCH=ppc64 $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
 
diff --git a/arch/powerpc/relocs_check.pl b/arch/powerpc/relocs_check.pl
new file mode 100755
index 0000000..215e966
--- /dev/null
+++ b/arch/powerpc/relocs_check.pl
@@ -0,0 +1,57 @@
+#!/usr/bin/perl
+
+# Copyright © 2009 IBM Corporation
+
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version
+# 2 of the License, or (at your option) any later version.
+
+# This script checks the relocations of a vmlinux for "suspicious"
+# relocations.
+
+use strict;
+use warnings;
+
+if ($#ARGV != 1) {
+	print "$#ARGV\n";
+	die "$0 [path to objdump] [path to vmlinux]\n";
+}
+
+# Have Kbuild supply the path to objdump so we handle cross compilation.
+my $objdump = shift;
+my $vmlinux = shift;
+my $bad_relocs_count = 0;
+my $bad_relocs = "";
+my $old_binutils = 0;
+
+open(FD, "$objdump -R $vmlinux|") or die;
+while (<FD>) {
+	study $_;
+
+	# Only look at relocation lines.
+	next if (!/\s+R_/);
+
+	# These relocations are okay
+	next if (/R_PPC64_RELATIVE/ or /R_PPC64_NONE/ or
+	         /R_PPC64_ADDR64\s+mach_/);
+
+	# If we see this type of relocation it's an indication that
+	# we /may/ be using an old version of binutils.
+	if (/R_PPC64_UADDR64/) {
+		$old_binutils++;
+	}
+
+	$bad_relocs_count++;
+	$bad_relocs .= $_;
+}
+
+if ($bad_relocs_count) {
+	print "WARNING: $bad_relocs_count bad relocations\n";
+	print $bad_relocs;
+}
+
+if ($old_binutils) {
+	print "WARNING: You need binutils >= 2.19 to build a ".
+	      "CONFIG_RELOCATABLE kernel\n";
+}
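For readers less at home in Perl, the per-line rules of relocs_check.pl can be sketched in C. This is an illustrative port with hypothetical names, not part of the patch: a line counts only if `R_` follows whitespace (`/\s+R_/`); RELATIVE, NONE and `ADDR64` against a `mach_` symbol are accepted; everything else is bad, and `UADDR64` additionally hints at old binutils.

```c
#include <ctype.h>
#include <string.h>

enum reloc_verdict { NOT_A_RELOC, RELOC_OK, RELOC_BAD, RELOC_BAD_OLD_BINUTILS };

/* Perl: /\s+R_/ (an "R_" preceded by at least one whitespace char). */
static int has_reloc_field(const char *line)
{
	for (const char *p = strstr(line, "R_"); p; p = strstr(p + 1, "R_"))
		if (p > line && isspace((unsigned char)p[-1]))
			return 1;
	return 0;
}

/* Perl: /R_PPC64_ADDR64\s+mach_/ */
static int is_addr64_mach(const char *line)
{
	static const char tag[] = "R_PPC64_ADDR64";

	for (const char *p = strstr(line, tag); p; p = strstr(p + 1, tag)) {
		const char *q = p + sizeof(tag) - 1;

		if (!isspace((unsigned char)*q))
			continue;
		while (isspace((unsigned char)*q))
			q++;
		if (strncmp(q, "mach_", 5) == 0)
			return 1;
	}
	return 0;
}

static enum reloc_verdict classify_reloc_line(const char *line)
{
	if (!has_reloc_field(line))
		return NOT_A_RELOC;
	if (strstr(line, "R_PPC64_RELATIVE") || strstr(line, "R_PPC64_NONE") ||
	    is_addr64_mach(line))
		return RELOC_OK;
	/* Still a bad relocation, but also a sign of binutils < 2.19. */
	if (strstr(line, "R_PPC64_UADDR64"))
		return RELOC_BAD_OLD_BINUTILS;
	return RELOC_BAD;
}
```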

* Re: [PATCH] powerpc: Check for unsupported relocs when using CONFIG_RELOCATABLE
From: Tony Breeds @ 2009-09-15  6:24 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Linux Kernel list
In-Reply-To: <1252994222.8375.201.camel@pasglop>

On Tue, Sep 15, 2009 at 03:57:02PM +1000, Benjamin Herrenschmidt wrote:
 
> diff --git a/arch/powerpc/relocs_check.pl b/arch/powerpc/relocs_check.pl
> new file mode 100755
> index 0000000..215e966
> --- /dev/null
> +++ b/arch/powerpc/relocs_check.pl
> @@ -0,0 +1,57 @@
> +#!/usr/bin/perl
> +
> +# Copyright © 2009 IBM Corporation
> +
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License
> +# as published by the Free Software Foundation; either version
> +# 2 of the License, or (at your option) any later version.
> +
> +# This script checks the relocations of a vmlinux for "suspicious"
> +# relocations.
> +
> +use strict;
> +use warnings;
> +
> +if ($#ARGV != 1) {
> +	print "$#ARGV\n";

Ooops that line should have been taken out.  Sorry.

Yours Tony

* [0/5] Assorted hugepage cleanups (v2)
From: David Gibson @ 2009-09-15  6:41 UTC
  To: linuxppc-dev, Benjamin Herrenschmidt

Currently, ordinary pages use one pagetable layout, and each different
hugepage size uses a slightly different variant layout.  A number of
places which need to walk the pagetable must first check the slice map
to see what the pagetable layout is, then handle the various different
forms.  New hardware, like Book3E, is liable to introduce more possible
variants.

This patch series, therefore, is designed to simplify the matter by
limiting knowledge of the pagetable layout to only the allocation
path.  With this series, ordinary pages are handled as ever, with a
fixed 4 (or 3) level tree.  All other variants branch off from some
layer of that with a specially marked PGD/PUD/PMD pointer which also
contains enough information to interpret the directories below that
point.  This means that things walking the pagetables (without
allocating) don't need to look up the slice map, they can just step
down the tree in the usual way, branching off to the "non-standard
layout" path for hugepages, which uses the embedded information to
interpret the tree from that point on.
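A minimal sketch of the "specially marked pointer" idea, under a purely hypothetical encoding (bit 0 marks a huge page directory, bits 1..5 hold the index size n of the table below, so tables must be 64-byte aligned). This is not the series' actual bit layout; it only shows how a walker can index the table without consulting a slice map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding for this sketch only. */
#define HPD_FLAG	0x1UL	/* entry points to a hugepage table */
#define HPD_SHIFT_MASK	0x3eUL	/* bits 1..5: index size n (2^n entries) */

static uintptr_t make_hugepd(void *table, unsigned index_size)
{
	return (uintptr_t)table | ((uintptr_t)index_size << 1) | HPD_FLAG;
}

static bool is_hugepd(uintptr_t entry)
{
	return entry & HPD_FLAG;
}

static unsigned hugepd_index_size(uintptr_t entry)
{
	return (unsigned)((entry & HPD_SHIFT_MASK) >> 1);
}

static uint64_t *hugepd_table(uintptr_t entry)
{
	return (uint64_t *)(entry & ~(HPD_FLAG | HPD_SHIFT_MASK));
}

/* The entry itself says how to index the table below it. */
static uint64_t *walk_entry(uintptr_t entry, uint64_t addr, unsigned page_shift)
{
	if (!is_hugepd(entry))
		return NULL;	/* normal (non-huge) path not modelled here */

	unsigned n = hugepd_index_size(entry);
	size_t idx = (size_t)((addr >> page_shift) & ((1ULL << n) - 1));

	return &hugepd_table(entry)[idx];
}
```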

This reduces the source size in a number of places, and means that
newer variants on the pagetable layout to handle new hardware and new
features will need to alter the existing code in fewer places.

In addition we split out the hash / classic MMU specific code into a
separate hugetlbpage-hash64.c file.  This will make adding support for
other MMUs (like 440 and/or Book3E) easier.

I've used the libhugetlbfs testsuite to test these patches on a
Power5+ machine, but they could certainly do with more testing. In
particular, I don't have any suitable hardware to test 16G pages.

V2: Made the tweaks that BenH suggested to patch 2 of the original
series.  Some corresponding tweaks in patch 3 to match.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* [2/5] Cleanup management of kmem_caches for pagetables
From: David Gibson @ 2009-09-15  6:43 UTC
  To: linuxppc-dev, Benjamin Herrenschmidt
In-Reply-To: <20090915064133.GA11621@yookeroo.seuss>

Currently we have a fair bit of rather fiddly code to manage the
various kmem_caches used to store page tables of various levels.  We
generally have two caches holding some combination of PGD, PUD and PMD
tables, plus several more for the special hugepage pagetables.

This patch cleans this all up by taking a different approach.  Rather
than the caches being designated as for PUDs or for hugeptes for 16M
pages, the caches are simply allocated to be a specific size.  Thus
sharing of caches between different types/levels of pagetables happens
naturally.  The pagetable size, where needed, is passed around encoded
in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is, as n, where the
pagetable contains 2^n pointers.
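A user-space model of caches keyed only by table size: a table of 2^n pointers is `sizeof(void *) << n` bytes, caches live in slot n-1 (as `PGT_CACHE()` does), and asking twice for the same n shares the existing cache. `struct fake_cache` and the helper names are hypothetical stand-ins for `kmem_cache` and the patch's functions.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_INDEX_SIZE 0xf	/* mirrors MAX_PGTABLE_INDEX_SIZE in the patch */

/* Hypothetical stand-in for a kmem_cache: just records its geometry. */
struct fake_cache {
	char name[32];
	size_t object_size;
};

static struct fake_cache caches[MAX_INDEX_SIZE];
static int cache_present[MAX_INDEX_SIZE];

/* Like PGT_CACHE(shift): slot shift-1, since shift 0 is never cached. */
static struct fake_cache *cache_for(unsigned shift)
{
	return cache_present[shift - 1] ? &caches[shift - 1] : NULL;
}

/* Returns 1 if a new cache was created, 0 if an existing one is shared,
 * -1 if shift is out of range. */
static int cache_add(unsigned shift)
{
	if (shift < 1 || shift > MAX_INDEX_SIZE)
		return -1;
	if (cache_for(shift))
		return 0;
	caches[shift - 1].object_size = sizeof(void *) << shift;
	snprintf(caches[shift - 1].name, sizeof(caches[shift - 1].name),
		 "pgtable-2^%u", shift);
	cache_present[shift - 1] = 1;
	return 1;
}
```

Any two pagetable levels whose index sizes happen to be equal end up calling `cache_add()` with the same n and therefore share one cache, with no per-level bookkeeping.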

Signed-off-by: David Gibson <dwg@au1.ibm.com>

---
 arch/powerpc/include/asm/pgalloc-64.h    |   51 +++++++++++++++----------------
 arch/powerpc/include/asm/pgalloc.h       |   30 ++----------------
 arch/powerpc/include/asm/pgtable-ppc64.h |    1 
 arch/powerpc/mm/hugetlbpage.c            |   45 +++++++--------------------
 arch/powerpc/mm/init_64.c                |   51 ++++++++++++++++++-------------
 arch/powerpc/mm/pgtable.c                |   25 +++++++++------
 6 files changed, 89 insertions(+), 114 deletions(-)

Index: working-2.6/arch/powerpc/mm/init_64.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/init_64.c	2009-08-14 16:07:54.000000000 +1000
+++ working-2.6/arch/powerpc/mm/init_64.c	2009-09-15 16:03:27.000000000 +1000
@@ -148,30 +148,39 @@ static void pmd_ctor(void *addr)
 	memset(addr, 0, PMD_TABLE_SIZE);
 }
 
-static const unsigned int pgtable_cache_size[2] = {
-	PGD_TABLE_SIZE, PMD_TABLE_SIZE
-};
-static const char *pgtable_cache_name[ARRAY_SIZE(pgtable_cache_size)] = {
-#ifdef CONFIG_PPC_64K_PAGES
-	"pgd_cache", "pmd_cache",
-#else
-	"pgd_cache", "pud_pmd_cache",
-#endif /* CONFIG_PPC_64K_PAGES */
-};
-
-#ifdef CONFIG_HUGETLB_PAGE
-/* Hugepages need an extra cache per hugepagesize, initialized in
- * hugetlbpage.c.  We can't put into the tables above, because HPAGE_SHIFT
- * is not compile time constant. */
-struct kmem_cache *pgtable_cache[ARRAY_SIZE(pgtable_cache_size)+MMU_PAGE_COUNT];
-#else
-struct kmem_cache *pgtable_cache[ARRAY_SIZE(pgtable_cache_size)];
-#endif
+struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
+
+void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
+{
+	char *name;
+	unsigned long table_size = sizeof(void *) << shift;
+	unsigned long align = table_size;
+	/* When batching pgtable pointers for RCU freeing, we store
+	 * the index size in the low bits.  Table alignment must be
+	 * big enough to fit it */
+	unsigned long minalign = MAX_PGTABLE_INDEX_SIZE + 1;
+	struct kmem_cache *new;
+
+	BUILD_BUG_ON(!is_power_of_2(minalign));
+	BUG_ON((shift < 1) || (shift > MAX_PGTABLE_INDEX_SIZE));
+
+	if (PGT_CACHE(shift))
+		return; /* Already have a cache of this size */
+	align = max_t(unsigned long, align, minalign);
+	name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
+	new = kmem_cache_create(name, table_size, align, 0, ctor);
+	PGT_CACHE(shift) = new;
+	pr_debug("Allocated pgtable cache for order %d\n", shift);
+}
+
 
 void pgtable_cache_init(void)
 {
-	pgtable_cache[0] = kmem_cache_create(pgtable_cache_name[0], PGD_TABLE_SIZE, PGD_TABLE_SIZE, SLAB_PANIC, pgd_ctor);
-	pgtable_cache[1] = kmem_cache_create(pgtable_cache_name[1], PMD_TABLE_SIZE, PMD_TABLE_SIZE, SLAB_PANIC, pmd_ctor);
+	pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
+	pgtable_cache_add(PMD_INDEX_SIZE, pmd_ctor);
+	if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_INDEX_SIZE))
+		panic("Couldn't allocate pgtable caches");
+	BUG_ON(PUD_INDEX_SIZE && !PGT_CACHE(PUD_INDEX_SIZE));
 }
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
Index: working-2.6/arch/powerpc/include/asm/pgalloc-64.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/pgalloc-64.h	2009-08-03 16:00:45.000000000 +1000
+++ working-2.6/arch/powerpc/include/asm/pgalloc-64.h	2009-09-15 15:45:49.000000000 +1000
@@ -11,27 +11,30 @@
 #include <linux/cpumask.h>
 #include <linux/percpu.h>
 
+/*
+ * This needs to be big enough to allow any pagetable sizes we need,
+ * but small enough to fit in the low bits of any page table pointer.
+ * In other words all pagetables, even tiny ones, must be aligned to
+ * allow at least enough low 0 bits to contain this value.
+ */
+#define MAX_PGTABLE_INDEX_SIZE	0xf
+
 #ifndef CONFIG_PPC_SUBPAGE_PROT
 static inline void subpage_prot_free(pgd_t *pgd) {}
 #endif
 
 extern struct kmem_cache *pgtable_cache[];
-
-#define PGD_CACHE_NUM		0
-#define PUD_CACHE_NUM		1
-#define PMD_CACHE_NUM		1
-#define HUGEPTE_CACHE_NUM	2
-#define PTE_NONCACHE_NUM	7  /* from GFP rather than kmem_cache */
+#define PGT_CACHE(shift) (pgtable_cache[(shift)-1])
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return kmem_cache_alloc(pgtable_cache[PGD_CACHE_NUM], GFP_KERNEL);
+	return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE), GFP_KERNEL);
 }
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
 	subpage_prot_free(pgd);
-	kmem_cache_free(pgtable_cache[PGD_CACHE_NUM], pgd);
+	kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
 }
 
 #ifndef CONFIG_PPC_64K_PAGES
@@ -40,13 +43,13 @@ static inline void pgd_free(struct mm_st
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return kmem_cache_alloc(pgtable_cache[PUD_CACHE_NUM],
+	return kmem_cache_alloc(PGT_CACHE(PUD_INDEX_SIZE),
 				GFP_KERNEL|__GFP_REPEAT);
 }
 
 static inline void pud_free(struct mm_struct *mm, pud_t *pud)
 {
-	kmem_cache_free(pgtable_cache[PUD_CACHE_NUM], pud);
+	kmem_cache_free(PGT_CACHE(PUD_INDEX_SIZE), pud);
 }
 
 static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
@@ -78,13 +81,13 @@ static inline void pmd_populate_kernel(s
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return kmem_cache_alloc(pgtable_cache[PMD_CACHE_NUM],
+	return kmem_cache_alloc(PGT_CACHE(PMD_INDEX_SIZE),
 				GFP_KERNEL|__GFP_REPEAT);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
-	kmem_cache_free(pgtable_cache[PMD_CACHE_NUM], pmd);
+	kmem_cache_free(PGT_CACHE(PMD_INDEX_SIZE), pmd);
 }
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
@@ -107,24 +110,22 @@ static inline pgtable_t pte_alloc_one(st
 	return page;
 }
 
-static inline void pgtable_free(pgtable_free_t pgf)
+static inline void pgtable_free(void *table, unsigned index_size)
 {
-	void *p = (void *)(pgf.val & ~PGF_CACHENUM_MASK);
-	int cachenum = pgf.val & PGF_CACHENUM_MASK;
-
-	if (cachenum == PTE_NONCACHE_NUM)
-		free_page((unsigned long)p);
-	else
-		kmem_cache_free(pgtable_cache[cachenum], p);
+	if (!index_size)
+		free_page((unsigned long)table);
+	else {
+		BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
+		kmem_cache_free(PGT_CACHE(index_size), table);
+	}
 }
 
-#define __pmd_free_tlb(tlb, pmd,addr)		      \
-	pgtable_free_tlb(tlb, pgtable_free_cache(pmd, \
-		PMD_CACHE_NUM, PMD_TABLE_SIZE-1))
+#define __pmd_free_tlb(tlb, pmd, addr)		      \
+	pgtable_free_tlb(tlb, pmd, PMD_INDEX_SIZE)
 #ifndef CONFIG_PPC_64K_PAGES
 #define __pud_free_tlb(tlb, pud, addr)		      \
-	pgtable_free_tlb(tlb, pgtable_free_cache(pud, \
-		PUD_CACHE_NUM, PUD_TABLE_SIZE-1))
+	pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE)
+
 #endif /* CONFIG_PPC_64K_PAGES */
 
 #define check_pgt_cache()	do { } while (0)
Index: working-2.6/arch/powerpc/include/asm/pgalloc.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/pgalloc.h	2009-08-14 16:07:54.000000000 +1000
+++ working-2.6/arch/powerpc/include/asm/pgalloc.h	2009-09-15 15:44:24.000000000 +1000
@@ -24,25 +24,6 @@ static inline void pte_free(struct mm_st
 	__free_page(ptepage);
 }
 
-typedef struct pgtable_free {
-	unsigned long val;
-} pgtable_free_t;
-
-/* This needs to be big enough to allow for MMU_PAGE_COUNT + 2 to be stored
- * and small enough to fit in the low bits of any naturally aligned page
- * table cache entry. Arbitrarily set to 0x1f, that should give us some
- * room to grow
- */
-#define PGF_CACHENUM_MASK	0x1f
-
-static inline pgtable_free_t pgtable_free_cache(void *p, int cachenum,
-						unsigned long mask)
-{
-	BUG_ON(cachenum > PGF_CACHENUM_MASK);
-
-	return (pgtable_free_t){.val = ((unsigned long) p & ~mask) | cachenum};
-}
-
 #ifdef CONFIG_PPC64
 #include <asm/pgalloc-64.h>
 #else
@@ -50,12 +31,12 @@ static inline pgtable_free_t pgtable_fre
 #endif
 
 #ifdef CONFIG_SMP
-extern void pgtable_free_tlb(struct mmu_gather *tlb, pgtable_free_t pgf);
+extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift);
 extern void pte_free_finish(void);
 #else /* CONFIG_SMP */
-static inline void pgtable_free_tlb(struct mmu_gather *tlb, pgtable_free_t pgf)
+static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
 {
-	pgtable_free(pgf);
+	pgtable_free(table, shift);
 }
 static inline void pte_free_finish(void) { }
 #endif /* !CONFIG_SMP */
@@ -63,12 +44,9 @@ static inline void pte_free_finish(void)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *ptepage,
 				  unsigned long address)
 {
-	pgtable_free_t pgf = pgtable_free_cache(page_address(ptepage),
-						PTE_NONCACHE_NUM,
-						PTE_TABLE_SIZE-1);
 	tlb_flush_pgtable(tlb, address);
 	pgtable_page_dtor(ptepage);
-	pgtable_free_tlb(tlb, pgf);
+	pgtable_free_tlb(tlb, page_address(ptepage), 0);
 }
 
 #endif /* __KERNEL__ */
Index: working-2.6/arch/powerpc/mm/pgtable.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/pgtable.c	2009-08-28 13:46:31.000000000 +1000
+++ working-2.6/arch/powerpc/mm/pgtable.c	2009-09-15 15:52:32.000000000 +1000
@@ -47,12 +47,12 @@ struct pte_freelist_batch
 {
 	struct rcu_head	rcu;
 	unsigned int	index;
-	pgtable_free_t	tables[0];
+	unsigned long	tables[0];
 };
 
 #define PTE_FREELIST_SIZE \
 	((PAGE_SIZE - sizeof(struct pte_freelist_batch)) \
-	  / sizeof(pgtable_free_t))
+	  / sizeof(unsigned long))
 
 static void pte_free_smp_sync(void *arg)
 {
@@ -62,13 +62,13 @@ static void pte_free_smp_sync(void *arg)
 /* This is only called when we are critically out of memory
  * (and fail to get a page in pte_free_tlb).
  */
-static void pgtable_free_now(pgtable_free_t pgf)
+static void pgtable_free_now(void *table, unsigned shift)
 {
 	pte_freelist_forced_free++;
 
 	smp_call_function(pte_free_smp_sync, NULL, 1);
 
-	pgtable_free(pgf);
+	pgtable_free(table, shift);
 }
 
 static void pte_free_rcu_callback(struct rcu_head *head)
@@ -77,8 +77,12 @@ static void pte_free_rcu_callback(struct
 		container_of(head, struct pte_freelist_batch, rcu);
 	unsigned int i;
 
-	for (i = 0; i < batch->index; i++)
-		pgtable_free(batch->tables[i]);
+	for (i = 0; i < batch->index; i++) {
+		void *table = (void *)(batch->tables[i] & ~MAX_PGTABLE_INDEX_SIZE);
+		unsigned shift = batch->tables[i] & MAX_PGTABLE_INDEX_SIZE;
+
+		pgtable_free(table, shift);
+	}
 
 	free_page((unsigned long)batch);
 }
@@ -89,25 +93,28 @@ static void pte_free_submit(struct pte_f
 	call_rcu(&batch->rcu, pte_free_rcu_callback);
 }
 
-void pgtable_free_tlb(struct mmu_gather *tlb, pgtable_free_t pgf)
+void pgtable_free_tlb(struct mmu_gather *tlb, void *table, unsigned shift)
 {
 	/* This is safe since tlb_gather_mmu has disabled preemption */
 	struct pte_freelist_batch **batchp = &__get_cpu_var(pte_freelist_cur);
+	unsigned long pgf;
 
 	if (atomic_read(&tlb->mm->mm_users) < 2 ||
 	    cpumask_equal(mm_cpumask(tlb->mm), cpumask_of(smp_processor_id()))){
-		pgtable_free(pgf);
+		pgtable_free(table, shift);
 		return;
 	}
 
 	if (*batchp == NULL) {
 		*batchp = (struct pte_freelist_batch *)__get_free_page(GFP_ATOMIC);
 		if (*batchp == NULL) {
-			pgtable_free_now(pgf);
+			pgtable_free_now(table, shift);
 			return;
 		}
 		(*batchp)->index = 0;
 	}
+	BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+	pgf = (unsigned long)table | shift;
 	(*batchp)->tables[(*batchp)->index++] = pgf;
 	if ((*batchp)->index == PTE_FREELIST_SIZE) {
 		pte_free_submit(*batchp);
Index: working-2.6/arch/powerpc/mm/hugetlbpage.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c	2009-09-15 15:44:24.000000000 +1000
+++ working-2.6/arch/powerpc/mm/hugetlbpage.c	2009-09-15 16:03:08.000000000 +1000
@@ -43,26 +43,14 @@ static unsigned nr_gpages;
 unsigned int mmu_huge_psizes[MMU_PAGE_COUNT] = { }; /* initialize all to 0 */
 
 #define hugepte_shift			mmu_huge_psizes
-#define PTRS_PER_HUGEPTE(psize)		(1 << hugepte_shift[psize])
-#define HUGEPTE_TABLE_SIZE(psize)	(sizeof(pte_t) << hugepte_shift[psize])
+#define HUGEPTE_INDEX_SIZE(psize)	(mmu_huge_psizes[(psize)])
+#define PTRS_PER_HUGEPTE(psize)		(1 << mmu_huge_psizes[psize])
 
 #define HUGEPD_SHIFT(psize)		(mmu_psize_to_shift(psize) \
-						+ hugepte_shift[psize])
+					 + HUGEPTE_INDEX_SIZE(psize))
 #define HUGEPD_SIZE(psize)		(1UL << HUGEPD_SHIFT(psize))
 #define HUGEPD_MASK(psize)		(~(HUGEPD_SIZE(psize)-1))
 
-/* Subtract one from array size because we don't need a cache for 4K since
- * is not a huge page size */
-#define HUGE_PGTABLE_INDEX(psize)	(HUGEPTE_CACHE_NUM + psize - 1)
-#define HUGEPTE_CACHE_NAME(psize)	(huge_pgtable_cache_name[psize])
-
-static const char *huge_pgtable_cache_name[MMU_PAGE_COUNT] = {
-	[MMU_PAGE_64K]	= "hugepte_cache_64K",
-	[MMU_PAGE_1M]	= "hugepte_cache_1M",
-	[MMU_PAGE_16M]	= "hugepte_cache_16M",
-	[MMU_PAGE_16G]	= "hugepte_cache_16G",
-};
-
 /* Flag to mark huge PD pointers.  This means pmd_bad() and pud_bad()
  * will choke on pointers to hugepte tables, which is handy for
  * catching screwups early. */
@@ -114,15 +102,15 @@ static inline pte_t *hugepte_offset(huge
 static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 			   unsigned long address, unsigned int psize)
 {
-	pte_t *new = kmem_cache_zalloc(pgtable_cache[HUGE_PGTABLE_INDEX(psize)],
-				      GFP_KERNEL|__GFP_REPEAT);
+	pte_t *new = kmem_cache_zalloc(PGT_CACHE(hugepte_shift[psize]),
+				       GFP_KERNEL|__GFP_REPEAT);
 
 	if (! new)
 		return -ENOMEM;
 
 	spin_lock(&mm->page_table_lock);
 	if (!hugepd_none(*hpdp))
-		kmem_cache_free(pgtable_cache[HUGE_PGTABLE_INDEX(psize)], new);
+		kmem_cache_free(PGT_CACHE(hugepte_shift[psize]), new);
 	else
 		hpdp->pd = (unsigned long)new | HUGEPD_OK;
 	spin_unlock(&mm->page_table_lock);
@@ -271,9 +259,7 @@ static void free_hugepte_range(struct mm
 
 	hpdp->pd = 0;
 	tlb->need_flush = 1;
-	pgtable_free_tlb(tlb, pgtable_free_cache(hugepte,
-						 HUGEPTE_CACHE_NUM+psize-1,
-						 PGF_CACHENUM_MASK));
+	pgtable_free_tlb(tlb, hugepte, hugepte_shift[psize]);
 }
 
 static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
@@ -698,8 +684,6 @@ static void __init set_huge_psize(int ps
 		if (mmu_huge_psizes[psize] ||
 		   mmu_psize_defs[psize].shift == PAGE_SHIFT)
 			return;
-		if (WARN_ON(HUGEPTE_CACHE_NAME(psize) == NULL))
-			return;
 		hugetlb_add_hstate(mmu_psize_defs[psize].shift - PAGE_SHIFT);
 
 		switch (mmu_psize_defs[psize].shift) {
@@ -769,16 +753,11 @@ static int __init hugetlbpage_init(void)
 
 	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
 		if (mmu_huge_psizes[psize]) {
-			pgtable_cache[HUGE_PGTABLE_INDEX(psize)] =
-				kmem_cache_create(
-					HUGEPTE_CACHE_NAME(psize),
-					HUGEPTE_TABLE_SIZE(psize),
-					HUGEPTE_TABLE_SIZE(psize),
-					0,
-					NULL);
-			if (!pgtable_cache[HUGE_PGTABLE_INDEX(psize)])
-				panic("hugetlbpage_init(): could not create %s"\
-				      "\n", HUGEPTE_CACHE_NAME(psize));
+			pgtable_cache_add(hugepte_shift[psize], NULL);
+			if (!PGT_CACHE(hugepte_shift[psize]))
+				panic("hugetlbpage_init(): could not create "
+				      "pgtable cache for %d bit pagesize\n",
+				      mmu_psize_to_shift(psize));
 		}
 	}
 
Index: working-2.6/arch/powerpc/include/asm/pgtable-ppc64.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/pgtable-ppc64.h	2009-08-28 13:46:31.000000000 +1000
+++ working-2.6/arch/powerpc/include/asm/pgtable-ppc64.h	2009-09-15 16:03:07.000000000 +1000
@@ -354,6 +354,7 @@ static inline void __ptep_set_access_fla
 #define pgoff_to_pte(off)	((pte_t) {((off) << PTE_RPN_SHIFT)|_PAGE_FILE})
 #define PTE_FILE_MAX_BITS	(BITS_PER_LONG - PTE_RPN_SHIFT)
 
+void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
 
 /*

* [1/5] Make hpte_need_flush() correctly mask for multiple page sizes
From: David Gibson @ 2009-09-15  6:43 UTC
  To: linuxppc-dev, Benjamin Herrenschmidt
In-Reply-To: <20090915064133.GA11621@yookeroo.seuss>

Currently, hpte_need_flush() only correctly flushes the given address
for normal pages.  Callers for hugepages are required to mask the
address themselves.

But hpte_need_flush() already looks up the page sizes for its own
reasons, so this is a rather silly imposition on the callers.  This
patch alters it to mask based on the pagesize it has looked up itself,
and removes the awkward masking code in the hugepage caller.
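The masking that moves into hpte_need_flush() is a power-of-two round-down driven by the looked-up page size. Below is a minimal standalone sketch of that expression, not the kernel code itself: the function name is hypothetical, and uint64_t stands in for the kernel's unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to the start of the page that
 * contains it, where the page size is given as a shift (log2 of size).
 * Mirrors the line the patch adds to hpte_need_flush():
 *     addr &= ~((1UL << mmu_psize_defs[psize].shift) - 1);
 */
static uint64_t mask_to_page(uint64_t addr, unsigned int shift)
{
	assert(shift < 64);	/* shifting a 64-bit value by 64 is undefined */
	return addr & ~((UINT64_C(1) << shift) - 1);
}
```

With a shift of 12 (4K) the address 0x10012345 becomes 0x10012000, while with a shift of 24 (16M) it becomes 0x10000000. The old unconditional `addr &= PAGE_MASK` only ever produced the former, which is why hugepage callers had to mask for themselves.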

Signed-off-by: David Gibson <dwg@au1.ibm.com>

---
 arch/powerpc/mm/hugetlbpage.c |    6 +-----
 arch/powerpc/mm/tlb_hash64.c  |    8 +++-----
 2 files changed, 4 insertions(+), 10 deletions(-)

Index: working-2.6/arch/powerpc/mm/tlb_hash64.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/tlb_hash64.c	2009-09-04 14:35:30.000000000 +1000
+++ working-2.6/arch/powerpc/mm/tlb_hash64.c	2009-09-04 14:36:12.000000000 +1000
@@ -53,11 +53,6 @@ void hpte_need_flush(struct mm_struct *m
 
 	i = batch->index;
 
-	/* We mask the address for the base page size. Huge pages will
-	 * have applied their own masking already
-	 */
-	addr &= PAGE_MASK;
-
 	/* Get page size (maybe move back to caller).
 	 *
 	 * NOTE: when using special 64K mappings in 4K environment like
@@ -75,6 +70,9 @@ void hpte_need_flush(struct mm_struct *m
 	} else
 		psize = pte_pagesize_index(mm, addr, pte);
 
+	/* Mask the address for the correct page size */
+	addr &= ~((1UL << mmu_psize_defs[psize].shift) - 1);
+
 	/* Build full vaddr */
 	if (!is_kernel_addr(addr)) {
 		ssize = user_segment_size(addr);
Index: working-2.6/arch/powerpc/mm/hugetlbpage.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c	2009-09-04 14:35:30.000000000 +1000
+++ working-2.6/arch/powerpc/mm/hugetlbpage.c	2009-09-04 14:36:12.000000000 +1000
@@ -445,11 +445,7 @@ void set_huge_pte_at(struct mm_struct *m
 		 * necessary anymore if we make hpte_need_flush() get the
 		 * page size from the slices
 		 */
-		unsigned int psize = get_slice_psize(mm, addr);
-		unsigned int shift = mmu_psize_to_shift(psize);
-		unsigned long sz = ((1UL) << shift);
-		struct hstate *hstate = size_to_hstate(sz);
-		pte_update(mm, addr & hstate->mask, ptep, ~0UL, 1);
+		pte_update(mm, addr, ptep, ~0UL, 1);
 	}
 	*ptep = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
 }

* [4/5] Cleanup initialization of hugepages on powerpc
From: David Gibson @ 2009-09-15  6:43 UTC
  To: linuxppc-dev, Benjamin Herrenschmidt
In-Reply-To: <20090915064133.GA11621@yookeroo.seuss>

This patch simplifies the logic used to initialize hugepages on
powerpc.  The somewhat oddly named set_huge_psize() is renamed to
add_huge_page_size(), and now performs all the checks needed to decide
whether it has been given a valid hugepage size (rather than just some
of them) and instantiates the generic hstate structure (but nothing more).
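The size checks in add_huge_page_size() can be sketched in isolation as follows. This is a hedged illustration only: the constants are assumed values (a 64K base page and 1T high slices), not read from the kernel headers, and the sketch omits the hardware lookup through shift_to_mmu_psize() and the CONFIG_SPU_FS_64K_LS special case that the real function also applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins for PAGE_SHIFT and SLICE_HIGH_SHIFT; the real
 * values depend on the kernel configuration. */
#define SKETCH_PAGE_SHIFT        16
#define SKETCH_SLICE_HIGH_SHIFT  40

static bool sketch_is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: return log2(size) if size passes the same
 * range checks as add_huge_page_size() (power of two, larger than
 * the base page, no larger than a high slice), otherwise -1. */
static int sketch_huge_shift(uint64_t size)
{
	int shift = 0;

	if (!sketch_is_power_of_2(size))
		return -1;
	/* For a power of two, this equals the kernel's __ffs(size). */
	while ((UINT64_C(1) << shift) != size)
		shift++;
	if (shift > SKETCH_SLICE_HIGH_SHIFT || shift <= SKETCH_PAGE_SHIFT)
		return -1;
	return shift;
}
```

Under these assumptions 16M (shift 24) and 16G (shift 34) are accepted, while a size equal to the base page, a non-power-of-two size, or anything larger than a slice is rejected.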

hugetlbpage_init() now steps through the available pagesizes, checks
if they're valid for hugepages by calling add_huge_page_size() and
initializes the kmem_caches for the hugepage pagetables.  This means
we can now eliminate the mmu_huge_psizes array, since we no longer
need to pass the sizing information for the pagetable caches from
set_huge_psize() into hugetlbpage_init().
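The sizing that hugetlbpage_init() now computes itself is: choose the lowest directory level whose entries span at least one huge page, then size the hugepte table with `pdshift - shift` index bits (the value passed to pgtable_cache_add()). Here is a minimal sketch, assuming illustrative directory shifts for a 4K base page layout. The real PMD_SHIFT, PUD_SHIFT and PGDIR_SHIFT come from the kernel headers, and the names below are hypothetical.

```c
#include <assert.h>

/* Assumed directory shifts for a 4K base page layout; treat these as
 * illustrative values, not the authoritative kernel constants. */
#define SKETCH_PMD_SHIFT    21
#define SKETCH_PUD_SHIFT    28
#define SKETCH_PGDIR_SHIFT  35

/* Pick the directory level that a huge page of the given shift hangs
 * off (same if/else chain as hugetlbpage_init()), store its shift in
 * *pdshift_out if non-NULL, and return the hugepte table's index size. */
static unsigned int sketch_hugepte_index_bits(unsigned int shift,
					      unsigned int *pdshift_out)
{
	unsigned int pdshift;

	if (shift < SKETCH_PMD_SHIFT)
		pdshift = SKETCH_PMD_SHIFT;
	else if (shift < SKETCH_PUD_SHIFT)
		pdshift = SKETCH_PUD_SHIFT;
	else
		pdshift = SKETCH_PGDIR_SHIFT;

	assert(shift <= pdshift);	/* callers keep sizes within slice limits */
	if (pdshift_out)
		*pdshift_out = pdshift;
	return pdshift - shift;
}
```

With these assumed shifts, 64K pages sit under PMD entries in 32-entry tables, 16M pages sit under PUD entries in 16-entry tables, and 16G pages sit under PGD entries in 2-entry tables. This agrees with the per-size switch statement that patch 3/5 removes from set_huge_psize().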

Signed-off-by: David Gibson <dwg@au1.ibm.com>

---
 arch/powerpc/mm/hugetlbpage.c |  106 +++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 57 deletions(-)

Index: working-2.6/arch/powerpc/mm/hugetlbpage.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c	2009-09-09 15:15:12.000000000 +1000
+++ working-2.6/arch/powerpc/mm/hugetlbpage.c	2009-09-09 15:22:49.000000000 +1000
@@ -37,11 +37,6 @@
 static unsigned long gpage_freearray[MAX_NUMBER_GPAGES];
 static unsigned nr_gpages;
 
-/* Array of valid huge page sizes - non-zero value(hugepte_shift) is
- * stored for the huge page sizes that are valid.
- */
-static unsigned int mmu_huge_psizes[MMU_PAGE_COUNT] = { }; /* initialize all to 0 */
-
 /* Flag to mark huge PD pointers.  This means pmd_bad() and pud_bad()
  * will choke on pointers to hugepte tables, which is handy for
  * catching screwups early. */
@@ -502,8 +497,6 @@ unsigned long hugetlb_get_unmapped_area(
 	struct hstate *hstate = hstate_file(file);
 	int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));
 
-	if (!mmu_huge_psizes[mmu_psize])
-		return -EINVAL;
 	return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1, 0);
 }
 
@@ -666,47 +659,46 @@ repeat:
 	return err;
 }
 
-static void __init set_huge_psize(int psize)
+static int __init add_huge_page_size(unsigned long long size)
 {
-	unsigned pdshift;
+	int shift = __ffs(size);
+	int mmu_psize;
 
 	/* Check that it is a page size supported by the hardware and
-	 * that it fits within pagetable limits. */
-	if (mmu_psize_defs[psize].shift &&
-		mmu_psize_defs[psize].shift < SID_SHIFT_1T &&
-		(mmu_psize_defs[psize].shift > MIN_HUGEPTE_SHIFT ||
-		 mmu_psize_defs[psize].shift == PAGE_SHIFT_64K ||
-		 mmu_psize_defs[psize].shift == PAGE_SHIFT_16G)) {
-		/* Return if huge page size has already been setup or is the
-		 * same as the base page size. */
-		if (mmu_huge_psizes[psize] ||
-		   mmu_psize_defs[psize].shift == PAGE_SHIFT)
-			return;
-		hugetlb_add_hstate(mmu_psize_defs[psize].shift - PAGE_SHIFT);
+	 * that it fits within pagetable and slice limits. */
+	if (!is_power_of_2(size)
+	    || (shift > SLICE_HIGH_SHIFT) || (shift <= PAGE_SHIFT))
+		return -EINVAL;
 
-		if (mmu_psize_defs[psize].shift < PMD_SHIFT)
-			pdshift = PMD_SHIFT;
-		else if (mmu_psize_defs[psize].shift < PUD_SHIFT)
-			pdshift = PUD_SHIFT;
-		else
-			pdshift = PGDIR_SHIFT;
-		mmu_huge_psizes[psize] = pdshift - mmu_psize_defs[psize].shift;
-	}
+	if ((mmu_psize = shift_to_mmu_psize(shift)) < 0)
+		return -EINVAL;
+
+#ifndef CONFIG_SPU_FS_64K_LS
+	/* Disable support for 64K huge pages when 64K SPU local store
+	 * support is enabled as the current implementation conflicts.
+	 */
+	if (size == PAGE_SIZE_64K)
+		return -EINVAL;
+#endif /* CONFIG_SPU_FS_64K_LS */
+
+	BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
+
+	/* Return if huge page size has already been setup */
+	if (size_to_hstate(size))
+		return 0;
+
+	hugetlb_add_hstate(shift - PAGE_SHIFT);
+
+	return 0;
 }
 
 static int __init hugepage_setup_sz(char *str)
 {
 	unsigned long long size;
-	int mmu_psize;
-	int shift;
 
 	size = memparse(str, &str);
 
-	shift = __ffs(size);
-	mmu_psize = shift_to_mmu_psize(shift);
-	if (mmu_psize >= 0 && mmu_psize_defs[mmu_psize].shift)
-		set_huge_psize(mmu_psize);
-	else
+	if (add_huge_page_size(size) != 0)
 		printk(KERN_WARNING "Invalid huge page size specified(%llu)\n", size);
 
 	return 1;
@@ -720,31 +712,31 @@ static int __init hugetlbpage_init(void)
 	if (!cpu_has_feature(CPU_FTR_16M_PAGE))
 		return -ENODEV;
 
-	/* Add supported huge page sizes.  Need to change HUGE_MAX_HSTATE
-	 * and adjust PTE_NONCACHE_NUM if the number of supported huge page
-	 * sizes changes.
-	 */
-	set_huge_psize(MMU_PAGE_16M);
-	set_huge_psize(MMU_PAGE_16G);
+	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
+		unsigned shift;
+		unsigned pdshift;
 
-	/* Temporarily disable support for 64K huge pages when 64K SPU local
-	 * store support is enabled as the current implementation conflicts.
-	 */
-#ifndef CONFIG_SPU_FS_64K_LS
-	set_huge_psize(MMU_PAGE_64K);
-#endif
+		if (!mmu_psize_defs[psize].shift)
+			continue;
 
-	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
-		if (mmu_huge_psizes[psize]) {
-			pgtable_cache_add(mmu_huge_psizes[psize], NULL);
-			if (!PGT_CACHE(mmu_huge_psizes[psize]))
-				panic("hugetlbpage_init(): could not create "
-				      "pgtable cache for %d bit pagesize\n",
-				      mmu_psize_to_shift(psize));
-		}
+		shift = mmu_psize_to_shift(psize);
+
+		if (add_huge_page_size(1ULL << shift) < 0)
+			continue;
+
+		if (shift < PMD_SHIFT)
+			pdshift = PMD_SHIFT;
+		else if (shift < PUD_SHIFT)
+			pdshift = PUD_SHIFT;
+		else
+			pdshift = PGDIR_SHIFT;
+
+		pgtable_cache_add(pdshift - shift, NULL);
+		if (!PGT_CACHE(pdshift - shift))
+			panic("hugetlbpage_init(): could not create "
+			      "pgtable cache for %d bit pagesize\n", shift);
 	}
 
 	return 0;
 }
-
 module_init(hugetlbpage_init);

* [3/5] Allow more flexible layouts for hugepage pagetables
From: David Gibson @ 2009-09-15  6:43 UTC
  To: linuxppc-dev, Benjamin Herrenschmidt
In-Reply-To: <20090915064133.GA11621@yookeroo.seuss>

Currently each available hugepage size uses a slightly different
pagetable layout: that is, the bottom level table of pointers to
hugepages is a different size, and may branch off from the normal page
tables at a different level.  Every hugepage aware path that needs to
walk the pagetables must therefore look up the hugepage size from the
slice info first, and work out the correct way to walk the pagetables
accordingly.  Future hardware is likely to add more possible hugepage
sizes, more layout options and more mess.

This patch therefore reworks the handling of hugepage pagetables to
reduce this complexity.  In the new scheme, instead of having to
consult the slice mask, pagetable walking code can check a flag in the
PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
and the entry also contains the information (essentially hugepage
shift) necessary to then interpret that table without recourse to the
slice mask.  This scheme can be extended neatly to handle multiple
levels of self-describing "special" hugepage pagetables, although for
now we assume only one level exists.
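Because the entry carries the huge page shift in its low bits, a walker can locate the right hugepte without consulting the slice mask. The sketch below mirrors the index arithmetic in the patch's hugepte_offset(): keep the offset within the region covered by the directory entry (`1 << pdshift` bytes), then divide by the huge page size. The function name is hypothetical, and the raw entry is modelled as a plain uint64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as HUGEPD_SHIFT_MASK in the patch's asm/hugetlb.h. */
#define SKETCH_HUGEPD_SHIFT_MASK UINT64_C(0x3f)

/* Hypothetical helper: index of the hugepte covering addr, in the
 * table referenced by entry pd of a directory level whose entries
 * each span (1 << pdshift) bytes.  The huge page shift is read back
 * from the entry itself, as hugepd_shift() does. */
static uint64_t sketch_hugepte_index(uint64_t pd, uint64_t addr,
				     unsigned int pdshift)
{
	unsigned int shift = (unsigned int)(pd & SKETCH_HUGEPD_SHIFT_MASK);

	assert(shift <= pdshift && pdshift < 64);
	return (addr & ((UINT64_C(1) << pdshift) - 1)) >> shift;
}
```

For example, an entry recording a 16M page (shift 24) under a level with pdshift 28 maps address 0x15001234 to index 5. Only the allocation path needs to know which level that is.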

This approach means that only the pagetable allocation path needs to
know how the pagetables should be set out.  All other (hugepage)
pagetable walking paths can just interpret the structure as they go.

There already was a flag bit in PGD/PUD/PMD entries for hugepage
directory pointers, but it was only used for debug.  We alter that
flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
pointer (normally it would be 1 since the pointer lies in the linear
mapping).  This means that asm pagetable walking can test for (and
punt on) hugepage pointers with the same test that checks for
unpopulated page directory entries (beq becomes bge), since hugepage
pointers will always be positive, and normal pointers always negative.
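The encoding can be sketched outside the kernel as three small operations: building an entry (as __hugepte_alloc() does), testing it (as hugepd_ok() does), and recovering the table address (as hugepd_page() does). This is a hedged model, not the patch's code. It assumes linear-mapping addresses begin at 0xc000000000000000, which the patch's hugepd_page() hardcodes. It also tests the MSB explicitly instead of converting to a signed type, which avoids implementation-defined conversion in portable C. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HUGEPD_SHIFT_MASK UINT64_C(0x3f)
#define SKETCH_MSB               UINT64_C(0x8000000000000000)
#define SKETCH_LINEAR_BASE       UINT64_C(0xc000000000000000)

/* Build a hugepd entry: clear the MSB of the (linear-mapping) table
 * address and store the page shift in the low bits.  The table must
 * be aligned so that those low bits are free. */
static uint64_t sketch_hugepd_make(uint64_t table, unsigned int shift)
{
	assert((table & SKETCH_HUGEPD_SHIFT_MASK) == 0);
	assert(shift <= SKETCH_HUGEPD_SHIFT_MASK);
	return (table & ~SKETCH_MSB) | shift;
}

/* "Positive as a signed long": nonzero with the MSB clear.  Empty
 * entries (0) and normal directory pointers (MSB set) both fail. */
static bool sketch_hugepd_ok(uint64_t pd)
{
	return pd != 0 && (pd & SKETCH_MSB) == 0;
}

/* Recover the table address: drop the shift bits and restore the
 * top bits of the linear mapping. */
static uint64_t sketch_hugepd_table(uint64_t pd)
{
	return (pd & ~SKETCH_HUGEPD_SHIFT_MASK) | SKETCH_LINEAR_BASE;
}
```

A single signed comparison therefore splits entries three ways: zero (empty), negative (normal pointer), and positive (hugepd). That is why the asm fast path can replace `beq` with `bge` and punt on both empty and hugepage entries at once.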

While we're at it, we get rid of the confusing (and grep defeating)
#defining of hugepte_shift to be the same thing as mmu_huge_psizes.

Signed-off-by: David Gibson <dwg@au1.ibm.com>

---
 arch/powerpc/include/asm/hugetlb.h       |   12 
 arch/powerpc/include/asm/mmu-hash64.h    |   14 
 arch/powerpc/include/asm/pgtable-ppc64.h |   13 
 arch/powerpc/kernel/perf_callchain.c     |   20 -
 arch/powerpc/mm/gup.c                    |  149 +--------
 arch/powerpc/mm/hash_utils_64.c          |   17 -
 arch/powerpc/mm/hugetlbpage.c            |  473 ++++++++++++++-----------------
 arch/powerpc/mm/init_64.c                |   10 
 8 files changed, 302 insertions(+), 406 deletions(-)

Index: working-2.6/arch/powerpc/mm/hugetlbpage.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c	2009-09-15 16:03:08.000000000 +1000
+++ working-2.6/arch/powerpc/mm/hugetlbpage.c	2009-09-15 16:08:02.000000000 +1000
@@ -40,25 +40,11 @@ static unsigned nr_gpages;
 /* Array of valid huge page sizes - non-zero value(hugepte_shift) is
  * stored for the huge page sizes that are valid.
  */
-unsigned int mmu_huge_psizes[MMU_PAGE_COUNT] = { }; /* initialize all to 0 */
-
-#define hugepte_shift			mmu_huge_psizes
-#define HUGEPTE_INDEX_SIZE(psize)	(mmu_huge_psizes[(psize)])
-#define PTRS_PER_HUGEPTE(psize)		(1 << mmu_huge_psizes[psize])
-
-#define HUGEPD_SHIFT(psize)		(mmu_psize_to_shift(psize) \
-					 + HUGEPTE_INDEX_SIZE(psize))
-#define HUGEPD_SIZE(psize)		(1UL << HUGEPD_SHIFT(psize))
-#define HUGEPD_MASK(psize)		(~(HUGEPD_SIZE(psize)-1))
+static unsigned int mmu_huge_psizes[MMU_PAGE_COUNT] = { }; /* initialize all to 0 */
 
 /* Flag to mark huge PD pointers.  This means pmd_bad() and pud_bad()
  * will choke on pointers to hugepte tables, which is handy for
  * catching screwups early. */
-#define HUGEPD_OK	0x1
-
-typedef struct { unsigned long pd; } hugepd_t;
-
-#define hugepd_none(hpd)	((hpd).pd == 0)
 
 static inline int shift_to_mmu_psize(unsigned int shift)
 {
@@ -82,71 +68,126 @@ static inline unsigned int mmu_psize_to_
 	BUG();
 }
 
+#define hugepd_none(hpd)	((hpd).pd == 0)
+
 static inline pte_t *hugepd_page(hugepd_t hpd)
 {
-	BUG_ON(!(hpd.pd & HUGEPD_OK));
-	return (pte_t *)(hpd.pd & ~HUGEPD_OK);
+	BUG_ON(!hugepd_ok(hpd));
+	return (pte_t *)((hpd.pd & ~HUGEPD_SHIFT_MASK) | 0xc000000000000000);
 }
 
-static inline pte_t *hugepte_offset(hugepd_t *hpdp, unsigned long addr,
-				    struct hstate *hstate)
+static inline unsigned int hugepd_shift(hugepd_t hpd)
 {
-	unsigned int shift = huge_page_shift(hstate);
-	int psize = shift_to_mmu_psize(shift);
-	unsigned long idx = ((addr >> shift) & (PTRS_PER_HUGEPTE(psize)-1));
+	return hpd.pd & HUGEPD_SHIFT_MASK;
+}
+
+static inline pte_t *hugepte_offset(hugepd_t *hpdp, unsigned long addr, unsigned pdshift)
+{
+	unsigned long idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(*hpdp);
 	pte_t *dir = hugepd_page(*hpdp);
 
 	return dir + idx;
 }
 
+pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift)
+{
+	pgd_t *pg;
+	pud_t *pu;
+	pmd_t *pm;
+	hugepd_t *hpdp = NULL;
+	unsigned pdshift = PGDIR_SHIFT;
+
+	if (shift)
+		*shift = 0;
+
+	pg = pgdir + pgd_index(ea);
+	if (is_hugepd(pg)) {
+		hpdp = (hugepd_t *)pg;
+	} else if (!pgd_none(*pg)) {
+		pdshift = PUD_SHIFT;
+		pu = pud_offset(pg, ea);
+		if (is_hugepd(pu))
+			hpdp = (hugepd_t *)pu;
+		else if (!pud_none(*pu)) {
+			pdshift = PMD_SHIFT;
+			pm = pmd_offset(pu, ea);
+			if (is_hugepd(pm))
+				hpdp = (hugepd_t *)pm;
+			else if (!pmd_none(*pm)) {
+				return pte_offset_map(pm, ea);
+			}
+		}
+	}
+
+	if (!hpdp)
+		return NULL;
+
+	if (shift)
+		*shift = hugepd_shift(*hpdp);
+	return hugepte_offset(hpdp, ea, pdshift);
+}
+
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+{
+	return find_linux_pte_or_hugepte(mm->pgd, addr, NULL);
+}
+
 static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
-			   unsigned long address, unsigned int psize)
+			   unsigned long address, unsigned pdshift, unsigned pshift)
 {
-	pte_t *new = kmem_cache_zalloc(PGT_CACHE(hugepte_shift[psize]),
+	pte_t *new = kmem_cache_zalloc(PGT_CACHE(pdshift - pshift),
 				       GFP_KERNEL|__GFP_REPEAT);
 
+	BUG_ON(pshift > HUGEPD_SHIFT_MASK);
+	BUG_ON((unsigned long)new & HUGEPD_SHIFT_MASK);
+
 	if (! new)
 		return -ENOMEM;
 
 	spin_lock(&mm->page_table_lock);
 	if (!hugepd_none(*hpdp))
-		kmem_cache_free(PGT_CACHE(hugepte_shift[psize]), new);
+		kmem_cache_free(PGT_CACHE(pdshift - pshift), new);
 	else
-		hpdp->pd = (unsigned long)new | HUGEPD_OK;
+		hpdp->pd = ((unsigned long)new & ~0x8000000000000000) | pshift;
 	spin_unlock(&mm->page_table_lock);
 	return 0;
 }
 
-
-static pud_t *hpud_offset(pgd_t *pgd, unsigned long addr, struct hstate *hstate)
+pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
 {
-	if (huge_page_shift(hstate) < PUD_SHIFT)
-		return pud_offset(pgd, addr);
-	else
-		return (pud_t *) pgd;
-}
-static pud_t *hpud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long addr,
-			 struct hstate *hstate)
-{
-	if (huge_page_shift(hstate) < PUD_SHIFT)
-		return pud_alloc(mm, pgd, addr);
-	else
-		return (pud_t *) pgd;
-}
-static pmd_t *hpmd_offset(pud_t *pud, unsigned long addr, struct hstate *hstate)
-{
-	if (huge_page_shift(hstate) < PMD_SHIFT)
-		return pmd_offset(pud, addr);
-	else
-		return (pmd_t *) pud;
-}
-static pmd_t *hpmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long addr,
-			 struct hstate *hstate)
-{
-	if (huge_page_shift(hstate) < PMD_SHIFT)
-		return pmd_alloc(mm, pud, addr);
-	else
-		return (pmd_t *) pud;
+	pgd_t *pg;
+	pud_t *pu;
+	pmd_t *pm;
+	hugepd_t *hpdp = NULL;
+	unsigned pshift = __ffs(sz);
+	unsigned pdshift = PGDIR_SHIFT;
+
+	addr &= ~(sz-1);
+
+	pg = pgd_offset(mm, addr);
+	if (pshift >= PUD_SHIFT) {
+		hpdp = (hugepd_t *)pg;
+	} else {
+		pdshift = PUD_SHIFT;
+		pu = pud_alloc(mm, pg, addr);
+		if (pshift >= PMD_SHIFT) {
+			hpdp = (hugepd_t *)pu;
+		} else {
+			pdshift = PMD_SHIFT;
+			pm = pmd_alloc(mm, pu, addr);
+			hpdp = (hugepd_t *)pm;
+		}
+	}
+
+	if (!hpdp)
+		return NULL;
+
+	BUG_ON(!hugepd_none(*hpdp) && !hugepd_ok(*hpdp));
+
+	if (hugepd_none(*hpdp) && __hugepte_alloc(mm, hpdp, addr, pdshift, pshift))
+		return NULL;
+
+	return hugepte_offset(hpdp, addr, pdshift);
 }
 
 /* Build list of addresses of gigantic pages.  This function is used in early
@@ -180,92 +221,38 @@ int alloc_bootmem_huge_page(struct hstat
 	return 1;
 }
 
-
-/* Modelled after find_linux_pte() */
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
-{
-	pgd_t *pg;
-	pud_t *pu;
-	pmd_t *pm;
-
-	unsigned int psize;
-	unsigned int shift;
-	unsigned long sz;
-	struct hstate *hstate;
-	psize = get_slice_psize(mm, addr);
-	shift = mmu_psize_to_shift(psize);
-	sz = ((1UL) << shift);
-	hstate = size_to_hstate(sz);
-
-	addr &= hstate->mask;
-
-	pg = pgd_offset(mm, addr);
-	if (!pgd_none(*pg)) {
-		pu = hpud_offset(pg, addr, hstate);
-		if (!pud_none(*pu)) {
-			pm = hpmd_offset(pu, addr, hstate);
-			if (!pmd_none(*pm))
-				return hugepte_offset((hugepd_t *)pm, addr,
-						      hstate);
-		}
-	}
-
-	return NULL;
-}
-
-pte_t *huge_pte_alloc(struct mm_struct *mm,
-			unsigned long addr, unsigned long sz)
-{
-	pgd_t *pg;
-	pud_t *pu;
-	pmd_t *pm;
-	hugepd_t *hpdp = NULL;
-	struct hstate *hstate;
-	unsigned int psize;
-	hstate = size_to_hstate(sz);
-
-	psize = get_slice_psize(mm, addr);
-	BUG_ON(!mmu_huge_psizes[psize]);
-
-	addr &= hstate->mask;
-
-	pg = pgd_offset(mm, addr);
-	pu = hpud_alloc(mm, pg, addr, hstate);
-
-	if (pu) {
-		pm = hpmd_alloc(mm, pu, addr, hstate);
-		if (pm)
-			hpdp = (hugepd_t *)pm;
-	}
-
-	if (! hpdp)
-		return NULL;
-
-	if (hugepd_none(*hpdp) && __hugepte_alloc(mm, hpdp, addr, psize))
-		return NULL;
-
-	return hugepte_offset(hpdp, addr, hstate);
-}
-
 int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
 {
 	return 0;
 }
 
-static void free_hugepte_range(struct mmu_gather *tlb, hugepd_t *hpdp,
-			       unsigned int psize)
+static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshift,
+			      unsigned long start, unsigned long end,
+			      unsigned long floor, unsigned long ceiling)
 {
 	pte_t *hugepte = hugepd_page(*hpdp);
+	unsigned shift = hugepd_shift(*hpdp);
+	unsigned long pdmask = ~((1UL << pdshift) - 1);
+
+	start &= pdmask;
+	if (start < floor)
+		return;
+	if (ceiling) {
+		ceiling &= pdmask;
+		if (! ceiling)
+			return;
+	}
+	if (end - 1 > ceiling - 1)
+		return;
 
 	hpdp->pd = 0;
 	tlb->need_flush = 1;
-	pgtable_free_tlb(tlb, hugepte, hugepte_shift[psize]);
+	pgtable_free_tlb(tlb, hugepte, pdshift - shift);
 }
 
 static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 				   unsigned long addr, unsigned long end,
-				   unsigned long floor, unsigned long ceiling,
-				   unsigned int psize)
+				   unsigned long floor, unsigned long ceiling)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -277,7 +264,8 @@ static void hugetlb_free_pmd_range(struc
 		next = pmd_addr_end(addr, end);
 		if (pmd_none(*pmd))
 			continue;
-		free_hugepte_range(tlb, (hugepd_t *)pmd, psize);
+		free_hugepd_range(tlb, (hugepd_t *)pmd, PMD_SHIFT,
+				  addr, next, floor, ceiling);
 	} while (pmd++, addr = next, addr != end);
 
 	start &= PUD_MASK;
@@ -303,23 +291,19 @@ static void hugetlb_free_pud_range(struc
 	pud_t *pud;
 	unsigned long next;
 	unsigned long start;
-	unsigned int shift;
-	unsigned int psize = get_slice_psize(tlb->mm, addr);
-	shift = mmu_psize_to_shift(psize);
 
 	start = addr;
 	pud = pud_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		if (shift < PMD_SHIFT) {
+		if (!is_hugepd(pud)) {
 			if (pud_none_or_clear_bad(pud))
 				continue;
 			hugetlb_free_pmd_range(tlb, pud, addr, next, floor,
-					       ceiling, psize);
+					       ceiling);
 		} else {
-			if (pud_none(*pud))
-				continue;
-			free_hugepte_range(tlb, (hugepd_t *)pud, psize);
+			free_hugepd_range(tlb, (hugepd_t *)pud, PUD_SHIFT,
+					  addr, next, floor, ceiling);
 		}
 	} while (pud++, addr = next, addr != end);
 
@@ -350,74 +334,34 @@ void hugetlb_free_pgd_range(struct mmu_g
 {
 	pgd_t *pgd;
 	unsigned long next;
-	unsigned long start;
 
 	/*
-	 * Comments below take from the normal free_pgd_range().  They
-	 * apply here too.  The tests against HUGEPD_MASK below are
-	 * essential, because we *don't* test for this at the bottom
-	 * level.  Without them we'll attempt to free a hugepte table
-	 * when we unmap just part of it, even if there are other
-	 * active mappings using it.
+	 * Because there are a number of different possible pagetable
+	 * layouts for hugepage ranges, we limit knowledge of how
+	 * things should be laid out to the allocation path
+	 * (huge_pte_alloc(), above).  Everything else works out the
+	 * structure as it goes from information in the hugepd
+	 * pointers.  That means that we can't here use the
+	 * optimization used in the normal page free_pgd_range(), of
+	 * checking whether we're actually covering a large enough
+	 * range to have to do anything at the top level of the walk
+	 * instead of at the bottom.
 	 *
-	 * The next few lines have given us lots of grief...
-	 *
-	 * Why are we testing HUGEPD* at this top level?  Because
-	 * often there will be no work to do at all, and we'd prefer
-	 * not to go all the way down to the bottom just to discover
-	 * that.
-	 *
-	 * Why all these "- 1"s?  Because 0 represents both the bottom
-	 * of the address space and the top of it (using -1 for the
-	 * top wouldn't help much: the masks would do the wrong thing).
-	 * The rule is that addr 0 and floor 0 refer to the bottom of
-	 * the address space, but end 0 and ceiling 0 refer to the top
-	 * Comparisons need to use "end - 1" and "ceiling - 1" (though
-	 * that end 0 case should be mythical).
-	 *
-	 * Wherever addr is brought up or ceiling brought down, we
-	 * must be careful to reject "the opposite 0" before it
-	 * confuses the subsequent tests.  But what about where end is
-	 * brought down by HUGEPD_SIZE below? no, end can't go down to
-	 * 0 there.
-	 *
-	 * Whereas we round start (addr) and ceiling down, by different
-	 * masks at different levels, in order to test whether a table
-	 * now has no other vmas using it, so can be freed, we don't
-	 * bother to round floor or end up - the tests don't need that.
+	 * To make sense of this, you should probably go read the big
+	 * block comment at the top of the normal free_pgd_range(),
+	 * too.
 	 */
-	unsigned int psize = get_slice_psize(tlb->mm, addr);
 
-	addr &= HUGEPD_MASK(psize);
-	if (addr < floor) {
-		addr += HUGEPD_SIZE(psize);
-		if (!addr)
-			return;
-	}
-	if (ceiling) {
-		ceiling &= HUGEPD_MASK(psize);
-		if (!ceiling)
-			return;
-	}
-	if (end - 1 > ceiling - 1)
-		end -= HUGEPD_SIZE(psize);
-	if (addr > end - 1)
-		return;
-
-	start = addr;
 	pgd = pgd_offset(tlb->mm, addr);
 	do {
-		psize = get_slice_psize(tlb->mm, addr);
-		BUG_ON(!mmu_huge_psizes[psize]);
 		next = pgd_addr_end(addr, end);
-		if (mmu_psize_to_shift(psize) < PUD_SHIFT) {
+		if (!is_hugepd(pgd)) {
 			if (pgd_none_or_clear_bad(pgd))
 				continue;
 			hugetlb_free_pud_range(tlb, pgd, addr, next, floor, ceiling);
 		} else {
-			if (pgd_none(*pgd))
-				continue;
-			free_hugepte_range(tlb, (hugepd_t *)pgd, psize);
+			free_hugepd_range(tlb, (hugepd_t *)pgd, PGDIR_SHIFT,
+					  addr, next, floor, ceiling);
 		}
 	} while (pgd++, addr = next, addr != end);
 }
@@ -448,19 +392,19 @@ follow_huge_addr(struct mm_struct *mm, u
 {
 	pte_t *ptep;
 	struct page *page;
-	unsigned int mmu_psize = get_slice_psize(mm, address);
+	unsigned shift;
+	unsigned long mask;
+
+	ptep = find_linux_pte_or_hugepte(mm->pgd, address, &shift);
 
 	/* Verify it is a huge page else bail. */
-	if (!mmu_huge_psizes[mmu_psize])
+	if (!ptep || !shift)
 		return ERR_PTR(-EINVAL);
 
-	ptep = huge_pte_offset(mm, address);
+	mask = (1UL << shift) - 1;
 	page = pte_page(*ptep);
-	if (page) {
-		unsigned int shift = mmu_psize_to_shift(mmu_psize);
-		unsigned long sz = ((1UL) << shift);
-		page += (address % sz) / PAGE_SIZE;
-	}
+	if (page)
+		page += (address & mask) / PAGE_SIZE;
 
 	return page;
 }
@@ -483,6 +427,73 @@ follow_huge_pmd(struct mm_struct *mm, un
 	return NULL;
 }
 
+static noinline int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
+		       unsigned long end, int write, struct page **pages, int *nr)
+{
+	unsigned long mask;
+	unsigned long pte_end;
+	struct page *head, *page;
+	pte_t pte;
+	int refs;
+
+	pte_end = (addr + sz) & ~(sz-1);
+	if (pte_end < end)
+		end = pte_end;
+
+	pte = *ptep;
+	mask = _PAGE_PRESENT | _PAGE_USER;
+	if (write)
+		mask |= _PAGE_RW;
+
+	if ((pte_val(pte) & mask) != mask)
+		return 0;
+
+	/* hugepages are never "special" */
+	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+
+	refs = 0;
+	head = pte_page(pte);
+
+	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+		/* Could be optimized better */
+		while (*nr) {
+			put_page(page);
+			(*nr)--;
+		}
+	}
+
+	return 1;
+}
+
+int gup_hugepd(hugepd_t *hugepd, unsigned pdshift,
+	       unsigned long addr, unsigned long end,
+	       int write, struct page **pages, int *nr)
+{
+	pte_t *ptep;
+	unsigned long sz = 1UL << hugepd_shift(*hugepd);
+
+	ptep = hugepte_offset(hugepd, addr, pdshift);
+	do {
+		if (!gup_hugepte(ptep, sz, addr, end, write, pages, nr))
+			return 0;
+	} while (ptep++, addr += sz, addr != end);
+
+	return 1;
+}
 
 unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 					unsigned long len, unsigned long pgoff,
@@ -530,34 +541,20 @@ static unsigned int hash_huge_page_do_la
 	return rflags;
 }
 
-int hash_huge_page(struct mm_struct *mm, unsigned long access,
-		   unsigned long ea, unsigned long vsid, int local,
-		   unsigned long trap)
+int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
+		     pte_t *ptep, unsigned long trap, int local, int ssize,
+		     unsigned int shift, unsigned int mmu_psize)
 {
-	pte_t *ptep;
 	unsigned long old_pte, new_pte;
 	unsigned long va, rflags, pa, sz;
 	long slot;
 	int err = 1;
-	int ssize = user_segment_size(ea);
-	unsigned int mmu_psize;
-	int shift;
-	mmu_psize = get_slice_psize(mm, ea);
 
-	if (!mmu_huge_psizes[mmu_psize])
-		goto out;
-	ptep = huge_pte_offset(mm, ea);
+	BUG_ON(shift != mmu_psize_defs[mmu_psize].shift);
 
 	/* Search the Linux page table for a match with va */
 	va = hpt_va(ea, vsid, ssize);
 
-	/*
-	 * If no pte found or not present, send the problem up to
-	 * do_page_fault
-	 */
-	if (unlikely(!ptep || pte_none(*ptep)))
-		goto out;
-
 	/* 
 	 * Check the user's access rights to the page.  If access should be
 	 * prevented then send the problem up to do_page_fault.
@@ -588,7 +585,6 @@ int hash_huge_page(struct mm_struct *mm,
 	rflags = 0x2 | (!(new_pte & _PAGE_RW));
  	/* _PAGE_EXEC -> HW_NO_EXEC since it's inverted */
 	rflags |= ((new_pte & _PAGE_EXEC) ? 0 : HPTE_R_N);
-	shift = mmu_psize_to_shift(mmu_psize);
 	sz = ((1UL) << shift);
 	if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
 		/* No CPU has hugepages but lacks no execute, so we
@@ -672,6 +668,8 @@ repeat:
 
 static void __init set_huge_psize(int psize)
 {
+	unsigned pdshift;
+
 	/* Check that it is a page size supported by the hardware and
 	 * that it fits within pagetable limits. */
 	if (mmu_psize_defs[psize].shift &&
@@ -686,29 +684,14 @@ static void __init set_huge_psize(int ps
 			return;
 		hugetlb_add_hstate(mmu_psize_defs[psize].shift - PAGE_SHIFT);
 
-		switch (mmu_psize_defs[psize].shift) {
-		case PAGE_SHIFT_64K:
-		    /* We only allow 64k hpages with 4k base page,
-		     * which was checked above, and always put them
-		     * at the PMD */
-		    hugepte_shift[psize] = PMD_SHIFT;
-		    break;
-		case PAGE_SHIFT_16M:
-		    /* 16M pages can be at two different levels
-		     * of pagestables based on base page size */
-		    if (PAGE_SHIFT == PAGE_SHIFT_64K)
-			    hugepte_shift[psize] = PMD_SHIFT;
-		    else /* 4k base page */
-			    hugepte_shift[psize] = PUD_SHIFT;
-		    break;
-		case PAGE_SHIFT_16G:
-		    /* 16G pages are always at PGD level */
-		    hugepte_shift[psize] = PGDIR_SHIFT;
-		    break;
-		}
-		hugepte_shift[psize] -= mmu_psize_defs[psize].shift;
-	} else
-		hugepte_shift[psize] = 0;
+		if (mmu_psize_defs[psize].shift < PMD_SHIFT)
+			pdshift = PMD_SHIFT;
+		else if (mmu_psize_defs[psize].shift < PUD_SHIFT)
+			pdshift = PUD_SHIFT;
+		else
+			pdshift = PGDIR_SHIFT;
+		mmu_huge_psizes[psize] = pdshift - mmu_psize_defs[psize].shift;
+	}
 }
 
 static int __init hugepage_setup_sz(char *str)
@@ -732,7 +715,7 @@ __setup("hugepagesz=", hugepage_setup_sz
 
 static int __init hugetlbpage_init(void)
 {
-	unsigned int psize;
+	int psize;
 
 	if (!cpu_has_feature(CPU_FTR_16M_PAGE))
 		return -ENODEV;
@@ -753,8 +736,8 @@ static int __init hugetlbpage_init(void)
 
 	for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
 		if (mmu_huge_psizes[psize]) {
-			pgtable_cache_add(hugepte_shift[psize], NULL);
-			if (!PGT_CACHE(hugepte_shift[psize]))
+			pgtable_cache_add(mmu_huge_psizes[psize], NULL);
+			if (!PGT_CACHE(mmu_huge_psizes[psize]))
 				panic("hugetlbpage_init(): could not create "
 				      "pgtable cache for %d bit pagesize\n",
 				      mmu_psize_to_shift(psize));
Index: working-2.6/arch/powerpc/include/asm/hugetlb.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/hugetlb.h	2009-09-15 16:03:07.000000000 +1000
+++ working-2.6/arch/powerpc/include/asm/hugetlb.h	2009-09-15 16:08:01.000000000 +1000
@@ -3,6 +3,15 @@
 
 #include <asm/page.h>
 
+typedef struct { signed long pd; } hugepd_t;
+
+static inline int hugepd_ok(hugepd_t hpd)
+{
+	return (hpd.pd > 0);
+}
+
+#define is_hugepd(pdep)               (hugepd_ok(*((hugepd_t *)(pdep))))
+#define HUGEPD_SHIFT_MASK     0x3f
 
 int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len);
@@ -17,6 +26,9 @@ void set_huge_pte_at(struct mm_struct *m
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep);
 
+int gup_hugepd(hugepd_t *hugepd, unsigned pdshift, unsigned long addr,
+	       unsigned long end, int write, struct page **pages, int *nr);
+
 /*
  * The version of vma_mmu_pagesize() in arch/powerpc/mm/hugetlbpage.c needs
  * to override the version in mm/hugetlb.c
Index: working-2.6/arch/powerpc/mm/init_64.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/init_64.c	2009-09-15 16:03:27.000000000 +1000
+++ working-2.6/arch/powerpc/mm/init_64.c	2009-09-15 16:08:08.000000000 +1000
@@ -41,6 +41,7 @@
 #include <linux/module.h>
 #include <linux/poison.h>
 #include <linux/lmb.h>
+#include <linux/hugetlb.h>
 
 #include <asm/pgalloc.h>
 #include <asm/page.h>
@@ -157,8 +158,13 @@ void pgtable_cache_add(unsigned shift, v
 	unsigned long align = table_size;
 	/* When batching pgtable pointers for RCU freeing, we store
 	 * the index size in the low bits.  Table alignment must be
-	 * big enough to fit it */
-	unsigned long minalign = MAX_PGTABLE_INDEX_SIZE + 1;
+	 * big enough to fit it.
+	 *
+	 * Likewise, hugepage pagetable pointers contain a (different)
+	 * shift value in the low bits.  All tables must be aligned so
+	 * as to leave enough 0 bits in the address to contain it. */
+	unsigned long minalign = max(MAX_PGTABLE_INDEX_SIZE + 1,
+				     HUGEPD_SHIFT_MASK + 1);
 	struct kmem_cache *new;
 
 	BUILD_BUG_ON(!is_power_of_2(minalign));
Index: working-2.6/arch/powerpc/include/asm/pgtable-ppc64.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/pgtable-ppc64.h	2009-09-15 16:03:07.000000000 +1000
+++ working-2.6/arch/powerpc/include/asm/pgtable-ppc64.h	2009-09-15 16:03:36.000000000 +1000
@@ -379,7 +379,18 @@ void pgtable_cache_init(void);
 	return pt;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long address);
+#ifdef CONFIG_HUGETLB_PAGE
+pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+				 unsigned *shift);
+#else
+static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+					       unsigned *shift)
+{
+	if (shift)
+		*shift = 0;
+	return find_linux_pte(pgdir, ea);
+}
+#endif /* !CONFIG_HUGETLB_PAGE */
 
 #endif /* __ASSEMBLY__ */
 
Index: working-2.6/arch/powerpc/mm/gup.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/gup.c	2009-09-15 16:03:07.000000000 +1000
+++ working-2.6/arch/powerpc/mm/gup.c	2009-09-15 16:03:36.000000000 +1000
@@ -55,57 +55,6 @@ static noinline int gup_pte_range(pmd_t 
 	return 1;
 }
 
-#ifdef CONFIG_HUGETLB_PAGE
-static noinline int gup_huge_pte(pte_t *ptep, struct hstate *hstate,
-				 unsigned long *addr, unsigned long end,
-				 int write, struct page **pages, int *nr)
-{
-	unsigned long mask;
-	unsigned long pte_end;
-	struct page *head, *page;
-	pte_t pte;
-	int refs;
-
-	pte_end = (*addr + huge_page_size(hstate)) & huge_page_mask(hstate);
-	if (pte_end < end)
-		end = pte_end;
-
-	pte = *ptep;
-	mask = _PAGE_PRESENT|_PAGE_USER;
-	if (write)
-		mask |= _PAGE_RW;
-	if ((pte_val(pte) & mask) != mask)
-		return 0;
-	/* hugepages are never "special" */
-	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
-
-	refs = 0;
-	head = pte_page(pte);
-	page = head + ((*addr & ~huge_page_mask(hstate)) >> PAGE_SHIFT);
-	do {
-		VM_BUG_ON(compound_head(page) != head);
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (*addr += PAGE_SIZE, *addr != end);
-
-	if (!page_cache_add_speculative(head, refs)) {
-		*nr -= refs;
-		return 0;
-	}
-	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-		/* Could be optimized better */
-		while (*nr) {
-			put_page(page);
-			(*nr)--;
-		}
-	}
-
-	return 1;
-}
-#endif /* CONFIG_HUGETLB_PAGE */
-
 static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 		int write, struct page **pages, int *nr)
 {
@@ -119,7 +68,11 @@ static int gup_pmd_range(pud_t pud, unsi
 		next = pmd_addr_end(addr, end);
 		if (pmd_none(pmd))
 			return 0;
-		if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+		if (is_hugepd(pmdp)) {
+			if (!gup_hugepd((hugepd_t *)pmdp, PMD_SHIFT,
+					addr, next, write, pages, nr))
+				return 0;
+		} else if (!gup_pte_range(pmd, addr, next, write, pages, nr))
 			return 0;
 	} while (pmdp++, addr = next, addr != end);
 
@@ -139,7 +92,11 @@ static int gup_pud_range(pgd_t pgd, unsi
 		next = pud_addr_end(addr, end);
 		if (pud_none(pud))
 			return 0;
-		if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+		if (is_hugepd(pudp)) {
+			if (!gup_hugepd((hugepd_t *)pudp, PUD_SHIFT,
+					addr, next, write, pages, nr))
+				return 0;
+		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
 			return 0;
 	} while (pudp++, addr = next, addr != end);
 
@@ -154,10 +111,6 @@ int get_user_pages_fast(unsigned long st
 	unsigned long next;
 	pgd_t *pgdp;
 	int nr = 0;
-#ifdef CONFIG_PPC64
-	unsigned int shift;
-	int psize;
-#endif
 
 	pr_devel("%s(%lx,%x,%s)\n", __func__, start, nr_pages, write ? "write" : "read");
 
@@ -172,25 +125,6 @@ int get_user_pages_fast(unsigned long st
 
 	pr_devel("  aligned: %lx .. %lx\n", start, end);
 
-#ifdef CONFIG_HUGETLB_PAGE
-	/* We bail out on slice boundary crossing when hugetlb is
-	 * enabled in order to not have to deal with two different
-	 * page table formats
-	 */
-	if (addr < SLICE_LOW_TOP) {
-		if (end > SLICE_LOW_TOP)
-			goto slow_irqon;
-
-		if (unlikely(GET_LOW_SLICE_INDEX(addr) !=
-			     GET_LOW_SLICE_INDEX(end - 1)))
-			goto slow_irqon;
-	} else {
-		if (unlikely(GET_HIGH_SLICE_INDEX(addr) !=
-			     GET_HIGH_SLICE_INDEX(end - 1)))
-			goto slow_irqon;
-	}
-#endif /* CONFIG_HUGETLB_PAGE */
-
 	/*
 	 * XXX: batch / limit 'nr', to avoid large irq off latency
 	 * needs some instrumenting to determine the common sizes used by
@@ -210,54 +144,23 @@ int get_user_pages_fast(unsigned long st
 	 */
 	local_irq_disable();
 
-#ifdef CONFIG_PPC64
-	/* Those bits are related to hugetlbfs implementation and only exist
-	 * on 64-bit for now
-	 */
-	psize = get_slice_psize(mm, addr);
-	shift = mmu_psize_defs[psize].shift;
-#endif /* CONFIG_PPC64 */
-
-#ifdef CONFIG_HUGETLB_PAGE
-	if (unlikely(mmu_huge_psizes[psize])) {
-		pte_t *ptep;
-		unsigned long a = addr;
-		unsigned long sz = ((1UL) << shift);
-		struct hstate *hstate = size_to_hstate(sz);
-
-		BUG_ON(!hstate);
-		/*
-		 * XXX: could be optimized to avoid hstate
-		 * lookup entirely (just use shift)
-		 */
-
-		do {
-			VM_BUG_ON(shift != mmu_psize_defs[get_slice_psize(mm, a)].shift);
-			ptep = huge_pte_offset(mm, a);
-			pr_devel(" %016lx: huge ptep %p\n", a, ptep);
-			if (!ptep || !gup_huge_pte(ptep, hstate, &a, end, write, pages,
-						   &nr))
-				goto slow;
-		} while (a != end);
-	} else
-#endif /* CONFIG_HUGETLB_PAGE */
-	{
-		pgdp = pgd_offset(mm, addr);
-		do {
-			pgd_t pgd = *pgdp;
-
-#ifdef CONFIG_PPC64
-			VM_BUG_ON(shift != mmu_psize_defs[get_slice_psize(mm, addr)].shift);
-#endif
-			pr_devel("  %016lx: normal pgd %p\n", addr,
-				 (void *)pgd_val(pgd));
-			next = pgd_addr_end(addr, end);
-			if (pgd_none(pgd))
-				goto slow;
-			if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
+	pgdp = pgd_offset(mm, addr);
+	do {
+		pgd_t pgd = *pgdp;
+
+		pr_devel("  %016lx: normal pgd %p\n", addr,
+			 (void *)pgd_val(pgd));
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(pgd))
+			goto slow;
+		if (is_hugepd(pgdp)) {
+			if (!gup_hugepd((hugepd_t *)pgdp, PGDIR_SHIFT,
+					addr, next, write, pages, &nr))
 				goto slow;
-		} while (pgdp++, addr = next, addr != end);
-	}
+		} else if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
+			goto slow;
+	} while (pgdp++, addr = next, addr != end);
+
 	local_irq_enable();
 
 	VM_BUG_ON(nr != (end - start) >> PAGE_SHIFT);
Index: working-2.6/arch/powerpc/kernel/perf_callchain.c
===================================================================
--- working-2.6.orig/arch/powerpc/kernel/perf_callchain.c	2009-09-15 16:03:08.000000000 +1000
+++ working-2.6/arch/powerpc/kernel/perf_callchain.c	2009-09-15 16:03:36.000000000 +1000
@@ -119,13 +119,6 @@ static void perf_callchain_kernel(struct
 }
 
 #ifdef CONFIG_PPC64
-
-#ifdef CONFIG_HUGETLB_PAGE
-#define is_huge_psize(pagesize)	(HPAGE_SHIFT && mmu_huge_psizes[pagesize])
-#else
-#define is_huge_psize(pagesize)	0
-#endif
-
 /*
  * On 64-bit we don't want to invoke hash_page on user addresses from
  * interrupt context, so if the access faults, we read the page tables
@@ -135,7 +128,7 @@ static int read_user_stack_slow(void __u
 {
 	pgd_t *pgdir;
 	pte_t *ptep, pte;
-	int pagesize;
+	unsigned shift;
 	unsigned long addr = (unsigned long) ptr;
 	unsigned long offset;
 	unsigned long pfn;
@@ -145,17 +138,14 @@ static int read_user_stack_slow(void __u
 	if (!pgdir)
 		return -EFAULT;
 
-	pagesize = get_slice_psize(current->mm, addr);
+	ptep = find_linux_pte_or_hugepte(pgdir, addr, &shift);
+	if (!shift)
+		shift = PAGE_SHIFT;
 
 	/* align address to page boundary */
-	offset = addr & ((1ul << mmu_psize_defs[pagesize].shift) - 1);
+	offset = addr & ((1UL << shift) - 1);
 	addr -= offset;
 
-	if (is_huge_psize(pagesize))
-		ptep = huge_pte_offset(current->mm, addr);
-	else
-		ptep = find_linux_pte(pgdir, addr);
-
 	if (ptep == NULL)
 		return -EFAULT;
 	pte = *ptep;
Index: working-2.6/arch/powerpc/mm/hash_utils_64.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/hash_utils_64.c	2009-09-15 16:03:07.000000000 +1000
+++ working-2.6/arch/powerpc/mm/hash_utils_64.c	2009-09-15 16:03:36.000000000 +1000
@@ -891,6 +891,7 @@ int hash_page(unsigned long ea, unsigned
 	unsigned long vsid;
 	struct mm_struct *mm;
 	pte_t *ptep;
+	unsigned hugeshift;
 	const struct cpumask *tmp;
 	int rc, user_region = 0, local = 0;
 	int psize, ssize;
@@ -943,14 +944,6 @@ int hash_page(unsigned long ea, unsigned
 	if (user_region && cpumask_equal(mm_cpumask(mm), tmp))
 		local = 1;
 
-#ifdef CONFIG_HUGETLB_PAGE
-	/* Handle hugepage regions */
-	if (HPAGE_SHIFT && mmu_huge_psizes[psize]) {
-		DBG_LOW(" -> huge page !\n");
-		return hash_huge_page(mm, access, ea, vsid, local, trap);
-	}
-#endif /* CONFIG_HUGETLB_PAGE */
-
 #ifndef CONFIG_PPC_64K_PAGES
 	/* If we use 4K pages and our psize is not 4K, then we are hitting
 	 * a special driver mapping, we need to align the address before
@@ -961,12 +954,18 @@ int hash_page(unsigned long ea, unsigned
 #endif /* CONFIG_PPC_64K_PAGES */
 
 	/* Get PTE and page size from page tables */
-	ptep = find_linux_pte(pgdir, ea);
+	ptep = find_linux_pte_or_hugepte(pgdir, ea, &hugeshift);
 	if (ptep == NULL || !pte_present(*ptep)) {
 		DBG_LOW(" no PTE !\n");
 		return 1;
 	}
 
+#ifdef CONFIG_HUGETLB_PAGE
+	if (hugeshift)
+		return __hash_page_huge(ea, access, vsid, ptep, trap, local,
+					ssize, hugeshift, psize);
+#endif /* CONFIG_HUGETLB_PAGE */
+
 #ifndef CONFIG_PPC_64K_PAGES
 	DBG_LOW(" i-pte: %016lx\n", pte_val(*ptep));
 #else
Index: working-2.6/arch/powerpc/include/asm/mmu-hash64.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/mmu-hash64.h	2009-09-15 16:03:07.000000000 +1000
+++ working-2.6/arch/powerpc/include/asm/mmu-hash64.h	2009-09-15 16:03:36.000000000 +1000
@@ -173,14 +173,6 @@ extern unsigned long tce_alloc_start, tc
  */
 extern int mmu_ci_restrictions;
 
-#ifdef CONFIG_HUGETLB_PAGE
-/*
- * The page size indexes of the huge pages for use by hugetlbfs
- */
-extern unsigned int mmu_huge_psizes[MMU_PAGE_COUNT];
-
-#endif /* CONFIG_HUGETLB_PAGE */
-
 /*
  * This function sets the AVPN and L fields of the HPTE  appropriately
  * for the page size
@@ -254,9 +246,9 @@ extern int __hash_page_64K(unsigned long
 			   unsigned int local, int ssize);
 struct mm_struct;
 extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
-extern int hash_huge_page(struct mm_struct *mm, unsigned long access,
-			  unsigned long ea, unsigned long vsid, int local,
-			  unsigned long trap);
+int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
+		     pte_t *ptep, unsigned long trap, int local, int ssize,
+		     unsigned int shift, unsigned int mmu_psize);
 
 extern int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
 			     unsigned long pstart, unsigned long prot,

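For readers following the hugepd changes in this patch: the hugetlb.h hunk adds hugepd_t, a directory entry that hugepd_ok() recognises by being positive, and pgtable_cache_add() now aligns every table so the low HUGEPD_SHIFT_MASK bits of its address are free to carry a shift value. The C sketch below is a user-space illustration of that packing, not the kernel code. It assumes the top bit of a kernel table address (0xc000... region) is cleared when the entry is stored, so a hugepd reads as positive while an ordinary table pointer reads as negative. SKETCH_PD_HUGE and all sketch_* names are hypothetical, and the shift of 24 in the test is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/* From the patch: the low 6 bits of a hugepd hold the shift. */
#define HUGEPD_SHIFT_MASK 0x3fULL
/* Assumed marker: the top bit of a kernel table address, cleared when
 * the entry is stored so that a hugepd reads as a positive value. */
#define SKETCH_PD_HUGE    0x8000000000000000ULL

/* Hypothetical stand-in for hugepd_t (the patch uses a signed long). */
typedef struct { uint64_t pd; } sketch_hugepd_t;

/* Same result as the patch's "pd > 0" on a two's-complement word:
 * non-zero with the sign bit clear. An ordinary table pointer in the
 * 0xc000... region has the sign bit set and fails this test. */
static int sketch_hugepd_ok(sketch_hugepd_t hpd)
{
	return hpd.pd != 0 && (hpd.pd & SKETCH_PD_HUGE) == 0;
}

/* Pack a table address and a huge page shift into one word. */
static sketch_hugepd_t sketch_hugepd_make(uint64_t table, unsigned shift)
{
	sketch_hugepd_t hpd;

	/* pgtable_cache_add() aligns tables to at least
	 * HUGEPD_SHIFT_MASK + 1 bytes, so these low bits are zero. */
	assert((table & HUGEPD_SHIFT_MASK) == 0);
	assert(shift <= HUGEPD_SHIFT_MASK);
	hpd.pd = (table & ~SKETCH_PD_HUGE) | shift;
	return hpd;
}

/* Extract the shift stored in the low bits. */
static unsigned sketch_hugepd_shift(sketch_hugepd_t hpd)
{
	return (unsigned)(hpd.pd & HUGEPD_SHIFT_MASK);
}

/* Undo the packing: drop the shift and restore the top address bit. */
static uint64_t sketch_hugepd_table(sketch_hugepd_t hpd)
{
	return (hpd.pd & ~HUGEPD_SHIFT_MASK) | SKETCH_PD_HUGE;
}
```

The alignment requirement is the reason for the max() in pgtable_cache_add(): the same low bits must be free both for the RCU index-size batching and for the hugepd shift.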

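The rewritten size-setup loop in this patch picks the page-table level that holds each huge page size's directory (the lowest level whose entries span more than one huge page) and stores pdshift minus the page shift in mmu_huge_psizes[]. A minimal sketch of that arithmetic, assuming the level shifts are passed in rather than read from the kernel headers; struct sketch_levels and sketch_huge_index_bits() are hypothetical names, and the underflow guard is an addition not present in the patch.

```c
#include <assert.h>

/* Hypothetical container for the page-table level shifts; pass the
 * real PMD_SHIFT, PUD_SHIFT and PGDIR_SHIFT of the configuration. */
struct sketch_levels {
	unsigned pmd_shift;
	unsigned pud_shift;
	unsigned pgdir_shift;
};

/* Same selection as the patched loop: choose the lowest level whose
 * entry covers a range strictly larger than the huge page. Returns the
 * number of index bits in the hugepte table (the value the patch keeps
 * in mmu_huge_psizes[]) and, optionally, the chosen level shift. */
static unsigned sketch_huge_index_bits(const struct sketch_levels *lv,
				       unsigned page_shift,
				       unsigned *pdshift_out)
{
	unsigned pdshift;

	if (page_shift < lv->pmd_shift)
		pdshift = lv->pmd_shift;
	else if (page_shift < lv->pud_shift)
		pdshift = lv->pud_shift;
	else
		pdshift = lv->pgdir_shift;

	/* Added guard, not in the patch: the subtraction must not wrap. */
	assert(page_shift < pdshift);
	if (pdshift_out)
		*pdshift_out = pdshift;
	return pdshift - page_shift;
}
```

The test values 21, 28 and 35 are an assumption for a 4K-base-page ppc64 layout (check pgtable-ppc64-4k.h for the real shifts). With them a 16M page (shift 24) gets a PUD-level directory and a 4-bit hugepte index, and a 16G page lands at the PGD level, matching the cases in the removed switch statement.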
* [5/5] Split hash MMU specific hugepage code into a new file
From: David Gibson @ 2009-09-15  6:43 UTC
  To: linuxppc-dev, Benjamin Herrenschmidt
In-Reply-To: <20090915064133.GA11621@yookeroo.seuss>

This patch separates the parts of hugetlbpage.c which are inherently
specific to the hash MMU into a new hugetlbpage-hash64.c file.

Signed-off-by: David Gibson <dwg@au1.ibm.com>

---
 arch/powerpc/include/asm/hugetlb.h   |    3 
 arch/powerpc/mm/Makefile             |    5 -
 arch/powerpc/mm/hugetlbpage-hash64.c |  167 ++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/hugetlbpage.c        |  168 -----------------------------------
 4 files changed, 176 insertions(+), 167 deletions(-)

Index: working-2.6/arch/powerpc/mm/Makefile
===================================================================
--- working-2.6.orig/arch/powerpc/mm/Makefile	2009-08-14 16:07:54.000000000 +1000
+++ working-2.6/arch/powerpc/mm/Makefile	2009-09-09 15:24:33.000000000 +1000
@@ -28,7 +28,10 @@ obj-$(CONFIG_44x)		+= 44x_mmu.o
 obj-$(CONFIG_FSL_BOOKE)		+= fsl_booke_mmu.o
 obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o
 obj-$(CONFIG_PPC_MM_SLICES)	+= slice.o
-obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
+ifeq ($(CONFIG_HUGETLB_PAGE),y)
+obj-y				+= hugetlbpage.o
+obj-$(CONFIG_PPC_STD_MMU_64)	+= hugetlbpage-hash64.o
+endif
 obj-$(CONFIG_PPC_SUBPAGE_PROT)	+= subpage-prot.o
 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
 obj-$(CONFIG_HIGHMEM)		+= highmem.o
Index: working-2.6/arch/powerpc/mm/hugetlbpage-hash64.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/arch/powerpc/mm/hugetlbpage-hash64.c	2009-09-09 15:25:35.000000000 +1000
@@ -0,0 +1,167 @@
+/*
+ * PPC64 Huge TLB Page Support for hash based MMUs (POWER4 and later)
+ *
+ * Copyright (C) 2003 David Gibson, IBM Corporation.
+ *
+ * Based on the IA-32 version:
+ * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/hugetlb.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/cacheflush.h>
+#include <asm/machdep.h>
+
+/*
+ * Called by asm hashtable.S for doing lazy icache flush
+ */
+static unsigned int hash_huge_page_do_lazy_icache(unsigned long rflags,
+					pte_t pte, int trap, unsigned long sz)
+{
+	struct page *page;
+	int i;
+
+	if (!pfn_valid(pte_pfn(pte)))
+		return rflags;
+
+	page = pte_page(pte);
+
+	/* page is dirty */
+	if (!test_bit(PG_arch_1, &page->flags) && !PageReserved(page)) {
+		if (trap == 0x400) {
+			for (i = 0; i < (sz / PAGE_SIZE); i++)
+				__flush_dcache_icache(page_address(page+i));
+			set_bit(PG_arch_1, &page->flags);
+		} else {
+			rflags |= HPTE_R_N;
+		}
+	}
+	return rflags;
+}
+
+int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
+		     pte_t *ptep, unsigned long trap, int local, int ssize,
+		     unsigned int shift, unsigned int mmu_psize)
+{
+	unsigned long old_pte, new_pte;
+	unsigned long va, rflags, pa, sz;
+	long slot;
+	int err = 1;
+
+	BUG_ON(shift != mmu_psize_defs[mmu_psize].shift);
+
+	/* Search the Linux page table for a match with va */
+	va = hpt_va(ea, vsid, ssize);
+
+	/*
+	 * Check the user's access rights to the page.  If access should be
+	 * prevented then send the problem up to do_page_fault.
+	 */
+	if (unlikely(access & ~pte_val(*ptep)))
+		goto out;
+	/*
+	 * At this point, we have a pte (old_pte) which can be used to build
+	 * or update an HPTE. There are 2 cases:
+	 *
+	 * 1. There is a valid (present) pte with no associated HPTE (this is
+	 *	the most common case)
+	 * 2. There is a valid (present) pte with an associated HPTE. The
+	 *	current values of the pp bits in the HPTE prevent access
+	 *	because we are doing software DIRTY bit management and the
+	 *	page is currently not DIRTY.
+	 */
+
+
+	do {
+		old_pte = pte_val(*ptep);
+		if (old_pte & _PAGE_BUSY)
+			goto out;
+		new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
+	} while(old_pte != __cmpxchg_u64((unsigned long *)ptep,
+					 old_pte, new_pte));
+
+	rflags = 0x2 | (!(new_pte & _PAGE_RW));
+ 	/* _PAGE_EXEC -> HW_NO_EXEC since it's inverted */
+	rflags |= ((new_pte & _PAGE_EXEC) ? 0 : HPTE_R_N);
+	sz = ((1UL) << shift);
+	if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
+		/* No CPU has hugepages but lacks no execute, so we
+		 * don't need to worry about that case */
+		rflags = hash_huge_page_do_lazy_icache(rflags, __pte(old_pte),
+						       trap, sz);
+
+	/* Check if pte already has an hpte (case 2) */
+	if (unlikely(old_pte & _PAGE_HASHPTE)) {
+		/* There MIGHT be an HPTE for this pte */
+		unsigned long hash, slot;
+
+		hash = hpt_hash(va, shift, ssize);
+		if (old_pte & _PAGE_F_SECOND)
+			hash = ~hash;
+		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+		slot += (old_pte & _PAGE_F_GIX) >> 12;
+
+		if (ppc_md.hpte_updatepp(slot, rflags, va, mmu_psize,
+					 ssize, local) == -1)
+			old_pte &= ~_PAGE_HPTEFLAGS;
+	}
+
+	if (likely(!(old_pte & _PAGE_HASHPTE))) {
+		unsigned long hash = hpt_hash(va, shift, ssize);
+		unsigned long hpte_group;
+
+		pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
+
+repeat:
+		hpte_group = ((hash & htab_hash_mask) *
+			      HPTES_PER_GROUP) & ~0x7UL;
+
+		/* clear HPTE slot informations in new PTE */
+#ifdef CONFIG_PPC_64K_PAGES
+		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0;
+#else
+		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE;
+#endif
+		/* Add in WIMG bits */
+		rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
+				      _PAGE_COHERENT | _PAGE_GUARDED));
+
+		/* Insert into the hash table, primary slot */
+		slot = ppc_md.hpte_insert(hpte_group, va, pa, rflags, 0,
+					  mmu_psize, ssize);
+
+		/* Primary is full, try the secondary */
+		if (unlikely(slot == -1)) {
+			hpte_group = ((~hash & htab_hash_mask) *
+				      HPTES_PER_GROUP) & ~0x7UL;
+			slot = ppc_md.hpte_insert(hpte_group, va, pa, rflags,
+						  HPTE_V_SECONDARY,
+						  mmu_psize, ssize);
+			if (slot == -1) {
+				if (mftb() & 0x1)
+					hpte_group = ((hash & htab_hash_mask) *
+						      HPTES_PER_GROUP)&~0x7UL;
+
+				ppc_md.hpte_remove(hpte_group);
+				goto repeat;
+                        }
+		}
+
+		if (unlikely(slot == -2))
+			panic("hash_huge_page: pte_insert failed\n");
+
+		new_pte |= (slot << 12) & (_PAGE_F_SECOND | _PAGE_F_GIX);
+	}
+
+	/*
+	 * No need to use ldarx/stdcx here
+	 */
+	*ptep = __pte(new_pte & ~_PAGE_BUSY);
+
+	err = 0;
+
+ out:
+	return err;
+}
Index: working-2.6/arch/powerpc/mm/hugetlbpage.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c	2009-09-09 15:22:49.000000000 +1000
+++ working-2.6/arch/powerpc/mm/hugetlbpage.c	2009-09-09 15:25:09.000000000 +1000
@@ -7,29 +7,17 @@
  * Copyright (C) 2002, Rohit Seth <rohit.seth@intel.com>
  */
 
-#include <linux/init.h>
-#include <linux/fs.h>
 #include <linux/mm.h>
+#include <linux/io.h>
 #include <linux/hugetlb.h>
-#include <linux/pagemap.h>
-#include <linux/slab.h>
-#include <linux/err.h>
-#include <linux/sysctl.h>
-#include <asm/mman.h>
+#include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/tlb.h>
-#include <asm/tlbflush.h>
-#include <asm/mmu_context.h>
-#include <asm/machdep.h>
-#include <asm/cputable.h>
-#include <asm/spu.h>
 
 #define PAGE_SHIFT_64K	16
 #define PAGE_SHIFT_16M	24
 #define PAGE_SHIFT_16G	34
 
-#define NUM_LOW_AREAS	(0x100000000UL >> SID_SHIFT)
-#define NUM_HIGH_AREAS	(PGTABLE_RANGE >> HTLB_AREA_SHIFT)
 #define MAX_NUMBER_GPAGES	1024
 
 /* Tracks the 16G pages after the device tree is scanned and before the
@@ -507,158 +495,6 @@ unsigned long vma_mmu_pagesize(struct vm
 	return 1UL << mmu_psize_to_shift(psize);
 }
 
-/*
- * Called by asm hashtable.S for doing lazy icache flush
- */
-static unsigned int hash_huge_page_do_lazy_icache(unsigned long rflags,
-					pte_t pte, int trap, unsigned long sz)
-{
-	struct page *page;
-	int i;
-
-	if (!pfn_valid(pte_pfn(pte)))
-		return rflags;
-
-	page = pte_page(pte);
-
-	/* page is dirty */
-	if (!test_bit(PG_arch_1, &page->flags) && !PageReserved(page)) {
-		if (trap == 0x400) {
-			for (i = 0; i < (sz / PAGE_SIZE); i++)
-				__flush_dcache_icache(page_address(page+i));
-			set_bit(PG_arch_1, &page->flags);
-		} else {
-			rflags |= HPTE_R_N;
-		}
-	}
-	return rflags;
-}
-
-int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
-		     pte_t *ptep, unsigned long trap, int local, int ssize,
-		     unsigned int shift, unsigned int mmu_psize)
-{
-	unsigned long old_pte, new_pte;
-	unsigned long va, rflags, pa, sz;
-	long slot;
-	int err = 1;
-
-	BUG_ON(shift != mmu_psize_defs[mmu_psize].shift);
-
-	/* Search the Linux page table for a match with va */
-	va = hpt_va(ea, vsid, ssize);
-
-	/* 
-	 * Check the user's access rights to the page.  If access should be
-	 * prevented then send the problem up to do_page_fault.
-	 */
-	if (unlikely(access & ~pte_val(*ptep)))
-		goto out;
-	/*
-	 * At this point, we have a pte (old_pte) which can be used to build
-	 * or update an HPTE. There are 2 cases:
-	 *
-	 * 1. There is a valid (present) pte with no associated HPTE (this is 
-	 *	the most common case)
-	 * 2. There is a valid (present) pte with an associated HPTE. The
-	 *	current values of the pp bits in the HPTE prevent access
-	 *	because we are doing software DIRTY bit management and the
-	 *	page is currently not DIRTY. 
-	 */
-
-
-	do {
-		old_pte = pte_val(*ptep);
-		if (old_pte & _PAGE_BUSY)
-			goto out;
-		new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
-	} while(old_pte != __cmpxchg_u64((unsigned long *)ptep,
-					 old_pte, new_pte));
-
-	rflags = 0x2 | (!(new_pte & _PAGE_RW));
- 	/* _PAGE_EXEC -> HW_NO_EXEC since it's inverted */
-	rflags |= ((new_pte & _PAGE_EXEC) ? 0 : HPTE_R_N);
-	sz = ((1UL) << shift);
-	if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
-		/* No CPU has hugepages but lacks no execute, so we
-		 * don't need to worry about that case */
-		rflags = hash_huge_page_do_lazy_icache(rflags, __pte(old_pte),
-						       trap, sz);
-
-	/* Check if pte already has an hpte (case 2) */
-	if (unlikely(old_pte & _PAGE_HASHPTE)) {
-		/* There MIGHT be an HPTE for this pte */
-		unsigned long hash, slot;
-
-		hash = hpt_hash(va, shift, ssize);
-		if (old_pte & _PAGE_F_SECOND)
-			hash = ~hash;
-		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-		slot += (old_pte & _PAGE_F_GIX) >> 12;
-
-		if (ppc_md.hpte_updatepp(slot, rflags, va, mmu_psize,
-					 ssize, local) == -1)
-			old_pte &= ~_PAGE_HPTEFLAGS;
-	}
-
-	if (likely(!(old_pte & _PAGE_HASHPTE))) {
-		unsigned long hash = hpt_hash(va, shift, ssize);
-		unsigned long hpte_group;
-
-		pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
-
-repeat:
-		hpte_group = ((hash & htab_hash_mask) *
-			      HPTES_PER_GROUP) & ~0x7UL;
-
-		/* clear HPTE slot informations in new PTE */
-#ifdef CONFIG_PPC_64K_PAGES
-		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0;
-#else
-		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE;
-#endif
-		/* Add in WIMG bits */
-		rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
-				      _PAGE_COHERENT | _PAGE_GUARDED));
-
-		/* Insert into the hash table, primary slot */
-		slot = ppc_md.hpte_insert(hpte_group, va, pa, rflags, 0,
-					  mmu_psize, ssize);
-
-		/* Primary is full, try the secondary */
-		if (unlikely(slot == -1)) {
-			hpte_group = ((~hash & htab_hash_mask) *
-				      HPTES_PER_GROUP) & ~0x7UL; 
-			slot = ppc_md.hpte_insert(hpte_group, va, pa, rflags,
-						  HPTE_V_SECONDARY,
-						  mmu_psize, ssize);
-			if (slot == -1) {
-				if (mftb() & 0x1)
-					hpte_group = ((hash & htab_hash_mask) *
-						      HPTES_PER_GROUP)&~0x7UL;
-
-				ppc_md.hpte_remove(hpte_group);
-				goto repeat;
-                        }
-		}
-
-		if (unlikely(slot == -2))
-			panic("hash_huge_page: pte_insert failed\n");
-
-		new_pte |= (slot << 12) & (_PAGE_F_SECOND | _PAGE_F_GIX);
-	}
-
-	/*
-	 * No need to use ldarx/stdcx here
-	 */
-	*ptep = __pte(new_pte & ~_PAGE_BUSY);
-
-	err = 0;
-
- out:
-	return err;
-}
-
 static int __init add_huge_page_size(unsigned long long size)
 {
 	int shift = __ffs(size);
Index: working-2.6/arch/powerpc/include/asm/hugetlb.h
===================================================================
--- working-2.6.orig/arch/powerpc/include/asm/hugetlb.h	2009-09-09 15:15:12.000000000 +1000
+++ working-2.6/arch/powerpc/include/asm/hugetlb.h	2009-09-09 15:24:33.000000000 +1000
@@ -13,6 +13,9 @@ static inline int hugepd_ok(hugepd_t hpd
 #define is_hugepd(pdep)               (hugepd_ok(*((hugepd_t *)(pdep))))
 #define HUGEPD_SHIFT_MASK     0x3f
 
+pte_t *huge_pte_offset_and_shift(struct mm_struct *mm,
+				 unsigned long addr, unsigned *shift);
+
 int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len);
 

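__hash_page_huge(), now in hugetlbpage-hash64.c, locates HPTE groups from the hash value (the primary group from hash, the secondary from ~hash) and records the slot chosen by hpte_insert() in the _PAGE_F_SECOND and _PAGE_F_GIX bits of the Linux PTE. The sketch below reproduces only that index arithmetic. The constant values are assumptions: 8 HPTEs per group, a 3-bit group index at bit 12 with the secondary bit just above it, and an hpte_insert() return value whose bit 3 flags the secondary group. All sketch_* names are hypothetical.

```c
#include <assert.h>

/* Assumed constants mirroring the names used in the patch; the bit
 * positions follow "(slot << 12) & (_PAGE_F_SECOND | _PAGE_F_GIX)". */
#define SKETCH_HPTES_PER_GROUP 8UL
#define SKETCH_PAGE_F_GIX      0x7000UL
#define SKETCH_PAGE_F_SECOND   0x8000UL

/* First HPTE index of the primary (hash) or secondary (~hash) group. */
static unsigned long sketch_hpte_group(unsigned long hash,
				       unsigned long htab_hash_mask,
				       int secondary)
{
	unsigned long h = secondary ? ~hash : hash;

	return ((h & htab_hash_mask) * SKETCH_HPTES_PER_GROUP) & ~0x7UL;
}

/* Store the slot returned by hpte_insert() in the Linux PTE: bits 0-2
 * are the entry within the group, bit 3 marks the secondary group.
 * Failure codes (-1, -2) are handled before this point in the patch. */
static unsigned long sketch_record_slot(unsigned long pte, long slot)
{
	assert(slot >= 0 && slot <= 0xf);
	return pte | (((unsigned long)slot << 12) &
		      (SKETCH_PAGE_F_SECOND | SKETCH_PAGE_F_GIX));
}

/* Recover the global HPTE slot from the PTE, as the _PAGE_HASHPTE
 * branch does before calling hpte_updatepp(). */
static unsigned long sketch_slot_from_pte(unsigned long pte,
					  unsigned long hash,
					  unsigned long htab_hash_mask)
{
	if (pte & SKETCH_PAGE_F_SECOND)
		hash = ~hash;
	return (hash & htab_hash_mask) * SKETCH_HPTES_PER_GROUP +
	       ((pte & SKETCH_PAGE_F_GIX) >> 12);
}
```

Recording slot 0xb (secondary group, entry 3) and decoding it again yields the first entry of the ~hash group plus 3, which is how the existing HPTE is found again on a later fault.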

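hash_huge_page_do_lazy_icache(), moved unchanged into the new file, decides what to do with a huge page whose instruction cache may be stale. A sketch of just that decision as a pure function, with hypothetical names: "clean" stands in for PG_arch_1 being set, the pfn_valid() check is left out, and 0x400 is the PowerPC instruction storage interrupt vector. A data access to a stale page maps it no-execute (HPTE_R_N), so that a later instruction fetch would take the 0x400 path and perform the flush.

```c
#include <assert.h>

/* Hypothetical outcome of the decision made for one huge page. */
enum sketch_icache_action {
	SKETCH_ICACHE_NOTHING,	/* rflags left unchanged */
	SKETCH_ICACHE_FLUSH,	/* flush each base page, mark page clean */
	SKETCH_ICACHE_NOEXEC	/* map no-execute (HPTE_R_N) for now */
};

/* Decision logic of hash_huge_page_do_lazy_icache(); "clean" means
 * PG_arch_1 is set, i.e. the icache is already known to be coherent. */
static enum sketch_icache_action sketch_lazy_icache(int clean, int reserved,
						    unsigned long trap)
{
	if (clean || reserved)
		return SKETCH_ICACHE_NOTHING;
	if (trap == 0x400)	/* instruction storage interrupt */
		return SKETCH_ICACHE_FLUSH;
	return SKETCH_ICACHE_NOEXEC;
}

/* Number of __flush_dcache_icache() calls for a huge page of size sz. */
static unsigned long sketch_flush_count(unsigned long sz,
					unsigned long base_page_size)
{
	assert(base_page_size != 0 && sz % base_page_size == 0);
	return sz / base_page_size;
}
```

For example, a 16M huge page on a 64K base page size means the flush loop runs sz / PAGE_SIZE = 256 times.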
* Re: [git pull] Please pull powerpc.git next branch
From: Benjamin Herrenschmidt @ 2009-09-15  7:30 UTC
  To: Linus Torvalds
  Cc: linuxppc-dev list, Andrew Morton, Ingo Molnar, Linux Kernel list,
	FUJITA Tomonori
In-Reply-To: <1252653473.8566.98.camel@pasglop>

On Fri, 2009-09-11 at 17:18 +1000, Benjamin Herrenschmidt wrote:
> Hi Linus !
> 
> This is the powerpc batch for 2.6.32.
> 
> You will notice a bunch of generic swiotlb changes along with
> corresponding changes to arch/sparc and arch/x86 from Fujita Tomonori.
> 
> These are due to my tree having pulled Ingo's iommu tree to sort out
> various dependencies. If you pull Ingo's first, you'll already have
> all these.
>
> I'm happy for you to defer the pulling of my tree until you have those
> bits via Ingo if you prefer, and I'll then send a pull request cleared
> of that noise.

So you pulled from Ingo, and now gitk or git log origin/master..powerpc/next
only shows my new stuff, but git shortlog and git diff --stat still list
everything from the merge base, including Fujita's stuff...

So I don't know at this stage how to generate a "clean" pull request...
In any case, the tree is still there waiting for you to pull. It seems
to merge cleanly tonight, though I haven't yet had a chance to test the
result much.

Cheers,
Ben.

> Among other non-arch/powerpc patches: Some PS3 and PowerMac specific
> driver changes (yes, I think we are still the only users of
> generic_nvram), some changes to HVC console and that should be it.
> 
> Cheers,
> Ben.
> 
> The following changes since commit e07cccf4046978df10f2e13fe2b99b2f9b3a65db:
>   Linus Torvalds (1):
>         Linux 2.6.31-rc9
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git next
> 
> Anton Blanchard (3):
>       powerpc: Move 64bit VDSO to improve context switch performance
>       powerpc: Rearrange SLB preload code
>       powerpc: Preload application text segment instead of TASK_UNMAPPED_BASE
> 
> Anton Vorontsov (7):
>       powerpc/83xx: Add support for MPC8377E-WLAN boards
>       powerpc/85xx: Add support for I2C EEPROMs on MPC8548CDS boards
>       powerpc/83xx: Add eSDHC support for MPC837xE-RDB/WLAN boards
>       powerpc/85xx: Add eSDHC support for MPC8536DS boards
>       powerpc/82xx: Fix BCSR bits for MPC8272ADS boards
>       powerpc/82xx: Add CPM USB Gadget support for MPC8272ADS boards
>       powerpc/85xx: Add QE USB support for MPC8569E-MDS boards
> 
> Arnd Bergmann (1):
>       dma-ops: Remove flush_write_buffers() in dma-mapping-common.h
> 
> Bastian Blank (1):
>       powerpc: Remove SMP warning from PowerMac cpufreq
> 
> Becky Bruce (1):
>       powerpc: Name xpn & x fields in HW Hash PTE format
> 
> Benjamin Herrenschmidt (37):
>       powerpc: Rename exception.h to exception-64s.h
>       powerpc: Use names rather than numbers for SPRGs (v2)
>       powerpc: Remove use of a second scratch SPRG in STAB code
>       powerpc/mm: Fix definitions of FORCE_MAX_ZONEORDER in Kconfig
>       powerpc/pmac: Fix PowerSurge SMP IPI allocation
>       powerpc: Change PACA from SPRG3 to SPRG1
>       powerpc: Add compat_sys_truncate
>       powerpc/mm: Fix misplaced #endif in pgtable-ppc64-64k.h
>       powerpc/of: Remove useless register save/restore when calling OF back
>       powerpc/mm: Add HW threads support to no_hash TLB management
>       powerpc/mm: Add opcode definitions for tlbivax and tlbsrx.
>       powerpc/mm: Add more bit definitions for Book3E MMU registers
>       powerpc/mm: Add support for early ioremap on non-hash 64-bit processors
>       powerpc: Modify some ppc_asm.h macros to accomodate 64-bits Book3E
>       powerpc/mm: Make low level TLB flush ops on BookE take additional args
>       powerpc/mm: Call mmu_context_init() from ppc64
>       powerpc: Clean ifdef usage in copy_thread()
>       powerpc: Move definitions of secondary CPU spinloop to header file
>       powerpc/mm: Rework & cleanup page table freeing code path
>       powerpc: Add SPR definitions for new 64-bit BookE
>       powerpc: Add memory management headers for new 64-bit BookE
>       powerpc: Add definitions used by exception handling on 64-bit Book3E
>       powerpc: Add PACA fields specific to 64-bit Book3E processors
>       powerpc/mm: Move around mmu_gathers definition on 64-bit
>       powerpc: Add TLB management code for 64-bit Book3E
>       powerpc/mm: Add support for SPARSEMEM_VMEMMAP on 64-bit Book3E
>       powerpc: Remaining 64-bit Book3E support
>       powerpc/mm: Fix encoding of page table cache numbers
>       Merge commit 'paulus-perf/master' into next
>       Merge commit 'origin/master' into next
>       powerpc/mm: Cleanup handling of execute permission
>       Merge commit 'kumar/next' into next
>       Merge commit 'tip/iommu-for-powerpc' into next
>       powerpc: Properly start decrementer on BookE secondary CPUs
>       powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
>       powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
>       powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
> 
> Benjamin Krill (1):
>       powerpc/prom_init: Evaluate mem kernel parameter for early allocation
> 
> Brian King (1):
>       powerpc/pseries: Fix to handle slb resize across migration
> 
> Casey Dahlin (1):
>       lib/swiotlb.c: Fix strange panic message selection logic when swiotlb fills up
> 
> Christoph Hellwig (2):
>       powerpc/sputrace: Use the generic event tracer
>       powerpc: Switch to asm-generic/hardirq.h
> 
> FUJITA Tomonori (29):
>       swiotlb: remove unused swiotlb_alloc_boot()
>       swiotlb: remove unused swiotlb_alloc()
>       swiotlb: remove swiotlb_arch_range_needs_mapping
>       swiotlb: remove unnecessary swiotlb_bus_to_virt
>       x86: add dma_capable() to replace is_buffer_dma_capable()
>       x86: replace is_buffer_dma_capable() with dma_capable
>       ia64: add dma_capable() to replace is_buffer_dma_capable()
>       powerpc: add dma_capable() to replace is_buffer_dma_capable()
>       swiotlb: use dma_capable()
>       powerpc: remove unncesary swiotlb_arch_address_needs_mapping
>       remove is_buffer_dma_capable()
>       x86, IA64, powerpc: add phys_to_dma() and dma_to_phys()
>       swiotlb: use phys_to_dma and dma_to_phys
>       powerpc: remove unused swiotlb_phys_to_bus() and swiotlb_bus_to_phys()
>       x86: remove unused swiotlb_phys_to_bus() and swiotlb_bus_to_phys()
>       IA64: Remove NULL flush_write_buffers
>       sparc: Use dma_map_ops struct
>       sparc: Use asm-generic/dma-mapping-common.h
>       sparc: Remove no-op dma_4v_sync_single_for_cpu and dma_4v_sync_sg_for_cpu
>       sparc: Replace sbus_map_single and sbus_unmap_single with sbus_map_page and sbus_unmap_page
>       sparc: Use asm-generic/pci-dma-compat
>       sparc: Add CONFIG_DMA_API_DEBUG support
>       powerpc: Remove addr_needs_map in struct dma_mapping_ops
>       powerpc: Remove swiotlb_pci_dma_ops
>       dma: Add set_dma_mask hook to struct dma_map_ops
>       powerpc: use dma_map_ops struct
>       powerpc: Use asm-generic/dma-mapping-common.h
>       powerpc: Handle SWIOTLB mapping error properly
>       powerpc: Add CONFIG_DMA_API_DEBUG support
> 
> Frans Pop (1):
>       powerpc: Makefile simplification through use of cc-ifversion
> 
> Gautham R Shenoy (1):
>       powerpc/pseries: Reduce the polling interval in __cpu_up()
> 
> Geert Uytterhoeven (1):
>       powerpc/cell: Move CBE_IOPTE_* to <asm/cell-regs.h>
> 
> Geoff Levand (1):
>       powerpc/ps3: Workaround for flash memory I/O error
> 
> Geoff Thorpe (1):
>       powerpc: expose the multi-bit ops that underlie single-bit ops.
> 
> Gerhard Pircher (1):
>       powerpc/amigaone: Convert amigaone_init() to a machine_device_initcall()
> 
> Grant Likely (3):
>       powerpc/pci: Remove dead checks for CONFIG_PPC_OF
>       powerpc/pci: move pci_64.c device tree scanning code into pci-common.c
>       powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
> 
> Heiko Schocher (2):
>       powerpc/82xx: mgcoge - updates for 2.6.32
>       powerpc/82xx: mgcoge - updated defconfig
> 
> Josh Boyer (1):
>       powerpc: Fix __flush_icache_range on 44x
> 
> Julia Lawall (5):
>       powerpc/fsl_rio: Add kmalloc NULL tests
>       powerpc/ipic: introduce missing kfree
>       powerpc/qe: introduce missing kfree
>       hvc_console: Drop unnecessary NULL test
>       powerpc: Use DIV_ROUND_CLOSEST in time init code
> 
> Kumar Gala (14):
>       powerpc/mm: Fix switch_mmu_context to iterate of the proper list of cpus
>       powerpc/85xx: Move mpc8536ds.dts to address-cells/size-cells = <2>
>       powerpc/85xx: Added 36-bit physical device tree for mpc8536ds board
>       powerpc/mm: Fix assert_pte_locked to work properly on uniprocessor
>       powerpc/booke: Move MMUCSR definition into mmu-book3e.h
>       powerpc/mm: Add MMU features for TLB reservation & Paired MAS registers
>       powerpc/book3e-64: Move the default cpu table entry
>       powerpc/book3e-64: Wait til generic_calibrate_decr to enable decrementer
>       powerpc/book3e-64: Add helper function to setup IVORs
>       powerpc/book3e-64: Add support to initial_tlb_book3e for non-HES TLB
>       powerpc/pci: Pull ppc32 PCI features into common
>       powerpc/book3e: Add missing page sizes
>       powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
>       powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
> 
> Liang Li (4):
>       powerpc/83xx: Remove second USB node from SBC834x DTS
>       powerpc/83xx: Add localbus node and MTD partitions for SBC834x
>       powerpc/83xx: Fix incorrect PCI interrupt map in SBC834x DTS
>       powerpc/85xx: sbc8560 - Fix warm reboot with board specific reset function
> 
> Lucian Adrian Grijincu (1):
>       powerpc: Update boot wrapper script with the new location of dtc
> 
> Lyonel Vincent (1):
>       powerpc/powermac: Thermal control turns system off too eagerly
> 
> Martyn Welch (5):
>       powerpc/86xx: Correct reading of information presented in cpuinfo
>       powerpc/86xx: Enable XMC site on GE Fanuc SBC310
>       powerpc/86xx: Update GE Fanuc sbc310 DTS
>       powerpc/nvram: Allow byte length reads from mmio NVRAM driver
>       powerpc/nvram: Enable use Generic NVRAM driver for different size chips
> 
> Michael Barkowski (1):
>       powerpc/qe_lib: Set gpio data before changing the direction to output
> 
> Michael Ellerman (4):
>       powerpc/mpic: Fix MPIC_BROKEN_REGREAD on non broken MPICs
>       kmemleak: Allow kmemleak to be built on powerpc
>       powerpc: Enable GCOV
>       powerpc/vmlinux.lds: Move _edata down
> 
> Michael Wolf (1):
>       powerpc: Adjust base and index registers in Altivec macros
> 
> Michel Dänzer (2):
>       agp/uninorth: Allow larger aperture sizes on pre-U3 bridges.
>       agp/uninorth: Simplify cache flushing.
> 
> Paul Gortmaker (4):
>       powerpc/83xx: sbc8349 - update defconfig, enable MTD, USB storage
>       powerpc/85xx: issue fsl_soc reboot warning only when applicable
>       powerpc/85xx: sbc8560 - remove "has-rstcr" from global utilities block
>       powerpc: derive COMMAND_LINE_SIZE from asm-generic
> 
> Paul Mackerras (5):
>       powerpc/32: Always order writes to halves of 64-bit PTEs
>       powerpc: Allow perf_counters to access user memory at interrupt time
>       perf_counter: powerpc: Add callchain support
>       powerpc: Fix bug where perf_counters breaks oprofile
>       powerpc/perf_counters: Reduce stack usage of power_check_constraints
> 
> Peter Huewe (1):
>       hvc_console: Add __init and __exit to hvc_vio
> 
> Poonam Aggrwal (1):
>       powerpc/85xx: Add support for P2020RDB board
> 
> Roel Kluin (2):
>       powerpc/fsl-booke: read buffer overflow
>       powerpc/hvsi: Avoid calculating possibly-invalid address
> 
> Sebastian Andrzej Siewior (1):
>       powerpc/ipic: unmask all interrupt sources
> 
> Solomon Peachy (1):
>       powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
> 
> Stefan Roese (7):
>       powerpc: Add AMCC 460EX/460GT Rev. B support to cputable.c
>       powerpc/44x: Add NAND support to Canyonlands dts
>       powerpc/40x: Update Kilauea dts to support NAND, RTC and HWMON
>       powerpc/44x: Update Canyonlands defconfig to support NOR, NAND and RTC
>       powerpc/40x: Update kilauea defconfig to support NAND, RTC and HWMON
>       powerpc/44x: Update Arches dts
>       powerpc/44x: Update Arches defconfig
> 
> Stephen Rothwell (1):
>       powerpc: use consistent types in mktree
> 
> Stoyan Gaydarov (1):
>       powerpc: ARRAY_SIZE changes in pasemi and powermac code
> 
> Tiejun Chen (2):
>       powerpc/405ex: provide necessary fixup function to support cuImage
>       powerpc/405ex: support cuImage via included dtb
> 
> Wolfram Sang (1):
>       powerpc/irq: Improve nanodoc
> 
> fkan@amcc.com (1):
>       powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
> 
> roel kluin (3):
>       powerpc/cell: Replace strncpy by strlcpy
>       powerpc: Missing tests for NULL after ioremap()
>       powerpc/macio: Don't the address of an array element before boundchecking
> 
>  arch/ia64/include/asm/dma-mapping.h                |   19 +-
>  arch/powerpc/Kconfig                               |   29 +-
>  arch/powerpc/Makefile                              |    2 +-
>  arch/powerpc/boot/4xx.c                            |  142 +++
>  arch/powerpc/boot/4xx.h                            |    1 +
>  arch/powerpc/boot/Makefile                         |    6 +-
>  arch/powerpc/boot/cuboot-hotfoot.c                 |  142 +++
>  arch/powerpc/boot/cuboot-kilauea.c                 |   49 +
>  arch/powerpc/boot/dcr.h                            |    4 +-
>  arch/powerpc/boot/dts/arches.dts                   |   50 +
>  arch/powerpc/boot/dts/canyonlands.dts              |   49 +-
>  arch/powerpc/boot/dts/eiger.dts                    |  421 +++++++
>  arch/powerpc/boot/dts/gef_sbc310.dts               |   64 +-
>  arch/powerpc/boot/dts/hotfoot.dts                  |  294 +++++
>  arch/powerpc/boot/dts/kilauea.dts                  |   44 +-
>  arch/powerpc/boot/dts/mgcoge.dts                   |   53 +
>  arch/powerpc/boot/dts/mpc8272ads.dts               |    8 +
>  arch/powerpc/boot/dts/mpc8377_rdb.dts              |    2 +-
>  arch/powerpc/boot/dts/mpc8377_wlan.dts             |  464 ++++++++
>  arch/powerpc/boot/dts/mpc8378_rdb.dts              |    2 +-
>  arch/powerpc/boot/dts/mpc8379_rdb.dts              |    2 +-
>  arch/powerpc/boot/dts/mpc8536ds.dts                |   40 +-
>  arch/powerpc/boot/dts/mpc8536ds_36b.dts            |  475 ++++++++
>  arch/powerpc/boot/dts/mpc8548cds.dts               |   20 +
>  arch/powerpc/boot/dts/mpc8569mds.dts               |   45 +
>  arch/powerpc/boot/dts/p2020rdb.dts                 |  586 +++++++++
>  arch/powerpc/boot/dts/sbc8349.dts                  |   60 +-
>  arch/powerpc/boot/dts/sbc8560.dts                  |    1 -
>  arch/powerpc/boot/mktree.c                         |   10 +-
>  arch/powerpc/boot/ppcboot-hotfoot.h                |  133 +++
>  arch/powerpc/boot/wrapper                          |    3 +-
>  arch/powerpc/configs/40x/kilauea_defconfig         |  298 ++++-
>  arch/powerpc/configs/44x/arches_defconfig          |  382 ++++++-
>  arch/powerpc/configs/44x/canyonlands_defconfig     |  350 +++++-
>  arch/powerpc/configs/44x/eiger_defconfig           | 1252 ++++++++++++++++++++
>  arch/powerpc/configs/83xx/sbc834x_defconfig        |  320 +++++-
>  arch/powerpc/configs/mgcoge_defconfig              |   86 ++-
>  arch/powerpc/configs/mpc85xx_defconfig             |    1 +
>  arch/powerpc/include/asm/bitops.h                  |  194 +---
>  arch/powerpc/include/asm/cell-regs.h               |   11 +
>  arch/powerpc/include/asm/cputhreads.h              |   16 +
>  arch/powerpc/include/asm/device.h                  |    7 +-
>  arch/powerpc/include/asm/dma-mapping.h             |  323 +-----
>  arch/powerpc/include/asm/exception-64e.h           |  205 ++++
>  .../include/asm/{exception.h => exception-64s.h}   |   25 +-
>  arch/powerpc/include/asm/hardirq.h                 |   30 +-
>  arch/powerpc/include/asm/hw_irq.h                  |    5 +
>  arch/powerpc/include/asm/iommu.h                   |   10 -
>  arch/powerpc/include/asm/irq.h                     |    7 +-
>  arch/powerpc/include/asm/machdep.h                 |    6 +-
>  arch/powerpc/include/asm/mmu-40x.h                 |    3 +
>  arch/powerpc/include/asm/mmu-44x.h                 |    6 +
>  arch/powerpc/include/asm/mmu-8xx.h                 |    3 +
>  arch/powerpc/include/asm/mmu-book3e.h              |  208 +++-
>  arch/powerpc/include/asm/mmu-hash32.h              |   16 +-
>  arch/powerpc/include/asm/mmu-hash64.h              |   22 +-
>  arch/powerpc/include/asm/mmu.h                     |   46 +
>  arch/powerpc/include/asm/mmu_context.h             |   15 +-
>  arch/powerpc/include/asm/nvram.h                   |    3 +
>  arch/powerpc/include/asm/paca.h                    |   23 +-
>  arch/powerpc/include/asm/page.h                    |    4 +
>  arch/powerpc/include/asm/page_64.h                 |   10 +
>  arch/powerpc/include/asm/pci-bridge.h              |   40 +-
>  arch/powerpc/include/asm/pci.h                     |   11 +-
>  arch/powerpc/include/asm/pgalloc.h                 |   46 +-
>  arch/powerpc/include/asm/pgtable-ppc32.h           |    9 +-
>  arch/powerpc/include/asm/pgtable-ppc64-64k.h       |    4 +-
>  arch/powerpc/include/asm/pgtable-ppc64.h           |   67 +-
>  arch/powerpc/include/asm/pgtable.h                 |    6 +-
>  arch/powerpc/include/asm/pmc.h                     |   16 +-
>  arch/powerpc/include/asm/ppc-opcode.h              |    6 +
>  arch/powerpc/include/asm/ppc-pci.h                 |    1 -
>  arch/powerpc/include/asm/ppc_asm.h                 |   26 +-
>  arch/powerpc/include/asm/pte-40x.h                 |    2 +-
>  arch/powerpc/include/asm/pte-44x.h                 |    2 +-
>  arch/powerpc/include/asm/pte-8xx.h                 |    1 -
>  arch/powerpc/include/asm/pte-book3e.h              |   84 ++
>  arch/powerpc/include/asm/pte-common.h              |   25 +-
>  arch/powerpc/include/asm/pte-fsl-booke.h           |    9 +-
>  arch/powerpc/include/asm/pte-hash32.h              |    1 -
>  arch/powerpc/include/asm/reg.h                     |  141 +++-
>  arch/powerpc/include/asm/reg_booke.h               |   50 +-
>  arch/powerpc/include/asm/setup.h                   |    2 +-
>  arch/powerpc/include/asm/smp.h                     |   10 +
>  arch/powerpc/include/asm/swiotlb.h                 |    8 +-
>  arch/powerpc/include/asm/systbl.h                  |    4 +-
>  arch/powerpc/include/asm/tlb.h                     |   38 +-
>  arch/powerpc/include/asm/tlbflush.h                |   11 +-
>  arch/powerpc/include/asm/vdso.h                    |    3 +-
>  arch/powerpc/kernel/Makefile                       |   21 +-
>  arch/powerpc/kernel/asm-offsets.c                  |   21 +-
>  arch/powerpc/kernel/cpu_setup_6xx.S                |    2 +-
>  arch/powerpc/kernel/cputable.c                     |   62 +-
>  arch/powerpc/kernel/dma-iommu.c                    |    2 +-
>  arch/powerpc/kernel/dma-swiotlb.c                  |   99 +--
>  arch/powerpc/kernel/dma.c                          |   13 +-
>  arch/powerpc/kernel/entry_32.S                     |   20 +-
>  arch/powerpc/kernel/entry_64.S                     |  102 +-
>  arch/powerpc/kernel/exceptions-64e.S               | 1001 ++++++++++++++++
>  arch/powerpc/kernel/exceptions-64s.S               |   97 +-
>  arch/powerpc/kernel/fpu.S                          |    2 +-
>  arch/powerpc/kernel/head_32.S                      |   40 +-
>  arch/powerpc/kernel/head_40x.S                     |  124 +-
>  arch/powerpc/kernel/head_44x.S                     |   58 +-
>  arch/powerpc/kernel/head_64.S                      |   83 ++-
>  arch/powerpc/kernel/head_8xx.S                     |   13 +-
>  arch/powerpc/kernel/head_booke.h                   |   50 +-
>  arch/powerpc/kernel/head_fsl_booke.S               |  100 +-
>  arch/powerpc/kernel/ibmebus.c                      |    2 +-
>  arch/powerpc/kernel/lparcfg.c                      |    3 +
>  arch/powerpc/kernel/misc_32.S                      |    7 +
>  arch/powerpc/kernel/of_platform.c                  |    2 +-
>  arch/powerpc/kernel/paca.c                         |    3 +
>  arch/powerpc/kernel/pci-common.c                   |  133 ++-
>  arch/powerpc/kernel/pci_32.c                       |  105 +--
>  arch/powerpc/kernel/pci_64.c                       |  335 +-----
>  arch/powerpc/kernel/pci_of_scan.c                  |  358 ++++++
>  arch/powerpc/kernel/perf_callchain.c               |  527 ++++++++
>  arch/powerpc/kernel/perf_counter.c                 |   68 +-
>  arch/powerpc/kernel/process.c                      |   16 +-
>  arch/powerpc/kernel/prom_init.c                    |  107 ++-
>  arch/powerpc/kernel/rtas.c                         |    7 +-
>  arch/powerpc/kernel/setup_32.c                     |    8 +
>  arch/powerpc/kernel/setup_64.c                     |   34 +-
>  arch/powerpc/kernel/smp.c                          |   15 +-
>  arch/powerpc/kernel/sys_ppc32.c                    |   12 +
>  arch/powerpc/kernel/sysfs.c                        |    3 +
>  arch/powerpc/kernel/time.c                         |   33 +-
>  arch/powerpc/kernel/vdso.c                         |    7 +-
>  arch/powerpc/kernel/vdso32/Makefile                |    1 +
>  arch/powerpc/kernel/vdso64/Makefile                |    2 +
>  arch/powerpc/kernel/vector.S                       |    2 +-
>  arch/powerpc/kernel/vio.c                          |    2 +-
>  arch/powerpc/kernel/vmlinux.lds.S                  |    8 +-
>  arch/powerpc/kvm/booke_interrupts.S                |   18 +-
>  arch/powerpc/mm/40x_mmu.c                          |    4 +-
>  arch/powerpc/mm/Makefile                           |    1 +
>  arch/powerpc/mm/fsl_booke_mmu.c                    |    2 +-
>  arch/powerpc/mm/hash_low_32.S                      |    4 +-
>  arch/powerpc/mm/hugetlbpage.c                      |    8 +-
>  arch/powerpc/mm/init_32.c                          |    2 -
>  arch/powerpc/mm/init_64.c                          |   55 +-
>  arch/powerpc/mm/mmu_context_nohash.c               |   96 +-
>  arch/powerpc/mm/mmu_decl.h                         |   37 +-
>  arch/powerpc/mm/pgtable.c                          |  179 ++-
>  arch/powerpc/mm/pgtable_32.c                       |    2 +-
>  arch/powerpc/mm/pgtable_64.c                       |   59 +-
>  arch/powerpc/mm/slb.c                              |   83 +-
>  arch/powerpc/mm/stab.c                             |   11 +-
>  arch/powerpc/mm/tlb_hash32.c                       |    3 +
>  arch/powerpc/mm/tlb_hash64.c                       |   20 +-
>  arch/powerpc/mm/tlb_low_64e.S                      |  770 ++++++++++++
>  arch/powerpc/mm/tlb_nohash.c                       |  268 ++++-
>  arch/powerpc/mm/tlb_nohash_low.S                   |   87 ++-
>  arch/powerpc/platforms/40x/Kconfig                 |   10 +
>  arch/powerpc/platforms/40x/ppc40x_simple.c         |    3 +-
>  arch/powerpc/platforms/44x/Kconfig                 |   12 +
>  arch/powerpc/platforms/44x/ppc44x_simple.c         |    1 +
>  arch/powerpc/platforms/82xx/mgcoge.c               |   69 +-
>  arch/powerpc/platforms/82xx/mpc8272_ads.c          |   22 +-
>  arch/powerpc/platforms/83xx/Kconfig                |    4 +-
>  arch/powerpc/platforms/83xx/mpc837x_rdb.c          |   28 +-
>  arch/powerpc/platforms/83xx/mpc83xx.h              |    4 +
>  arch/powerpc/platforms/85xx/Kconfig                |    9 +
>  arch/powerpc/platforms/85xx/Makefile               |    3 +-
>  arch/powerpc/platforms/85xx/mpc8536_ds.c           |    3 +-
>  arch/powerpc/platforms/85xx/mpc85xx_ds.c           |    3 +-
>  arch/powerpc/platforms/85xx/mpc85xx_mds.c          |    7 +-
>  arch/powerpc/platforms/85xx/mpc85xx_rdb.c          |  141 +++
>  arch/powerpc/platforms/85xx/sbc8560.c              |   39 +-
>  arch/powerpc/platforms/85xx/smp.c                  |   23 -
>  arch/powerpc/platforms/86xx/gef_ppc9a.c            |   37 +-
>  arch/powerpc/platforms/86xx/mpc86xx_hpcn.c         |    3 +-
>  arch/powerpc/platforms/86xx/mpc86xx_smp.c          |    1 -
>  arch/powerpc/platforms/Kconfig.cputype             |   38 +-
>  arch/powerpc/platforms/amigaone/setup.c            |    6 +-
>  arch/powerpc/platforms/cell/Kconfig                |    7 -
>  arch/powerpc/platforms/cell/celleb_setup.c         |    3 +-
>  arch/powerpc/platforms/cell/iommu.c                |    2 +-
>  arch/powerpc/platforms/cell/smp.c                  |    2 -
>  arch/powerpc/platforms/cell/spufs/Makefile         |    3 +-
>  arch/powerpc/platforms/cell/spufs/context.c        |    1 +
>  arch/powerpc/platforms/cell/spufs/file.c           |    1 +
>  arch/powerpc/platforms/cell/spufs/sched.c          |    2 +
>  arch/powerpc/platforms/cell/spufs/spufs.h          |    5 -
>  arch/powerpc/platforms/cell/spufs/sputrace.c       |  272 -----
>  arch/powerpc/platforms/cell/spufs/sputrace.h       |   39 +
>  arch/powerpc/platforms/iseries/exception.S         |   59 +-
>  arch/powerpc/platforms/iseries/exception.h         |    6 +-
>  arch/powerpc/platforms/iseries/mf.c                |    2 +-
>  arch/powerpc/platforms/pasemi/idle.c               |    2 +-
>  arch/powerpc/platforms/powermac/cpufreq_32.c       |    8 -
>  arch/powerpc/platforms/powermac/feature.c          |   13 +-
>  arch/powerpc/platforms/powermac/pci.c              |   61 +
>  arch/powerpc/platforms/powermac/smp.c              |    2 +-
>  arch/powerpc/platforms/ps3/mm.c                    |    2 +-
>  arch/powerpc/platforms/ps3/system-bus.c            |    6 +-
>  arch/powerpc/platforms/pseries/pci_dlpar.c         |    2 +-
>  arch/powerpc/platforms/pseries/reconfig.c          |    9 +-
>  arch/powerpc/platforms/pseries/setup.c             |    4 -
>  arch/powerpc/platforms/pseries/smp.c               |    2 -
>  arch/powerpc/sysdev/fsl_rio.c                      |   18 +-
>  arch/powerpc/sysdev/fsl_soc.c                      |    6 +-
>  arch/powerpc/sysdev/ipic.c                         |    7 +-
>  arch/powerpc/sysdev/mmio_nvram.c                   |   32 +
>  arch/powerpc/sysdev/mpic.c                         |   13 +-
>  arch/powerpc/sysdev/qe_lib/gpio.c                  |    4 +-
>  arch/powerpc/sysdev/qe_lib/qe_ic.c                 |    5 +-
>  arch/powerpc/xmon/Makefile                         |    2 +
>  arch/powerpc/xmon/xmon.c                           |    2 +-
>  arch/sparc/Kconfig                                 |    2 +
>  arch/sparc/include/asm/dma-mapping.h               |  145 +--
>  arch/sparc/include/asm/pci.h                       |    3 +
>  arch/sparc/include/asm/pci_32.h                    |  105 --
>  arch/sparc/include/asm/pci_64.h                    |   88 --
>  arch/sparc/kernel/Makefile                         |    2 +-
>  arch/sparc/kernel/dma.c                            |  175 +---
>  arch/sparc/kernel/dma.h                            |   14 -
>  arch/sparc/kernel/iommu.c                          |   20 +-
>  arch/sparc/kernel/ioport.c                         |  190 ++--
>  arch/sparc/kernel/pci.c                            |    2 +-
>  arch/sparc/kernel/pci_sun4v.c                      |   30 +-
>  arch/x86/include/asm/dma-mapping.h                 |   18 +
>  arch/x86/kernel/pci-dma.c                          |    2 +-
>  arch/x86/kernel/pci-gart_64.c                      |    5 +-
>  arch/x86/kernel/pci-nommu.c                        |   29 +-
>  arch/x86/kernel/pci-swiotlb.c                      |   25 -
>  drivers/block/ps3vram.c                            |    2 +-
>  drivers/char/agp/uninorth-agp.c                    |   49 +-
>  drivers/char/generic_nvram.c                       |   27 +-
>  drivers/char/hvc_console.c                         |    2 -
>  drivers/char/hvc_vio.c                             |    4 +-
>  drivers/char/hvsi.c                                |    3 +-
>  drivers/macintosh/macio_asic.c                     |    6 +-
>  drivers/macintosh/therm_windtunnel.c               |    4 +-
>  drivers/ps3/ps3stor_lib.c                          |   65 +-
>  drivers/video/ps3fb.c                              |    2 +-
>  include/asm-generic/dma-mapping-common.h           |    6 -
>  include/linux/dma-mapping.h                        |    6 +-
>  include/linux/pci_ids.h                            |    1 +
>  include/linux/swiotlb.h                            |   11 -
>  kernel/gcov/Kconfig                                |    2 +-
>  lib/Kconfig.debug                                  |    2 +-
>  lib/swiotlb.c                                      |  124 +--
>  244 files changed, 12124 insertions(+), 3266 deletions(-)
>  create mode 100644 arch/powerpc/boot/cuboot-hotfoot.c
>  create mode 100644 arch/powerpc/boot/cuboot-kilauea.c
>  create mode 100644 arch/powerpc/boot/dts/eiger.dts
>  create mode 100644 arch/powerpc/boot/dts/hotfoot.dts
>  create mode 100644 arch/powerpc/boot/dts/mpc8377_wlan.dts
>  create mode 100644 arch/powerpc/boot/dts/mpc8536ds_36b.dts
>  create mode 100644 arch/powerpc/boot/dts/p2020rdb.dts
>  create mode 100644 arch/powerpc/boot/ppcboot-hotfoot.h
>  create mode 100644 arch/powerpc/configs/44x/eiger_defconfig
>  create mode 100644 arch/powerpc/include/asm/exception-64e.h
>  rename arch/powerpc/include/asm/{exception.h => exception-64s.h} (94%)
>  create mode 100644 arch/powerpc/include/asm/pte-book3e.h
>  create mode 100644 arch/powerpc/kernel/exceptions-64e.S
>  create mode 100644 arch/powerpc/kernel/pci_of_scan.c
>  create mode 100644 arch/powerpc/kernel/perf_callchain.c
>  create mode 100644 arch/powerpc/mm/tlb_low_64e.S
>  create mode 100644 arch/powerpc/platforms/85xx/mpc85xx_rdb.c
>  delete mode 100644 arch/powerpc/platforms/cell/spufs/sputrace.c
>  create mode 100644 arch/powerpc/platforms/cell/spufs/sputrace.h
>  delete mode 100644 arch/sparc/kernel/dma.h
> 

* Re: Oops in IDE probing on ppc_440 when PCI is enabled in strapping
From: Ludo Van Put @ 2009-09-15  8:57 UTC
  To: Benjamin Herrenschmidt, linuxppc-dev
In-Reply-To: <1252975341.8375.167.camel@pasglop>

2009/9/15 Benjamin Herrenschmidt <benh@kernel.crashing.org>:
> On Mon, 2009-09-14 at 15:08 +0200, Ludo Van Put wrote:
>> 2009/9/14 Josh Boyer <jwboyer@linux.vnet.ibm.com>:
>> > On Mon, Sep 14, 2009 at 02:36:15PM +0200, Ludo Van Put wrote:
>> >>Hi,
>> >>
>> >>we're working with a PPC440GX on a board that has a.o. a compact flash slot.
>> >>We had the PCI subsystem of the ppc disabled in strapping for quite a while,
>> >>until we wanted to start using it.
>> >>However, when we enabled PCI in the strapping and in the (patched 2.6.10)
>> >
>> > 2.6.10?  Really?  If that is truly the case, you probably aren't going to get
>> > a whole lot of help from the list, since that kernel is pretty ancient.
>> >
>> I can only acknowledge that, but we're stuck to that kernel for now...
>>
>> >>kernel configuration, we triggered an oops when probing for IDE devices (to
>> >>read out the first 512 bytes of the CF). I can see that the ioremap64 call
>> >>in the driver code for our CF returns a different address (compared to PCI
>> >>disabled in strapping), but using this address later on for accessing the CF
>> >>goes wrong.
>> >
>> > Posting the oops output would perhaps help.  Or maybe not.
>> >
>> > josh
>> >
>>
>> Here it goes, you never know:
>>
>> Oops: kernel access of bad area, sig: 11 [#1]
>> PREEMPT
>> NIP: C0148050 LR: C013BC64 SP: C07CFEA0 REGS: c07cfdf0 TRAP: 0300    Not tainted
>> MSR: 00021000 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 00
>> DAR: E3093000, DSISR: 00000000
>> TASK = c07cdb70[1] 'swapper' THREAD: c07ce000
>> Last syscall: 120
>> GPR00: 00000000 C07CFEA0 C07CDB70 E3093000 DFE829FE 00000100 C01184E8 C021B270
>> GPR08: C0220000 C02D0F60 C07CDEF8 C07CDEF8 00000000 70000000 1FFF6400 00000001
>> GPR16: 00000001 FFFFFFFF 1FFF06C0 00000000 00000001 C0220000 C0280000 00029000
>> GPR24: 00000000 C02D0F60 C01F0000 C0148040 00000080 00000000 DFE82A00 C02D0FF0
>> NIP [c0148050] ide_insw+0x10/0x24
>> LR [c013bc64] ata_input_data+0x74/0x114
>> Call backtrace:
>>  c013e6a4 try_to_identify+0x2ec/0x5ec
>>  c013eaa8 do_probe+0x104/0x304
>>  c013f0c4 probe_hwif+0x358/0x6c4
>>  c0140068 ideprobe_init+0xa8/0x1a0
>>  c02a4ef8 ide_generic_init+0x10/0x28
>>  c0001324 init+0xc4/0x244
>>  c0004254 kernel_thread+0x44/0x60
>> Kernel panic - not syncing: Attempted to kill init!
>>  <0>Rebooting in 180 seconds..
>>
>>
>> ide_insw is an asm routine to read in 16bit words and swap them. Copied
>> from arch/ppc/kernel/misc.S. Works fine when PCI is disabled.
>
> Probably because ide_insw uses insw which offsets everything from
> _IO_BASE which changes value when you have a PCI bus with an IO space...
> If your IDE isn't PCI IO space based you shouldn't use ide_insw but the
> MMIO variants instead.
>
> Ben.

Thnx for the suggestion, but ide_insw is in fact a copy of the
_insw assembly routine, and it gets passed the effective address,
without the _IO_BASE offset.

I was thinking about TLB stuff. I'm not a u-boot expert, but could it
be that I need to tweak/reconfigure u-boot so I can access the address
returned from ioremap64?

KR, Ludo
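To make Ben's distinction concrete, here is a small user-space model in C. It is a sketch, not the 2.6.10 code: the bus array, io_base (standing in for _IO_BASE), read_le16(), pio_insw() and mmio_insw() are all hypothetical. A PIO-style accessor adds the I/O base to whatever it is passed, so the same argument lands somewhere else once that base changes (as it does when a PCI bus with an I/O space is set up), while an MMIO-style accessor uses the ioremap()ed address as-is. An ide_insw that is passed the effective address with no offset corresponds to the MMIO case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a small byte array stands in for the bus and
 * "addresses" are offsets into it. io_base plays the role of _IO_BASE. */
static uint8_t bus[64];
static size_t io_base; /* 0 until a PCI I/O window gets mapped */

/* Fetch one 16-bit little-endian word. On a big-endian PowerPC this
 * byte reversal is what the "swap" in insw/ide_insw takes care of. */
static uint16_t read_le16(size_t addr)
{
    assert(addr + 1 < sizeof bus);
    return (uint16_t)(bus[addr] | (bus[addr + 1] << 8));
}

/* PIO-style helper: 'port' is an offset relative to io_base. */
static void pio_insw(size_t port, uint16_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = read_le16(io_base + port);
}

/* MMIO-style helper: 'addr' is already the mapped address (what
 * ioremap returned), so nothing is added to it. */
static void mmio_insw(size_t addr, uint16_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = read_le16(addr);
}
```

With io_base at 0 both helpers read the word at offset 8; once io_base becomes 32, pio_insw() reads offset 40 instead, while mmio_insw() still reads offset 8.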

* Re: Oops in IDE probing on ppc_440 when PCI is enabled in strapping
From: Benjamin Herrenschmidt @ 2009-09-15  9:44 UTC
  To: Ludo Van Put; +Cc: linuxppc-dev
In-Reply-To: <5edaeed70909150157n59745b92qe9abf2ed13147288@mail.gmail.com>

On Tue, 2009-09-15 at 10:57 +0200, Ludo Van Put wrote:
> Thnx for the suggestion, but ide_insw is in fact a copy of the
> _insw assembly routine, and it gets passed the effective address,
> without the _IO_BASE offset.
> 
> I was thinking about TLB stuff. I'm not a u-boot expert, but could it
> be that I need to tweak/reconfigure u-boot so I can access the address
> returned from ioremap64?

No. If you pass the right physical address to ioremap64, the result
should be useable as-is. The TLB entries will be faulted in
automatically by the kernel when doing accesses.

At this stage, I can't say what's wrong; it looks like you may be
accessing the wrong virtual address or something like that. Hard to
tell. It's a data access exception, not a machine check, which means
that, one way or another, the virtual address accessed by ide_insw is
not mapped by the kernel page tables, which are what the kernel TLB
miss handler uses to populate the TLB.

2.6.10 is so old that I have few memories of what goes on in that
area, and I'm afraid I'm of little help here. If you have a HW
debugger such as a BDI, you may want to trace through the access to
see what kind of TLB faults it generates and why the TLB miss handler
doesn't handle it.

Cheers,
Ben.

* Re: [RFC] powerpc/irq: Add generic API for setting up cascaded IRQs
From: Benjamin Herrenschmidt @ 2009-09-15 10:02 UTC
  To: Grant Likely; +Cc: linuxppc-dev
In-Reply-To: <fa686aa40909120705wdc8928ei75e9b8c952d1e45@mail.gmail.com>


> I'm a reverse polish kind of guy.  I prefer 'subject'_'action'
> over 'action'_'subject' just because it groups like subjects together.
>  But it doesn't matter much, especially in this case where 'subject'
> is in a group of exactly 1.  :-)
> 
> I'll do whichever you prefer.

I just caught myself calling something relocs_check instead of
check_relocs so I suppose it's fair game :-)

Whatever you want.

Cheers,
Ben. 

* Re: [PATCH] [SCSI] mpt fusion: Fix 32 bit platforms with 64 bit resources
From: Benjamin Herrenschmidt @ 2009-09-15 10:29 UTC
  To: pbathija; +Cc: linuxppc-dev, linux-scsi
In-Reply-To: <1252455333-9925-1-git-send-email-pbathija@amcc.com>


> diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
> index 5d496a9..d5b0f15 100644
> --- a/drivers/message/fusion/mptbase.c
> +++ b/drivers/message/fusion/mptbase.c
> @@ -1510,11 +1510,12 @@ static int
>  mpt_mapresources(MPT_ADAPTER *ioc)
>  {
>  	u8		__iomem *mem;
> +	u8		__iomem *port;
>  	int		 ii;
> -	unsigned long	 mem_phys;
> -	unsigned long	 port;
> -	u32		 msize;
> -	u32		 psize;
> +	phys_addr_t	 mem_phys;
> +	phys_addr_t	 port_phys;
> +	resource_size_t	 msize;
> +	resource_size_t	 psize;

Is phys_addr_t defined for all archs nowadays? Why not use
resource_size_t for everything? resource_size_t is a bit of a misnomer:
it's not a type meant to hold only a "size" but really a physical
address (or a size). I believe it was called resource_size_t because
it's "sized" appropriately for holding a physical address.
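The reason the patch widens these variables can be shown with a few lines of C. This is an illustrative sketch with hypothetical helper names: uint32_t stands in for unsigned long on a 32-bit kernel and uint64_t for a 64-bit resource_size_t. A BAR placed above 4 GiB, as can happen on 32-bit parts with 36-bit physical addressing, loses its upper bits when stored in 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins (not the kernel typedefs):
 *   uint32_t plays 'unsigned long' on a 32-bit kernel,
 *   uint64_t plays resource_size_t when physical addresses are 64-bit. */

/* What the old code effectively did: keep a BAR address in 32 bits. */
static uint32_t bar_as_ulong(uint64_t bar_start)
{
    return (uint32_t)bar_start; /* silently drops bits 32..63 */
}

/* What the patch does: keep it in a type wide enough for any address. */
static uint64_t bar_as_resource(uint64_t bar_start)
{
    return bar_start;
}

/* True if the address survives being stored in 32 bits. */
static bool fits_in_ulong(uint64_t bar_start)
{
    return (uint64_t)bar_as_ulong(bar_start) == bar_start;
}
```

For example, bar_as_ulong(0x4E0001000) yields 0xE0001000, a different physical address, whereas the 64-bit type keeps the value intact.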

Cheers,
Ben.

* Re: [RFC] powerpc/irq: Add generic API for setting up cascaded IRQs
From: Michael Ellerman @ 2009-09-15 11:04 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1253008958.8375.213.camel@pasglop>

On Tue, 2009-09-15 at 20:02 +1000, Benjamin Herrenschmidt wrote:
> > I'm a reverse polish kind of guy.  I prefer 'subject'_'action'
> > over 'action'_'subject' just because it groups like subjects together.
> >  But it doesn't matter much, especially in this case where 'subject'
> > is in a group of exactly 1.  :-)
> > 
> > I'll do whichever you prefer.
> 
> I just caught myself calling something relocs_check instead of
> check_relocs so I suppose it's fair game :-)

Yeah but that sounds stupid :)

> Whatever you want.

I want a pony. And setup_cascade() is definitely better IMHO.

cheers

* [PATCH v3 0/3] cpu: pseries: Cpu offline states framework
From: Gautham R Shenoy @ 2009-09-15 12:06 UTC
  To: Joel Schopp, Benjamin Herrenschmidt, Peter Zijlstra, Balbir Singh,
	Venkatesh Pallipadi, Dipankar Sarma, Vaidyanathan Srinivasan
  Cc: Arun R Bharadwaj, linuxppc-dev, linux-kernel, Darrick J. Wong

Hi,

**** RFC not for inclusion ****

This is version 3 of the patch series that provides a cpu-offline framework
enabling administrators to choose the state a CPU is put in when it is
offlined, in cases where the underlying architecture exposes multiple
such states.

Changes from version 2 (which can be found here: http://lkml.org/lkml/2009/8/28/102):
- Addressed Andrew Morton's review comments regarding names of global
  variables, handling of error conditions and documentation of the interfaces.

- Implemented a patch providing helper functions to set the cede latency
  specifier value in the VPA, indicating the latency expectation of the guest
  OS when the vCPU is ceded by a subsequent H_CEDE hypercall. The hypervisor
  may use this hint to achieve better energy savings.

- Renamed the cpu-hotplug states: "deallocate" is now "offline" and
  "deactivate" is now "inactive".

The patch series exposes the following sysfs tunables to
allow the system administrator to choose the state of a CPU:

To query the available hotplug states, one needs to read the sysfs tunable:
	/sys/devices/system/cpu/cpu<number>/available_hotplug_states
To query or set the current state, one needs to read/write the sysfs tunable:
	/sys/devices/system/cpu/cpu<number>/current_hotplug_state

The patchset ensures that writes to the "current_hotplug_state" sysfs file are
serialized against writes to the "online" file.

This patchset contains the offline state driver implemented for
pSeries. For pSeries, we define three available_hotplug_states. They are:

	online: The processor is online.

	offline: This is the default behaviour when the cpu is offlined,
	even in the absence of this driver. The CPU would make an
	rtas_stop_self() call and hand the CPU back to the resource pool,
	thereby effectively deallocating that vCPU from the LPAR.
	NOTE: This would result in a configuration change to the LPAR
	which is visible to the outside world.

	inactive: This cedes the vCPU to the hypervisor with a cede latency
	specifier value of 2.
	NOTE: This option does not result in a configuration change,
	and the vCPU remains entitled to the LPAR to which it
	belonged earlier.

Any feedback on the patchset will be immensely valuable.
---

Arun R Bharadwaj (1):
      pSeries: cede latency specifier helper function.

Gautham R Shenoy (2):
      cpu: Implement cpu-offline-state callbacks for pSeries.
      cpu: Offline state Framework.


 Documentation/cpu-hotplug.txt                   |   22 +++
 arch/powerpc/include/asm/lppaca.h               |    9 +
 arch/powerpc/platforms/pseries/Makefile         |    2 
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |   88 ++++++++++-
 arch/powerpc/platforms/pseries/offline_driver.c |  148 +++++++++++++++++++
 arch/powerpc/platforms/pseries/offline_driver.h |   20 +++
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   17 ++
 arch/powerpc/platforms/pseries/smp.c            |   17 ++
 arch/powerpc/xmon/xmon.c                        |    3 
 drivers/base/cpu.c                              |  181 ++++++++++++++++++++++-
 include/linux/cpu.h                             |   10 +
 11 files changed, 498 insertions(+), 19 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.c
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

-- 
Thanks and Regards
gautham.

^ permalink raw reply

* [PATCH v3 1/3] pSeries: cede latency specifier helper function.
From: Gautham R Shenoy @ 2009-09-15 12:07 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Peter Zijlstra, Balbir Singh,
	Venkatesh Pallipadi, Dipankar Sarma, Vaidyanathan Srinivasan
  Cc: Arun R Bharadwaj, linuxppc-dev, linux-kernel, Darrick J. Wong
In-Reply-To: <20090915120629.20523.79019.stgit@sofia.in.ibm.com>

From: Arun R Bharadwaj <arun@linux.vnet.ibm.com>

This patch provides helper functions to set the cede latency specifier
value in the VPA. This value indicates the latency expectation of the
guest OS and informs the hypervisor's choice of platform-dependent
energy-saving mode for the processor while it is unused during the
subsequent H_CEDE hypercall.

Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 arch/powerpc/include/asm/lppaca.h               |    9 ++++++++-
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   17 +++++++++++++++++
 arch/powerpc/xmon/xmon.c                        |    3 ++-
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index f78f65c..aaa0066 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -100,7 +100,14 @@ struct lppaca {
 	// Used to pass parms from the OS to PLIC for SetAsrAndRfid
 	u64	saved_gpr3;		// Saved GPR3                   x20-x27
 	u64	saved_gpr4;		// Saved GPR4                   x28-x2F
-	u64	saved_gpr5;		// Saved GPR5                   x30-x37
+	union {
+		u64	saved_gpr5;	// Saved GPR5                   x30-x37
+		struct {
+			u8	cede_latency_hint;  //			x30
+			u8	reserved[7];        //			x31-x37
+		} fields;
+	} gpr5_dword;
+
 
 	u8	dtl_enable_mask;	// Dispatch Trace Log mask	x38-x38
 	u8	donate_dedicated_cpu;	// Donate dedicated CPU cycles  x39-x39
diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h b/arch/powerpc/platforms/pseries/plpar_wrappers.h
index a24a6b2..1174d4b 100644
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -9,11 +9,28 @@ static inline long poll_pending(void)
 	return plpar_hcall_norets(H_POLL_PENDING);
 }
 
+static inline u8 get_cede_latency_hint(void)
+{
+	return get_lppaca()->gpr5_dword.fields.cede_latency_hint;
+}
+
+static inline void set_cede_latency_hint(u8 latency_hint)
+{
+	get_lppaca()->gpr5_dword.fields.cede_latency_hint = latency_hint;
+}
+
 static inline long cede_processor(void)
 {
 	return plpar_hcall_norets(H_CEDE);
 }
 
+static inline long extended_cede_processor(u8 latency_hint)
+{
+	set_cede_latency_hint(latency_hint);
+	return cede_processor();
+
+}
+
 static inline long vpa_call(unsigned long flags, unsigned long cpu,
 		unsigned long vpa)
 {
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index e1f33a8..a2089cd 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1613,7 +1613,8 @@ static void super_regs(void)
 			       ptrLpPaca->saved_srr0, ptrLpPaca->saved_srr1);
 			printf("    Saved Gpr3=%.16lx  Saved Gpr4=%.16lx \n",
 			       ptrLpPaca->saved_gpr3, ptrLpPaca->saved_gpr4);
-			printf("    Saved Gpr5=%.16lx \n", ptrLpPaca->saved_gpr5);
+			printf("    Saved Gpr5=%.16lx \n",
+				ptrLpPaca->gpr5_dword.saved_gpr5);
 		}
 #endif
 

^ permalink raw reply related

* [PATCH v3 2/3] cpu: Offline state Framework.
From: Gautham R Shenoy @ 2009-09-15 12:07 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Peter Zijlstra, Balbir Singh,
	Venkatesh Pallipadi, Dipankar Sarma, Vaidyanathan Srinivasan
  Cc: Arun R Bharadwaj, linuxppc-dev, linux-kernel, Darrick J. Wong
In-Reply-To: <20090915120629.20523.79019.stgit@sofia.in.ibm.com>

Provide an interface by which the system administrator can decide what state
the CPU should go to when it is offlined.

To query the available hotplug states, one needs to read:
/sys/devices/system/cpu/cpu<number>/available_hotplug_states

To query or set the current state for a particular CPU, one needs to
use the sysfs interface:

/sys/devices/system/cpu/cpu<number>/current_hotplug_state

This patch implements the architecture independent bits of the
cpu-offline-state framework.

The architecture-specific bits are expected to register the callbacks that
are invoked when the above-mentioned sysfs interfaces are read or written.
Thus the values obtained by reading available_hotplug_states vary with the
architecture.

The patch serializes writes to "current_hotplug_state" with respect to
writes to the "online" sysfs tunable.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 Documentation/cpu-hotplug.txt |   22 +++++
 drivers/base/cpu.c            |  181 +++++++++++++++++++++++++++++++++++++++--
 include/linux/cpu.h           |   10 ++
 3 files changed, 204 insertions(+), 9 deletions(-)

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 9d620c1..dcec06d 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -115,6 +115,28 @@ Just remember the critical section cannot call any
 function that can sleep or schedule this process away. The preempt_disable()
 will work as long as stop_machine_run() is used to take a cpu down.
 
+CPU-offline states
+--------------------------------------
+On architectures which allow more than one valid state when
+the CPU goes offline, the system administrator can decide
+the state the CPU needs to go to when it is offlined.
+
+If the architecture has implemented a cpu-offline driver exposing these
+multiple offline states, the system administrator can use the following sysfs
+interfaces to query the available hotplug states and also query and set the
+current hotplug state for a given cpu:
+
+To query the hotplug states, one needs to perform a read on:
+/sys/devices/system/cpu/cpu<number>/available_hotplug_states
+
+To query or set the current state for a particular CPU,
+one needs to use the sysfs interface
+
+/sys/devices/system/cpu/cpu<number>/current_hotplug_state
+
+Writes to the "online" sysfs files are serialized against the writes to the
+"current_hotplug_state" file.
+
 CPU Hotplug - Frequently Asked Questions.
 
 Q: How to enable my kernel to support CPU hotplug?
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..00c38be 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -20,7 +20,166 @@ EXPORT_SYMBOL(cpu_sysdev_class);
 
 static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
 
+struct sys_device *get_cpu_sysdev(unsigned cpu)
+{
+	if (cpu < nr_cpu_ids && cpu_possible(cpu))
+		return per_cpu(cpu_sys_devices, cpu);
+	else
+		return NULL;
+}
+EXPORT_SYMBOL_GPL(get_cpu_sysdev);
+
+
 #ifdef CONFIG_HOTPLUG_CPU
+
+struct cpu_offline_driver *cpu_offline_driver;
+static DEFINE_MUTEX(cpu_offline_driver_lock);
+
+ssize_t show_available_hotplug_states(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+	int cpu_num = cpu->sysdev.id;
+	ssize_t ret;
+
+	mutex_lock(&cpu_offline_driver_lock);
+	if (!cpu_offline_driver) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
+	ret = cpu_offline_driver->read_available_states(cpu_num, buf);
+
+out_unlock:
+	mutex_unlock(&cpu_offline_driver_lock);
+
+	return ret;
+
+}
+
+ssize_t show_current_hotplug_state(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+	int cpu_num = cpu->sysdev.id;
+	ssize_t ret = 0;
+
+	mutex_lock(&cpu_offline_driver_lock);
+	if (!cpu_offline_driver) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
+	ret = cpu_offline_driver->read_current_state(cpu_num, buf);
+
+out_unlock:
+	mutex_unlock(&cpu_offline_driver_lock);
+
+	return ret;
+}
+
+ssize_t store_current_hotplug_state(struct sys_device *dev,
+			struct sysdev_attribute *attr,
+			const char *buf, size_t count)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+	int cpu_num = cpu->sysdev.id;
+	ssize_t ret = count;
+
+	mutex_lock(&cpu_offline_driver_lock);
+	if (!cpu_offline_driver) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
+	ret = cpu_offline_driver->write_current_state(cpu_num, buf);
+
+out_unlock:
+	mutex_unlock(&cpu_offline_driver_lock);
+
+	if (ret >= 0)
+		ret = count;
+	return ret;
+}
+
+static SYSDEV_ATTR(available_hotplug_states, 0444,
+			show_available_hotplug_states, NULL);
+static SYSDEV_ATTR(current_hotplug_state, 0644,
+		show_current_hotplug_state, store_current_hotplug_state);
+
+/* Should be called with cpu_offline_driver_lock held */
+void cpu_offline_driver_add_cpu(struct sys_device *cpu_sys_dev)
+{
+	int rc;
+
+	if (!cpu_offline_driver || !cpu_sys_dev)
+		return;
+
+	rc = sysdev_create_file(cpu_sys_dev, &attr_available_hotplug_states);
+	BUG_ON(rc == -EEXIST);
+
+	rc = sysdev_create_file(cpu_sys_dev, &attr_current_hotplug_state);
+	BUG_ON(rc == -EEXIST);
+}
+
+/* Should be called with cpu_offline_driver_lock held */
+void cpu_offline_driver_remove_cpu(struct sys_device *cpu_sys_dev)
+{
+	if (!cpu_offline_driver || !cpu_sys_dev)
+		return;
+
+	sysdev_remove_file(cpu_sys_dev, &attr_available_hotplug_states);
+	sysdev_remove_file(cpu_sys_dev, &attr_current_hotplug_state);
+
+}
+
+int register_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
+{
+	int ret = 0;
+	int cpu;
+
+	mutex_lock(&cpu_offline_driver_lock);
+
+	if (cpu_offline_driver != NULL) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
+	if (WARN_ON(!(arch_cpu_driver->read_available_states &&
+	      arch_cpu_driver->read_current_state &&
+	      arch_cpu_driver->write_current_state))) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	cpu_offline_driver = arch_cpu_driver;
+
+	for_each_possible_cpu(cpu)
+		cpu_offline_driver_add_cpu(get_cpu_sysdev(cpu));
+
+out_unlock:
+	mutex_unlock(&cpu_offline_driver_lock);
+	return ret;
+}
+
+void unregister_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
+{
+	int cpu;
+	mutex_lock(&cpu_offline_driver_lock);
+
+	if (WARN_ON(!cpu_offline_driver)) {
+		mutex_unlock(&cpu_offline_driver_lock);
+		return;
+	}
+
+	for_each_possible_cpu(cpu)
+		cpu_offline_driver_remove_cpu(get_cpu_sysdev(cpu));
+
+	cpu_offline_driver = NULL;
+	mutex_unlock(&cpu_offline_driver_lock);
+}
+
+
 static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
 			   char *buf)
 {
@@ -35,6 +194,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
 	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
 	ssize_t ret;
 
+	mutex_lock(&cpu_offline_driver_lock);
 	switch (buf[0]) {
 	case '0':
 		ret = cpu_down(cpu->sysdev.id);
@@ -50,6 +210,8 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
 		ret = -EINVAL;
 	}
 
+	mutex_unlock(&cpu_offline_driver_lock);
+
 	if (ret >= 0)
 		ret = count;
 	return ret;
@@ -59,23 +221,33 @@ static SYSDEV_ATTR(online, 0644, show_online, store_online);
 static void __cpuinit register_cpu_control(struct cpu *cpu)
 {
 	sysdev_create_file(&cpu->sysdev, &attr_online);
+	mutex_lock(&cpu_offline_driver_lock);
+	cpu_offline_driver_add_cpu(&cpu->sysdev);
+	mutex_unlock(&cpu_offline_driver_lock);
 }
+
 void unregister_cpu(struct cpu *cpu)
 {
 	int logical_cpu = cpu->sysdev.id;
 
 	unregister_cpu_under_node(logical_cpu, cpu_to_node(logical_cpu));
 
+	mutex_lock(&cpu_offline_driver_lock);
+	cpu_offline_driver_remove_cpu(&cpu->sysdev);
+	mutex_unlock(&cpu_offline_driver_lock);
+
 	sysdev_remove_file(&cpu->sysdev, &attr_online);
 
 	sysdev_unregister(&cpu->sysdev);
 	per_cpu(cpu_sys_devices, logical_cpu) = NULL;
 	return;
 }
+
 #else /* ... !CONFIG_HOTPLUG_CPU */
 static inline void register_cpu_control(struct cpu *cpu)
 {
 }
+
 #endif /* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_KEXEC
@@ -224,15 +396,6 @@ int __cpuinit register_cpu(struct cpu *cpu, int num)
 	return error;
 }
 
-struct sys_device *get_cpu_sysdev(unsigned cpu)
-{
-	if (cpu < nr_cpu_ids && cpu_possible(cpu))
-		return per_cpu(cpu_sys_devices, cpu);
-	else
-		return NULL;
-}
-EXPORT_SYMBOL_GPL(get_cpu_sysdev);
-
 int __init cpu_dev_init(void)
 {
 	int err;
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4d668e0..8ac12e8 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -51,6 +51,16 @@ struct notifier_block;
 #ifdef CONFIG_HOTPLUG_CPU
 extern int register_cpu_notifier(struct notifier_block *nb);
 extern void unregister_cpu_notifier(struct notifier_block *nb);
+
+struct cpu_offline_driver {
+	ssize_t (*read_available_states)(unsigned int cpu, char *buf);
+	ssize_t (*read_current_state)(unsigned int cpu, char *buf);
+	ssize_t (*write_current_state)(unsigned int cpu, const char *buf);
+};
+
+extern int register_cpu_offline_driver(struct cpu_offline_driver *driver);
+extern void unregister_cpu_offline_driver(struct cpu_offline_driver *driver);
+
 #else
 
 #ifndef MODULE

^ permalink raw reply related

* [PATCH v3 3/3] cpu: Implement cpu-offline-state callbacks for pSeries.
From: Gautham R Shenoy @ 2009-09-15 12:07 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Peter Zijlstra, Balbir Singh,
	Venkatesh Pallipadi, Dipankar Sarma, Vaidyanathan Srinivasan
  Cc: Arun R Bharadwaj, linuxppc-dev, linux-kernel, Darrick J. Wong
In-Reply-To: <20090915120629.20523.79019.stgit@sofia.in.ibm.com>

This patch implements the callbacks that handle reads from and writes to
the sysfs interfaces:

/sys/devices/system/cpu/cpu<number>/available_hotplug_states
and
/sys/devices/system/cpu/cpu<number>/current_hotplug_state

Currently, the patch defines two states which the processor can go to when it
is offlined:

- offline: The current behaviour when the cpu is offlined.
  The CPU would make an rtas_stop_self() call and hand the
  CPU back to the resource pool, thereby effectively deallocating
  that vCPU from the LPAR.

- inactive: This is expected to cede the processor to the hypervisor with a
  latency hint specifier value. The hypervisor may use this hint to provide
  better energy savings. In this state, control of the vCPU remains
  with the LPAR.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 arch/powerpc/platforms/pseries/Makefile         |    2 
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |   88 +++++++++++++-
 arch/powerpc/platforms/pseries/offline_driver.c |  148 +++++++++++++++++++++++
 arch/powerpc/platforms/pseries/offline_driver.h |   20 +++
 arch/powerpc/platforms/pseries/smp.c            |   17 +++
 5 files changed, 267 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.c
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 790c0b8..3a569c7 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_KEXEC)	+= kexec.o
 obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
 obj-$(CONFIG_PSERIES_MSI)	+= msi.o
 
-obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o
+obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o offline_driver.o
 obj-$(CONFIG_MEMORY_HOTPLUG)	+= hotplug-memory.o
 
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a20ead8..1e06bb1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
 #include <asm/pSeries_reconfig.h>
 #include "xics.h"
 #include "plpar_wrappers.h"
+#include "offline_driver.h"
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -54,13 +55,74 @@ static void rtas_stop_self(void)
 	panic("Alas, I survived.\n");
 }
 
+static void cede_on_offline(u8 cede_latency_hint)
+{
+	unsigned int cpu = smp_processor_id();
+	unsigned int hwcpu = hard_smp_processor_id();
+	u8 old_cede_latency_hint;
+
+	old_cede_latency_hint = get_cede_latency_hint();
+	get_lppaca()->idle = 1;
+	if (!get_lppaca()->shared_proc)
+		get_lppaca()->donate_dedicated_cpu = 1;
+
+	printk(KERN_INFO "cpu %u (hwid %u) ceding for offline with hint %d\n",
+			cpu, hwcpu, cede_latency_hint);
+	while (get_preferred_offline_state(cpu) != CPU_STATE_ONLINE) {
+		extended_cede_processor(cede_latency_hint);
+		printk(KERN_INFO "cpu %u (hwid %u) returned from cede.\n",
+			cpu, hwcpu);
+		printk(KERN_INFO
+			"Decrementer value = %x Timebase value = %llx\n",
+			get_dec(), get_tb());
+	}
+
+	printk(KERN_INFO "cpu %u (hwid %u) got prodded to go online\n",
+		cpu, hwcpu);
+
+	if (!get_lppaca()->shared_proc)
+		get_lppaca()->donate_dedicated_cpu = 0;
+	get_lppaca()->idle = 0;
+
+	/* Reset the cede_latency specifier value */
+	set_cede_latency_hint(old_cede_latency_hint);
+
+	unregister_slb_shadow(hwcpu, __pa(get_slb_shadow()));
+
+	/*
+	 * NOTE: Calling start_secondary() here for now to start
+	 * a new context.
+	 *
+	 * However, need to do it cleanly by resetting the stack
+	 * pointer.
+	 */
+	start_secondary();
+}
+
 static void pseries_mach_cpu_die(void)
 {
+	unsigned int cpu = smp_processor_id();
+	u8 cede_latency_hint = 0;
+
 	local_irq_disable();
 	idle_task_exit();
 	xics_teardown_cpu();
-	unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
-	rtas_stop_self();
+
+	if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+
+		set_cpu_current_state(cpu, CPU_STATE_OFFLINE);
+		unregister_slb_shadow(hard_smp_processor_id(),
+					__pa(get_slb_shadow()));
+		rtas_stop_self();
+		goto out_bug;
+	} else if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+		set_cpu_current_state(cpu, CPU_STATE_INACTIVE);
+		cede_latency_hint = 2;
+		cede_on_offline(cede_latency_hint);
+
+	}
+
+out_bug:
 	/* Should never get here... */
 	BUG();
 	for(;;);
@@ -112,11 +174,23 @@ static void pseries_cpu_die(unsigned int cpu)
 	int cpu_status;
 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
 
-	for (tries = 0; tries < 25; tries++) {
-		cpu_status = query_cpu_stopped(pcpu);
-		if (cpu_status == 0 || cpu_status == -1)
-			break;
-		cpu_relax();
+	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+		cpu_status = 1;
+		for (tries = 0; tries < 1000; tries++) {
+			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
+				cpu_status = 0;
+				break;
+			}
+			cpu_relax();
+		}
+	} else {
+
+		for (tries = 0; tries < 25; tries++) {
+			cpu_status = query_cpu_stopped(pcpu);
+			if (cpu_status == 0 || cpu_status == -1)
+				break;
+			cpu_relax();
+		}
 	}
 	if (cpu_status != 0) {
 		printk("Querying DEAD? cpu %i (%i) shows %i\n",
diff --git a/arch/powerpc/platforms/pseries/offline_driver.c b/arch/powerpc/platforms/pseries/offline_driver.c
new file mode 100644
index 0000000..ca15b6b
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_driver.c
@@ -0,0 +1,148 @@
+#include "offline_driver.h"
+#include <linux/cpu.h>
+#include <linux/percpu-defs.h>
+
+struct cpu_hotplug_state {
+	enum cpu_state_vals state_val;
+	const char *state_name;
+	int available;
+} pSeries_cpu_hotplug_states[] = {
+	{CPU_STATE_OFFLINE, "offline", 1},
+	{CPU_STATE_INACTIVE, "inactive", 1},
+	{CPU_STATE_ONLINE, "online", 1},
+	{CPU_MAX_HOTPLUG_STATES, "", 0},
+};
+
+static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
+							CPU_STATE_OFFLINE;
+static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_STATE_OFFLINE;
+
+static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
+
+enum cpu_state_vals get_cpu_current_state(int cpu)
+{
+	return per_cpu(current_state, cpu);
+}
+
+void set_cpu_current_state(int cpu, enum cpu_state_vals state)
+{
+	per_cpu(current_state, cpu) = state;
+}
+
+enum cpu_state_vals get_preferred_offline_state(int cpu)
+{
+	return per_cpu(preferred_offline_state, cpu);
+}
+
+void set_preferred_offline_state(int cpu, enum cpu_state_vals state)
+{
+	per_cpu(preferred_offline_state, cpu) = state;
+}
+
+void set_default_offline_state(int cpu)
+{
+	per_cpu(preferred_offline_state, cpu) = default_offline_state;
+}
+
+static const char *get_cpu_hotplug_state_name(enum cpu_state_vals state_val)
+{
+	return pSeries_cpu_hotplug_states[state_val].state_name;
+}
+
+static bool cpu_hotplug_state_available(enum cpu_state_vals state_val)
+{
+	return pSeries_cpu_hotplug_states[state_val].available;
+}
+
+ssize_t pSeries_read_available_states(unsigned int cpu, char *buf)
+{
+	int state;
+	ssize_t ret = 0;
+
+	for (state = CPU_STATE_OFFLINE; state < CPU_MAX_HOTPLUG_STATES;
+								state++) {
+		if (!cpu_hotplug_state_available(state))
+			continue;
+
+		if (ret >= (ssize_t) ((PAGE_SIZE / sizeof(char))
+					- (CPU_STATES_LEN + 2)))
+			goto out;
+		ret += scnprintf(&buf[ret], CPU_STATES_LEN, "%s ",
+				get_cpu_hotplug_state_name(state));
+	}
+
+out:
+	ret += sprintf(&buf[ret], "\n");
+	return ret;
+}
+
+ssize_t pSeries_read_current_state(unsigned int cpu, char *buf)
+{
+	int state = get_cpu_current_state(cpu);
+
+	return scnprintf(buf, CPU_STATES_LEN, "%s\n",
+				get_cpu_hotplug_state_name(state));
+}
+
+ssize_t pSeries_write_current_state(unsigned int cpu, const char *buf)
+{
+	int ret;
+	char state_name[CPU_STATES_LEN];
+	int i;
+	struct sys_device *dev = get_cpu_sysdev(cpu);
+	ret = sscanf(buf, "%15s", state_name);
+
+	if (ret != 1) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	for (i = CPU_STATE_OFFLINE; i < CPU_MAX_HOTPLUG_STATES; i++)
+		if (!strnicmp(state_name,
+				get_cpu_hotplug_state_name(i),
+				CPU_STATES_LEN))
+			break;
+
+	if (i == CPU_MAX_HOTPLUG_STATES) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (i == get_cpu_current_state(cpu)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (i == CPU_STATE_ONLINE) {
+		ret = cpu_up(cpu);
+		if (!ret)
+			kobject_uevent(&dev->kobj, KOBJ_ONLINE);
+		goto out_unlock;
+	}
+
+	if (get_cpu_current_state(cpu) != CPU_STATE_ONLINE) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	set_preferred_offline_state(cpu, i);
+	ret = cpu_down(cpu);
+	if (!ret)
+		kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
+
+out_unlock:
+	return ret;
+}
+
+struct cpu_offline_driver pSeries_offline_driver = {
+	.read_available_states = pSeries_read_available_states,
+	.read_current_state = pSeries_read_current_state,
+	.write_current_state = pSeries_write_current_state,
+};
+
+static int pseries_hotplug_driver_init(void)
+{
+	return register_cpu_offline_driver(&pSeries_offline_driver);
+}
+
+arch_initcall(pseries_hotplug_driver_init);
diff --git a/arch/powerpc/platforms/pseries/offline_driver.h b/arch/powerpc/platforms/pseries/offline_driver.h
new file mode 100644
index 0000000..b4674df
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_driver.h
@@ -0,0 +1,20 @@
+#ifndef _OFFLINE_DRIVER_H_
+#define _OFFLINE_DRIVER_H_
+
+#define CPU_STATES_LEN	16
+
+/* Cpu offline states go here */
+enum cpu_state_vals {
+	CPU_STATE_OFFLINE,
+	CPU_STATE_INACTIVE,
+	CPU_STATE_ONLINE,
+	CPU_MAX_HOTPLUG_STATES
+};
+
+extern enum cpu_state_vals get_cpu_current_state(int cpu);
+extern void set_cpu_current_state(int cpu, enum cpu_state_vals state);
+extern enum cpu_state_vals get_preferred_offline_state(int cpu);
+extern void set_preferred_offline_state(int cpu, enum cpu_state_vals state);
+extern int start_secondary(void);
+extern void set_default_offline_state(int cpu);
+#endif
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 1f8f6cf..48f8ae5 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -48,6 +48,7 @@
 #include "plpar_wrappers.h"
 #include "pseries.h"
 #include "xics.h"
+#include "offline_driver.h"
 
 
 /*
@@ -86,6 +87,9 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 	/* Fixup atomic count: it exited inside IRQ handler. */
 	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
 
+	if (get_cpu_current_state(lcpu) != CPU_STATE_OFFLINE)
+		goto out;
+
 	/* 
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.
@@ -100,6 +104,7 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 		return 0;
 	}
 
+out:
 	return 1;
 }
 
@@ -113,12 +118,15 @@ static void __devinit smp_xics_setup_cpu(int cpu)
 		vpa_init(cpu);
 
 	cpu_clear(cpu, of_spin_map);
+	set_cpu_current_state(cpu, CPU_STATE_ONLINE);
+	set_default_offline_state(cpu);
 
 }
 #endif /* CONFIG_XICS */
 
 static void __devinit smp_pSeries_kick_cpu(int nr)
 {
+	long rc;
 	BUG_ON(nr < 0 || nr >= NR_CPUS);
 
 	if (!smp_startup_cpu(nr))
@@ -130,6 +138,15 @@ static void __devinit smp_pSeries_kick_cpu(int nr)
 	 * the processor will continue on to secondary_start
 	 */
 	paca[nr].cpu_start = 1;
+
+	set_preferred_offline_state(nr, CPU_STATE_ONLINE);
+
+	if (get_cpu_current_state(nr) != CPU_STATE_OFFLINE) {
+		rc = plpar_hcall_norets(H_PROD, nr);
+		if (rc != H_SUCCESS)
+			panic("Error: Prod to wake up processor %d Ret= %ld\n",
+				nr, rc);
+	}
 }
 
 static int smp_pSeries_cpu_bootable(unsigned int nr)

^ permalink raw reply related

* Re: [PATCH v3 0/3] cpu: pseries: Cpu offline states framework
From: Peter Zijlstra @ 2009-09-15 12:11 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: linux-kernel, Venkatesh Pallipadi, Arun R Bharadwaj, linuxppc-dev,
	Darrick J. Wong
In-Reply-To: <20090915120629.20523.79019.stgit@sofia.in.ibm.com>

On Tue, 2009-09-15 at 17:36 +0530, Gautham R Shenoy wrote:
> This patchset contains the offline state driver implemented for
> pSeries. For pSeries, we define three available_hotplug_states. They are:
> 
>         online: The processor is online.
> 
>         offline: This is the default behaviour when the cpu is offlined,
>         even in the absence of this driver. The CPU would make an
>         rtas_stop_self() call and hand the CPU back to the resource pool,
>         thereby effectively deallocating that vCPU from the LPAR.
>         NOTE: This would result in a configuration change to the LPAR
>         which is visible to the outside world.
> 
>         inactive: This cedes the vCPU to the hypervisor with a cede latency
>         specifier value of 2.
>         NOTE: This option does not result in a configuration change,
>         and the vCPU remains entitled to the LPAR to which it
>         belonged earlier.
> 
> Any feedback on the patchset will be immensely valuable.

I still think it's a layering violation... it's the hypervisor manager
that should be bothered about what state an off-lined cpu is in. 

^ permalink raw reply

* Re: [PATCH v3 0/3] cpu: pseries: Cpu offline states framework
From: Michael Ellerman @ 2009-09-15 13:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gautham R Shenoy, linux-kernel, Venkatesh Pallipadi,
	Arun R Bharadwaj, linuxppc-dev, Darrick J. Wong
In-Reply-To: <1253016701.5506.73.camel@laptop>

[-- Attachment #1: Type: text/plain, Size: 1452 bytes --]

On Tue, 2009-09-15 at 14:11 +0200, Peter Zijlstra wrote:
> On Tue, 2009-09-15 at 17:36 +0530, Gautham R Shenoy wrote:
> > This patchset contains the offline state driver implemented for
> > pSeries. For pSeries, we define three available_hotplug_states. They are:
> > 
> >         online: The processor is online.
> > 
> >         offline: This is the default behaviour when the cpu is offlined,
> >         even in the absence of this driver. The CPU would make an
> >         rtas_stop_self() call and hand the CPU back to the resource pool,
> >         thereby effectively deallocating that vCPU from the LPAR.
> >         NOTE: This would result in a configuration change to the LPAR
> >         which is visible to the outside world.
> > 
> >         inactive: This cedes the vCPU to the hypervisor with a cede latency
> >         specifier value of 2.
> >         NOTE: This option does not result in a configuration change,
> >         and the vCPU remains entitled to the LPAR to which it
> >         belonged earlier.
> > 
> > Any feedback on the patchset will be immensely valuable.
> 
> I still think it's a layering violation... it's the hypervisor manager
> that should be bothered about what state an off-lined cpu is in. 

Yeah it probably is a layering violation, but when has that stopped us
before :)

Is it anticipated that this will be useful on platforms other than
pseries?

cheers



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

