* RE: [PATCH 1/3] rapidio: make enumeration/discovery configurable
From: Bounine, Alexandre @ 2013-04-29 12:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Micha Nelissen, linux-kernel@vger.kernel.org, Andre van Herk,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20130426155325.09fbad8c382dc8c549d503d0@linux-foundation.org>
On Fri, 26 Apr 2013 6:53 PM Andrew Morton <akpm@linux-foundation.org> wrote=
:
> Subject: Re: [PATCH 1/3] rapidio: make enumeration/discovery
> configurable
>=20
> This Kconfig change makes my kbuild do Weird Things.
>=20
> make mrproper ; yes "" | make allmodconfig ; make 2>/tmp/x
... skip ...=20
> : DMA Engine support for RapidIO (RAPIDIO_DMA_ENGINE) [Y/n/?] y
> : RapidIO subsystem debug messages (RAPIDIO_DEBUG) [Y/n/?] y
> : Enumeration method [M/y/?] (NEW) aborted!
> :
> : Console input/output is redirected. Run 'make oldconfig' to update conf=
iguration.
> :
> : SYSHDR arch/x86/syscalls/../include/generated/uapi/asm/unistd_32.h
> : SYSHDR arch/x86/syscalls/../include/generated/uapi/asm/unistd_64.h
> : SYSHDR
> arch/x86/syscalls/../include/generated/uapi/asm/unistd_x32.h
>=20
>=20
> See the "Enumeration method [M/y/?] (NEW) aborted!"
>=20
> Note that this only happens when make's stderr is redirected.
>=20
> I've no idea what's going on here. This appears to fix things:
>=20
> --- a/drivers/rapidio/Kconfig~rapidio-make-enumeration-discovery-
> configurable-fix
> +++ a/drivers/rapidio/Kconfig
> @@ -59,7 +59,7 @@ choice
> If unsure, select Basic built-in.
>=20
> config RAPIDIO_ENUM_BASIC
> - tristate "Basic"
> + bool "Basic"
> help
> This option includes basic RapidIO fabric enumeration and
> discovery
> mechanism similar to one described in RapidIO specification
> Annex 1.
>=20
> but doesn't appear to be what you intended.
Goal is to build enumerator as a module or built-in.
I will retest v2 of patches for this issue.
There are some changes to Kconfig (removed AUTO_ENUM option).
^ permalink raw reply
* [PATCH] powerpc/booke: remove obsolete macro FINISH_EXCEPTION
From: Kevin Hao @ 2013-04-29 10:59 UTC (permalink / raw)
To: Kumar Gala, Benjamin Herrenschmidt; +Cc: linuxppc
This is stale and not used by anyone now.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
This passed the build test with the ppc44x_defconfig, corenet32_smp_defconfig.
arch/powerpc/kernel/head_booke.h | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 5f051ee..b385350 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -199,11 +199,6 @@
.align 5; \
label:
-#define FINISH_EXCEPTION(func) \
- bl transfer_to_handler_full; \
- .long func; \
- .long ret_from_except_full
-
#define EXCEPTION(n, intno, label, hdlr, xfer) \
START_EXCEPTION(label); \
NORMAL_EXCEPTION_PROLOG(intno); \
--
1.8.1.4
^ permalink raw reply related
* Please pull 'next' branch of 5xxx tree
From: Anatolij Gustschin @ 2013-04-29 9:42 UTC (permalink / raw)
To: Benjamin Herrenschmidt, linuxppc-dev
Hi Ben !
Please pull mpc5xxx patches for v3.10. There are some changes
for mpc5121 generic platform code to support mpc5125 SoC and DTS
files for ac14xx and MPC5125-TWR boards.
All these patches have already been in linux-next for a while.
Thanks,
Anatolij
The following changes since commit 07961ac7c0ee8b546658717034fe692fd12eefa9:
Linux 3.9-rc5 (2013-03-31 15:12:43 -0700)
are available in the git repository at:
git://git.denx.de/linux-2.6-agust.git next
for you to fetch changes up to fdeaf0e20e9f3999c5cb129e821595fe927bf259:
powerpc/512x: add ifm ac14xx board (2013-04-10 20:48:44 +0200)
----------------------------------------------------------------
Anatolij Gustschin (2):
powerpc/mpc512x: create SoC devices for more nodes
powerpc/512x: add ifm ac14xx board
Matteo Facchinetti (2):
powerpc/512x: move mpc5121_generic platform to mpc512x_generic.
powerpc/mpc512x: add platform code for MPC5125.
arch/powerpc/boot/dts/ac14xx.dts | 392 ++++++++++++++++++++
arch/powerpc/boot/dts/mpc5121.dtsi | 2 +-
arch/powerpc/boot/dts/mpc5121ads.dts | 2 +-
arch/powerpc/boot/dts/mpc5125twr.dts | 233 ++++++++++++
arch/powerpc/boot/dts/pdm360ng.dts | 2 +-
arch/powerpc/configs/mpc512x_defconfig | 2 +-
arch/powerpc/platforms/512x/Kconfig | 8 +-
arch/powerpc/platforms/512x/Makefile | 2 +-
arch/powerpc/platforms/512x/clock.c | 9 +-
arch/powerpc/platforms/512x/mpc512x.h | 1 +
.../512x/{mpc5121_generic.c => mpc512x_generic.c} | 12 +-
arch/powerpc/platforms/512x/mpc512x_shared.c | 33 +-
12 files changed, 674 insertions(+), 24 deletions(-)
create mode 100644 arch/powerpc/boot/dts/ac14xx.dts
create mode 100644 arch/powerpc/boot/dts/mpc5125twr.dts
rename arch/powerpc/platforms/512x/{mpc5121_generic.c => mpc512x_generic.c} (83%)
^ permalink raw reply
* Re: [PATCH] powerpc: Add an in memory udbg console
From: Benjamin Herrenschmidt @ 2013-04-29 9:07 UTC (permalink / raw)
To: Alistair Popple; +Cc: linuxppc-dev
In-Reply-To: <1367216662-16872-1-git-send-email-alistair@popple.id.au>
On Mon, 2013-04-29 at 16:24 +1000, Alistair Popple wrote:
> This patch adds a new udbg early debug console which utilises
> statically defined input and output buffers stored within the kernel
> BSS. It is primarily designed to assist with bring up of new hardware
> which may not have a working console but which has a method of
> reading/writing kernel memory.
This probably need some memory barriers :-)
Cheers,
Ben.
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> ---
> arch/powerpc/Kconfig.debug | 23 ++++++++++
> arch/powerpc/include/asm/udbg.h | 1 +
> arch/powerpc/kernel/udbg.c | 3 ++
> arch/powerpc/sysdev/Makefile | 2 +
> arch/powerpc/sysdev/udbg_memcons.c | 85 ++++++++++++++++++++++++++++++++++++
> 5 files changed, 114 insertions(+)
> create mode 100644 arch/powerpc/sysdev/udbg_memcons.c
>
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 5416e28..863d877 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -262,8 +262,31 @@ config PPC_EARLY_DEBUG_OPAL_HVSI
> Select this to enable early debugging for the PowerNV platform
> using an "hvsi" console
>
> +config PPC_EARLY_DEBUG_MEMCONS
> + bool "In memory console"
> + help
> + Select this to enable early debugging using an in memory console.
> + This console provides input and output buffers stored within the
> + kernel BSS and should be safe to select on any system. A debugger
> + can then be used to read kernel output or send input to the console.
> endchoice
>
> +config PPC_MEMCONS_OUTPUT_SIZE
> + int "In memory console output buffer size"
> + depends on PPC_EARLY_DEBUG_MEMCONS
> + default 4096
> + help
> + Selects the size of the output buffer (in bytes) of the in memory
> + console.
> +
> +config PPC_MEMCONS_INPUT_SIZE
> + int "In memory console input buffer size"
> + depends on PPC_EARLY_DEBUG_MEMCONS
> + default 128
> + help
> + Selects the size of the input buffer (in bytes) of the in memory
> + console.
> +
> config PPC_EARLY_DEBUG_OPAL
> def_bool y
> depends on PPC_EARLY_DEBUG_OPAL_RAW || PPC_EARLY_DEBUG_OPAL_HVSI
> diff --git a/arch/powerpc/include/asm/udbg.h b/arch/powerpc/include/asm/udbg.h
> index 5a7510e..dc59091 100644
> --- a/arch/powerpc/include/asm/udbg.h
> +++ b/arch/powerpc/include/asm/udbg.h
> @@ -52,6 +52,7 @@ extern void __init udbg_init_40x_realmode(void);
> extern void __init udbg_init_cpm(void);
> extern void __init udbg_init_usbgecko(void);
> extern void __init udbg_init_wsp(void);
> +extern void __init udbg_init_memcons(void);
> extern void __init udbg_init_ehv_bc(void);
> extern void __init udbg_init_ps3gelic(void);
> extern void __init udbg_init_debug_opal_raw(void);
> diff --git a/arch/powerpc/kernel/udbg.c b/arch/powerpc/kernel/udbg.c
> index f974849..efa5c93 100644
> --- a/arch/powerpc/kernel/udbg.c
> +++ b/arch/powerpc/kernel/udbg.c
> @@ -64,6 +64,9 @@ void __init udbg_early_init(void)
> udbg_init_usbgecko();
> #elif defined(CONFIG_PPC_EARLY_DEBUG_WSP)
> udbg_init_wsp();
> +#elif defined(CONFIG_PPC_EARLY_DEBUG_MEMCONS)
> + /* In memory console */
> + udbg_init_memcons();
> #elif defined(CONFIG_PPC_EARLY_DEBUG_EHV_BC)
> udbg_init_ehv_bc();
> #elif defined(CONFIG_PPC_EARLY_DEBUG_PS3GELIC)
> diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
> index b0a518e..99464a7 100644
> --- a/arch/powerpc/sysdev/Makefile
> +++ b/arch/powerpc/sysdev/Makefile
> @@ -64,6 +64,8 @@ endif
>
> obj-$(CONFIG_PPC_SCOM) += scom.o
>
> +obj-$(CONFIG_PPC_EARLY_DEBUG_MEMCONS) += udbg_memcons.o
> +
> subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
>
> obj-$(CONFIG_PPC_XICS) += xics/
> diff --git a/arch/powerpc/sysdev/udbg_memcons.c b/arch/powerpc/sysdev/udbg_memcons.c
> new file mode 100644
> index 0000000..36f5c44
> --- /dev/null
> +++ b/arch/powerpc/sysdev/udbg_memcons.c
> @@ -0,0 +1,85 @@
> +/*
> + * A udbg backend which logs messages and reads input from in memory
> + * buffers.
> + *
> + * Copyright (C) 2003-2005 Anton Blanchard and Milton Miller, IBM Corp
> + * Copyright (C) 2013 Alistair Popple, IBM Corp
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <asm/page.h>
> +#include <asm/processor.h>
> +#include <asm/udbg.h>
> +
> +struct memcons {
> + char *output_start;
> + char *output_pos;
> + char *output_end;
> + char *input_start;
> + char *input_pos;
> + char *input_end;
> +};
> +
> +static char memcons_output[CONFIG_PPC_MEMCONS_OUTPUT_SIZE];
> +static char memcons_input[CONFIG_PPC_MEMCONS_INPUT_SIZE];
> +
> +struct memcons memcons = {
> + .output_start = memcons_output,
> + .output_pos = memcons_output,
> + .output_end = &memcons_output[CONFIG_PPC_MEMCONS_OUTPUT_SIZE],
> + .input_start = memcons_input,
> + .input_pos = memcons_input,
> + .input_end = &memcons_input[CONFIG_PPC_MEMCONS_INPUT_SIZE],
> +};
> +
> +void memcons_putc(char c)
> +{
> + *memcons.output_pos++ = c;
> + if (memcons.output_pos >= memcons.output_end)
> + memcons.output_pos = memcons.output_start;
> +}
> +
> +int memcons_getc_poll(void)
> +{
> + char c;
> +
> + if (*memcons.input_pos) {
> + c = *memcons.input_pos;
> + *memcons.input_pos++ = 0;
> +
> + if (*memcons.input_pos == '\0' || memcons.input_pos >= memcons.input_end)
> + memcons.input_pos = memcons.input_start;
> +
> + return c;
> + }
> +
> + return -1;
> +}
> +
> +int memcons_getc(void)
> +{
> + int c;
> +
> + while (1) {
> + c = memcons_getc_poll();
> + if (c == -1)
> + cpu_relax();
> + else
> + break;
> + }
> +
> + return c;
> +}
> +
> +void udbg_init_memcons(void)
> +{
> + udbg_putc = memcons_putc;
> + udbg_getc = memcons_getc;
> + udbg_getc_poll = memcons_getc_poll;
> +}
^ permalink raw reply
* Re: [PATCH] powerpc: Fix missing doorbell IPIs during nap power saving
From: Benjamin Herrenschmidt @ 2013-04-29 9:05 UTC (permalink / raw)
To: Ian Munsie; +Cc: Michael Neuling, linuxppc-dev, stable
In-Reply-To: <1367223460-4931-1-git-send-email-imunsie@au1.ibm.com>
On Mon, 2013-04-29 at 18:17 +1000, Ian Munsie wrote:
> This is not an issue with other interrupts that can wake a thread
> (external, decrementer) as they are level sensitive and will continue to
> be asserted by the hardware.
However, it might be worth experimenting setting the corresponding
bits in the PACA as well, as this will diminish the latency of
processing them.
IE. local_irq_enable() will see the bit, generate an interrupt
frame and call the handler which is a faster path than re-enabling
MSR:EE and then taking an external interrupt.
Cheers,
Ben.
^ permalink raw reply
* [PATCH] powerpc: Fix missing doorbell IPIs during nap power saving
From: Ian Munsie @ 2013-04-29 8:17 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Michael Neuling, linuxppc-dev, Ian Munsie, stable
From: Ian Munsie <imunsie@au1.ibm.com>
If a doorbell IPI comes in while a thread is in nap power saving, the
doorbell interrupt won't be replayed by the hardware since it is edge
sensitive. Currently we are not replaying these interrupts in software,
which can cause threads to miss IPIs that come in during power saving
and eventually will result in an RCU warning from rcu_sched that it has
detected a stalled CPU.
This patch fixes the issue by testing if a doorbell caused the thread to
come out of power saving and sets the corresponding bit in the paca to
indicate a doorbell happened, which will then be handled by the existing
interrupt replay code.
This is not an issue with other interrupts that can wake a thread
(external, decrementer) as they are level sensitive and will continue to
be asserted by the hardware.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
arch/powerpc/include/asm/reg.h | 5 ++++-
arch/powerpc/kernel/idle_power7.S | 23 +++++++++++++++++++----
2 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 5c6fbe2..dd8a071 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -549,14 +549,17 @@
#define SRR1_ISI_NOPT 0x40000000 /* ISI: Not found in hash */
#define SRR1_ISI_N_OR_G 0x10000000 /* ISI: Access is no-exec or G */
#define SRR1_ISI_PROT 0x08000000 /* ISI: Other protection fault */
-#define SRR1_WAKEMASK 0x00380000 /* reason for wakeup */
+#define SRR1_WAKEMASK 0x003c0000 /* reason for wakeup [42:45] */
+#define SRR1_WAKESHIFT 18
#define SRR1_WAKESYSERR 0x00300000 /* System error */
#define SRR1_WAKEEE 0x00200000 /* External interrupt */
#define SRR1_WAKEMT 0x00280000 /* mtctrl */
#define SRR1_WAKEHMI 0x00280000 /* Hypervisor maintenance */
#define SRR1_WAKEDEC 0x00180000 /* Decrementer interrupt */
+#define SRR1_WAKEDBELLP 0x00140000 /* Privileged Doorbell */
#define SRR1_WAKETHERM 0x00100000 /* Thermal management interrupt */
#define SRR1_WAKERESET 0x00100000 /* System reset */
+#define SRR1_WAKEDBELLH 0x000c0000 /* Hypervisor Doorbell */
#define SRR1_WAKESTATE 0x00030000 /* Powersave exit mask [46:47] */
#define SRR1_WS_DEEPEST 0x00030000 /* Some resources not maintained,
* may not be recoverable */
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index e11863f..c80bb4b 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -108,9 +108,7 @@ _GLOBAL(power7_wakeup_loss)
ld r5,_NIP(r1)
addi r1,r1,INT_FRAME_SIZE
mtcr r3
- mtspr SPRN_SRR1,r4
- mtspr SPRN_SRR0,r5
- rfid
+ b power7_wakeup_common
_GLOBAL(power7_wakeup_noloss)
lbz r0,PACA_NAPSTATELOST(r13)
@@ -120,6 +118,23 @@ _GLOBAL(power7_wakeup_noloss)
ld r4,_MSR(r1)
ld r5,_NIP(r1)
addi r1,r1,INT_FRAME_SIZE
- mtspr SPRN_SRR1,r4
+ /* Fall through */
+
+power7_wakeup_common:
+BEGIN_FTR_SECTION
+ mfspr r3,SPRN_SRR1
+ extrdi r3,r3,4,42 /* Extract SRR1_WAKEMASK */
+ cmpwi r3,SRR1_WAKEDBELLH >> SRR1_WAKESHIFT
+ beq 1f
+ cmpwi r3,SRR1_WAKEDBELLP >> SRR1_WAKESHIFT
+ bne 2f
+
+1: /* Woken by a doorbell, set doorbell happened in paca */
+ lbz r3,PACAIRQHAPPENED(r13)
+ ori r3,r3,PACA_IRQ_DBELL
+ stb r3,PACAIRQHAPPENED(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_DBELL)
+
+2: mtspr SPRN_SRR1,r4
mtspr SPRN_SRR0,r5
rfid
--
1.7.10.4
^ permalink raw reply related
* [PATCH] powerpc: Add an in memory udbg console
From: Alistair Popple @ 2013-04-29 6:24 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Alistair Popple
This patch adds a new udbg early debug console which utilises
statically defined input and output buffers stored within the kernel
BSS. It is primarily designed to assist with bring up of new hardware
which may not have a working console but which has a method of
reading/writing kernel memory.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
arch/powerpc/Kconfig.debug | 23 ++++++++++
arch/powerpc/include/asm/udbg.h | 1 +
arch/powerpc/kernel/udbg.c | 3 ++
arch/powerpc/sysdev/Makefile | 2 +
arch/powerpc/sysdev/udbg_memcons.c | 85 ++++++++++++++++++++++++++++++++++++
5 files changed, 114 insertions(+)
create mode 100644 arch/powerpc/sysdev/udbg_memcons.c
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 5416e28..863d877 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -262,8 +262,31 @@ config PPC_EARLY_DEBUG_OPAL_HVSI
Select this to enable early debugging for the PowerNV platform
using an "hvsi" console
+config PPC_EARLY_DEBUG_MEMCONS
+ bool "In memory console"
+ help
+ Select this to enable early debugging using an in memory console.
+ This console provides input and output buffers stored within the
+ kernel BSS and should be safe to select on any system. A debugger
+ can then be used to read kernel output or send input to the console.
endchoice
+config PPC_MEMCONS_OUTPUT_SIZE
+ int "In memory console output buffer size"
+ depends on PPC_EARLY_DEBUG_MEMCONS
+ default 4096
+ help
+ Selects the size of the output buffer (in bytes) of the in memory
+ console.
+
+config PPC_MEMCONS_INPUT_SIZE
+ int "In memory console input buffer size"
+ depends on PPC_EARLY_DEBUG_MEMCONS
+ default 128
+ help
+ Selects the size of the input buffer (in bytes) of the in memory
+ console.
+
config PPC_EARLY_DEBUG_OPAL
def_bool y
depends on PPC_EARLY_DEBUG_OPAL_RAW || PPC_EARLY_DEBUG_OPAL_HVSI
diff --git a/arch/powerpc/include/asm/udbg.h b/arch/powerpc/include/asm/udbg.h
index 5a7510e..dc59091 100644
--- a/arch/powerpc/include/asm/udbg.h
+++ b/arch/powerpc/include/asm/udbg.h
@@ -52,6 +52,7 @@ extern void __init udbg_init_40x_realmode(void);
extern void __init udbg_init_cpm(void);
extern void __init udbg_init_usbgecko(void);
extern void __init udbg_init_wsp(void);
+extern void __init udbg_init_memcons(void);
extern void __init udbg_init_ehv_bc(void);
extern void __init udbg_init_ps3gelic(void);
extern void __init udbg_init_debug_opal_raw(void);
diff --git a/arch/powerpc/kernel/udbg.c b/arch/powerpc/kernel/udbg.c
index f974849..efa5c93 100644
--- a/arch/powerpc/kernel/udbg.c
+++ b/arch/powerpc/kernel/udbg.c
@@ -64,6 +64,9 @@ void __init udbg_early_init(void)
udbg_init_usbgecko();
#elif defined(CONFIG_PPC_EARLY_DEBUG_WSP)
udbg_init_wsp();
+#elif defined(CONFIG_PPC_EARLY_DEBUG_MEMCONS)
+ /* In memory console */
+ udbg_init_memcons();
#elif defined(CONFIG_PPC_EARLY_DEBUG_EHV_BC)
udbg_init_ehv_bc();
#elif defined(CONFIG_PPC_EARLY_DEBUG_PS3GELIC)
diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
index b0a518e..99464a7 100644
--- a/arch/powerpc/sysdev/Makefile
+++ b/arch/powerpc/sysdev/Makefile
@@ -64,6 +64,8 @@ endif
obj-$(CONFIG_PPC_SCOM) += scom.o
+obj-$(CONFIG_PPC_EARLY_DEBUG_MEMCONS) += udbg_memcons.o
+
subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
obj-$(CONFIG_PPC_XICS) += xics/
diff --git a/arch/powerpc/sysdev/udbg_memcons.c b/arch/powerpc/sysdev/udbg_memcons.c
new file mode 100644
index 0000000..36f5c44
--- /dev/null
+++ b/arch/powerpc/sysdev/udbg_memcons.c
@@ -0,0 +1,85 @@
+/*
+ * A udbg backend which logs messages and reads input from in memory
+ * buffers.
+ *
+ * Copyright (C) 2003-2005 Anton Blanchard and Milton Miller, IBM Corp
+ * Copyright (C) 2013 Alistair Popple, IBM Corp
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <asm/page.h>
+#include <asm/processor.h>
+#include <asm/udbg.h>
+
+struct memcons {
+ char *output_start;
+ char *output_pos;
+ char *output_end;
+ char *input_start;
+ char *input_pos;
+ char *input_end;
+};
+
+static char memcons_output[CONFIG_PPC_MEMCONS_OUTPUT_SIZE];
+static char memcons_input[CONFIG_PPC_MEMCONS_INPUT_SIZE];
+
+struct memcons memcons = {
+ .output_start = memcons_output,
+ .output_pos = memcons_output,
+ .output_end = &memcons_output[CONFIG_PPC_MEMCONS_OUTPUT_SIZE],
+ .input_start = memcons_input,
+ .input_pos = memcons_input,
+ .input_end = &memcons_input[CONFIG_PPC_MEMCONS_INPUT_SIZE],
+};
+
+void memcons_putc(char c)
+{
+ *memcons.output_pos++ = c;
+ if (memcons.output_pos >= memcons.output_end)
+ memcons.output_pos = memcons.output_start;
+}
+
+int memcons_getc_poll(void)
+{
+ char c;
+
+ if (*memcons.input_pos) {
+ c = *memcons.input_pos;
+ *memcons.input_pos++ = 0;
+
+ if (*memcons.input_pos == '\0' || memcons.input_pos >= memcons.input_end)
+ memcons.input_pos = memcons.input_start;
+
+ return c;
+ }
+
+ return -1;
+}
+
+int memcons_getc(void)
+{
+ int c;
+
+ while (1) {
+ c = memcons_getc_poll();
+ if (c == -1)
+ cpu_relax();
+ else
+ break;
+ }
+
+ return c;
+}
+
+void udbg_init_memcons(void)
+{
+ udbg_putc = memcons_putc;
+ udbg_getc = memcons_getc;
+ udbg_getc_poll = memcons_getc_poll;
+}
--
1.7.10.4
^ permalink raw reply related
* [PATCH] powerpc/rtas_flash: Fix bad memory access
From: Vasant Hegde @ 2013-04-29 4:43 UTC (permalink / raw)
To: linuxppc-dev; +Cc: paulus, linux-kernel
We use kmem_cache_alloc() to allocate memory to hold the new firmware
which will be flashed. kmem_cache_alloc() calls rtas_block_ctor() to
set memory to NULL. But these constructor is called only for newly
allocated slabs.
If we run below command multiple time without rebooting, allocator may
allocate memory from the area which was free'd by kmem_cache_free and
it will not call constructor. In this situation we may hit kernel oops.
dd if=<fw image> of=/proc/ppc64/rtas/firmware_flash bs=4096
oops message:
-------------
[ 1602.399755] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1602.399772] SMP NR_CPUS=1024 NUMA pSeries
[ 1602.399779] Modules linked in: rtas_flash nfsd lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod sg ipv6 ses enclosure ehea ehci_pci ohci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ipr libata scsi_mod
[ 1602.399817] NIP: d00000000a170b9c LR: d00000000a170b64 CTR: c00000000079cd58
[ 1602.399823] REGS: c0000003b9937930 TRAP: 0300 Not tainted (3.9.0-rc4-0.27-ppc64)
[ 1602.399828] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 22000428 XER: 20000000
[ 1602.399841] SOFTE: 1
[ 1602.399844] CFAR: c000000000005f24
[ 1602.399848] DAR: 8c2625a820631fef, DSISR: 40000000
[ 1602.399852] TASK = c0000003b4520760[3655] 'dd' THREAD: c0000003b9934000 CPU: 3
GPR00: 8c2625a820631fe7 c0000003b9937bb0 d00000000a179f28 d00000000a171f08
GPR04: 0000000010040000 0000000000001000 c0000003b9937df0 c0000003b5fb2080
GPR08: c0000003b58f7200 d00000000a179f28 c0000003b40058d4 c00000000079cd58
GPR12: d00000000a171450 c000000007f40900 0000000000000005 0000000010178d20
GPR16: 00000000100cb9d8 000000000000001d 0000000000000000 000000001003ffff
GPR20: 0000000000000001 0000000000000000 00003fffa0b50d30 000000001001f010
GPR24: 0000000010020888 0000000010040000 d00000000a171f08 d00000000a172808
GPR28: 0000000000001000 0000000010040000 c0000003b4005880 8c2625a820631fe7
[ 1602.399924] NIP [d00000000a170b9c] .rtas_flash_write+0x7c/0x1e8 [rtas_flash]
[ 1602.399930] LR [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash]
[ 1602.399934] Call Trace:
[ 1602.399939] [c0000003b9937bb0] [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash] (unreliable)
[ 1602.399948] [c0000003b9937c60] [c000000000282830] .proc_reg_write+0x90/0xe0
[ 1602.399955] [c0000003b9937ce0] [c0000000001ff374] .vfs_write+0x114/0x238
[ 1602.399961] [c0000003b9937d80] [c0000000001ff5d8] .SyS_write+0x70/0xe8
[ 1602.399968] [c0000003b9937e30] [c000000000009cdc] syscall_exit+0x0/0xa0
[ 1602.399973] Instruction dump:
[ 1602.399977] eb698010 801b0028 2f80dcd6 419e00a4 2fbc0000 419e009c ebfb0030 2fbf0000
[ 1602.399989] 409e0010 480000d8 60000000 7c1f0378 <e81f0008> 2fa00000 409efff4 e81f0000
[ 1602.400012] ---[ end trace b4136d115dc31dac ]---
[ 1602.402178]
[ 1602.402185] Sending IPI to other CPUs
[ 1602.403329] IPI complete
This patch uses kmem_cache_zalloc() instead of kmem_cache_alloc() to
allocate memory, which makes sure memory is set to 0 before using.
Also removes rtas_block_ctor(), which is no longer required.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
arch/powerpc/kernel/rtas_flash.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
index 0a08c3b..d61a94c 100644
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@ -286,12 +286,6 @@ static ssize_t rtas_flash_read(struct file *file, char __user *buf,
return simple_read_from_buffer(buf, count, ppos, msg, strlen(msg));
}
-/* constructor for flash_block_cache */
-void rtas_block_ctor(void *ptr)
-{
- memset(ptr, 0, RTAS_BLK_SIZE);
-}
-
/* We could be much more efficient here. But to keep this function
* simple we allocate a page to the block list no matter how small the
* count is. If the system is low on memory it will be just as well
@@ -316,7 +310,7 @@ static ssize_t rtas_flash_write(struct file *file, const char __user *buffer,
* proc file
*/
if (uf->flist == NULL) {
- uf->flist = kmem_cache_alloc(flash_block_cache, GFP_KERNEL);
+ uf->flist = kmem_cache_zalloc(flash_block_cache, GFP_KERNEL);
if (!uf->flist)
return -ENOMEM;
}
@@ -327,7 +321,7 @@ static ssize_t rtas_flash_write(struct file *file, const char __user *buffer,
next_free = fl->num_blocks;
if (next_free == FLASH_BLOCKS_PER_NODE) {
/* Need to allocate another block_list */
- fl->next = kmem_cache_alloc(flash_block_cache, GFP_KERNEL);
+ fl->next = kmem_cache_zalloc(flash_block_cache, GFP_KERNEL);
if (!fl->next)
return -ENOMEM;
fl = fl->next;
@@ -336,7 +330,7 @@ static ssize_t rtas_flash_write(struct file *file, const char __user *buffer,
if (count > RTAS_BLK_SIZE)
count = RTAS_BLK_SIZE;
- p = kmem_cache_alloc(flash_block_cache, GFP_KERNEL);
+ p = kmem_cache_zalloc(flash_block_cache, GFP_KERNEL);
if (!p)
return -ENOMEM;
@@ -789,7 +783,7 @@ static int __init rtas_flash_init(void)
flash_block_cache = kmem_cache_create("rtas_flash_cache",
RTAS_BLK_SIZE, RTAS_BLK_SIZE, 0,
- rtas_block_ctor);
+ NULL);
if (!flash_block_cache) {
printk(KERN_ERR "%s: failed to create block cache\n",
__func__);
^ permalink raw reply related
* linux-next: build failure after merge of the cgroup tree
From: Stephen Rothwell @ 2013-04-29 4:04 UTC (permalink / raw)
To: Tejun Heo
Cc: linux-kernel, Li Zefan, linux-next, Nathan Fontenot, linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 2216 bytes --]
Hi Tejun,
After merging the cgroup tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:
arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology':
arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror]
arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
Caused by commit 30c05350c39d ("powerpc/pseries: Use stop machine to
update cpu maps") from the powerpc tree interacting with (probably)
commit ff794dea52ea ("cpuset: remove include of cgroup.h from cpuset.h")
from the cgroup tree. Removing includes from header files is fraught
with danger ...
The former should have added an include of linux/slab.h to
arch/powerpc/mm/numa.c.
I have added the following merge fix patch for today (but it should be
applied to the powerpc tree ASAP).
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 29 Apr 2013 14:01:44 +1000
Subject: [PATCH] powerpc: numa.c: using kzalloc/kfree requires including
slab.h
fixes these build errors:
arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology':
arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror]
arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
arch/powerpc/mm/numa.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 2d13f90..490e39c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -26,6 +26,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/uaccess.h>
+#include <linux/slab.h>
#include <asm/sparsemem.h>
#include <asm/prom.h>
#include <asm/smp.h>
--
1.8.1
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply related
* [PATCH 2/2] powerpc: Update default configurations
From: Alistair Popple @ 2013-04-29 3:42 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Alistair Popple
In-Reply-To: <1367206964-6319-1-git-send-email-alistair@popple.id.au>
Update default configurations for systems with CONFIG_BOOTX_TEXT
selected so that they continue to print early debug messages as is
currently the case.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
arch/powerpc/configs/c2k_defconfig | 2 ++
arch/powerpc/configs/g5_defconfig | 2 ++
arch/powerpc/configs/maple_defconfig | 2 ++
arch/powerpc/configs/pmac32_defconfig | 2 ++
arch/powerpc/configs/ppc64_defconfig | 2 ++
arch/powerpc/configs/ppc6xx_defconfig | 2 ++
6 files changed, 12 insertions(+)
diff --git a/arch/powerpc/configs/c2k_defconfig b/arch/powerpc/configs/c2k_defconfig
index 2a84fd7..671a8f9 100644
--- a/arch/powerpc/configs/c2k_defconfig
+++ b/arch/powerpc/configs/c2k_defconfig
@@ -423,6 +423,8 @@ CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
CONFIG_KEYS=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
CONFIG_SECURITY=y
diff --git a/arch/powerpc/configs/g5_defconfig b/arch/powerpc/configs/g5_defconfig
index 07b7f2a..1ea22fc 100644
--- a/arch/powerpc/configs/g5_defconfig
+++ b/arch/powerpc/configs/g5_defconfig
@@ -284,6 +284,8 @@ CONFIG_DEBUG_MUTEXES=y
CONFIG_LATENCYTOP=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_ECB=m
diff --git a/arch/powerpc/configs/maple_defconfig b/arch/powerpc/configs/maple_defconfig
index 02ac96b..2a5afac 100644
--- a/arch/powerpc/configs/maple_defconfig
+++ b/arch/powerpc/configs/maple_defconfig
@@ -138,6 +138,8 @@ CONFIG_DEBUG_STACK_USAGE=y
CONFIG_XMON=y
CONFIG_XMON_DEFAULT=y
CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_PCBC=m
# CONFIG_CRYPTO_ANSI_CPRNG is not set
diff --git a/arch/powerpc/configs/pmac32_defconfig b/arch/powerpc/configs/pmac32_defconfig
index 29767a8..a73626b 100644
--- a/arch/powerpc/configs/pmac32_defconfig
+++ b/arch/powerpc/configs/pmac32_defconfig
@@ -350,6 +350,8 @@ CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_XMON=y
CONFIG_XMON_DEFAULT=y
CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_MD4=m
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index aef3f71..c86fcb9 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -398,6 +398,8 @@ CONFIG_FTR_FIXUP_SELFTEST=y
CONFIG_MSI_BITMAP_SELFTEST=y
CONFIG_XMON=y
CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_PCBC=m
diff --git a/arch/powerpc/configs/ppc6xx_defconfig b/arch/powerpc/configs/ppc6xx_defconfig
index be1cb6e..20ebfaf 100644
--- a/arch/powerpc/configs/ppc6xx_defconfig
+++ b/arch/powerpc/configs/ppc6xx_defconfig
@@ -1264,6 +1264,8 @@ CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_XMON=y
CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
CONFIG_KEYS=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
CONFIG_SECURITY=y
--
1.7.10.4
^ permalink raw reply related
* [PATCH 1/2] powerpc: Add a configuration option for early BootX/OpenFirmware debug
From: Alistair Popple @ 2013-04-29 3:42 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Alistair Popple
In-Reply-To: <1367206964-6319-1-git-send-email-alistair@popple.id.au>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
arch/powerpc/Kconfig.debug | 7 +++++++
arch/powerpc/kernel/udbg.c | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 5416e28..e826853 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -147,6 +147,13 @@ choice
enable debugging for the wrong type of machine your kernel
_will not boot_.
+config PPC_EARLY_DEBUG_BOOTX
+ bool "BootX or OpenFirmware"
+ depends on BOOTX_TEXT
+ help
+ Select this to enable early debugging for a machine using BootX
+ or OpenFirmware.
+
config PPC_EARLY_DEBUG_LPAR
bool "LPAR HV Console"
depends on PPC_PSERIES
diff --git a/arch/powerpc/kernel/udbg.c b/arch/powerpc/kernel/udbg.c
index f974849..9bbf07b 100644
--- a/arch/powerpc/kernel/udbg.c
+++ b/arch/powerpc/kernel/udbg.c
@@ -50,7 +50,7 @@ void __init udbg_early_init(void)
udbg_init_debug_beat();
#elif defined(CONFIG_PPC_EARLY_DEBUG_PAS_REALMODE)
udbg_init_pas_realmode();
-#elif defined(CONFIG_BOOTX_TEXT)
+#elif defined(CONFIG_PPC_EARLY_DEBUG_BOOTX)
udbg_init_btext();
#elif defined(CONFIG_PPC_EARLY_DEBUG_44x)
/* PPC44x debug */
--
1.7.10.4
^ permalink raw reply related
* [PATCH 0/2] powerpc: Early debug console configuration clean up
From: Alistair Popple @ 2013-04-29 3:42 UTC (permalink / raw)
To: linuxppc-dev
This patch series (based on v3.9) adds a configuration option to
enable the use of BootX or OpenFirmware console for early debug
console support.
Previously the use of BootX/OpenFirmware as an early debug console was
selected by CONFIG_BOOTX. However this left the ability to select a
different early debug console under the "Early debugging" menu which
was not supported by udbg.c. Instead the actual console selected when
both CONFIG_BOOTX and another early debug console is conifigured was
based on the arbitrary order of a series of #if/#elif's in udbg.c.
This patch replaces the previously sent "[PATCH] Make CONFIG_BOOTX an
exclusive early debug option".
^ permalink raw reply
* Re: linux-next: manual merge of the vfs tree with the powerpc tree
From: Benjamin Herrenschmidt @ 2013-04-29 3:40 UTC (permalink / raw)
To: Stephen Rothwell
Cc: Vasant Hegde, linux-kernel, David Howells, linux-next, Al Viro,
linuxppc-dev
In-Reply-To: <20130429113540.55b2593d303bc2a9142919fc@canb.auug.org.au>
On Mon, 2013-04-29 at 11:35 +1000, Stephen Rothwell wrote:
> Today's linux-next merge of the vfs tree got a conflict in
> arch/powerpc/kernel/rtas_flash.c between commit ad18a364f186
> ("powerpc/rtas_flash: Free kmem upon module exit"), from the powerpc
> tree and commit 5c0333c00ff6 ("ppc: Clean up rtas_flash driver
> somewhat")
> from the vfs tree.
>
> I fixed it up (see below) and can carry the fix as necessary (no
> action
> is required).
Please don't merge Dave's rework until it's been acked & fully tested by
us. AFAIK, there are outstanding comments from my guys who
reviewed/tested the first pass.
Cheers,
Ben.
^ permalink raw reply
* [PATCH] powerpc: Align p_toc
From: Anton Blanchard @ 2013-04-29 1:52 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev
p_toc is an 8 byte relative offset to the TOC that we place in the
text section. This means it is only 4 byte aligned where it should
be 8 byte aligned. Add an explicit alignment.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: b/arch/powerpc/kernel/head_64.S
===================================================================
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -703,6 +703,7 @@ _GLOBAL(relative_toc)
mtlr r0
blr
+.balign 8
p_toc: .llong __toc_start + 0x8000 - 0b
/*
^ permalink raw reply
* linux-next: manual merge of the vfs tree with the powerpc tree
From: Stephen Rothwell @ 2013-04-29 1:35 UTC (permalink / raw)
To: Al Viro; +Cc: Vasant Hegde, linux-kernel, David Howells, linux-next,
linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]
Hi Al,
Today's linux-next merge of the vfs tree got a conflict in
arch/powerpc/kernel/rtas_flash.c between commit ad18a364f186
("powerpc/rtas_flash: Free kmem upon module exit"), from the powerpc
tree and commit 5c0333c00ff6 ("ppc: Clean up rtas_flash driver somewhat")
from the vfs tree.
I fixed it up (see below) and can carry the fix as necessary (no action
is required).
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
diff --cc arch/powerpc/kernel/rtas_flash.c
index a3e4034,8196bfb..0000000
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@@ -806,20 -740,17 +758,22 @@@ enomem_buf
static void __exit rtas_flash_cleanup(void)
{
+ int i;
+
rtas_flash_term_hook = NULL;
+ if (rtas_firmware_flash_list) {
+ free_flash_list(rtas_firmware_flash_list);
+ rtas_firmware_flash_list = NULL;
+ }
+
- if (flash_block_cache)
- kmem_cache_destroy(flash_block_cache);
+ for (i = 0; i < ARRAY_SIZE(rtas_flash_files); i++) {
+ const struct rtas_flash_file *f = &rtas_flash_files[i];
+ remove_proc_entry(f->filename, NULL);
+ }
- remove_flash_pde(firmware_flash_pde);
- remove_flash_pde(firmware_update_pde);
- remove_flash_pde(validate_pde);
- remove_flash_pde(manage_pde);
+ kmem_cache_destroy(flash_block_cache);
+ kfree(rtas_validate_flash_data.buf);
}
module_init(rtas_flash_init);
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply
* [PATCH -V7 10/10] powerpc: disable assert_pte_locked
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
With THP we set pmd to none, before we do pte_clear. Hence we can't
walk page table to get the pte lock ptr and verify whether it is locked.
THP do take pte lock before calling pte_clear. So we don't change the locking
rules here. It is that we can't use page table walking to check whether
pte locks are help with THP.
NOTE: This needs to be re-written. Not to be merged upstream.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/mm/pgtable.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 214130a..d77f94f 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -224,6 +224,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
#ifdef CONFIG_DEBUG_VM
void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
{
+#if 0
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
@@ -237,6 +238,7 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
pmd = pmd_offset(pud, addr);
BUG_ON(!pmd_present(*pmd));
assert_spin_locked(pte_lockptr(mm, pmd));
+#endif
}
#endif /* CONFIG_DEBUG_VM */
--
1.8.1.2
^ permalink raw reply related
* [PATCH -V7 04/10] powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepages
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/mm/hugetlbpage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8601f2d..081c001 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -954,7 +954,7 @@ pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift
pdshift = PMD_SHIFT;
pm = pmd_offset(pu, ea);
- if (pmd_huge(*pm)) {
+ if (pmd_huge(*pm) || pmd_large(*pm)) {
ret_pte = (pte_t *) pm;
goto out;
} else if (is_hugepd(pm))
--
1.8.1.2
^ permalink raw reply related
* [PATCH -V7 07/10] powerpc/THP: Add code to handle HPTE faults for large pages
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
The deposted PTE page in the second half of the PMD table is used to
track the state on hash PTEs. After updating the HPTE, we mark the
coresponding slot in the deposted PTE page valid.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/mmu-hash64.h | 13 +++
arch/powerpc/mm/Makefile | 1 +
arch/powerpc/mm/hash_utils_64.c | 13 ++-
arch/powerpc/mm/hugepage-hash64.c | 180 ++++++++++++++++++++++++++++++++++
4 files changed, 203 insertions(+), 4 deletions(-)
create mode 100644 arch/powerpc/mm/hugepage-hash64.c
diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 2accc96..3d6fbb0 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -340,6 +340,19 @@ extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
pte_t *ptep, unsigned long trap, int local, int ssize,
unsigned int shift, unsigned int mmu_psize);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern int __hash_page_thp(unsigned long ea, unsigned long access,
+ unsigned long vsid, pmd_t *pmdp, unsigned long trap,
+ int local, int ssize, unsigned int psize);
+#else
+static inline int __hash_page_thp(unsigned long ea, unsigned long access,
+ unsigned long vsid, pmd_t *pmdp,
+ unsigned long trap, int local,
+ int ssize, unsigned int psize)
+{
+ BUG();
+}
+#endif
extern void hash_failure_debug(unsigned long ea, unsigned long access,
unsigned long vsid, unsigned long trap,
int ssize, int psize, int lpsize,
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index fde36e6..87671eb 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -33,6 +33,7 @@ ifeq ($(CONFIG_HUGETLB_PAGE),y)
obj-$(CONFIG_PPC_STD_MMU_64) += hugetlbpage-hash64.o
obj-$(CONFIG_PPC_BOOK3E_MMU) += hugetlbpage-book3e.o
endif
+obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hugepage-hash64.o
obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o
obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
obj-$(CONFIG_HIGHMEM) += highmem.o
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index e942ae9..cea7267 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1041,11 +1041,16 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
return 1;
}
+ if (hugeshift) {
+ if (pmd_trans_huge((pmd_t) *ptep))
+ return __hash_page_thp(ea, access, vsid, (pmd_t *)ptep,
+ trap, local, ssize, psize);
#ifdef CONFIG_HUGETLB_PAGE
- if (hugeshift)
- return __hash_page_huge(ea, access, vsid, ptep, trap, local,
- ssize, hugeshift, psize);
-#endif /* CONFIG_HUGETLB_PAGE */
+ else
+ return __hash_page_huge(ea, access, vsid, ptep, trap,
+ local, ssize, hugeshift, psize);
+#endif
+ }
#ifndef CONFIG_PPC_64K_PAGES
DBG_LOW(" i-pte: %016lx\n", pte_val(*ptep));
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
new file mode 100644
index 0000000..340962a
--- /dev/null
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -0,0 +1,180 @@
+/*
+ * Copyright IBM Corporation, 2013
+ * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+/*
+ * PPC64 THP Support for hash based MMUs
+ */
+#include <linux/mm.h>
+#include <asm/machdep.h>
+
+/*
+ * The linux hugepage PMD now include the pmd entries followed by the address
+ * to the stashed pgtable_t. The stashed pgtable_t contains the hpte bits.
+ * [ secondary group | 3 bit hidx | valid ]. We use one byte per each HPTE entry.
+ * With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need
+ * 4096 entries. Both will fit in a 4K pgtable_t.
+ */
+int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
+ pmd_t *pmdp, unsigned long trap, int local, int ssize,
+ unsigned int psize)
+{
+ unsigned int index, valid;
+ unsigned char *hpte_slot_array;
+ unsigned long rflags, pa, hidx;
+ unsigned long old_pmd, new_pmd;
+ int ret, lpsize = MMU_PAGE_16M;
+ unsigned long vpn, hash, shift, slot;
+
+ /*
+ * atomically mark the linux large page PMD busy and dirty
+ */
+ do {
+ old_pmd = pmd_val(*pmdp);
+ /* If PMD busy, retry the access */
+ if (unlikely(old_pmd & _PAGE_BUSY))
+ return 0;
+ /* If PMD permissions don't match, take page fault */
+ if (unlikely(access & ~old_pmd))
+ return 1;
+ /*
+ * Try to lock the PTE, add ACCESSED and DIRTY if it was
+ * a write access
+ */
+ new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED;
+ if (access & _PAGE_RW)
+ new_pmd |= _PAGE_DIRTY;
+ } while (old_pmd != __cmpxchg_u64((unsigned long *)pmdp,
+ old_pmd, new_pmd));
+ /*
+ * PP bits. _PAGE_USER is already PP bit 0x2, so we only
+ * need to add in 0x1 if it's a read-only user page
+ */
+ rflags = new_pmd & _PAGE_USER;
+ if ((new_pmd & _PAGE_USER) && !((new_pmd & _PAGE_RW) &&
+ (new_pmd & _PAGE_DIRTY)))
+ rflags |= 0x1;
+ /*
+ * _PAGE_EXEC -> HW_NO_EXEC since it's inverted
+ */
+ rflags |= ((new_pmd & _PAGE_EXEC) ? 0 : HPTE_R_N);
+
+#if 0
+ if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
+
+ /*
+ * No CPU has hugepages but lacks no execute, so we
+ * don't need to worry about that case
+ */
+ rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap);
+ }
+#endif
+ /*
+ * Find the slot index details for this ea, using base page size.
+ */
+ shift = mmu_psize_defs[psize].shift;
+ index = (ea & (HUGE_PAGE_SIZE - 1)) >> shift;
+ BUG_ON(index >= 4096);
+
+ vpn = hpt_vpn(ea, vsid, ssize);
+ hash = hpt_hash(vpn, shift, ssize);
+ /*
+ * The hpte hindex are stored in the pgtable whose address is in the
+ * second half of the PMD
+ */
+ hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
+
+ valid = hpte_slot_array[index] & 0x1;
+ if (valid) {
+ /* update the hpte bits */
+ hidx = hpte_slot_array[index] >> 1;
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+
+ ret = ppc_md.hpte_updatepp(slot, rflags, vpn,
+ psize, ssize, local);
+ /*
+ * We failed to update, try to insert a new entry.
+ */
+ if (ret == -1) {
+ /*
+ * large pte is marked busy, so we can be sure
+ * nobody is looking at hpte_slot_array. hence we can
+ * safely update this here.
+ */
+ hpte_slot_array[index] = 0;
+ valid = 0;
+ }
+ }
+
+ if (!valid) {
+ unsigned long hpte_group;
+
+ /* insert new entry */
+ pa = pmd_pfn(__pmd(old_pmd)) << PAGE_SHIFT;
+repeat:
+ hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+
+ /* clear the busy bits and set the hash pte bits */
+ new_pmd = (new_pmd & ~_PAGE_THP_HPTEFLAGS) | _PAGE_HASHPTE;
+
+ /* Add in WIMG bits */
+ rflags |= (new_pmd & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
+ _PAGE_COHERENT | _PAGE_GUARDED));
+
+ /* Insert into the hash table, primary slot */
+ slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
+ psize, lpsize, ssize);
+ /*
+ * Primary is full, try the secondary
+ */
+ if (unlikely(slot == -1)) {
+ hpte_group = ((~hash & htab_hash_mask) *
+ HPTES_PER_GROUP) & ~0x7UL;
+ slot = ppc_md.hpte_insert(hpte_group, vpn, pa,
+ rflags, HPTE_V_SECONDARY,
+ psize, lpsize, ssize);
+ if (slot == -1) {
+ if (mftb() & 0x1)
+ hpte_group = ((hash & htab_hash_mask) *
+ HPTES_PER_GROUP) & ~0x7UL;
+
+ ppc_md.hpte_remove(hpte_group);
+ goto repeat;
+ }
+ }
+ /*
+ * Hypervisor failure. Restore old pmd and return -1
+ * similar to __hash_page_*
+ */
+ if (unlikely(slot == -2)) {
+ *pmdp = __pmd(old_pmd);
+ hash_failure_debug(ea, access, vsid, trap, ssize,
+ psize, lpsize, old_pmd);
+ return -1;
+ }
+ /*
+ * large pte is marked busy, so we can be sure
+ * nobody is looking at hpte_slot_array. hence we can
+ * safely update this here.
+ */
+ hpte_slot_array[index] = slot << 1 | 0x1;
+ }
+ /*
+ * No need to use ldarx/stdcx here
+ */
+ *pmdp = __pmd(new_pmd & ~_PAGE_BUSY);
+ return 0;
+}
--
1.8.1.2
^ permalink raw reply related
* [PATCH -V7 09/10] powerpc: Optimize hugepage invalidate
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Hugepage invalidate involves invalidating multiple hpte entries.
Optimize the operation using H_BULK_REMOVE on lpar platforms.
On native, reduce the number of tlb flush.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/machdep.h | 3 +
arch/powerpc/mm/hash_native_64.c | 78 +++++++++++++++++++++
arch/powerpc/mm/pgtable_64.c | 13 +++-
arch/powerpc/platforms/pseries/lpar.c | 126 ++++++++++++++++++++++++++++++++--
4 files changed, 210 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 3f3f691..5d1e7d2 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -56,6 +56,9 @@ struct machdep_calls {
void (*hpte_removebolted)(unsigned long ea,
int psize, int ssize);
void (*flush_hash_range)(unsigned long number, int local);
+ void (*hugepage_invalidate)(struct mm_struct *mm,
+ unsigned char *hpte_slot_array,
+ unsigned long addr, int psize);
/* special for kexec, to be called in real mode, linear mapping is
* destroyed as well */
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 6a2aead..8ca178d 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -455,6 +455,83 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
local_irq_restore(flags);
}
+static void native_hugepage_invalidate(struct mm_struct *mm,
+ unsigned char *hpte_slot_array,
+ unsigned long addr, int psize)
+{
+ int ssize = 0, i;
+ int lock_tlbie;
+ struct hash_pte *hptep;
+ int actual_psize = MMU_PAGE_16M;
+ unsigned int max_hpte_count, valid;
+ unsigned long flags, s_addr = addr;
+ unsigned long hpte_v, want_v, shift;
+ unsigned long hidx, vpn = 0, vsid, hash, slot;
+
+ shift = mmu_psize_defs[psize].shift;
+ max_hpte_count = HUGE_PAGE_SIZE >> shift;
+
+ local_irq_save(flags);
+ for (i = 0; i < max_hpte_count; i++) {
+ /*
+ * 8 bits per each hpte entries
+ * 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
+ */
+ valid = hpte_slot_array[i] & 0x1;
+ if (!valid)
+ continue;
+ hidx = hpte_slot_array[i] >> 1;
+
+ /* get the vpn */
+ addr = s_addr + (i * (1ul << shift));
+ if (!is_kernel_addr(addr)) {
+ ssize = user_segment_size(addr);
+ vsid = get_vsid(mm->context.id, addr, ssize);
+ WARN_ON(vsid == 0);
+ } else {
+ vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
+ ssize = mmu_kernel_ssize;
+ }
+
+ vpn = hpt_vpn(addr, vsid, ssize);
+ hash = hpt_hash(vpn, shift, ssize);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+
+ hptep = htab_address + slot;
+ want_v = hpte_encode_avpn(vpn, psize, ssize);
+ native_lock_hpte(hptep);
+ hpte_v = hptep->v;
+
+ /* Even if we miss, we need to invalidate the TLB */
+ if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
+ native_unlock_hpte(hptep);
+ else
+ /* Invalidate the hpte. NOTE: this also unlocks it */
+ hptep->v = 0;
+ }
+ /*
+ * Since this is a hugepage, we just need a single tlbie.
+ * use the last vpn.
+ */
+ lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+ if (lock_tlbie)
+ raw_spin_lock(&native_tlbie_lock);
+
+ asm volatile("ptesync":::"memory");
+ __tlbie(vpn, psize, actual_psize, ssize);
+ asm volatile("eieio; tlbsync; ptesync":::"memory");
+
+ if (lock_tlbie)
+ raw_spin_unlock(&native_tlbie_lock);
+
+ local_irq_restore(flags);
+}
+
+
static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
int *psize, int *apsize, int *ssize, unsigned long *vpn)
{
@@ -658,4 +735,5 @@ void __init hpte_init_native(void)
ppc_md.hpte_remove = native_hpte_remove;
ppc_md.hpte_clear_all = native_hpte_clear;
ppc_md.flush_hash_range = native_flush_hash_range;
+ ppc_md.hugepage_invalidate = native_hugepage_invalidate;
}
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index b742d6f..504952f 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -659,6 +659,7 @@ void hpte_need_hugepage_flush(struct mm_struct *mm, unsigned long addr,
{
int ssize, i;
unsigned long s_addr;
+ int max_hpte_count;
unsigned int psize, valid;
unsigned char *hpte_slot_array;
unsigned long hidx, vpn, vsid, hash, shift, slot;
@@ -672,12 +673,18 @@ void hpte_need_hugepage_flush(struct mm_struct *mm, unsigned long addr,
* second half of the PMD
*/
hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
-
/* get the base page size */
psize = get_slice_psize(mm, s_addr);
- shift = mmu_psize_defs[psize].shift;
- for (i = 0; i < (HUGE_PAGE_SIZE >> shift); i++) {
+ if (ppc_md.hugepage_invalidate)
+ return ppc_md.hugepage_invalidate(mm, hpte_slot_array,
+ s_addr, psize);
+ /*
+ * No bluk hpte removal support, invalidate each entry
+ */
+ shift = mmu_psize_defs[psize].shift;
+ max_hpte_count = HUGE_PAGE_SIZE >> shift;
+ for (i = 0; i < max_hpte_count; i++) {
/*
* 8 bits per each hpte entries
* 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 6d62072..58a31db 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -45,6 +45,13 @@
#include "plpar_wrappers.h"
#include "pseries.h"
+/* Flag bits for H_BULK_REMOVE */
+#define HBR_REQUEST 0x4000000000000000UL
+#define HBR_RESPONSE 0x8000000000000000UL
+#define HBR_END 0xc000000000000000UL
+#define HBR_AVPN 0x0200000000000000UL
+#define HBR_ANDCOND 0x0100000000000000UL
+
/* in hvCall.S */
EXPORT_SYMBOL(plpar_hcall);
@@ -345,6 +352,117 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
BUG_ON(lpar_rc != H_SUCCESS);
}
+/*
+ * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
+ * to make sure that we avoid bouncing the hypervisor tlbie lock.
+ */
+#define PPC64_HUGE_HPTE_BATCH 12
+
+static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
+ unsigned long *vpn, int count,
+ int psize, int ssize)
+{
+ unsigned long param[9];
+ int i = 0, pix = 0, rc;
+ unsigned long flags = 0;
+ int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+
+ if (lock_tlbie)
+ spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+
+ for (i = 0; i < count; i++) {
+
+ if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
+ pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize,
+ ssize, 0);
+ } else {
+ param[pix] = HBR_REQUEST | HBR_AVPN | slot[i];
+ param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize);
+ pix += 2;
+ if (pix == 8) {
+ rc = plpar_hcall9(H_BULK_REMOVE, param,
+ param[0], param[1], param[2],
+ param[3], param[4], param[5],
+ param[6], param[7]);
+ BUG_ON(rc != H_SUCCESS);
+ pix = 0;
+ }
+ }
+ }
+ if (pix) {
+ param[pix] = HBR_END;
+ rc = plpar_hcall9(H_BULK_REMOVE, param, param[0], param[1],
+ param[2], param[3], param[4], param[5],
+ param[6], param[7]);
+ BUG_ON(rc != H_SUCCESS);
+ }
+
+ if (lock_tlbie)
+ spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
+}
+
+static void pSeries_lpar_hugepage_invalidate(struct mm_struct *mm,
+ unsigned char *hpte_slot_array,
+ unsigned long addr, int psize)
+{
+ int ssize = 0, i, index = 0;
+ unsigned long s_addr = addr;
+ unsigned int max_hpte_count, valid;
+ unsigned long vpn_array[PPC64_HUGE_HPTE_BATCH];
+ unsigned long slot_array[PPC64_HUGE_HPTE_BATCH];
+ unsigned long shift, hidx, vpn = 0, vsid, hash, slot;
+
+ shift = mmu_psize_defs[psize].shift;
+ max_hpte_count = HUGE_PAGE_SIZE >> shift;
+
+ for (i = 0; i < max_hpte_count; i++) {
+ /*
+ * 8 bits per each hpte entries
+ * 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
+ */
+ valid = hpte_slot_array[i] & 0x1;
+ if (!valid)
+ continue;
+ hidx = hpte_slot_array[i] >> 1;
+
+ /* get the vpn */
+ addr = s_addr + (i * (1ul << shift));
+ if (!is_kernel_addr(addr)) {
+ ssize = user_segment_size(addr);
+ vsid = get_vsid(mm->context.id, addr, ssize);
+ WARN_ON(vsid == 0);
+ } else {
+ vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
+ ssize = mmu_kernel_ssize;
+ }
+
+ vpn = hpt_vpn(addr, vsid, ssize);
+ hash = hpt_hash(vpn, shift, ssize);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+
+ slot_array[index] = slot;
+ vpn_array[index] = vpn;
+ if (index == PPC64_HUGE_HPTE_BATCH - 1) {
+ /*
+ * Now do a bluk invalidate
+ */
+ __pSeries_lpar_hugepage_invalidate(slot_array,
+ vpn_array,
+ PPC64_HUGE_HPTE_BATCH,
+ psize, ssize);
+ index = 0;
+ } else
+ index++;
+ }
+ if (index)
+ __pSeries_lpar_hugepage_invalidate(slot_array, vpn_array,
+ index, psize, ssize);
+}
+
static void pSeries_lpar_hpte_removebolted(unsigned long ea,
int psize, int ssize)
{
@@ -360,13 +478,6 @@ static void pSeries_lpar_hpte_removebolted(unsigned long ea,
pSeries_lpar_hpte_invalidate(slot, vpn, psize, ssize, 0);
}
-/* Flag bits for H_BULK_REMOVE */
-#define HBR_REQUEST 0x4000000000000000UL
-#define HBR_RESPONSE 0x8000000000000000UL
-#define HBR_END 0xc000000000000000UL
-#define HBR_AVPN 0x0200000000000000UL
-#define HBR_ANDCOND 0x0100000000000000UL
-
/*
* Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
* lock.
@@ -452,6 +563,7 @@ void __init hpte_init_lpar(void)
ppc_md.hpte_removebolted = pSeries_lpar_hpte_removebolted;
ppc_md.flush_hash_range = pSeries_lpar_flush_hash_range;
ppc_md.hpte_clear_all = pSeries_lpar_hptab_clear;
+ ppc_md.hugepage_invalidate = pSeries_lpar_hugepage_invalidate;
}
#ifdef CONFIG_PPC_SMLPAR
--
1.8.1.2
^ permalink raw reply related
* [PATCH -V7 08/10] powerpc/THP: Enable THP on PPC64
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We enable only if the we support 16MB page size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/pgtable-ppc64.h | 3 +--
arch/powerpc/mm/pgtable_64.c | 28 ++++++++++++++++++++++++++++
2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 97fc839..d65534b 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -426,8 +426,7 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
return pmd_val(pmd) >> PTE_RPN_SHIFT;
}
-/* We will enable it in the last patch */
-#define has_transparent_hugepage() 0
+extern int has_transparent_hugepage(void);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline int pmd_young(pmd_t pmd)
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 54216c1..b742d6f 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -754,6 +754,34 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
return;
}
+int has_transparent_hugepage(void)
+{
+ if (!mmu_has_feature(MMU_FTR_16M_PAGE))
+ return 0;
+ /*
+ * We support THP only if HPAGE_SHIFT is 16MB.
+ */
+ if (!HPAGE_SHIFT || (HPAGE_SHIFT != mmu_psize_defs[MMU_PAGE_16M].shift))
+ return 0;
+ /*
+ * We need to make sure that we support 16MB hugepage in a segement
+ * with base page size 64K or 4K. We only enable THP with a PAGE_SIZE
+ * of 64K.
+ */
+ /*
+ * If we have 64K HPTE, we will be using that by default
+ */
+ if (mmu_psize_defs[MMU_PAGE_64K].shift &&
+ (mmu_psize_defs[MMU_PAGE_64K].penc[MMU_PAGE_16M] == -1))
+ return 0;
+ /*
+ * Ok we only have 4K HPTE
+ */
+ if (mmu_psize_defs[MMU_PAGE_4K].penc[MMU_PAGE_16M] == -1)
+ return 0;
+
+ return 1;
+}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
pmd_t pmdp_get_and_clear(struct mm_struct *mm,
--
1.8.1.2
^ permalink raw reply related
* [PATCH -V7 06/10] powerpc: Update gup_pmd_range to handle transparent hugepages
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/mm/gup.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/gup.c b/arch/powerpc/mm/gup.c
index 4b921af..3d36fd7 100644
--- a/arch/powerpc/mm/gup.c
+++ b/arch/powerpc/mm/gup.c
@@ -66,9 +66,20 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
pmd_t pmd = *pmdp;
next = pmd_addr_end(addr, end);
- if (pmd_none(pmd))
+ /*
+ * The pmd_trans_splitting() check below explains why
+ * pmdp_splitting_flush has to flush the tlb, to stop
+ * this gup-fast code from running while we set the
+ * splitting bit in the pmd. Returning zero will take
+ * the slow path that will call wait_split_huge_page()
+ * if the pmd is still in splitting state. gup-fast
+ * can't because it has irq disabled and
+ * wait_split_huge_page() would never return as the
+ * tlb flush IPI wouldn't run.
+ */
+ if (pmd_none(pmd) || pmd_trans_splitting(pmd))
return 0;
- if (pmd_huge(pmd)) {
+ if (pmd_huge(pmd) || pmd_large(pmd)) {
if (!gup_hugepte((pte_t *)pmdp, PMD_SIZE, addr, next,
write, pages, nr))
return 0;
--
1.8.1.2
^ permalink raw reply related
* [PATCH -V7 05/10] powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly
document why we don't need to handle transparent hugepages at callsites.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/pgtable-ppc64.h | 24 ------------------------
arch/powerpc/kernel/io-workarounds.c | 10 ++++++++--
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
arch/powerpc/mm/hash_utils_64.c | 8 +++++++-
arch/powerpc/mm/hugetlbpage.c | 8 ++++++--
arch/powerpc/mm/tlb_hash64.c | 7 ++++++-
arch/powerpc/platforms/pseries/eeh.c | 7 ++++++-
7 files changed, 34 insertions(+), 32 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index f0effab..97fc839 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -343,30 +343,6 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
void pgtable_cache_init(void);
-
-/*
- * find_linux_pte returns the address of a linux pte for a given
- * effective address and directory. If not found, it returns zero.
- */
-static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea)
-{
- pgd_t *pg;
- pud_t *pu;
- pmd_t *pm;
- pte_t *pt = NULL;
-
- pg = pgdir + pgd_index(ea);
- if (!pgd_none(*pg)) {
- pu = pud_offset(pg, ea);
- if (!pud_none(*pu)) {
- pm = pmd_offset(pu, ea);
- if (pmd_present(*pm))
- pt = pte_offset_kernel(pm, ea);
- }
- }
- return pt;
-}
-
pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
unsigned *shift);
#endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/io-workarounds.c b/arch/powerpc/kernel/io-workarounds.c
index 50e90b7..e5263ab 100644
--- a/arch/powerpc/kernel/io-workarounds.c
+++ b/arch/powerpc/kernel/io-workarounds.c
@@ -55,6 +55,7 @@ static struct iowa_bus *iowa_pci_find(unsigned long vaddr, unsigned long paddr)
struct iowa_bus *iowa_mem_find_bus(const PCI_IO_ADDR addr)
{
+ unsigned shift;
struct iowa_bus *bus;
int token;
@@ -70,11 +71,16 @@ struct iowa_bus *iowa_mem_find_bus(const PCI_IO_ADDR addr)
if (vaddr < PHB_IO_BASE || vaddr >= PHB_IO_END)
return NULL;
- ptep = find_linux_pte(init_mm.pgd, vaddr);
+ ptep = find_linux_pte_or_hugepte(init_mm.pgd, vaddr, &shift);
if (ptep == NULL)
paddr = 0;
- else
+ else {
+ /*
+ * we don't have hugepages backing iomem
+ */
+ BUG_ON(shift);
paddr = pte_pfn(*ptep) << PAGE_SHIFT;
+ }
bus = iowa_pci_find(vaddr, paddr);
if (bus == NULL)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 19c93ba..8c345df 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -27,7 +27,7 @@ static void *real_vmalloc_addr(void *x)
unsigned long addr = (unsigned long) x;
pte_t *p;
- p = find_linux_pte(swapper_pg_dir, addr);
+ p = find_linux_pte_or_hugepte(swapper_pg_dir, addr, NULL);
if (!p || !pte_present(*p))
return NULL;
/* assume we don't have huge pages in vmalloc space... */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index d0eb6d4..e942ae9 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1131,6 +1131,7 @@ EXPORT_SYMBOL_GPL(hash_page);
void hash_preload(struct mm_struct *mm, unsigned long ea,
unsigned long access, unsigned long trap)
{
+ int shift;
unsigned long vsid;
pgd_t *pgdir;
pte_t *ptep;
@@ -1152,10 +1153,15 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
pgdir = mm->pgd;
if (pgdir == NULL)
return;
- ptep = find_linux_pte(pgdir, ea);
+ /*
+ * THP pages use update_mmu_cache_pmd. We don't do
+ * hash preload there. Hence can ignore THP here
+ */
+ ptep = find_linux_pte_or_hugepte(pgdir, ea, &shift);
if (!ptep)
return;
+ BUG_ON(shift);
#ifdef CONFIG_PPC_64K_PAGES
/* If either _PAGE_4K_PFN or _PAGE_NO_CACHE is set (and we are on
* a 64K kernel), then we don't preload, hash_page() will take
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 081c001..1154714 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -105,6 +105,7 @@ int pgd_huge(pgd_t pgd)
pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
{
+ /* Only called for HugeTLB pages, hence can ignore THP */
return find_linux_pte_or_hugepte(mm->pgd, addr, NULL);
}
@@ -673,11 +674,14 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
struct page *page;
unsigned shift;
unsigned long mask;
-
+ /*
+ * Transparent hugepages are handled by generic code. We can skip them
+ * here.
+ */
ptep = find_linux_pte_or_hugepte(mm->pgd, address, &shift);
/* Verify it is a huge page else bail. */
- if (!ptep || !shift)
+ if (!ptep || !shift || pmd_trans_huge((pmd_t)*ptep))
return ERR_PTR(-EINVAL);
mask = (1UL << shift) - 1;
diff --git a/arch/powerpc/mm/tlb_hash64.c b/arch/powerpc/mm/tlb_hash64.c
index 023ec8a..56d9b85 100644
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -189,6 +189,7 @@ void tlb_flush(struct mmu_gather *tlb)
void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
unsigned long end)
{
+ int shift;
unsigned long flags;
start = _ALIGN_DOWN(start, PAGE_SIZE);
@@ -206,11 +207,15 @@ void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
local_irq_save(flags);
arch_enter_lazy_mmu_mode();
for (; start < end; start += PAGE_SIZE) {
- pte_t *ptep = find_linux_pte(mm->pgd, start);
+ pte_t *ptep = find_linux_pte_or_hugepte(mm->pgd, start, &shift);
unsigned long pte;
if (ptep == NULL)
continue;
+ /*
+ * We won't find hugepages here, this is iomem.
+ */
+ BUG_ON(shift);
pte = pte_val(*ptep);
if (!(pte & _PAGE_HASHPTE))
continue;
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 6b73d6c..d2e76d2 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -258,12 +258,17 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
*/
static inline unsigned long eeh_token_to_phys(unsigned long token)
{
+ int shift;
pte_t *ptep;
unsigned long pa;
- ptep = find_linux_pte(init_mm.pgd, token);
+ /*
+ * We won't find hugepages here, iomem
+ */
+ ptep = find_linux_pte_or_hugepte(init_mm.pgd, token, &shift);
if (!ptep)
return token;
+ BUG_ON(shift);
pa = pte_pfn(*ptep) << PAGE_SHIFT;
return pa | (token & (PAGE_SIZE-1));
--
1.8.1.2
^ permalink raw reply related
* [PATCH -V7 03/10] powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common code
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We will use this in the later patch for handling THP pages
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/hugetlb.h | 8 +-
arch/powerpc/include/asm/pgtable-ppc64.h | 11 --
arch/powerpc/mm/Makefile | 2 +-
arch/powerpc/mm/hugetlbpage.c | 251 ++++++++++++++++---------------
4 files changed, 136 insertions(+), 136 deletions(-)
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 4daf7e6..91aba46 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -190,8 +190,14 @@ static inline void flush_hugetlb_page(struct vm_area_struct *vma,
unsigned long vmaddr)
{
}
-#endif /* CONFIG_HUGETLB_PAGE */
+#define hugepd_shift(x) 0
+static inline pte_t *hugepte_offset(hugepd_t *hpdp, unsigned long addr,
+ unsigned pdshift)
+{
+ return 0;
+}
+#endif /* CONFIG_HUGETLB_PAGE */
/*
* FSL Book3E platforms require special gpage handling - the gpages
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 20133c1..f0effab 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -367,19 +367,8 @@ static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea)
return pt;
}
-#ifdef CONFIG_HUGETLB_PAGE
pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
unsigned *shift);
-#else
-static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
- unsigned *shift)
-{
- if (shift)
- *shift = 0;
- return find_linux_pte(pgdir, ea);
-}
-#endif /* !CONFIG_HUGETLB_PAGE */
-
#endif /* __ASSEMBLY__ */
#ifndef _PAGE_SPLITTING
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index cf16b57..fde36e6 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -28,8 +28,8 @@ obj-$(CONFIG_44x) += 44x_mmu.o
obj-$(CONFIG_PPC_FSL_BOOK3E) += fsl_booke_mmu.o
obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o
obj-$(CONFIG_PPC_MM_SLICES) += slice.o
-ifeq ($(CONFIG_HUGETLB_PAGE),y)
obj-y += hugetlbpage.o
+ifeq ($(CONFIG_HUGETLB_PAGE),y)
obj-$(CONFIG_PPC_STD_MMU_64) += hugetlbpage-hash64.o
obj-$(CONFIG_PPC_BOOK3E_MMU) += hugetlbpage-book3e.o
endif
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index fbe6be7..8601f2d 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -21,6 +21,9 @@
#include <asm/pgalloc.h>
#include <asm/tlb.h>
#include <asm/setup.h>
+#include <asm/hugetlb.h>
+
+#ifdef CONFIG_HUGETLB_PAGE
#define PAGE_SHIFT_64K 16
#define PAGE_SHIFT_16M 24
@@ -100,66 +103,6 @@ int pgd_huge(pgd_t pgd)
}
#endif
-/*
- * We have 4 cases for pgds and pmds:
- * (1) invalid (all zeroes)
- * (2) pointer to next table, as normal; bottom 6 bits == 0
- * (3) leaf pte for huge page, bottom two bits != 00
- * (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table
- */
-pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift)
-{
- pgd_t *pg;
- pud_t *pu;
- pmd_t *pm;
- pte_t *ret_pte;
- hugepd_t *hpdp = NULL;
- unsigned pdshift = PGDIR_SHIFT;
-
- if (shift)
- *shift = 0;
-
- pg = pgdir + pgd_index(ea);
-
- if (pgd_huge(*pg)) {
- ret_pte = (pte_t *) pg;
- goto out;
- } else if (is_hugepd(pg))
- hpdp = (hugepd_t *)pg;
- else if (!pgd_none(*pg)) {
- pdshift = PUD_SHIFT;
- pu = pud_offset(pg, ea);
-
- if (pud_huge(*pu)) {
- ret_pte = (pte_t *) pu;
- goto out;
- } else if (is_hugepd(pu))
- hpdp = (hugepd_t *)pu;
- else if (!pud_none(*pu)) {
- pdshift = PMD_SHIFT;
- pm = pmd_offset(pu, ea);
-
- if (pmd_huge(*pm)) {
- ret_pte = (pte_t *) pm;
- goto out;
- } else if (is_hugepd(pm))
- hpdp = (hugepd_t *)pm;
- else if (!pmd_none(*pm))
- return pte_offset_kernel(pm, ea);
- }
- }
- if (!hpdp)
- return NULL;
-
- ret_pte = hugepte_offset(hpdp, ea, pdshift);
- pdshift = hugepd_shift(*hpdp);
-out:
- if (shift)
- *shift = pdshift;
- return ret_pte;
-}
-EXPORT_SYMBOL_GPL(find_linux_pte_or_hugepte);
-
pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
{
return find_linux_pte_or_hugepte(mm->pgd, addr, NULL);
@@ -753,69 +696,6 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
return NULL;
}
-int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
- unsigned long end, int write, struct page **pages, int *nr)
-{
- unsigned long mask;
- unsigned long pte_end;
- struct page *head, *page, *tail;
- pte_t pte;
- int refs;
-
- pte_end = (addr + sz) & ~(sz-1);
- if (pte_end < end)
- end = pte_end;
-
- pte = *ptep;
- mask = _PAGE_PRESENT | _PAGE_USER;
- if (write)
- mask |= _PAGE_RW;
-
- if ((pte_val(pte) & mask) != mask)
- return 0;
-
- /* hugepages are never "special" */
- VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
-
- refs = 0;
- head = pte_page(pte);
-
- page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
- tail = page;
- do {
- VM_BUG_ON(compound_head(page) != head);
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
-
- if (!page_cache_add_speculative(head, refs)) {
- *nr -= refs;
- return 0;
- }
-
- if (unlikely(pte_val(pte) != pte_val(*ptep))) {
- /* Could be optimized better */
- *nr -= refs;
- while (refs--)
- put_page(head);
- return 0;
- }
-
- /*
- * Any tail page need their mapcount reference taken before we
- * return.
- */
- while (refs--) {
- if (PageTail(tail))
- get_huge_page_tail(tail);
- tail++;
- }
-
- return 1;
-}
-
static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
unsigned long sz)
{
@@ -1032,3 +912,128 @@ void flush_dcache_icache_hugepage(struct page *page)
}
}
}
+
+#endif /* CONFIG_HUGETLB_PAGE */
+
+/*
+ * We have 4 cases for pgds and pmds:
+ * (1) invalid (all zeroes)
+ * (2) pointer to next table, as normal; bottom 6 bits == 0
+ * (3) leaf pte for huge page, bottom two bits != 00
+ * (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table
+ */
+pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift)
+{
+ pgd_t *pg;
+ pud_t *pu;
+ pmd_t *pm;
+ pte_t *ret_pte;
+ hugepd_t *hpdp = NULL;
+ unsigned pdshift = PGDIR_SHIFT;
+
+ if (shift)
+ *shift = 0;
+
+ pg = pgdir + pgd_index(ea);
+
+ if (pgd_huge(*pg)) {
+ ret_pte = (pte_t *) pg;
+ goto out;
+ } else if (is_hugepd(pg))
+ hpdp = (hugepd_t *)pg;
+ else if (!pgd_none(*pg)) {
+ pdshift = PUD_SHIFT;
+ pu = pud_offset(pg, ea);
+
+ if (pud_huge(*pu)) {
+ ret_pte = (pte_t *) pu;
+ goto out;
+ } else if (is_hugepd(pu))
+ hpdp = (hugepd_t *)pu;
+ else if (!pud_none(*pu)) {
+ pdshift = PMD_SHIFT;
+ pm = pmd_offset(pu, ea);
+
+ if (pmd_huge(*pm)) {
+ ret_pte = (pte_t *) pm;
+ goto out;
+ } else if (is_hugepd(pm))
+ hpdp = (hugepd_t *)pm;
+ else if (!pmd_none(*pm))
+ return pte_offset_kernel(pm, ea);
+ }
+ }
+ if (!hpdp)
+ return NULL;
+
+ ret_pte = hugepte_offset(hpdp, ea, pdshift);
+ pdshift = hugepd_shift(*hpdp);
+out:
+ if (shift)
+ *shift = pdshift;
+ return ret_pte;
+}
+EXPORT_SYMBOL_GPL(find_linux_pte_or_hugepte);
+
+int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
+ unsigned long end, int write, struct page **pages, int *nr)
+{
+ unsigned long mask;
+ unsigned long pte_end;
+ struct page *head, *page, *tail;
+ pte_t pte;
+ int refs;
+
+ pte_end = (addr + sz) & ~(sz-1);
+ if (pte_end < end)
+ end = pte_end;
+
+ pte = *ptep;
+ mask = _PAGE_PRESENT | _PAGE_USER;
+ if (write)
+ mask |= _PAGE_RW;
+
+ if ((pte_val(pte) & mask) != mask)
+ return 0;
+
+ /* hugepages are never "special" */
+ VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+
+ refs = 0;
+ head = pte_page(pte);
+
+ page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
+ tail = page;
+ do {
+ VM_BUG_ON(compound_head(page) != head);
+ pages[*nr] = page;
+ (*nr)++;
+ page++;
+ refs++;
+ } while (addr += PAGE_SIZE, addr != end);
+
+ if (!page_cache_add_speculative(head, refs)) {
+ *nr -= refs;
+ return 0;
+ }
+
+ if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+ /* Could be optimized better */
+ *nr -= refs;
+ while (refs--)
+ put_page(head);
+ return 0;
+ }
+
+ /*
+ * Any tail page need their mapcount reference taken before we
+ * return.
+ */
+ while (refs--) {
+ if (PageTail(tail))
+ get_huge_page_tail(tail);
+ tail++;
+ }
+
+ return 1;
+}
--
1.8.1.2
^ permalink raw reply related
* [PATCH -V7 02/10] powerpc/THP: Implement transparent hugepages for ppc64
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
We now have pmd entries covering 16MB range and the PMD table double its original size.
We use the second half of the PMD table to deposit the pgtable (PTE page).
The depoisted PTE page is further used to track the HPTE information. The information
include [ secondary group | 3 bit hidx | valid ]. We use one byte per each HPTE entry.
With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need
4096 entries. Both will fit in a 4K PTE page. On hugepage invalidate we need to walk
the PTE page and invalidate all valid HPTEs.
This patch implements necessary arch specific functions for THP support and also
hugepage invalidate logic. These PMD related functions are intentionally kept
similar to their PTE counter-part.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/page.h | 11 +-
arch/powerpc/include/asm/pgtable-ppc64-64k.h | 3 +-
arch/powerpc/include/asm/pgtable-ppc64.h | 259 +++++++++++++++++++++-
arch/powerpc/include/asm/pgtable.h | 5 +
arch/powerpc/include/asm/pte-hash64-64k.h | 17 ++
arch/powerpc/mm/pgtable_64.c | 318 +++++++++++++++++++++++++++
arch/powerpc/platforms/Kconfig.cputype | 1 +
7 files changed, 611 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 988c812..cbf4be7 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -37,8 +37,17 @@
#define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT)
#ifndef __ASSEMBLY__
-#ifdef CONFIG_HUGETLB_PAGE
+/*
+ * With hugetlbfs enabled we allow the HPAGE_SHIFT to run time
+ * configurable. But we enable THP only with 16MB hugepage.
+ * With only THP configured, we force hugepage size to 16MB.
+ * This should ensure that all subarchs that doesn't support
+ * THP continue to work fine with HPAGE_SHIFT usage.
+ */
+#if defined(CONFIG_HUGETLB_PAGE)
extern unsigned int HPAGE_SHIFT;
+#elif defined(CONFIG_TRANSPARENT_HUGEPAGE)
+#define HPAGE_SHIFT PMD_SHIFT
#else
#define HPAGE_SHIFT PAGE_SHIFT
#endif
diff --git a/arch/powerpc/include/asm/pgtable-ppc64-64k.h b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
index 45142d6..a56b82f 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64-64k.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
@@ -33,7 +33,8 @@
#define PGDIR_MASK (~(PGDIR_SIZE-1))
/* Bits to mask out from a PMD to get to the PTE page */
-#define PMD_MASKED_BITS 0x1ff
+/* PMDs point to PTE table fragments which are 4K aligned. */
+#define PMD_MASKED_BITS 0xfff
/* Bits to mask out from a PGD/PUD to get to the PMD page */
#define PUD_MASKED_BITS 0x1ff
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index ab84332..20133c1 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -154,7 +154,7 @@
#define pmd_present(pmd) (pmd_val(pmd) != 0)
#define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0)
#define pmd_page_vaddr(pmd) (pmd_val(pmd) & ~PMD_MASKED_BITS)
-#define pmd_page(pmd) virt_to_page(pmd_page_vaddr(pmd))
+extern struct page *pmd_page(pmd_t pmd);
#define pud_set(pudp, pudval) (pud_val(*(pudp)) = (pudval))
#define pud_none(pud) (!pud_val(pud))
@@ -382,4 +382,261 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
#endif /* __ASSEMBLY__ */
+#ifndef _PAGE_SPLITTING
+/*
+ * THP pages can't be special. So use the _PAGE_SPECIAL
+ */
+#define _PAGE_SPLITTING _PAGE_SPECIAL
+#endif
+
+#ifndef _PAGE_THP_HUGE
+/*
+ * We need to differentiate between explicit huge page and THP huge
+ * page, since THP huge page also need to track real subpage details
+ * We use the _PAGE_COMBO bits here as dummy for platform that doesn't
+ * support THP.
+ */
+#define _PAGE_THP_HUGE 0x10000000
+#endif
+
+/*
+ * PTE flags to conserve for HPTE identification for THP page.
+ */
+#ifndef _PAGE_THP_HPTEFLAGS
+#define _PAGE_THP_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE)
+#endif
+
+#define HUGE_PAGE_SIZE (ASM_CONST(1) << 24)
+#define HUGE_PAGE_MASK (~(HUGE_PAGE_SIZE - 1))
+
+/*
+ * set of bits not changed in pmd_modify.
+ */
+#define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_THP_HPTEFLAGS | \
+ _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_THP_HUGE)
+
+#ifndef __ASSEMBLY__
+extern void hpte_need_hugepage_flush(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
+extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
+extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot);
+extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, pmd_t pmd);
+extern void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
+ pmd_t *pmd);
+
+static inline int pmd_trans_huge(pmd_t pmd)
+{
+ /*
+ * leaf pte for huge page, bottom two bits != 00
+ */
+ return (pmd_val(pmd) & 0x3) && (pmd_val(pmd) & _PAGE_THP_HUGE);
+}
+
+static inline int pmd_large(pmd_t pmd)
+{
+ /*
+ * leaf pte for huge page, bottom two bits != 00
+ */
+ if (pmd_trans_huge(pmd))
+ return pmd_val(pmd) & _PAGE_PRESENT;
+ return 0;
+}
+
+static inline int pmd_trans_splitting(pmd_t pmd)
+{
+ if (pmd_trans_huge(pmd))
+ return pmd_val(pmd) & _PAGE_SPLITTING;
+ return 0;
+}
+
+
+static inline unsigned long pmd_pfn(pmd_t pmd)
+{
+ /*
+ * Only called for hugepage pmd
+ */
+ return pmd_val(pmd) >> PTE_RPN_SHIFT;
+}
+
+/* We will enable it in the last patch */
+#define has_transparent_hugepage() 0
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static inline int pmd_young(pmd_t pmd)
+{
+ return pmd_val(pmd) & _PAGE_ACCESSED;
+}
+
+static inline pmd_t pmd_mkhuge(pmd_t pmd)
+{
+ /* Do nothing, mk_pmd() does this part. */
+ return pmd;
+}
+
+#define __HAVE_ARCH_PMD_WRITE
+static inline int pmd_write(pmd_t pmd)
+{
+ return pmd_val(pmd) & _PAGE_RW;
+}
+
+static inline pmd_t pmd_mkold(pmd_t pmd)
+{
+ pmd_val(pmd) &= ~_PAGE_ACCESSED;
+ return pmd;
+}
+
+static inline pmd_t pmd_wrprotect(pmd_t pmd)
+{
+ pmd_val(pmd) &= ~_PAGE_RW;
+ return pmd;
+}
+
+static inline pmd_t pmd_mkdirty(pmd_t pmd)
+{
+ pmd_val(pmd) |= _PAGE_DIRTY;
+ return pmd;
+}
+
+static inline pmd_t pmd_mkyoung(pmd_t pmd)
+{
+ pmd_val(pmd) |= _PAGE_ACCESSED;
+ return pmd;
+}
+
+static inline pmd_t pmd_mkwrite(pmd_t pmd)
+{
+ pmd_val(pmd) |= _PAGE_RW;
+ return pmd;
+}
+
+static inline pmd_t pmd_mknotpresent(pmd_t pmd)
+{
+ pmd_val(pmd) &= ~_PAGE_PRESENT;
+ return pmd;
+}
+
+static inline pmd_t pmd_mksplitting(pmd_t pmd)
+{
+ pmd_val(pmd) |= _PAGE_SPLITTING;
+ return pmd;
+}
+
+/*
+ * Set the dirty and/or accessed bits atomically in a linux hugepage PMD, this
+ * function doesn't need to flush the hash entry
+ */
+static inline void __pmdp_set_access_flags(pmd_t *pmdp, pmd_t entry)
+{
+ unsigned long bits = pmd_val(entry) & (_PAGE_DIRTY |
+ _PAGE_ACCESSED |
+ _PAGE_RW | _PAGE_EXEC);
+#ifdef PTE_ATOMIC_UPDATES
+ unsigned long old, tmp;
+
+ __asm__ __volatile__(
+ "1: ldarx %0,0,%4\n\
+ andi. %1,%0,%6\n\
+ bne- 1b \n\
+ or %0,%3,%0\n\
+ stdcx. %0,0,%4\n\
+ bne- 1b"
+ :"=&r" (old), "=&r" (tmp), "=m" (*pmdp)
+ :"r" (bits), "r" (pmdp), "m" (*pmdp), "i" (_PAGE_BUSY)
+ :"cc");
+#else
+ unsigned long old = pmd_val(*pmdp);
+ *pmdp = __pmd(old | bits);
+#endif
+}
+
+#define __HAVE_ARCH_PMD_SAME
+static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
+{
+ return (((pmd_val(pmd_a) ^ pmd_val(pmd_b)) & ~_PAGE_THP_HPTEFLAGS) == 0);
+}
+
+#define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
+extern int pmdp_set_access_flags(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp,
+ pmd_t entry, int dirty);
+
+static inline unsigned long pmd_hugepage_update(struct mm_struct *mm,
+ unsigned long addr,
+ pmd_t *pmdp, unsigned long clr)
+{
+#ifdef PTE_ATOMIC_UPDATES
+ unsigned long old, tmp;
+
+ __asm__ __volatile__(
+ "1: ldarx %0,0,%3\n\
+ andi. %1,%0,%6\n\
+ bne- 1b \n\
+ andc %1,%0,%4 \n\
+ stdcx. %1,0,%3 \n\
+ bne- 1b"
+ : "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
+ : "r" (pmdp), "r" (clr), "m" (*pmdp), "i" (_PAGE_BUSY)
+ : "cc" );
+#else
+ unsigned long old = pmd_val(*pmdp);
+ *pmdp = __pmd(old & ~clr);
+#endif
+
+#ifdef CONFIG_PPC_STD_MMU_64
+ if (old & _PAGE_HASHPTE)
+ hpte_need_hugepage_flush(mm, addr, pmdp);
+#endif
+ return old;
+}
+
+static inline int __pmdp_test_and_clear_young(struct mm_struct *mm,
+ unsigned long addr, pmd_t *pmdp)
+{
+ unsigned long old;
+
+ if ((pmd_val(*pmdp) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
+ return 0;
+ old = pmd_hugepage_update(mm, addr, pmdp, _PAGE_ACCESSED);
+ return ((old & _PAGE_ACCESSED) != 0);
+}
+
+#define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
+extern int pmdp_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
+#define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH
+extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
+
+#define __HAVE_ARCH_PMDP_GET_AND_CLEAR
+extern pmd_t pmdp_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t *pmdp);
+
+#define __HAVE_ARCH_PMDP_SET_WRPROTECT
+static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp)
+{
+
+ if ((pmd_val(*pmdp) & _PAGE_RW) == 0)
+ return;
+
+ pmd_hugepage_update(mm, addr, pmdp, _PAGE_RW);
+}
+
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
+extern void pmdp_splitting_flush(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
+
+#define __HAVE_ARCH_PGTABLE_DEPOSIT
+extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable);
+#define __HAVE_ARCH_PGTABLE_WITHDRAW
+extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
+
+#define __HAVE_ARCH_PMDP_INVALIDATE
+extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp);
+#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 7aeb955..283198e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -222,5 +222,10 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
unsigned long end, int write, struct page **pages, int *nr);
#endif /* __ASSEMBLY__ */
+#ifndef CONFIG_TRANSPARENT_HUGEPAGE
+#define pmd_large(pmd) 0
+#define has_transparent_hugepage() 0
+#endif
+
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/include/asm/pte-hash64-64k.h b/arch/powerpc/include/asm/pte-hash64-64k.h
index 3e13e23..6be70be 100644
--- a/arch/powerpc/include/asm/pte-hash64-64k.h
+++ b/arch/powerpc/include/asm/pte-hash64-64k.h
@@ -38,6 +38,23 @@
*/
#define PTE_RPN_SHIFT (30)
+/*
+ * THP pages can't be special. So use the _PAGE_SPECIAL
+ */
+#define _PAGE_SPLITTING _PAGE_SPECIAL
+
+/*
+ * PTE flags to conserve for HPTE identification for THP page.
+ * We drop _PAGE_COMBO here, because we overload that with _PAGE_TH_HUGE.
+ */
+#define _PAGE_THP_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE)
+
+/*
+ * We need to differentiate between explicit huge page and THP huge
+ * page, since THP huge page also need to track real subpage details
+ */
+#define _PAGE_THP_HUGE _PAGE_COMBO
+
#ifndef __ASSEMBLY__
/*
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index a854096..54216c1 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -338,6 +338,19 @@ EXPORT_SYMBOL(iounmap);
EXPORT_SYMBOL(__iounmap);
EXPORT_SYMBOL(__iounmap_at);
+/*
+ * For hugepage we have pfn in the pmd, we use PTE_RPN_SHIFT bits for flags
+ * For PTE page, we have a PTE_FRAG_SIZE (4K) aligned virtual address.
+ */
+struct page *pmd_page(pmd_t pmd)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ if (pmd_trans_huge(pmd))
+ return pfn_to_page(pmd_pfn(pmd));
+#endif
+ return virt_to_page(pmd_page_vaddr(pmd));
+}
+
#ifdef CONFIG_PPC_64K_PAGES
static pte_t *get_from_cache(struct mm_struct *mm)
{
@@ -455,3 +468,308 @@ void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
}
#endif
#endif /* CONFIG_PPC_64K_PAGES */
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static pmd_t set_hugepage_access_flags_filter(pmd_t pmd,
+ struct vm_area_struct *vma,
+ int dirty)
+{
+ return pmd;
+}
+
+/*
+ * This is called when relaxing access to a hugepage. It's also called in the page
+ * fault path when we don't hit any of the major fault cases, ie, a minor
+ * update of _PAGE_ACCESSED, _PAGE_DIRTY, etc... The generic code will have
+ * handled those two for us, we additionally deal with missing execute
+ * permission here on some processors
+ */
+int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp, pmd_t entry, int dirty)
+{
+ int changed;
+ entry = set_hugepage_access_flags_filter(entry, vma, dirty);
+ changed = !pmd_same(*(pmdp), entry);
+ if (changed) {
+ __pmdp_set_access_flags(pmdp, entry);
+ /*
+ * Since we are not supporting SW TLB systems, we don't
+ * have any thing similar to flush_tlb_page_nohash()
+ */
+ }
+ return changed;
+}
+
+int pmdp_test_and_clear_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
+{
+ return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
+}
+
+/*
+ * We currently remove entries from the hashtable regardless of whether
+ * the entry was young or dirty. The generic routines only flush if the
+ * entry was young or dirty which is not good enough.
+ *
+ * We should be more intelligent about this but for the moment we override
+ * these functions and force a tlb flush unconditionally
+ */
+int pmdp_clear_flush_young(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
+{
+ return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
+}
+
+/*
+ * We mark the pmd splitting and invalidate all the hpte
+ * entries for this hugepage.
+ */
+void pmdp_splitting_flush(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp)
+{
+ unsigned long old, tmp;
+
+ VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+#ifdef PTE_ATOMIC_UPDATES
+
+ __asm__ __volatile__(
+ "1: ldarx %0,0,%3\n\
+ andi. %1,%0,%6\n\
+ bne- 1b \n\
+ ori %1,%0,%4 \n\
+ stdcx. %1,0,%3 \n\
+ bne- 1b"
+ : "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
+ : "r" (pmdp), "i" (_PAGE_SPLITTING), "m" (*pmdp), "i" (_PAGE_BUSY)
+ : "cc" );
+#else
+ old = pmd_val(*pmdp);
+ *pmdp = __pmd(old | _PAGE_SPLITTING);
+#endif
+ /*
+ * If we didn't had the splitting flag set, go and flush the
+ * HPTE entries and serialize against gup fast.
+ */
+ if (!(old & _PAGE_SPLITTING)) {
+#ifdef CONFIG_PPC_STD_MMU_64
+ /* We need to flush the hpte */
+ if (old & _PAGE_HASHPTE)
+ hpte_need_hugepage_flush(vma->vm_mm, address, pmdp);
+#endif
+ /* need tlb flush only to serialize against gup-fast */
+ flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+ }
+}
+
+/*
+ * We want to put the pgtable in pmd and use pgtable for tracking
+ * the base page size hptes
+ */
+void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable)
+{
+ unsigned long *pgtable_slot;
+ assert_spin_locked(&mm->page_table_lock);
+ /*
+ * we store the pgtable in the second half of PMD
+ */
+ pgtable_slot = pmdp + PTRS_PER_PMD;
+ *pgtable_slot = (unsigned long)pgtable;
+}
+
+pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
+{
+ pgtable_t pgtable;
+ unsigned long *pgtable_slot;
+
+ assert_spin_locked(&mm->page_table_lock);
+ pgtable_slot = pmdp + PTRS_PER_PMD;
+ pgtable = (pgtable_t) *pgtable_slot;
+ /*
+ * We store HPTE information in the deposited PTE fragment.
+ * zero out the content on withdraw.
+ */
+ memset(pgtable, 0, PTE_FRAG_SIZE);
+ return pgtable;
+}
+
+/*
+ * Since we are looking at latest ppc64, we don't need to worry about
+ * i/d cache coherency on exec fault
+ */
+static pmd_t set_pmd_filter(pmd_t pmd, unsigned long addr)
+{
+ pmd = __pmd(pmd_val(pmd) & ~_PAGE_THP_HPTEFLAGS);
+ return pmd;
+}
+
+/*
+ * We can make it less convoluted than __set_pte_at, because
+ * we can ignore lot of hardware here, because this is only for
+ * MPSS
+ */
+static inline void __set_pmd_at(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, pmd_t pmd, int percpu)
+{
+ /*
+ * There is nothing in hash page table now, so nothing to
+ * invalidate, set_pte_at is used for adding new entry.
+ * For updating we should use update_hugepage_pmd()
+ */
+ *pmdp = pmd;
+}
+
+/*
+ * set a new huge pmd. We should not be called for updating
+ * an existing pmd entry. That should go via pmd_hugepage_update.
+ */
+void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp, pmd_t pmd)
+{
+ /*
+ * Note: mm->context.id might not yet have been assigned as
+ * this context might not have been activated yet when this
+ * is called.
+ */
+ pmd = set_pmd_filter(pmd, addr);
+
+ __set_pmd_at(mm, addr, pmdp, pmd, 0);
+
+}
+
+void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pmd_t *pmdp)
+{
+ pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT);
+ flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+}
+
+/*
+ * A linux hugepage PMD was changed and the corresponding hash table entries
+ * neesd to be flushed.
+ *
+ * The linux hugepage PMD now include the pmd entries followed by the address
+ * to the stashed pgtable_t. The stashed pgtable_t contains the hpte bits.
+ * [ secondary group | 3 bit hidx | valid ]. We use one byte per each HPTE entry.
+ * With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need
+ * 4096 entries. Both will fit in a 4K pgtable_t.
+ */
+void hpte_need_hugepage_flush(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmdp)
+{
+ int ssize, i;
+ unsigned long s_addr;
+ unsigned int psize, valid;
+ unsigned char *hpte_slot_array;
+ unsigned long hidx, vpn, vsid, hash, shift, slot;
+
+ /*
+ * Flush all the hptes mapping this hugepage
+ */
+ s_addr = addr & HUGE_PAGE_MASK;
+ /*
+ * The hpte hindex are stored in the pgtable whose address is in the
+ * second half of the PMD
+ */
+ hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
+
+ /* get the base page size */
+ psize = get_slice_psize(mm, s_addr);
+ shift = mmu_psize_defs[psize].shift;
+
+ for (i = 0; i < (HUGE_PAGE_SIZE >> shift); i++) {
+ /*
+ * 8 bits per each hpte entries
+ * 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
+ */
+ valid = hpte_slot_array[i] & 0x1;
+ if (!valid)
+ continue;
+ hidx = hpte_slot_array[i] >> 1;
+
+ /* get the vpn */
+ addr = s_addr + (i * (1ul << shift));
+ if (!is_kernel_addr(addr)) {
+ ssize = user_segment_size(addr);
+ vsid = get_vsid(mm->context.id, addr, ssize);
+ WARN_ON(vsid == 0);
+ } else {
+ vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
+ ssize = mmu_kernel_ssize;
+ }
+
+ vpn = hpt_vpn(addr, vsid, ssize);
+ hash = hpt_hash(vpn, shift, ssize);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+ ppc_md.hpte_invalidate(slot, vpn, psize, ssize, 0);
+ }
+}
+
+static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
+{
+ pmd_val(pmd) |= pgprot_val(pgprot);
+ return pmd;
+}
+
+pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
+{
+ pmd_t pmd;
+ /*
+ * For a valid pte, we would have _PAGE_PRESENT or _PAGE_FILE always
+ * set. We use this to check THP page at pmd level.
+ * leaf pte for huge page, bottom two bits != 00
+ */
+ pmd_val(pmd) = pfn << PTE_RPN_SHIFT;
+ pmd_val(pmd) |= _PAGE_THP_HUGE;
+ pmd = pmd_set_protbits(pmd, pgprot);
+ return pmd;
+}
+
+pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
+{
+ return pfn_pmd(page_to_pfn(page), pgprot);
+}
+
+pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
+{
+
+ pmd_val(pmd) &= _HPAGE_CHG_MASK;
+ pmd = pmd_set_protbits(pmd, newprot);
+ return pmd;
+}
+
+/*
+ * This is called at the end of handling a user page fault, when the
+ * fault has been handled by updating a HUGE PMD entry in the linux page tables.
+ * We use it to preload an HPTE into the hash table corresponding to
+ * the updated linux HUGE PMD entry.
+ */
+void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
+ pmd_t *pmd)
+{
+ return;
+}
+
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+pmd_t pmdp_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t *pmdp)
+{
+ pmd_t old_pmd;
+ unsigned long old;
+ /*
+ * khugepaged calls this for normal pmd also
+ */
+ if (pmd_trans_huge(*pmdp)) {
+ old = pmd_hugepage_update(mm, addr, pmdp, ~0UL);
+ old_pmd = __pmd(old);
+ } else {
+ old_pmd = *pmdp;
+ pmd_clear(pmdp);
+ }
+ return old_pmd;
+}
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 18e3b76..a526144 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -71,6 +71,7 @@ config PPC_BOOK3S_64
select PPC_FPU
select PPC_HAVE_PMU_SUPPORT
select SYS_SUPPORTS_HUGETLBFS
+ select HAVE_ARCH_TRANSPARENT_HUGEPAGE if PPC_64K_PAGES
config PPC_BOOK3E_64
bool "Embedded processors"
--
1.8.1.2
^ permalink raw reply related
* [PATCH -V7 01/10] powerpc/THP: Double the PMD table size for THP
From: Aneesh Kumar K.V @ 2013-04-28 19:51 UTC (permalink / raw)
To: benh, paulus, dwg, linux-mm; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1367178711-8232-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
THP code does PTE page allocation along with large page request and deposit them
for later use. This is to ensure that we won't have any failures when we split
hugepages to regular pages.
On powerpc we want to use the deposited PTE page for storing hash pte slot and
secondary bit information for the HPTEs. We use the second half
of the pmd table to save the deposted PTE page.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/pgalloc-64.h | 6 +++---
arch/powerpc/include/asm/pgtable-ppc64.h | 6 +++++-
arch/powerpc/mm/init_64.c | 9 ++++++---
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index 91acb12..c756463 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -221,17 +221,17 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return kmem_cache_alloc(PGT_CACHE(PMD_INDEX_SIZE),
+ return kmem_cache_alloc(PGT_CACHE(PMD_CACHE_INDEX),
GFP_KERNEL|__GFP_REPEAT);
}
static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
{
- kmem_cache_free(PGT_CACHE(PMD_INDEX_SIZE), pmd);
+ kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
}
#define __pmd_free_tlb(tlb, pmd, addr) \
- pgtable_free_tlb(tlb, pmd, PMD_INDEX_SIZE)
+ pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX)
#ifndef CONFIG_PPC_64K_PAGES
#define __pud_free_tlb(tlb, pud, addr) \
pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE)
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index e3d55f6f..ab84332 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -20,7 +20,11 @@
PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
#define PGTABLE_RANGE (ASM_CONST(1) << PGTABLE_EADDR_SIZE)
-
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define PMD_CACHE_INDEX (PMD_INDEX_SIZE + 1)
+#else
+#define PMD_CACHE_INDEX PMD_INDEX_SIZE
+#endif
/*
* Define the address range of the kernel non-linear virtual area
*/
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index a56de85..97f741d 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -88,7 +88,11 @@ static void pgd_ctor(void *addr)
static void pmd_ctor(void *addr)
{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ memset(addr, 0, PMD_TABLE_SIZE * 2);
+#else
memset(addr, 0, PMD_TABLE_SIZE);
+#endif
}
struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
@@ -137,10 +141,9 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
void pgtable_cache_init(void)
{
pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
- pgtable_cache_add(PMD_INDEX_SIZE, pmd_ctor);
- if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_INDEX_SIZE))
+ pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);
+ if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_CACHE_INDEX))
panic("Couldn't allocate pgtable caches");
-
/* In all current configs, when the PUD index exists it's the
* same size as either the pgd or pmd index. Verify that the
* initialization above has also created a PUD cache. This
--
1.8.1.2
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox