LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 1/7] [booke] Rename mapping based RELOCATABLE to DYNAMIC_MEMSTART for BookE
From: Suzuki K. Poulose @ 2011-12-15  8:57 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt
  Cc: Scott Wood, Josh Poimboeuf, linux ppc dev
In-Reply-To: <20111215085516.3404.56377.stgit@suzukikp.in.ibm.com>

The current implementation of CONFIG_RELOCATABLE in BookE is based
on mapping the page aligned kernel load address to KERNELBASE. This
approach however is not enough for platforms, where the TLB page size
is large (e.g, 256M on 44x). So we are renaming the RELOCATABLE used
currently in BookE to DYNAMIC_MEMSTART to reflect the actual method.

The CONFIG_RELOCATABLE for PPC32(BookE) based on processing of the
dynamic relocations will be introduced in the later in the patch series.

This change would allow the use of the old method of RELOCATABLE for
platforms which can afford to enforce the page alignment (platforms with
smaller TLB size).

Changes since v3:

* Introduced a new config, NONSTATIC_KERNEL, to denote a kernel which is
  either a RELOCATABLE or DYNAMIC_MEMSTART(Suggested by: Josh Boyer)

Suggested-by: Scott Wood <scottwood@freescale.com>
Tested-by: Scott Wood <scottwood@freescale.com>

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux ppc dev <linuxppc-dev@lists.ozlabs.org>
---

 arch/powerpc/Kconfig                          |   60 +++++++++++++++++--------
 arch/powerpc/configs/44x/iss476-smp_defconfig |    3 +
 arch/powerpc/include/asm/kdump.h              |    4 +-
 arch/powerpc/include/asm/page.h               |    4 +-
 arch/powerpc/kernel/crash_dump.c              |    4 +-
 arch/powerpc/kernel/head_44x.S                |    4 +-
 arch/powerpc/kernel/head_fsl_booke.S          |    2 -
 arch/powerpc/kernel/machine_kexec.c           |    2 -
 arch/powerpc/kernel/prom_init.c               |    2 -
 arch/powerpc/mm/44x_mmu.c                     |    2 -
 10 files changed, 56 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7c93c7e..fac92ce 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -364,7 +364,8 @@ config KEXEC
 config CRASH_DUMP
 	bool "Build a kdump crash kernel"
 	depends on PPC64 || 6xx || FSL_BOOKE
-	select RELOCATABLE if PPC64 || FSL_BOOKE
+	select RELOCATABLE if PPC64
+	select DYNAMIC_MEMSTART if FSL_BOOKE
 	help
 	  Build a kernel suitable for use as a kdump capture kernel.
 	  The same kernel binary can be used as production kernel and dump
@@ -773,6 +774,10 @@ source "drivers/rapidio/Kconfig"
 
 endmenu
 
+config NONSTATIC_KERNEL
+	bool
+	default n
+
 menu "Advanced setup"
 	depends on PPC32
 
@@ -822,23 +827,39 @@ config LOWMEM_CAM_NUM
 	int "Number of CAMs to use to map low memory" if LOWMEM_CAM_NUM_BOOL
 	default 3
 
-config RELOCATABLE
-	bool "Build a relocatable kernel (EXPERIMENTAL)"
+config DYNAMIC_MEMSTART
+	bool "Enable page aligned dynamic load address for kernel (EXPERIMENTAL)"
 	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM && (FSL_BOOKE || PPC_47x)
-	help
-	  This builds a kernel image that is capable of running at the
-	  location the kernel is loaded at (some alignment restrictions may
-	  exist).
-
-	  One use is for the kexec on panic case where the recovery kernel
-	  must live at a different physical address than the primary
-	  kernel.
-
-	  Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
-	  it has been loaded at and the compile time physical addresses
-	  CONFIG_PHYSICAL_START is ignored.  However CONFIG_PHYSICAL_START
-	  setting can still be useful to bootwrappers that need to know the
-	  load location of the kernel (eg. u-boot/mkimage).
+	select NONSTATIC_KERNEL
+	help
+	  This option enables the kernel to be loaded at any page aligned
+	  physical address. The kernel creates a mapping from KERNELBASE to 
+	  the address where the kernel is loaded. The page size here implies
+	  the TLB page size of the mapping for kernel on the particular platform.
+	  Please refer to the init code for finding the TLB page size.
+
+	  DYNAMIC_MEMSTART is an easy way of implementing pseudo-RELOCATABLE
+	  kernel image, where the only restriction is the page aligned kernel
+	  load address. When this option is enabled, the compile time physical 
+	  address CONFIG_PHYSICAL_START is ignored.
+
+# Mapping based RELOCATABLE is moved to DYNAMIC_MEMSTART
+# config RELOCATABLE
+#	bool "Build a relocatable kernel (EXPERIMENTAL)"
+#	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM && (FSL_BOOKE || PPC_47x)
+#	help
+#	  This builds a kernel image that is capable of running at the
+#	  location the kernel is loaded at, without any alignment restrictions.
+#
+#	  One use is for the kexec on panic case where the recovery kernel
+#	  must live at a different physical address than the primary
+#	  kernel.
+#
+#	  Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
+#	  it has been loaded at and the compile time physical addresses
+#	  CONFIG_PHYSICAL_START is ignored.  However CONFIG_PHYSICAL_START
+#	  setting can still be useful to bootwrappers that need to know the
+#	  load location of the kernel (eg. u-boot/mkimage).
 
 config PAGE_OFFSET_BOOL
 	bool "Set custom page offset address"
@@ -868,7 +889,7 @@ config KERNEL_START_BOOL
 config KERNEL_START
 	hex "Virtual address of kernel base" if KERNEL_START_BOOL
 	default PAGE_OFFSET if PAGE_OFFSET_BOOL
-	default "0xc2000000" if CRASH_DUMP && !RELOCATABLE
+	default "0xc2000000" if CRASH_DUMP && !NONSTATIC_KERNEL
 	default "0xc0000000"
 
 config PHYSICAL_START_BOOL
@@ -881,7 +902,7 @@ config PHYSICAL_START_BOOL
 
 config PHYSICAL_START
 	hex "Physical address where the kernel is loaded" if PHYSICAL_START_BOOL
-	default "0x02000000" if PPC_STD_MMU && CRASH_DUMP && !RELOCATABLE
+	default "0x02000000" if PPC_STD_MMU && CRASH_DUMP && !NONSTATIC_KERNEL
 	default "0x00000000"
 
 config PHYSICAL_ALIGN
@@ -927,6 +948,7 @@ endmenu
 if PPC64
 config RELOCATABLE
 	bool "Build a relocatable kernel"
+	select NONSTATIC_KERNEL
 	help
 	  This builds a kernel image that is capable of running anywhere
 	  in the RMA (real memory area) at any 16k-aligned base address.
diff --git a/arch/powerpc/configs/44x/iss476-smp_defconfig b/arch/powerpc/configs/44x/iss476-smp_defconfig
index a6eb6ad..ca00cf7 100644
--- a/arch/powerpc/configs/44x/iss476-smp_defconfig
+++ b/arch/powerpc/configs/44x/iss476-smp_defconfig
@@ -25,7 +25,8 @@ CONFIG_CMDLINE_BOOL=y
 CONFIG_CMDLINE="root=/dev/issblk0"
 # CONFIG_PCI is not set
 CONFIG_ADVANCED_OPTIONS=y
-CONFIG_RELOCATABLE=y
+CONFIG_NONSTATIC_KERNEL=y
+CONFIG_DYNAMIC_MEMSTART=y
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
diff --git a/arch/powerpc/include/asm/kdump.h b/arch/powerpc/include/asm/kdump.h
index bffd062..c977620 100644
--- a/arch/powerpc/include/asm/kdump.h
+++ b/arch/powerpc/include/asm/kdump.h
@@ -32,11 +32,11 @@
 
 #ifndef __ASSEMBLY__
 
-#if defined(CONFIG_CRASH_DUMP) && !defined(CONFIG_RELOCATABLE)
+#if defined(CONFIG_CRASH_DUMP) && !defined(CONFIG_NONSTATIC_KERNEL)
 extern void reserve_kdump_trampoline(void);
 extern void setup_kdump_trampoline(void);
 #else
-/* !CRASH_DUMP || RELOCATABLE */
+/* !CRASH_DUMP || !NONSTATIC_KERNEL */
 static inline void reserve_kdump_trampoline(void) { ; }
 static inline void setup_kdump_trampoline(void) { ; }
 #endif
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 9d7485c..f149967 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -92,7 +92,7 @@ extern unsigned int HPAGE_SHIFT;
 #define PAGE_OFFSET	ASM_CONST(CONFIG_PAGE_OFFSET)
 #define LOAD_OFFSET	ASM_CONST((CONFIG_KERNEL_START-CONFIG_PHYSICAL_START))
 
-#if defined(CONFIG_RELOCATABLE)
+#if defined(CONFIG_NONSTATIC_KERNEL)
 #ifndef __ASSEMBLY__
 
 extern phys_addr_t memstart_addr;
@@ -105,7 +105,7 @@ extern phys_addr_t kernstart_addr;
 
 #ifdef CONFIG_PPC64
 #define MEMORY_START	0UL
-#elif defined(CONFIG_RELOCATABLE)
+#elif defined(CONFIG_NONSTATIC_KERNEL)
 #define MEMORY_START	memstart_addr
 #else
 #define MEMORY_START	(PHYSICAL_START + PAGE_OFFSET - KERNELBASE)
diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index 424afb6..b3ba516 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -28,7 +28,7 @@
 #define DBG(fmt...)
 #endif
 
-#ifndef CONFIG_RELOCATABLE
+#ifndef CONFIG_NONSTATIC_KERNEL
 void __init reserve_kdump_trampoline(void)
 {
 	memblock_reserve(0, KDUMP_RESERVE_LIMIT);
@@ -67,7 +67,7 @@ void __init setup_kdump_trampoline(void)
 
 	DBG(" <- setup_kdump_trampoline()\n");
 }
-#endif /* CONFIG_RELOCATABLE */
+#endif /* CONFIG_NONSTATIC_KERNEL */
 
 static int __init parse_savemaxmem(char *p)
 {
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index b725dab..3df7735 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -86,8 +86,10 @@ _ENTRY(_start);
 
 	bl	early_init
 
-#ifdef CONFIG_RELOCATABLE
+#ifdef CONFIG_DYNAMIC_MEMSTART
 	/*
+	 * Mapping based, page aligned dynamic kernel loading.
+	 *
 	 * r25 will contain RPN/ERPN for the start address of memory
 	 *
 	 * Add the difference between KERNELBASE and PAGE_OFFSET to the
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 9f5d210..d5d78c4 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -197,7 +197,7 @@ _ENTRY(__early_start)
 
 	bl	early_init
 
-#ifdef CONFIG_RELOCATABLE
+#ifdef CONFIG_DYNAMIC_MEMSTART
 	lis	r3,kernstart_addr@ha
 	la	r3,kernstart_addr@l(r3)
 #ifdef CONFIG_PHYS_64BIT
diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
index 9ce1672..ec50bb9 100644
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -128,7 +128,7 @@ void __init reserve_crashkernel(void)
 
 	crash_size = resource_size(&crashk_res);
 
-#ifndef CONFIG_RELOCATABLE
+#ifndef CONFIG_NONSTATIC_KERNEL
 	if (crashk_res.start != KDUMP_KERNELBASE)
 		printk("Crash kernel location must be 0x%x\n",
 				KDUMP_KERNELBASE);
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index df47316..6e63b20 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2844,7 +2844,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4,
 	RELOC(of_platform) = prom_find_machine_type();
 	prom_printf("Detected machine type: %x\n", RELOC(of_platform));
 
-#ifndef CONFIG_RELOCATABLE
+#ifndef CONFIG_NONSTATIC_KERNEL
 	/* Bail if this is a kdump kernel. */
 	if (PHYSICAL_START > 0)
 		prom_panic("Error: You can't boot a kdump kernel from OF!\n");
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index f60e006..924a258 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -221,7 +221,7 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 {
 	u64 size;
 
-#ifndef CONFIG_RELOCATABLE
+#ifndef CONFIG_NONSTATIC_KERNEL
 	/* We don't currently support the first MEMBLOCK not mapping 0
 	 * physical on those processors
 	 */

^ permalink raw reply related

* [PATCH v5 4/7] [ppc] Define virtual-physical translations for RELOCATABLE
From: Suzuki K. Poulose @ 2011-12-15  8:58 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt
  Cc: Scott Wood, Josh Poimboeuf, linux ppc dev
In-Reply-To: <20111215085516.3404.56377.stgit@suzukikp.in.ibm.com>

We find the runtime address of _stext and relocate ourselves based
on the following calculation.

	virtual_base = ALIGN(KERNELBASE,KERNEL_TLB_PIN_SIZE) +
			MODULO(_stext.run,KERNEL_TLB_PIN_SIZE)

relocate() is called with the Effective Virtual Base Address (as
shown below)

            | Phys. Addr| Virt. Addr |
Page        |------------------------|
Boundary    |           |            |
            |           |            |
            |           |            |
Kernel Load |___________|_ __ _ _ _ _|<- Effective
Addr(_stext)|           |      ^     |Virt. Base Addr
            |           |      |     |
            |           |      |     |
            |           |reloc_offset|
            |           |      |     |
            |           |      |     |
            |           |______v_____|<-(KERNELBASE)%TLB_SIZE
            |           |            |
            |           |            |
            |           |            |
Page        |-----------|------------|
Boundary    |           |            |


On BookE, we need __va() & __pa() early in the boot process to access
the device tree.

Currently this has been defined as :

#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) -
						PHYSICAL_START + KERNELBASE)
where:
 PHYSICAL_START is kernstart_addr - a variable updated at runtime.
 KERNELBASE	is the compile time Virtual base address of kernel.

This won't work for us, as kernstart_addr is dynamic and will yield different
results for __va()/__pa() for same mapping.

e.g.,

Let the kernel be loaded at 64MB and KERNELBASE be 0xc0000000 (same as
PAGE_OFFSET).

In this case, we would be mapping 0 to 0xc0000000, and kernstart_addr = 64M

Now __va(1MB) = (0x100000) - (0x4000000) + 0xc0000000
		= 0xbc100000 , which is wrong.

it should be : 0xc0000000 + 0x100000 = 0xc0100000

On platforms which support AMP, like PPC_47x (based on 44x), the kernel
could be loaded at highmem. Hence we cannot always depend on the compile
time constants for mapping.

Here are the possible solutions:

1) Update kernstart_addr(PHSYICAL_START) to match the Physical address of
compile time KERNELBASE value, instead of the actual Physical_Address(_stext).

The disadvantage is that we may break other users of PHYSICAL_START. They
could be replaced with __pa(_stext).

2) Redefine __va() & __pa() with relocation offset


#ifdef	CONFIG_RELOCATABLE_PPC32
#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + (KERNELBASE + RELOC_OFFSET)))
#define __pa(x) ((unsigned long)(x) + PHYSICAL_START - (KERNELBASE + RELOC_OFFSET))
#endif

where, RELOC_OFFSET could be

  a) A variable, say relocation_offset (like kernstart_addr), updated
     at boot time. This impacts performance, as we have to load an additional
     variable from memory.

		OR

  b) #define RELOC_OFFSET ((PHYSICAL_START & PPC_PIN_SIZE_OFFSET_MASK) - \
                      (KERNELBASE & PPC_PIN_SIZE_OFFSET_MASK))

   This introduces more calculations for doing the translation.

3) Redefine __va() & __pa() with a new variable

i.e,

#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET))

where VIRT_PHYS_OFFSET :

#ifdef CONFIG_RELOCATABLE_PPC32
#define VIRT_PHYS_OFFSET virt_phys_offset
#else
#define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
#endif /* CONFIG_RELOCATABLE_PPC32 */

where virt_phy_offset is updated at runtime to :

	Effective KERNELBASE - kernstart_addr.

Taking our example, above:

virt_phys_offset = effective_kernelstart_vaddr - kernstart_addr
		 = 0xc0400000 - 0x400000
		 = 0xc0000000
	and

	__va(0x100000) = 0xc0000000 + 0x100000 = 0xc0100000
	 which is what we want.

I have implemented (3) in the following patch which has same cost of
operation as the existing one.

I have tested the patches on 440x platforms only. However this should
work fine for PPC_47x also, as we only depend on the runtime address
and the current TLB XLAT entry for the startup code, which is available
in r25. I don't have access to a 47x board yet. So, it would be great if
somebody could test this on 47x.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
---

 arch/powerpc/include/asm/page.h |   85 ++++++++++++++++++++++++++++++++++++++-
 arch/powerpc/mm/init_32.c       |    7 +++
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f149967..f072e97 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -97,12 +97,26 @@ extern unsigned int HPAGE_SHIFT;
 
 extern phys_addr_t memstart_addr;
 extern phys_addr_t kernstart_addr;
+
+#ifdef CONFIG_RELOCATABLE_PPC32
+extern long long virt_phys_offset;
 #endif
+
+#endif /* __ASSEMBLY__ */
 #define PHYSICAL_START	kernstart_addr
-#else
+
+#else	/* !CONFIG_NONSTATIC_KERNEL */
 #define PHYSICAL_START	ASM_CONST(CONFIG_PHYSICAL_START)
 #endif
 
+/* See Description below for VIRT_PHYS_OFFSET */
+#ifdef CONFIG_RELOCATABLE_PPC32
+#define VIRT_PHYS_OFFSET virt_phys_offset
+#else
+#define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
+#endif
+
+
 #ifdef CONFIG_PPC64
 #define MEMORY_START	0UL
 #elif defined(CONFIG_NONSTATIC_KERNEL)
@@ -125,12 +139,77 @@ extern phys_addr_t kernstart_addr;
  * determine MEMORY_START until then.  However we can determine PHYSICAL_START
  * from information at hand (program counter, TLB lookup).
  *
+ * On BookE with RELOCATABLE (RELOCATABLE_PPC32)
+ *
+ *   With RELOCATABLE_PPC32,  we support loading the kernel at any physical 
+ *   address without any restriction on the page alignment.
+ *
+ *   We find the runtime address of _stext and relocate ourselves based on 
+ *   the following calculation:
+ *
+ *  	  virtual_base = ALIGN_DOWN(KERNELBASE,256M) +
+ *  				MODULO(_stext.run,256M)
+ *   and create the following mapping:
+ *
+ * 	  ALIGN_DOWN(_stext.run,256M) => ALIGN_DOWN(KERNELBASE,256M)
+ *
+ *   When we process relocations, we cannot depend on the
+ *   existing equation for the __va()/__pa() translations:
+ *
+ * 	   __va(x) = (x)  - PHYSICAL_START + KERNELBASE
+ *
+ *   Where:
+ *   	 PHYSICAL_START = kernstart_addr = Physical address of _stext
+ *  	 KERNELBASE = Compiled virtual address of _stext.
+ *
+ *   This formula holds true iff, kernel load address is TLB page aligned.
+ *
+ *   In our case, we need to also account for the shift in the kernel Virtual 
+ *   address.
+ *
+ *   E.g.,
+ *
+ *   Let the kernel be loaded at 64MB and KERNELBASE be 0xc0000000 (same as PAGE_OFFSET).
+ *   In this case, we would be mapping 0 to 0xc0000000, and kernstart_addr = 64M
+ *
+ *   Now __va(1MB) = (0x100000) - (0x4000000) + 0xc0000000
+ *                 = 0xbc100000 , which is wrong.
+ *
+ *   Rather, it should be : 0xc0000000 + 0x100000 = 0xc0100000
+ *      	according to our mapping.
+ *
+ *   Hence we use the following formula to get the translations right:
+ *
+ * 	  __va(x) = (x) - [ PHYSICAL_START - Effective KERNELBASE ]
+ *
+ * 	  Where :
+ * 		PHYSICAL_START = dynamic load address.(kernstart_addr variable)
+ * 		Effective KERNELBASE = virtual_base =
+ * 				     = ALIGN_DOWN(KERNELBASE,256M) +
+ * 						MODULO(PHYSICAL_START,256M)
+ *
+ * 	To make the cost of __va() / __pa() more light weight, we introduce
+ * 	a new variable virt_phys_offset, which will hold :
+ *
+ * 	virt_phys_offset = Effective KERNELBASE - PHYSICAL_START
+ * 			 = ALIGN_DOWN(KERNELBASE,256M) - 
+ * 			 	ALIGN_DOWN(PHYSICALSTART,256M)
+ *
+ * 	Hence :
+ *
+ * 	__va(x) = x - PHYSICAL_START + Effective KERNELBASE
+ * 		= x + virt_phys_offset
+ *
+ * 		and
+ * 	__pa(x) = x + PHYSICAL_START - Effective KERNELBASE
+ * 		= x - virt_phys_offset
+ * 		
  * On non-Book-E PPC64 PAGE_OFFSET and MEMORY_START are constants so use
  * the other definitions for __va & __pa.
  */
 #ifdef CONFIG_BOOKE
-#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + KERNELBASE))
-#define __pa(x) ((unsigned long)(x) + PHYSICAL_START - KERNELBASE)
+#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET))
+#define __pa(x) ((unsigned long)(x) - VIRT_PHYS_OFFSET)
 #else
 #define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + PAGE_OFFSET - MEMORY_START))
 #define __pa(x) ((unsigned long)(x) - PAGE_OFFSET + MEMORY_START)
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 161cefd..60a4e4e 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -65,6 +65,13 @@ phys_addr_t memstart_addr = (phys_addr_t)~0ull;
 EXPORT_SYMBOL(memstart_addr);
 phys_addr_t kernstart_addr;
 EXPORT_SYMBOL(kernstart_addr);
+
+#ifdef CONFIG_RELOCATABLE_PPC32
+/* Used in __va()/__pa() */
+long long virt_phys_offset;
+EXPORT_SYMBOL(virt_phys_offset);
+#endif
+
 phys_addr_t lowmem_end_addr;
 
 int boot_mapsize;

^ permalink raw reply related

* [PATCH v5 2/7] [44x] Enable DYNAMIC_MEMSTART for 440x
From: Suzuki K. Poulose @ 2011-12-15  8:57 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt
  Cc: Scott Wood, Josh Poimboeuf, linux ppc dev
In-Reply-To: <20111215085516.3404.56377.stgit@suzukikp.in.ibm.com>

DYNAMIC_MEMSTART(old RELOCATABLE) was restricted only to PPC_47x variants
of 44x. This patch enables DYNAMIC_MEMSTART for 440x based chipsets.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux ppc dev <linuxppc-dev@lists.ozlabs.org>
---

 arch/powerpc/Kconfig           |    2 +-
 arch/powerpc/kernel/head_44x.S |   12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fac92ce..5eafe95 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -829,7 +829,7 @@ config LOWMEM_CAM_NUM
 
 config DYNAMIC_MEMSTART
 	bool "Enable page aligned dynamic load address for kernel (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM && (FSL_BOOKE || PPC_47x)
+	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM && (FSL_BOOKE || 44x)
 	select NONSTATIC_KERNEL
 	help
 	  This option enables the kernel to be loaded at any page aligned
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 3df7735..d571111 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -802,12 +802,24 @@ skpinv:	addi	r4,r4,1				/* Increment */
 /*
  * Configure and load pinned entry into TLB slot 63.
  */
+#ifdef CONFIG_DYNAMIC_MEMSTART
+
+	/* Read the XLAT entry for our current mapping */
+	tlbre	r25,r23,PPC44x_TLB_XLAT
+
+	lis	r3,KERNELBASE@h
+	ori	r3,r3,KERNELBASE@l
+
+	/* Use our current RPN entry */
+	mr	r4,r25
+#else
 
 	lis	r3,PAGE_OFFSET@h
 	ori	r3,r3,PAGE_OFFSET@l
 
 	/* Kernel is at the base of RAM */
 	li r4, 0			/* Load the kernel physical address */
+#endif
 
 	/* Load the kernel PID = 0 */
 	li	r0,0

^ permalink raw reply related

* [PATCH v5 3/7] [ppc] Process dynamic relocations for kernel
From: Suzuki K. Poulose @ 2011-12-15  8:58 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt
  Cc: Scott Wood, Josh Poimboeuf, linux ppc dev
In-Reply-To: <20111215085516.3404.56377.stgit@suzukikp.in.ibm.com>

The following patch implements the dynamic relocation processing for
PPC32 kernel. relocate() accepts the target virtual address and relocates
 the kernel image to the same.

Currently the following relocation types are handled :

	R_PPC_RELATIVE
	R_PPC_ADDR16_LO
	R_PPC_ADDR16_HI
	R_PPC_ADDR16_HA

The last 3 relocations in the above list depends on value of Symbol indexed
whose index is encoded in the Relocation entry. Hence we need the Symbol
Table for processing such relocations.

Note: The GNU ld for ppc32 produces buggy relocations for relocation types
that depend on symbols. The value of the symbols with STB_LOCAL scope
should be assumed to be zero. - Alan Modra

Changes since V4:

 ( Suggested by: Segher Boessenkool <segher@kernel.crashing.org>: )
 * Added 'sync' between dcbst and icbi for the modified instruction in
   relocate().
 * Replaced msync with sync.
 * Better comments on register usage in relocate().
 * Better check for relocation types in relocs_check.pl

Changes since V3:
 * Updated relocation types for ppc in arch/powerpc/relocs_check.pl

Changes since v2:
  * Flush the d-cache'd instructions and invalidate the i-cache to reflect
the processed instructions.(Reported by: Josh Poimboeuf)

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alan Modra <amodra@au1.ibm.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
---

 arch/powerpc/Kconfig              |   41 ++++---
 arch/powerpc/Makefile             |    6 +
 arch/powerpc/kernel/Makefile      |    2 
 arch/powerpc/kernel/reloc_32.S    |  208 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/vmlinux.lds.S |    8 +
 arch/powerpc/relocs_check.pl      |   14 ++
 6 files changed, 256 insertions(+), 23 deletions(-)
 create mode 100644 arch/powerpc/kernel/reloc_32.S

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5eafe95..33b1c8c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -843,23 +843,30 @@ config DYNAMIC_MEMSTART
 	  load address. When this option is enabled, the compile time physical 
 	  address CONFIG_PHYSICAL_START is ignored.
 
-# Mapping based RELOCATABLE is moved to DYNAMIC_MEMSTART
-# config RELOCATABLE
-#	bool "Build a relocatable kernel (EXPERIMENTAL)"
-#	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM && (FSL_BOOKE || PPC_47x)
-#	help
-#	  This builds a kernel image that is capable of running at the
-#	  location the kernel is loaded at, without any alignment restrictions.
-#
-#	  One use is for the kexec on panic case where the recovery kernel
-#	  must live at a different physical address than the primary
-#	  kernel.
-#
-#	  Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
-#	  it has been loaded at and the compile time physical addresses
-#	  CONFIG_PHYSICAL_START is ignored.  However CONFIG_PHYSICAL_START
-#	  setting can still be useful to bootwrappers that need to know the
-#	  load location of the kernel (eg. u-boot/mkimage).
+	  This option is overridden by CONFIG_RELOCATABLE
+
+config RELOCATABLE
+	bool "Build a relocatable kernel (EXPERIMENTAL)"
+	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM
+	select NONSTATIC_KERNEL
+	help
+	  This builds a kernel image that is capable of running at the
+	  location the kernel is loaded at, without any alignment restrictions.
+	  This feature is a superset of DYNAMIC_MEMSTART and hence overrides it.
+
+	  One use is for the kexec on panic case where the recovery kernel
+	  must live at a different physical address than the primary
+	  kernel.
+
+	  Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
+	  it has been loaded at and the compile time physical addresses
+	  CONFIG_PHYSICAL_START is ignored.  However CONFIG_PHYSICAL_START
+	  setting can still be useful to bootwrappers that need to know the
+	  load address of the kernel (eg. u-boot/mkimage).
+
+config RELOCATABLE_PPC32
+	def_bool y
+	depends on PPC32 && RELOCATABLE
 
 config PAGE_OFFSET_BOOL
 	bool "Set custom page offset address"
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ffe4d88..b8b105c 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -63,9 +63,9 @@ override CC	+= -m$(CONFIG_WORD_SIZE)
 override AR	:= GNUTARGET=elf$(CONFIG_WORD_SIZE)-powerpc $(AR)
 endif
 
-LDFLAGS_vmlinux-yy := -Bstatic
-LDFLAGS_vmlinux-$(CONFIG_PPC64)$(CONFIG_RELOCATABLE) := -pie
-LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-yy)
+LDFLAGS_vmlinux-y := -Bstatic
+LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
+LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-y)
 
 CFLAGS-$(CONFIG_PPC64)	:= -mminimal-toc -mtraceback=no -mcall-aixdesc
 CFLAGS-$(CONFIG_PPC32)	:= -ffixed-r2 -mmultiple
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index ce4f7f1..ee728e4 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -85,6 +85,8 @@ extra-$(CONFIG_FSL_BOOKE)	:= head_fsl_booke.o
 extra-$(CONFIG_8xx)		:= head_8xx.o
 extra-y				+= vmlinux.lds
 
+obj-$(CONFIG_RELOCATABLE_PPC32)	+= reloc_32.o
+
 obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o
 obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o
 obj-$(CONFIG_KGDB)		+= kgdb.o
diff --git a/arch/powerpc/kernel/reloc_32.S b/arch/powerpc/kernel/reloc_32.S
new file mode 100644
index 0000000..ef46ba6
--- /dev/null
+++ b/arch/powerpc/kernel/reloc_32.S
@@ -0,0 +1,208 @@
+/*
+ * Code to process dynamic relocations for PPC32.
+ *
+ * Copyrights (C) IBM Corporation, 2011.
+ *	Author: Suzuki Poulose <suzuki@in.ibm.com>
+ *
+ *  - Based on ppc64 code - reloc_64.S
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/ppc_asm.h>
+
+/* Dynamic section table entry tags */
+DT_RELA = 7			/* Tag for Elf32_Rela section */
+DT_RELASZ = 8			/* Size of the Rela relocs */
+DT_RELAENT = 9			/* Size of one Rela reloc entry */
+
+STN_UNDEF = 0			/* Undefined symbol index */
+STB_LOCAL = 0			/* Local binding for the symbol */
+
+R_PPC_ADDR16_LO = 4		/* Lower half of (S+A) */
+R_PPC_ADDR16_HI = 5		/* Upper half of (S+A) */
+R_PPC_ADDR16_HA = 6		/* High Adjusted (S+A) */
+R_PPC_RELATIVE = 22
+
+/*
+ * r3 = desired final address
+ */
+
+_GLOBAL(relocate)
+
+	mflr	r0		/* Save our LR */
+	bl	0f		/* Find our current runtime address */
+0:	mflr	r12		/* Make it accessible */
+	mtlr	r0
+
+	lwz	r11, (p_dyn - 0b)(r12)
+	add	r11, r11, r12	/* runtime address of .dynamic section */
+	lwz	r9, (p_rela - 0b)(r12)
+	add	r9, r9, r12	/* runtime address of .rela.dyn section */
+	lwz	r10, (p_st - 0b)(r12)
+	add	r10, r10, r12	/* runtime address of _stext section */
+	lwz	r13, (p_sym - 0b)(r12)
+	add	r13, r13, r12	/* runtime address of .dynsym section */
+
+	/*
+	 * Scan the dynamic section for RELA, RELASZ entries
+	 */
+	li	r6, 0
+	li	r7, 0
+	li	r8, 0
+1:	lwz	r5, 0(r11)	/* ELF_Dyn.d_tag */
+	cmpwi	r5, 0		/* End of ELF_Dyn[] */
+	beq	eodyn
+	cmpwi	r5, DT_RELA
+	bne	relasz
+	lwz	r7, 4(r11)	/* r7 = rela.link */
+	b	skip
+relasz:
+	cmpwi	r5, DT_RELASZ
+	bne	relaent
+	lwz	r8, 4(r11)	/* r8 = Total Rela relocs size */
+	b	skip
+relaent:
+	cmpwi	r5, DT_RELAENT
+	bne	skip
+	lwz	r6, 4(r11)	/* r6 = Size of one Rela reloc */
+skip:
+	addi	r11, r11, 8
+	b	1b
+eodyn:				/* End of Dyn Table scan */
+
+	/* Check if we have found all the entries */
+	cmpwi	r7, 0
+	beq	done
+	cmpwi	r8, 0
+	beq	done
+	cmpwi	r6, 0
+	beq	done
+
+
+	/*
+	 * Work out the current offset from the link time address of .rela
+	 * section.
+	 *  cur_offset[r7] = rela.run[r9] - rela.link [r7]
+	 *  _stext.link[r12] = _stext.run[r10] - cur_offset[r7]
+	 *  final_offset[r3] = _stext.final[r3] - _stext.link[r12]
+	 */
+	subf	r7, r7, r9	/* cur_offset */
+	subf	r12, r7, r10
+	subf	r3, r12, r3	/* final_offset */
+
+	subf	r8, r6, r8	/* relaz -= relaent */
+	/*
+	 * Scan through the .rela table and process each entry
+	 * r9	- points to the current .rela table entry
+	 * r13	- points to the symbol table
+	 */
+
+	/*
+	 * Check if we have a relocation based on symbol
+	 * r5 will hold the value of the symbol.
+	 */
+applyrela:
+	lwz	r4, 4(r9)		/* r4 = rela.r_info */
+	srwi	r5, r4, 8		/* ELF32_R_SYM(r_info) */
+	cmpwi	r5, STN_UNDEF	/* sym == STN_UNDEF ? */
+	beq	get_type	/* value = 0 */
+	/* Find the value of the symbol at index(r5) */
+	slwi	r5, r5, 4		/* r5 = r5 * sizeof(Elf32_Sym) */
+	add	r12, r13, r5	/* r12 = &__dyn_sym[Index] */
+
+	/*
+	 * GNU ld has a bug, where dynamic relocs based on
+	 * STB_LOCAL symbols, the value should be assumed
+	 * to be zero. - Alan Modra
+	 */
+	/* XXX: Do we need to check if we are using GNU ld ? */
+	lbz	r5, 12(r12)	/* r5 = dyn_sym[Index].st_info */
+	extrwi	r5, r5, 4, 24	/* r5 = ELF32_ST_BIND(r5) */
+	cmpwi	r5, STB_LOCAL	/* st_value = 0, ld bug */
+	beq	get_type	/* We have r5 = 0 */
+	lwz	r5, 4(r12)	/* r5 = __dyn_sym[Index].st_value */
+
+get_type:
+	/* Load the relocation type to r4 */
+	extrwi	r4, r4, 8, 24	/* r4 = ELF32_R_TYPE(r_info) = ((char*)r4)[3] */
+
+	/* R_PPC_RELATIVE */
+	cmpwi	r4, R_PPC_RELATIVE
+	bne	hi16
+	lwz	r4, 0(r9)	/* r_offset */
+	lwz	r0, 8(r9)	/* r_addend */
+	add	r0, r0, r3	/* final addend */
+	stwx	r0, r4, r7	/* memory[r4+r7]) = (u32)r0 */
+	b	nxtrela		/* continue */
+
+	/* R_PPC_ADDR16_HI */
+hi16:
+	cmpwi	r4, R_PPC_ADDR16_HI
+	bne	ha16
+	lwz	r4, 0(r9)	/* r_offset */
+	lwz	r0, 8(r9)	/* r_addend */
+	add	r0, r0, r3
+	add	r0, r0, r5	/* r0 = (S+A+Offset) */
+	extrwi	r0, r0, 16, 0	/* r0 = (r0 >> 16) */
+	b	store_half
+
+	/* R_PPC_ADDR16_HA */
+ha16:
+	cmpwi	r4, R_PPC_ADDR16_HA
+	bne	lo16
+	lwz	r4, 0(r9)	/* r_offset */
+	lwz	r0, 8(r9)	/* r_addend */
+	add	r0, r0, r3
+	add	r0, r0, r5	/* r0 = (S+A+Offset) */
+	extrwi	r5, r0, 1, 16	/* Extract bit 16 */
+	extrwi	r0, r0, 16, 0	/* r0 = (r0 >> 16) */
+	add	r0, r0, r5	/* Add it to r0 */
+	b	store_half
+
+	/* R_PPC_ADDR16_LO */
+lo16:
+	cmpwi	r4, R_PPC_ADDR16_LO
+	bne	nxtrela
+	lwz	r4, 0(r9)	/* r_offset */
+	lwz	r0, 8(r9)	/* r_addend */
+	add	r0, r0, r3
+	add	r0, r0, r5	/* r0 = (S+A+Offset) */
+	extrwi	r0, r0, 16, 16	/* r0 &= 0xffff */
+	/* Fall through to */
+
+	/* Store half word */
+store_half:
+	sthx	r0, r4, r7	/* memory[r4+r7] = (u16)r0 */
+
+nxtrela:
+	/*
+	 * We have to flush the modified instructions to the
+	 * main storage from the d-cache. And also, invalidate the
+	 * cached instructions in i-cache which has been modified.
+	 *
+	 * We delay the sync / isync operation till the end, since
+	 * we won't be executing the modified instructions until
+	 * we return from here.
+	 */
+	dcbst	r4,r7
+	sync			/* Ensure the data is flushed before icbi */
+	icbi	r4,r7
+	cmpwi	r8, 0		/* relasz = 0 ? */
+	ble	done
+	add	r9, r9, r6	/* move to next entry in the .rela table */
+	subf	r8, r6, r8	/* relasz -= relaent */
+	b	applyrela
+
+done:
+	sync			/* Wait for the flush to finish */
+	isync			/* Discard prefetched instructions */
+	blr
+
+p_dyn:		.long	__dynamic_start - 0b
+p_rela:		.long	__rela_dyn_start - 0b
+p_sym:		.long	__dynamic_symtab - 0b
+p_st:		.long	_stext - 0b
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 920276c..710a540 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -170,7 +170,13 @@ SECTIONS
 	}
 #ifdef CONFIG_RELOCATABLE
 	. = ALIGN(8);
-	.dynsym : AT(ADDR(.dynsym) - LOAD_OFFSET) { *(.dynsym) }
+	.dynsym : AT(ADDR(.dynsym) - LOAD_OFFSET)
+	{
+#ifdef CONFIG_RELOCATABLE_PPC32
+		__dynamic_symtab = .;
+#endif
+		*(.dynsym)
+	}
 	.dynstr : AT(ADDR(.dynstr) - LOAD_OFFSET) { *(.dynstr) }
 	.dynamic : AT(ADDR(.dynamic) - LOAD_OFFSET)
 	{
diff --git a/arch/powerpc/relocs_check.pl b/arch/powerpc/relocs_check.pl
index d257109..7f5b838 100755
--- a/arch/powerpc/relocs_check.pl
+++ b/arch/powerpc/relocs_check.pl
@@ -32,8 +32,18 @@ while (<FD>) {
 	next if (!/\s+R_/);
 
 	# These relocations are okay
-	next if (/R_PPC64_RELATIVE/ or /R_PPC64_NONE/ or
-	         /R_PPC64_ADDR64\s+mach_/);
+	# On PPC64:
+	# 	R_PPC64_RELATIVE, R_PPC64_NONE, R_PPC64_ADDR64
+	# On PPC:
+	# 	R_PPC_RELATIVE, R_PPC_ADDR16_HI, 
+	# 	R_PPC_ADDR16_HA,R_PPC_ADDR16_LO,
+	# 	R_PPC_NONE
+
+	next if (/\bR_PPC64_RELATIVE\b/ or /\bR_PPC64_NONE\b/ or
+	         /\bR_PPC64_ADDR64\s+mach_/);
+	next if (/\bR_PPC_ADDR16_LO\b/ or /\bR_PPC_ADDR16_HI\b/ or
+		 /\bR_PPC_ADDR16_HA\b/ or /\bR_PPC_RELATIVE\b/ or
+		 /\bR_PPC_NONE\b/);
 
 	# If we see this type of relcoation it's an idication that
 	# we /may/ be using an old version of binutils.

^ permalink raw reply related

* [PATCH v5 5/7] [44x] Enable CONFIG_RELOCATABLE for PPC44x
From: Suzuki K. Poulose @ 2011-12-15  8:59 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt
  Cc: Scott Wood, Josh Poimboeuf, linux ppc dev
In-Reply-To: <20111215085516.3404.56377.stgit@suzukikp.in.ibm.com>

The following patch adds relocatable kernel support - based on processing
of dynamic relocations - for PPC44x kernel.

We find the runtime address of _stext and relocate ourselves based
on the following calculation.

	virtual_base = ALIGN(KERNELBASE,256M) +
			MODULO(_stext.run,256M)

relocate() is called with the Effective Virtual Base Address (as
shown below)

            | Phys. Addr| Virt. Addr |
Page (256M) |------------------------|
Boundary    |           |            |
            |           |            |
            |           |            |
Kernel Load |___________|_ __ _ _ _ _|<- Effective
Addr(_stext)|           |      ^     |Virt. Base Addr
            |           |      |     |
            |           |      |     |
            |           |reloc_offset|
            |           |      |     |
            |           |      |     |
            |           |______v_____|<-(KERNELBASE)%256M
            |           |            |
            |           |            |
            |           |            |
Page(256M)  |-----------|------------|
Boundary    |           |            |

The virt_phys_offset is updated accordingly, i.e,

	virt_phys_offset = effective. kernel virt base - kernstart_addr

I have tested the patches on 440x platforms only. However this should
work fine for PPC_47x also, as we only depend on the runtime address
and the current TLB XLAT entry for the startup code, which is available
in r25. I don't have access to a 47x board yet. So, it would be great if
somebody could test this on 47x.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
---

 arch/powerpc/Kconfig           |    2 -
 arch/powerpc/kernel/head_44x.S |   95 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 33b1c8c..8833df5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -847,7 +847,7 @@ config DYNAMIC_MEMSTART
 
 config RELOCATABLE
 	bool "Build a relocatable kernel (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM
+	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM && 44x
 	select NONSTATIC_KERNEL
 	help
 	  This builds a kernel image that is capable of running at the
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index d571111..885d540 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -64,6 +64,35 @@ _ENTRY(_start);
 	mr	r31,r3		/* save device tree ptr */
 	li	r24,0		/* CPU number */
 
+#ifdef CONFIG_RELOCATABLE
+/*
+ * Relocate ourselves to the current runtime address.
+ * This is called only by the Boot CPU.
+ * "relocate" is called with our current runtime virutal
+ * address.
+ * r21 will be loaded with the physical runtime address of _stext
+ */
+	bl	0f				/* Get our runtime address */
+0:	mflr	r21				/* Make it accessible */
+	addis	r21,r21,(_stext - 0b)@ha
+	addi	r21,r21,(_stext - 0b)@l 	/* Get our current runtime base */
+
+	/*
+	 * We have the runtime (virutal) address of our base.
+	 * We calculate our shift of offset from a 256M page.
+	 * We could map the 256M page we belong to at PAGE_OFFSET and
+	 * get going from there.
+	 */
+	lis	r4,KERNELBASE@h
+	ori	r4,r4,KERNELBASE@l
+	rlwinm	r6,r21,0,4,31			/* r6 = PHYS_START % 256M */
+	rlwinm	r5,r4,0,4,31			/* r5 = KERNELBASE % 256M */
+	subf	r3,r5,r6			/* r3 = r6 - r5 */
+	add	r3,r4,r3			/* Required Virutal Address */
+
+	bl	relocate
+#endif
+
 	bl	init_cpu_state
 
 	/*
@@ -86,7 +115,64 @@ _ENTRY(_start);
 
 	bl	early_init
 
-#ifdef CONFIG_DYNAMIC_MEMSTART
+#ifdef CONFIG_RELOCATABLE
+	/*
+	 * Relocatable kernel support based on processing of dynamic
+	 * relocation entries.
+	 *
+	 * r25 will contain RPN/ERPN for the start address of memory
+	 * r21 will contain the current offset of _stext
+	 */
+	lis	r3,kernstart_addr@ha
+	la	r3,kernstart_addr@l(r3)
+
+	/*
+	 * Compute the kernstart_addr.
+	 * kernstart_addr => (r6,r8)
+	 * kernstart_addr & ~0xfffffff => (r6,r7)
+	 */
+	rlwinm	r6,r25,0,28,31	/* ERPN. Bits 32-35 of Address */
+	rlwinm	r7,r25,0,0,3	/* RPN - assuming 256 MB page size */
+	rlwinm	r8,r21,0,4,31	/* r8 = (_stext & 0xfffffff) */
+	or	r8,r7,r8	/* Compute the lower 32bit of kernstart_addr */
+
+	/* Store kernstart_addr */
+	stw	r6,0(r3)	/* higher 32bit */
+	stw	r8,4(r3)	/* lower 32bit  */
+
+	/*
+	 * Compute the virt_phys_offset :
+	 * virt_phys_offset = stext.run - kernstart_addr
+	 *
+	 * stext.run = (KERNELBASE & ~0xfffffff) + (kernstart_addr & 0xfffffff)
+	 * When we relocate, we have :
+	 *
+	 *	(kernstart_addr & 0xfffffff) = (stext.run & 0xfffffff)
+	 *
+	 * hence:
+	 *  virt_phys_offset = (KERNELBASE & ~0xfffffff) - (kernstart_addr & ~0xfffffff)
+	 *
+	 */
+
+	/* KERNELBASE&~0xfffffff => (r4,r5) */
+	li	r4, 0		/* higer 32bit */
+	lis	r5,KERNELBASE@h
+	rlwinm	r5,r5,0,0,3	/* Align to 256M, lower 32bit */
+
+	/*
+	 * 64bit subtraction.
+	 */
+	subfc	r5,r7,r5
+	subfe	r4,r6,r4
+
+	/* Store virt_phys_offset */
+	lis	r3,virt_phys_offset@ha
+	la	r3,virt_phys_offset@l(r3)
+
+	stw	r4,0(r3)
+	stw	r5,4(r3)
+
+#elif defined(CONFIG_DYNAMIC_MEMSTART)
 	/*
 	 * Mapping based, page aligned dynamic kernel loading.
 	 *
@@ -802,7 +888,12 @@ skpinv:	addi	r4,r4,1				/* Increment */
 /*
  * Configure and load pinned entry into TLB slot 63.
  */
-#ifdef CONFIG_DYNAMIC_MEMSTART
+#ifdef CONFIG_NONSTATIC_KERNEL
+	/*
+	 * In case of a NONSTATIC_KERNEL we reuse the TLB XLAT
+	 * entries of the initial mapping set by the boot loader.
+	 * The XLAT entry is stored in r25
+	 */
 
 	/* Read the XLAT entry for our current mapping */
 	tlbre	r25,r23,PPC44x_TLB_XLAT

^ permalink raw reply related

* [PATCH v5 6/7] [44x] Enable CRASH_DUMP for 440x
From: Suzuki K. Poulose @ 2011-12-15  8:59 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt
  Cc: Scott Wood, Josh Poimboeuf, linux ppc dev
In-Reply-To: <20111215085516.3404.56377.stgit@suzukikp.in.ibm.com>

Now that we have relocatable kernel, supporting CRASH_DUMP only requires
turning the switches on for UP machines.

We don't have kexec support on 47x yet. Enabling SMP support would be done
as part of enabling the PPC_47x support.


Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
---

 arch/powerpc/Kconfig |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8833df5..fe56229 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -363,8 +363,8 @@ config KEXEC
 
 config CRASH_DUMP
 	bool "Build a kdump crash kernel"
-	depends on PPC64 || 6xx || FSL_BOOKE
-	select RELOCATABLE if PPC64
+	depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP && !PPC_47x)
+	select RELOCATABLE if PPC64 || 44x
 	select DYNAMIC_MEMSTART if FSL_BOOKE
 	help
 	  Build a kernel suitable for use as a kdump capture kernel.

^ permalink raw reply related

* [PATCH v5 7/7] [boot] Change the load address for the wrapper to fit the kernel
From: Suzuki K. Poulose @ 2011-12-15  8:59 UTC (permalink / raw)
  To: Josh Boyer, Benjamin Herrenschmidt
  Cc: Scott Wood, Josh Poimboeuf, linux ppc dev
In-Reply-To: <20111215085516.3404.56377.stgit@suzukikp.in.ibm.com>

The wrapper code which uncompresses the kernel in case of a 'ppc' boot
is by default loaded at 0x00400000 and the kernel will be uncompressed
to fit the location 0-0x00400000. But with dynamic relocations, the size
of the kernel may exceed 0x00400000(4M). This would cause an overlap
of the uncompressed kernel and the boot wrapper, causing a failure in
boot.

The message looks like :


   zImage starting: loaded at 0x00400000 (sp: 0x0065ffb0)
   Allocating 0x5ce650 bytes for kernel ...
   Insufficient memory for kernel at address 0! (_start=00400000, uncompressed size=00591a20)

This patch shifts the load address of the boot wrapper code to the next higher MB,
according to the size of  the uncompressed vmlinux.

With the patch, we get the following message while building the image :

 WARN: Uncompressed kernel (size 0x5b0344) overlaps the address of the wrapper(0x400000)
 WARN: Fixing the link_address of wrapper to (0x600000)


Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
---

 arch/powerpc/boot/wrapper |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 14cd4bc..c8d6aaf 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -257,6 +257,8 @@ vmz="$tmpdir/`basename \"$kernel\"`.$ext"
 if [ -z "$cacheit" -o ! -f "$vmz$gzip" -o "$vmz$gzip" -ot "$kernel" ]; then
     ${CROSS}objcopy $objflags "$kernel" "$vmz.$$"
 
+    strip_size=$(stat -c %s $vmz.$$)
+
     if [ -n "$gzip" ]; then
         gzip -n -f -9 "$vmz.$$"
     fi
@@ -266,6 +268,24 @@ if [ -z "$cacheit" -o ! -f "$vmz$gzip" -o "$vmz$gzip" -ot "$kernel" ]; then
     else
 	vmz="$vmz.$$"
     fi
+else
+    # Calculate the vmlinux.strip size
+    ${CROSS}objcopy $objflags "$kernel" "$vmz.$$"
+    strip_size=$(stat -c %s $vmz.$$)
+    rm -f $vmz.$$
+fi
+
+# Round the size to next higher MB limit
+round_size=$(((strip_size + 0xfffff) & 0xfff00000))
+
+round_size=0x$(printf "%x" $round_size)
+link_addr=$(printf "%d" $link_address)
+
+if [ $link_addr -lt $strip_size ]; then
+    echo "WARN: Uncompressed kernel (size 0x$(printf "%x\n" $strip_size))" \
+		"overlaps the address of the wrapper($link_address)"
+    echo "WARN: Fixing the link_address of wrapper to ($round_size)"
+    link_address=$round_size
 fi
 
 vmz="$vmz$gzip"

^ permalink raw reply related

* Re: linux-next bad Kconfig for drivers/hid
From: Jiri Kosina @ 2011-12-15 10:08 UTC (permalink / raw)
  To: Tony Breeds; +Cc: Jeremy Fitzhardinge, LinuxPPC-dev, Linux Kernel ML
In-Reply-To: <20111212003140.GA16353@thor.bakeyournoodle.com>

On Mon, 12 Dec 2011, Tony Breeds wrote:

> On Mon, Dec 12, 2011 at 12:21:16AM +0100, Jiri Kosina wrote:
> > On Thu, 8 Dec 2011, Jeremy Fitzhardinge wrote:
> > > 
> > > Hm.  How about making it "depends on HID && POWER_SUPPLY"?  I think that
> > > would needlessly disable it if HID is also modular, but I'm not sure how
> > > to fix that.  "depends on HID && POWER_SUPPLY && HID == POWER_SUPPLY"?
> 
> That would work, but I think technically I think you could end up with
> HID=m and POWER_SUPPLY=m which would still allow HID_BATTERY_STRENGTH=y
> which is the same problem.
> 
> I don't know what kind of .config contortions you'd need to do to get
> there.
>  
> > How about making it 'default POWER_SUPPLY' instead?
> 
> By itself that wont help as POWER_SUPPLY=m statisfies.
> 
> So it looks like we have Jeremy's:
> 	HID && POWER_SUPPLY && HID == POWER_SUPPLY

Tony,

have you actually tested this one to work in the configuration you have 
been seeing it to fail?

I don't seem to be able to find any use of '==' in other Kconfig files 
(and never used it myself), so I'd like to have confirmation that it 
actually works and fixes the problem before I apply it :)

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply

* [PATCH v2 1/3] powerpc/kprobe: introduce a new thread flag
From: Tiejun Chen @ 2011-12-15 11:00 UTC (permalink / raw)
  To: benh, linuxppc-dev
In-Reply-To: <1323946810-4868-1-git-send-email-tiejun.chen@windriver.com>

We need to add a new thread flag, TIF_EMULATE_STACK_STORE,
for emulating stack store operation while exiting exception.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
---
 arch/powerpc/include/asm/thread_info.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 836f231..f414fee 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -112,6 +112,8 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_FREEZE		14	/* Freezing for suspend */
 #define TIF_SYSCALL_TRACEPOINT	15	/* syscall tracepoint instrumentation */
 #define TIF_RUNLATCH		16	/* Is the runlatch enabled? */
+#define TIF_EMULATE_STACK_STORE	17	/* Is an instruction emulation
+					   for stack store? */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
@@ -130,6 +132,7 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_FREEZE		(1<<TIF_FREEZE)
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_RUNLATCH		(1<<TIF_RUNLATCH)
+#define _TIF_EMULATE_STACK_STORE	(1<<TIF_EMULATE_STACK_STORE)
 #define _TIF_SYSCALL_T_OR_A	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
 
-- 
1.5.6

^ permalink raw reply related

* [PATCH v2 0/3] ppc32/kprobe: Fix a bug for kprobe stwu r1
From: Tiejun Chen @ 2011-12-15 11:00 UTC (permalink / raw)
  To: benh, linuxppc-dev

Changes from V1:

* use memcpy simply to withdraw copy_exc_stack
* add !(regs->msr & MSR_PR)) and
	WARN_ON(test_thread_flag(TIF_EMULATE_STACK_STORE));
  to make sure we're in goot path.
* move this migration process inside 'restore'
* clear TIF flag atomically 

Tiejun Chen (3):
      powerpc/kprobe: introduce a new thread flag
      ppc32/kprobe: complete kprobe and migrate exception frame
      ppc32/kprobe: don't emulate store when kprobe stwu r1

 arch/powerpc/include/asm/thread_info.h |    3 ++
 arch/powerpc/kernel/entry_32.S         |   35 ++++++++++++++++++++++++++++++++
 arch/powerpc/lib/sstep.c               |   25 +++++++++++++++++++++-
 3 files changed, 61 insertions(+), 2 deletions(-)

Tiejun

^ permalink raw reply

* [PATCH v2 2/3] ppc32/kprobe: complete kprobe and migrate exception frame
From: Tiejun Chen @ 2011-12-15 11:00 UTC (permalink / raw)
  To: benh, linuxppc-dev
In-Reply-To: <1323946810-4868-1-git-send-email-tiejun.chen@windriver.com>

We can't emulate stwu since that may corrupt current exception stack.
So we will have to do real store operation in the exception return code.

Firstly we'll allocate a trampoline exception frame below the kprobed
function stack and copy the current exception frame to the trampoline.
Then we can do this real store operation to implement 'stwu', and reroute
the trampoline frame to r1 to complete this exception migration.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
---
 arch/powerpc/kernel/entry_32.S |   35 +++++++++++++++++++++++++++++++++++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 56212bc..0cdd27d 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -850,6 +850,41 @@ resume_kernel:
 
 	/* interrupts are hard-disabled at this point */
 restore:
+	lwz	r3,_MSR(r1)	/* Returning to user mode? */
+	andi.	r0,r3,MSR_PR
+	bne	1f	
+	/* check current_thread_info, _TIF_EMULATE_STACK_STORE */
+	rlwinm	r9,r1,0,0,(31-THREAD_SHIFT)
+	lwz	r0,TI_FLAGS(r9)
+	andis.	r0,r0,_TIF_EMULATE_STACK_STORE@h
+	beq+	1f	
+
+	addi	r9,r1,INT_FRAME_SIZE	/* Get the kprobed function entry */
+
+	lwz	r3,GPR1(r1)
+	subi	r3,r3,INT_FRAME_SIZE	/* dst: Allocate a trampoline exception frame */
+	mr	r4,r1			/* src:  current exception frame */
+	li	r5,INT_FRAME_SIZE	/* size: INT_FRAME_SIZE */
+	mr	r1,r3			/* Reroute the trampoline frame to r1 */
+	bl	memcpy			/* Copy from the original to the trampoline */
+
+	/* Do real store operation to complete stwu */
+	lwz	r5,GPR1(r1)
+	stw	r9,0(r5)
+
+	/* Clear _TIF_EMULATE_STACK_STORE flag */
+	rlwinm	r9,r1,0,0,(31-THREAD_SHIFT)
+	lis	r11,_TIF_EMULATE_STACK_STORE@h
+	addi	r9,r9,TI_FLAGS
+0:	lwarx	r8,0,r9
+	andc	r8,r8,r11
+#ifdef CONFIG_IBM405_ERR77
+	dcbt	0,r9
+#endif
+	stwcx.	r8,0,r9
+	bne-	0b
+1:
+
 #ifdef CONFIG_44x
 BEGIN_MMU_FTR_SECTION
 	b	1f
-- 
1.5.6

^ permalink raw reply related

* [PATCH v2 3/3] ppc32/kprobe: don't emulate store when kprobe stwu r1
From: Tiejun Chen @ 2011-12-15 11:00 UTC (permalink / raw)
  To: benh, linuxppc-dev
In-Reply-To: <1323946810-4868-1-git-send-email-tiejun.chen@windriver.com>

We don't do the real store operation for kprobing 'stwu Rx,(y)R1'
since this may corrupt the exception frame, now we will do this
operation safely in exception return code after migrate current
exception frame below the kprobed function stack.

So we only update gpr[1] here and trigger a thread flag to mask
this.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
---
 arch/powerpc/lib/sstep.c |   25 +++++++++++++++++++++++--
 1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 9a52349..5145d10 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -566,7 +566,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr)
 	unsigned long int ea;
 	unsigned int cr, mb, me, sh;
 	int err;
-	unsigned long old_ra;
+	unsigned long old_ra, val3;
 	long ival;
 
 	opcode = instr >> 26;
@@ -1486,10 +1486,31 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr)
 		goto ldst_done;
 
 	case 36:	/* stw */
-	case 37:	/* stwu */
 		val = regs->gpr[rd];
 		err = write_mem(val, dform_ea(instr, regs), 4, regs);
 		goto ldst_done;
+	case 37:	/* stwu */
+		val = regs->gpr[rd];
+		val3 = dform_ea(instr, regs);
+		/*
+		 * For PPC32 we always use stwu to change stack point with r1. So
+		 * this emulated store may corrupt the exception frame, now we
+		 * have to provide the exception frame trampoline, which is pushed
+		 * below the kprobed function stack. So we only update gpr[1] but
+		 * don't emulate the real store operation. We will do real store
+		 * operation safely in exception return code by checking this flag.
+		 */
+		if ((ra == 1) && !(regs->msr & MSR_PR)) {
+			/*
+			 * Check if we already set since that means we'll
+			 * lose the previous value.
+			 */
+			WARN_ON(test_thread_flag(TIF_EMULATE_STACK_STORE));
+			set_thread_flag(TIF_EMULATE_STACK_STORE);
+			err = 0;
+		} else
+			err = write_mem(val, val3, 4, regs);
+		goto ldst_done;
 
 	case 38:	/* stb */
 	case 39:	/* stbu */
-- 
1.5.6

^ permalink raw reply related

* Re: [PATCH 3/4] ppc32/kprobe: complete kprobe and migrate exception frame
From: tiejun.chen @ 2011-12-15 11:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1323909460.21839.42.camel@pasglop>

Looks we have to go into 'restore' at last as I said previously. I send v2 based
on your all comments.

>> I assume it may not necessary to reorganize ret_from_except for *ppc32* .
> 
> It might be cleaner but I can do that myself later.
> 

I have this version but I'm not 100% sure if its as you expect :)

#define _TIF_WORK_MASK (_TIF_USER_WORK_MASK | _TIF_EMULATE_STACK_STORE)

======
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 56212bc..e52b586 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -791,41 +791,29 @@ ret_from_except:
        SYNC                    /* Some chip revs have problems here... */
        MTMSRD(r10)             /* disable interrupts */

-       lwz     r3,_MSR(r1)     /* Returning to user mode? */
-       andi.   r0,r3,MSR_PR
-       beq     resume_kernel
-
 user_exc_return:               /* r10 contains MSR_KERNEL here */
        /* Check current_thread_info()->flags */
        rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
        lwz     r9,TI_FLAGS(r9)
-       andi.   r0,r9,_TIF_USER_WORK_MASK
-       bne     do_work
+       andi.   r0,r9,_TIF_WORK_MASK
+       beq     restore

-restore_user:
-#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
-       /* Check whether this process has its own DBCR0 value.  The internal
-          debug mode bit tells us that dbcr0 should be loaded. */
-       lwz     r0,THREAD+THREAD_DBCR0(r2)
-       andis.  r10,r0,DBCR0_IDM@h
-       bnel-   load_dbcr0
-#endif
+       lwz     r3,_MSR(r1)     /* Returning to user mode? */
+       andi.   r0,r3,MSR_PR
+       bne     do_user_work

 #ifdef CONFIG_PREEMPT
-       b       restore
-
 /* N.B. the only way to get here is from the beq following ret_from_except. */
-resume_kernel:
        /* check current_thread_info->preempt_count */
        rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
        lwz     r0,TI_PREEMPT(r9)
        cmpwi   0,r0,0          /* if non-zero, just restore regs and return */
-       bne     restore
+       bne     2f
        lwz     r0,TI_FLAGS(r9)
        andi.   r0,r0,_TIF_NEED_RESCHED
-       beq+    restore
+       beq+    2f
        andi.   r0,r3,MSR_EE    /* interrupts off? */
-       beq     restore         /* don't schedule if so */
+       beq     2f      /* don't schedule if so */
 #ifdef CONFIG_TRACE_IRQFLAGS
        /* Lockdep thinks irqs are enabled, we need to call
         * preempt_schedule_irq with IRQs off, so we inform lockdep
@@ -844,12 +832,54 @@ resume_kernel:
         */
        bl      trace_hardirqs_on
 #endif
-#else
-resume_kernel:
+2:
 #endif /* CONFIG_PREEMPT */

+       /* check current_thread_info, _TIF_EMULATE_STACK_STORE */
+       rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
+       lwz     r0,TI_FLAGS(r9)
+       andis.  r0,r0,_TIF_EMULATE_STACK_STORE@h
+       beq+    restore
+
+       addi    r9,r1,INT_FRAME_SIZE    /* Get the kprobed function entry */
+
+       lwz     r3,GPR1(r1)
+       subi    r3,r3,INT_FRAME_SIZE    /* dst: Allocate a trampoline exception
frame */
+       mr      r4,r1                   /* src:  current exception frame */
+       li      r5,INT_FRAME_SIZE       /* size: INT_FRAME_SIZE */
+       mr      r1,r3                   /* Reroute the trampoline frame to r1 */
+       bl      memcpy                  /* Copy from the original to the
trampoline */
+
+       /* Do real store operation to complete stwu */
+       lwz     r5,GPR1(r1)
+       stw     r9,0(r5)
+
+       /* Do real store operation to complete stwu */
+       lwz     r5,GPR1(r1)
+       stw     r9,0(r5)
+
+       /* Clear _TIF_EMULATE_STACK_STORE flag */
+       rlwinm  r9,r1,0,0,(31-THREAD_SHIFT)
+       lis     r11,_TIF_EMULATE_STACK_STORE@h
+       addi    r9,r9,TI_FLAGS
+0:     lwarx   r8,0,r9
+       andc    r8,r8,r11
+#ifdef CONFIG_IBM405_ERR77
+       dcbt    0,r9
+#endif
+       stwcx.  r8,0,r9
+       bne-    0b
+
        /* interrupts are hard-disabled at this point */
 restore:
+#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
+       lwz     r3,_MSR(r1)     /* Returning to user mode? */
+       andi.   r0,r3,MSR_PR
+       beq     1f
+       /* Check whether this process has its own DBCR0 value.  The internal
+          debug mode bit tells us that dbcr0 should be loaded. */
+       lwz     r0,THREAD+THREAD_DBCR0(r2)
+       andis.  r10,r0,DBCR0_IDM@h
+       bnel-   load_dbcr0
+1:
+#endif
+
 #ifdef CONFIG_44x
 BEGIN_MMU_FTR_SECTION
        b       1f
@@ -1159,7 +1189,7 @@ global_dbcr0:
        .previous
 #endif /* !(CONFIG_4xx || CONFIG_BOOKE) */

-do_work:                       /* r10 contains MSR_KERNEL here */
+do_user_work:                  /* r10 contains MSR_KERNEL here */
        andi.   r0,r9,_TIF_NEED_RESCHED
        beq     do_user_signal

@@ -1184,7 +1214,7 @@ recheck:
        andi.   r0,r9,_TIF_NEED_RESCHED
        bne-    do_resched
        andi.   r0,r9,_TIF_USER_WORK_MASK
-       beq     restore_user
+       beq     restore
 do_user_signal:                        /* r10 contains MSR_KERNEL here */
        ori     r10,r10,MSR_EE
        SYNC

Tiejun

Thanks
Tiejun

^ permalink raw reply related

* Re: linux-next bad Kconfig for drivers/hid
From: Tony Breeds @ 2011-12-15 11:44 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Jeremy Fitzhardinge, LinuxPPC-dev, Linux Kernel ML
In-Reply-To: <alpine.LNX.2.00.1112151107020.21970@pobox.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 499 bytes --]

On Thu, Dec 15, 2011 at 11:08:08AM +0100, Jiri Kosina wrote:
 
> Tony,
> 
> have you actually tested this one to work in the configuration you have 
> been seeing it to fail?
> 
> I don't seem to be able to find any use of '==' in other Kconfig files 
> (and never used it myself), so I'd like to have confirmation that it 
> actually works and fixes the problem before I apply it :)

Sorry I used '=' instead of '==' and that worked.  I figured the '=='
was just a typo.

Yours Tony

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: Linux port availability for p5010 processor
From: Vineeth @ 2011-12-15 11:45 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <4EE7A776.70609@freescale.com>

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

found p5020_ds.c in platforms/85xx;
why is it a part of 85xx directory ? the core of P5020 is E5500 where the
core of 85xx is e500;

Do we have the processor initialization code (start.S, head.S) files, port
available with linux ?





On Wed, Dec 14, 2011 at 12:58 AM, Scott Wood <scottwood@freescale.com>wrote:

> On 12/12/2011 11:33 PM, Vineeth wrote:
> > Do we have a linux port available for freescale P5010 processor (with
> > single E5500 core) ?
> > /(found arch/powerpc/platforms/pseries ; and a some details on
> > kernel/cputable.c /)
>
> p5010 is basically a p5020 with one core and "memory complex" instead of
> two.  The same Linux code should work for both.
>
> pseries is something completely different.
>
> > Is there any reference board which uses this processor ?
>
> p5020ds has a p5020.
>
> Linux support is in arch/powerpc/platforms/85xx/p5020_ds.c (and
> scattered in other places).
>
> > any reference in DTS file also will be helpful.
>
> arch/powerpc/boot/dts/p5020ds.dts
>
> -Scott
>
>

[-- Attachment #2: Type: text/html, Size: 1508 bytes --]

^ permalink raw reply

* [PATCH 0/5] Make use of hardware reference and change bits in HPT
From: Paul Mackerras @ 2011-12-15 12:00 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc

This series of patches builds on top of my previous series and
modifies the Book3S HV memory management code to use the hardware
reference and change bits in the guest hashed page table.  This makes
kvm_age_hva() more efficient, lets us implement the dirty page
tracking properly (which in turn means that things like VGA emulation
in qemu can work), and also means that we can supply hardware
reference and change information to the guest -- not that Linux guests
currently use that information, but possibly they will want it in
future, and there is an interface defined in PAPR for it.

Paul.

^ permalink raw reply

* [PATCH 1/5] KVM: PPC: Book3S HV: Keep HPTE locked when invalidating
From: Paul Mackerras @ 2011-12-15 12:01 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <20111215120018.GA20629@bloggs.ozlabs.ibm.com>

This reworks the implementations of the H_REMOVE and H_BULK_REMOVE
hcalls to make sure that we keep the HPTE locked and in the reverse-
mapping chain until we have finished invalidating it.  Previously
we would remove it from the chain and unlock it before invalidating
it, leaving a tiny window when the guest could access the page even
though we believe we have removed it from the guest (e.g.,
kvm_unmap_hva() has been called for the page and has found no HPTEs
in the chain).  In addition, we'll need this for future patches where
we will need to read the R and C bits in the HPTE after invalidating
it.

Doing this required restructuring kvmppc_h_bulk_remove() substantially.
Since we want to batch up the tlbies, we now need to keep several
HPTEs locked simultaneously.  In order to avoid possible deadlocks,
we don't spin on the HPTE bitlock for any except the first HPTE in
a batch.  If we can't acquire the HPTE bitlock for the second or
subsequent HPTE, we terminate the batch at that point, do the tlbies
that we have accumulated so far, unlock those HPTEs, and then start
a new batch to do the remaining invalidations.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  212 ++++++++++++++++++++--------------
 1 files changed, 125 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 7c0fc99..823348d 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -140,6 +140,12 @@ static pte_t lookup_linux_pte(struct kvm_vcpu *vcpu, unsigned long hva,
 	return kvmppc_read_update_linux_pte(ptep, writing);
 }
 
+static inline void unlock_hpte(unsigned long *hpte, unsigned long hpte_v)
+{
+	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+	hpte[0] = hpte_v;
+}
+
 long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 		    long pte_index, unsigned long pteh, unsigned long ptel)
 {
@@ -356,6 +362,7 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
 	struct kvm *kvm = vcpu->kvm;
 	unsigned long *hpte;
 	unsigned long v, r, rb;
+	struct revmap_entry *rev;
 
 	if (pte_index >= HPT_NPTE)
 		return H_PARAMETER;
@@ -368,30 +375,32 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
 		hpte[0] &= ~HPTE_V_HVLOCK;
 		return H_NOT_FOUND;
 	}
-	if (atomic_read(&kvm->online_vcpus) == 1)
-		flags |= H_LOCAL;
-	vcpu->arch.gpr[4] = v = hpte[0] & ~HPTE_V_HVLOCK;
-	vcpu->arch.gpr[5] = r = hpte[1];
-	rb = compute_tlbie_rb(v, r, pte_index);
-	if (v & HPTE_V_VALID)
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	v = hpte[0] & ~HPTE_V_HVLOCK;
+	if (v & HPTE_V_VALID) {
+		hpte[0] &= ~HPTE_V_VALID;
+		rb = compute_tlbie_rb(v, hpte[1], pte_index);
+		if (!(flags & H_LOCAL) && atomic_read(&kvm->online_vcpus) > 1) {
+			while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
+				cpu_relax();
+			asm volatile("ptesync" : : : "memory");
+			asm volatile(PPC_TLBIE(%1,%0)"; eieio; tlbsync"
+				     : : "r" (rb), "r" (kvm->arch.lpid));
+			asm volatile("ptesync" : : : "memory");
+			kvm->arch.tlbie_lock = 0;
+		} else {
+			asm volatile("ptesync" : : : "memory");
+			asm volatile("tlbiel %0" : : "r" (rb));
+			asm volatile("ptesync" : : : "memory");
+		}
 		remove_revmap_chain(kvm, pte_index, v);
-	smp_wmb();
-	hpte[0] = 0;
-	if (!(v & HPTE_V_VALID))
-		return H_SUCCESS;
-	if (!(flags & H_LOCAL)) {
-		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
-			cpu_relax();
-		asm volatile("ptesync" : : : "memory");
-		asm volatile(PPC_TLBIE(%1,%0)"; eieio; tlbsync"
-			     : : "r" (rb), "r" (kvm->arch.lpid));
-		asm volatile("ptesync" : : : "memory");
-		kvm->arch.tlbie_lock = 0;
-	} else {
-		asm volatile("ptesync" : : : "memory");
-		asm volatile("tlbiel %0" : : "r" (rb));
-		asm volatile("ptesync" : : : "memory");
 	}
+	r = rev->guest_rpte;
+	unlock_hpte(hpte, 0);
+
+	vcpu->arch.gpr[4] = v;
+	vcpu->arch.gpr[5] = r;
 	return H_SUCCESS;
 }
 
@@ -399,82 +408,113 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 {
 	struct kvm *kvm = vcpu->kvm;
 	unsigned long *args = &vcpu->arch.gpr[4];
-	unsigned long *hp, tlbrb[4];
-	long int i, found;
-	long int n_inval = 0;
-	unsigned long flags, req, pte_index;
+	unsigned long *hp, *hptes[4], tlbrb[4];
+	long int i, j, k, n, found, indexes[4];
+	unsigned long flags, req, pte_index, rcbits;
 	long int local = 0;
 	long int ret = H_SUCCESS;
+	struct revmap_entry *rev, *revs[4];
 
 	if (atomic_read(&kvm->online_vcpus) == 1)
 		local = 1;
-	for (i = 0; i < 4; ++i) {
-		pte_index = args[i * 2];
-		flags = pte_index >> 56;
-		pte_index &= ((1ul << 56) - 1);
-		req = flags >> 6;
-		flags &= 3;
-		if (req == 3)
-			break;
-		if (req != 1 || flags == 3 ||
-		    pte_index >= HPT_NPTE) {
-			/* parameter error */
-			args[i * 2] = ((0xa0 | flags) << 56) + pte_index;
-			ret = H_PARAMETER;
-			break;
-		}
-		hp = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
-		while (!try_lock_hpte(hp, HPTE_V_HVLOCK))
-			cpu_relax();
-		found = 0;
-		if (hp[0] & (HPTE_V_ABSENT | HPTE_V_VALID)) {
-			switch (flags & 3) {
-			case 0:		/* absolute */
-				found = 1;
+	for (i = 0; i < 4 && ret == H_SUCCESS; ) {
+		n = 0;
+		for (; i < 4; ++i) {
+			j = i * 2;
+			pte_index = args[j];
+			flags = pte_index >> 56;
+			pte_index &= ((1ul << 56) - 1);
+			req = flags >> 6;
+			flags &= 3;
+			if (req == 3) {		/* no more requests */
+				i = 4;
 				break;
-			case 1:		/* andcond */
-				if (!(hp[0] & args[i * 2 + 1]))
-					found = 1;
+			}
+			if (req != 1 || flags == 3 || pte_index >= HPT_NPTE) {
+				/* parameter error */
+				args[j] = ((0xa0 | flags) << 56) + pte_index;
+				ret = H_PARAMETER;
 				break;
-			case 2:		/* AVPN */
-				if ((hp[0] & ~0x7fUL) == args[i * 2 + 1])
+			}
+			hp = (unsigned long *)
+				(kvm->arch.hpt_virt + (pte_index << 4));
+			/* to avoid deadlock, don't spin except for first */
+			if (!try_lock_hpte(hp, HPTE_V_HVLOCK)) {
+				if (n)
+					break;
+				while (!try_lock_hpte(hp, HPTE_V_HVLOCK))
+					cpu_relax();
+			}
+			found = 0;
+			if (hp[0] & (HPTE_V_ABSENT | HPTE_V_VALID)) {
+				switch (flags & 3) {
+				case 0:		/* absolute */
 					found = 1;
-				break;
+					break;
+				case 1:		/* andcond */
+					if (!(hp[0] & args[j + 1]))
+						found = 1;
+					break;
+				case 2:		/* AVPN */
+					if ((hp[0] & ~0x7fUL) == args[j + 1])
+						found = 1;
+					break;
+				}
+			}
+			if (!found) {
+				hp[0] &= ~HPTE_V_HVLOCK;
+				args[j] = ((0x90 | flags) << 56) + pte_index;
+				continue;
 			}
+
+			args[j] = ((0x80 | flags) << 56) + pte_index;
+			rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+			/* insert R and C bits from guest PTE */
+			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
+			args[j] |= rcbits << (56 - 5);
+
+			if (!(hp[0] & HPTE_V_VALID))
+				continue;
+
+			hp[0] &= ~HPTE_V_VALID;		/* leave it locked */
+			tlbrb[n] = compute_tlbie_rb(hp[0], hp[1], pte_index);
+			indexes[n] = j;
+			hptes[n] = hp;
+			revs[n] = rev;
+			++n;
 		}
-		if (!found) {
-			hp[0] &= ~HPTE_V_HVLOCK;
-			args[i * 2] = ((0x90 | flags) << 56) + pte_index;
-			continue;
+
+		if (!n)
+			break;
+
+		/* Now that we've collected a batch, do the tlbies */
+		if (!local) {
+			while(!try_lock_tlbie(&kvm->arch.tlbie_lock))
+				cpu_relax();
+			asm volatile("ptesync" : : : "memory");
+			for (k = 0; k < n; ++k)
+				asm volatile(PPC_TLBIE(%1,%0) : :
+					     "r" (tlbrb[k]),
+					     "r" (kvm->arch.lpid));
+			asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+			kvm->arch.tlbie_lock = 0;
+		} else {
+			asm volatile("ptesync" : : : "memory");
+			for (k = 0; k < n; ++k)
+				asm volatile("tlbiel %0" : : "r" (tlbrb[k]));
+			asm volatile("ptesync" : : : "memory");
 		}
-		/* insert R and C bits from PTE */
-		flags |= (hp[1] >> 5) & 0x0c;
-		args[i * 2] = ((0x80 | flags) << 56) + pte_index;
-		if (hp[0] & HPTE_V_VALID) {
-			tlbrb[n_inval++] = compute_tlbie_rb(hp[0], hp[1], pte_index);
+
+		for (k = 0; k < n; ++k) {
+			j = indexes[k];
+			pte_index = args[j] & ((1ul << 56) - 1);
+			hp = hptes[k];
+			rev = revs[k];
 			remove_revmap_chain(kvm, pte_index, hp[0]);
+			unlock_hpte(hp, 0);
 		}
-		smp_wmb();
-		hp[0] = 0;
-	}
-	if (n_inval == 0)
-		return ret;
-
-	if (!local) {
-		while(!try_lock_tlbie(&kvm->arch.tlbie_lock))
-			cpu_relax();
-		asm volatile("ptesync" : : : "memory");
-		for (i = 0; i < n_inval; ++i)
-			asm volatile(PPC_TLBIE(%1,%0)
-				     : : "r" (tlbrb[i]), "r" (kvm->arch.lpid));
-		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
-		kvm->arch.tlbie_lock = 0;
-	} else {
-		asm volatile("ptesync" : : : "memory");
-		for (i = 0; i < n_inval; ++i)
-			asm volatile("tlbiel %0" : : "r" (tlbrb[i]));
-		asm volatile("ptesync" : : : "memory");
 	}
+
 	return ret;
 }
 
@@ -720,9 +760,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
 	gr = rev->guest_rpte;
 
-	/* Unlock the HPTE */
-	asm volatile("lwsync" : : : "memory");
-	hpte[0] = v;
+	unlock_hpte(hpte, v);
 
 	/* For not found, if the HPTE is valid by now, retry the instruction */
 	if ((status & DSISR_NOHPTE) && (v & HPTE_V_VALID))
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 2/5] KVM: PPC: Book3s HV: Maintain separate guest and host views of R and C bits
From: Paul Mackerras @ 2011-12-15 12:02 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <20111215120018.GA20629@bloggs.ozlabs.ibm.com>

This allows both the guest and the host to use the referenced (R) and
changed (C) bits in the guest hashed page table.  The guest has a view
of R and C that is maintained in the guest_rpte field of the revmap
entry for the HPTE, and the host has a view that is maintained in the
rmap entry for the associated gfn.

Both view are updated from the guest HPT.  If a bit (R or C) is zero
in either view, it will be initially set to zero in the HPTE (or HPTEs),
until set to 1 by hardware.  When an HPTE is removed for any reason,
the R and C bits from the HPTE are ORed into both views.  We have to
be careful to read the R and C bits from the HPTE after invalidating
it, but before unlocking it, in case of any late updates by the hardware.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |    5 ++-
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   48 +++++++++++++++++++++-------------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |   45 +++++++++++++++++++--------------
 3 files changed, 59 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 968f3aa..1cb6e52 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -200,8 +200,9 @@ struct revmap_entry {
  * index in the guest HPT of a HPTE that points to the page.
  */
 #define KVMPPC_RMAP_LOCK_BIT	63
-#define KVMPPC_RMAP_REF_BIT	33
-#define KVMPPC_RMAP_REFERENCED	(1ul << KVMPPC_RMAP_REF_BIT)
+#define KVMPPC_RMAP_RC_SHIFT	32
+#define KVMPPC_RMAP_REFERENCED	(HPTE_R_R << KVMPPC_RMAP_RC_SHIFT)
+#define KVMPPC_RMAP_CHANGED	(HPTE_R_C << KVMPPC_RMAP_RC_SHIFT)
 #define KVMPPC_RMAP_PRESENT	0x100000000ul
 #define KVMPPC_RMAP_INDEX	0xfffffffful
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 66d6452..aa51dde 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -505,6 +505,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	unsigned long is_io;
 	unsigned int writing, write_ok;
 	struct vm_area_struct *vma;
+	unsigned long rcbits;
 
 	/*
 	 * Real-mode code has already searched the HPT and found the
@@ -640,11 +641,17 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		goto out_unlock;
 	}
 
+	/* Only set R/C in real HPTE if set in both *rmap and guest_rpte */
+	rcbits = *rmap >> KVMPPC_RMAP_RC_SHIFT;
+	r &= rcbits | ~(HPTE_R_R | HPTE_R_C);
+
 	if (hptep[0] & HPTE_V_VALID) {
 		/* HPTE was previously valid, so we need to invalidate it */
 		unlock_rmap(rmap);
 		hptep[0] |= HPTE_V_ABSENT;
 		kvmppc_invalidate_hpte(kvm, hptep, index);
+		/* don't lose previous R and C bits */
+		r |= hptep[1] & (HPTE_R_R | HPTE_R_C);
 	} else {
 		kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
 	}
@@ -701,50 +708,55 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	struct revmap_entry *rev = kvm->arch.revmap;
 	unsigned long h, i, j;
 	unsigned long *hptep;
-	unsigned long ptel, psize;
+	unsigned long ptel, psize, rcbits;
 
 	for (;;) {
-		while (test_and_set_bit_lock(KVMPPC_RMAP_LOCK_BIT, rmapp))
-			cpu_relax();
+		lock_rmap(rmapp);
 		if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
-			__clear_bit_unlock(KVMPPC_RMAP_LOCK_BIT, rmapp);
+			unlock_rmap(rmapp);
 			break;
 		}
 
 		/*
 		 * To avoid an ABBA deadlock with the HPTE lock bit,
-		 * we have to unlock the rmap chain before locking the HPTE.
-		 * Thus we remove the first entry, unlock the rmap chain,
-		 * lock the HPTE and then check that it is for the
-		 * page we're unmapping before changing it to non-present.
+		 * we can't spin on the HPTE lock while holding the
+		 * rmap chain lock.
 		 */
 		i = *rmapp & KVMPPC_RMAP_INDEX;
+		hptep = (unsigned long *) (kvm->arch.hpt_virt + (i << 4));
+		if (!try_lock_hpte(hptep, HPTE_V_HVLOCK)) {
+			/* unlock rmap before spinning on the HPTE lock */
+			unlock_rmap(rmapp);
+			while (hptep[0] & HPTE_V_HVLOCK)
+				cpu_relax();
+			continue;
+		}
 		j = rev[i].forw;
 		if (j == i) {
 			/* chain is now empty */
-			j = 0;
+			*rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX);
 		} else {
 			/* remove i from chain */
 			h = rev[i].back;
 			rev[h].forw = j;
 			rev[j].back = h;
 			rev[i].forw = rev[i].back = i;
-			j |= KVMPPC_RMAP_PRESENT;
+			*rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j;
 		}
-		smp_wmb();
-		*rmapp = j | (1ul << KVMPPC_RMAP_REF_BIT);
 
-		/* Now lock, check and modify the HPTE */
-		hptep = (unsigned long *) (kvm->arch.hpt_virt + (i << 4));
-		while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
-			cpu_relax();
+		/* Now check and modify the HPTE */
 		ptel = rev[i].guest_rpte;
 		psize = hpte_page_size(hptep[0], ptel);
 		if ((hptep[0] & HPTE_V_VALID) &&
 		    hpte_rpn(ptel, psize) == gfn) {
-			kvmppc_invalidate_hpte(kvm, hptep, i);
 			hptep[0] |= HPTE_V_ABSENT;
+			kvmppc_invalidate_hpte(kvm, hptep, i);
+			/* Harvest R and C */
+			rcbits = hptep[1] & (HPTE_R_R | HPTE_R_C);
+			*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+			rev[i].guest_rpte = ptel | rcbits;
 		}
+		unlock_rmap(rmapp);
 		hptep[0] &= ~HPTE_V_HVLOCK;
 	}
 	return 0;
@@ -767,7 +779,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	kvm_unmap_rmapp(kvm, rmapp, gfn);
 	while (test_and_set_bit_lock(KVMPPC_RMAP_LOCK_BIT, rmapp))
 		cpu_relax();
-	__clear_bit(KVMPPC_RMAP_REF_BIT, rmapp);
+	*rmapp &= ~KVMPPC_RMAP_REFERENCED;
 	__clear_bit_unlock(KVMPPC_RMAP_LOCK_BIT, rmapp);
 	return 1;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 823348d..bcf6f92 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -87,15 +87,17 @@ EXPORT_SYMBOL_GPL(kvmppc_add_revmap_chain);
 
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
-				unsigned long hpte_v)
+				struct revmap_entry *rev,
+				unsigned long hpte_v, unsigned long hpte_r)
 {
-	struct revmap_entry *rev, *next, *prev;
+	struct revmap_entry *next, *prev;
 	unsigned long gfn, ptel, head;
 	struct kvm_memory_slot *memslot;
 	unsigned long *rmap;
+	unsigned long rcbits;
 
-	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
-	ptel = rev->guest_rpte;
+	rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
+	ptel = rev->guest_rpte |= rcbits;
 	gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
 	memslot = builtin_gfn_to_memslot(kvm, gfn);
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
@@ -116,6 +118,7 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 		else
 			*rmap = (*rmap & ~KVMPPC_RMAP_INDEX) | head;
 	}
+	*rmap |= rcbits << KVMPPC_RMAP_RC_SHIFT;
 	unlock_rmap(rmap);
 }
 
@@ -162,6 +165,7 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 	pte_t pte;
 	unsigned int writing;
 	unsigned long mmu_seq;
+	unsigned long rcbits;
 	bool realmode = vcpu->arch.vcore->vcore_state == VCORE_RUNNING;
 
 	psize = hpte_page_size(pteh, ptel);
@@ -320,6 +324,9 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 		} else {
 			kvmppc_add_revmap_chain(kvm, rev, rmap, pte_index,
 						realmode);
+			/* Only set R/C in real HPTE if already set in *rmap */
+			rcbits = *rmap >> KVMPPC_RMAP_RC_SHIFT;
+			ptel &= rcbits | ~(HPTE_R_R | HPTE_R_C);
 		}
 	}
 
@@ -394,7 +401,8 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
 			asm volatile("tlbiel %0" : : "r" (rb));
 			asm volatile("ptesync" : : : "memory");
 		}
-		remove_revmap_chain(kvm, pte_index, v);
+		/* Read PTE low word after tlbie to get final R/C values */
+		remove_revmap_chain(kvm, pte_index, rev, v, hpte[1]);
 	}
 	r = rev->guest_rpte;
 	unlock_hpte(hpte, 0);
@@ -469,12 +477,13 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 
 			args[j] = ((0x80 | flags) << 56) + pte_index;
 			rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
-			/* insert R and C bits from guest PTE */
-			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
-			args[j] |= rcbits << (56 - 5);
 
-			if (!(hp[0] & HPTE_V_VALID))
+			if (!(hp[0] & HPTE_V_VALID)) {
+				/* insert R and C bits from PTE */
+				rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
+				args[j] |= rcbits << (56 - 5);
 				continue;
+			}
 
 			hp[0] &= ~HPTE_V_VALID;		/* leave it locked */
 			tlbrb[n] = compute_tlbie_rb(hp[0], hp[1], pte_index);
@@ -505,13 +514,16 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			asm volatile("ptesync" : : : "memory");
 		}
 
+		/* Read PTE low words after tlbie to get final R/C values */
 		for (k = 0; k < n; ++k) {
 			j = indexes[k];
 			pte_index = args[j] & ((1ul << 56) - 1);
 			hp = hptes[k];
 			rev = revs[k];
-			remove_revmap_chain(kvm, pte_index, hp[0]);
-			unlock_hpte(hp, 0);
+			remove_revmap_chain(kvm, pte_index, rev, hp[0], hp[1]);
+			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
+			args[j] |= rcbits << (56 - 5);
+			hp[0] = 0;
 		}
 	}
 
@@ -595,8 +607,7 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 		pte_index &= ~3;
 		n = 4;
 	}
-	if (flags & H_R_XLATE)
-		rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	for (i = 0; i < n; ++i, ++pte_index) {
 		hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
 		v = hpte[0] & ~HPTE_V_HVLOCK;
@@ -605,12 +616,8 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 			v &= ~HPTE_V_ABSENT;
 			v |= HPTE_V_VALID;
 		}
-		if (v & HPTE_V_VALID) {
-			if (rev)
-				r = rev[i].guest_rpte;
-			else
-				r = hpte[1] | HPTE_R_RPN;
-		}
+		if (v & HPTE_V_VALID)
+			r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
 		vcpu->arch.gpr[4 + i * 2] = v;
 		vcpu->arch.gpr[5 + i * 2] = r;
 	}
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 3/5] KVM: PPC: Book3S HV: Use the hardware referenced bit for kvm_age_hva
From: Paul Mackerras @ 2011-12-15 12:02 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <20111215120018.GA20629@bloggs.ozlabs.ibm.com>

This uses the host view of the hardware R (referenced) bit to speed
up kvm_age_hva() and kvm_test_age_hva().  Instead of removing all
the relevant HPTEs in kvm_age_hva(), we now just reset their R bits
if set.  Also, kvm_test_age_hva() now scans the relevant HPTEs to
see if any of them have R set.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s.h |    2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   81 ++++++++++++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   19 ++++++++
 3 files changed, 91 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index ea9539c..6ececb4 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -149,6 +149,8 @@ extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
 			unsigned long *rmap, long pte_index, int realmode);
 extern void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep,
 			unsigned long pte_index);
+void kvmppc_clear_ref_hpte(struct kvm *kvm, unsigned long *hptep,
+			unsigned long pte_index);
 extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
 			unsigned long *nb_ret);
 extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index aa51dde..926e2b9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -772,16 +772,50 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			 unsigned long gfn)
 {
-	if (!kvm->arch.using_mmu_notifiers)
-		return 0;
-	if (!(*rmapp & KVMPPC_RMAP_REFERENCED))
-		return 0;
-	kvm_unmap_rmapp(kvm, rmapp, gfn);
-	while (test_and_set_bit_lock(KVMPPC_RMAP_LOCK_BIT, rmapp))
-		cpu_relax();
-	*rmapp &= ~KVMPPC_RMAP_REFERENCED;
-	__clear_bit_unlock(KVMPPC_RMAP_LOCK_BIT, rmapp);
-	return 1;
+	struct revmap_entry *rev = kvm->arch.revmap;
+	unsigned long head, i, j;
+	unsigned long *hptep;
+	int ret = 0;
+
+ retry:
+	lock_rmap(rmapp);
+	if (*rmapp & KVMPPC_RMAP_REFERENCED) {
+		*rmapp &= ~KVMPPC_RMAP_REFERENCED;
+		ret = 1;
+	}
+	if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
+		unlock_rmap(rmapp);
+		return ret;
+	}
+
+	i = head = *rmapp & KVMPPC_RMAP_INDEX;
+	do {
+		hptep = (unsigned long *) (kvm->arch.hpt_virt + (i << 4));
+		j = rev[i].forw;
+
+		/* If this HPTE isn't referenced, ignore it */
+		if (!(hptep[1] & HPTE_R_R))
+			continue;
+
+		if (!try_lock_hpte(hptep, HPTE_V_HVLOCK)) {
+			/* unlock rmap before spinning on the HPTE lock */
+			unlock_rmap(rmapp);
+			while (hptep[0] & HPTE_V_HVLOCK)
+				cpu_relax();
+			goto retry;
+		}
+
+		/* Now check and modify the HPTE */
+		if ((hptep[0] & HPTE_V_VALID) && (hptep[1] & HPTE_R_R)) {
+			kvmppc_clear_ref_hpte(kvm, hptep, i);
+			rev[i].guest_rpte |= HPTE_R_R;
+			ret = 1;
+		}
+		hptep[0] &= ~HPTE_V_HVLOCK;
+	} while ((i = j) != head);
+
+	unlock_rmap(rmapp);
+	return ret;
 }
 
 int kvm_age_hva(struct kvm *kvm, unsigned long hva)
@@ -794,7 +828,32 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva)
 static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			      unsigned long gfn)
 {
-	return !!(*rmapp & KVMPPC_RMAP_REFERENCED);
+	struct revmap_entry *rev = kvm->arch.revmap;
+	unsigned long head, i, j;
+	unsigned long *hp;
+	int ret = 1;
+
+	if (*rmapp & KVMPPC_RMAP_REFERENCED)
+		return 1;
+
+	lock_rmap(rmapp);
+	if (*rmapp & KVMPPC_RMAP_REFERENCED)
+		goto out;
+
+	if (*rmapp & KVMPPC_RMAP_PRESENT) {
+		i = head = *rmapp & KVMPPC_RMAP_INDEX;
+		do {
+			hp = (unsigned long *)(kvm->arch.hpt_virt + (i << 4));
+			j = rev[i].forw;
+			if (hp[1] & HPTE_R_R)
+				goto out;
+		} while ((i = j) != head);
+	}
+	ret = 0;
+
+ out:
+	unlock_rmap(rmapp);
+	return ret;
 }
 
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index bcf6f92..76864a8 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -641,6 +641,25 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep,
 }
 EXPORT_SYMBOL_GPL(kvmppc_invalidate_hpte);
 
+void kvmppc_clear_ref_hpte(struct kvm *kvm, unsigned long *hptep,
+			   unsigned long pte_index)
+{
+	unsigned long rb;
+	unsigned char rbyte;
+
+	rb = compute_tlbie_rb(hptep[0], hptep[1], pte_index);
+	rbyte = (hptep[1] & ~HPTE_R_R) >> 8;
+	/* modify only the second-last byte, which contains the ref bit */
+	*((char *)hptep + 14) = rbyte;
+	while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
+		cpu_relax();
+	asm volatile(PPC_TLBIE(%1,%0)"; eieio; tlbsync"
+		     : : "r" (rb), "r" (kvm->arch.lpid));
+	asm volatile("ptesync" : : : "memory");
+	kvm->arch.tlbie_lock = 0;
+}
+EXPORT_SYMBOL_GPL(kvmppc_clear_ref_hpte);
+
 static int slb_base_page_shift[4] = {
 	24,	/* 16M */
 	16,	/* 64k */
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 4/5] KVM: PPC: Book3s HV: Implement get_dirty_log using hardware changed bit
From: Paul Mackerras @ 2011-12-15 12:03 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <20111215120018.GA20629@bloggs.ozlabs.ibm.com>

This changes the implementation of kvm_vm_ioctl_get_dirty_log() for
Book3s HV guests to use the hardware C (changed) bits in the guest
hashed page table.  Since this makes the implementation quite different
from the Book3s PR case, this moves the existing implementation from
book3s.c to book3s_pr.c and creates a new implementation in book3s_hv.c.
That implementation calls kvmppc_hv_get_dirty_log() to do the actual
work by calling kvm_test_clear_dirty on each page.  It iterates over
the HPTEs, clearing the C bit if set, and returns 1 if any C bit was
set (including the saved C bit in the rmap entry).

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s.h |    2 +
 arch/powerpc/kvm/book3s.c             |   39 ------------------
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   69 +++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c          |   37 +++++++++++++++++
 arch/powerpc/kvm/book3s_pr.c          |   39 ++++++++++++++++++
 5 files changed, 147 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 6ececb4..aa795cc 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -158,6 +158,8 @@ extern long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 			long pte_index, unsigned long pteh, unsigned long ptel);
 extern long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 			long pte_index, unsigned long pteh, unsigned long ptel);
+extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
+			struct kvm_memory_slot *memslot);
 
 extern void kvmppc_entry_trampoline(void);
 extern void kvmppc_hv_entry_trampoline(void);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 6bf7e05..7d54f4e 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -477,45 +477,6 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-/*
- * Get (and clear) the dirty memory log for a memory slot.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
-				      struct kvm_dirty_log *log)
-{
-	struct kvm_memory_slot *memslot;
-	struct kvm_vcpu *vcpu;
-	ulong ga, ga_end;
-	int is_dirty = 0;
-	int r;
-	unsigned long n;
-
-	mutex_lock(&kvm->slots_lock);
-
-	r = kvm_get_dirty_log(kvm, log, &is_dirty);
-	if (r)
-		goto out;
-
-	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
-		memslot = id_to_memslot(kvm->memslots, log->slot);
-
-		ga = memslot->base_gfn << PAGE_SHIFT;
-		ga_end = ga + (memslot->npages << PAGE_SHIFT);
-
-		kvm_for_each_vcpu(n, vcpu, kvm)
-			kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
-
-		n = kvm_dirty_bitmap_bytes(memslot);
-		memset(memslot->dirty_bitmap, 0, n);
-	}
-
-	r = 0;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return r;
-}
-
 void kvmppc_decrementer_func(unsigned long data)
 {
 	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 926e2b9..783cd35 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -870,6 +870,75 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
 }
 
+static int kvm_test_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
+{
+	struct revmap_entry *rev = kvm->arch.revmap;
+	unsigned long head, i, j;
+	unsigned long *hptep;
+	int ret = 0;
+
+ retry:
+	lock_rmap(rmapp);
+	if (*rmapp & KVMPPC_RMAP_CHANGED) {
+		*rmapp &= ~KVMPPC_RMAP_CHANGED;
+		ret = 1;
+	}
+	if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
+		unlock_rmap(rmapp);
+		return ret;
+	}
+
+	i = head = *rmapp & KVMPPC_RMAP_INDEX;
+	do {
+		hptep = (unsigned long *) (kvm->arch.hpt_virt + (i << 4));
+		j = rev[i].forw;
+
+		if (!(hptep[1] & HPTE_R_C))
+			continue;
+
+		if (!try_lock_hpte(hptep, HPTE_V_HVLOCK)) {
+			/* unlock rmap before spinning on the HPTE lock */
+			unlock_rmap(rmapp);
+			while (hptep[0] & HPTE_V_HVLOCK)
+				cpu_relax();
+			goto retry;
+		}
+
+		/* Now check and modify the HPTE */
+		if ((hptep[0] & HPTE_V_VALID) && (hptep[1] & HPTE_R_C)) {
+			/* need to make it temporarily absent to clear C */
+			hptep[0] |= HPTE_V_ABSENT;
+			kvmppc_invalidate_hpte(kvm, hptep, i);
+			hptep[1] &= ~HPTE_R_C;
+			eieio();
+			hptep[0] = (hptep[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
+			rev[i].guest_rpte |= HPTE_R_C;
+			ret = 1;
+		}
+		hptep[0] &= ~HPTE_V_HVLOCK;
+	} while ((i = j) != head);
+
+	unlock_rmap(rmapp);
+	return ret;
+}
+
+long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+	unsigned long i;
+	unsigned long *rmapp, *map;
+
+	preempt_disable();
+	rmapp = memslot->rmap;
+	map = memslot->dirty_bitmap;
+	for (i = 0; i < memslot->npages; ++i) {
+		if (kvm_test_clear_dirty(kvm, rmapp))
+			__set_bit_le(i, map);
+		++rmapp;
+	}
+	preempt_enable();
+	return 0;
+}
+
 void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa,
 			    unsigned long *nb_ret)
 {
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cb8e15f..c11d960 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1105,6 +1105,43 @@ long kvm_vm_ioctl_allocate_rma(struct kvm *kvm, struct kvm_allocate_rma *ret)
 	return fd;
 }
 
+/*
+ * Get (and clear) the dirty memory log for a memory slot.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	struct kvm_memory_slot *memslot;
+	int r;
+	unsigned long n;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = -EINVAL;
+	if (log->slot >= KVM_MEMORY_SLOTS)
+		goto out;
+
+	memslot = id_to_memslot(kvm->memslots, log->slot);
+	r = -ENOENT;
+	if (!memslot->dirty_bitmap)
+		goto out;
+
+	n = kvm_dirty_bitmap_bytes(memslot);
+	memset(memslot->dirty_bitmap, 0, n);
+
+	r = kvmppc_hv_get_dirty_log(kvm, memslot);
+	if (r)
+		goto out;
+
+	r = -EFAULT;
+	if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
+		goto out;
+
+	r = 0;
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
+
 static unsigned long slb_pgsize_encoding(unsigned long psize)
 {
 	unsigned long senc = 0;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index ddd92a5..dfb52dc 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1069,6 +1069,45 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+/*
+ * Get (and clear) the dirty memory log for a memory slot.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+				      struct kvm_dirty_log *log)
+{
+	struct kvm_memory_slot *memslot;
+	struct kvm_vcpu *vcpu;
+	ulong ga, ga_end;
+	int is_dirty = 0;
+	int r;
+	unsigned long n;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	if (r)
+		goto out;
+
+	/* If nothing is dirty, don't bother messing with page tables. */
+	if (is_dirty) {
+		memslot = id_to_memslot(kvm->memslots, log->slot);
+
+		ga = memslot->base_gfn << PAGE_SHIFT;
+		ga_end = ga + (memslot->npages << PAGE_SHIFT);
+
+		kvm_for_each_vcpu(n, vcpu, kvm)
+			kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
+
+		n = kvm_dirty_bitmap_bytes(memslot);
+		memset(memslot->dirty_bitmap, 0, n);
+	}
+
+	r = 0;
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
+
 int kvmppc_core_prepare_memory_region(struct kvm *kvm,
 				      struct kvm_userspace_memory_region *mem)
 {
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 5/5] KVM: PPC: Book3s HV: Implement H_CLEAR_REF and H_CLEAR_MOD hcalls
From: Paul Mackerras @ 2011-12-15 12:04 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <20111215120018.GA20629@bloggs.ozlabs.ibm.com>

This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.  These
hypercalls are not used by Linux guests at this stage, and these
implementations are only compile tested.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c     |   69 +++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |    4 +-
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 76864a8..718b5a7 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -624,6 +624,75 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 	return H_SUCCESS;
 }
 
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long *hpte, v, r, gr;
+	struct revmap_entry *rev;
+
+	if (pte_index >= HPT_NPTE)
+		return H_PARAMETER;
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+		cpu_relax();
+	v = hpte[0];
+	r = hpte[1];
+	gr = rev->guest_rpte;
+	rev->guest_rpte &= ~HPTE_R_R;
+	if (v & HPTE_V_VALID) {
+		gr |= r & (HPTE_R_R | HPTE_R_C);
+		if (r & HPTE_R_R)
+			kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
+	}
+	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+
+	if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+		return H_NOT_FOUND;
+
+	vcpu->arch.gpr[4] = gr;
+	return H_SUCCESS;
+}
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long *hpte, v, r, gr;
+	struct revmap_entry *rev;
+
+	if (pte_index >= HPT_NPTE)
+		return H_PARAMETER;
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+		cpu_relax();
+	v = hpte[0];
+	r = hpte[1];
+	gr = rev->guest_rpte;
+	rev->guest_rpte &= ~HPTE_R_C;
+	if (v & HPTE_V_VALID) {
+		gr |= r & (HPTE_R_R | HPTE_R_C);
+		if (r & HPTE_R_C) {
+			hpte[0] |= HPTE_V_ABSENT;
+			kvmppc_invalidate_hpte(kvm, hpte, pte_index);
+			hpte[1] &= ~HPTE_R_C;
+			eieio();
+			hpte[0] = v;
+		}
+	}
+	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+
+	if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+		return H_NOT_FOUND;
+
+	vcpu->arch.gpr[4] = gr;
+	return H_SUCCESS;
+}
+
 void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep,
 			unsigned long pte_index)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b70bf22..4c52d6d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1276,8 +1276,8 @@ hcall_real_table:
 	.long	.kvmppc_h_remove - hcall_real_table
 	.long	.kvmppc_h_enter - hcall_real_table
 	.long	.kvmppc_h_read - hcall_real_table
-	.long	0		/* 0x10 - H_CLEAR_MOD */
-	.long	0		/* 0x14 - H_CLEAR_REF */
+	.long	.kvmppc_h_clear_mod - hcall_real_table
+	.long	.kvmppc_h_clear_ref - hcall_real_table
 	.long	.kvmppc_h_protect - hcall_real_table
 	.long	0		/* 0x1c - H_GET_TCE */
 	.long	.kvmppc_h_put_tce - hcall_real_table
-- 
1.7.7.3

^ permalink raw reply related

* Re: Linux port availability for p5010 processor
From: Tabi Timur-B04825 @ 2011-12-15 14:30 UTC (permalink / raw)
  To: Vineeth; +Cc: Wood Scott-B07421, linuxppc-dev@ozlabs.org
In-Reply-To: <CAFbQSaA6yK7d+O0C6-BguKamRcBNnsU-yimO48cQV8G_evmJjw@mail.gmail.com>

On Thu, Dec 15, 2011 at 5:45 AM, Vineeth <vneethv@gmail.com> wrote:
>
> why is it a part of 85xx directory ? the core of P5020 is E5500 where the
> core of 85xx is e500;

e5500 is very similar to e500, so it's all part of the same family of cores=
.

--=20
Timur Tabi
Linux kernel developer at Freescale=

^ permalink raw reply

* Re: linux-next bad Kconfig for drivers/hid
From: Randy Dunlap @ 2011-12-15 17:43 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Jeremy Fitzhardinge, LinuxPPC-dev, Linux Kernel ML
In-Reply-To: <alpine.LNX.2.00.1112151107020.21970@pobox.suse.cz>

On 12/15/2011 02:08 AM, Jiri Kosina wrote:
> On Mon, 12 Dec 2011, Tony Breeds wrote:
> 
>> On Mon, Dec 12, 2011 at 12:21:16AM +0100, Jiri Kosina wrote:
>>> On Thu, 8 Dec 2011, Jeremy Fitzhardinge wrote:
>>>>
>>>> Hm.  How about making it "depends on HID && POWER_SUPPLY"?  I think that
>>>> would needlessly disable it if HID is also modular, but I'm not sure how
>>>> to fix that.  "depends on HID && POWER_SUPPLY && HID == POWER_SUPPLY"?
>>
>> That would work, but I think technically I think you could end up with
>> HID=m and POWER_SUPPLY=m which would still allow HID_BATTERY_STRENGTH=y
>> which is the same problem.
>>
>> I don't know what kind of .config contortions you'd need to do to get
>> there.
>>  
>>> How about making it 'default POWER_SUPPLY' instead?
>>
>> By itself that wont help as POWER_SUPPLY=m statisfies.
>>
>> So it looks like we have Jeremy's:
>> 	HID && POWER_SUPPLY && HID == POWER_SUPPLY
> 
> Tony,
> 
> have you actually tested this one to work in the configuration you have 
> been seeing it to fail?
> 
> I don't seem to be able to find any use of '==' in other Kconfig files 
> (and never used it myself), so I'd like to have confirmation that it 
> actually works and fixes the problem before I apply it :)

Documentation/kbuild/kconfig-language.txt does not list "==":

<expr> ::= <symbol>                             (1)
           <symbol> '=' <symbol>                (2)
           <symbol> '!=' <symbol>               (3)
           '(' <expr> ')'                       (4)
           '!' <expr>                           (5)
           <expr> '&&' <expr>                   (6)
           <expr> '||' <expr>                   (7)


-- 
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* [PATCH] phylib: update mdiobus_alloc() to allocate extra private space
From: Timur Tabi @ 2011-12-15 16:51 UTC (permalink / raw)
  To: davem, afleming, netdev, linuxppc-dev

Augment mdiobus_alloc() to take a parameter indicating the number of extra
bytes to allocate for private data.  Almost all callers of mdiobus_alloc()
separately allocate a private data structure.  By allowing mdiobus_alloc()
to allocate extra memory, the two allocations can be merged into one.

This patch does not change any of the callers to actually take advantage
of this feature, however.  That change can be made by the individual
maintainers at their leisure.  For now, all callers ask for zero additional
bytes, which mimics the previous behavior.

Signed-off-by: Timur Tabi <timur@freescale.com>
---
 arch/powerpc/platforms/pasemi/gpio_mdio.c         |    2 +-
 drivers/net/ethernet/adi/bfin_mac.c               |    2 +-
 drivers/net/ethernet/aeroflex/greth.c             |    2 +-
 drivers/net/ethernet/amd/au1000_eth.c             |    2 +-
 drivers/net/ethernet/broadcom/bcm63xx_enet.c      |    2 +-
 drivers/net/ethernet/broadcom/sb1250-mac.c        |    2 +-
 drivers/net/ethernet/broadcom/tg3.c               |    2 +-
 drivers/net/ethernet/cadence/macb.c               |    2 +-
 drivers/net/ethernet/dnet.c                       |    2 +-
 drivers/net/ethernet/ethoc.c                      |    2 +-
 drivers/net/ethernet/faraday/ftgmac100.c          |    2 +-
 drivers/net/ethernet/freescale/fec.c              |    2 +-
 drivers/net/ethernet/freescale/fec_mpc52xx_phy.c  |    2 +-
 drivers/net/ethernet/freescale/fs_enet/mii-fec.c  |    2 +-
 drivers/net/ethernet/freescale/fsl_pq_mdio.c      |    2 +-
 drivers/net/ethernet/lantiq_etop.c                |    2 +-
 drivers/net/ethernet/marvell/mv643xx_eth.c        |    2 +-
 drivers/net/ethernet/marvell/pxa168_eth.c         |    2 +-
 drivers/net/ethernet/rdc/r6040.c                  |    2 +-
 drivers/net/ethernet/s6gmac.c                     |    2 +-
 drivers/net/ethernet/smsc/smsc911x.c              |    2 +-
 drivers/net/ethernet/smsc/smsc9420.c              |    2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c |    2 +-
 drivers/net/ethernet/ti/cpmac.c                   |    2 +-
 drivers/net/ethernet/ti/davinci_mdio.c            |    2 +-
 drivers/net/ethernet/toshiba/tc35815.c            |    2 +-
 drivers/net/ethernet/xilinx/ll_temac_mdio.c       |    2 +-
 drivers/net/ethernet/xilinx/xilinx_emaclite.c     |    2 +-
 drivers/net/ethernet/xscale/ixp4xx_eth.c          |    2 +-
 drivers/net/phy/fixed.c                           |    2 +-
 drivers/net/phy/mdio-bitbang.c                    |    2 +-
 drivers/net/phy/mdio-octeon.c                     |    2 +-
 drivers/net/phy/mdio_bus.c                        |   20 +++++++++++++++++---
 drivers/staging/et131x/et131x.c                   |    2 +-
 include/linux/phy.h                               |    2 +-
 net/dsa/dsa.c                                     |    2 +-
 36 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/platforms/pasemi/gpio_mdio.c b/arch/powerpc/platforms/pasemi/gpio_mdio.c
index 9886296..754a57b 100644
--- a/arch/powerpc/platforms/pasemi/gpio_mdio.c
+++ b/arch/powerpc/platforms/pasemi/gpio_mdio.c
@@ -230,7 +230,7 @@ static int __devinit gpio_mdio_probe(struct platform_device *ofdev)
 	if (!priv)
 		goto out;
 
-	new_bus = mdiobus_alloc();
+	new_bus = mdiobus_alloc(0);
 
 	if (!new_bus)
 		goto out_free_priv;
diff --git a/drivers/net/ethernet/adi/bfin_mac.c b/drivers/net/ethernet/adi/bfin_mac.c
index b6d69c9..ea71758 100644
--- a/drivers/net/ethernet/adi/bfin_mac.c
+++ b/drivers/net/ethernet/adi/bfin_mac.c
@@ -1659,7 +1659,7 @@ static int __devinit bfin_mii_bus_probe(struct platform_device *pdev)
 	}
 
 	rc = -ENOMEM;
-	miibus = mdiobus_alloc();
+	miibus = mdiobus_alloc(0);
 	if (miibus == NULL)
 		goto out_err_alloc;
 	miibus->read = bfin_mdiobus_read;
diff --git a/drivers/net/ethernet/aeroflex/greth.c b/drivers/net/ethernet/aeroflex/greth.c
index c885aa9..c6bc550 100644
--- a/drivers/net/ethernet/aeroflex/greth.c
+++ b/drivers/net/ethernet/aeroflex/greth.c
@@ -1326,7 +1326,7 @@ static int greth_mdio_init(struct greth_private *greth)
 	int ret, phy;
 	unsigned long timeout;
 
-	greth->mdio = mdiobus_alloc();
+	greth->mdio = mdiobus_alloc(0);
 	if (!greth->mdio) {
 		return -ENOMEM;
 	}
diff --git a/drivers/net/ethernet/amd/au1000_eth.c b/drivers/net/ethernet/amd/au1000_eth.c
index cc9262b..5c30544 100644
--- a/drivers/net/ethernet/amd/au1000_eth.c
+++ b/drivers/net/ethernet/amd/au1000_eth.c
@@ -1159,7 +1159,7 @@ static int __devinit au1000_probe(struct platform_device *pdev)
 		goto err_mdiobus_alloc;
 	}
 
-	aup->mii_bus = mdiobus_alloc();
+	aup->mii_bus = mdiobus_alloc(0);
 	if (aup->mii_bus == NULL) {
 		dev_err(&pdev->dev, "failed to allocate mdiobus structure\n");
 		err = -ENOMEM;
diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index a11a8ad..c847801 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1715,7 +1715,7 @@ static int __devinit bcm_enet_probe(struct platform_device *pdev)
 	/* MII bus registration */
 	if (priv->has_phy) {
 
-		priv->mii_bus = mdiobus_alloc();
+		priv->mii_bus = mdiobus_alloc(0);
 		if (!priv->mii_bus) {
 			ret = -ENOMEM;
 			goto out_uninit_hw;
diff --git a/drivers/net/ethernet/broadcom/sb1250-mac.c b/drivers/net/ethernet/broadcom/sb1250-mac.c
index 8fa7abc..4ff830e 100644
--- a/drivers/net/ethernet/broadcom/sb1250-mac.c
+++ b/drivers/net/ethernet/broadcom/sb1250-mac.c
@@ -2252,7 +2252,7 @@ static int sbmac_init(struct platform_device *pldev, long long base)
 	/* This is needed for PASS2 for Rx H/W checksum feature */
 	sbmac_set_iphdr_offset(sc);
 
-	sc->mii_bus = mdiobus_alloc();
+	sc->mii_bus = mdiobus_alloc(0);
 	if (sc->mii_bus == NULL) {
 		err = -ENOMEM;
 		goto uninit_ctx;
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 1979151..6ce2c4c 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -1323,7 +1323,7 @@ static int tg3_mdio_init(struct tg3 *tp)
 	if (!tg3_flag(tp, USE_PHYLIB) || tg3_flag(tp, MDIOBUS_INITED))
 		return 0;
 
-	tp->mdio_bus = mdiobus_alloc();
+	tp->mdio_bus = mdiobus_alloc(0);
 	if (tp->mdio_bus == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index a437b46..dc3b09e 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -234,7 +234,7 @@ static int macb_mii_init(struct macb *bp)
 	/* Enable management port */
 	macb_writel(bp, NCR, MACB_BIT(MPE));
 
-	bp->mii_bus = mdiobus_alloc();
+	bp->mii_bus = mdiobus_alloc(0);
 	if (bp->mii_bus == NULL) {
 		err = -ENOMEM;
 		goto err_out;
diff --git a/drivers/net/ethernet/dnet.c b/drivers/net/ethernet/dnet.c
index ce88c0f..04a55a4 100644
--- a/drivers/net/ethernet/dnet.c
+++ b/drivers/net/ethernet/dnet.c
@@ -316,7 +316,7 @@ static int dnet_mii_init(struct dnet *bp)
 {
 	int err, i;
 
-	bp->mii_bus = mdiobus_alloc();
+	bp->mii_bus = mdiobus_alloc(0);
 	if (bp->mii_bus == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 60f0e78..9916475 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -1056,7 +1056,7 @@ static int __devinit ethoc_probe(struct platform_device *pdev)
 	ethoc_set_mac_address(netdev, netdev->dev_addr);
 
 	/* register MII bus */
-	priv->mdio = mdiobus_alloc();
+	priv->mdio = mdiobus_alloc(0);
 	if (!priv->mdio) {
 		ret = -ENOMEM;
 		goto free;
diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index fb5579a..19fc332 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1248,7 +1248,7 @@ static int ftgmac100_probe(struct platform_device *pdev)
 	priv->irq = irq;
 
 	/* initialize mdio bus */
-	priv->mii_bus = mdiobus_alloc();
+	priv->mii_bus = mdiobus_alloc(0);
 	if (!priv->mii_bus) {
 		err = -EIO;
 		goto err_alloc_mdiobus;
diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
index 01ee9cc..ab1d39c 100644
--- a/drivers/net/ethernet/freescale/fec.c
+++ b/drivers/net/ethernet/freescale/fec.c
@@ -1066,7 +1066,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 	fep->phy_speed <<= 1;
 	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
 
-	fep->mii_bus = mdiobus_alloc();
+	fep->mii_bus = mdiobus_alloc(0);
 	if (fep->mii_bus == NULL) {
 		err = -ENOMEM;
 		goto err_out;
diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx_phy.c b/drivers/net/ethernet/freescale/fec_mpc52xx_phy.c
index 360a578..f5918c1 100644
--- a/drivers/net/ethernet/freescale/fec_mpc52xx_phy.c
+++ b/drivers/net/ethernet/freescale/fec_mpc52xx_phy.c
@@ -70,7 +70,7 @@ static int mpc52xx_fec_mdio_probe(struct platform_device *of)
 	struct resource res;
 	int err;
 
-	bus = mdiobus_alloc();
+	bus = mdiobus_alloc(0);
 	if (bus == NULL)
 		return -ENOMEM;
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
diff --git a/drivers/net/ethernet/freescale/fs_enet/mii-fec.c b/drivers/net/ethernet/freescale/fs_enet/mii-fec.c
index 55bb867..74964ce 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mii-fec.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mii-fec.c
@@ -116,7 +116,7 @@ static int __devinit fs_enet_mdio_probe(struct platform_device *ofdev)
 		return -EINVAL;
 	get_bus_freq = match->data;
 
-	new_bus = mdiobus_alloc();
+	new_bus = mdiobus_alloc(0);
 	if (!new_bus)
 		goto out;
 
diff --git a/drivers/net/ethernet/freescale/fsl_pq_mdio.c b/drivers/net/ethernet/freescale/fsl_pq_mdio.c
index f109602..332418a 100644
--- a/drivers/net/ethernet/freescale/fsl_pq_mdio.c
+++ b/drivers/net/ethernet/freescale/fsl_pq_mdio.c
@@ -266,7 +266,7 @@ static int fsl_pq_mdio_probe(struct platform_device *ofdev)
 	if (!priv)
 		return -ENOMEM;
 
-	new_bus = mdiobus_alloc();
+	new_bus = mdiobus_alloc(0);
 	if (!new_bus) {
 		err = -ENOMEM;
 		goto err_free_priv;
diff --git a/drivers/net/ethernet/lantiq_etop.c b/drivers/net/ethernet/lantiq_etop.c
index 0b3567a..94a6055 100644
--- a/drivers/net/ethernet/lantiq_etop.c
+++ b/drivers/net/ethernet/lantiq_etop.c
@@ -425,7 +425,7 @@ ltq_etop_mdio_init(struct net_device *dev)
 	int i;
 	int err;
 
-	priv->mii_bus = mdiobus_alloc();
+	priv->mii_bus = mdiobus_alloc(0);
 	if (!priv->mii_bus) {
 		netdev_err(dev, "failed to allocate mii bus\n");
 		err = -ENOMEM;
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index e87847e..250e0cc 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -2604,7 +2604,7 @@ static int mv643xx_eth_shared_probe(struct platform_device *pdev)
 	 * Set up and register SMI bus.
 	 */
 	if (pd == NULL || pd->shared_smi == NULL) {
-		msp->smi_bus = mdiobus_alloc();
+		msp->smi_bus = mdiobus_alloc(0);
 		if (msp->smi_bus == NULL)
 			goto out_unmap;
 
diff --git a/drivers/net/ethernet/marvell/pxa168_eth.c b/drivers/net/ethernet/marvell/pxa168_eth.c
index 5ec409e..2dfb98a 100644
--- a/drivers/net/ethernet/marvell/pxa168_eth.c
+++ b/drivers/net/ethernet/marvell/pxa168_eth.c
@@ -1543,7 +1543,7 @@ static int pxa168_eth_probe(struct platform_device *pdev)
 	pep->timeout.function = rxq_refill_timer_wrapper;
 	pep->timeout.data = (unsigned long)pep;
 
-	pep->smi_bus = mdiobus_alloc();
+	pep->smi_bus = mdiobus_alloc(0);
 	if (pep->smi_bus == NULL) {
 		err = -ENOMEM;
 		goto err_base;
diff --git a/drivers/net/ethernet/rdc/r6040.c b/drivers/net/ethernet/rdc/r6040.c
index 4bf68cf..4175e57 100644
--- a/drivers/net/ethernet/rdc/r6040.c
+++ b/drivers/net/ethernet/rdc/r6040.c
@@ -1176,7 +1176,7 @@ static int __devinit r6040_init_one(struct pci_dev *pdev,
 
 	netif_napi_add(dev, &lp->napi, r6040_poll, 64);
 
-	lp->mii_bus = mdiobus_alloc();
+	lp->mii_bus = mdiobus_alloc(0);
 	if (!lp->mii_bus) {
 		dev_err(&pdev->dev, "mdiobus_alloc() failed\n");
 		err = -ENOMEM;
diff --git a/drivers/net/ethernet/s6gmac.c b/drivers/net/ethernet/s6gmac.c
index a7ff8ea..d8f11ef 100644
--- a/drivers/net/ethernet/s6gmac.c
+++ b/drivers/net/ethernet/s6gmac.c
@@ -994,7 +994,7 @@ static int __devinit s6gmac_probe(struct platform_device *pdev)
 			dev->name);
 		goto errdev;
 	}
-	mb = mdiobus_alloc();
+	mb = mdiobus_alloc(0);
 	if (!mb) {
 		printk(KERN_ERR DRV_PRMT "error allocating mii bus\n");
 		goto errmii;
diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 06d0df6..103297b 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -1037,7 +1037,7 @@ static int __devinit smsc911x_mii_init(struct platform_device *pdev,
 	struct smsc911x_data *pdata = netdev_priv(dev);
 	int err = -ENXIO, i;
 
-	pdata->mii_bus = mdiobus_alloc();
+	pdata->mii_bus = mdiobus_alloc(0);
 	if (!pdata->mii_bus) {
 		err = -ENOMEM;
 		goto err_out_1;
diff --git a/drivers/net/ethernet/smsc/smsc9420.c b/drivers/net/ethernet/smsc/smsc9420.c
index a9efbdf..16738a1 100644
--- a/drivers/net/ethernet/smsc/smsc9420.c
+++ b/drivers/net/ethernet/smsc/smsc9420.c
@@ -1205,7 +1205,7 @@ static int smsc9420_mii_init(struct net_device *dev)
 	struct smsc9420_pdata *pd = netdev_priv(dev);
 	int err = -ENXIO, i;
 
-	pd->mii_bus = mdiobus_alloc();
+	pd->mii_bus = mdiobus_alloc(0);
 	if (!pd->mii_bus) {
 		err = -ENOMEM;
 		goto err_out_1;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 9c3b9d5..5dc0396 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -144,7 +144,7 @@ int stmmac_mdio_register(struct net_device *ndev)
 	if (!mdio_bus_data)
 		return 0;
 
-	new_bus = mdiobus_alloc();
+	new_bus = mdiobus_alloc(0);
 	if (new_bus == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/net/ethernet/ti/cpmac.c b/drivers/net/ethernet/ti/cpmac.c
index aaac0c7..b13df3c 100644
--- a/drivers/net/ethernet/ti/cpmac.c
+++ b/drivers/net/ethernet/ti/cpmac.c
@@ -1227,7 +1227,7 @@ int __devinit cpmac_init(void)
 	u32 mask;
 	int i, res;
 
-	cpmac_mii = mdiobus_alloc();
+	cpmac_mii = mdiobus_alloc(0);
 	if (cpmac_mii == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/net/ethernet/ti/davinci_mdio.c b/drivers/net/ethernet/ti/davinci_mdio.c
index 7615040..fffddef 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -300,7 +300,7 @@ static int __devinit davinci_mdio_probe(struct platform_device *pdev)
 
 	data->pdata = pdata ? (*pdata) : default_pdata;
 
-	data->bus = mdiobus_alloc();
+	data->bus = mdiobus_alloc(0);
 	if (!data->bus) {
 		dev_err(dev, "failed to alloc mii bus\n");
 		ret = -ENOMEM;
diff --git a/drivers/net/ethernet/toshiba/tc35815.c b/drivers/net/ethernet/toshiba/tc35815.c
index 71b785c..4538908 100644
--- a/drivers/net/ethernet/toshiba/tc35815.c
+++ b/drivers/net/ethernet/toshiba/tc35815.c
@@ -673,7 +673,7 @@ static int tc_mii_init(struct net_device *dev)
 	int err;
 	int i;
 
-	lp->mii_bus = mdiobus_alloc();
+	lp->mii_bus = mdiobus_alloc(0);
 	if (lp->mii_bus == NULL) {
 		err = -ENOMEM;
 		goto err_out;
diff --git a/drivers/net/ethernet/xilinx/ll_temac_mdio.c b/drivers/net/ethernet/xilinx/ll_temac_mdio.c
index 8cf9d4f..7d0ebf5 100644
--- a/drivers/net/ethernet/xilinx/ll_temac_mdio.c
+++ b/drivers/net/ethernet/xilinx/ll_temac_mdio.c
@@ -81,7 +81,7 @@ int temac_mdio_setup(struct temac_local *lp, struct device_node *np)
 	temac_indirect_out32(lp, XTE_MC_OFFSET, 1 << 6 | clk_div);
 	mutex_unlock(&lp->indirect_mutex);
 
-	bus = mdiobus_alloc();
+	bus = mdiobus_alloc(0);
 	if (!bus)
 		return -ENOMEM;
 
diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
index dca6541..4ecd7ac 100644
--- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
+++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
@@ -861,7 +861,7 @@ static int xemaclite_mdio_setup(struct net_local *lp, struct device *dev)
 	out_be32(lp->base_addr + XEL_MDIOCTRL_OFFSET,
 		 XEL_MDIOCTRL_MDIOEN_MASK);
 
-	bus = mdiobus_alloc();
+	bus = mdiobus_alloc(0);
 	if (!bus)
 		return -ENOMEM;
 
diff --git a/drivers/net/ethernet/xscale/ixp4xx_eth.c b/drivers/net/ethernet/xscale/ixp4xx_eth.c
index f45c85a..43b4892 100644
--- a/drivers/net/ethernet/xscale/ixp4xx_eth.c
+++ b/drivers/net/ethernet/xscale/ixp4xx_eth.c
@@ -509,7 +509,7 @@ static int ixp4xx_mdio_register(void)
 {
 	int err;
 
-	if (!(mdio_bus = mdiobus_alloc()))
+	if (!(mdio_bus = mdiobus_alloc(0)))
 		return -ENOMEM;
 
 	if (cpu_is_ixp43x()) {
diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
index 1fa4d73..18e838e 100644
--- a/drivers/net/phy/fixed.c
+++ b/drivers/net/phy/fixed.c
@@ -214,7 +214,7 @@ static int __init fixed_mdio_bus_init(void)
 		goto err_pdev;
 	}
 
-	fmb->mii_bus = mdiobus_alloc();
+	fmb->mii_bus = mdiobus_alloc(0);
 	if (fmb->mii_bus == NULL) {
 		ret = -ENOMEM;
 		goto err_mdiobus_reg;
diff --git a/drivers/net/phy/mdio-bitbang.c b/drivers/net/phy/mdio-bitbang.c
index daec9b0..097cf50 100644
--- a/drivers/net/phy/mdio-bitbang.c
+++ b/drivers/net/phy/mdio-bitbang.c
@@ -214,7 +214,7 @@ struct mii_bus *alloc_mdio_bitbang(struct mdiobb_ctrl *ctrl)
 {
 	struct mii_bus *bus;
 
-	bus = mdiobus_alloc();
+	bus = mdiobus_alloc(0);
 	if (!bus)
 		return NULL;
 
diff --git a/drivers/net/phy/mdio-octeon.c b/drivers/net/phy/mdio-octeon.c
index bd12ba9..4c68668 100644
--- a/drivers/net/phy/mdio-octeon.c
+++ b/drivers/net/phy/mdio-octeon.c
@@ -99,7 +99,7 @@ static int __devinit octeon_mdiobus_probe(struct platform_device *pdev)
 	/* The platform_device id is our unit number.  */
 	bus->unit = pdev->id;
 
-	bus->mii_bus = mdiobus_alloc();
+	bus->mii_bus = mdiobus_alloc(0);
 
 	if (!bus->mii_bus)
 		goto err;
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 6c58da2..e6fd35d 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -41,14 +41,28 @@
  *
  * Description: called by a bus driver to allocate an mii_bus
  * structure to fill in.
+ *
+ * 'size' is an an extra amount of memory to allocate for private storage.
+ * If non-zero, then bus->priv is points to that memory.
  */
-struct mii_bus *mdiobus_alloc(void)
+struct mii_bus *mdiobus_alloc(size_t size)
 {
 	struct mii_bus *bus;
+	size_t aligned_size = ALIGN(sizeof(*bus), NETDEV_ALIGN);
+	size_t alloc_size;
+
+	/* If we alloc extra space, it should be aligned */
+	if (size)
+		alloc_size = aligned_size + size;
+	else
+		alloc_size = sizeof(*bus);
 
-	bus = kzalloc(sizeof(*bus), GFP_KERNEL);
-	if (bus != NULL)
+	bus = kzalloc(alloc_size, GFP_KERNEL);
+	if (bus) {
 		bus->state = MDIOBUS_ALLOCATED;
+		if (size)
+			bus->priv = (void *)bus + aligned_size;
+	}
 
 	return bus;
 }
diff --git a/drivers/staging/et131x/et131x.c b/drivers/staging/et131x/et131x.c
index 0c1c6ca..1f5a2f9 100644
--- a/drivers/staging/et131x/et131x.c
+++ b/drivers/staging/et131x/et131x.c
@@ -5396,7 +5396,7 @@ static int __devinit et131x_pci_setup(struct pci_dev *pdev,
 	et1310_disable_phy_coma(adapter);
 
 	/* Setup the mii_bus struct */
-	adapter->mii_bus = mdiobus_alloc();
+	adapter->mii_bus = mdiobus_alloc(0);
 	if (!adapter->mii_bus) {
 		dev_err(&pdev->dev, "Alloc of mii_bus struct failed\n");
 		goto err_mem_free;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 79f337c..603153f 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -129,7 +129,7 @@ struct mii_bus {
 };
 #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
 
-struct mii_bus *mdiobus_alloc(void);
+struct mii_bus *mdiobus_alloc(size_t size);
 int mdiobus_register(struct mii_bus *bus);
 void mdiobus_unregister(struct mii_bus *bus);
 void mdiobus_free(struct mii_bus *bus);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 88e7c2f..dd0e270 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -154,7 +154,7 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
 	if (ret < 0)
 		goto out;
 
-	ds->slave_mii_bus = mdiobus_alloc();
+	ds->slave_mii_bus = mdiobus_alloc(0);
 	if (ds->slave_mii_bus == NULL) {
 		ret = -ENOMEM;
 		goto out;
-- 
1.7.3.4

^ permalink raw reply related

* Re: [PATCH 1/3]arch:powerpc:sysdev:mpic.c Remove extra semicolon.
From: Justin P. Mattock @ 2011-12-15 17:11 UTC (permalink / raw)
  To: Justin P. Mattock; +Cc: linuxppc-dev, trivial, linux-kernel, Paul Mackerras
In-Reply-To: <1321893808-3721-1-git-send-email-justinmattock@gmail.com>

what would be the status of these? should I resend/rebase to the current 
etc?..

On 11/21/2011 08:43 AM, Justin P. Mattock wrote:
> From: "Justin P. Mattock"<justinmattock@gmail.com>
>
> The patch below removes an extra semicolon.
>
> Signed-off-by: Justin P. Mattock<justinmattock@gmail.com>
> CC: linuxppc-dev@lists.ozlabs.org
> CC: Paul Mackerras<paulus@samba.org>
> ---
>   arch/powerpc/sysdev/mpic.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
> index 8c7e852..b3fa3d7 100644
> --- a/arch/powerpc/sysdev/mpic.c
> +++ b/arch/powerpc/sysdev/mpic.c
> @@ -901,7 +901,7 @@ int mpic_set_irq_type(struct irq_data *d, unsigned int flow_type)
>   	if (vold != vnew)
>   		mpic_irq_write(src, MPIC_INFO(IRQ_VECTOR_PRI), vnew);
>
> -	return IRQ_SET_MASK_OK_NOCOPY;;
> +	return IRQ_SET_MASK_OK_NOCOPY;
>   }
>
>   void mpic_set_vector(unsigned int virq, unsigned int vector)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox