LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH v3] watchdog: mpc8xxx_wdt convert to watchdog core
From: Wim Van Sebroeck @ 2014-01-14  8:32 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: scottwood, linuxppc-dev, linux-kernel, Guenter Roeck,
	linux-watchdog
In-Reply-To: <20131204063214.D1DB01A2BEA@localhost.localdomain>

Hi Christophe,

> Convert mpc8xxx_wdt.c to the new watchdog API.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

This patch has been added to linux-watchdog-next.

Kind regards,
Wim.


* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Preeti U Murthy @ 2014-01-14  8:25 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
	linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>

Hi Srivatsa,

On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> In spite of this, it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, it's possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
> 
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that won't get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...

Yes, I did see this. But since this patch is only meant to communicate
whether the cpuidle governor succeeded in choosing an idle state, I
wanted to address the default action of the pseries idle loop
separately. You are right that we will need to understand the patch
that introduced this action. I will take a look at it.

> 
>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> ---
>>
>>  drivers/cpuidle/cpuidle.c        |    6 +++++-
>>  drivers/cpuidle/governors/menu.c |    7 ++++---
>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..5bf06bb 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>>  	/* ask the governor for the next state */
>>  	next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> +	dev->last_residency = 0;
>>  	if (need_resched()) {
>> -		dev->last_residency = 0;
>>  		/* give the governor an opportunity to reflect on the outcome */
>>  		if (cpuidle_curr_governor->reflect)
>>  			cpuidle_curr_governor->reflect(dev, next_state);
> 
> The comments on top of the .reflect() routines of the governors say that the
> second parameter is the index of the actual state entered. But after this patch,
> next_state can be negative, indicating an invalid index. So those comments need
> to be updated accordingly.

Right, I will take care of the comment in the next post.
> 
>> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>>  		return 0;
>>  	}
>>
>> +	if (next_state < 0)
>> +		return -EINVAL;
> 
> The exit path above (due to need_resched) returns with irqs enabled, but the new
> one you are adding (next_state < 0) returns with irqs disabled. This is correct,
> because in the latter case, "idle" is still in progress and the arch will choose
> a default handler to execute (unlike the former case where "idle" is over and
> hence its time to enable interrupts).

Correct.
> 
> IMHO it would be good to add comments around this code to explain this subtle
> difference. We can never be too careful with these things... ;-)

Ok, will do so.
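
A minimal user-space sketch of the return paths discussed above may help
when writing those comments. It is a model only: struct idle_model and
model_cpuidle_idle_call() are hypothetical names, not kernel APIs, and the
claim that the normal enter path comes back with interrupts enabled is an
assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is a real kernel API. */
struct idle_model {
	bool irqs_enabled;	/* models the local interrupt state */
	bool need_resched;	/* models need_resched() */
	int  governor_choice;	/* what ->select() would return; < 0 means "no state" */
	int  entered_state;	/* state actually entered, or -1 if none */
};

/* Models the three exit paths of cpuidle_idle_call() with the patch applied. */
static int model_cpuidle_idle_call(struct idle_model *m)
{
	int next_state = m->governor_choice;

	m->entered_state = -1;
	assert(!m->irqs_enabled);	/* the idle path is entered with irqs off */

	if (m->need_resched) {
		/* Idle is over: interrupts are re-enabled before returning. */
		m->irqs_enabled = true;
		return 0;
	}

	if (next_state < 0) {
		/*
		 * Idle is still in progress: interrupts stay disabled so that
		 * the arch code can run its default idle handler.
		 */
		return -EINVAL;
	}

	/* Assumption of this sketch: entering a state returns with irqs on. */
	m->entered_state = next_state;
	m->irqs_enabled = true;
	return 0;
}
```

With need_resched clear and a negative governor choice, the model returns
-EINVAL and leaves irqs_enabled false, which is the case the new comment
should call out.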
> 
>> +
>>  	trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>
>>  	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..6921543 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -283,6 +283,7 @@ again:
>>   * menu_select - selects the next idle state to enter
>>   * @drv: cpuidle driver containing state data
>>   * @dev: the CPU
>> + * Returns -1 when no idle state is suitable
>>   */
>>  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>  {
>> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>  	int multiplier;
>>  	struct timespec t;
>>
>> -	if (data->needs_update) {
>> +	if (data->last_state_idx >= 0 && data->needs_update) {
>                ^^^^^
> Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
> only when index >= 0.

Right, we do not need this check. I was assuming that needs_update would
be consistent with index >= 0 only in the need_resched() case. But
needs_update is cleared each time the governor is invoked, and is set
again afterwards only if index >= 0.
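
The invariant can be sketched in a few lines of user-space C. This is a
trimmed-down model, not the menu governor itself: struct menu_model,
model_reflect() and model_select() are hypothetical names, and the
"update" step is reduced to a counter.

```c
#include <assert.h>

/* Hypothetical, trimmed-down model of the governor's per-CPU data. */
struct menu_model {
	int last_state_idx;
	int needs_update;
	int updates;		/* counts how often the update step ran */
};

/* Models ->reflect(): needs_update is only ever set for a valid index. */
static void model_reflect(struct menu_model *d, int index)
{
	d->last_state_idx = index;
	if (index >= 0)
		d->needs_update = 1;
}

/* Models the top of ->select() without the extra last_state_idx check. */
static int model_select(struct menu_model *d, int choice)
{
	if (d->needs_update) {
		/* Holds by construction, so testing it again is redundant. */
		assert(d->last_state_idx >= 0);
		d->updates++;
		d->needs_update = 0;
	}

	d->last_state_idx = -1;
	return choice;		/* a negative choice means "no state found" */
}
```

In this model the assert inside model_select() can never fire, because
the only place that sets needs_update also stores a non-negative index.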

> 
>>  		menu_update(drv, dev);
>>  		data->needs_update = 0;
>>  	}
>>
>> -	data->last_state_idx = 0;
>> +	data->last_state_idx = -1;
>>  	data->exit_us = 0;
>>
>>  	/* Special case when user has set very strict latency requirement */
>>  	if (unlikely(latency_req == 0))
>> -		return 0;
>> +		return data->last_state_idx;
>>
>>  	/* determine the expected residency time, round up */
>>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
>>
> 
> What about the ladder governor? I know it's not used that much in practice,
> but I think it would be good to update that as well, just to keep it
> consistent.

Yes, this needs to be updated as well. But the ladder governor has a few
other details to take care of beyond what this patch handles in the menu
governor, so I will post that separately.

Thanks

Regards
Preeti U Murthy
> 
> Regards,
> Srivatsa S. Bhat
> 


* Re: [PATCH] Move processing of MCE queued event out from syscall exit path.
From: Benjamin Herrenschmidt @ 2014-01-14  8:20 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Mahesh J Salgaonkar, linuxppc-dev
In-Reply-To: <alpine.LSU.2.11.1401132314380.3222@eggly.anvils>

On Mon, 2014-01-13 at 23:47 -0800, Hugh Dickins wrote:
> 
> And I may be quite wrong to point a finger at ATA errors: perhaps
> they're always shown, and quickly cleared off screen in successful
> boots, but left visible when root cannot be mounted for some other
> reason.

dmesg would tell...

> I don't know, and won't have time to investigate further - bisecting
> intermittents is not much fun!  I'll just have to hope that it's
> sorted out before it reaches 3.14-rc, or else bite the bullet and
> investigate on that.)

Right :-) Oh well, I still use a G5 as a desktop so I might eventually
stumble upon them !

Cheers,
Ben.


* [PATCH 3/3] powerpc/fsl: Use the new interface to save or restore registers
From: Dongsheng Wang @ 2014-01-14  7:59 UTC (permalink / raw)
  To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
In-Reply-To: <1389686397-46555-1-git-send-email-dongsheng.wang@freescale.com>

From: Wang Dongsheng <dongsheng.wang@freescale.com>

Use fsl_cpu_state_save/fsl_cpu_state_restore to save and restore
registers, so that swsusp_booke.S no longer has to maintain its own
copy of that code.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/kernel/swsusp_booke.S b/arch/powerpc/kernel/swsusp_booke.S
index 553c140..b5992db 100644
--- a/arch/powerpc/kernel/swsusp_booke.S
+++ b/arch/powerpc/kernel/swsusp_booke.S
@@ -4,92 +4,28 @@
  * Copyright (c) 2009-2010 MontaVista Software, LLC.
  */
 
-#include <linux/threads.h>
-#include <asm/processor.h>
 #include <asm/page.h>
-#include <asm/cputable.h>
-#include <asm/thread_info.h>
 #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
 #include <asm/mmu.h>
-
-/*
- * Structure for storing CPU registers on the save area.
- */
-#define SL_SP		0
-#define SL_PC		4
-#define SL_MSR		8
-#define SL_TCR		0xc
-#define SL_SPRG0	0x10
-#define SL_SPRG1	0x14
-#define SL_SPRG2	0x18
-#define SL_SPRG3	0x1c
-#define SL_SPRG4	0x20
-#define SL_SPRG5	0x24
-#define SL_SPRG6	0x28
-#define SL_SPRG7	0x2c
-#define SL_TBU		0x30
-#define SL_TBL		0x34
-#define SL_R2		0x38
-#define SL_CR		0x3c
-#define SL_LR		0x40
-#define SL_R12		0x44	/* r12 to r31 */
-#define SL_SIZE		(SL_R12 + 80)
-
-	.section .data
-	.align	5
-
-_GLOBAL(swsusp_save_area)
-	.space	SL_SIZE
-
+#include <asm/fsl_sleep.h>
 
 	.section .text
 	.align	5
 
 _GLOBAL(swsusp_arch_suspend)
-	lis	r11,swsusp_save_area@h
-	ori	r11,r11,swsusp_save_area@l
-
-	mflr	r0
-	stw	r0,SL_LR(r11)
-	mfcr	r0
-	stw	r0,SL_CR(r11)
-	stw	r1,SL_SP(r11)
-	stw	r2,SL_R2(r11)
-	stmw	r12,SL_R12(r11)
-
-	/* Save MSR & TCR */
-	mfmsr	r4
-	stw	r4,SL_MSR(r11)
-	mfspr	r4,SPRN_TCR
-	stw	r4,SL_TCR(r11)
-
-	/* Get a stable timebase and save it */
-1:	mfspr	r4,SPRN_TBRU
-	stw	r4,SL_TBU(r11)
-	mfspr	r5,SPRN_TBRL
-	stw	r5,SL_TBL(r11)
-	mfspr	r3,SPRN_TBRU
-	cmpw	r3,r4
-	bne	1b
+	mflr	r15
+	lis	r3, core_registers_save_area@h
+	ori	r3, r3, core_registers_save_area@l
+
+	/* Save base register */
+	li	r4, 0
+	bl	fsl_cpu_state_save
 
-	/* Save SPRGs */
-	mfspr	r4,SPRN_SPRG0
-	stw	r4,SL_SPRG0(r11)
-	mfspr	r4,SPRN_SPRG1
-	stw	r4,SL_SPRG1(r11)
-	mfspr	r4,SPRN_SPRG2
-	stw	r4,SL_SPRG2(r11)
-	mfspr	r4,SPRN_SPRG3
-	stw	r4,SL_SPRG3(r11)
-	mfspr	r4,SPRN_SPRG4
-	stw	r4,SL_SPRG4(r11)
-	mfspr	r4,SPRN_SPRG5
-	stw	r4,SL_SPRG5(r11)
-	mfspr	r4,SPRN_SPRG6
-	stw	r4,SL_SPRG6(r11)
-	mfspr	r4,SPRN_SPRG7
-	stw	r4,SL_SPRG7(r11)
+	/* Save LR */
+	lis	r3, core_registers_save_area@h
+	ori	r3, r3, core_registers_save_area@l
+	stw	r15, SR_LR(r3)
 
 	/* Call the low level suspend stuff (we should probably have made
 	 * a stackframe...
@@ -97,11 +33,12 @@ _GLOBAL(swsusp_arch_suspend)
 	bl	swsusp_save
 
 	/* Restore LR from the save area */
-	lis	r11,swsusp_save_area@h
-	ori	r11,r11,swsusp_save_area@l
-	lwz	r0,SL_LR(r11)
-	mtlr	r0
+	lis	r3, core_registers_save_area@h
+	ori	r3, r3, core_registers_save_area@l
+	lwz	r15, SR_LR(r3)
+	mtlr	r15
 
+	li	r3, 0
 	blr
 
 _GLOBAL(swsusp_arch_resume)
@@ -138,9 +75,6 @@ _GLOBAL(swsusp_arch_resume)
 	bl flush_dcache_L1
 	bl flush_instruction_cache
 
-	lis	r11,swsusp_save_area@h
-	ori	r11,r11,swsusp_save_area@l
-
 	/*
 	 * Mappings from virtual addresses to physical addresses may be
 	 * different than they were prior to restoring hibernation state. 
@@ -149,53 +83,12 @@ _GLOBAL(swsusp_arch_resume)
 	 */
 	bl	_tlbil_all
 
-	lwz	r4,SL_SPRG0(r11)
-	mtspr	SPRN_SPRG0,r4
-	lwz	r4,SL_SPRG1(r11)
-	mtspr	SPRN_SPRG1,r4
-	lwz	r4,SL_SPRG2(r11)
-	mtspr	SPRN_SPRG2,r4
-	lwz	r4,SL_SPRG3(r11)
-	mtspr	SPRN_SPRG3,r4
-	lwz	r4,SL_SPRG4(r11)
-	mtspr	SPRN_SPRG4,r4
-	lwz	r4,SL_SPRG5(r11)
-	mtspr	SPRN_SPRG5,r4
-	lwz	r4,SL_SPRG6(r11)
-	mtspr	SPRN_SPRG6,r4
-	lwz	r4,SL_SPRG7(r11)
-	mtspr	SPRN_SPRG7,r4
-
-	/* restore the MSR */
-	lwz	r3,SL_MSR(r11)
-	mtmsr	r3
-
-	/* Restore TB */
-	li	r3,0
-	mtspr	SPRN_TBWL,r3
-	lwz	r3,SL_TBU(r11)
-	lwz	r4,SL_TBL(r11)
-	mtspr	SPRN_TBWU,r3
-	mtspr	SPRN_TBWL,r4
-
-	/* Restore TCR and clear any pending bits in TSR. */
-	lwz	r4,SL_TCR(r11)
-	mtspr	SPRN_TCR,r4
-	lis	r4, (TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS)@h
-	mtspr	SPRN_TSR,r4
-
-	/* Kick decrementer */
-	li	r0,1
-	mtdec	r0
-
-	/* Restore the callee-saved registers and return */
-	lwz	r0,SL_CR(r11)
-	mtcr	r0
-	lwz	r2,SL_R2(r11)
-	lmw	r12,SL_R12(r11)
-	lwz	r1,SL_SP(r11)
-	lwz	r0,SL_LR(r11)
-	mtlr	r0
+	lis	r3, core_registers_save_area@h
+	ori	r3, r3, core_registers_save_area@l
+
+	/* Restore base register */
+	li	r4, 0
+	bl	fsl_cpu_state_restore
 
 	li	r3,0
 	blr
-- 
1.8.5
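
For readers unfamiliar with the "stable timebase" loop that this patch
removes from swsusp_booke.S, the idea can be sketched in C. The register
reads are replaced by hypothetical callbacks (struct tb_source), and
struct fake_tb is a made-up simulated counter used only to exercise the
retry path; neither exists in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callbacks standing in for mfspr SPRN_TBRU / SPRN_TBRL. */
struct tb_source {
	uint32_t (*read_upper)(void *ctx);
	uint32_t (*read_lower)(void *ctx);
	void *ctx;
};

/*
 * Read upper, lower, then upper again. If the upper half changed, the
 * lower half wrapped between the reads, so the pair is inconsistent and
 * the read is retried.
 */
static uint64_t read_timebase_stable(const struct tb_source *tb)
{
	uint32_t upper, lower, check;

	assert(tb && tb->read_upper && tb->read_lower);

	do {
		upper = tb->read_upper(tb->ctx);
		lower = tb->read_lower(tb->ctx);
		check = tb->read_upper(tb->ctx);
	} while (check != upper);

	return ((uint64_t)upper << 32) | lower;
}

/* Simulated 64-bit counter that advances by 'step' on every register read. */
struct fake_tb {
	uint64_t now;
	uint64_t step;
};

static uint32_t fake_upper(void *ctx)
{
	struct fake_tb *f = ctx;
	uint64_t v = f->now;

	f->now += f->step;
	return (uint32_t)(v >> 32);
}

static uint32_t fake_lower(void *ctx)
{
	struct fake_tb *f = ctx;
	uint64_t v = f->now;

	f->now += f->step;
	return (uint32_t)v;
}
```

Starting the simulated counter just below a 32-bit boundary forces one
retry; without the second read of the upper half, the combined value
would be off by roughly 2^32 ticks.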


* [PATCH 2/3] powerpc/85xx: Provide two functions to save/restore the core registers
From: Dongsheng Wang @ 2014-01-14  7:59 UTC (permalink / raw)
  To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
In-Reply-To: <1389686397-46555-1-git-send-email-dongsheng.wang@freescale.com>

From: Wang Dongsheng <dongsheng.wang@freescale.com>

Add fsl_cpu_state_save/fsl_cpu_state_restore functions, used by deep
sleep and hibernation to save/restore core registers. The save/restore
code is factored out so that the various modules can share it instead
of each maintaining its own copy.

Currently supported processor types are E6500, E5500, E500MC, E500v2 and
E500v1.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/include/asm/fsl_sleep.h b/arch/powerpc/include/asm/fsl_sleep.h
new file mode 100644
index 0000000..31c8a9b
--- /dev/null
+++ b/arch/powerpc/include/asm/fsl_sleep.h
@@ -0,0 +1,98 @@
+/*
+ * Freescale 85xx Power management set
+ *
+ * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_FSL_SLEEP_H
+#define __ASM_FSL_SLEEP_H
+
+/*
+ * Freescale 85xx Core registers set, core register map definition
+ * Addresses are relative to r3. The layout must work for both 32-bit and
+ * 64-bit, so every slot is 64 bits (a double word) wide.
+ *
+ * Acronyms:
+ *	dw(data width)	0x08
+ *
+ * Map:
+ * General-Purpose Registers
+ *	GPR1(sp)		0
+ *	GPR2			0x8		(dw * 1)
+ *	GPR13 - GPR31		0x10 ~ 0xa0	(dw * 2 ~ dw * 20)
+ * Floating-point registers
+ *	FPR14 - FPR31		0xa8 ~ 0x130	(dw * 21 ~ dw * 38)
+ * Registers for Branch Operations
+ *	CR			0x138		(dw * 39)
+ *	LR			0x140		(dw * 40)
+ * Processor Control Registers
+ *	MSR			0x148		(dw * 41)
+ *	EPCR			0x150		(dw * 42)
+ *
+ *	Only e500, e500v2 need to save HID0 - HID1
+ *	HID0 - HID1		0x158 ~ 0x160 (dw * 43 ~ dw * 44)
+ * Timer Registers
+ *	TCR			0x168		(dw * 45)
+ *	TB(64bit)		0x170		(dw * 46)
+ *	TBU(32bit)		0x178		(dw * 47)
+ *	TBL(32bit)		0x180		(dw * 48)
+ * Interrupt Registers
+ *	IVPR			0x188		(dw * 49)
+ *	IVOR0 - IVOR15		0x190 ~ 0x208	(dw * 50 ~ dw * 65)
+ *	IVOR32 - IVOR41		0x210 ~ 0x258	(dw * 66 ~ dw * 75)
+ * Software-Use Registers
+ *	SPRG1			0x260		(dw * 76), 64-bit need to save.
+ *	SPRG3			0x268		(dw * 77), 32-bit need to save.
+ * MMU Registers
+ *	PID0 - PID2		0x270 ~ 0x280	(dw * 78 ~ dw * 80)
+ * Debug Registers
+ *	DBCR0 - DBCR2		0x288 ~ 0x298	(dw * 81 ~ dw * 83)
+ *	IAC1 - IAC4		0x2a0 ~ 0x2b8	(dw * 84 ~ dw * 87)
+ *	DAC1 - DAC2		0x2c0 ~ 0x2c8	(dw * 88 ~ dw * 89)
+ *
+ */
+
+#define SR_GPR1			0x000
+#define SR_GPR2			0x008
+#define SR_GPR13		0x010
+#define SR_FPR14		0x0a8
+#define SR_CR			0x138
+#define SR_LR			0x140
+#define SR_MSR			0x148
+#define SR_EPCR			0x150
+#define SR_HID0			0x158
+#define SR_TCR			0x168
+#define SR_TB			0x170
+#define SR_TBU			0x178
+#define SR_TBL			0x180
+#define SR_IVPR			0x188
+#define SR_IVOR0		0x190
+#define SR_IVOR32		0x210
+#define SR_SPRG1		0x260
+#define SR_SPRG3		0x268
+#define SR_PID0			0x270
+#define SR_DBCR0		0x288
+#define SR_IAC1			0x2a0
+#define SR_DAC1			0x2c0
+#define FSL_CPU_SR_SIZE		(SR_DAC1 + 0x10)
+
+#ifndef __ASSEMBLY__
+
+enum core_save_type {
+	BASE_SAVE = 0,
+	ALL_SAVE = 1,
+};
+
+extern int fsl_cpu_state_save(void *save_page, enum core_save_type type);
+extern int fsl_cpu_state_restore(void *restore_page, enum core_save_type type);
+
+#endif
+
+#endif
+
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 25cebe7..650a01c 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -4,6 +4,7 @@
 obj-$(CONFIG_SMP) += smp.o
 
 obj-y += common.o
+obj-y += save-core.o
 
 obj-$(CONFIG_BSC9131_RDB) += bsc913x_rdb.o
 obj-$(CONFIG_C293_PCIE)   += c293pcie.o
diff --git a/arch/powerpc/platforms/85xx/save-core.S b/arch/powerpc/platforms/85xx/save-core.S
new file mode 100644
index 0000000..a6b93b8
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/save-core.S
@@ -0,0 +1,497 @@
+/*
+ * Freescale Power Management, Save/Restore core state
+ *
+ * Copyright 2014 Freescale Semiconductor, Inc.
+ * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/fsl_sleep.h>
+
+/*
+ * Freescale 85xx Cores
+ * Support Core List:
+ * E500v1, E500v2, E500MC, E5500, E6500.
+ */
+
+ /*
+ * Save/Restore register operation define
+ */
+#define LOAD_SAVE_ADDRESS	\
+	mr	r10, r3
+
+#ifdef CONFIG_PPC64
+#define PPC_STD(sreg, offset, areg) \
+	std	sreg, (offset)(areg)
+#define PPC_LD(lreg, offset, areg) \
+	ld	lreg, (offset)(areg)
+
+#define PPC_STFD(sreg, offset, areg) \
+	stfd	sreg, (offset)(areg)
+#define PPC_LFD(lreg, offset, areg) \
+	lfd	lreg, (offset)(areg)
+#else
+#define PPC_STD(sreg, offset, areg) \
+	stw	sreg, (offset)(areg)
+#define PPC_LD(lreg, offset, areg) \
+	lwz	lreg, (offset)(areg)
+
+#define PPC_STFD(sreg, offset, areg) \
+	stfs	sreg, (offset)(areg)
+#define PPC_LFD(lreg, offset, areg) \
+	lfs	lreg, (offset)(areg)
+#endif
+
+#define do_save_gpr_reg(reg, addr) \
+	mr	r0, reg ;\
+	PPC_STD(r0, addr, r10)
+
+#define do_restore_gpr_reg(reg, addr) \
+	PPC_LD(r0, addr, r10) ;\
+	mr	reg, r0
+
+#define do_save_fpr_reg(reg, addr) \
+	fmr	fr0, reg ;\
+	PPC_STFD(fr0, addr, r10)
+
+#define do_restore_fpr_reg(reg, addr) \
+	PPC_LFD(fr0, addr, r10) ;\
+	fmr	reg, fr0
+
+#define do_save_spr_reg(reg, addr) \
+	mfspr	r0, SPRN_##reg ;\
+	PPC_STD(r0, addr, r10)
+
+#define do_restore_spr_reg(reg, addr) \
+	PPC_LD(r0, addr, r10) ;\
+	mtspr	SPRN_##reg, r0
+
+#define do_save_special_reg(special, addr) \
+	mf##special	r0 ;\
+	PPC_STD(r0, addr, r10)
+#define do_restore_special_reg(special, addr) \
+	PPC_LD(r0, addr, r10) ;\
+	mt##special	r0
+
+#define do_sr_general_gpr_regs(func) \
+	do_##func##_gpr_reg(r1, SR_GPR1) ;\
+	do_##func##_gpr_reg(r2, SR_GPR2) ;\
+	do_##func##_gpr_reg(r13, SR_GPR13 + 0x00) ;\
+	do_##func##_gpr_reg(r14, SR_GPR13 + 0x08) ;\
+	do_##func##_gpr_reg(r15, SR_GPR13 + 0x10) ;\
+	do_##func##_gpr_reg(r16, SR_GPR13 + 0x18) ;\
+	do_##func##_gpr_reg(r17, SR_GPR13 + 0x20) ;\
+	do_##func##_gpr_reg(r18, SR_GPR13 + 0x28) ;\
+	do_##func##_gpr_reg(r19, SR_GPR13 + 0x30) ;\
+	do_##func##_gpr_reg(r20, SR_GPR13 + 0x38) ;\
+	do_##func##_gpr_reg(r21, SR_GPR13 + 0x40) ;\
+	do_##func##_gpr_reg(r22, SR_GPR13 + 0x48) ;\
+	do_##func##_gpr_reg(r23, SR_GPR13 + 0x50) ;\
+	do_##func##_gpr_reg(r24, SR_GPR13 + 0x58) ;\
+	do_##func##_gpr_reg(r25, SR_GPR13 + 0x60) ;\
+	do_##func##_gpr_reg(r26, SR_GPR13 + 0x68) ;\
+	do_##func##_gpr_reg(r27, SR_GPR13 + 0x70) ;\
+	do_##func##_gpr_reg(r28, SR_GPR13 + 0x78) ;\
+	do_##func##_gpr_reg(r29, SR_GPR13 + 0x80) ;\
+	do_##func##_gpr_reg(r30, SR_GPR13 + 0x88) ;\
+	do_##func##_gpr_reg(r31, SR_GPR13 + 0x90)
+
+#define do_sr_fpr_regs(func) \
+	do_##func##_fpr_reg(fr14, SR_FPR14 + 0x00) ;\
+	do_##func##_fpr_reg(fr15, SR_FPR14 + 0x08) ;\
+	do_##func##_fpr_reg(fr16, SR_FPR14 + 0x10) ;\
+	do_##func##_fpr_reg(fr17, SR_FPR14 + 0x18) ;\
+	do_##func##_fpr_reg(fr18, SR_FPR14 + 0x20) ;\
+	do_##func##_fpr_reg(fr19, SR_FPR14 + 0x28) ;\
+	do_##func##_fpr_reg(fr20, SR_FPR14 + 0x30) ;\
+	do_##func##_fpr_reg(fr21, SR_FPR14 + 0x38) ;\
+	do_##func##_fpr_reg(fr22, SR_FPR14 + 0x40) ;\
+	do_##func##_fpr_reg(fr23, SR_FPR14 + 0x48) ;\
+	do_##func##_fpr_reg(fr24, SR_FPR14 + 0x50) ;\
+	do_##func##_fpr_reg(fr25, SR_FPR14 + 0x58) ;\
+	do_##func##_fpr_reg(fr26, SR_FPR14 + 0x60) ;\
+	do_##func##_fpr_reg(fr27, SR_FPR14 + 0x68) ;\
+	do_##func##_fpr_reg(fr28, SR_FPR14 + 0x70) ;\
+	do_##func##_fpr_reg(fr29, SR_FPR14 + 0x78) ;\
+	do_##func##_fpr_reg(fr30, SR_FPR14 + 0x80) ;\
+	do_##func##_fpr_reg(fr31, SR_FPR14 + 0x88)
+
+#define do_sr_general_branch_regs(func) \
+	do_##func##_special_reg(CR, SR_CR)
+
+#define do_sr_general_pcr_regs(func) \
+	do_##func##_special_reg(MSR, SR_MSR) ;\
+	do_##func##_spr_reg(EPCR, SR_EPCR) ;\
+	do_##func##_spr_reg(HID0, SR_HID0 + 0x00)
+
+#define do_sr_e500_pcr_regs(func) \
+	do_##func##_spr_reg(HID1, SR_HID0 + 0x08)
+
+#define do_sr_save_tb_regs \
+	do_save_spr_reg(TBRU, SR_TBU) ;\
+	do_save_spr_reg(TBRL, SR_TBL)
+
+#define do_sr_restore_tb_regs \
+	do_restore_spr_reg(TBWU, SR_TBU) ;\
+	do_restore_spr_reg(TBWL, SR_TBL)
+
+#define do_sr_general_time_regs(func) \
+	do_sr_##func##_tb_regs	;\
+	do_##func##_spr_reg(TCR, SR_TCR)
+
+#define do_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVPR, SR_IVPR) ;\
+	do_##func##_spr_reg(IVOR0, SR_IVOR0 + 0x00) ;\
+	do_##func##_spr_reg(IVOR1, SR_IVOR0 + 0x08) ;\
+	do_##func##_spr_reg(IVOR2, SR_IVOR0 + 0x10) ;\
+	do_##func##_spr_reg(IVOR3, SR_IVOR0 + 0x18) ;\
+	do_##func##_spr_reg(IVOR4, SR_IVOR0 + 0x20) ;\
+	do_##func##_spr_reg(IVOR5, SR_IVOR0 + 0x28) ;\
+	do_##func##_spr_reg(IVOR6, SR_IVOR0 + 0x30) ;\
+	do_##func##_spr_reg(IVOR7, SR_IVOR0 + 0x38) ;\
+	do_##func##_spr_reg(IVOR8, SR_IVOR0 + 0x40) ;\
+	do_##func##_spr_reg(IVOR10, SR_IVOR0 + 0x50) ;\
+	do_##func##_spr_reg(IVOR11, SR_IVOR0 + 0x58) ;\
+	do_##func##_spr_reg(IVOR12, SR_IVOR0 + 0x60) ;\
+	do_##func##_spr_reg(IVOR13, SR_IVOR0 + 0x68) ;\
+	do_##func##_spr_reg(IVOR14, SR_IVOR0 + 0x70) ;\
+	do_##func##_spr_reg(IVOR15, SR_IVOR0 + 0x78)
+
+#define do_e6500_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+	do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
+	do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
+	do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+	do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+	do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+	do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+	do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+	do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+	do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_e5500_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+	do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+	do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+	do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+	do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+	do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+	do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+	do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_e500_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
+	do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
+	do_##func##_spr_reg(IVOR34, SR_IVOR32 + 0x10)
+
+#define do_e500mc_sr_interrupt_regs(func) \
+	do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+	do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+	do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+	do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+	do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+	do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+	do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+	do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_sr_general_software_regs(func) \
+	do_##func##_spr_reg(SPRG1, SR_SPRG1) ;\
+	do_##func##_spr_reg(SPRG3, SR_SPRG3)
+
+#define do_sr_general_mmu_regs(func) \
+	do_##func##_spr_reg(PID0, SR_PID0 + 0x00)
+
+#define do_sr_e500_mmu_regs(func) \
+	do_##func##_spr_reg(PID1, SR_PID0 + 0x08) ;\
+	do_##func##_spr_reg(PID2, SR_PID0 + 0x10)
+
+#define do_sr_debug_regs(func) \
+	do_##func##_spr_reg(DBCR0, SR_DBCR0 + 0x00) ;\
+	do_##func##_spr_reg(DBCR1, SR_DBCR0 + 0x08) ;\
+	do_##func##_spr_reg(DBCR2, SR_DBCR0 + 0x10) ;\
+	do_##func##_spr_reg(IAC1, SR_IAC1 + 0x00) ;\
+	do_##func##_spr_reg(IAC2, SR_IAC1 + 0x08) ;\
+	do_##func##_spr_reg(DAC1, SR_DAC1 + 0x00) ;\
+	do_##func##_spr_reg(DAC2, SR_DAC1 + 0x08)
+
+#define do_e6500_sr_debug_regs(func) \
+	do_##func##_spr_reg(IAC3, SR_IAC1 + 0x10) ;\
+	do_##func##_spr_reg(IAC4, SR_IAC1 + 0x18)
+
+/*
+ * Freescale 85xx Cores, Save/Restore core registers.
+ */
+_GLOBAL(core_registers_save_area)
+	.space FSL_CPU_SR_SIZE
+
+	.section .text
+	.align	5
+_GLOBAL(fsl_cpu_base_save)
+	do_sr_general_gpr_regs(save)
+	do_sr_general_branch_regs(save)
+	do_sr_general_pcr_regs(save)
+	do_sr_general_software_regs(save)
+	do_sr_general_mmu_regs(save)
+
+	/*
+	 * Need to save float-point registers if MSR[FP] = 1.
+	 */
+	mfmsr	r12
+	andi.	r12, r12, MSR_FP
+	beq	1f
+	do_sr_fpr_regs(save)
+
+1:
+	mfspr	r5, SPRN_TBRU
+	do_sr_general_time_regs(save)
+	mfspr	r6, SPRN_TBRU
+	cmpw	r5, r6
+	bne	1b
+
+	blr
+
+_GLOBAL(fsl_cpu_base_restore)
+	do_sr_general_gpr_regs(restore)
+	do_sr_general_branch_regs(restore)
+	do_sr_general_pcr_regs(restore)
+	do_sr_general_software_regs(restore)
+	do_sr_general_mmu_regs(restore)
+
+	isync
+
+	/*
+	 * Need to restore float-point registers if MSR[FP] = 1.
+	 */
+	mfmsr	r12
+	andi.	r12, r12, MSR_FP
+	beq	1f
+	do_sr_fpr_regs(restore)
+
+1:
+	/* Restore Time registers */
+	/* clear tb lower to avoid wrap */
+	li	r0, 0
+	mtspr	SPRN_TBWL, r0
+	do_sr_general_time_regs(restore)
+
+	lis	r0, (TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS)@h
+	mtspr	SPRN_TSR, r0
+
+	/* Kick decrementer */
+	li	r0, 1
+	mtdec	r0
+
+	blr
+
+/* Base registers, e500v1, e500v2 need to do some special save/restore */
+_GLOBAL(e500_base_special_save)
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V1@l
+	cmpw	r11, r12
+	beq	500f
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V2@l
+	cmpw	r11, r12
+	bne	1f
+
+500:
+	do_sr_e500_pcr_regs(save)
+	do_sr_e500_mmu_regs(save)
+1:
+	blr
+
+_GLOBAL(e500_base_special_restore)
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V1@l
+	cmpw	r11, r12
+	beq	500f
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V2@l
+	cmpw	r11, r12
+	bne	1f
+
+500:
+	do_sr_e500_pcr_regs(restore)
+	do_sr_e500_mmu_regs(restore)
+1:
+	blr
+
+_GLOBAL(fsl_cpu_append_save)
+	mfspr	r0, SPRN_PVR
+	rlwinm	r11, r0, 16, 16, 31
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E6500@l
+	cmpw	r11, r12
+	beq	e6500_append_save
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E5500@l
+	cmpw	r11, r12
+	beq	e5500_append_save
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500MC@l
+	cmpw	r11, r12
+	beq	e500mc_append_save
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V2@l
+	cmpw	r11, r12
+	beq	e500v2_append_save
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V1@l
+	cmpw	r11, r12
+	beq	e500v1_append_save
+
+	b	1f
+
+e6500_append_save:
+	do_e6500_sr_interrupt_regs(save)
+	do_e6500_sr_debug_regs(save)
+	b	1f
+
+e5500_append_save:
+	do_e5500_sr_interrupt_regs(save)
+	b	1f
+
+e500mc_append_save:
+	do_e500mc_sr_interrupt_regs(save)
+	b	1f
+
+e500v2_append_save:
+e500v1_append_save:
+	do_e500_sr_interrupt_regs(save)
+
+1:
+	do_sr_interrupt_regs(save)
+	do_sr_debug_regs(save)
+
+	blr
+
+_GLOBAL(fsl_cpu_append_restore)
+	mfspr	r0, SPRN_PVR
+	rlwinm	r11, r0, 16, 16, 31
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E6500@l
+	cmpw	r11, r12
+	beq	e6500_append_restore
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E5500@l
+	cmpw	r11, r12
+	beq	e5500_append_restore
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500MC@l
+	cmpw	r11, r12
+	beq	e500mc_append_restore
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V2@l
+	cmpw	r11, r12
+	beq	e500v2_append_restore
+
+	lis	r12, 0
+	ori	r12, r12, PVR_VER_E500V1@l
+	cmpw	r11, r12
+	beq	e500v1_append_restore
+
+	b	1f
+
+e6500_append_restore:
+	do_e6500_sr_interrupt_regs(restore)
+	do_e6500_sr_debug_regs(restore)
+	b	1f
+
+e5500_append_restore:
+	do_e5500_sr_interrupt_regs(restore)
+	b	1f
+
+e500mc_append_restore:
+	do_e500mc_sr_interrupt_regs(restore)
+	b	1f
+
+e500v2_append_restore:
+e500v1_append_restore:
+	do_e500_sr_interrupt_regs(restore)
+
+1:
+	do_sr_interrupt_regs(restore)
+	do_sr_debug_regs(restore)
+
+	sync
+
+	blr
+
+/*
+ * r3 = the virtual address of buffer
+ * r4 = suspend type, 0-BASE_SAVE, 1-ALL_SAVE
+ */
+_GLOBAL(fsl_cpu_state_save)
+	mflr	r9
+	LOAD_SAVE_ADDRESS
+
+	/* save the return address to SR_LR */
+	do_save_gpr_reg(r9, SR_LR)
+
+	/* if core_save_type is BASE_SAVE, goto 1f */
+	cmpwi	r4, 0
+	beq	1f
+
+	bl	fsl_cpu_append_save
+
+1:
+	bl	e500_base_special_save
+
+	bl	fsl_cpu_base_save
+
+	li	r3, 0
+	mtlr	r9
+	blr
+
+/*
+ * r3 = the virtual address of buffer
+ * r4 = suspend type, 0-BASE_SAVE, 1-ALL_SAVE
+ */
+_GLOBAL(fsl_cpu_state_restore)
+	mflr	r9
+	LOAD_SAVE_ADDRESS
+
+	/*
+	 * Disable machine checks and critical exceptions,
+	 * if core_save_type is ALL_SAVE, we will restore interrupt
+	 * IVORs registers.
+	 */
+	mfmsr	r5
+	rlwinm	r5, r5, 0, ~MSR_CE
+	rlwinm	r5, r5, 0, ~MSR_ME
+	mtmsr	r5
+	isync
+
+	/* if core_save_type is BASE_SAVE, goto 1f */
+	cmpwi	r4, 0
+	beq	1f
+
+	bl	fsl_cpu_append_restore
+
+1:
+	bl	e500_base_special_restore
+
+	bl	fsl_cpu_base_restore
+
+	/* return the return address of the save time */
+	do_restore_gpr_reg(r9, SR_LR)
+
+	li	r3, 0
+	mtlr	r9
+	blr
-- 
1.8.5
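
The register map in fsl_sleep.h above can be cross-checked mechanically.
The following is a small user-space sketch, not part of the patch: the
SR_* values are copied from the header, while the per-range register
counts are read off the comment block, and struct sr_range, sr_first_gap()
and sr_total_size() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define DW		0x08	/* one 64-bit slot per register */

/* Offsets copied from fsl_sleep.h in the patch. */
#define SR_GPR1		0x000
#define SR_GPR2		0x008
#define SR_GPR13	0x010
#define SR_FPR14	0x0a8
#define SR_CR		0x138
#define SR_LR		0x140
#define SR_MSR		0x148
#define SR_EPCR		0x150
#define SR_HID0		0x158
#define SR_TCR		0x168
#define SR_TB		0x170
#define SR_TBU		0x178
#define SR_TBL		0x180
#define SR_IVPR		0x188
#define SR_IVOR0	0x190
#define SR_IVOR32	0x210
#define SR_SPRG1	0x260
#define SR_SPRG3	0x268
#define SR_PID0		0x270
#define SR_DBCR0	0x288
#define SR_IAC1		0x2a0
#define SR_DAC1		0x2c0
#define FSL_CPU_SR_SIZE	(SR_DAC1 + 0x10)

/* One contiguous run of registers in the save area. */
struct sr_range {
	const char *name;
	unsigned int first_off;
	unsigned int count;
};

static const struct sr_range sr_map[] = {
	{ "GPR1", SR_GPR1, 1 },		{ "GPR2", SR_GPR2, 1 },
	{ "GPR13-31", SR_GPR13, 19 },	{ "FPR14-31", SR_FPR14, 18 },
	{ "CR", SR_CR, 1 },		{ "LR", SR_LR, 1 },
	{ "MSR", SR_MSR, 1 },		{ "EPCR", SR_EPCR, 1 },
	{ "HID0-1", SR_HID0, 2 },	{ "TCR", SR_TCR, 1 },
	{ "TB", SR_TB, 1 },		{ "TBU", SR_TBU, 1 },
	{ "TBL", SR_TBL, 1 },		{ "IVPR", SR_IVPR, 1 },
	{ "IVOR0-15", SR_IVOR0, 16 },	{ "IVOR32-41", SR_IVOR32, 10 },
	{ "SPRG1", SR_SPRG1, 1 },	{ "SPRG3", SR_SPRG3, 1 },
	{ "PID0-2", SR_PID0, 3 },	{ "DBCR0-2", SR_DBCR0, 3 },
	{ "IAC1-4", SR_IAC1, 4 },	{ "DAC1-2", SR_DAC1, 2 },
};

#define SR_MAP_LEN (sizeof(sr_map) / sizeof(sr_map[0]))

/* Index of the first range that does not start where the previous one
 * ended, or -1 if the whole map is contiguous. */
static int sr_first_gap(void)
{
	unsigned int expected = 0;
	size_t i;

	for (i = 0; i < SR_MAP_LEN; i++) {
		if (sr_map[i].first_off != expected)
			return (int)i;
		expected += sr_map[i].count * DW;
	}
	return -1;
}

/* Total bytes covered by the map. */
static unsigned int sr_total_size(void)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < SR_MAP_LEN; i++)
		total += sr_map[i].count * DW;
	return total;
}
```

Under these counts the map is contiguous and its total matches
FSL_CPU_SR_SIZE, so the offsets in the header agree with the comment
block that documents them.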


* [PATCH 1/3] powerpc/fsl: add E500MC and E5500 PVR define
From: Dongsheng Wang @ 2014-01-14  7:59 UTC (permalink / raw)
  To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng

From: Wang Dongsheng <dongsheng.wang@freescale.com>

E500MC and E5500 PVR will be used in subsequent save/restore core
state patches.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 62b114e..cd7b630 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1075,6 +1075,8 @@
 #define PVR_8560	0x80200000
 #define PVR_VER_E500V1	0x8020
 #define PVR_VER_E500V2	0x8021
+#define PVR_VER_E500MC	0x8023
+#define PVR_VER_E5500	0x8024
 #define PVR_VER_E6500	0x8040
 
 /*
-- 
1.8.5
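
As a sketch of how these defines are meant to be used: the save/restore
code in patch 2/3 takes the upper 16 bits of the PVR (rlwinm r11, r0, 16,
16, 31) and compares them against the PVR_VER_* values. The C below
mirrors that comparison; enum fsl_core, pvr_version() and classify_pvr()
are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Version values as defined in asm/reg.h after this patch. */
#define PVR_VER_E500V1	0x8020
#define PVR_VER_E500V2	0x8021
#define PVR_VER_E500MC	0x8023
#define PVR_VER_E5500	0x8024
#define PVR_VER_E6500	0x8040

/* Hypothetical core identifiers for this sketch. */
enum fsl_core {
	CORE_UNKNOWN,
	CORE_E500V1,
	CORE_E500V2,
	CORE_E500MC,
	CORE_E5500,
	CORE_E6500,
};

/* The version field is the upper halfword of the 32-bit PVR. */
static uint32_t pvr_version(uint32_t pvr)
{
	return pvr >> 16;
}

static enum fsl_core classify_pvr(uint32_t pvr)
{
	switch (pvr_version(pvr)) {
	case PVR_VER_E500V1:
		return CORE_E500V1;
	case PVR_VER_E500V2:
		return CORE_E500V2;
	case PVR_VER_E500MC:
		return CORE_E500MC;
	case PVR_VER_E5500:
		return CORE_E5500;
	case PVR_VER_E6500:
		return CORE_E6500;
	default:
		return CORE_UNKNOWN;
	}
}
```

For example, the PVR_8560 value 0x80200000 visible in the diff context
has version field 0x8020 and is classified as an e500v1 core.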


* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Deepthi Dharwar @ 2014-01-14  8:00 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-pm, daniel.lezcano, rjw, linux-kernel, Preeti U Murthy,
	paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>

On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> In spite of this, it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, it's possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
> 
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that won't get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...

It was the snooze loop earlier, but we later changed it to cede in commit
363edbe2614 ("powerpc: Default arch idle will cede the processor on
pseries") to address the following regressions:

>>snippet from the patch.
When adding cpuidle support to pSeries, we introduced two
    regressions:

      - The new cpuidle backend driver only works under hypervisors
        supporting the "SLPLAR" option, which isn't the case of the
        old POWER4 hypervisor and the HV "light" used on js2x blades

      - The cpuidle driver registers fairly late, meaning that for
        a significant portion of the boot process, we end up having
        all threads spinning. This slows down the boot process and
        increases the overall resource usage if the hypervisor has
        shared processors.

    This fixes both by implementing a "default" idle that will cede
    to the hypervisor when possible, in a very simple way without
    all the bells and whistles of cpuidle.

Regards,
Deepthi


>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> ---
>>
>>  drivers/cpuidle/cpuidle.c        |    6 +++++-
>>  drivers/cpuidle/governors/menu.c |    7 ++++---
>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..5bf06bb 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>>  	/* ask the governor for the next state */
>>  	next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> +	dev->last_residency = 0;
>>  	if (need_resched()) {
>> -		dev->last_residency = 0;
>>  		/* give the governor an opportunity to reflect on the outcome */
>>  		if (cpuidle_curr_governor->reflect)
>>  			cpuidle_curr_governor->reflect(dev, next_state);
> 
> The comments on top of the .reflect() routines of the governors say that the
> second parameter is the index of the actual state entered. But after this patch,
> next_state can be negative, indicating an invalid index. So those comments need
> to be updated accordingly.
> 
>> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>>  		return 0;
>>  	}
>>
>> +	if (next_state < 0)
>> +		return -EINVAL;
> 
> The exit path above (due to need_resched) returns with irqs enabled, but the new
> one you are adding (next_state < 0) returns with irqs disabled. This is correct,
> because in the latter case, "idle" is still in progress and the arch will choose
> a default handler to execute (unlike the former case where "idle" is over and
> hence it's time to enable interrupts).
> 
> IMHO it would be good to add comments around this code to explain this subtle
> difference. We can never be too careful with these things... ;-)
> 
>> +
>>  	trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>
>>  	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..6921543 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -283,6 +283,7 @@ again:
>>   * menu_select - selects the next idle state to enter
>>   * @drv: cpuidle driver containing state data
>>   * @dev: the CPU
>> + * Returns -1 when no idle state is suitable
>>   */
>>  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>  {
>> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>  	int multiplier;
>>  	struct timespec t;
>>
>> -	if (data->needs_update) {
>> +	if (data->last_state_idx >= 0 && data->needs_update) {
>                ^^^^^
> Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
> only when index >= 0.
> 
>>  		menu_update(drv, dev);
>>  		data->needs_update = 0;
>>  	}
>>
>> -	data->last_state_idx = 0;
>> +	data->last_state_idx = -1;
>>  	data->exit_us = 0;
>>
>>  	/* Special case when user has set very strict latency requirement */
>>  	if (unlikely(latency_req == 0))
>> -		return 0;
>> +		return data->last_state_idx;
>>
>>  	/* determine the expected residency time, round up */
>>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
>>
> 
> What about the ladder governor? I know it's not used that much in practice,
> but I think it would be good to update that as well, just to keep it
> consistent.
> 
> Regards,
> Srivatsa S. Bhat
> 

^ permalink raw reply

* Re: [PATCH] Move processing of MCE queued event out from syscall exit path.
From: Hugh Dickins @ 2014-01-14  7:47 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20140114042611.13145.6551.stgit@mars.in.ibm.com>

On Tue, 14 Jan 2014, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Hugh Dickins reported that b5ff4211a829
> "powerpc/book3s: Queue up and process delayed MCE events" breaks the
> PowerMac G5 boot. This patch fixes it by moving the MCE event processing
> away from syscall exit, which was the wrong place for it to begin with, and
> implements a different mechanism that uses a paca flag and the
> decrementer interrupt to process the event.
> 
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Good work guys: I can happily report that both this rework,
and Ben's one-liner, fix the issue for me on the G5: thank you.

(Irrelevant not-so-happy detail: I very nearly mailed you an hour or
so earlier to report that neither fixed it; but I retried my original
CONFIG_PPC_POWERNV hack afterwards, and found that now equally useless.

I did write of changing behaviour and ATA errors: it now appears that's
an independent but intermittent issue on the G5 in 3.13-rc7-mm1, which
coincidentally happened to trigger when I tested rc7-mm1 without fixes,
but not when I tested with my hack, until today.

I've gone back to testing on rc6-mm1, the previous week's mmotm,
which showed failure to run /sbin/init: rc6-mm1 has no trouble mounting
root, and it runs properly with your new patch, and with Ben's patch.

And I may be quite wrong to point a finger at ATA errors: perhaps
they're always shown, and quickly cleared off screen in successful boots,
but left visible when root cannot be mounted for some other reason.

I don't know, and won't have time to investigate further - bisecting
intermittents is not much fun!  I'll just have to hope that it's
sorted out before it reaches 3.14-rc, or else bite the bullet and
investigate on that.)

Hugh

> ---
>  arch/powerpc/include/asm/mce.h  |    3 +++
>  arch/powerpc/include/asm/paca.h |    3 +++
>  arch/powerpc/kernel/entry_64.S  |    5 -----
>  arch/powerpc/kernel/irq.c       |   11 ++++++++++-
>  arch/powerpc/kernel/mce.c       |    7 +++++++
>  arch/powerpc/kernel/time.c      |    9 +++++++++
>  6 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
> index 2257d1e..225e678 100644
> --- a/arch/powerpc/include/asm/mce.h
> +++ b/arch/powerpc/include/asm/mce.h
> @@ -186,6 +186,9 @@ struct mce_error_info {
>  #define MCE_EVENT_RELEASE	true
>  #define MCE_EVENT_DONTRELEASE	false
>  
> +/* MCE bit flags (paca.mce_flags) */
> +#define MCE_EVENT_PENDING	0x0001
> +
>  extern void save_mce_event(struct pt_regs *regs, long handled,
>  			   struct mce_error_info *mce_err, uint64_t nip,
>  			   uint64_t addr);
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index c3523d1..f9aa521 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -141,6 +141,9 @@ struct paca_struct {
>  	u8 io_sync;			/* writel() needs spin_unlock sync */
>  	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
>  	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	u8 mce_flags;			/* MCE bit flags. */
> +#endif
>  	u64 sprg3;			/* Saved user-visible sprg */
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  	u64 tm_scratch;                 /* TM scratch area for reclaim */
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 770d6d6..bbfb029 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -184,11 +184,6 @@ syscall_exit:
>  	bl	.do_show_syscall_exit
>  	ld	r3,RESULT(r1)
>  #endif
> -#ifdef CONFIG_PPC_BOOK3S_64
> -BEGIN_FTR_SECTION
> -	bl	.machine_check_process_queued_event
> -END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
> -#endif
>  	CURRENT_THREAD_INFO(r12, r1)
>  
>  	ld	r8,_MSR(r1)
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index ba01656..e22f591 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -67,6 +67,7 @@
>  #include <asm/udbg.h>
>  #include <asm/smp.h>
>  #include <asm/debug.h>
> +#include <asm/mce.h>
>  
>  #ifdef CONFIG_PPC64
>  #include <asm/paca.h>
> @@ -158,9 +159,17 @@ notrace unsigned int __check_irq_replay(void)
>  	 * We may have missed a decrementer interrupt. We check the
>  	 * decrementer itself rather than the paca irq_happened field
>  	 * in case we also had a rollover while hard disabled
> +	 * Also check if any MCE event is queued up that requires
> +	 * processing. Machine check handler would set paca->mce_flags
> +	 * and then call set_dec(1) to trigger a decrementer interrupt
> +	 * from NMI.
>  	 */
>  	local_paca->irq_happened &= ~PACA_IRQ_DEC;
> -	if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
> +	if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow()
> +#ifdef CONFIG_PPC_BOOK3S_64
> +		|| local_paca->mce_flags & MCE_EVENT_PENDING
> +#endif
> +		)
>  		return 0x900;
>  
>  	/* Finally check if an external interrupt happened */
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index d6edf2b..7bab827 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -185,6 +185,13 @@ void machine_check_queue_event(void)
>  		return;
>  	}
>  	__get_cpu_var(mce_event_queue[index]) = evt;
> +
> +	/*
> +	 * Set the event pending flag and raise a decrementer interrupt
> +	 * to process the queued event later.
> +	 */
> +	local_paca->mce_flags |= MCE_EVENT_PENDING;
> +	set_dec(1);
>  }
>  
>  /*
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index b3b1441..87ccf92 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -69,6 +69,7 @@
>  #include <asm/vdso_datapage.h>
>  #include <asm/firmware.h>
>  #include <asm/cputime.h>
> +#include <asm/mce.h>
>  
>  /* powerpc clocksource/clockevent code */
>  
> @@ -505,6 +506,14 @@ void timer_interrupt(struct pt_regs * regs)
>  		return;
>  	}
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
> +	/* Check if we have MCE event pending for processing. */
> +	if (local_paca->mce_flags & MCE_EVENT_PENDING) {
> +		local_paca->mce_flags &= ~MCE_EVENT_PENDING;
> +		machine_check_process_queued_event();
> +	}
> +#endif
> +
>  	/* Conditionally hard-enable interrupts now that the DEC has been
>  	 * bumped to its maximum value
>  	 */

^ permalink raw reply

* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Srivatsa S. Bhat @ 2014-01-14  7:37 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
	linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>

On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> In spite of this, it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, it's possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
> 
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that won't get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...
> 

I checked the git history and found that the default idle was changed (on purpose)
to cede the processor, in order to speed up booting. Hmm...

commit 363edbe2614aa90df706c0f19ccfa2a6c06af0be
Author: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Date:   Fri Sep 6 00:25:06 2013 +0530

    powerpc: Default arch idle could cede processor on pseries


Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Srivatsa S. Bhat @ 2014-01-14  7:00 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
	linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140114060516.6109.14901.stgit@preeti.in.ibm.com>

On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
> In spite of this, it was observed that the idle state count of the shallowest
> idle state, snooze, was increasing.
> 
> This is because the governor returns the idle state index as 0 even in
> scenarios when no idle state can be chosen. These scenarios could be when the
> latency requirement is 0 or as mentioned above when the user wants to disable
> certain cpu idle states at runtime. In the latter case, it's possible that no
> cpu idle state is valid because the suitable states were disabled
> and the rest did not match the menu governor criteria to be chosen as the
> next idle state.
> 
> This patch adds the code to indicate that a valid cpu idle state could not be
> chosen by the menu governor and reports back to arch so that it can take some
> default action.
> 

That sounds fair enough. However, the "default" action of pseries idle loop
(pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
to 0 hoping to prevent the CPUs from going to deep idle states, but then the
machine would still end up going to Cede, even though that won't get reflected
in the idle state counts. IMHO that scenario needs some thought as well...
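
To make the concern concrete, here is a minimal sketch of the accounting gap.
It is not kernel code: every toy_* name is hypothetical, and the state
ordering (0 = snooze, 1 = cede) is only an illustrative assumption. It models
an idle pass in which the governor reports "no state" and the arch default
idle cedes anyway, without touching the per-state counts.

```c
#include <errno.h>

#define TOY_NR_STATES 2	/* illustrative: 0 = snooze, 1 = cede */

struct toy_cpu {
	unsigned long usage[TOY_NR_STATES];	/* per-state counts visible to the user */
	unsigned long default_cedes;		/* fallback entries, not counted per state */
};

/*
 * One pass through a simplified idle loop. select_ret stands in for the
 * governor's answer: a state index, or a negative value for "no state".
 */
int toy_idle_once(struct toy_cpu *cpu, int select_ret)
{
	if (select_ret < 0 || select_ret >= TOY_NR_STATES) {
		/* cpuidle declines; the arch default idle cedes instead. */
		cpu->default_cedes++;
		return -EINVAL;
	}

	cpu->usage[select_ret]++;
	return 0;
}
```

In this model a failed selection leaves usage[] untouched while
default_cedes grows, which is the "deeper than requested, yet invisible"
behaviour described above.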

> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---
> 
>  drivers/cpuidle/cpuidle.c        |    6 +++++-
>  drivers/cpuidle/governors/menu.c |    7 ++++---
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..5bf06bb 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
> 
>  	/* ask the governor for the next state */
>  	next_state = cpuidle_curr_governor->select(drv, dev);
> +
> +	dev->last_residency = 0;
>  	if (need_resched()) {
> -		dev->last_residency = 0;
>  		/* give the governor an opportunity to reflect on the outcome */
>  		if (cpuidle_curr_governor->reflect)
>  			cpuidle_curr_governor->reflect(dev, next_state);

The comments on top of the .reflect() routines of the governors say that the
second parameter is the index of the actual state entered. But after this patch,
next_state can be negative, indicating an invalid index. So those comments need
to be updated accordingly.

> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>  		return 0;
>  	}
> 
> +	if (next_state < 0)
> +		return -EINVAL;

The exit path above (due to need_resched) returns with irqs enabled, but the new
one you are adding (next_state < 0) returns with irqs disabled. This is correct,
because in the latter case, "idle" is still in progress and the arch will choose
a default handler to execute (unlike the former case where "idle" is over and
hence it's time to enable interrupts).

IMHO it would be good to add comments around this code to explain this subtle
difference. We can never be too careful with these things... ;-)
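
As a rough illustration of the kind of comments I mean, here is a standalone
model of the exit paths. It is a sketch, not the real cpuidle_idle_call():
the toy_* names are hypothetical, the interrupt state is a plain flag, and
the last path assumes that entering a state returns with irqs enabled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool toy_irqs_enabled;

bool toy_irqs_are_enabled(void)
{
	return toy_irqs_enabled;
}

/* Simplified model of the three ways out of the idle call. */
int toy_idle_call(bool resched_needed, int next_state)
{
	toy_irqs_enabled = false;	/* the idle loop calls in with irqs off */

	if (resched_needed) {
		/* Idle is over: interrupts are enabled before returning. */
		toy_irqs_enabled = true;
		return 0;
	}

	if (next_state < 0) {
		/*
		 * Idle is still in progress: return with interrupts
		 * disabled so the arch default handler can take over.
		 */
		return -EINVAL;
	}

	/* Model assumption: entering a state returns with irqs enabled. */
	toy_irqs_enabled = true;
	return 0;
}
```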

> +
>  	trace_cpu_idle_rcuidle(next_state, dev->cpu);
> 
>  	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..6921543 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -283,6 +283,7 @@ again:
>   * menu_select - selects the next idle state to enter
>   * @drv: cpuidle driver containing state data
>   * @dev: the CPU
> + * Returns -1 when no idle state is suitable
>   */
>  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  {
> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	int multiplier;
>  	struct timespec t;
> 
> -	if (data->needs_update) {
> +	if (data->last_state_idx >= 0 && data->needs_update) {
               ^^^^^
Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
only when index >= 0.
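
A minimal sketch of why the two guards agree, assuming the reflect step
behaves as just described. The toy_* names are hypothetical and this is not
the menu governor's code.

```c
struct toy_menu_data {
	int last_state_idx;
	int needs_update;
};

/* Assumed shape of the reflect step: only a valid index requests an update. */
void toy_reflect(struct toy_menu_data *data, int index)
{
	data->last_state_idx = index;
	if (index >= 0)
		data->needs_update = 1;
}

/* The guard as written in the patch. */
int toy_guard_with_index(const struct toy_menu_data *data)
{
	return data->last_state_idx >= 0 && data->needs_update;
}

/* The guard before the patch. */
int toy_guard_plain(const struct toy_menu_data *data)
{
	return data->needs_update;
}
```

Under that assumption, a negative index never sets needs_update, so both
guards give the same answer after any call to toy_reflect().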

>  		menu_update(drv, dev);
>  		data->needs_update = 0;
>  	}
> 
> -	data->last_state_idx = 0;
> +	data->last_state_idx = -1;
>  	data->exit_us = 0;
> 
>  	/* Special case when user has set very strict latency requirement */
>  	if (unlikely(latency_req == 0))
> -		return 0;
> +		return data->last_state_idx;
> 
>  	/* determine the expected residency time, round up */
>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
> 

What about the ladder governor? I know it's not used that much in practice,
but I think it would be good to update that as well, just to keep it
consistent.

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Deepthi Dharwar @ 2014-01-14  6:16 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: linux-pm, daniel.lezcano, rjw, linux-kernel, srivatsa.bhat,
	paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140114060516.6109.14901.stgit@preeti.in.ibm.com>

On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
> In spite of this, it was observed that the idle state count of the shallowest
> idle state, snooze, was increasing.
> 
> This is because the governor returns the idle state index as 0 even in
> scenarios when no idle state can be chosen. These scenarios could be when the
> latency requirement is 0 or as mentioned above when the user wants to disable
> certain cpu idle states at runtime. In the latter case, it's possible that no
> cpu idle state is valid because the suitable states were disabled
> and the rest did not match the menu governor criteria to be chosen as the
> next idle state.
> 
> This patch adds the code to indicate that a valid cpu idle state could not be
> chosen by the menu governor and reports back to arch so that it can take some
> default action.
> 
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---

Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>

> 
>  drivers/cpuidle/cpuidle.c        |    6 +++++-
>  drivers/cpuidle/governors/menu.c |    7 ++++---
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..5bf06bb 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
> 
>  	/* ask the governor for the next state */
>  	next_state = cpuidle_curr_governor->select(drv, dev);
> +
> +	dev->last_residency = 0;
>  	if (need_resched()) {
> -		dev->last_residency = 0;
>  		/* give the governor an opportunity to reflect on the outcome */
>  		if (cpuidle_curr_governor->reflect)
>  			cpuidle_curr_governor->reflect(dev, next_state);
> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>  		return 0;
>  	}
> 
> +	if (next_state < 0)
> +		return -EINVAL;
> +
>  	trace_cpu_idle_rcuidle(next_state, dev->cpu);
> 
>  	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..6921543 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -283,6 +283,7 @@ again:
>   * menu_select - selects the next idle state to enter
>   * @drv: cpuidle driver containing state data
>   * @dev: the CPU
> + * Returns -1 when no idle state is suitable
>   */
>  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  {
> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	int multiplier;
>  	struct timespec t;
> 
> -	if (data->needs_update) {
> +	if (data->last_state_idx >= 0 && data->needs_update) {
>  		menu_update(drv, dev);
>  		data->needs_update = 0;
>  	}
> 
> -	data->last_state_idx = 0;
> +	data->last_state_idx = -1;
>  	data->exit_us = 0;
> 
>  	/* Special case when user has set very strict latency requirement */
>  	if (unlikely(latency_req == 0))
> -		return 0;
> +		return data->last_state_idx;
> 
>  	/* determine the expected residency time, round up */
>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
> 

^ permalink raw reply

* [PATCH] powerpc: Fix races with irq_work
From: Benjamin Herrenschmidt @ 2014-01-14  6:11 UTC (permalink / raw)
  To: linuxppc-dev list

If we set irq_work on a processor and then change the decrementer value
immediately afterward, before the irq work has had a chance to be processed,
we can seriously delay the handling of that irq_work.

Fix it by checking for pending irq work in a few places: before changing
the decrementer in decrementer_set_next_event(), after changing it in that
same function, and after changing it in timer_interrupt().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
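
For illustration only, the race and the re-check can be modelled in plain C.
This is a sketch under stated assumptions, not the kernel code: the toy_*
names are hypothetical, the decrementer is an int field, and the
work_arrives_mid_update parameter artificially injects irq work between the
first check and the decrementer write.

```c
#include <stdbool.h>

struct toy_cpu {
	int dec;		/* ticks until the next decrementer interrupt */
	bool irq_work_pending;
};

/* Raising irq work: flag it and ask for an interrupt as soon as possible. */
void toy_raise_irq_work(struct toy_cpu *cpu)
{
	cpu->irq_work_pending = true;
	cpu->dec = 1;
}

/*
 * Program the next timer event. work_arrives_mid_update simulates irq work
 * being raised after the first check but before the decrementer write.
 */
int toy_set_next_event(struct toy_cpu *cpu, int evt, bool work_arrives_mid_update)
{
	/* Don't push the decrementer out if irq work is already pending. */
	if (cpu->irq_work_pending)
		return 0;

	if (work_arrives_mid_update)
		toy_raise_irq_work(cpu);

	cpu->dec = evt;		/* this write can clobber the dec = 1 above */

	/* Re-check: we may have raced with new irq work. */
	if (cpu->irq_work_pending)
		cpu->dec = 1;

	return 0;
}
```

Without the second check, the injected case would leave dec at evt and the
pending work would wait for the full timer period.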

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index afb1b56..b3dab20 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -536,6 +536,9 @@ void timer_interrupt(struct pt_regs * regs)
 		now = *next_tb - now;
 		if (now <= DECREMENTER_MAX)
 			set_dec((int)now);
+		/* We may have raced with new irq work */
+		if (test_irq_work_pending())
+			set_dec(1);
 		__get_cpu_var(irq_stat).timer_irqs_others++;
 	}
 
@@ -802,8 +805,16 @@ static void __init clocksource_init(void)
 static int decrementer_set_next_event(unsigned long evt,
 				      struct clock_event_device *dev)
 {
+	/* Don't adjust the decrementer if some irq work is pending */
+	if (test_irq_work_pending())
+		return 0;
 	__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
 	set_dec(evt);
+
+	/* We may have raced with new irq work */
+	if (test_irq_work_pending())
+		set_dec(1);
+
 	return 0;
 }
 

^ permalink raw reply related

* [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Preeti U Murthy @ 2014-01-14  6:05 UTC (permalink / raw)
  To: deepthi, paulmck, linux-pm, benh, daniel.lezcano, rjw,
	linux-kernel, srivatsa.bhat, svaidy, linuxppc-dev,
	tuukka.tikkanen

On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
In spite of this, it was observed that the idle state count of the shallowest
idle state, snooze, was increasing.

This is because the governor returns the idle state index as 0 even in
scenarios when no idle state can be chosen. These scenarios could be when the
latency requirement is 0 or as mentioned above when the user wants to disable
certain cpu idle states at runtime. In the latter case, it's possible that no
cpu idle state is valid because the suitable states were disabled
and the rest did not match the menu governor criteria to be chosen as the
next idle state.

This patch adds the code to indicate that a valid cpu idle state could not be
chosen by the menu governor and reports back to arch so that it can take some
default action.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
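
A simplified sketch of the intended selection behaviour, for illustration.
It is not the menu governor: the toy_* names are hypothetical, states are
assumed to be ordered from shallowest to deepest, and only the exit latency
and a "disabled" flag are considered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cpuidle state. */
struct toy_state {
	unsigned int exit_latency_us;	/* worst-case wakeup latency */
	bool disabled;			/* disabled by the user at runtime */
};

/*
 * Pick the deepest enabled state whose exit latency fits latency_req_us.
 * Returns -1 when nothing qualifies, instead of defaulting to index 0.
 */
int toy_select(const struct toy_state *states, size_t count,
	       unsigned int latency_req_us)
{
	int chosen = -1;
	size_t i;

	/* A latency requirement of 0 rules out every idle state. */
	if (latency_req_us == 0)
		return -1;

	for (i = 0; i < count; i++) {
		if (states[i].disabled)
			continue;
		if (states[i].exit_latency_us > latency_req_us)
			continue;
		chosen = (int)i;
	}

	return chosen;
}
```

With every state disabled, or with a latency requirement of 0, the caller
sees -1 and can fall back to the arch default instead of silently counting
an entry into state 0.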

 drivers/cpuidle/cpuidle.c        |    6 +++++-
 drivers/cpuidle/governors/menu.c |    7 ++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a55e68f..5bf06bb 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
 
 	/* ask the governor for the next state */
 	next_state = cpuidle_curr_governor->select(drv, dev);
+
+	dev->last_residency = 0;
 	if (need_resched()) {
-		dev->last_residency = 0;
 		/* give the governor an opportunity to reflect on the outcome */
 		if (cpuidle_curr_governor->reflect)
 			cpuidle_curr_governor->reflect(dev, next_state);
@@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
 		return 0;
 	}
 
+	if (next_state < 0)
+		return -EINVAL;
+
 	trace_cpu_idle_rcuidle(next_state, dev->cpu);
 
 	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..6921543 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -283,6 +283,7 @@ again:
  * menu_select - selects the next idle state to enter
  * @drv: cpuidle driver containing state data
  * @dev: the CPU
+ * Returns -1 when no idle state is suitable
  */
 static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
@@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	int multiplier;
 	struct timespec t;
 
-	if (data->needs_update) {
+	if (data->last_state_idx >= 0 && data->needs_update) {
 		menu_update(drv, dev);
 		data->needs_update = 0;
 	}
 
-	data->last_state_idx = 0;
+	data->last_state_idx = -1;
 	data->exit_us = 0;
 
 	/* Special case when user has set very strict latency requirement */
 	if (unlikely(latency_req == 0))
-		return 0;
+		return data->last_state_idx;
 
 	/* determine the expected residency time, round up */
 	t = ktime_to_timespec(tick_nohz_get_sleep_length());

^ permalink raw reply related

* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Andrew Morton @ 2014-01-14  4:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, Kirill A. Shutemov,
	linuxppc-dev, kirill.shutemov
In-Reply-To: <1389672810.6933.0.camel@pasglop>

On Tue, 14 Jan 2014 15:13:30 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Mon, 2014-01-13 at 14:17 -0800, Andrew Morton wrote:
> 
> > Did this get fixed?
> 
> Any chance you can Ack the patch on that thread ?
> 
> http://thread.gmane.org/gmane.linux.kernel.mm/111809
> 
> So I can put it in powerpc -next with a CC stable ? Or if you tell me
> that Kirill's Ack is sufficient then I'll go for it.

yup, it looks OK to me from a non-ppc perspective.  Please proceed as
described.

^ permalink raw reply

* [PATCH] Move processing of MCE queued event out from syscall exit path.
From: Mahesh J Salgaonkar @ 2014-01-14  4:26 UTC (permalink / raw)
  To: linuxppc-dev, Benjamin Herrenschmidt; +Cc: Hugh Dickins

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Hugh Dickins reported that b5ff4211a829
"powerpc/book3s: Queue up and process delayed MCE events" breaks the
PowerMac G5 boot. This patch fixes it by moving the MCE event processing
away from syscall exit, which was the wrong place for it to begin with, and
implements a different mechanism that uses a paca flag and the
decrementer interrupt to process the event.

Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
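
The flag-plus-decrementer mechanism can be sketched as a small standalone
model. This is only an illustration: the toy_* names and the queued and
processed counters are hypothetical, and re-arming the decrementer to its
maximum value is a simplification of what the timer interrupt does.

```c
#include <stdint.h>

#define TOY_MCE_EVENT_PENDING	0x01
#define TOY_DEC_MAX		0x7fffffff

struct toy_paca {
	uint8_t mce_flags;
	int32_t dec;		/* decrementer value */
	int queued;		/* events waiting to be processed */
	int processed;		/* events handled so far */
};

/* Machine check context: queue the event and request an early interrupt. */
void toy_mce_queue_event(struct toy_paca *paca)
{
	paca->queued++;
	paca->mce_flags |= TOY_MCE_EVENT_PENDING;
	paca->dec = 1;
}

/* Decrementer interrupt: drain the queue if the pending flag is set. */
void toy_timer_interrupt(struct toy_paca *paca)
{
	if (paca->mce_flags & TOY_MCE_EVENT_PENDING) {
		paca->mce_flags &= (uint8_t)~TOY_MCE_EVENT_PENDING;
		paca->processed += paca->queued;
		paca->queued = 0;
	}
	paca->dec = TOY_DEC_MAX;	/* model: re-arm far in the future */
}
```

The point of the design is that the machine check path only sets a flag and
pokes the decrementer; the actual processing happens later, in ordinary
interrupt context.
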
 arch/powerpc/include/asm/mce.h  |    3 +++
 arch/powerpc/include/asm/paca.h |    3 +++
 arch/powerpc/kernel/entry_64.S  |    5 -----
 arch/powerpc/kernel/irq.c       |   11 ++++++++++-
 arch/powerpc/kernel/mce.c       |    7 +++++++
 arch/powerpc/kernel/time.c      |    9 +++++++++
 6 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 2257d1e..225e678 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -186,6 +186,9 @@ struct mce_error_info {
 #define MCE_EVENT_RELEASE	true
 #define MCE_EVENT_DONTRELEASE	false
 
+/* MCE bit flags (paca.mce_flags) */
+#define MCE_EVENT_PENDING	0x0001
+
 extern void save_mce_event(struct pt_regs *regs, long handled,
 			   struct mce_error_info *mce_err, uint64_t nip,
 			   uint64_t addr);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index c3523d1..f9aa521 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -141,6 +141,9 @@ struct paca_struct {
 	u8 io_sync;			/* writel() needs spin_unlock sync */
 	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
 	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
+#ifdef CONFIG_PPC_BOOK3S_64
+	u8 mce_flags;			/* MCE bit flags. */
+#endif
 	u64 sprg3;			/* Saved user-visible sprg */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	u64 tm_scratch;                 /* TM scratch area for reclaim */
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..bbfb029 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -184,11 +184,6 @@ syscall_exit:
 	bl	.do_show_syscall_exit
 	ld	r3,RESULT(r1)
 #endif
-#ifdef CONFIG_PPC_BOOK3S_64
-BEGIN_FTR_SECTION
-	bl	.machine_check_process_queued_event
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-#endif
 	CURRENT_THREAD_INFO(r12, r1)
 
 	ld	r8,_MSR(r1)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ba01656..e22f591 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -67,6 +67,7 @@
 #include <asm/udbg.h>
 #include <asm/smp.h>
 #include <asm/debug.h>
+#include <asm/mce.h>
 
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
@@ -158,9 +159,17 @@ notrace unsigned int __check_irq_replay(void)
 	 * We may have missed a decrementer interrupt. We check the
 	 * decrementer itself rather than the paca irq_happened field
 	 * in case we also had a rollover while hard disabled
+	 * Also check if any MCE event is queued up that requires
+	 * processing. Machine check handler would set paca->mce_flags
+	 * and then call set_dec(1) to trigger a decrementer interrupt
+	 * from NMI.
 	 */
 	local_paca->irq_happened &= ~PACA_IRQ_DEC;
-	if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
+	if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow()
+#ifdef CONFIG_PPC_BOOK3S_64
+		|| local_paca->mce_flags & MCE_EVENT_PENDING
+#endif
+		)
 		return 0x900;
 
 	/* Finally check if an external interrupt happened */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index d6edf2b..7bab827 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -185,6 +185,13 @@ void machine_check_queue_event(void)
 		return;
 	}
 	__get_cpu_var(mce_event_queue[index]) = evt;
+
+	/*
+	 * Set the event pending flag and raise a decrementer interrupt
+	 * to process the queued event later.
+	 */
+	local_paca->mce_flags |= MCE_EVENT_PENDING;
+	set_dec(1);
 }
 
 /*
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3b1441..87ccf92 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -69,6 +69,7 @@
 #include <asm/vdso_datapage.h>
 #include <asm/firmware.h>
 #include <asm/cputime.h>
+#include <asm/mce.h>
 
 /* powerpc clocksource/clockevent code */
 
@@ -505,6 +506,14 @@ void timer_interrupt(struct pt_regs * regs)
 		return;
 	}
 
+#ifdef CONFIG_PPC_BOOK3S_64
+	/* Check if we have MCE event pending for processing. */
+	if (local_paca->mce_flags & MCE_EVENT_PENDING) {
+		local_paca->mce_flags &= ~MCE_EVENT_PENDING;
+		machine_check_process_queued_event();
+	}
+#endif
+
 	/* Conditionally hard-enable interrupts now that the DEC has been
 	 * bumped to its maximum value
 	 */
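
The hunks above defer machine-check event processing: the queueing path
sets a per-CPU pending flag and arms the decrementer, and the timer
interrupt later clears the flag and drains the queue. A minimal user-space
sketch of that pattern follows. Everything prefixed toy_ is hypothetical,
the value chosen for MCE_EVENT_PENDING is an assumption, and the queue is
reduced to a counter; this is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MCE_EVENT_PENDING 0x1u	/* assumed bit value, illustration only */
#define TOY_MAX_EVENTS    4	/* hypothetical queue depth */

struct toy_cpu {
	unsigned int mce_flags;		/* stands in for paca->mce_flags */
	unsigned long dec;		/* stands in for the decrementer */
	int queue[TOY_MAX_EVENTS];	/* stands in for mce_event_queue */
	size_t queued;
	size_t processed;
};

/* NMI-like context: store the event, flag it, and arm the timer. */
bool toy_queue_event(struct toy_cpu *cpu, int evt)
{
	assert(cpu != NULL);
	if (cpu->queued >= TOY_MAX_EVENTS)
		return false;		/* queue full, event dropped */
	cpu->queue[cpu->queued++] = evt;
	cpu->mce_flags |= MCE_EVENT_PENDING;
	cpu->dec = 1;			/* models set_dec(1) */
	return true;
}

/* Timer-interrupt-like context: drain the queue if the flag is set. */
void toy_timer_interrupt(struct toy_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->dec = 0x7fffffff;		/* models DEC bumped to its maximum */
	if (cpu->mce_flags & MCE_EVENT_PENDING) {
		cpu->mce_flags &= ~MCE_EVENT_PENDING;
		cpu->processed += cpu->queued;
		cpu->queued = 0;
	}
}
```

Clearing the flag before draining mirrors the order used in the
timer_interrupt() hunk, so an event queued while the drain is running
sets the flag again and is picked up on the next tick.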

^ permalink raw reply related

* Re: [PATCH mmotm/next] powerpc: fix powernv boot breakage on G5???
From: Benjamin Herrenschmidt @ 2014-01-14  4:17 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Mahesh Salgaonkar, linuxppc-dev
In-Reply-To: <alpine.LSU.2.11.1401120043210.1092@eggly.anvils>

On Sun, 2014-01-12 at 00:46 -0800, Hugh Dickins wrote:
> My PowerMac G5 cannot boot mmotm these days: different symptoms
> (starting /sbin/init failed? or ATA errors and hang?), with unrelated
> bugs adding to the confusion; but a bisection led to b5ff4211a829
> "powerpc/book3s: Queue up and process delayed MCE events".  Since that
> series seems to be mostly about powernv, I tried changing BOOK3S_64
> to POWERNV in entry_64.S, which has got it back to working for me.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> just in case this happens to be right, but it's well beyond me!
> ---

Does that help instead?

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..9820d36 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -187,6 +187,7 @@ syscall_exit:
 #ifdef CONFIG_PPC_BOOK3S_64
 BEGIN_FTR_SECTION
 	bl	.machine_check_process_queued_event
+	ld	r3,RESULT(r1)
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 #endif
 	CURRENT_THREAD_INFO(r12, r1)

Cheers,
Ben.

> 
>  arch/powerpc/kernel/entry_64.S |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- mmotm/arch/powerpc/kernel/entry_64.S	2014-01-10 18:24:56.940448828 -0800
> +++ linux/arch/powerpc/kernel/entry_64.S	2014-01-10 18:29:24.276455182 -0800
> @@ -184,7 +184,7 @@ syscall_exit:
>  	bl	.do_show_syscall_exit
>  	ld	r3,RESULT(r1)
>  #endif
> -#ifdef CONFIG_PPC_BOOK3S_64
> +#ifdef CONFIG_PPC_POWERNV
>  BEGIN_FTR_SECTION
>  	bl	.machine_check_process_queued_event
>  END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)

^ permalink raw reply related

* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Benjamin Herrenschmidt @ 2014-01-14  4:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, Kirill A. Shutemov,
	linuxppc-dev, kirill.shutemov
In-Reply-To: <20140113141748.0b851e1573e41bf26de7c0ae@linux-foundation.org>

On Mon, 2014-01-13 at 14:17 -0800, Andrew Morton wrote:

> Did this get fixed?

Any chance you could Ack the patch on that thread?

http://thread.gmane.org/gmane.linux.kernel.mm/111809

Then I can put it in powerpc -next with a CC to stable. Or, if you tell me
that Kirill's Ack is sufficient, I'll go for it.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Kirill A. Shutemov @ 2014-01-13 22:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, linuxppc-dev,
	kirill.shutemov
In-Reply-To: <20140113141748.0b851e1573e41bf26de7c0ae@linux-foundation.org>

On Mon, Jan 13, 2014 at 02:17:48PM -0800, Andrew Morton wrote:
> On Thu, 2 Jan 2014 04:19:51 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Wed, Jan 01, 2014 at 09:29:05PM +1100, Benjamin Herrenschmidt wrote:
> > > On Wed, 2014-01-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> > > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > > > 
> > > > This patch fixes the crash below
> > > > 
> > > > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> > > > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > ...
> > > > Call Trace:
> > > > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> > > > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> > > > 
> > > > On ppc64 we use the pgtable for storing the hpte slot information and
> > > > store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> > > > pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> > > > the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> > > > from new pmd.
> > > > 
> > > > We also want to move the withdraw and deposit before the set_pmd so
> > > > that, when a page fault finds the pmd as trans huge, we can be sure
> > > > that the pgtable can be located at that offset.
> > > > 
> 
> Did this get fixed?

New version: http://thread.gmane.org/gmane.linux.kernel.mm/111809

-- 
 Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Andrew Morton @ 2014-01-13 22:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, linuxppc-dev,
	kirill.shutemov
In-Reply-To: <20140102021951.GA26369@node.dhcp.inet.fi>

On Thu, 2 Jan 2014 04:19:51 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Wed, Jan 01, 2014 at 09:29:05PM +1100, Benjamin Herrenschmidt wrote:
> > On Wed, 2014-01-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > > 
> > > This patch fixes the crash below
> > > 
> > > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> > > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > ...
> > > Call Trace:
> > > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> > > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> > > 
> > > On ppc64 we use the pgtable for storing the hpte slot information and
> > > store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> > > pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> > > the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> > > from new pmd.
> > > 
> > > We also want to move the withdraw and deposit before the set_pmd so
> > > that, when a page fault finds the pmd as trans huge, we can be sure
> > > that the pgtable can be located at that offset.
> > > 

Did this get fixed?

^ permalink raw reply

* Re: [PATCH v2 0/9] cpuidle: rework device state count handling
From: Rafael J. Wysocki @ 2014-01-13 21:20 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-samsung-soc, linux-pm, daniel.lezcano, linux-kernel,
	kyungmin.park, linuxppc-dev, lenb
In-Reply-To: <2079155.EyEBRDoJjP@vostro.rjw.lan>

On Saturday, January 11, 2014 01:37:29 AM Rafael J. Wysocki wrote:
> On Friday, December 20, 2013 07:47:22 PM Bartlomiej Zolnierkiewicz wrote:
> > Hi,
> > 
> > Some cpuidle drivers assume that cpuidle core will handle cases where
> > device->state_count is smaller than driver->state_count, unfortunately
> > currently this is untrue (device->state_count is used only for handling
> > cpuidle state sysfs entries and driver->state_count is used for all
> > other cases) and will not be fixed in the future as device->state_count
> > is planned to be removed [1].
> > 
> > This patchset fixes such drivers (ARM EXYNOS cpuidle driver and ACPI
> > cpuidle driver), removes superfluous device->state_count initialization
> > from drivers for which device->state_count equals driver->state_count
> > (POWERPC pseries cpuidle driver and intel_idle driver) and finally
> > removes state_count field from struct cpuidle_device.
> > 
> > Additionally (while at it) this patchset fixes C1E promotion disable
> > quirk handling (in intel_idle driver) and converts cpuidle drivers code
> > to use the common cpuidle_[un]register() routines (in POWERPC pseries
> > cpuidle driver and intel_idle driver).
> > 
> > [1] http://permalink.gmane.org/gmane.linux.power-management.general/36908
> > 
> > Reference to v1:
> > 	http://comments.gmane.org/gmane.linux.power-management.general/37390
> > 
> > Changes since v1:
> > - synced patch series with next-20131220
> > - added ACKs from Daniel Lezcano
> 
> This series breaks boot on one of my test machines with intel_idle, so I'm
> not sure how well it has been tested.
> 
> I've dropped it entirely for now.  If I have the time, I will try to identify
> the root cause of the failure, but that may not happen before the merge window.
> Sorry about that.

The breakage was introduced by patch [8/9], so I've re-applied patches [1-7/9]
from this series.  Please refer to Fengguang's report [1] for the breakage
details.

Thanks!

[1] http://marc.info/?l=linux-kernel&m=138964167909907&w=2

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* [PATCH] powerpc/relocate fix relocate processing in LE mode
From: Laurent Dufour @ 2014-01-13 16:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, paulus, linuxppc-dev

The relocation code does not work in little-endian mode because the r_info
field, which is a 64-bit value, is read from the wrong offset.

The current code is optimized to read the r_info field as a 32-bit value
starting at the middle of the doubleword (offset 12). In LE mode the value
read is not correct, since that word holds only the most significant half
of r_info.

This patch removes this optimization, which consisted of handling a 32-bit
value instead of a 64-bit one. This way it works in both big- and
little-endian modes.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/reloc_64.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/reloc_64.S b/arch/powerpc/kernel/reloc_64.S
index b47a0e1..1482327 100644
--- a/arch/powerpc/kernel/reloc_64.S
+++ b/arch/powerpc/kernel/reloc_64.S
@@ -69,8 +69,8 @@ _GLOBAL(relocate)
 	 * R_PPC64_RELATIVE ones.
 	 */
 	mtctr	r8
-5:	lwz	r0,12(9)	/* ELF64_R_TYPE(reloc->r_info) */
-	cmpwi	r0,R_PPC64_RELATIVE
+5:	ld	r0,8(9)		/* ELF64_R_TYPE(reloc->r_info) */
+	cmpdi	r0,R_PPC64_RELATIVE
 	bne	6f
 	ld	r6,0(r9)	/* reloc->r_offset */
 	ld	r0,16(r9)	/* reloc->r_addend */
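
The byte-order issue described in this patch can be reproduced in user
space. The sketch below builds a 24-byte Elf64_Rela image in either byte
order and compares a 32-bit load at offset 12 (the old lwz) with a 64-bit
load at offset 8 (the new ld). The helper names are hypothetical, and the
loads are modelled byte by byte so the result does not depend on the host
endianness; it is an illustration, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define R_PPC64_RELATIVE 22u

/* Store a 64-bit value into buf using the requested byte order. */
void store64(uint8_t *buf, uint64_t v, int little_endian)
{
	assert(buf != 0);
	for (int i = 0; i < 8; i++) {
		int shift = little_endian ? 8 * i : 8 * (7 - i);
		buf[i] = (uint8_t)(v >> shift);
	}
}

/* Load nbytes (4 or 8) from buf using the requested byte order. */
uint64_t load(const uint8_t *buf, int nbytes, int little_endian)
{
	uint64_t v = 0;

	assert(nbytes == 4 || nbytes == 8);
	for (int i = 0; i < nbytes; i++) {
		int shift = little_endian ? 8 * i : 8 * (nbytes - 1 - i);
		v |= (uint64_t)buf[i] << shift;
	}
	return v;
}

/* Old approach: 32-bit load at offset 12 of the Elf64_Rela image. */
uint32_t type_via_lwz12(const uint8_t *rela, int little_endian)
{
	return (uint32_t)load(rela + 12, 4, little_endian);
}

/* Patched approach: 64-bit load of the whole r_info at offset 8. */
uint64_t info_via_ld8(const uint8_t *rela, int little_endian)
{
	return load(rela + 8, 8, little_endian);
}
```

For an image whose r_info equals R_PPC64_RELATIVE (symbol index 0),
type_via_lwz12() yields 22 in big-endian mode but 0 in little-endian mode,
while info_via_ld8() yields 22 in both.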

^ permalink raw reply related

* Re: [PATCH V4] powerpc: thp: Fix crash on mremap
From: Kirill A. Shutemov @ 2014-01-13 16:32 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: aarcange, linux-mm, paulus, linuxppc-dev, kirill.shutemov
In-Reply-To: <1389593064-32664-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Mon, Jan 13, 2014 at 11:34:24AM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch fixes the crash below
> 
> NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> ...
> Call Trace:
> [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> 
> On ppc64 we use the pgtable for storing the hpte slot information and
> store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> from new pmd.
> 
> We also want to move the withdraw and deposit before the set_pmd so
> that, when a page fault finds the pmd as trans huge, we can be sure
> that the pgtable can be located at that offset.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>


-- 
 Kirill A. Shutemov
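
The ordering argument in the quoted changelog can be pictured with a toy
model: each pmd entry has a companion slot PTRS_PER_PMD entries further
on that holds the deposited pgtable, so moving an entry must move the
companion first. The names below (toy_deposit, toy_withdraw,
toy_move_huge_pmd) and the tiny PTRS_PER_PMD value are hypothetical; this
is a sketch of the idea under those assumptions, not the code from the
patch.

```c
#include <assert.h>
#include <stddef.h>

#define PTRS_PER_PMD 8	/* toy value; the real constant is much larger */

/*
 * One toy "pmd page": PTRS_PER_PMD entries followed by PTRS_PER_PMD
 * slots, each holding the deposited pgtable for the entry at the same
 * index in the first half.
 */
struct toy_pmd_page {
	void *slot[2 * PTRS_PER_PMD];
};

/* Store the pgtable at a constant offset from the pmd entry. */
void toy_deposit(void **pmdp, void *pgtable)
{
	assert(pmdp[PTRS_PER_PMD] == NULL);
	pmdp[PTRS_PER_PMD] = pgtable;
}

/* Take the pgtable back out of the companion slot. */
void *toy_withdraw(void **pmdp)
{
	void *pgtable = pmdp[PTRS_PER_PMD];

	pmdp[PTRS_PER_PMD] = NULL;
	return pgtable;
}

/*
 * Move a huge pmd entry: clear the old entry, relocate the deposited
 * pgtable, and only then publish the new entry, so anyone who sees the
 * new entry also finds its pgtable at the expected offset.
 */
void toy_move_huge_pmd(void **old_pmdp, void **new_pmdp)
{
	void *pmd = *old_pmdp;

	*old_pmdp = NULL;
	toy_deposit(new_pmdp, toy_withdraw(old_pmdp));
	*new_pmdp = pmd;
}
```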

^ permalink raw reply

* [PATCH 2/4] powerpc: book3s kvm can be modular so it should use module.h
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>

KVM support is tristate, so this file should include module.h
instead of export.h. It only works today because module_init is
currently (mis)placed in init.h, but we intend to clean that up
and relocate it to module.h.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/powerpc/kvm/book3s.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 8912608b7e1b..279459e8a072 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -16,7 +16,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/err.h>
-#include <linux/export.h>
+#include <linux/module.h>
 #include <linux/slab.h>
 
 #include <asm/reg.h>
-- 
1.8.5.2

^ permalink raw reply related

* [PATCH 3/4] powerpc: use subsys_initcall for Freescale Local Bus
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>

The FSL_SOC option is bool, and hence this code is either
present or absent.  It will never be modular, so using
module_init as an alias for __initcall is rather misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of subsys_initcall (which
makes sense for bus code) will thus change this registration
from level 6-device to level 4-subsys (i.e. slightly earlier).
However, no impact of that small difference has been observed
during testing, and none is expected.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/powerpc/sysdev/fsl_lbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/fsl_lbc.c b/arch/powerpc/sysdev/fsl_lbc.c
index 6bc5a546d49f..9f00e5f84abe 100644
--- a/arch/powerpc/sysdev/fsl_lbc.c
+++ b/arch/powerpc/sysdev/fsl_lbc.c
@@ -388,4 +388,4 @@ static int __init fsl_lbc_init(void)
 {
 	return platform_driver_register(&fsl_lbc_ctrl_driver);
 }
-module_init(fsl_lbc_init);
+subsys_initcall(fsl_lbc_init);
-- 
1.8.5.2

^ permalink raw reply related

* [PATCH 4/4] powerpc: don't use module_init for non-modular core hugetlb code
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>

The hugetlbpage.o is obj-y (always built in).  It will never
be modular, so using module_init as an alias for __initcall is
somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of arch_initcall (which
makes sense for arch code) will thus change this registration
from level 6-device to level 3-arch (i.e. slightly earlier).
However, no impact of that small difference has been observed
during testing, and none is expected.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 arch/powerpc/mm/hugetlbpage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 90bb6d9409bf..d25c202420da 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -911,7 +911,7 @@ static int __init hugetlbpage_init(void)
 	return 0;
 }
 #endif
-module_init(hugetlbpage_init);
+arch_initcall(hugetlbpage_init);
 
 void flush_dcache_icache_hugepage(struct page *page)
 {
-- 
1.8.5.2
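
The level change described in patches 3/4 and 4/4 can be illustrated with
a small user-space model of ordered init calls. The level numbers are the
ones quoted in the commit messages (3-arch, 4-subsys, 6-device); the table
walker and every toy_ name are hypothetical stand-ins, since the kernel
builds its initcall lists through linker sections rather than a table
like this.

```c
#include <assert.h>
#include <stddef.h>

/* Level numbers as quoted in the commit messages. */
enum toy_level { LEVEL_ARCH = 3, LEVEL_SUBSYS = 4, LEVEL_DEVICE = 6 };

typedef int (*toy_initcall_t)(void);

struct toy_entry {
	enum toy_level level;
	toy_initcall_t fn;
};

/* Log of the levels in the order their calls actually ran. */
int toy_log[8];
size_t toy_logged;

static int toy_record(int level)
{
	if (toy_logged < 8)
		toy_log[toy_logged++] = level;
	return 0;
}

/* Hypothetical stand-ins for the three kinds of init function. */
int toy_hugetlbpage_init(void) { return toy_record(LEVEL_ARCH); }
int toy_fsl_lbc_init(void)     { return toy_record(LEVEL_SUBSYS); }
int toy_some_driver_init(void) { return toy_record(LEVEL_DEVICE); }

/*
 * Run every registered call, lowest level first. Within one level,
 * table order is preserved. Returns the number of calls run.
 */
size_t toy_do_initcalls(const struct toy_entry *tbl, size_t n)
{
	size_t ran = 0;

	assert(tbl != NULL || n == 0);
	for (int level = 0; level <= 7; level++) {
		for (size_t i = 0; i < n; i++) {
			if ((int)tbl[i].level == level) {
				tbl[i].fn();
				ran++;
			}
		}
	}
	return ran;
}
```

Whatever order the entries are registered in, the arch-level call runs
before the subsys-level one, and both run before any device-level call,
which is the "slightly earlier" effect the commit messages mention.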

^ permalink raw reply related

