* Re: [PATCH v3] watchdog: mpc8xxx_wdt convert to watchdog core
From: Wim Van Sebroeck @ 2014-01-14 8:32 UTC (permalink / raw)
To: Christophe Leroy
Cc: scottwood, linuxppc-dev, linux-kernel, Guenter Roeck,
linux-watchdog
In-Reply-To: <20131204063214.D1DB01A2BEA@localhost.localdomain>
Hi Christophe,
> Convert mpc8xxx_wdt.c to the new watchdog API.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
This patch has been added to linux-watchdog-next.
Kind regards,
Wim.
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Preeti U Murthy @ 2014-01-14 8:25 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>
Hi Srivatsa,
On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> In spite of this it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, it's possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
>
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that won't get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...
Yes, I did see this. But since this patch only intends to communicate
whether the cpuidle governor succeeded in choosing an idle state, I
wished to address the default action of the pseries idle loop
separately. You are right, we will need to understand the patch which
introduced this action. I will take a look at it.
>
>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> ---
>>
>> drivers/cpuidle/cpuidle.c | 6 +++++-
>> drivers/cpuidle/governors/menu.c | 7 ++++---
>> 2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..5bf06bb 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>> /* ask the governor for the next state */
>> next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> + dev->last_residency = 0;
>> if (need_resched()) {
>> - dev->last_residency = 0;
>> /* give the governor an opportunity to reflect on the outcome */
>> if (cpuidle_curr_governor->reflect)
>> cpuidle_curr_governor->reflect(dev, next_state);
>
> The comments on top of the .reflect() routines of the governors say that the
> second parameter is the index of the actual state entered. But after this patch,
> next_state can be negative, indicating an invalid index. So those comments need
> to be updated accordingly.
Right, I will take care of the comment in the next post.
>
>> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>> return 0;
>> }
>>
>> + if (next_state < 0)
>> + return -EINVAL;
>
> The exit path above (due to need_resched) returns with irqs enabled, but the new
> one you are adding (next_state < 0) returns with irqs disabled. This is correct,
> because in the latter case, "idle" is still in progress and the arch will choose
> a default handler to execute (unlike the former case where "idle" is over and
> hence it's time to enable interrupts).
Correct.
>
> IMHO it would be good to add comments around this code to explain this subtle
> difference. We can never be too careful with these things... ;-)
Ok, will do so.
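The two exit paths discussed above can be sketched as a small userspace model. This is only an illustration, not kernel code: struct idle_model and the model_* functions are hypothetical names, and the model assumes that a normal state entry and the arch default handler both return with interrupts enabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
struct idle_model {
	bool resched_pending;   /* models need_resched() */
	bool irqs_enabled;      /* models the local irq state */
	int governor_choice;    /* what ->select() would return; < 0 is "none" */
	int entered_state;      /* last state entered, -1 if none */
	bool used_arch_default; /* set when the arch fallback ran */
};

/* Models cpuidle_idle_call() after the patch. Called with irqs disabled. */
static int model_cpuidle_idle_call(struct idle_model *m)
{
	int next_state = m->governor_choice;

	if (m->resched_pending) {
		/* Idle is over: this path enables interrupts before returning. */
		m->irqs_enabled = true;
		return 0;
	}

	if (next_state < 0) {
		/*
		 * No acceptable state. Idle is still in progress, so irqs stay
		 * disabled and the caller is expected to pick a default handler.
		 */
		return -EINVAL;
	}

	m->entered_state = next_state;
	m->irqs_enabled = true; /* model assumption: entry returns irqs-on */
	return 0;
}

/* Models the arch idle loop: fall back when cpuidle reports failure. */
static void model_arch_idle(struct idle_model *m)
{
	m->irqs_enabled = false;
	m->entered_state = -1;
	m->used_arch_default = false;

	if (model_cpuidle_idle_call(m)) {
		m->used_arch_default = true;
		m->irqs_enabled = true; /* model assumption: default enables irqs */
	}
}
```

With governor_choice set to -1, model_cpuidle_idle_call() returns -EINVAL with irqs_enabled still false, and model_arch_idle() takes the fallback branch.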
>
>> +
>> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>
>> broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..6921543 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -283,6 +283,7 @@ again:
>> * menu_select - selects the next idle state to enter
>> * @drv: cpuidle driver containing state data
>> * @dev: the CPU
>> + * Returns -1 when no idle state is suitable
>> */
>> static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> {
>> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> int multiplier;
>> struct timespec t;
>>
>> - if (data->needs_update) {
>> + if (data->last_state_idx >= 0 && data->needs_update) {
> ^^^^^
> Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
> only when index >= 0.
Right, we do not need this check. I was assuming that needs_update would
be consistent with index >= 0 only in the need_resched() case. But
needs_update is cleared each time the governor is invoked, and is set
again afterwards only if index >= 0.
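A minimal standalone model of that bookkeeping, assuming (as described in this thread) that reflect sets needs_update only for a valid index. struct menu_model and the model_* functions are hypothetical and only mirror the fields named in the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the menu governor's per-CPU bookkeeping. */
struct menu_model {
	int last_state_idx; /* -1 when no state was selected */
	bool needs_update;
	int updates_run;    /* counts calls to the modelled menu_update() */
};

/* Models menu_select(): consume a pending update, then reset the index. */
static int model_menu_select(struct menu_model *d, int candidate)
{
	if (d->needs_update) { /* no extra last_state_idx >= 0 test needed */
		d->updates_run++;
		d->needs_update = false;
	}

	d->last_state_idx = -1;
	if (candidate >= 0)
		d->last_state_idx = candidate;
	return d->last_state_idx;
}

/* Models menu_reflect(): only a valid index schedules an update. */
static void model_menu_reflect(struct menu_model *d, int index)
{
	d->last_state_idx = index;
	if (index >= 0)
		d->needs_update = true;
}
```

In this model a select/reflect round with index -1 never leaves needs_update set, so the next select cannot run an update against an invalid index.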
>
>> menu_update(drv, dev);
>> data->needs_update = 0;
>> }
>>
>> - data->last_state_idx = 0;
>> + data->last_state_idx = -1;
>> data->exit_us = 0;
>>
>> /* Special case when user has set very strict latency requirement */
>> if (unlikely(latency_req == 0))
>> - return 0;
>> + return data->last_state_idx;
>>
>> /* determine the expected residency time, round up */
>> t = ktime_to_timespec(tick_nohz_get_sleep_length());
>>
>
> What about the ladder governor? I know it's not used that much in practice,
> but I think it would be good to update that as well, just to keep it
> consistent.
Yes, this needs to be updated as well. But the ladder governor has a few
other details to take care of, in addition to what this patch handles in
the menu governor. Hence I will be posting that separately.
Thanks
Regards
Preeti U Murthy
>
> Regards,
> Srivatsa S. Bhat
>
* Re: [PATCH] Move processing of MCE queued event out from syscall exit path.
From: Benjamin Herrenschmidt @ 2014-01-14 8:20 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Mahesh J Salgaonkar, linuxppc-dev
In-Reply-To: <alpine.LSU.2.11.1401132314380.3222@eggly.anvils>
On Mon, 2014-01-13 at 23:47 -0800, Hugh Dickins wrote:
>
> And I may be quite wrong to point a finger at ATA errors: perhaps
> they're always shown, and quickly cleared off screen in successful
> boots,
> but left visible when root cannot be mounted for some other reason.
dmesg would tell...
> I don't know, and won't have time to investigate further - bisecting
> intermittents is not much fun! I'll just have to hope that it's
> sorted out before it reaches 3.14-rc, or else bite the bullet and
> investigate on that.)
Right :-) Oh well, I still use a G5 as a desktop so I might eventually
stumble upon them !
Cheers,
Ben.
* [PATCH 3/3] powerpc/fsl: Use the new interface to save or restore registers
From: Dongsheng Wang @ 2014-01-14 7:59 UTC (permalink / raw)
To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
In-Reply-To: <1389686397-46555-1-git-send-email-dongsheng.wang@freescale.com>
From: Wang Dongsheng <dongsheng.wang@freescale.com>
Use fsl_cpu_state_save/fsl_cpu_state_restore to save and restore the
registers, so that swsusp_booke.S no longer needs to maintain its own
copy of that code.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
diff --git a/arch/powerpc/kernel/swsusp_booke.S b/arch/powerpc/kernel/swsusp_booke.S
index 553c140..b5992db 100644
--- a/arch/powerpc/kernel/swsusp_booke.S
+++ b/arch/powerpc/kernel/swsusp_booke.S
@@ -4,92 +4,28 @@
* Copyright (c) 2009-2010 MontaVista Software, LLC.
*/
-#include <linux/threads.h>
-#include <asm/processor.h>
#include <asm/page.h>
-#include <asm/cputable.h>
-#include <asm/thread_info.h>
#include <asm/ppc_asm.h>
#include <asm/asm-offsets.h>
#include <asm/mmu.h>
-
-/*
- * Structure for storing CPU registers on the save area.
- */
-#define SL_SP 0
-#define SL_PC 4
-#define SL_MSR 8
-#define SL_TCR 0xc
-#define SL_SPRG0 0x10
-#define SL_SPRG1 0x14
-#define SL_SPRG2 0x18
-#define SL_SPRG3 0x1c
-#define SL_SPRG4 0x20
-#define SL_SPRG5 0x24
-#define SL_SPRG6 0x28
-#define SL_SPRG7 0x2c
-#define SL_TBU 0x30
-#define SL_TBL 0x34
-#define SL_R2 0x38
-#define SL_CR 0x3c
-#define SL_LR 0x40
-#define SL_R12 0x44 /* r12 to r31 */
-#define SL_SIZE (SL_R12 + 80)
-
- .section .data
- .align 5
-
-_GLOBAL(swsusp_save_area)
- .space SL_SIZE
-
+#include <asm/fsl_sleep.h>
.section .text
.align 5
_GLOBAL(swsusp_arch_suspend)
- lis r11,swsusp_save_area@h
- ori r11,r11,swsusp_save_area@l
-
- mflr r0
- stw r0,SL_LR(r11)
- mfcr r0
- stw r0,SL_CR(r11)
- stw r1,SL_SP(r11)
- stw r2,SL_R2(r11)
- stmw r12,SL_R12(r11)
-
- /* Save MSR & TCR */
- mfmsr r4
- stw r4,SL_MSR(r11)
- mfspr r4,SPRN_TCR
- stw r4,SL_TCR(r11)
-
- /* Get a stable timebase and save it */
-1: mfspr r4,SPRN_TBRU
- stw r4,SL_TBU(r11)
- mfspr r5,SPRN_TBRL
- stw r5,SL_TBL(r11)
- mfspr r3,SPRN_TBRU
- cmpw r3,r4
- bne 1b
+ mflr r15
+ lis r3, core_registers_save_area@h
+ ori r3, r3, core_registers_save_area@l
+
+ /* Save base register */
+ li r4, 0
+ bl fsl_cpu_state_save
- /* Save SPRGs */
- mfspr r4,SPRN_SPRG0
- stw r4,SL_SPRG0(r11)
- mfspr r4,SPRN_SPRG1
- stw r4,SL_SPRG1(r11)
- mfspr r4,SPRN_SPRG2
- stw r4,SL_SPRG2(r11)
- mfspr r4,SPRN_SPRG3
- stw r4,SL_SPRG3(r11)
- mfspr r4,SPRN_SPRG4
- stw r4,SL_SPRG4(r11)
- mfspr r4,SPRN_SPRG5
- stw r4,SL_SPRG5(r11)
- mfspr r4,SPRN_SPRG6
- stw r4,SL_SPRG6(r11)
- mfspr r4,SPRN_SPRG7
- stw r4,SL_SPRG7(r11)
+ /* Save LR */
+ lis r3, core_registers_save_area@h
+ ori r3, r3, core_registers_save_area@l
+ stw r15, SR_LR(r3)
/* Call the low level suspend stuff (we should probably have made
* a stackframe...
@@ -97,11 +33,12 @@ _GLOBAL(swsusp_arch_suspend)
bl swsusp_save
/* Restore LR from the save area */
- lis r11,swsusp_save_area@h
- ori r11,r11,swsusp_save_area@l
- lwz r0,SL_LR(r11)
- mtlr r0
+ lis r3, core_registers_save_area@h
+ ori r3, r3, core_registers_save_area@l
+ lwz r15, SR_LR(r3)
+ mtlr r15
+ li r3, 0
blr
_GLOBAL(swsusp_arch_resume)
@@ -138,9 +75,6 @@ _GLOBAL(swsusp_arch_resume)
bl flush_dcache_L1
bl flush_instruction_cache
- lis r11,swsusp_save_area@h
- ori r11,r11,swsusp_save_area@l
-
/*
* Mappings from virtual addresses to physical addresses may be
* different than they were prior to restoring hibernation state.
@@ -149,53 +83,12 @@ _GLOBAL(swsusp_arch_resume)
*/
bl _tlbil_all
- lwz r4,SL_SPRG0(r11)
- mtspr SPRN_SPRG0,r4
- lwz r4,SL_SPRG1(r11)
- mtspr SPRN_SPRG1,r4
- lwz r4,SL_SPRG2(r11)
- mtspr SPRN_SPRG2,r4
- lwz r4,SL_SPRG3(r11)
- mtspr SPRN_SPRG3,r4
- lwz r4,SL_SPRG4(r11)
- mtspr SPRN_SPRG4,r4
- lwz r4,SL_SPRG5(r11)
- mtspr SPRN_SPRG5,r4
- lwz r4,SL_SPRG6(r11)
- mtspr SPRN_SPRG6,r4
- lwz r4,SL_SPRG7(r11)
- mtspr SPRN_SPRG7,r4
-
- /* restore the MSR */
- lwz r3,SL_MSR(r11)
- mtmsr r3
-
- /* Restore TB */
- li r3,0
- mtspr SPRN_TBWL,r3
- lwz r3,SL_TBU(r11)
- lwz r4,SL_TBL(r11)
- mtspr SPRN_TBWU,r3
- mtspr SPRN_TBWL,r4
-
- /* Restore TCR and clear any pending bits in TSR. */
- lwz r4,SL_TCR(r11)
- mtspr SPRN_TCR,r4
- lis r4, (TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS)@h
- mtspr SPRN_TSR,r4
-
- /* Kick decrementer */
- li r0,1
- mtdec r0
-
- /* Restore the callee-saved registers and return */
- lwz r0,SL_CR(r11)
- mtcr r0
- lwz r2,SL_R2(r11)
- lmw r12,SL_R12(r11)
- lwz r1,SL_SP(r11)
- lwz r0,SL_LR(r11)
- mtlr r0
+ lis r3, core_registers_save_area@h
+ ori r3, r3, core_registers_save_area@l
+
+ /* Restore base register */
+ li r4, 0
+ bl fsl_cpu_state_restore
li r3,0
blr
--
1.8.5
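The "stable timebase" loop that this patch removes from swsusp_booke.S (read TBU, read TBL, read TBU again, retry if the upper half changed) can be modelled in plain C as follows. This is a sketch under stated assumptions: struct mock_tb and the read_* helpers are hypothetical stand-ins for the mfspr reads, and the mock counter simply advances on every read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of a 64-bit timebase exposed as two 32-bit halves. */
struct mock_tb {
	uint64_t value;
	uint64_t step; /* how far the counter advances on every register read */
};

/* Models reading the upper half; the mock counter advances afterwards. */
static uint32_t read_tbu(struct mock_tb *tb)
{
	uint32_t hi = (uint32_t)(tb->value >> 32);

	tb->value += tb->step;
	return hi;
}

/* Models reading the lower half; the mock counter advances afterwards. */
static uint32_t read_tbl(struct mock_tb *tb)
{
	uint32_t lo = (uint32_t)tb->value;

	tb->value += tb->step;
	return lo;
}

/* Read upper, lower, upper again; retry if the upper half changed. */
static uint64_t read_timebase_stable(struct mock_tb *tb)
{
	uint32_t hi, lo, hi2;

	do {
		hi = read_tbu(tb);
		lo = read_tbl(tb);
		hi2 = read_tbu(tb);
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}
```

The retry matters when the lower half wraps between the two reads: without it, an old upper half could be paired with a freshly wrapped lower half.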
* [PATCH 2/3] powerpc/85xx: Provide two functions to save/restore the core registers
From: Dongsheng Wang @ 2014-01-14 7:59 UTC (permalink / raw)
To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
In-Reply-To: <1389686397-46555-1-git-send-email-dongsheng.wang@freescale.com>
From: Wang Dongsheng <dongsheng.wang@freescale.com>
Add fsl_cpu_state_save/fsl_cpu_state_restore functions, used by deep
sleep and hibernation to save/restore core registers. The save/restore
code is abstracted out so that the modules using it do not each need to
maintain their own copy.
Currently supported processor types are E6500, E5500, E500MC, E500v2 and
E500v1.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
diff --git a/arch/powerpc/include/asm/fsl_sleep.h b/arch/powerpc/include/asm/fsl_sleep.h
new file mode 100644
index 0000000..31c8a9b
--- /dev/null
+++ b/arch/powerpc/include/asm/fsl_sleep.h
@@ -0,0 +1,98 @@
+/*
+ * Freescale 85xx Power management set
+ *
+ * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_FSL_SLEEP_H
+#define __ASM_FSL_SLEEP_H
+
+/*
+ * Freescale 85xx core register set, core register map definition.
+ * Addresses are based on r3. To be compatible with both 32-bit and 64-bit,
+ * the data width is 64-bit (double word).
+ *
+ * Acronyms:
+ * dw(data width) 0x08
+ *
+ * Map:
+ * General-Purpose Registers
+ * GPR1(sp) 0
+ * GPR2 0x8 (dw * 1)
+ * GPR13 - GPR31 0x10 ~ 0xa0 (dw * 2 ~ dw * 20)
+ * Floating-point registers
+ * FPR14 - FPR31 0xa8 ~ 0x130 (dw * 21 ~ dw * 38)
+ * Registers for Branch Operations
+ * CR 0x138 (dw * 39)
+ * LR 0x140 (dw * 40)
+ * Processor Control Registers
+ * MSR 0x148 (dw * 41)
+ * EPCR 0x150 (dw * 42)
+ *
+ * Only e500, e500v2 need to save HID0 - HID1
+ * HID0 - HID1 0x158 ~ 0x160 (dw * 43 ~ dw * 44)
+ * Timer Registers
+ * TCR 0x168 (dw * 45)
+ * TB(64bit) 0x170 (dw * 46)
+ * TBU(32bit) 0x178 (dw * 47)
+ * TBL(32bit) 0x180 (dw * 48)
+ * Interrupt Registers
+ * IVPR 0x188 (dw * 49)
+ * IVOR0 - IVOR15 0x190 ~ 0x208 (dw * 50 ~ dw * 65)
+ * IVOR32 - IVOR41 0x210 ~ 0x258 (dw * 66 ~ dw * 75)
+ * Software-Use Registers
+ * SPRG1 0x260 (dw * 76), 64-bit need to save.
+ * SPRG3 0x268 (dw * 77), 32-bit need to save.
+ * MMU Registers
+ * PID0 - PID2 0x270 ~ 0x280 (dw * 78 ~ dw * 80)
+ * Debug Registers
+ * DBCR0 - DBCR2 0x288 ~ 0x298 (dw * 81 ~ dw * 83)
+ * IAC1 - IAC4 0x2a0 ~ 0x2b8 (dw * 84 ~ dw * 87)
+ * DAC1 - DAC2 0x2c0 ~ 0x2c8 (dw * 88 ~ dw * 89)
+ *
+ */
+
+#define SR_GPR1 0x000
+#define SR_GPR2 0x008
+#define SR_GPR13 0x010
+#define SR_FPR14 0x0a8
+#define SR_CR 0x138
+#define SR_LR 0x140
+#define SR_MSR 0x148
+#define SR_EPCR 0x150
+#define SR_HID0 0x158
+#define SR_TCR 0x168
+#define SR_TB 0x170
+#define SR_TBU 0x178
+#define SR_TBL 0x180
+#define SR_IVPR 0x188
+#define SR_IVOR0 0x190
+#define SR_IVOR32 0x210
+#define SR_SPRG1 0x260
+#define SR_SPRG3 0x268
+#define SR_PID0 0x270
+#define SR_DBCR0 0x288
+#define SR_IAC1 0x2a0
+#define SR_DAC1 0x2c0
+#define FSL_CPU_SR_SIZE (SR_DAC1 + 0x10)
+
+#ifndef __ASSEMBLY__
+
+enum core_save_type {
+ BASE_SAVE = 0,
+ ALL_SAVE = 1,
+};
+
+extern int fsl_cpu_state_save(void *save_page, enum core_save_type type);
+extern int fsl_cpu_state_restore(void *restore_page, enum core_save_type type);
+
+#endif
+
+#endif
+
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 25cebe7..650a01c 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -4,6 +4,7 @@
obj-$(CONFIG_SMP) += smp.o
obj-y += common.o
+obj-y += save-core.o
obj-$(CONFIG_BSC9131_RDB) += bsc913x_rdb.o
obj-$(CONFIG_C293_PCIE) += c293pcie.o
diff --git a/arch/powerpc/platforms/85xx/save-core.S b/arch/powerpc/platforms/85xx/save-core.S
new file mode 100644
index 0000000..a6b93b8
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/save-core.S
@@ -0,0 +1,497 @@
+/*
+ * Freescale Power Management, Save/Restore core state
+ *
+ * Copyright 2014 Freescale Semiconductor, Inc.
+ * Author: Wang Dongsheng <dongsheng.wang@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/fsl_sleep.h>
+
+/*
+ * Freescale 85xx Cores
+ * Support Core List:
+ * E500v1, E500v2, E500MC, E5500, E6500.
+ */
+
+ /*
+ * Save/Restore register operation define
+ */
+#define LOAD_SAVE_ADDRESS \
+ mr r10, r3
+
+#ifdef CONFIG_PPC64
+#define PPC_STD(sreg, offset, areg) \
+ std sreg, (offset)(areg)
+#define PPC_LD(lreg, offset, areg) \
+ ld lreg, (offset)(areg)
+
+#define PPC_STFD(sreg, offset, areg) \
+ stfd sreg, (offset)(areg)
+#define PPC_LFD(lreg, offset, areg) \
+ lfd lreg, (offset)(areg)
+#else
+#define PPC_STD(sreg, offset, areg) \
+ stw sreg, (offset)(areg)
+#define PPC_LD(lreg, offset, areg) \
+ lwz lreg, (offset)(areg)
+
+#define PPC_STFD(sreg, offset, areg) \
+ stfs sreg, (offset)(areg)
+#define PPC_LFD(lreg, offset, areg) \
+ lfs lreg, (offset)(areg)
+#endif
+
+#define do_save_gpr_reg(reg, addr) \
+ mr r0, reg ;\
+ PPC_STD(r0, addr, r10)
+
+#define do_restore_gpr_reg(reg, addr) \
+ PPC_LD(r0, addr, r10) ;\
+ mr reg, r0
+
+#define do_save_fpr_reg(reg, addr) \
+ fmr fr0, reg ;\
+ PPC_STFD(fr0, addr, r10)
+
+#define do_restore_fpr_reg(reg, addr) \
+ PPC_LFD(fr0, addr, r10) ;\
+ fmr reg, fr0
+
+#define do_save_spr_reg(reg, addr) \
+ mfspr r0, SPRN_##reg ;\
+ PPC_STD(r0, addr, r10)
+
+#define do_restore_spr_reg(reg, addr) \
+ PPC_LD(r0, addr, r10) ;\
+ mtspr SPRN_##reg, r0
+
+#define do_save_special_reg(special, addr) \
+ mf##special r0 ;\
+ PPC_STD(r0, addr, r10)
+#define do_restore_special_reg(special, addr) \
+ PPC_LD(r0, addr, r10) ;\
+ mt##special r0
+
+#define do_sr_general_gpr_regs(func) \
+ do_##func##_gpr_reg(r1, SR_GPR1) ;\
+ do_##func##_gpr_reg(r2, SR_GPR2) ;\
+ do_##func##_gpr_reg(r13, SR_GPR13 + 0x00) ;\
+ do_##func##_gpr_reg(r14, SR_GPR13 + 0x08) ;\
+ do_##func##_gpr_reg(r15, SR_GPR13 + 0x10) ;\
+ do_##func##_gpr_reg(r16, SR_GPR13 + 0x18) ;\
+ do_##func##_gpr_reg(r17, SR_GPR13 + 0x20) ;\
+ do_##func##_gpr_reg(r18, SR_GPR13 + 0x28) ;\
+ do_##func##_gpr_reg(r19, SR_GPR13 + 0x30) ;\
+ do_##func##_gpr_reg(r20, SR_GPR13 + 0x38) ;\
+ do_##func##_gpr_reg(r21, SR_GPR13 + 0x40) ;\
+ do_##func##_gpr_reg(r22, SR_GPR13 + 0x48) ;\
+ do_##func##_gpr_reg(r23, SR_GPR13 + 0x50) ;\
+ do_##func##_gpr_reg(r24, SR_GPR13 + 0x58) ;\
+ do_##func##_gpr_reg(r25, SR_GPR13 + 0x60) ;\
+ do_##func##_gpr_reg(r26, SR_GPR13 + 0x68) ;\
+ do_##func##_gpr_reg(r27, SR_GPR13 + 0x70) ;\
+ do_##func##_gpr_reg(r28, SR_GPR13 + 0x78) ;\
+ do_##func##_gpr_reg(r29, SR_GPR13 + 0x80) ;\
+ do_##func##_gpr_reg(r30, SR_GPR13 + 0x88) ;\
+ do_##func##_gpr_reg(r31, SR_GPR13 + 0x90)
+
+#define do_sr_fpr_regs(func) \
+ do_##func##_fpr_reg(fr14, SR_FPR14 + 0x00) ;\
+ do_##func##_fpr_reg(fr15, SR_FPR14 + 0x08) ;\
+ do_##func##_fpr_reg(fr16, SR_FPR14 + 0x10) ;\
+ do_##func##_fpr_reg(fr17, SR_FPR14 + 0x18) ;\
+ do_##func##_fpr_reg(fr18, SR_FPR14 + 0x20) ;\
+ do_##func##_fpr_reg(fr19, SR_FPR14 + 0x28) ;\
+ do_##func##_fpr_reg(fr20, SR_FPR14 + 0x30) ;\
+ do_##func##_fpr_reg(fr21, SR_FPR14 + 0x38) ;\
+ do_##func##_fpr_reg(fr22, SR_FPR14 + 0x40) ;\
+ do_##func##_fpr_reg(fr23, SR_FPR14 + 0x48) ;\
+ do_##func##_fpr_reg(fr24, SR_FPR14 + 0x50) ;\
+ do_##func##_fpr_reg(fr25, SR_FPR14 + 0x58) ;\
+ do_##func##_fpr_reg(fr26, SR_FPR14 + 0x60) ;\
+ do_##func##_fpr_reg(fr27, SR_FPR14 + 0x68) ;\
+ do_##func##_fpr_reg(fr28, SR_FPR14 + 0x70) ;\
+ do_##func##_fpr_reg(fr29, SR_FPR14 + 0x78) ;\
+ do_##func##_fpr_reg(fr30, SR_FPR14 + 0x80) ;\
+ do_##func##_fpr_reg(fr31, SR_FPR14 + 0x88)
+
+#define do_sr_general_branch_regs(func) \
+ do_##func##_special_reg(CR, SR_CR)
+
+#define do_sr_general_pcr_regs(func) \
+ do_##func##_special_reg(MSR, SR_MSR) ;\
+ do_##func##_spr_reg(EPCR, SR_EPCR) ;\
+ do_##func##_spr_reg(HID0, SR_HID0 + 0x00)
+
+#define do_sr_e500_pcr_regs(func) \
+ do_##func##_spr_reg(HID1, SR_HID0 + 0x08)
+
+#define do_sr_save_tb_regs \
+ do_save_spr_reg(TBRU, SR_TBU) ;\
+ do_save_spr_reg(TBRL, SR_TBL)
+
+#define do_sr_restore_tb_regs \
+ do_restore_spr_reg(TBWU, SR_TBU) ;\
+ do_restore_spr_reg(TBWL, SR_TBL)
+
+#define do_sr_general_time_regs(func) \
+ do_sr_##func##_tb_regs ;\
+ do_##func##_spr_reg(TCR, SR_TCR)
+
+#define do_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVPR, SR_IVPR) ;\
+ do_##func##_spr_reg(IVOR0, SR_IVOR0 + 0x00) ;\
+ do_##func##_spr_reg(IVOR1, SR_IVOR0 + 0x08) ;\
+ do_##func##_spr_reg(IVOR2, SR_IVOR0 + 0x10) ;\
+ do_##func##_spr_reg(IVOR3, SR_IVOR0 + 0x18) ;\
+ do_##func##_spr_reg(IVOR4, SR_IVOR0 + 0x20) ;\
+ do_##func##_spr_reg(IVOR5, SR_IVOR0 + 0x28) ;\
+ do_##func##_spr_reg(IVOR6, SR_IVOR0 + 0x30) ;\
+ do_##func##_spr_reg(IVOR7, SR_IVOR0 + 0x38) ;\
+ do_##func##_spr_reg(IVOR8, SR_IVOR0 + 0x40) ;\
+ do_##func##_spr_reg(IVOR10, SR_IVOR0 + 0x50) ;\
+ do_##func##_spr_reg(IVOR11, SR_IVOR0 + 0x58) ;\
+ do_##func##_spr_reg(IVOR12, SR_IVOR0 + 0x60) ;\
+ do_##func##_spr_reg(IVOR13, SR_IVOR0 + 0x68) ;\
+ do_##func##_spr_reg(IVOR14, SR_IVOR0 + 0x70) ;\
+ do_##func##_spr_reg(IVOR15, SR_IVOR0 + 0x78)
+
+#define do_e6500_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+ do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
+ do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
+ do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+ do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+ do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+ do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+ do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+ do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+ do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_e5500_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+ do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+ do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+ do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+ do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+ do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+ do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+ do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_e500_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVOR32, SR_IVOR32 + 0x00) ;\
+ do_##func##_spr_reg(IVOR33, SR_IVOR32 + 0x08) ;\
+ do_##func##_spr_reg(IVOR34, SR_IVOR32 + 0x10)
+
+#define do_e500mc_sr_interrupt_regs(func) \
+ do_##func##_spr_reg(IVOR9, SR_IVOR0 + 0x48) ;\
+ do_##func##_spr_reg(IVOR35, SR_IVOR32 + 0x18) ;\
+ do_##func##_spr_reg(IVOR36, SR_IVOR32 + 0x20) ;\
+ do_##func##_spr_reg(IVOR37, SR_IVOR32 + 0x28) ;\
+ do_##func##_spr_reg(IVOR38, SR_IVOR32 + 0x30) ;\
+ do_##func##_spr_reg(IVOR39, SR_IVOR32 + 0x38) ;\
+ do_##func##_spr_reg(IVOR40, SR_IVOR32 + 0x40) ;\
+ do_##func##_spr_reg(IVOR41, SR_IVOR32 + 0x48)
+
+#define do_sr_general_software_regs(func) \
+ do_##func##_spr_reg(SPRG1, SR_SPRG1) ;\
+ do_##func##_spr_reg(SPRG3, SR_SPRG3)
+
+#define do_sr_general_mmu_regs(func) \
+ do_##func##_spr_reg(PID0, SR_PID0 + 0x00)
+
+#define do_sr_e500_mmu_regs(func) \
+ do_##func##_spr_reg(PID1, SR_PID0 + 0x08) ;\
+ do_##func##_spr_reg(PID2, SR_PID0 + 0x10)
+
+#define do_sr_debug_regs(func) \
+ do_##func##_spr_reg(DBCR0, SR_DBCR0 + 0x00) ;\
+ do_##func##_spr_reg(DBCR1, SR_DBCR0 + 0x08) ;\
+ do_##func##_spr_reg(DBCR2, SR_DBCR0 + 0x10) ;\
+ do_##func##_spr_reg(IAC1, SR_IAC1 + 0x00) ;\
+ do_##func##_spr_reg(IAC2, SR_IAC1 + 0x08) ;\
+ do_##func##_spr_reg(DAC1, SR_DAC1 + 0x00) ;\
+ do_##func##_spr_reg(DAC2, SR_DAC1 + 0x08)
+
+#define do_e6500_sr_debug_regs(func) \
+ do_##func##_spr_reg(IAC3, SR_IAC1 + 0x10) ;\
+ do_##func##_spr_reg(IAC4, SR_IAC1 + 0x18)
+
+/*
+ * Freescale 85xx Cores, Save/Restore core registers.
+ */
+_GLOBAL(core_registers_save_area)
+ .space FSL_CPU_SR_SIZE
+
+ .section .text
+ .align 5
+_GLOBAL(fsl_cpu_base_save)
+ do_sr_general_gpr_regs(save)
+ do_sr_general_branch_regs(save)
+ do_sr_general_pcr_regs(save)
+ do_sr_general_software_regs(save)
+ do_sr_general_mmu_regs(save)
+
+ /*
+ * Need to save floating-point registers if MSR[FP] = 1.
+ */
+ mfmsr r12
+ andi. r12, r12, MSR_FP
+ beq 1f
+ do_sr_fpr_regs(save)
+
+1:
+ mfspr r5, SPRN_TBRU
+ do_sr_general_time_regs(save)
+ mfspr r6, SPRN_TBRU
+ cmpw r5, r6
+ bne 1b
+
+ blr
+
+_GLOBAL(fsl_cpu_base_restore)
+ do_sr_general_gpr_regs(restore)
+ do_sr_general_branch_regs(restore)
+ do_sr_general_pcr_regs(restore)
+ do_sr_general_software_regs(restore)
+ do_sr_general_mmu_regs(restore)
+
+ isync
+
+ /*
+ * Need to restore floating-point registers if MSR[FP] = 1.
+ */
+ mfmsr r12
+ andi. r12, r12, MSR_FP
+ beq 1f
+ do_sr_fpr_regs(restore)
+
+1:
+ /* Restore Time registers */
+ /* clear tb lower to avoid wrap */
+ li r0, 0
+ mtspr SPRN_TBWL, r0
+ do_sr_general_time_regs(restore)
+
+ lis r0, (TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS)@h
+ mtspr SPRN_TSR, r0
+
+ /* Kick decrementer */
+ li r0, 1
+ mtdec r0
+
+ blr
+
+/* Base registers, e500v1, e500v2 need to do some special save/restore */
+_GLOBAL(e500_base_special_save)
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V1@l
+ cmpw r11, r12
+ beq 500f
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V2@l
+ cmpw r11, r12
+ bne 1f
+
+500:
+ do_sr_e500_pcr_regs(save)
+ do_sr_e500_mmu_regs(save)
+1:
+ blr
+
+_GLOBAL(e500_base_special_restore)
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V1@l
+ cmpw r11, r12
+ beq 500f
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V2@l
+ cmpw r11, r12
+ bne 1f
+
+500:
+ do_sr_e500_pcr_regs(restore)
+ do_sr_e500_mmu_regs(restore)
+1:
+ blr
+
+_GLOBAL(fsl_cpu_append_save)
+ mfspr r0, SPRN_PVR
+ rlwinm r11, r0, 16, 16, 31
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E6500@l
+ cmpw r11, r12
+ beq e6500_append_save
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E5500@l
+ cmpw r11, r12
+ beq e5500_append_save
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500MC@l
+ cmpw r11, r12
+ beq e500mc_append_save
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V2@l
+ cmpw r11, r12
+ beq e500v2_append_save
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V1@l
+ cmpw r11, r12
+ beq e500v1_append_save
+
+ b 1f
+
+e6500_append_save:
+ do_e6500_sr_interrupt_regs(save)
+ do_e6500_sr_debug_regs(save)
+ b 1f
+
+e5500_append_save:
+ do_e5500_sr_interrupt_regs(save)
+ b 1f
+
+e500mc_append_save:
+ do_e500mc_sr_interrupt_regs(save)
+ b 1f
+
+e500v2_append_save:
+e500v1_append_save:
+ do_e500_sr_interrupt_regs(save)
+
+1:
+ do_sr_interrupt_regs(save)
+ do_sr_debug_regs(save)
+
+ blr
+
+_GLOBAL(fsl_cpu_append_restore)
+ mfspr r0, SPRN_PVR
+ rlwinm r11, r0, 16, 16, 31
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E6500@l
+ cmpw r11, r12
+ beq e6500_append_restore
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E5500@l
+ cmpw r11, r12
+ beq e5500_append_restore
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500MC@l
+ cmpw r11, r12
+ beq e500mc_append_restore
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V2@l
+ cmpw r11, r12
+ beq e500v2_append_restore
+
+ lis r12, 0
+ ori r12, r12, PVR_VER_E500V1@l
+ cmpw r11, r12
+ beq e500v1_append_restore
+
+ b 1f
+
+e6500_append_restore:
+ do_e6500_sr_interrupt_regs(restore)
+ do_e6500_sr_debug_regs(restore)
+ b 1f
+
+e5500_append_restore:
+ do_e5500_sr_interrupt_regs(restore)
+ b 1f
+
+e500mc_append_restore:
+ do_e500mc_sr_interrupt_regs(restore)
+ b 1f
+
+e500v2_append_restore:
+e500v1_append_restore:
+ do_e500_sr_interrupt_regs(restore)
+
+1:
+ do_sr_interrupt_regs(restore)
+ do_sr_debug_regs(restore)
+
+ sync
+
+ blr
+
+/*
+ * r3 = the virtual address of buffer
+ * r4 = suspend type, 0-BASE_SAVE, 1-ALL_SAVE
+ */
+_GLOBAL(fsl_cpu_state_save)
+ mflr r9
+ LOAD_SAVE_ADDRESS
+
+ /* save the return address to SR_LR */
+ do_save_gpr_reg(r9, SR_LR)
+
+ /* if core_save_type is BASE_SAVE, goto 1f */
+ cmpwi r4, 0
+ beq 1f
+
+ bl fsl_cpu_append_save
+
+1:
+ bl e500_base_special_save
+
+ bl fsl_cpu_base_save
+
+ li r3, 0
+ mtlr r9
+ blr
+
+/*
+ * r3 = the virtual address of buffer
+ * r4 = suspend type, 0-BASE_SAVE, 1-ALL_SAVE
+ */
+_GLOBAL(fsl_cpu_state_restore)
+ mflr r9
+ LOAD_SAVE_ADDRESS
+
+ /*
+ * Disable machine checks and critical exceptions,
+ * if core_save_type is ALL_SAVE, we will restore interrupt
+ * IVORs registers.
+ */
+ mfmsr r5
+ rlwinm r5, r5, 0, ~MSR_CE
+ rlwinm r5, r5, 0, ~MSR_ME
+ mtmsr r5
+ isync
+
+ /* if core_save_type is BASE_SAVE, goto 1f */
+ cmpwi r4, 0
+ beq 1f
+
+ bl fsl_cpu_append_restore
+
+1:
+ bl e500_base_special_restore
+
+ bl fsl_cpu_base_restore
+
+ /* return the return address of the save time */
+ do_restore_gpr_reg(r9, SR_LR)
+
+ li r3, 0
+ mtlr r9
+ blr
--
1.8.5
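The save-area map in the fsl_sleep.h comment gives each offset as a multiple of the 8-byte slot width ("dw * n"). A small host-side check of that arithmetic might look like the sketch below. The SR_* values are copied from the hunk above; struct sr_entry, sr_map and sr_map_mismatches() are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DW 0x08 /* width of one save-area slot, in bytes */

/* Offsets copied from the fsl_sleep.h hunk of this patch. */
#define SR_GPR1   0x000
#define SR_GPR2   0x008
#define SR_GPR13  0x010
#define SR_FPR14  0x0a8
#define SR_CR     0x138
#define SR_LR     0x140
#define SR_MSR    0x148
#define SR_EPCR   0x150
#define SR_HID0   0x158
#define SR_TCR    0x168
#define SR_TB     0x170
#define SR_TBU    0x178
#define SR_TBL    0x180
#define SR_IVPR   0x188
#define SR_IVOR0  0x190
#define SR_IVOR32 0x210
#define SR_SPRG1  0x260
#define SR_SPRG3  0x268
#define SR_PID0   0x270
#define SR_DBCR0  0x288
#define SR_IAC1   0x2a0
#define SR_DAC1   0x2c0
#define FSL_CPU_SR_SIZE (SR_DAC1 + 0x10)

/* Hypothetical table pairing each offset with the slot index from the map. */
struct sr_entry {
	unsigned int offset;
	unsigned int slot;
};

static const struct sr_entry sr_map[] = {
	{ SR_GPR1, 0 },   { SR_GPR2, 1 },    { SR_GPR13, 2 },  { SR_FPR14, 21 },
	{ SR_CR, 39 },    { SR_LR, 40 },     { SR_MSR, 41 },   { SR_EPCR, 42 },
	{ SR_HID0, 43 },  { SR_TCR, 45 },    { SR_TB, 46 },    { SR_TBU, 47 },
	{ SR_TBL, 48 },   { SR_IVPR, 49 },   { SR_IVOR0, 50 }, { SR_IVOR32, 66 },
	{ SR_SPRG1, 76 }, { SR_SPRG3, 77 },  { SR_PID0, 78 },  { SR_DBCR0, 81 },
	{ SR_IAC1, 84 },  { SR_DAC1, 88 },
};

/* Count entries whose byte offset is not slot * DW. */
static size_t sr_map_mismatches(void)
{
	size_t i, bad = 0;

	for (i = 0; i < sizeof(sr_map) / sizeof(sr_map[0]); i++)
		if (sr_map[i].offset != sr_map[i].slot * DW)
			bad++;
	return bad;
}
```

The last slot in the map is DAC2 at dw * 89, so FSL_CPU_SR_SIZE works out to 90 slots.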
* [PATCH 1/3] powerpc/fsl: add E500MC and E5500 PVR define
From: Dongsheng Wang @ 2014-01-14 7:59 UTC (permalink / raw)
To: scottwood, benh; +Cc: anton, linuxppc-dev, chenhui.zhao, Wang Dongsheng
From: Wang Dongsheng <dongsheng.wang@freescale.com>
E500MC and E5500 PVR will be used in subsequent save/restore core
state patches.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 62b114e..cd7b630 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1075,6 +1075,8 @@
#define PVR_8560 0x80200000
#define PVR_VER_E500V1 0x8020
#define PVR_VER_E500V2 0x8021
+#define PVR_VER_E500MC 0x8023
+#define PVR_VER_E5500 0x8024
#define PVR_VER_E6500 0x8040
/*
--
1.8.5
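For context, the dispatch code in patch 2/3 compares the upper halfword of the PVR (extracted there with rlwinm r11, r0, 16, 16, 31) against these PVR_VER_* values. A C equivalent of that comparison, as a sketch only: enum core_kind and both helper functions are hypothetical, while the PVR_VER_* values are the ones in the reg.h hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Version values from the reg.h hunk above. */
#define PVR_VER_E500V1 0x8020
#define PVR_VER_E500V2 0x8021
#define PVR_VER_E500MC 0x8023
#define PVR_VER_E5500  0x8024
#define PVR_VER_E6500  0x8040

/* Hypothetical classification, for illustration only. */
enum core_kind {
	CORE_UNKNOWN,
	CORE_E500, /* e500v1 and e500v2 share one path in patch 2/3 */
	CORE_E500MC,
	CORE_E5500,
	CORE_E6500,
};

/* Same result as rotating left by 16 and keeping the low 16 bits. */
static uint32_t pvr_version(uint32_t pvr)
{
	return (pvr >> 16) & 0xffffu;
}

/* Map a raw PVR value to a core kind by its version field. */
static enum core_kind classify_core(uint32_t pvr)
{
	switch (pvr_version(pvr)) {
	case PVR_VER_E500V1:
	case PVR_VER_E500V2:
		return CORE_E500;
	case PVR_VER_E500MC:
		return CORE_E500MC;
	case PVR_VER_E5500:
		return CORE_E5500;
	case PVR_VER_E6500:
		return CORE_E6500;
	default:
		return CORE_UNKNOWN;
	}
}
```

For example, PVR_8560 (0x80200000) has version field 0x8020 and so falls under the e500v1 case; the low halfword is ignored by the comparison.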
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Deepthi Dharwar @ 2014-01-14 8:00 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: linux-pm, daniel.lezcano, rjw, linux-kernel, Preeti U Murthy,
paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>
On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> In spite of this it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, it's possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
>
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that won't get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...
It was the snooze loop earlier, but we later changed it to cede in commit
363edbe2614 ("powerpc: Default arch idle will cede the processor on
pseries") to address the following regressions:
>>snippet from the patch.
When adding cpuidle support to pSeries, we introduced two
regressions:
- The new cpuidle backend driver only works under hypervisors
supporting the "SLPLAR" option, which isn't the case of the
old POWER4 hypervisor and the HV "light" used on js2x blades
- The cpuidle driver registers fairly late, meaning that for
a significant portion of the boot process, we end up having
all threads spinning. This slows down the boot process and
increases the overall resource usage if the hypervisor has
shared processors.
This fixes both by implementing a "default" idle that will cede
to the hypervisor when possible, in a very simple way without
all the bells and whistles of cpuidle.
Regards,
Deepthi
>> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>> ---
>>
>> drivers/cpuidle/cpuidle.c | 6 +++++-
>> drivers/cpuidle/governors/menu.c | 7 ++++---
>> 2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a55e68f..5bf06bb 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>>
>> /* ask the governor for the next state */
>> next_state = cpuidle_curr_governor->select(drv, dev);
>> +
>> + dev->last_residency = 0;
>> if (need_resched()) {
>> - dev->last_residency = 0;
>> /* give the governor an opportunity to reflect on the outcome */
>> if (cpuidle_curr_governor->reflect)
>> cpuidle_curr_governor->reflect(dev, next_state);
>
> The comments on top of the .reflect() routines of the governors say that the
> second parameter is the index of the actual state entered. But after this patch,
> next_state can be negative, indicating an invalid index. So those comments need
> to be updated accordingly.
>
>> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
>> return 0;
>> }
>>
>> + if (next_state < 0)
>> + return -EINVAL;
>
> The exit path above (due to need_resched) returns with irqs enabled, but the new
> one you are adding (next_state < 0) returns with irqs disabled. This is correct,
> because in the latter case, "idle" is still in progress and the arch will choose
> a default handler to execute (unlike the former case where "idle" is over and
> hence it's time to enable interrupts).
>
> IMHO it would be good to add comments around this code to explain this subtle
> difference. We can never be too careful with these things... ;-)
>
>> +
>> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>
>> broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..6921543 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -283,6 +283,7 @@ again:
>> * menu_select - selects the next idle state to enter
>> * @drv: cpuidle driver containing state data
>> * @dev: the CPU
>> + * Returns -1 when no idle state is suitable
>> */
>> static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> {
>> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> int multiplier;
>> struct timespec t;
>>
>> - if (data->needs_update) {
>> + if (data->last_state_idx >= 0 && data->needs_update) {
> ^^^^^
> Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
> only when index >= 0.
>
>> menu_update(drv, dev);
>> data->needs_update = 0;
>> }
>>
>> - data->last_state_idx = 0;
>> + data->last_state_idx = -1;
>> data->exit_us = 0;
>>
>> /* Special case when user has set very strict latency requirement */
>> if (unlikely(latency_req == 0))
>> - return 0;
>> + return data->last_state_idx;
>>
>> /* determine the expected residency time, round up */
>> t = ktime_to_timespec(tick_nohz_get_sleep_length());
>>
>
> What about the ladder governor? I know it's not used that much in practice,
> but I think it would be good to update that as well, just to keep it
> consistent.
>
> Regards,
> Srivatsa S. Bhat
>
^ permalink raw reply
* Re: [PATCH] Move processing of MCE queued event out from syscall exit path.
From: Hugh Dickins @ 2014-01-14 7:47 UTC (permalink / raw)
To: Mahesh J Salgaonkar, Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20140114042611.13145.6551.stgit@mars.in.ibm.com>
On Tue, 14 Jan 2014, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> Hugh Dickins reported that b5ff4211a829
> "powerpc/book3s: Queue up and process delayed MCE events" breaks the
> PowerMac G5 boot. This patch fixes it by moving the MCE event processing
> away from syscall exit, which was the wrong place for it to begin with, and
> implements a different mechanism that uses a paca flag and the
> decrementer interrupt to process the event.
>
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Good work guys: I can happily report that both this rework,
and Ben's one-liner, fix the issue for me on the G5: thank you.
(Irrelevant not-so-happy detail: I very nearly mailed you an hour or
so earlier to report that neither fixed it; but retried my original
CONFIG_PPC_POWERNV hack after, and found that now equally useless.
I did write of changing behaviour and ATA errors: it now appears that's
an independent but intermittent issue on the G5 in 3.13-rc7-mm1, which
coincidentally happened to trigger when I tested rc7-mm1 without fixes,
but not when I tested with my hack, until today.
I've gone back to testing on rc6-mm1, the previous week's mmotm,
which showed failure to run /sbin/init: rc6-mm1 has no trouble mounting
root, and it runs properly with your new patch, and with Ben's patch.
And I may be quite wrong to point a finger at ATA errors: perhaps
they're always shown, and quickly cleared off screen in successful boots,
but left visible when root cannot be mounted for some other reason.
I don't know, and won't have time to investigate further - bisecting
intermittents is not much fun! I'll just have to hope that it's
sorted out before it reaches 3.14-rc, or else bite the bullet and
investigate on that.)
Hugh
> ---
> arch/powerpc/include/asm/mce.h | 3 +++
> arch/powerpc/include/asm/paca.h | 3 +++
> arch/powerpc/kernel/entry_64.S | 5 -----
> arch/powerpc/kernel/irq.c | 11 ++++++++++-
> arch/powerpc/kernel/mce.c | 7 +++++++
> arch/powerpc/kernel/time.c | 9 +++++++++
> 6 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
> index 2257d1e..225e678 100644
> --- a/arch/powerpc/include/asm/mce.h
> +++ b/arch/powerpc/include/asm/mce.h
> @@ -186,6 +186,9 @@ struct mce_error_info {
> #define MCE_EVENT_RELEASE true
> #define MCE_EVENT_DONTRELEASE false
>
> +/* MCE bit flags (paca.mce_flags) */
> +#define MCE_EVENT_PENDING 0x0001
> +
> extern void save_mce_event(struct pt_regs *regs, long handled,
> struct mce_error_info *mce_err, uint64_t nip,
> uint64_t addr);
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index c3523d1..f9aa521 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -141,6 +141,9 @@ struct paca_struct {
> u8 io_sync; /* writel() needs spin_unlock sync */
> u8 irq_work_pending; /* IRQ_WORK interrupt while soft-disable */
> u8 nap_state_lost; /* NV GPR values lost in power7_idle */
> +#ifdef CONFIG_PPC_BOOK3S_64
> + u8 mce_flags; /* MCE bit flags. */
> +#endif
> u64 sprg3; /* Saved user-visible sprg */
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> u64 tm_scratch; /* TM scratch area for reclaim */
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 770d6d6..bbfb029 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -184,11 +184,6 @@ syscall_exit:
> bl .do_show_syscall_exit
> ld r3,RESULT(r1)
> #endif
> -#ifdef CONFIG_PPC_BOOK3S_64
> -BEGIN_FTR_SECTION
> - bl .machine_check_process_queued_event
> -END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
> -#endif
> CURRENT_THREAD_INFO(r12, r1)
>
> ld r8,_MSR(r1)
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index ba01656..e22f591 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -67,6 +67,7 @@
> #include <asm/udbg.h>
> #include <asm/smp.h>
> #include <asm/debug.h>
> +#include <asm/mce.h>
>
> #ifdef CONFIG_PPC64
> #include <asm/paca.h>
> @@ -158,9 +159,17 @@ notrace unsigned int __check_irq_replay(void)
> * We may have missed a decrementer interrupt. We check the
> * decrementer itself rather than the paca irq_happened field
> * in case we also had a rollover while hard disabled
> + * Also check if any MCE event is queued up that requires
> + * processing. Machine check handler would set paca->mce_flags
> + * and then call set_dec(1) to trigger a decrementer interrupt
> + * from NMI.
> */
> local_paca->irq_happened &= ~PACA_IRQ_DEC;
> - if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
> + if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow()
> +#ifdef CONFIG_PPC_BOOK3S_64
> + || local_paca->mce_flags & MCE_EVENT_PENDING
> +#endif
> + )
> return 0x900;
>
> /* Finally check if an external interrupt happened */
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index d6edf2b..7bab827 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -185,6 +185,13 @@ void machine_check_queue_event(void)
> return;
> }
> __get_cpu_var(mce_event_queue[index]) = evt;
> +
> + /*
> + * Set the event pending flag and raise a decrementer interrupt
> + * to process the queued event later.
> + */
> + local_paca->mce_flags |= MCE_EVENT_PENDING;
> + set_dec(1);
> }
>
> /*
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index b3b1441..87ccf92 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -69,6 +69,7 @@
> #include <asm/vdso_datapage.h>
> #include <asm/firmware.h>
> #include <asm/cputime.h>
> +#include <asm/mce.h>
>
> /* powerpc clocksource/clockevent code */
>
> @@ -505,6 +506,14 @@ void timer_interrupt(struct pt_regs * regs)
> return;
> }
>
> +#ifdef CONFIG_PPC_BOOK3S_64
> + /* Check if we have MCE event pending for processing. */
> + if (local_paca->mce_flags & MCE_EVENT_PENDING) {
> + local_paca->mce_flags &= ~MCE_EVENT_PENDING;
> + machine_check_process_queued_event();
> + }
> +#endif
> +
> /* Conditionally hard-enable interrupts now that the DEC has been
> * bumped to its maximum value
> */
^ permalink raw reply
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Srivatsa S. Bhat @ 2014-01-14 7:37 UTC (permalink / raw)
To: Preeti U Murthy
Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
linuxppc-dev, tuukka.tikkanen
In-Reply-To: <52D4E07E.204@linux.vnet.ibm.com>
On 01/14/2014 12:30 PM, Srivatsa S. Bhat wrote:
> On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
>> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
>> In spite of this, it was observed that the idle state count of the shallowest
>> idle state, snooze, was increasing.
>>
>> This is because the governor returns the idle state index as 0 even in
>> scenarios when no idle state can be chosen. These scenarios could be when the
>> latency requirement is 0 or as mentioned above when the user wants to disable
>> certain cpu idle states at runtime. In the latter case, it's possible that no
>> cpu idle state is valid because the suitable states were disabled
>> and the rest did not match the menu governor criteria to be chosen as the
>> next idle state.
>>
>> This patch adds the code to indicate that a valid cpu idle state could not be
>> chosen by the menu governor and reports back to arch so that it can take some
>> default action.
>>
>
> That sounds fair enough. However, the "default" action of pseries idle loop
> (pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
> a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
> to 0 hoping to prevent the CPUs from going to deep idle states, but then the
> machine would still end up going to Cede, even though that won't get reflected
> in the idle state counts. IMHO that scenario needs some thought as well...
>
I checked the git history and found that the default idle was changed (on purpose)
to cede the processor, in order to speed up booting.. Hmm..
commit 363edbe2614aa90df706c0f19ccfa2a6c06af0be
Author: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Date: Fri Sep 6 00:25:06 2013 +0530
powerpc: Default arch idle could cede processor on pseries
Regards,
Srivatsa S. Bhat
^ permalink raw reply
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Srivatsa S. Bhat @ 2014-01-14 7:00 UTC (permalink / raw)
To: Preeti U Murthy
Cc: deepthi, linux-pm, daniel.lezcano, rjw, linux-kernel, paulmck,
linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140114060516.6109.14901.stgit@preeti.in.ibm.com>
On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
> In spite of this, it was observed that the idle state count of the shallowest
> idle state, snooze, was increasing.
>
> This is because the governor returns the idle state index as 0 even in
> scenarios when no idle state can be chosen. These scenarios could be when the
> latency requirement is 0 or as mentioned above when the user wants to disable
> certain cpu idle states at runtime. In the latter case, it's possible that no
> cpu idle state is valid because the suitable states were disabled
> and the rest did not match the menu governor criteria to be chosen as the
> next idle state.
>
> This patch adds the code to indicate that a valid cpu idle state could not be
> chosen by the menu governor and reports back to arch so that it can take some
> default action.
>
That sounds fair enough. However, the "default" action of pseries idle loop
(pseries_lpar_idle()) surprises me. It enters Cede, which is _deeper_ than doing
a snooze! IOW, a user might "disable" cpuidle or set the PM_QOS_CPU_DMA_LATENCY
to 0 hoping to prevent the CPUs from going to deep idle states, but then the
machine would still end up going to Cede, even though that won't get reflected
in the idle state counts. IMHO that scenario needs some thought as well...
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---
>
> drivers/cpuidle/cpuidle.c | 6 +++++-
> drivers/cpuidle/governors/menu.c | 7 ++++---
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..5bf06bb 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>
> /* ask the governor for the next state */
> next_state = cpuidle_curr_governor->select(drv, dev);
> +
> + dev->last_residency = 0;
> if (need_resched()) {
> - dev->last_residency = 0;
> /* give the governor an opportunity to reflect on the outcome */
> if (cpuidle_curr_governor->reflect)
> cpuidle_curr_governor->reflect(dev, next_state);
The comments on top of the .reflect() routines of the governors say that the
second parameter is the index of the actual state entered. But after this patch,
next_state can be negative, indicating an invalid index. So those comments need
to be updated accordingly.
> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
> return 0;
> }
>
> + if (next_state < 0)
> + return -EINVAL;
The exit path above (due to need_resched) returns with irqs enabled, but the new
one you are adding (next_state < 0) returns with irqs disabled. This is correct,
because in the latter case, "idle" is still in progress and the arch will choose
a default handler to execute (unlike the former case where "idle" is over and
hence it's time to enable interrupts).
IMHO it would be good to add comments around this code to explain this subtle
difference. We can never be too careful with these things... ;-)
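To make the difference between the two exit paths concrete, here is a small user-space model. It is only a sketch: struct toy_cpu, its irqs field and toy_cpuidle_idle_call() are hypothetical stand-ins, not kernel API, and the normal path is assumed to return with interrupts enabled once the state has been exited.

```c
#include <errno.h>

/* Hypothetical model of the interrupt state, for illustration only. */
enum toy_irq_state { TOY_IRQS_DISABLED, TOY_IRQS_ENABLED };

struct toy_cpu {
    enum toy_irq_state irqs;
    int need_resched;
};

/*
 * Toy version of the idle call. The caller is assumed to enter with
 * interrupts disabled, as the idle loop does.
 */
static int toy_cpuidle_idle_call(struct toy_cpu *cpu, int next_state)
{
    if (cpu->need_resched) {
        /* Idle is over: re-enable interrupts before returning. */
        cpu->irqs = TOY_IRQS_ENABLED;
        return 0;
    }

    if (next_state < 0) {
        /*
         * Idle is still in progress: leave interrupts disabled so
         * that the arch code can run its default idle handler.
         */
        return -EINVAL;
    }

    /* A real implementation would enter the chosen state here. */
    cpu->irqs = TOY_IRQS_ENABLED;
    return 0;
}
```

The point of the model is only that a caller seeing -EINVAL must still treat interrupts as disabled, whereas a return of 0 means they have been re-enabled.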
> +
> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>
> broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..6921543 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -283,6 +283,7 @@ again:
> * menu_select - selects the next idle state to enter
> * @drv: cpuidle driver containing state data
> * @dev: the CPU
> + * Returns -1 when no idle state is suitable
> */
> static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> {
> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> int multiplier;
> struct timespec t;
>
> - if (data->needs_update) {
> + if (data->last_state_idx >= 0 && data->needs_update) {
^^^^^
Doesn't hurt, but actually unnecessary, since ->needs_update is set to 1
only when index >= 0.
> menu_update(drv, dev);
> data->needs_update = 0;
> }
>
> - data->last_state_idx = 0;
> + data->last_state_idx = -1;
> data->exit_us = 0;
>
> /* Special case when user has set very strict latency requirement */
> if (unlikely(latency_req == 0))
> - return 0;
> + return data->last_state_idx;
>
> /* determine the expected residency time, round up */
> t = ktime_to_timespec(tick_nohz_get_sleep_length());
>
What about the ladder governor? I know it's not used that much in practice,
but I think it would be good to update that as well, just to keep it
consistent.
Regards,
Srivatsa S. Bhat
^ permalink raw reply
* Re: [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Deepthi Dharwar @ 2014-01-14 6:16 UTC (permalink / raw)
To: Preeti U Murthy
Cc: linux-pm, daniel.lezcano, rjw, linux-kernel, srivatsa.bhat,
paulmck, linuxppc-dev, tuukka.tikkanen
In-Reply-To: <20140114060516.6109.14901.stgit@preeti.in.ibm.com>
On 01/14/2014 11:35 AM, Preeti U Murthy wrote:
> On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
> In spite of this, it was observed that the idle state count of the shallowest
> idle state, snooze, was increasing.
>
> This is because the governor returns the idle state index as 0 even in
> scenarios when no idle state can be chosen. These scenarios could be when the
> latency requirement is 0 or as mentioned above when the user wants to disable
> certain cpu idle states at runtime. In the latter case, it's possible that no
> cpu idle state is valid because the suitable states were disabled
> and the rest did not match the menu governor criteria to be chosen as the
> next idle state.
>
> This patch adds the code to indicate that a valid cpu idle state could not be
> chosen by the menu governor and reports back to arch so that it can take some
> default action.
>
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
>
> drivers/cpuidle/cpuidle.c | 6 +++++-
> drivers/cpuidle/governors/menu.c | 7 ++++---
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a55e68f..5bf06bb 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
>
> /* ask the governor for the next state */
> next_state = cpuidle_curr_governor->select(drv, dev);
> +
> + dev->last_residency = 0;
> if (need_resched()) {
> - dev->last_residency = 0;
> /* give the governor an opportunity to reflect on the outcome */
> if (cpuidle_curr_governor->reflect)
> cpuidle_curr_governor->reflect(dev, next_state);
> @@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
> return 0;
> }
>
> + if (next_state < 0)
> + return -EINVAL;
> +
> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>
> broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..6921543 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -283,6 +283,7 @@ again:
> * menu_select - selects the next idle state to enter
> * @drv: cpuidle driver containing state data
> * @dev: the CPU
> + * Returns -1 when no idle state is suitable
> */
> static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> {
> @@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> int multiplier;
> struct timespec t;
>
> - if (data->needs_update) {
> + if (data->last_state_idx >= 0 && data->needs_update) {
> menu_update(drv, dev);
> data->needs_update = 0;
> }
>
> - data->last_state_idx = 0;
> + data->last_state_idx = -1;
> data->exit_us = 0;
>
> /* Special case when user has set very strict latency requirement */
> if (unlikely(latency_req == 0))
> - return 0;
> + return data->last_state_idx;
>
> /* determine the expected residency time, round up */
> t = ktime_to_timespec(tick_nohz_get_sleep_length());
>
^ permalink raw reply
* [PATCH] powerpc: Fix races with irq_work
From: Benjamin Herrenschmidt @ 2014-01-14 6:11 UTC (permalink / raw)
To: linuxppc-dev list
If we set irq_work on a processor and immediately afterward, before the
irq work has a chance to be processed, we change the decrementer value,
we can seriously delay the handling of that irq_work.
Fix it by checking in a few places for pending irq work, first before
changing the decrementer in decrementer_set_next_event() and after
changing it in the same function and in timer_interrupt().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index afb1b56..b3dab20 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -536,6 +536,9 @@ void timer_interrupt(struct pt_regs * regs)
now = *next_tb - now;
if (now <= DECREMENTER_MAX)
set_dec((int)now);
+ /* We may have raced with new irq work */
+ if (test_irq_work_pending())
+ set_dec(1);
__get_cpu_var(irq_stat).timer_irqs_others++;
}
@@ -802,8 +805,16 @@ static void __init clocksource_init(void)
static int decrementer_set_next_event(unsigned long evt,
struct clock_event_device *dev)
{
+ /* Don't adjust the decrementer if some irq work is pending */
+ if (test_irq_work_pending())
+ return 0;
__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
set_dec(evt);
+
+ /* We may have raced with new irq work */
+ if (test_irq_work_pending())
+ set_dec(1);
+
return 0;
}
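The check, write, re-check pattern used in this patch can be modelled in user space as follows. This is a sketch under stated assumptions: irqw_dec and irqw_pending are hypothetical stand-ins for the decrementer and the pending flag, irq work is assumed to be raised by setting the flag and programming the decrementer to 1, and the racer callback exists only to simulate work arriving between the first check and the decrementer write.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the decrementer register and the pending flag. */
static int irqw_dec;
static int irqw_pending;

static void irqw_set_dec(int val)
{
    irqw_dec = val;
}

/* Assumed model of raising irq work: mark it pending, fire the timer soon. */
static void irqw_raise(void)
{
    irqw_pending = 1;
    irqw_set_dec(1);
}

/*
 * Model of the patched set_next_event path. The optional racer callback
 * simulates irq work being raised between the first check and the
 * decrementer write.
 */
static int irqw_set_next_event(int evt, void (*racer)(void))
{
    /* Don't push the decrementer out if irq work is already pending. */
    if (irqw_pending)
        return 0;

    if (racer != NULL)
        racer();

    irqw_set_dec(evt);

    /* Re-check: the write above may have overwritten a set_dec(1). */
    if (irqw_pending)
        irqw_set_dec(1);

    return 0;
}
```

Without the second check, the simulated race would leave the decrementer at the far-off event value and the pending work would wait until then.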
^ permalink raw reply related
* [PATCH] cpuidle/menu: Fail cpuidle_idle_call() if no idle state is acceptable
From: Preeti U Murthy @ 2014-01-14 6:05 UTC (permalink / raw)
To: deepthi, paulmck, linux-pm, benh, daniel.lezcano, rjw,
linux-kernel, srivatsa.bhat, svaidy, linuxppc-dev,
tuukka.tikkanen
On PowerPC, in a particular test scenario, all the cpu idle states were disabled.
In spite of this, it was observed that the idle state count of the shallowest
idle state, snooze, was increasing.
This is because the governor returns the idle state index as 0 even in
scenarios when no idle state can be chosen. These scenarios could be when the
latency requirement is 0 or as mentioned above when the user wants to disable
certain cpu idle states at runtime. In the latter case, it's possible that no
cpu idle state is valid because the suitable states were disabled
and the rest did not match the menu governor criteria to be chosen as the
next idle state.
This patch adds the code to indicate that a valid cpu idle state could not be
chosen by the menu governor and reports back to arch so that it can take some
default action.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
drivers/cpuidle/cpuidle.c | 6 +++++-
drivers/cpuidle/governors/menu.c | 7 ++++---
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a55e68f..5bf06bb 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
/* ask the governor for the next state */
next_state = cpuidle_curr_governor->select(drv, dev);
+
+ dev->last_residency = 0;
if (need_resched()) {
- dev->last_residency = 0;
/* give the governor an opportunity to reflect on the outcome */
if (cpuidle_curr_governor->reflect)
cpuidle_curr_governor->reflect(dev, next_state);
@@ -140,6 +141,9 @@ int cpuidle_idle_call(void)
return 0;
}
+ if (next_state < 0)
+ return -EINVAL;
+
trace_cpu_idle_rcuidle(next_state, dev->cpu);
broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..6921543 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -283,6 +283,7 @@ again:
* menu_select - selects the next idle state to enter
* @drv: cpuidle driver containing state data
* @dev: the CPU
+ * Returns -1 when no idle state is suitable
*/
static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
@@ -292,17 +293,17 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
int multiplier;
struct timespec t;
- if (data->needs_update) {
+ if (data->last_state_idx >= 0 && data->needs_update) {
menu_update(drv, dev);
data->needs_update = 0;
}
- data->last_state_idx = 0;
+ data->last_state_idx = -1;
data->exit_us = 0;
/* Special case when user has set very strict latency requirement */
if (unlikely(latency_req == 0))
- return 0;
+ return data->last_state_idx;
/* determine the expected residency time, round up */
t = ktime_to_timespec(tick_nohz_get_sleep_length());
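As a reading aid for the change above, here is a minimal user-space sketch of the idea: a select routine that returns -1 instead of defaulting to index 0 when nothing qualifies. The struct toy_state type, toy_select() and the latency-only criterion are assumptions made for illustration; the real menu governor also weighs the predicted residency and other factors.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one cpuidle state. */
struct toy_state {
    int disabled;                 /* set when the user disabled the state */
    unsigned int exit_latency_us; /* cost of leaving the state */
};

/*
 * Toy governor: returns the index of the deepest enabled state whose
 * exit latency fits within latency_req_us, or -1 when none qualifies.
 * States are assumed to be ordered from shallowest to deepest.
 */
static int toy_select(const struct toy_state *states, size_t n,
                      unsigned int latency_req_us)
{
    int chosen = -1;
    size_t i;

    /* A latency requirement of 0 rules out every state. */
    if (latency_req_us == 0)
        return -1;

    for (i = 0; i < n; i++) {
        if (states[i].disabled)
            continue;
        if (states[i].exit_latency_us > latency_req_us)
            continue;
        chosen = (int)i;
    }

    return chosen;
}
```

Starting from -1 rather than 0 means a disabled shallowest state is never reported as the choice by accident, which is the accounting problem described in the changelog.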
^ permalink raw reply related
* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Andrew Morton @ 2014-01-14 4:32 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, kirill.shutemov
In-Reply-To: <1389672810.6933.0.camel@pasglop>
On Tue, 14 Jan 2014 15:13:30 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Mon, 2014-01-13 at 14:17 -0800, Andrew Morton wrote:
>
> > Did this get fixed?
>
> Any chance you can Ack the patch on that thread ?
>
> http://thread.gmane.org/gmane.linux.kernel.mm/111809
>
> So I can put it in powerpc -next with a CC stable ? Or if you tell me
> that Kirill's Ack is sufficient then I'll go for it.
yup, it looks OK to me from a non-ppc perspective. Please proceed as
described.
^ permalink raw reply
* [PATCH] Move processing of MCE queued event out from syscall exit path.
From: Mahesh J Salgaonkar @ 2014-01-14 4:26 UTC (permalink / raw)
To: linuxppc-dev, Benjamin Herrenschmidt; +Cc: Hugh Dickins
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Hugh Dickins reported that b5ff4211a829
"powerpc/book3s: Queue up and process delayed MCE events" breaks the
PowerMac G5 boot. This patch fixes it by moving the MCE event processing
away from syscall exit, which was the wrong place for it to begin with, and
implements a different mechanism that uses a paca flag and the
decrementer interrupt to process the event.
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/mce.h | 3 +++
arch/powerpc/include/asm/paca.h | 3 +++
arch/powerpc/kernel/entry_64.S | 5 -----
arch/powerpc/kernel/irq.c | 11 ++++++++++-
arch/powerpc/kernel/mce.c | 7 +++++++
arch/powerpc/kernel/time.c | 9 +++++++++
6 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 2257d1e..225e678 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -186,6 +186,9 @@ struct mce_error_info {
#define MCE_EVENT_RELEASE true
#define MCE_EVENT_DONTRELEASE false
+/* MCE bit flags (paca.mce_flags) */
+#define MCE_EVENT_PENDING 0x0001
+
extern void save_mce_event(struct pt_regs *regs, long handled,
struct mce_error_info *mce_err, uint64_t nip,
uint64_t addr);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index c3523d1..f9aa521 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -141,6 +141,9 @@ struct paca_struct {
u8 io_sync; /* writel() needs spin_unlock sync */
u8 irq_work_pending; /* IRQ_WORK interrupt while soft-disable */
u8 nap_state_lost; /* NV GPR values lost in power7_idle */
+#ifdef CONFIG_PPC_BOOK3S_64
+ u8 mce_flags; /* MCE bit flags. */
+#endif
u64 sprg3; /* Saved user-visible sprg */
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
u64 tm_scratch; /* TM scratch area for reclaim */
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..bbfb029 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -184,11 +184,6 @@ syscall_exit:
bl .do_show_syscall_exit
ld r3,RESULT(r1)
#endif
-#ifdef CONFIG_PPC_BOOK3S_64
-BEGIN_FTR_SECTION
- bl .machine_check_process_queued_event
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-#endif
CURRENT_THREAD_INFO(r12, r1)
ld r8,_MSR(r1)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ba01656..e22f591 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -67,6 +67,7 @@
#include <asm/udbg.h>
#include <asm/smp.h>
#include <asm/debug.h>
+#include <asm/mce.h>
#ifdef CONFIG_PPC64
#include <asm/paca.h>
@@ -158,9 +159,17 @@ notrace unsigned int __check_irq_replay(void)
* We may have missed a decrementer interrupt. We check the
* decrementer itself rather than the paca irq_happened field
* in case we also had a rollover while hard disabled
+ * Also check if any MCE event is queued up that requires
+ * processing. Machine check handler would set paca->mce_flags
+ * and then call set_dec(1) to trigger a decrementer interrupt
+ * from NMI.
*/
local_paca->irq_happened &= ~PACA_IRQ_DEC;
- if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
+ if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow()
+#ifdef CONFIG_PPC_BOOK3S_64
+ || local_paca->mce_flags & MCE_EVENT_PENDING
+#endif
+ )
return 0x900;
/* Finally check if an external interrupt happened */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index d6edf2b..7bab827 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -185,6 +185,13 @@ void machine_check_queue_event(void)
return;
}
__get_cpu_var(mce_event_queue[index]) = evt;
+
+ /*
+ * Set the event pending flag and raise a decrementer interrupt
+ * to process the queued event later.
+ */
+ local_paca->mce_flags |= MCE_EVENT_PENDING;
+ set_dec(1);
}
/*
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3b1441..87ccf92 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -69,6 +69,7 @@
#include <asm/vdso_datapage.h>
#include <asm/firmware.h>
#include <asm/cputime.h>
+#include <asm/mce.h>
/* powerpc clocksource/clockevent code */
@@ -505,6 +506,14 @@ void timer_interrupt(struct pt_regs * regs)
return;
}
+#ifdef CONFIG_PPC_BOOK3S_64
+ /* Check if we have MCE event pending for processing. */
+ if (local_paca->mce_flags & MCE_EVENT_PENDING) {
+ local_paca->mce_flags &= ~MCE_EVENT_PENDING;
+ machine_check_process_queued_event();
+ }
+#endif
+
/* Conditionally hard-enable interrupts now that the DEC has been
* bumped to its maximum value
*/
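The deferral scheme in this patch (set a per-CPU flag in machine check context, force a timer interrupt, do the real work there) can be sketched in user space as below. All names are hypothetical stand-ins: mce_toy_flags models paca->mce_flags, mce_toy_dec models the decrementer, and the counters stand in for the per-CPU event queue.

```c
/* Hypothetical stand-in for MCE_EVENT_PENDING. */
#define MCE_TOY_EVENT_PENDING 0x01

static unsigned char mce_toy_flags; /* models paca->mce_flags */
static int mce_toy_dec;             /* models the decrementer */
static int mce_toy_queued;          /* events waiting to be processed */
static int mce_toy_processed;       /* events handled so far */

/* Runs in (simulated) machine check context: queue only, no processing. */
static void mce_toy_queue_event(void)
{
    mce_toy_queued++;
    mce_toy_flags |= MCE_TOY_EVENT_PENDING;
    mce_toy_dec = 1; /* ask for a timer interrupt as soon as possible */
}

/* Runs in (simulated) timer interrupt context: safe place to process. */
static void mce_toy_timer_interrupt(void)
{
    if (mce_toy_flags & MCE_TOY_EVENT_PENDING) {
        mce_toy_flags &= (unsigned char)~MCE_TOY_EVENT_PENDING;
        mce_toy_processed += mce_toy_queued;
        mce_toy_queued = 0;
    }

    /* Reprogram the timer far out, as a normal tick handler would. */
    mce_toy_dec = 0x7fffffff;
}
```

The flag is cleared before the queue is drained, mirroring the order used in the timer_interrupt() hunk above.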
^ permalink raw reply related
* Re: [PATCH mmotm/next] powerpc: fix powernv boot breakage on G5???
From: Benjamin Herrenschmidt @ 2014-01-14 4:17 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Mahesh Salgaonkar, linuxppc-dev
In-Reply-To: <alpine.LSU.2.11.1401120043210.1092@eggly.anvils>
On Sun, 2014-01-12 at 00:46 -0800, Hugh Dickins wrote:
> My PowerMac G5 cannot boot mmotm these days: different symptoms
> (starting /sbin/init failed? or ATA errors and hang?), with unrelated
> bugs adding to the confusion; but a bisection led to b5ff4211a829
> "powerpc/book3s: Queue up and process delayed MCE events". Since that
> series seems to be mostly about powernv, I tried changing BOOK3S_64
> to POWERNV in entry_64.S, which has got it back to working for me.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> just in case this happens to be right, but it's well beyond me!
> ---
Does this help instead?
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 770d6d6..9820d36 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -187,6 +187,7 @@ syscall_exit:
#ifdef CONFIG_PPC_BOOK3S_64
BEGIN_FTR_SECTION
bl .machine_check_process_queued_event
+ ld r3,RESULT(r1)
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
#endif
CURRENT_THREAD_INFO(r12, r1)
Cheers,
Ben.
>
> arch/powerpc/kernel/entry_64.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- mmotm/arch/powerpc/kernel/entry_64.S 2014-01-10 18:24:56.940448828 -0800
> +++ linux/arch/powerpc/kernel/entry_64.S 2014-01-10 18:29:24.276455182 -0800
> @@ -184,7 +184,7 @@ syscall_exit:
> bl .do_show_syscall_exit
> ld r3,RESULT(r1)
> #endif
> -#ifdef CONFIG_PPC_BOOK3S_64
> +#ifdef CONFIG_PPC_POWERNV
> BEGIN_FTR_SECTION
> bl .machine_check_process_queued_event
> END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
^ permalink raw reply related
* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Benjamin Herrenschmidt @ 2014-01-14 4:13 UTC (permalink / raw)
To: Andrew Morton
Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, Kirill A. Shutemov,
linuxppc-dev, kirill.shutemov
In-Reply-To: <20140113141748.0b851e1573e41bf26de7c0ae@linux-foundation.org>
On Mon, 2014-01-13 at 14:17 -0800, Andrew Morton wrote:
> Did this get fixed?
Any chance you can Ack the patch on that thread?
http://thread.gmane.org/gmane.linux.kernel.mm/111809
So I can put it in powerpc -next with a CC stable? Or if you tell me
that Kirill's Ack is sufficient then I'll go for it.
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Kirill A. Shutemov @ 2014-01-13 22:30 UTC (permalink / raw)
To: Andrew Morton
Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, linuxppc-dev,
kirill.shutemov
In-Reply-To: <20140113141748.0b851e1573e41bf26de7c0ae@linux-foundation.org>
On Mon, Jan 13, 2014 at 02:17:48PM -0800, Andrew Morton wrote:
> On Thu, 2 Jan 2014 04:19:51 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>
> > On Wed, Jan 01, 2014 at 09:29:05PM +1100, Benjamin Herrenschmidt wrote:
> > > On Wed, 2014-01-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> > > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > > >
> > > > This patch fixes the crash below
> > > >
> > > > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> > > > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > ...
> > > > Call Trace:
> > > > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> > > > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> > > >
> > > > On ppc64 we use the pgtable for storing the hpte slot information and
> > > > store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> > > > pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> > > > the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> > > > from new pmd.
> > > >
> > > > We also want to move the withdraw and deposit before the set_pmd so
> > > > that, when a page fault finds the pmd as trans huge, we can be sure
> > > > that the pgtable can be located at that offset.
> > > >
>
> Did this get fixed?
New version: http://thread.gmane.org/gmane.linux.kernel.mm/111809
--
Kirill A. Shutemov
^ permalink raw reply
* Re: [PATCH] powerpc: thp: Fix crash on mremap
From: Andrew Morton @ 2014-01-13 22:17 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: aarcange, linux-mm, paulus, Aneesh Kumar K.V, linuxppc-dev,
kirill.shutemov
In-Reply-To: <20140102021951.GA26369@node.dhcp.inet.fi>
On Thu, 2 Jan 2014 04:19:51 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Wed, Jan 01, 2014 at 09:29:05PM +1100, Benjamin Herrenschmidt wrote:
> > On Wed, 2014-01-01 at 15:23 +0530, Aneesh Kumar K.V wrote:
> > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > >
> > > This patch fixes the crash below
> > >
> > > NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> > > LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > ...
> > > Call Trace:
> > > [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> > > [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> > > [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
> > >
> > > On ppc64 we use the pgtable for storing the hpte slot information and
> > > store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> > > pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> > > the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> > > from new pmd.
> > >
> > > We also want to move the withdraw and deposit before the set_pmd so
> > > that, when a page fault finds the pmd as trans huge, we can be sure
> > > that the pgtable can be located at that offset.
> > >
Did this get fixed?
^ permalink raw reply
* Re: [PATCH v2 0/9] cpuidle: rework device state count handling
From: Rafael J. Wysocki @ 2014-01-13 21:20 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz
Cc: linux-samsung-soc, linux-pm, daniel.lezcano, linux-kernel,
kyungmin.park, linuxppc-dev, lenb
In-Reply-To: <2079155.EyEBRDoJjP@vostro.rjw.lan>
On Saturday, January 11, 2014 01:37:29 AM Rafael J. Wysocki wrote:
> On Friday, December 20, 2013 07:47:22 PM Bartlomiej Zolnierkiewicz wrote:
> > Hi,
> >
> > Some cpuidle drivers assume that cpuidle core will handle cases where
> > device->state_count is smaller than driver->state_count, unfortunately
> > currently this is untrue (device->state_count is used only for handling
> > cpuidle state sysfs entries and driver->state_count is used for all
> > other cases) and will not be fixed in the future as device->state_count
> > is planned to be removed [1].
> >
> > This patchset fixes such drivers (ARM EXYNOS cpuidle driver and ACPI
> > cpuidle driver), removes superfluous device->state_count initialization
> > from drivers for which device->state_count equals driver->state_count
> > (POWERPC pseries cpuidle driver and intel_idle driver) and finally
> > removes state_count field from struct cpuidle_device.
> >
> > Additionally (while at it) this patchset fixes C1E promotion disable
> > quirk handling (in intel_idle driver) and converts cpuidle drivers code
> > to use the common cpuidle_[un]register() routines (in POWERPC pseries
> > cpuidle driver and intel_idle driver).
> >
> > [1] http://permalink.gmane.org/gmane.linux.power-management.general/36908
> >
> > Reference to v1:
> > http://comments.gmane.org/gmane.linux.power-management.general/37390
> >
> > Changes since v1:
> > - synced patch series with next-20131220
> > - added ACKs from Daniel Lezcano
>
> This series breaks boot on one of my test machines with intel_idle, so I'm
> not sure how well it has been tested.
>
> I've dropped it entirely for now. If I have the time, I will try to identify
> the root cause of the failure, but that may not happen before the merge window.
> Sorry about that.
The breakage was introduced by patch [8/9], so I've re-applied patches [1-7/9]
from this series. Please refer to Fengguang's report [1] for the breakage
details.
Thanks!
[1] http://marc.info/?l=linux-kernel&m=138964167909907&w=2
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
^ permalink raw reply
* [PATCH] powerpc/relocate fix relocate processing in LE mode
From: Laurent Dufour @ 2014-01-13 16:36 UTC (permalink / raw)
To: Benjamin Herrenschmidt, paulus, linuxppc-dev
The relocation code does not work in little-endian mode because the r_info
field, which is a 64-bit value, must be read from the correct offset.
The current code is optimized to read the r_info field as a 32-bit value
starting at the middle of the doubleword (offset 12). When running in LE
mode, the value read is wrong because that word holds the most significant
half of r_info rather than the relocation type.
This patch removes that optimization, which consisted of handling a 32-bit
value instead of a 64-bit one, so that the code works in both big- and
little-endian modes.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
arch/powerpc/kernel/reloc_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/reloc_64.S b/arch/powerpc/kernel/reloc_64.S
index b47a0e1..1482327 100644
--- a/arch/powerpc/kernel/reloc_64.S
+++ b/arch/powerpc/kernel/reloc_64.S
@@ -69,8 +69,8 @@ _GLOBAL(relocate)
* R_PPC64_RELATIVE ones.
*/
mtctr r8
-5: lwz r0,12(9) /* ELF64_R_TYPE(reloc->r_info) */
- cmpwi r0,R_PPC64_RELATIVE
+5: ld r0,8(9) /* ELF64_R_TYPE(reloc->r_info) */
+ cmpdi r0,R_PPC64_RELATIVE
bne 6f
ld r6,0(r9) /* reloc->r_offset */
ld r0,16(r9) /* reloc->r_addend */
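
The endianness issue described in the commit message can be reproduced in user space. The sketch below is an illustration only, not the kernel code: `struct rela64` copies the three-field layout of `Elf64_Rela`, the helper names are made up, and it assumes the symbol index in the upper half of r_info is zero, as is normally the case for R_PPC64_RELATIVE entries. It compares a 32-bit read at byte offset 12 (the old `lwz`) with a 64-bit read at byte offset 8 (the new `ld`).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define R_PPC64_RELATIVE 22u    /* value from the PowerPC64 ELF ABI */

/* Same layout as Elf64_Rela: three 64-bit fields, 24 bytes in total. */
struct rela64 {
    uint64_t r_offset;
    uint64_t r_info;    /* symbol index in the high 32 bits, type in the low 32 */
    int64_t  r_addend;
};

/* Old approach: 32-bit load at byte offset 12, like "lwz r0,12(r9)". */
uint32_t type_from_word_at_12(const struct rela64 *r)
{
    uint32_t w;
    memcpy(&w, (const unsigned char *)r + 12, sizeof(w));
    return w;
}

/* New approach: 64-bit load at byte offset 8, like "ld r0,8(r9)". */
uint64_t info_from_dword_at_8(const struct rela64 *r)
{
    uint64_t d;
    assert(sizeof(*r) == 24);   /* the offsets above rely on this layout */
    memcpy(&d, (const unsigned char *)r + 8, sizeof(d));
    return d;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a big-endian host both helpers yield the relocation type. On a little-endian host the word at offset 12 is the upper half of r_info, so the 32-bit read returns the symbol index (zero here) and the comparison against R_PPC64_RELATIVE fails, whereas the 64-bit read gives the same result on either host.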
^ permalink raw reply related
* Re: [PATCH V4] powerpc: thp: Fix crash on mremap
From: Kirill A. Shutemov @ 2014-01-13 16:32 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: aarcange, linux-mm, paulus, linuxppc-dev, kirill.shutemov
In-Reply-To: <1389593064-32664-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Mon, Jan 13, 2014 at 11:34:24AM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> This patch fixes the crash below
>
> NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
> LR [c0000000000439ac] .hash_page+0x18c/0x5e0
> ...
> Call Trace:
> [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
> [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
> [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
>
> On ppc64 we use the pgtable for storing the hpte slot information and
> store address to the pgtable at a constant offset (PTRS_PER_PMD) from
> pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
> the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
> from new pmd.
>
> We also want to move the withdraw and deposit before the set_pmd so
> that, when a page fault finds the pmd as trans huge, we can be sure
> that the pgtable can be located at that offset.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
^ permalink raw reply
* [PATCH 2/4] powerpc: book3s kvm can be modular so it should use module.h
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>
KVM support is tristate, so this file should be including
module.h instead of export.h -- it only works currently because
module_init is currently (mis)placed in init.h -- but we are
intending to clean that up and relocate it to module.h
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
arch/powerpc/kvm/book3s.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 8912608b7e1b..279459e8a072 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -16,7 +16,7 @@
#include <linux/kvm_host.h>
#include <linux/err.h>
-#include <linux/export.h>
+#include <linux/module.h>
#include <linux/slab.h>
#include <asm/reg.h>
--
1.8.5.2
^ permalink raw reply related
* [PATCH 3/4] powerpc: use subsys_initcall for Freescale Local Bus
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>
The FSL_SOC option is bool, and hence this code is either
present or absent. It will never be modular, so using
module_init as an alias for __initcall is rather misleading.
Fix this up now, so that we can relocate module_init from
init.h into module.h in the future. If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.
Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups. As __initcall gets
mapped onto device_initcall, our use of subsys_initcall (which
makes sense for bus code) will thus change this registration
from level 6-device to level 4-subsys (i.e. slightly earlier).
However, no impact of that small difference has been
observed during testing, nor is any expected.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
arch/powerpc/sysdev/fsl_lbc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/sysdev/fsl_lbc.c b/arch/powerpc/sysdev/fsl_lbc.c
index 6bc5a546d49f..9f00e5f84abe 100644
--- a/arch/powerpc/sysdev/fsl_lbc.c
+++ b/arch/powerpc/sysdev/fsl_lbc.c
@@ -388,4 +388,4 @@ static int __init fsl_lbc_init(void)
{
return platform_driver_register(&fsl_lbc_ctrl_driver);
}
-module_init(fsl_lbc_init);
+subsys_initcall(fsl_lbc_init);
--
1.8.5.2
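
The level change described in the commit message (6-device to 4-subsys) only affects the order in which init functions run at boot. The toy model below illustrates that ordering; it is not how the kernel implements initcalls (the kernel places function pointers in per-level linker sections), and all names in it are hypothetical. Only the level numbers 3, 4 and 6 are taken from the commit messages in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Level numbers quoted in the commit messages; the enum names are made up. */
enum fake_level { FAKE_ARCH = 3, FAKE_SUBSYS = 4, FAKE_DEVICE = 6 };
#define FAKE_MAX_LEVEL 7
#define FAKE_LOG_SIZE 8

struct fake_initcall {
    enum fake_level level;
    int (*fn)(void);
};

/* Records the level of each init function in the order it actually ran. */
int call_log[FAKE_LOG_SIZE];
int call_count;

static void record(int level)
{
    if (call_count < FAKE_LOG_SIZE)
        call_log[call_count++] = level;
}

int fake_bus_init(void)    { record(FAKE_SUBSYS); return 0; }
int fake_driver_init(void) { record(FAKE_DEVICE); return 0; }

/* Run every registered call, lowest level first, regardless of table order. */
void fake_do_initcalls(const struct fake_initcall *tbl, size_t n)
{
    assert(tbl != NULL || n == 0);
    for (int level = 0; level <= FAKE_MAX_LEVEL; level++)
        for (size_t i = 0; i < n; i++)
            if ((int)tbl[i].level == level)
                tbl[i].fn();
}
```

With a table that lists a device-level entry before a subsys-level one, `fake_do_initcalls()` still runs the subsys-level function first, which is the "slightly earlier" registration the commit message refers to.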
^ permalink raw reply related
* [PATCH 4/4] powerpc: don't use module_init for non-modular core hugetlb code
From: Paul Gortmaker @ 2014-01-13 16:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras; +Cc: Paul Gortmaker, linuxppc-dev
In-Reply-To: <1389630113-7919-1-git-send-email-paul.gortmaker@windriver.com>
The hugetlbpage.o is obj-y (always built in). It will never
be modular, so using module_init as an alias for __initcall is
somewhat misleading.
Fix this up now, so that we can relocate module_init from
init.h into module.h in the future. If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.
Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups. As __initcall gets
mapped onto device_initcall, our use of arch_initcall (which
makes sense for arch code) will thus change this registration
from level 6-device to level 3-arch (i.e. slightly earlier).
However, no impact of that small difference has been
observed during testing, nor is any expected.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
arch/powerpc/mm/hugetlbpage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 90bb6d9409bf..d25c202420da 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -911,7 +911,7 @@ static int __init hugetlbpage_init(void)
return 0;
}
#endif
-module_init(hugetlbpage_init);
+arch_initcall(hugetlbpage_init);
void flush_dcache_icache_hugepage(struct page *page)
{
--
1.8.5.2
^ permalink raw reply related