* [PATCH 2/4] powerpc: move the patch_exception to a common place
From: Kevin Hao @ 2013-05-11 23:26 UTC (permalink / raw)
To: Kumar Gala, Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc
In-Reply-To: <1368314784-971-1-git-send-email-haokexin@gmail.com>
So that it can be used by other codes. No function change.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
arch/powerpc/include/asm/code-patching.h | 7 +++++++
arch/powerpc/lib/code-patching.c | 15 +++++++++++++++
arch/powerpc/mm/tlb_nohash.c | 19 -------------------
3 files changed, 22 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a6f8c7a..97e02f9 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -34,6 +34,13 @@ int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr);
unsigned long branch_target(const unsigned int *instr);
unsigned int translate_branch(const unsigned int *dest,
const unsigned int *src);
+#ifdef CONFIG_PPC_BOOK3E_64
+void __patch_exception(int exc, unsigned long addr);
+#define patch_exception(exc, name) do { \
+ extern unsigned int name; \
+ __patch_exception((exc), (unsigned long)&name); \
+} while (0)
+#endif
static inline unsigned long ppc_function_entry(void *func)
{
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 17e5b23..d5edbeb 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -159,6 +159,21 @@ unsigned int translate_branch(const unsigned int *dest, const unsigned int *src)
return 0;
}
+#ifdef CONFIG_PPC_BOOK3E_64
+void __patch_exception(int exc, unsigned long addr)
+{
+ extern unsigned int interrupt_base_book3e;
+ unsigned int *ibase = &interrupt_base_book3e;
+
+ /* Our exceptions vectors start with a NOP and -then- a branch
+ * to deal with single stepping from userspace which stops on
+ * the second instruction. Thus we need to patch the second
+ * instruction of the exception, not the first one
+ */
+
+ patch_branch(ibase + (exc / 4) + 1, addr, 0);
+}
+#endif
#ifdef CONFIG_CODE_PATCHING_SELFTEST
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 6888cad..0658aad 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -518,25 +518,6 @@ static void setup_page_sizes(void)
}
}
-static void __patch_exception(int exc, unsigned long addr)
-{
- extern unsigned int interrupt_base_book3e;
- unsigned int *ibase = &interrupt_base_book3e;
-
- /* Our exceptions vectors start with a NOP and -then- a branch
- * to deal with single stepping from userspace which stops on
- * the second instruction. Thus we need to patch the second
- * instruction of the exception, not the first one
- */
-
- patch_branch(ibase + (exc / 4) + 1, addr, 0);
-}
-
-#define patch_exception(exc, name) do { \
- extern unsigned int name; \
- __patch_exception((exc), (unsigned long)&name); \
-} while (0)
-
static void setup_mmu_htw(void)
{
/* Check if HW tablewalk is present, and if yes, enable it by:
--
1.8.1.4
^ permalink raw reply related
* [PATCH 1/4] powerpc/book3e: introduce external_input_edge exception handler for 64bit kernel
From: Kevin Hao @ 2013-05-11 23:26 UTC (permalink / raw)
To: Kumar Gala, Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc
In-Reply-To: <1368314784-971-1-git-send-email-haokexin@gmail.com>
In the external proxy facility mode, the interrupt is automatically
acknowledged with the same effect as reading the IACK register. So
this makes external input interrupt more like edge sensitive. That
means we can leave the irq hard enabled when it occurs with irq soft
disabled just like the dec and doorbell interrupt. But the External
Proxy Register(EPR) is only considered valid from the time that the
external interrupt occurs until MSR[EE] is set to 1. So we have to
save the EPR before irq hard enabled.
In general, we would initialize all the interrupt sources to the same
priority. That means if a interrupt is in service and the other
interrupt would never get any chance to be delivered to the cpu.
And also in order not to complicate the implementation, we only support
to save one content of the EPR register and will set PACA_IRQ_EE
and then hard disable the irq if more external interrupt are received.
In order to support to choose the working mode of MPIC at runtime,
we introduce a new external input exception handler. And we will
use this handler if the external proxy is enabled.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
arch/powerpc/include/asm/paca.h | 1 +
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/entry_64.S | 2 ++
arch/powerpc/kernel/exceptions-64e.S | 26 +++++++++++++++++++++++---
arch/powerpc/kernel/irq.c | 12 ++++++------
arch/powerpc/kernel/paca.c | 3 +++
arch/powerpc/sysdev/mpic.c | 8 ++++++++
7 files changed, 44 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 77c91e7..9b3649d 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -118,6 +118,7 @@ struct paca_struct {
void *mc_kstack;
void *crit_kstack;
void *dbg_kstack;
+ u32 saved_epr; /* EPR saved here */
#endif /* CONFIG_PPC_BOOK3E */
mm_context_t context;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index b51a97c..3199708 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -201,6 +201,7 @@ int main(void)
DEFINE(PACA_MC_STACK, offsetof(struct paca_struct, mc_kstack));
DEFINE(PACA_CRIT_STACK, offsetof(struct paca_struct, crit_kstack));
DEFINE(PACA_DBG_STACK, offsetof(struct paca_struct, dbg_kstack));
+ DEFINE(PACASAVEDEPR, offsetof(struct paca_struct, saved_epr));
#endif /* CONFIG_PPC_BOOK3E */
#ifdef CONFIG_PPC_STD_MMU_64
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3fe5259..09ac356 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -884,6 +884,8 @@ restore_check_irq_replay:
* still soft-disabled and we keep them that way.
*/
cmpwi cr0,r3,0x500
+ cmpwi cr1,r3,0x510
+ cror eq,4*cr1+eq,eq
bne 1f
addi r3,r1,STACK_FRAME_OVERHEAD;
bl .do_IRQ
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 42a756e..e32bd6c 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -358,6 +358,9 @@ interrupt_end_book3e:
MASKABLE_EXCEPTION(0x500, BOOKE_INTERRUPT_EXTERNAL,
external_input, .do_IRQ, ACK_NONE)
+ MASKABLE_EXCEPTION(0x510, BOOKE_INTERRUPT_EXTERNAL,
+ external_input_edge, .do_IRQ, ACK_NONE)
+
/* Alignment */
START_EXCEPTION(alignment);
NORMAL_EXCEPTION_PROLOG(0x600, BOOKE_INTERRUPT_ALIGNMENT,
@@ -685,13 +688,25 @@ kernel_dbg_exc:
ori r10,r10,\paca_irq
stb r10,PACAIRQHAPPENED(r13)
+ .if \paca_irq == PACA_IRQ_EE_EDGE
+ lwz r10,PACASAVEDEPR(r13)
+ cmpwi r10,~0
+ bne- 98f
+ mfspr r10,SPRN_EPR
+ stw r10,PACASAVEDEPR(r13)
+ b 99f
+98: lbz r10,PACAIRQHAPPENED(r13)
+ ori r10,r10,PACA_IRQ_EE
+ stb r10,PACAIRQHAPPENED(r13)
+ .endif
+
.if \full_mask == 1
rldicl r10,r11,48,1 /* clear MSR_EE */
rotldi r11,r10,16
mtspr SPRN_SRR1,r11
.endif
- lwz r11,PACA_EXGEN+EX_CR(r13)
+99: lwz r11,PACA_EXGEN+EX_CR(r13)
mtcr r11
ld r10,PACA_EXGEN+EX_R10(r13)
ld r11,PACA_EXGEN+EX_R11(r13)
@@ -701,9 +716,11 @@ kernel_dbg_exc:
.endm
masked_interrupt_book3e_0x500:
- // XXX When adding support for EPR, use PACA_IRQ_EE_EDGE
masked_interrupt_book3e PACA_IRQ_EE 1
+masked_interrupt_book3e_0x510:
+ masked_interrupt_book3e PACA_IRQ_EE_EDGE 1
+
masked_interrupt_book3e_0x900:
ACK_DEC(r10);
masked_interrupt_book3e PACA_IRQ_DEC 0
@@ -718,7 +735,7 @@ masked_interrupt_book3e_0x2c0:
/*
* Called from arch_local_irq_enable when an interrupt needs
- * to be resent. r3 contains either 0x500,0x900,0x260 or 0x280
+ * to be resent. r3 contains either 0x500,0x900,0x510 or 0x280
* to indicate the kind of interrupt. MSR:EE is already off.
* We generate a stackframe like if a real interrupt had happened.
*
@@ -745,6 +762,8 @@ _GLOBAL(__replay_interrupt)
beq exc_0x900_common
cmpwi cr0,r3,0x280
beq exc_0x280_common
+ cmpwi cr0,r3,0x510
+ beq exc_0x510_common
blr
@@ -859,6 +878,7 @@ BAD_STACK_TRAMPOLINE(0x310)
BAD_STACK_TRAMPOLINE(0x320)
BAD_STACK_TRAMPOLINE(0x400)
BAD_STACK_TRAMPOLINE(0x500)
+BAD_STACK_TRAMPOLINE(0x510)
BAD_STACK_TRAMPOLINE(0x600)
BAD_STACK_TRAMPOLINE(0x700)
BAD_STACK_TRAMPOLINE(0x800)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5cbcf4d..9349db9 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -165,11 +165,6 @@ notrace unsigned int __check_irq_replay(void)
if (decrementer_check_overflow())
return 0x900;
- /* Finally check if an external interrupt happened */
- local_paca->irq_happened &= ~PACA_IRQ_EE;
- if (happened & PACA_IRQ_EE)
- return 0x500;
-
#ifdef CONFIG_PPC_BOOK3E
/* Finally check if an EPR external interrupt happened
* this bit is typically set if we need to handle another
@@ -177,7 +172,7 @@ notrace unsigned int __check_irq_replay(void)
*/
local_paca->irq_happened &= ~PACA_IRQ_EE_EDGE;
if (happened & PACA_IRQ_EE_EDGE)
- return 0x500;
+ return 0x510;
local_paca->irq_happened &= ~PACA_IRQ_DBELL;
if (happened & PACA_IRQ_DBELL)
@@ -191,6 +186,11 @@ notrace unsigned int __check_irq_replay(void)
}
#endif /* CONFIG_PPC_BOOK3E */
+ /* Finally check if an external interrupt happened */
+ local_paca->irq_happened &= ~PACA_IRQ_EE;
+ if (happened & PACA_IRQ_EE)
+ return 0x500;
+
/* There should be nothing left ! */
BUG_ON(local_paca->irq_happened != 0);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index f8f2468..ba370e9 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -144,6 +144,9 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
#ifdef CONFIG_PPC_STD_MMU_64
new_paca->slb_shadow_ptr = &slb_shadow[cpu];
#endif /* CONFIG_PPC_STD_MMU_64 */
+#ifdef CONFIG_PPC_BOOK3E
+ new_paca->saved_epr = ~0;
+#endif
}
/* Put the paca pointer into r13 and SPRG_PACA */
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index ee21b5e..4d248d3 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -1804,7 +1804,15 @@ unsigned int mpic_get_coreint_irq(void)
BUG_ON(mpic == NULL);
+#ifdef CONFIG_PPC64
+ if (local_paca->saved_epr != ~0) {
+ src = local_paca->saved_epr;
+ local_paca->saved_epr = ~0;
+ } else
+ src = mfspr(SPRN_EPR);
+#else
src = mfspr(SPRN_EPR);
+#endif
if (unlikely(src == mpic->spurious_vec)) {
if (mpic->flags & MPIC_SPV_EOI)
--
1.8.1.4
^ permalink raw reply related
* [PATCH 0/4] enable the PACA_IRQ_EE_EDGE support for book3e
From: Kevin Hao @ 2013-05-11 23:26 UTC (permalink / raw)
To: Kumar Gala, Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc
Hi,
In the current kernel, the external proxy works pretty well with the lazy
EE. But it still treat the external input interrupt as level sensitive
and would hard disable the irq while a interrupt occurs with irq soft
disabled. This patch series enables PACA_IRQ_EE_EDGE support for book3e
and will leave the irq still hard enabled while a external interrupt
occurs with soft irq disabled. This passed the boot test on T4240QDS
board.
---
Kevin Hao (4):
powerpc/book3e: introduce external_input_edge exception handler for
64bit kernel
powerpc: move the patch_exception to a common place
powerpc: use patch_exception to update the debug exception handler
powerpc/fsl-book3e: enable the external_input_edge exception handler
arch/powerpc/include/asm/code-patching.h | 7 +++++++
arch/powerpc/include/asm/paca.h | 1 +
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/entry_64.S | 2 ++
arch/powerpc/kernel/exceptions-64e.S | 26 +++++++++++++++++++++++---
arch/powerpc/kernel/irq.c | 12 ++++++------
arch/powerpc/kernel/paca.c | 3 +++
arch/powerpc/kernel/setup_64.c | 6 +-----
arch/powerpc/lib/code-patching.c | 15 +++++++++++++++
arch/powerpc/mm/tlb_nohash.c | 19 -------------------
arch/powerpc/platforms/85xx/corenet_ds.c | 9 ++++++++-
arch/powerpc/sysdev/mpic.c | 8 ++++++++
12 files changed, 75 insertions(+), 34 deletions(-)
Thanks,
Kevin
--
1.8.1.4
^ permalink raw reply
* Re: [PATCH] perf: Power7: Make CPI stack events available in sysfs
From: Sukadev Bhattiprolu @ 2013-05-11 22:31 UTC (permalink / raw)
To: Michael Ellerman
Cc: linuxppc-dev, Paul Mackerras, linux-kernel,
Arnaldo Carvalho de Melo
In-Reply-To: <20130422155505.GA26169@us.ibm.com>
Sukadev Bhattiprolu [sukadev@linux.vnet.ibm.com] wrote:
| Michael Ellerman [michael@ellerman.id.au] wrote:
| | On Sat, Apr 06, 2013 at 09:48:03AM -0700, Sukadev Bhattiprolu wrote:
| | > From bdeacf7175241f6c79b5b2be0fa6b20b0d0b7d1c Mon Sep 17 00:00:00 2001
| | > From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
| | > Date: Sat, 6 Apr 2013 08:48:26 -0700
| | > Subject: [PATCH] perf: Power7: Make CPI stack events available in sysfs
| | >
| | > A set of Power7 events are often used for Cycles Per Instruction (CPI) stack
| | > analysis. Make these events available in sysfs (/sys/devices/cpu/events/) so
| | > they can be identified using their symbolic names:
| | >
| | > perf stat -e 'cpu/PM_CMPLU_STALL_DCACHE_MISS/' /bin/ls
| |
| | Should we take these two via the powerpc tree? Or do you want to take
| | them Arnaldo?
|
| I think it can go through powerpc tree since it is all arch-specific.
|
Ben, Paul,
Any comments on this patch ?
https://lkml.org/lkml/2013/4/6/169
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/85xx: enable coreint for all the 64bit boards
From: Kevin Hao @ 2013-05-11 7:00 UTC (permalink / raw)
To: Kumar Gala, Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc
In-Reply-To: <1365643954-20798-3-git-send-email-haokexin@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3090 bytes --]
On Thu, Apr 11, 2013 at 09:32:34AM +0800, Kevin Hao wrote:
> With the patch 7230c564 (powerpc: Rework lazy-interrupt handling),
> it seems that the coreint works pretty well on the 85xx 64bit kernel.
> So use the coreint by default for these boards.
Hi Kumar,
Since the external proxy works pretty well with the lazy EE without any change
to the current kernel, could you please pick this patch up? This has already
been acked by Scott. I also think I got a implicit ack from Benjamin.
I will try to use the PACA_IRQ_EE_EDGE to make the support for external
proxy more better a little later as suggested by Benjamin.
Thanks,
Kevin
>
> Signed-off-by: Kevin Hao <haokexin@gmail.com>
> ---
> arch/powerpc/platforms/85xx/p5020_ds.c | 5 -----
> arch/powerpc/platforms/85xx/p5040_ds.c | 5 -----
> arch/powerpc/platforms/85xx/t4240_qds.c | 5 -----
> 3 files changed, 15 deletions(-)
>
> diff --git a/arch/powerpc/platforms/85xx/p5020_ds.c b/arch/powerpc/platforms/85xx/p5020_ds.c
> index 753a42c..39cfa40 100644
> --- a/arch/powerpc/platforms/85xx/p5020_ds.c
> +++ b/arch/powerpc/platforms/85xx/p5020_ds.c
> @@ -75,12 +75,7 @@ define_machine(p5020_ds) {
> #ifdef CONFIG_PCI
> .pcibios_fixup_bus = fsl_pcibios_fixup_bus,
> #endif
> -/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
> -#ifdef CONFIG_PPC64
> - .get_irq = mpic_get_irq,
> -#else
> .get_irq = mpic_get_coreint_irq,
> -#endif
> .restart = fsl_rstcr_restart,
> .calibrate_decr = generic_calibrate_decr,
> .progress = udbg_progress,
> diff --git a/arch/powerpc/platforms/85xx/p5040_ds.c b/arch/powerpc/platforms/85xx/p5040_ds.c
> index 1138185..f70e74c 100644
> --- a/arch/powerpc/platforms/85xx/p5040_ds.c
> +++ b/arch/powerpc/platforms/85xx/p5040_ds.c
> @@ -66,12 +66,7 @@ define_machine(p5040_ds) {
> #ifdef CONFIG_PCI
> .pcibios_fixup_bus = fsl_pcibios_fixup_bus,
> #endif
> -/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
> -#ifdef CONFIG_PPC64
> - .get_irq = mpic_get_irq,
> -#else
> .get_irq = mpic_get_coreint_irq,
> -#endif
> .restart = fsl_rstcr_restart,
> .calibrate_decr = generic_calibrate_decr,
> .progress = udbg_progress,
> diff --git a/arch/powerpc/platforms/85xx/t4240_qds.c b/arch/powerpc/platforms/85xx/t4240_qds.c
> index 5998e9f..91ead6b 100644
> --- a/arch/powerpc/platforms/85xx/t4240_qds.c
> +++ b/arch/powerpc/platforms/85xx/t4240_qds.c
> @@ -75,12 +75,7 @@ define_machine(t4240_qds) {
> #ifdef CONFIG_PCI
> .pcibios_fixup_bus = fsl_pcibios_fixup_bus,
> #endif
> -/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
> -#ifdef CONFIG_PPC64
> - .get_irq = mpic_get_irq,
> -#else
> .get_irq = mpic_get_coreint_irq,
> -#endif
> .restart = fsl_rstcr_restart,
> .calibrate_decr = generic_calibrate_decr,
> .progress = udbg_progress,
> --
> 1.8.1.4
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply
* Re: [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup
From: Scott Wood @ 2013-05-10 22:53 UTC (permalink / raw)
To: Bhushan Bharat-R65777
Cc: Wood Scott-B07421, kvm@vger.kernel.org, Alexander Graf,
kvm-ppc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D0700F859@039-SN2MPN1-011.039d.mgd.msft.net>
On 05/10/2013 12:01:38 AM, Bhushan Bharat-R65777 wrote:
>=20
>=20
> > -----Original Message-----
> > From: kvm-ppc-owner@vger.kernel.org =20
> [mailto:kvm-ppc-owner@vger.kernel.org] On
> > Behalf Of Scott Wood
> > Sent: Friday, May 10, 2013 8:40 AM
> > To: Alexander Graf; Benjamin Herrenschmidt
> > Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; =20
> linuxppc-dev@lists.ozlabs.org;
> > Wood Scott-B07421
> > Subject: [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup
> >
> > - WARN_ON_ONCE(!irqs_disabled());
> > + WARN_ON(irqs_disabled());
> > + hard_irq_disable();
>=20
> Here we hard disable in kvmppc_prepare_to_enter(), so my comment in =20
> other patch about interrupt loss is no more valid.
>=20
> So here
> MSR.EE =3D 0
> local_paca->soft_enabled =3D 0
> local_paca->irq_happened |=3D PACA_IRQ_HARD_DIS;
>=20
> > +
> > while (true) {
> > if (need_resched()) {
> > local_irq_enable();
>=20
> This will make the state:
> MSR.EE =3D 1
> local_paca->soft_enabled =3D 1
> local_paca->irq_happened =3D PACA_IRQ_HARD_DIS; //same as before
>=20
> Is that a valid state where interrupts are fully enabled and =20
> irq_happend in not 0?
PACA_IRQ_HARD_DIS will have been cleared by local_irq_enable(), as =20
Tiejun pointed out.
> int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
> {
> int r =3D 0;
> WARN_ON_ONCE(!irqs_disabled());
>=20
> kvmppc_core_check_exceptions(vcpu);
>=20
> if (vcpu->requests) {
> /* Exception delivery raised request; start over */
> return 1;
> }
>=20
> if (vcpu->arch.shared->msr & MSR_WE) {
> local_irq_enable();
> kvm_vcpu_block(vcpu);
> clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
> local_irq_disable();
> ^^^
> We do not require hard_irq_disable() here?
Yes, that should be changed to hard_irq_disable(), and I'll add a =20
WARN_ON to double check that interrupts are hard-disabled (eventually =20
we'll probably want to make these critical-path assertions dependent on =20
a debug option...). It doesn't really matter all that much, though, =20
since we don't have MSR_WE on any 64-bit booke chips. :-)
-Scott=
^ permalink raw reply
* Re: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry
From: Scott Wood @ 2013-05-10 22:47 UTC (permalink / raw)
To: tiejun.chen
Cc: Wood Scott-B07421, kvm@vger.kernel.org, Alexander Graf,
kvm-ppc@vger.kernel.org, Bhushan Bharat-R65777,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <518C7A28.4000903@windriver.com>
On 05/09/2013 11:40:08 PM, tiejun.chen wrote:
> On 05/10/2013 11:34 AM, Bhushan Bharat-R65777 wrote:
>>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>>> index 705fc5c..eb89b83 100644
>>> --- a/arch/powerpc/kvm/booke.c
>>> +++ b/arch/powerpc/kvm/booke.c
>>> @@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, =20
>>> struct kvm_vcpu
>>> *vcpu)
>>> ret =3D s;
>>> goto out;
>>> }
>>> - kvmppc_lazy_ee_enable();
>>> + kvmppc_fix_ee_before_entry();
>>=20
>> local_irq_disable() is called before kvmppc_prepare_to_enter().
>=20
> In patch 4, we call hard_irq_disable() once enter =20
> kvmppc_prepare_to_enter().
And before patch 4, we have the code near the end of =20
kvmppc_prepare_to_enter() that checks lazy_irq_pending() and aborts =20
guest entry if there was a race. If I'd known about that bit of code =20
beforehand, I probably wouldn't have bothered with most of patch 4/4, =20
but now that it's been done it seems cleaner.
-Scott=
^ permalink raw reply
* Re: [PATCH v2 2/4] kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
From: Scott Wood @ 2013-05-10 22:43 UTC (permalink / raw)
To: Bhushan Bharat-R65777
Cc: Wood Scott-B07421, kvm@vger.kernel.org, Alexander Graf,
kvm-ppc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D0700F847@039-SN2MPN1-011.039d.mgd.msft.net>
On 05/10/2013 12:01:19 AM, Bhushan Bharat-R65777 wrote:
>=20
>=20
> > -----Original Message-----
> > From: kvm-ppc-owner@vger.kernel.org =20
> [mailto:kvm-ppc-owner@vger.kernel.org] On
> > Behalf Of Scott Wood
> > Sent: Friday, May 10, 2013 8:40 AM
> > To: Alexander Graf; Benjamin Herrenschmidt
> > Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; =20
> linuxppc-dev@lists.ozlabs.org;
> > Wood Scott-B07421
> > Subject: [PATCH v2 2/4] kvm/ppc/booke64: Fix lazy ee handling in
> > kvmppc_handle_exit()
> >
> > EE is hard-disabled on entry to kvmppc_handle_exit(), so call
> > hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and =20
> soft_enabled
> > is unset.
> >
> > Without this, we get warnings such as =20
> arch/powerpc/kernel/time.c:300,
> > and sometimes host kernel hangs.
> >
> > Signed-off-by: Scott Wood <scottwood@freescale.com>
> > ---
> > arch/powerpc/kvm/booke.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index 1020119..705fc5c 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -833,6 +833,11 @@ int kvmppc_handle_exit(struct kvm_run *run, =20
> struct kvm_vcpu
> > *vcpu,
> > int r =3D RESUME_HOST;
> > int s;
> >
> > +#ifdef CONFIG_PPC64
> > + WARN_ON(local_paca->irq_happened !=3D 0);
> > +#endif
> > + hard_irq_disable();
>=20
> It is not actually to hard disable as EE is already clear but to make =20
> it looks like hard_disable to host. Right?
> If so, should we write a comment here on why we are doing this?
Yes, I can add a comment.
-Scott=
^ permalink raw reply
* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Benjamin Herrenschmidt @ 2013-05-10 21:50 UTC (permalink / raw)
To: Kevin Hao
Cc: Wood Scott-B07421, kvm@vger.kernel.org,
Caraman Mihai Claudiu-B02008, agraf@suse.de,
kvm-ppc@vger.kernel.org, tiejun.chen, Bhushan Bharat-R65777,
Scott Wood, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1368222580.4524.52.camel@pasglop>
On Sat, 2013-05-11 at 07:49 +1000, Benjamin Herrenschmidt wrote:
> I would keep the EE_EDGE bit definition. I have no objection to a gradual
> approach however for the other one where we apply it as is now to enable
> coreint while you do a rework to make it better :-)
Note also that I generally don't apply FSL related patches directly, I
rely on Kumar Gala picking them up so he's the one ultimately making
that choice.
Cheers,
Ben.
^ permalink raw reply
* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Benjamin Herrenschmidt @ 2013-05-10 21:49 UTC (permalink / raw)
To: Kevin Hao
Cc: Wood Scott-B07421, kvm@vger.kernel.org,
Caraman Mihai Claudiu-B02008, agraf@suse.de,
kvm-ppc@vger.kernel.org, tiejun.chen, Bhushan Bharat-R65777,
Scott Wood, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20130510141213.GA31597@pek-khao-d1.corp.ad.wrs.com>
On Fri, 2013-05-10 at 22:12 +0800, Kevin Hao wrote:
> So I would assume you will not pick up these two patches, right?
> http://patchwork.ozlabs.org/patch/235530/
> http://patchwork.ozlabs.org/patch/235532/
>
> Anyway it is more easier to enable the external proxy by using this method.
> But if you insist, I can respin a patch to use the method you suggested
> since it will definitely reduce the window where the irq is hard disabled.
I would keep the EE_EDGE bit definition. I have no objection to a gradual
approach however for the other one where we apply it as is now to enable
coreint while you do a rework to make it better :-)
Cheers,
Ben.
^ permalink raw reply
* [PATCH] powerpc: provide __bswapdi2
From: David Woodhouse @ 2013-05-10 21:18 UTC (permalink / raw)
To: linuxppc-dev; +Cc: viro
[-- Attachment #1: Type: text/plain, Size: 1731 bytes --]
From: David Woodhouse <David.Woodhouse@intel.com>
Some versions of GCC apparently expect this to be provided by libgcc.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
Untested.
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 19e096b..f077dc2 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -657,6 +657,17 @@ _GLOBAL(__ucmpdi2)
li r3,2
blr
+_GLOBAL(__bswapdi2)
+ rlwinm 10,4,8,0xffffffff
+ rlwinm 11,3,8,0xffffffff
+ rlwimi 10,4,24,0,7
+ rlwimi 11,3,24,0,7
+ rlwimi 10,4,24,16,23
+ rlwimi 11,3,24,16,23
+ mr 4,11
+ mr 3,10
+ blr
+
_GLOBAL(abs)
srawi r4,r3,31
xor r3,r3,r4
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 5cfa800..3b2e6e8 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -234,6 +234,18 @@ _GLOBAL(__flush_dcache_icache)
isync
blr
+_GLOBAL(__bswapdi2)
+ srdi 8,3,32
+ rlwinm 7,3,8,0xffffffff
+ rlwimi 7,3,24,0,7
+ rlwinm 9,8,8,0xffffffff
+ rlwimi 7,3,24,16,23
+ rlwimi 9,8,24,0,7
+ rlwimi 9,8,24,16,23
+ sldi 7,7,32
+ or 7,7,9
+ mr 3,7
+ blr
#if defined(CONFIG_PPC_PMAC) || defined(CONFIG_PPC_MAPLE)
/*
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 78b8766..c296665 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -143,7 +143,8 @@ EXPORT_SYMBOL(__lshrdi3);
int __ucmpdi2(unsigned long long, unsigned long long);
EXPORT_SYMBOL(__ucmpdi2);
#endif
-
+long long __bswapdi2(long long);
+EXPORT_SYMBOL(__bswapdi2);
EXPORT_SYMBOL(memcpy);
EXPORT_SYMBOL(memset);
EXPORT_SYMBOL(memmove);
--
dwmw2
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5745 bytes --]
^ permalink raw reply related
* Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest
From: Alexander Graf @ 2013-05-10 19:39 UTC (permalink / raw)
To: Scott Wood
Cc: Caraman Mihai Claudiu-B02008, kvm@vger.kernel.org,
Wood Scott-B07421, kvm-ppc@vger.kernel.org, tiejun.chen,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1368213757.19683.10@snotra>
Am 10.05.2013 um 21:22 schrieb Scott Wood <scottwood@freescale.com>:
> On 05/10/2013 12:57:33 PM, Alexander Graf wrote:
>> Could you guys please collect performance data during the next weeks on b=
oth guest-directed ISIs as well as VF MMIOs (preferably with in-kernel MMIO)=
, so that we can decide on the direction that's worth going towards?
>=20
> Collecting data on VF MMIO would require implementing it (or at least salv=
aging and fixing some old code), which is not a high priority at the moment.=
If we do implement VF in the future we could always undo the direct ISI ch=
ange, but it would still be nice to know if there's any real benefit in the f=
irst place.
Mike sounded like he had an almost working poc, which is good enough to coll=
ect rough numbers.
And yes, changes like these should always get at least basic performance num=
bers along with them, regardless of drawbacks.
Alex
>=20
> FWIW, I doubt that the "more stress on HW TLB" will be significant.
>=20
> -Scott
^ permalink raw reply
* Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest
From: Scott Wood @ 2013-05-10 19:22 UTC (permalink / raw)
To: Alexander Graf
Cc: Caraman Mihai Claudiu-B02008, kvm@vger.kernel.org,
Wood Scott-B07421, kvm-ppc@vger.kernel.org, tiejun.chen,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <F4F07A17-8E1E-433D-9749-68D1495C42A7@suse.de>
On 05/10/2013 12:57:33 PM, Alexander Graf wrote:
> Could you guys please collect performance data during the next weeks =20
> on both guest-directed ISIs as well as VF MMIOs (preferably with =20
> in-kernel MMIO), so that we can decide on the direction that's worth =20
> going towards?
Collecting data on VF MMIO would require implementing it (or at least =20
salvaging and fixing some old code), which is not a high priority at =20
the moment. If we do implement VF in the future we could always undo =20
the direct ISI change, but it would still be nice to know if there's =20
any real benefit in the first place.
FWIW, I doubt that the "more stress on HW TLB" will be significant.
-Scott=
^ permalink raw reply
* Re: [v2][KVM][PATCH 1/1] kvm:ppc: enable doorbell exception with CONFIG_PPC_DOORBELL
From: Alexander Graf @ 2013-05-10 18:18 UTC (permalink / raw)
To: Scott Wood; +Cc: Tiejun Chen, linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1368209828.19683.5@snotra>
On 10.05.2013, at 20:17, Scott Wood wrote:
> On 05/10/2013 01:14:27 PM, Alexander Graf wrote:
>> On 07.05.2013, at 12:23, Tiejun Chen wrote:
>> > CONFIG_PPC_DOORBELL is enough to cover all variants.
>> >
>> > Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>> > ---
>> > arch/powerpc/kvm/booke.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> > index 1020119..62d4ece 100644
>> > --- a/arch/powerpc/kvm/booke.c
>> > +++ b/arch/powerpc/kvm/booke.c
>> > @@ -795,7 +795,7 @@ static void kvmppc_restart_interrupt(struct =
kvm_vcpu *vcpu,
>> > kvmppc_fill_pt_regs(®s);
>> > timer_interrupt(®s);
>> > break;
>> > -#if defined(CONFIG_PPC_FSL_BOOK3E) || =
defined(CONFIG_PPC_BOOK3E_64)
>> > +#if defined(CONFIG_PPC_DOORBELL)
>> The same question still holds. How is this an improvement over the =
previous code? Does this fix any issues for you? Is this just a coding =
style cleanup?
>=20
> This is an improvement because CONFIG_PPC_DOORBELL is what controls =
whether the function that is called inside the ifdef exists.
Aha! Now that's a good reason. Tiejun, please adjust your patch =
description accordingly.
Alex
^ permalink raw reply
* Re: [v2][KVM][PATCH 1/1] kvm:ppc: enable doorbell exception with CONFIG_PPC_DOORBELL
From: Scott Wood @ 2013-05-10 18:17 UTC (permalink / raw)
To: Alexander Graf; +Cc: Tiejun Chen, linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <39FBAAD3-4B13-4CA3-9E80-B73A266C1D3A@suse.de>
On 05/10/2013 01:14:27 PM, Alexander Graf wrote:
>=20
> On 07.05.2013, at 12:23, Tiejun Chen wrote:
>=20
> > CONFIG_PPC_DOORBELL is enough to cover all variants.
> >
> > Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
> > ---
> > arch/powerpc/kvm/booke.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index 1020119..62d4ece 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -795,7 +795,7 @@ static void kvmppc_restart_interrupt(struct =20
> kvm_vcpu *vcpu,
> > kvmppc_fill_pt_regs(®s);
> > timer_interrupt(®s);
> > break;
> > -#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3E_64)
> > +#if defined(CONFIG_PPC_DOORBELL)
>=20
> The same question still holds. How is this an improvement over the =20
> previous code? Does this fix any issues for you? Is this just a =20
> coding style cleanup?
This is an improvement because CONFIG_PPC_DOORBELL is what controls =20
whether the function that is called inside the ifdef exists.
-Scott=
^ permalink raw reply
* Re: [v2][KVM][PATCH 1/1] kvm:ppc: enable doorbell exception with CONFIG_PPC_DOORBELL
From: Alexander Graf @ 2013-05-10 18:14 UTC (permalink / raw)
To: Tiejun Chen; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1367922232-22455-1-git-send-email-tiejun.chen@windriver.com>
On 07.05.2013, at 12:23, Tiejun Chen wrote:
> CONFIG_PPC_DOORBELL is enough to cover all variants.
>=20
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
> ---
> arch/powerpc/kvm/booke.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>=20
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 1020119..62d4ece 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -795,7 +795,7 @@ static void kvmppc_restart_interrupt(struct =
kvm_vcpu *vcpu,
> kvmppc_fill_pt_regs(®s);
> timer_interrupt(®s);
> break;
> -#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3E_64)
> +#if defined(CONFIG_PPC_DOORBELL)
The same question still holds. How is this an improvement over the =
previous code? Does this fix any issues for you? Is this just a coding =
style cleanup?
Alex
> case BOOKE_INTERRUPT_DOORBELL:
> kvmppc_fill_pt_regs(®s);
> doorbell_exception(®s);
> --=20
> 1.7.9.5
>=20
^ permalink raw reply
* Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest
From: Alexander Graf @ 2013-05-10 17:57 UTC (permalink / raw)
To: Caraman Mihai Claudiu-B02008
Cc: tiejun.chen, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
kvm-ppc@vger.kernel.org, Wood Scott-B07421
In-Reply-To: <300B73AA675FCE4A93EB4FC1D42459FF3F80D6@039-SN2MPN1-012.039d.mgd.msft.net>
On 09.05.2013, at 14:36, Caraman Mihai Claudiu-B02008 wrote:
>> -----Original Message-----
>> From: tiejun.chen [mailto:tiejun.chen@windriver.com]
>> Sent: Thursday, May 09, 2013 2:40 PM
>> To: Caraman Mihai Claudiu-B02008
>> Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org; agraf@suse.de; =
kvm-
>> ppc@vger.kernel.org; kvm@vger.kernel.org
>> Subject: Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI =
exception to
>> Guest
>>=20
>> On 05/09/2013 07:34 PM, Caraman Mihai Claudiu-B02008 wrote:
>>>>>> VF stands for virtualization fault see MAS8[VF] and we may use it
>> for
>>>> virtualized
>>>>=20
>>>> Looks KVM PPC have no this mechanism currently since I don't find
>> MAS8_VF
>>>> is
>>>> used in kernel, right?
>>>=20
>>> Yes but 'we may use it' in the feature, I have a functional POC with
>> VF.
>>=20
>> Any IO performance to be improved with this POC?
>=20
> VF approach puts more stress on HW TLB so I did not advance with =
performance
> measurements though it may worth to do it.
Could you guys please collect performance data during the next weeks on =
both guest-directed ISIs as well as VF MMIOs (preferably with in-kernel =
MMIO), so that we can decide on the direction that's worth going =
towards?
Thanks!
Alex
^ permalink raw reply
* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Kevin Hao @ 2013-05-10 14:12 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Wood Scott-B07421, kvm@vger.kernel.org,
Caraman Mihai Claudiu-B02008, agraf@suse.de,
kvm-ppc@vger.kernel.org, tiejun.chen, Bhushan Bharat-R65777,
Scott Wood, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1368137220.3715.14.camel@pasglop>
[-- Attachment #1: Type: text/plain, Size: 2403 bytes --]
On Fri, May 10, 2013 at 08:07:00AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2013-05-09 at 16:27 -0500, Scott Wood wrote:
> > On 05/09/2013 07:37:42 AM, Benjamin Herrenschmidt wrote:
> > > On Thu, 2013-05-09 at 17:44 +0800, tiejun.chen wrote:
> > > >
> > > > Actually in the case GS=1 even if EE=0, EXT/DEC/DBELL still occur
> > > as I
> > > > recall.
> > >
> > > Only if directed to the hypervisor.
> >
> > This is always the case with KVM, right? At least on booke...
>
> Hrm, on A2 we could choose iirc. Well not DEC but EXT at least, I don't
> remember about DBELL.
>
> > > > > Case 1)
> > > > > -> Local_irq_disable() will set soft_enabled = 0
> > > > > -> Now Externel interrupt happens, there we set PACA_IRQ_EE in
> > > > irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard
> > > > disabled. No more other interrupt gated by MSR.EE can happen. Looks
> > > > like the idea here is to not let a device keep on inserting
> > > interrupt
> > > > till the interrupt condition on device is cleared, right?
> > >
> > > The external interrupt line is level sensitive normally, so we have to
> > > mask MSR:EE in that case or the interrupt would keep re-occurring
> > > (note
> > > that FSL has this concept of auto-acked interrupts via the on die MPIC
> > > for which you can potentially use PACA_IRQ_EE_EDGE instead and avoid
> > > having to mask MSR:EE).
> >
> > Note that if we do this, we can no longer leave the interrupt vector in
> > EPR, and would have to track (potentially multiple different) pending
> > external interrupts in software.
>
> Right, you can have a little queue in the PACA and leave EE disabled if
> it's full. You can probably get away with a queue of 1 though :-)
So I would assume you will not pick up these two patches, right?
http://patchwork.ozlabs.org/patch/235530/
http://patchwork.ozlabs.org/patch/235532/
Anyway it is more easier to enable the external proxy by using this method.
But if you insist, I can respin a patch to use the method you suggested
since it will definitely reduce the window where the irq is hard disabled.
Thanks,
Kevin
>
> Cheers,
> Ben.
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply
* Re: [upstream] mtd/ifc: fix ifc driver memory release issue
From: Artem Bityutskiy @ 2013-05-10 12:31 UTC (permalink / raw)
To: Brian Norris
Cc: Li Hao, linux-mtd, scottwood, linuxppc-dev, dwmw2, Cao Yonghua
In-Reply-To: <CAN8TOE-fpTYj5Z5VW=MbH6epK9Ymc+=uQkh7V6=k=76_DjouPA@mail.gmail.com>
On Thu, 2013-03-28 at 21:28 -0700, Brian Norris wrote:
> On Wed, Mar 27, 2013 at 5:25 AM, Roy Zang <tie-fei.zang@freescale.com> wrote:
> > memory is allocated by devm_kzalloc, so release it using
> > devm_kfree() instead kfree();
>
> You are correct that it should not be freed with kfree(). But the
> correct solution is that it does not need to be freed (explicitly) at
> all. That's the whole point of the managed allocators (i.e.,
> devm_kzalloc()). Try this patch instead. Totally untested here.
Pushed to l2-mtd.git, thanks!
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* Re: Invalid perf_branch_entry.to entries question
From: Benjamin Herrenschmidt @ 2013-05-10 11:51 UTC (permalink / raw)
To: Michael Neuling
Cc: Peter Zijlstra, Linux PPC dev, linux-kernel, eranian,
Anshuman Khandual
In-Reply-To: <CAEjGV6x1=6iFUwQs6=5GY75-J9ZZzs8=CKE=+MGxdwoKycs9jA@mail.gmail.com>
On Fri, 2013-05-10 at 20:50 +1000, Michael Neuling wrote:
> The buffer is in the core (not main memory) and hence only has limited
> entries. So skipping entries that can hopefully be determined in
> other ways means we can log more branches.
>
> That being said, it's a PITA for the kernel ;-)
I would suggest flagging them. As you mention, the code might have been
modified since the sample was taken. Even if it still looks like a
branch and you can compute the "To" address it might not be the right
one ... at least userspace should be notified that this specific sample
is to handle with care.
And if you just can't read the instruction or it's not a branch anymore,
then stick a -1 in there, no way it can be a valid branch address :-)
Cheers,
Ben.
^ permalink raw reply
* Re: Invalid perf_branch_entry.to entries question
From: Michael Neuling @ 2013-05-10 10:50 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Linux PPC dev, linux-kernel, eranian, Anshuman Khandual
In-Reply-To: <20130510104352.GF31235@dyad.programming.kicks-ass.net>
On Fri, May 10, 2013 at 8:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, May 09, 2013 at 08:39:15AM +1000, Michael Neuling wrote:
>> > Just because I'm curious.. however does that happen? Surely the CPU
>> > knows where next to fetch instructions?
>>
>> For computed gotos (ie. branch to a register value), the hardware gives
>> you the from and to address in the branch history buffer.
>>
>> For branches where the branch target address is an immediate encoded in
>> the instruction, the hardware only logs the from address. It assumes
>> that software (perf irq handler in this case) can read this branch
>> instruction, calculate the corresponding offset and hence the
>> to/target address.
>>
>> It's entirely possible that when the perf IRQ handler happens, the
>> instruction in question is not readable or is no longer a branch (self
>> modifying code). Hence we aren't able to calculate a valid to address.
>
> Ohh how cute! You've gotta love lazy hardware :-)
The buffer is in the core (not main memory) and hence only has limited
entries. So skipping entries that can hopefully be determined in
other ways means we can log more branches.
That being said, it's a PITA for the kernel ;-)
Mikey
^ permalink raw reply
* Re: Invalid perf_branch_entry.to entries question
From: Peter Zijlstra @ 2013-05-10 10:43 UTC (permalink / raw)
To: Michael Neuling; +Cc: Linux PPC dev, linux-kernel, eranian, Anshuman Khandual
In-Reply-To: <14691.1368052755@ale.ozlabs.ibm.com>
On Thu, May 09, 2013 at 08:39:15AM +1000, Michael Neuling wrote:
> > Just because I'm curious.. however does that happen? Surely the CPU
> > knows where next to fetch instructions?
>
> For computed gotos (ie. branch to a register value), the hardware gives
> you the from and to address in the branch history buffer.
>
> For branches where the branch target address is an immediate encoded in
> the instruction, the hardware only logs the from address. It assumes
> that software (perf irq handler in this case) can read this branch
> instruction, calculate the corresponding offset and hence the
> to/target address.
>
> It's entirely possible that when the perf IRQ handler happens, the
> instruction in question is not readable or is no longer a branch (self
> modifying code). Hence we aren't able to calculate a valid to address.
Ohh how cute! You've gotta love lazy hardware :-)
^ permalink raw reply
* [PATCH] powerpc/perf: Fix setting of "to" addresses for BHRB
From: Michael Neuling @ 2013-05-10 9:56 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Anshuman Khandual
In-Reply-To: <1368064009-13255-2-git-send-email-mikey@neuling.org>
Currently we only set the "to" address in the branch stack when the CPU
explicitly gives us a value. Unfortunately it only does this for XL form
branches (eg blr, bctr, bctar) and not I and B form branches (eg b, bc).
Fortunately if we read the instruction from memory we can extract the offset of
a branch and calculate the target address.
This adds a function power_pmu_bhrb_to() to calculate the target/to address of
the corresponding I and B form branches. It handles branches in both user and
kernel spaces. It also plumbs this into the perf brhb reading code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
Minor update to add __user to addr pointer cast in __get_user_inatomic()
as suggested by milton.
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 2d81372..501e47f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -13,11 +13,13 @@
#include <linux/perf_event.h>
#include <linux/percpu.h>
#include <linux/hardirq.h>
+#include <linux/uaccess.h>
#include <asm/reg.h>
#include <asm/pmc.h>
#include <asm/machdep.h>
#include <asm/firmware.h>
#include <asm/ptrace.h>
+#include <asm/code-patching.h>
#define BHRB_MAX_ENTRIES 32
#define BHRB_TARGET 0x0000000000000002
@@ -1458,6 +1460,33 @@ struct pmu power_pmu = {
.flush_branch_stack = power_pmu_flush_branch_stack,
};
+/* Calculate the to address for a branch */
+static __u64 power_pmu_bhrb_to(u64 addr)
+{
+ unsigned int instr;
+ int ret;
+ __u64 target;
+
+ if (is_kernel_addr(addr))
+ return branch_target((unsigned int *)addr);
+
+ /* Userspace: need copy instruction here then translate it */
+ pagefault_disable();
+ ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
+ if (ret) {
+ pagefault_enable();
+ return 0;
+ }
+ pagefault_enable();
+
+ target = branch_target(&instr);
+ if ((!target) || (instr & BRANCH_ABSOLUTE))
+ return target;
+
+ /* Translate relative branch target from kernel to user address */
+ return target - (unsigned long)&instr + addr;
+}
+
/* Processing BHRB entries */
void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
{
@@ -1521,7 +1550,8 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
/* Branches to immediate field
(ie I or B form) */
cpuhw->bhrb_entries[u_index].from = addr;
- cpuhw->bhrb_entries[u_index].to = 0;
+ cpuhw->bhrb_entries[u_index].to =
+ power_pmu_bhrb_to(addr);
cpuhw->bhrb_entries[u_index].mispred = pred;
cpuhw->bhrb_entries[u_index].predicted = ~pred;
}
^ permalink raw reply related
* Re: [PATCH 2/6] KVM: PPC: Add support for multiple-TCE hcalls
From: Alexey Kardashevskiy @ 2013-05-10 7:53 UTC (permalink / raw)
To: David Gibson
Cc: kvm, Joerg Roedel, Alexander Graf, kvm-ppc, Alex Williamson,
Paul Mackerras, linuxppc-dev
In-Reply-To: <20130510065101.GF23358@truffula.fritz.box>
On 05/10/2013 04:51 PM, David Gibson wrote:
> On Mon, May 06, 2013 at 05:25:53PM +1000, Alexey Kardashevskiy wrote:
>> This adds real mode handlers for the H_PUT_TCE_INDIRECT and
>> H_STUFF_TCE hypercalls for QEMU emulated devices such as virtio
>> devices or emulated PCI. These calls allow adding multiple entries
>> (up to 512) into the TCE table in one call which saves time on
>> transition to/from real mode.
>>
>> This adds a guest physical to host real address converter
>> and calls the existing H_PUT_TCE handler. The converting function
>> is going to be fully utilized by upcoming VFIO supporting patches.
>>
>> This also implements the KVM_CAP_PPC_MULTITCE capability,
>> so in order to support the functionality of this patch, QEMU
>> needs to query for this capability and set the "hcall-multi-tce"
>> hypertas property only if the capability is present, otherwise
>> there will be serious performance degradation.
>
>
> Hrm. Clearly I didn't read this carefully enough before. There are
> some problems here.
?
> [snip]
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 72ffc89..643ac1e 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -14,6 +14,7 @@
>> *
>> * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>> * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>> */
>>
>> #include <linux/types.h>
>> @@ -36,9 +37,14 @@
>> #include <asm/ppc-opcode.h>
>> #include <asm/kvm_host.h>
>> #include <asm/udbg.h>
>> +#include <asm/iommu.h>
>>
>> #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
>> +#define ERROR_ADDR (~(unsigned long)0x0)
>>
>> +/*
>> + * TCE tables handlers.
>> + */
>> static long kvmppc_stt_npages(unsigned long window_size)
>> {
>> return ALIGN((window_size >> SPAPR_TCE_SHIFT)
>> @@ -148,3 +154,111 @@ fail:
>> }
>> return ret;
>> }
>> +
>> +/*
>> + * Virtual mode handling of IOMMU map/unmap.
>> + */
>> +/* Converts guest physical address into host virtual */
>> +static unsigned long get_virt_address(struct kvm_vcpu *vcpu,
>> + unsigned long gpa)
>
> This should probably return a void * rather than an unsigned long.
> Well, actually a void __user *.
>
>> +{
>> + unsigned long hva, gfn = gpa >> PAGE_SHIFT;
>> + struct kvm_memory_slot *memslot;
>> +
>> + memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
>> + if (!memslot)
>> + return ERROR_ADDR;
>> +
>> + /*
>> + * Convert gfn to hva preserving flags and an offset
>> + * within a system page
>> + */
>> + hva = __gfn_to_hva_memslot(memslot, gfn) + (gpa & ~PAGE_MASK);
>> + return hva;
>> +}
>> +
>> +long kvmppc_virtmode_h_put_tce(struct kvm_vcpu *vcpu,
>> + unsigned long liobn, unsigned long ioba,
>> + unsigned long tce)
>> +{
>> + struct kvmppc_spapr_tce_table *tt;
>> +
>> + tt = kvmppc_find_tce_table(vcpu, liobn);
>> + /* Didn't find the liobn, put it to userspace */
>> + if (!tt)
>> + return H_TOO_HARD;
>> +
>> + /* Emulated IO */
>> + return kvmppc_emulated_h_put_tce(tt, ioba, tce);
>> +}
>> +
>> +long kvmppc_virtmode_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> + unsigned long liobn, unsigned long ioba,
>> + unsigned long tce_list, unsigned long npages)
>> +{
>> + struct kvmppc_spapr_tce_table *tt;
>> + long i;
>> + unsigned long tces;
>> +
>> + /* The whole table addressed by tce_list resides in 4K page */
>> + if (npages > 512)
>> + return H_PARAMETER;
>
> So, that doesn't actually verify what the comment says it does - only
> that the list is < 4K in total. You need to check the alignment of
> tce_list as well.
The spec says to return H_PARAMETER if >512. I.e. it takes just 1 page and
I do not need to bother if pages may not lay continuously in RAM (matters
for real mode).
/*
* As the spec is saying that maximum possible number of TCEs is 512,
* the whole TCE page is no more than 4K. Therefore we do not have to
* worry if pages do not lie continuously in the RAM
*/
Any better?...
>> +
>> + tt = kvmppc_find_tce_table(vcpu, liobn);
>> + /* Didn't find the liobn, put it to userspace */
>> + if (!tt)
>> + return H_TOO_HARD;
>> +
>> + tces = get_virt_address(vcpu, tce_list);
>> + if (tces == ERROR_ADDR)
>> + return H_TOO_HARD;
>> +
>> + /* Emulated IO */
>
> This comment doesn't seem to have any bearing on the test which
> follows it.
>
>> + if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
>> + return H_PARAMETER;
>> +
>> + for (i = 0; i < npages; ++i) {
>> + unsigned long tce;
>> + unsigned long ptce = tces + i * sizeof(unsigned long);
>> +
>> + if (get_user(tce, (unsigned long __user *)ptce))
>> + break;
>> +
>> + if (kvmppc_emulated_h_put_tce(tt,
>> + ioba + (i << IOMMU_PAGE_SHIFT), tce))
>> + break;
>> + }
>> + if (i == npages)
>> + return H_SUCCESS;
>> +
>> + /* Failed, do cleanup */
>> + do {
>> + --i;
>> + kvmppc_emulated_h_put_tce(tt, ioba + (i << IOMMU_PAGE_SHIFT),
>> + 0);
>> + } while (i);
>
> Hrm, so, actually PAPR specifies that this hcall is supposed to first
> copy the given tces to hypervisor memory, then translate (and
> validate) them all, and only then touch the actual TCE table. Rather
> more complicated to do, but I guess we should - that would get rid of
> the need for this partial cleanup in the failure case.
So we have to kmalloc(4K) on every PUT_INDIRECT. Or we can put tces on the
stack (4K is quire a lot for the kernel, no)?
>> +
>> + return H_PARAMETER;
>> +}
>> +
>> +long kvmppc_virtmode_h_stuff_tce(struct kvm_vcpu *vcpu,
>> + unsigned long liobn, unsigned long ioba,
>> + unsigned long tce_value, unsigned long npages)
>> +{
>> + struct kvmppc_spapr_tce_table *tt;
>> + long i;
>> +
>> + tt = kvmppc_find_tce_table(vcpu, liobn);
>> + /* Didn't find the liobn, put it to userspace */
>> + if (!tt)
>> + return H_TOO_HARD;
>> +
>> + /* Emulated IO */
>> + if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
>> + return H_PARAMETER;
>> +
>> + for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE)
>> + kvmppc_emulated_h_put_tce(tt, ioba, tce_value);
>> +
>> + return H_SUCCESS;
>> +}
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 30c2f3b..55fdf7a 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -14,6 +14,7 @@
>> *
>> * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>> * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>> */
>>
>> #include <linux/types.h>
>> @@ -35,42 +36,214 @@
>> #include <asm/ppc-opcode.h>
>> #include <asm/kvm_host.h>
>> #include <asm/udbg.h>
>> +#include <asm/iommu.h>
>> +#include <asm/tce.h>
>>
>> #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
>> +#define ERROR_ADDR (~(unsigned long)0x0)
>>
>> -/* WARNING: This will be called in real-mode on HV KVM and virtual
>> - * mode on PR KVM
>> +/*
>> + * Finds a TCE table descriptor by LIOBN.
>> */
>> +struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(struct kvm_vcpu *vcpu,
>> + unsigned long liobn)
>> +{
>> + struct kvmppc_spapr_tce_table *tt;
>> +
>> + list_for_each_entry(tt, &vcpu->kvm->arch.spapr_tce_tables, list) {
>> + if (tt->liobn == liobn)
>> + return tt;
>> + }
>> +
>> + return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_find_tce_table);
>> +
>> +/*
>> + * kvmppc_emulated_h_put_tce() handles TCE requests for devices emulated
>> + * by QEMU. It puts guest TCE values into the table and expects
>> + * the QEMU to convert them later in the QEMU device implementation.
>> + * Works in both real and virtual modes.
>> + */
>> +long kvmppc_emulated_h_put_tce(struct kvmppc_spapr_tce_table *tt,
>> + unsigned long ioba, unsigned long tce)
>> +{
>> + unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
>> + struct page *page;
>> + u64 *tbl;
>> +
>> + /* udbg_printf("H_PUT_TCE: liobn 0x%lx => tt=%p window_size=0x%x\n", */
>> + /* liobn, tt, tt->window_size); */
>> + if (ioba >= tt->window_size) {
>> + /* pr_err("%s failed on ioba=%lx\n", __func__, ioba); */
>> + return H_PARAMETER;
>> + }
>> + /*
>> + * Note on the use of page_address() in real mode,
>> + *
>> + * It is safe to use page_address() in real mode on ppc64 because
>> + * page_address() is always defined as lowmem_page_address()
>> + * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial
>> + * operation and does not access page struct.
>> + *
>> + * Theoretically page_address() could be defined different
>> + * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
>> + * should be enabled.
>> + * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
>> + * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
>> + * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
>> + * is not expected to be enabled on ppc32, page_address()
>> + * is safe for ppc32 as well.
>> + */
>> +#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
>> +#error TODO: fix to avoid page_address() here
>> +#endif
>> + page = tt->pages[idx / TCES_PER_PAGE];
>> + tbl = (u64 *)page_address(page);
>> +
>> + /*
>> + * Validate TCE address.
>> + * At the moment only flags are validated
>> + * as other check will significantly slow down
>> + * or can make it even impossible to handle TCE requests
>> + * in real mode.
>> + */
>> + if (tce & ~(IOMMU_PAGE_MASK | TCE_PCI_WRITE | TCE_PCI_READ))
>> + return H_PARAMETER;
>> +
>> + /* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
>> + tbl[idx % TCES_PER_PAGE] = tce;
>> +
>> + return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_emulated_h_put_tce);
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_64_HV
>> +/*
>> + * Converts guest physical address into host real address.
>> + * Also returns pte and page size if the page is present in page table.
>> + */
>> +static unsigned long get_real_address(struct kvm_vcpu *vcpu,
>> + unsigned long gpa, bool writing,
>> + pte_t *ptep, unsigned long *pg_sizep)
>
> The only caller doesn't use the ptep and pg_sizep pointers, so there's
> no point implementing them.
"KVM: PPC: Add support for IOMMU in-kernel handling" will. Is there much
sense in splitting this quite small function between patches?
--
Alexey
^ permalink raw reply
* Re: [PATCH v2 1/4] powerpc: hard_irq_disable(): Call trace_hardirqs_off after disabling
From: Benjamin Herrenschmidt @ 2013-05-10 7:00 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, Alexander Graf, kvm-ppc, kvm
In-Reply-To: <1368155384-11035-2-git-send-email-scottwood@freescale.com>
On Thu, 2013-05-09 at 22:09 -0500, Scott Wood wrote:
> lockdep.c has this:
> /*
> * So we're supposed to get called after you mask local IRQs,
> * but for some reason the hardware doesn't quite think you did
> * a proper job.
> */
> if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
> return;
>
> Since irqs_disabled() is based on soft_enabled(), that (not just the
> hard EE bit) needs to be 0 before we call trace_hardirqs_off.
>
> Signed-off-by: Scott Wood <scottwood@freescale.com>
Oops ;-)
I'll apply that to my tree and will send it to Linus right after -rc1,
the rest will go the normal way for KVM patches.
Cheers,
Ben.
> ---
> arch/powerpc/include/asm/hw_irq.h | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index d615b28..ba713f1 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -96,11 +96,12 @@ static inline bool arch_irqs_disabled(void)
> #endif
>
> #define hard_irq_disable() do { \
> + u8 _was_enabled = get_paca()->soft_enabled; \
> __hard_irq_disable(); \
> - if (local_paca->soft_enabled) \
> - trace_hardirqs_off(); \
> get_paca()->soft_enabled = 0; \
> get_paca()->irq_happened |= PACA_IRQ_HARD_DIS; \
> + if (_was_enabled) \
> + trace_hardirqs_off(); \
> } while(0)
>
> static inline bool lazy_irq_pending(void)
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox