LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 12/35] KVM: PPC: Remove unused define
From: Alexander Graf @ 2010-08-31  2:31 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Linuxppc-dev, KVM list
In-Reply-To: <1283221937-21006-1-git-send-email-agraf@suse.de>

The define VSID_ALL is unused. Let's remove it.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index e7c4d00..4040c8d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -30,7 +30,6 @@
 #include "trace.h"
 
 #define PTE_SIZE 12
-#define VSID_ALL 0
 
 void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
 {
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 24/35] KVM: PPC: initialize IVORs in addition to IVPR
From: Alexander Graf @ 2010-08-31  2:32 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Linuxppc-dev, KVM list
In-Reply-To: <1283221937-21006-1-git-send-email-agraf@suse.de>

From: Hollis Blanchard <hollis_blanchard@mentor.com>

Developers can now tell at a glace the exact type of the premature interrupt,
instead of just knowing that there was some premature interrupt.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/booke.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c604277..835f6d0 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -497,15 +497,19 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 /* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 {
+	int i;
+
 	vcpu->arch.pc = 0;
 	vcpu->arch.shared->msr = 0;
 	kvmppc_set_gpr(vcpu, 1, (16<<20) - 8); /* -8 for the callee-save LR slot */
 
 	vcpu->arch.shadow_pid = 1;
 
-	/* Eye-catching number so we know if the guest takes an interrupt
-	 * before it's programmed its own IVPR. */
+	/* Eye-catching numbers so we know if the guest takes an interrupt
+	 * before it's programmed its own IVPR/IVORs. */
 	vcpu->arch.ivpr = 0x55550000;
+	for (i = 0; i < BOOKE_IRQPRIO_MAX; i++)
+		vcpu->arch.ivor[i] = 0x7700 | i * 4;
 
 	kvmppc_init_timing_stats(vcpu);
 
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 32/35] KVM: PPC: Document KVM_INTERRUPT ioctl
From: Alexander Graf @ 2010-08-31  2:32 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Linuxppc-dev, KVM list
In-Reply-To: <1283221937-21006-1-git-send-email-agraf@suse.de>

This adds some documentation for the KVM_INTERRUPT special cases that
PowerPC now implements.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 Documentation/kvm/api.txt |   33 +++++++++++++++++++++++++++++++--
 1 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 44d9893..24d6341 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -320,13 +320,13 @@ struct kvm_translation {
 4.15 KVM_INTERRUPT
 
 Capability: basic
-Architectures: x86
+Architectures: x86, ppc
 Type: vcpu ioctl
 Parameters: struct kvm_interrupt (in)
 Returns: 0 on success, -1 on error
 
 Queues a hardware interrupt vector to be injected.  This is only
-useful if in-kernel local APIC is not used.
+useful if in-kernel local APIC or equivalent is not used.
 
 /* for KVM_INTERRUPT */
 struct kvm_interrupt {
@@ -334,8 +334,37 @@ struct kvm_interrupt {
 	__u32 irq;
 };
 
+X86:
+
 Note 'irq' is an interrupt vector, not an interrupt pin or line.
 
+PPC:
+
+Queues an external interrupt to be injected. This ioctl is overleaded
+with 3 different irq values:
+
+a) KVM_INTERRUPT_SET
+
+  This injects an edge type external interrupt into the guest once it's ready
+  to receive interrupts. When injected, the interrupt is done.
+
+b) KVM_INTERRUPT_UNSET
+
+  This unsets any pending interrupt.
+
+  Only available with KVM_CAP_PPC_UNSET_IRQ.
+
+c) KVM_INTERRUPT_SET_LEVEL
+
+  This injects a level type external interrupt into the guest context. The
+  interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
+  is triggered.
+
+  Only available with KVM_CAP_PPC_IRQ_LEVEL.
+
+Note that any value for 'irq' other than the ones stated above is invalid
+and incurs unexpected behavior.
+
 4.16 KVM_DEBUG_GUEST
 
 Capability: basic
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 23/35] KVM: PPC: Don't put MSR_POW in MSR
From: Alexander Graf @ 2010-08-31  2:32 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Linuxppc-dev, KVM list
In-Reply-To: <1283221937-21006-1-git-send-email-agraf@suse.de>

On Book3S a mtmsr with the MSR_POW bit set indicates that the OS is in
idle and only needs to be waked up on the next interrupt.

Now, unfortunately we let that bit slip into the stored MSR value which
is not what the real CPU does, so that we ended up executing code like
this:

	r = mfmsr();
	/* r containts MSR_POW */
	mtmsr(r | MSR_EE);

This obviously breaks, as we're going into idle mode in code sections that
don't expect to be idling.

This patch masks MSR_POW out of the stored MSR value on wakeup, making
guests happy again.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 7adea63..5833df7 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -134,10 +134,14 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
 	vcpu->arch.shared->msr = msr;
 	kvmppc_recalc_shadow_msr(vcpu);
 
-	if (msr & (MSR_WE|MSR_POW)) {
+	if (msr & MSR_POW) {
 		if (!vcpu->arch.pending_exceptions) {
 			kvm_vcpu_block(vcpu);
 			vcpu->stat.halt_wakeup++;
+
+			/* Unset POW bit after we woke up */
+			msr &= ~MSR_POW;
+			vcpu->arch.shared->msr = msr;
 		}
 	}
 
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 11/35] KVM: PPC: Revert "KVM: PPC: Use kernel hash function"
From: Alexander Graf @ 2010-08-31  2:31 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Linuxppc-dev, KVM list
In-Reply-To: <1283221937-21006-1-git-send-email-agraf@suse.de>

It turns out the in-kernel hash function is sub-optimal for our subtle
hash inputs where every bit is significant. So let's revert to the original
hash functions.

This reverts commit 05340ab4f9a6626f7a2e8f9fe5397c61d494f445.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_32_mmu_host.c |   10 ++++++++--
 arch/powerpc/kvm/book3s_64_mmu_host.c |   11 +++++++++--
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 343452c..57dddeb 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -19,7 +19,6 @@
  */
 
 #include <linux/kvm_host.h>
-#include <linux/hash.h>
 
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
@@ -77,7 +76,14 @@ void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
  * a hash, so we don't waste cycles on looping */
 static u16 kvmppc_sid_hash(struct kvm_vcpu *vcpu, u64 gvsid)
 {
-	return hash_64(gvsid, SID_MAP_BITS);
+	return (u16)(((gvsid >> (SID_MAP_BITS * 7)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 6)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 5)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 4)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 3)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 2)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 1)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 0)) & SID_MAP_MASK));
 }
 
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 321c931..e7c4d00 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -20,7 +20,6 @@
  */
 
 #include <linux/kvm_host.h>
-#include <linux/hash.h>
 
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
@@ -44,9 +43,17 @@ void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
  * a hash, so we don't waste cycles on looping */
 static u16 kvmppc_sid_hash(struct kvm_vcpu *vcpu, u64 gvsid)
 {
-	return hash_64(gvsid, SID_MAP_BITS);
+	return (u16)(((gvsid >> (SID_MAP_BITS * 7)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 6)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 5)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 4)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 3)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 2)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 1)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 0)) & SID_MAP_MASK));
 }
 
+
 static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
 {
 	struct kvmppc_sid_map *map;
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 18/35] KVM: PPC: Make PV mtmsr work with r30 and r31
From: Alexander Graf @ 2010-08-31  2:31 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Linuxppc-dev, KVM list
In-Reply-To: <1283221937-21006-1-git-send-email-agraf@suse.de>

So far we've been restricting ourselves to r0-r29 as registers an mtmsr
instruction could use. This was bad, as there are some code paths in
Linux actually using r30.

So let's instead handle all registers gracefully and get rid of that
stupid limitation

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/kvm.c      |   39 ++++++++++++++++++++++++++++++++-------
 arch/powerpc/kernel/kvm_emul.S |   17 ++++++++---------
 2 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 43ec78a..517967d 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -43,6 +43,7 @@
 #define KVM_INST_B_MAX		0x01ffffff
 
 #define KVM_MASK_RT		0x03e00000
+#define KVM_RT_30		0x03c00000
 #define KVM_MASK_RB		0x0000f800
 #define KVM_INST_MFMSR		0x7c0000a6
 #define KVM_INST_MFSPR_SPRG0	0x7c1042a6
@@ -83,6 +84,15 @@ static inline void kvm_patch_ins(u32 *inst, u32 new_inst)
 	flush_icache_range((ulong)inst, (ulong)inst + 4);
 }
 
+static void kvm_patch_ins_ll(u32 *inst, long addr, u32 rt)
+{
+#ifdef CONFIG_64BIT
+	kvm_patch_ins(inst, KVM_INST_LD | rt | (addr & 0x0000fffc));
+#else
+	kvm_patch_ins(inst, KVM_INST_LWZ | rt | (addr & 0x0000fffc));
+#endif
+}
+
 static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt)
 {
 #ifdef CONFIG_64BIT
@@ -187,7 +197,6 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
 extern u32 kvm_emulate_mtmsr_branch_offs;
 extern u32 kvm_emulate_mtmsr_reg1_offs;
 extern u32 kvm_emulate_mtmsr_reg2_offs;
-extern u32 kvm_emulate_mtmsr_reg3_offs;
 extern u32 kvm_emulate_mtmsr_orig_ins_offs;
 extern u32 kvm_emulate_mtmsr_len;
 extern u32 kvm_emulate_mtmsr[];
@@ -217,9 +226,27 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt)
 	/* Modify the chunk to fit the invocation */
 	memcpy(p, kvm_emulate_mtmsr, kvm_emulate_mtmsr_len * 4);
 	p[kvm_emulate_mtmsr_branch_offs] |= distance_end & KVM_INST_B_MASK;
-	p[kvm_emulate_mtmsr_reg1_offs] |= rt;
-	p[kvm_emulate_mtmsr_reg2_offs] |= rt;
-	p[kvm_emulate_mtmsr_reg3_offs] |= rt;
+
+	/* Make clobbered registers work too */
+	switch (get_rt(rt)) {
+	case 30:
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg1_offs],
+				 magic_var(scratch2), KVM_RT_30);
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg2_offs],
+				 magic_var(scratch2), KVM_RT_30);
+		break;
+	case 31:
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg1_offs],
+				 magic_var(scratch1), KVM_RT_30);
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg2_offs],
+				 magic_var(scratch1), KVM_RT_30);
+		break;
+	default:
+		p[kvm_emulate_mtmsr_reg1_offs] |= rt;
+		p[kvm_emulate_mtmsr_reg2_offs] |= rt;
+		break;
+	}
+
 	p[kvm_emulate_mtmsr_orig_ins_offs] = *inst;
 	flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsr_len * 4);
 
@@ -403,9 +430,7 @@ static void kvm_check_ins(u32 *inst, u32 features)
 		break;
 	case KVM_INST_MTMSR:
 	case KVM_INST_MTMSRD_L0:
-		/* We use r30 and r31 during the hook */
-		if (get_rt(inst_rt) < 30)
-			kvm_patch_ins_mtmsr(inst, inst_rt);
+		kvm_patch_ins_mtmsr(inst, inst_rt);
 		break;
 	}
 
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index a6e97e7..6530532 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -135,7 +135,8 @@ kvm_emulate_mtmsr:
 
 	/* Find the changed bits between old and new MSR */
 kvm_emulate_mtmsr_reg1:
-	xor	r31, r0, r31
+	ori	r30, r0, 0
+	xor	r31, r30, r31
 
 	/* Check if we need to really do mtmsr */
 	LOAD_REG_IMMEDIATE(r30, MSR_CRITICAL_BITS)
@@ -156,14 +157,17 @@ kvm_emulate_mtmsr_orig_ins:
 
 maybe_stay_in_guest:
 
+	/* Get the target register in r30 */
+kvm_emulate_mtmsr_reg2:
+	ori	r30, r0, 0
+
 	/* Check if we have to fetch an interrupt */
 	lwz	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
 	cmpwi	r31, 0
 	beq+	no_mtmsr
 
 	/* Check if we may trigger an interrupt */
-kvm_emulate_mtmsr_reg2:
-	andi.	r31, r0, MSR_EE
+	andi.	r31, r30, MSR_EE
 	beq	no_mtmsr
 
 	b	do_mtmsr
@@ -171,8 +175,7 @@ kvm_emulate_mtmsr_reg2:
 no_mtmsr:
 
 	/* Put MSR into magic page because we don't call mtmsr */
-kvm_emulate_mtmsr_reg3:
-	STL64(r0, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+	STL64(r30, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
 
 	SCRATCH_RESTORE
 
@@ -193,10 +196,6 @@ kvm_emulate_mtmsr_reg1_offs:
 kvm_emulate_mtmsr_reg2_offs:
 	.long (kvm_emulate_mtmsr_reg2 - kvm_emulate_mtmsr) / 4
 
-.global kvm_emulate_mtmsr_reg3_offs
-kvm_emulate_mtmsr_reg3_offs:
-	.long (kvm_emulate_mtmsr_reg3 - kvm_emulate_mtmsr) / 4
-
 .global kvm_emulate_mtmsr_orig_ins_offs
 kvm_emulate_mtmsr_orig_ins_offs:
 	.long (kvm_emulate_mtmsr_orig_ins - kvm_emulate_mtmsr) / 4
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 20/35] KVM: PPC: Make PV mtmsrd L=1 work with r30 and r31
From: Alexander Graf @ 2010-08-31  2:32 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Linuxppc-dev, KVM list
In-Reply-To: <1283221937-21006-1-git-send-email-agraf@suse.de>

We had an arbitrary limitation in mtmsrd L=1 that kept us from using r30 and
r31 as input registers. Let's get rid of that and get more potential speedups!

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/kvm.c      |   21 +++++++++++++++++----
 arch/powerpc/kernel/kvm_emul.S |    8 +++++++-
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 517967d..517da39 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -159,6 +159,7 @@ static u32 *kvm_alloc(int len)
 
 extern u32 kvm_emulate_mtmsrd_branch_offs;
 extern u32 kvm_emulate_mtmsrd_reg_offs;
+extern u32 kvm_emulate_mtmsrd_orig_ins_offs;
 extern u32 kvm_emulate_mtmsrd_len;
 extern u32 kvm_emulate_mtmsrd[];
 
@@ -187,7 +188,21 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
 	/* Modify the chunk to fit the invocation */
 	memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4);
 	p[kvm_emulate_mtmsrd_branch_offs] |= distance_end & KVM_INST_B_MASK;
-	p[kvm_emulate_mtmsrd_reg_offs] |= rt;
+	switch (get_rt(rt)) {
+	case 30:
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsrd_reg_offs],
+				 magic_var(scratch2), KVM_RT_30);
+		break;
+	case 31:
+		kvm_patch_ins_ll(&p[kvm_emulate_mtmsrd_reg_offs],
+				 magic_var(scratch1), KVM_RT_30);
+		break;
+	default:
+		p[kvm_emulate_mtmsrd_reg_offs] |= rt;
+		break;
+	}
+
+	p[kvm_emulate_mtmsrd_orig_ins_offs] = *inst;
 	flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4);
 
 	/* Patch the invocation */
@@ -424,9 +439,7 @@ static void kvm_check_ins(u32 *inst, u32 features)
 
 	/* Rewrites */
 	case KVM_INST_MTMSRD_L1:
-		/* We use r30 and r31 during the hook */
-		if (get_rt(inst_rt) < 30)
-			kvm_patch_ins_mtmsrd(inst, inst_rt);
+		kvm_patch_ins_mtmsrd(inst, inst_rt);
 		break;
 	case KVM_INST_MTMSR:
 	case KVM_INST_MTMSRD_L0:
diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index 6530532..f2b1b25 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -78,7 +78,8 @@ kvm_emulate_mtmsrd:
 
 	/* OR the register's (MSR_EE|MSR_RI) on MSR */
 kvm_emulate_mtmsrd_reg:
-	andi.	r30, r0, (MSR_EE|MSR_RI)
+	ori	r30, r0, 0
+	andi.	r30, r30, (MSR_EE|MSR_RI)
 	or	r31, r31, r30
 
 	/* Put MSR back into magic page */
@@ -96,6 +97,7 @@ kvm_emulate_mtmsrd_reg:
 	SCRATCH_RESTORE
 
 	/* Nag hypervisor */
+kvm_emulate_mtmsrd_orig_ins:
 	tlbsync
 
 	b	kvm_emulate_mtmsrd_branch
@@ -117,6 +119,10 @@ kvm_emulate_mtmsrd_branch_offs:
 kvm_emulate_mtmsrd_reg_offs:
 	.long (kvm_emulate_mtmsrd_reg - kvm_emulate_mtmsrd) / 4
 
+.global kvm_emulate_mtmsrd_orig_ins_offs
+kvm_emulate_mtmsrd_orig_ins_offs:
+	.long (kvm_emulate_mtmsrd_orig_ins - kvm_emulate_mtmsrd) / 4
+
 .global kvm_emulate_mtmsrd_len
 kvm_emulate_mtmsrd_len:
 	.long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 13/35] KVM: PPC: Add feature bitmap for magic page
From: Alexander Graf @ 2010-08-31  2:31 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Linuxppc-dev, KVM list
In-Reply-To: <1283221937-21006-1-git-send-email-agraf@suse.de>

We will soon add SR PV support to the shared page, so we need some
infrastructure that allows the guest to query for features KVM exports.

This patch adds a second return value to the magic mapping that
indicated to the guest which features are available.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_para.h |    2 ++
 arch/powerpc/kernel/kvm.c           |   21 +++++++++++++++------
 arch/powerpc/kvm/powerpc.c          |    5 ++++-
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index 7438ab3..43c1b22 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -47,6 +47,8 @@ struct kvm_vcpu_arch_shared {
 
 #define KVM_FEATURE_MAGIC_PAGE	1
 
+#define KVM_MAGIC_FEAT_SR	(1 << 0)
+
 #ifdef __KERNEL__
 
 #ifdef CONFIG_KVM_GUEST
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index e936817..f48144f 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -267,12 +267,20 @@ static void kvm_patch_ins_wrteei(u32 *inst)
 
 static void kvm_map_magic_page(void *data)
 {
-	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
-		       KVM_MAGIC_PAGE,  /* Physical Address */
-		       KVM_MAGIC_PAGE); /* Effective Address */
+	u32 *features = data;
+
+	ulong in[8];
+	ulong out[8];
+
+	in[0] = KVM_MAGIC_PAGE;
+	in[1] = KVM_MAGIC_PAGE;
+
+	kvm_hypercall(in, out, HC_VENDOR_KVM | KVM_HC_PPC_MAP_MAGIC_PAGE);
+
+	*features = out[0];
 }
 
-static void kvm_check_ins(u32 *inst)
+static void kvm_check_ins(u32 *inst, u32 features)
 {
 	u32 _inst = *inst;
 	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
@@ -368,9 +376,10 @@ static void kvm_use_magic_page(void)
 	u32 *p;
 	u32 *start, *end;
 	u32 tmp;
+	u32 features;
 
 	/* Tell the host to map the magic page to -4096 on all CPUs */
-	on_each_cpu(kvm_map_magic_page, NULL, 1);
+	on_each_cpu(kvm_map_magic_page, &features, 1);
 
 	/* Quick self-test to see if the mapping works */
 	if (__get_user(tmp, (u32*)KVM_MAGIC_PAGE)) {
@@ -383,7 +392,7 @@ static void kvm_use_magic_page(void)
 	end = (void*)_etext;
 
 	for (p = start; p < end; p++)
-		kvm_check_ins(p);
+		kvm_check_ins(p, features);
 
 	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
 			 kvm_patching_worked ? "worked" : "failed");
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6a53a3f..496d7a5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -66,6 +66,8 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 		vcpu->arch.magic_page_pa = param1;
 		vcpu->arch.magic_page_ea = param2;
 
+		r2 = 0;
+
 		r = HC_EV_SUCCESS;
 		break;
 	}
@@ -76,13 +78,14 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 #endif
 
 		/* Second return value is in r4 */
-		kvmppc_set_gpr(vcpu, 4, r2);
 		break;
 	default:
 		r = HC_EV_UNIMPLEMENTED;
 		break;
 	}
 
+	kvmppc_set_gpr(vcpu, 4, r2);
+
 	return r;
 }
 
-- 
1.6.0.2

^ permalink raw reply related

* [PATCH 31/35] KVM: PPC: Implement level interrupts for BookE
From: Alexander Graf @ 2010-08-31  2:32 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Linuxppc-dev, KVM list
In-Reply-To: <1283221937-21006-1-git-send-email-agraf@suse.de>

BookE also wants to support level based interrupts, so let's implement
all the necessary logic there. We need to trick a bit here because the
irqprios are 1:1 assigned to architecture defined values. But since there
is some space left there, we can just pick a random one and move it later
on - it's internal anyways.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/booke.c |   17 +++++++++++++++--
 arch/powerpc/kvm/booke.h |    4 +++-
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 835f6d0..77575d0 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -131,13 +131,19 @@ void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu)
 void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                 struct kvm_interrupt *irq)
 {
-	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_EXTERNAL);
+	unsigned int prio = BOOKE_IRQPRIO_EXTERNAL;
+
+	if (irq->irq == KVM_INTERRUPT_SET_LEVEL)
+		prio = BOOKE_IRQPRIO_EXTERNAL_LEVEL;
+
+	kvmppc_booke_queue_irqprio(vcpu, prio);
 }
 
 void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
                                   struct kvm_interrupt *irq)
 {
 	clear_bit(BOOKE_IRQPRIO_EXTERNAL, &vcpu->arch.pending_exceptions);
+	clear_bit(BOOKE_IRQPRIO_EXTERNAL_LEVEL, &vcpu->arch.pending_exceptions);
 }
 
 /* Deliver the interrupt of the corresponding priority, if possible. */
@@ -150,6 +156,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 	ulong crit_raw = vcpu->arch.shared->critical;
 	ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
 	bool crit;
+	bool keep_irq = false;
 
 	/* Truncate crit indicators in 32 bit mode */
 	if (!(vcpu->arch.shared->msr & MSR_SF)) {
@@ -162,6 +169,11 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 	/* ... and we're in supervisor mode */
 	crit = crit && !(vcpu->arch.shared->msr & MSR_PR);
 
+	if (priority == BOOKE_IRQPRIO_EXTERNAL_LEVEL) {
+		priority = BOOKE_IRQPRIO_EXTERNAL;
+		keep_irq = true;
+	}
+
 	switch (priority) {
 	case BOOKE_IRQPRIO_DTLB_MISS:
 	case BOOKE_IRQPRIO_DATA_STORAGE:
@@ -214,7 +226,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 			vcpu->arch.shared->dar = vcpu->arch.queued_dear;
 		kvmppc_set_msr(vcpu, vcpu->arch.shared->msr & msr_mask);
 
-		clear_bit(priority, &vcpu->arch.pending_exceptions);
+		if (!keep_irq)
+			clear_bit(priority, &vcpu->arch.pending_exceptions);
 	}
 
 	return allowed;
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 88258ac..492bb70 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -46,7 +46,9 @@
 #define BOOKE_IRQPRIO_FIT 17
 #define BOOKE_IRQPRIO_DECREMENTER 18
 #define BOOKE_IRQPRIO_PERFORMANCE_MONITOR 19
-#define BOOKE_IRQPRIO_MAX 19
+/* Internal pseudo-irqprio for level triggered externals */
+#define BOOKE_IRQPRIO_EXTERNAL_LEVEL 20
+#define BOOKE_IRQPRIO_MAX 20
 
 extern unsigned long kvmppc_booke_handlers;
 
-- 
1.6.0.2

^ permalink raw reply related

* Re: [PATCH 1/3] PPC: s/mtmsrd/MTMSR in ldstfp.S
From: Sean MacLennan @ 2010-08-31  2:49 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Linuxppc-dev
In-Reply-To: <1283220922-20369-2-git-send-email-agraf@suse.de>

On Tue, 31 Aug 2010 04:15:20 +0200
Alexander Graf <agraf@suse.de> wrote:

> Commit 0016a4cf introduced a lot of mtmsrd's that were plainly
> written out. These failed to compile on my e500v2 system, so let's
> better use the macro that is around just for that purpose.

I sent in the same patch ;)

Cheers,
   Sean

^ permalink raw reply

* Re: [PATCH] powerpc: mtmsrd not defined
From: Paul Mackerras @ 2010-08-31  4:37 UTC (permalink / raw)
  To: Sean MacLennan; +Cc: linuxppc-dev
In-Reply-To: <20100822184844.37c30d5e@lappy.seanm.ca>

On Sun, Aug 22, 2010 at 06:48:44PM -0400, Sean MacLennan wrote:

> Replace the BOOK3S_64 specific mtmsrd with the generic MTMSRD macro.
> 
> Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply

* Re: [PATCH 1/3] PPC: s/mtmsrd/MTMSR in ldstfp.S
From: Kumar Gala @ 2010-08-31  4:37 UTC (permalink / raw)
  To: Sean MacLennan; +Cc: Linuxppc-dev, Alexander Graf
In-Reply-To: <20100830224932.21b5d7f2@lappy.seanm.ca>


On Aug 30, 2010, at 9:49 PM, Sean MacLennan wrote:

> On Tue, 31 Aug 2010 04:15:20 +0200
> Alexander Graf <agraf@suse.de> wrote:
>=20
>> Commit 0016a4cf introduced a lot of mtmsrd's that were plainly
>> written out. These failed to compile on my e500v2 system, so let's
>> better use the macro that is around just for that purpose.
>=20
> I sent in the same patch ;)

I'm surprised we are compiling this code on e500v2.  We are probably =
missing some CONFIG_PPC_FPU in there as well.

- k=

^ permalink raw reply

* [PATCH 1/2] powerpc/dma: Add optional platform override of dma_set_mask()
From: Benjamin Herrenschmidt @ 2010-08-31  5:23 UTC (permalink / raw)
  To: linuxppc-dev

Some platforms may want to override dma_set_mask() to take into
account some specific "features" such as the availability of
a direct-map window in addition to an iommu.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/dma-mapping.h |   14 +-------------
 arch/powerpc/include/asm/machdep.h     |    3 +++
 arch/powerpc/kernel/dma.c              |   18 ++++++++++++++++++
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index c85ef23..ce6e78c 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -127,19 +127,7 @@ static inline int dma_supported(struct device *dev, u64 mask)
 	return dma_ops->dma_supported(dev, mask);
 }
 
-static inline int dma_set_mask(struct device *dev, u64 dma_mask)
-{
-	struct dma_map_ops *dma_ops = get_dma_ops(dev);
-
-	if (unlikely(dma_ops == NULL))
-		return -EIO;
-	if (dma_ops->set_dma_mask != NULL)
-		return dma_ops->set_dma_mask(dev, dma_mask);
-	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
-		return -EIO;
-	*dev->dma_mask = dma_mask;
-	return 0;
-}
+extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
 static inline void *dma_alloc_coherent(struct device *dev, size_t size,
 				       dma_addr_t *dma_handle, gfp_t flag)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 9f0fc9e..41582c3 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -102,6 +102,9 @@ struct machdep_calls {
 	void		(*pci_dma_dev_setup)(struct pci_dev *dev);
 	void		(*pci_dma_bus_setup)(struct pci_bus *bus);
 
+	/* Platform set_dma_mask override */
+	int		(*dma_set_mask)(struct device *dev, u64 dma_mask);
+
 	int		(*probe)(void);
 	void		(*setup_arch)(void); /* Optional, may be NULL */
 	void		(*init_early)(void);
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 84d6367..f368c07 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -12,6 +12,7 @@
 #include <linux/memblock.h>
 #include <asm/bug.h>
 #include <asm/abs_addr.h>
+#include <asm/machdep.h>
 
 /*
  * Generic direct DMA implementation
@@ -154,6 +155,23 @@ EXPORT_SYMBOL(dma_direct_ops);
 
 #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
 
+int dma_set_mask(struct device *dev, u64 dma_mask)
+{
+	struct dma_map_ops *dma_ops = get_dma_ops(dev);
+
+	if (ppc_md.dma_set_mask)
+		return ppc_md.dma_set_mask(dev, dma_mask);
+	if (unlikely(dma_ops == NULL))
+		return -EIO;
+	if (dma_ops->set_dma_mask != NULL)
+		return dma_ops->set_dma_mask(dev, dma_mask);
+	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
+		return -EIO;
+	*dev->dma_mask = dma_mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_mask);
+
 static int __init dma_init(void)
 {
        dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);

^ permalink raw reply related

* [PATCH 2/2] powerpc/dart_iommu: Support for 64-bit iommu bypass window on PCIe
From: Benjamin Herrenschmidt @ 2010-08-31  5:24 UTC (permalink / raw)
  To: linuxppc-dev

The PCI-Express bus off the U4/CPC945 bridge supports direct DMA to
all of memory, bypassing the DART iommu, for 64-bit capable devices.

This adds support for it on Bimini and Apple Quad G5's in order to
improve DMA performances of cards using that slot (the x16 graphics
slot). Tested with an Intel ixgbe 10GE card.
---
 arch/powerpc/sysdev/dart_iommu.c |   74 ++++++++++++++++++++++++++++++++-----
 1 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 559db2b..17cf15e 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -70,6 +70,8 @@ static int iommu_table_dart_inited;
 static int dart_dirty;
 static int dart_is_u4;
 
+#define DART_U4_BYPASS_BASE	0x8000000000ull
+
 #define DBG(...)
 
 static inline void dart_tlb_invalidate_all(void)
@@ -292,12 +294,20 @@ static void iommu_table_dart_setup(void)
 	set_bit(iommu_table_dart.it_size - 1, iommu_table_dart.it_map);
 }
 
-static void pci_dma_dev_setup_dart(struct pci_dev *dev)
+static void dma_dev_setup_dart(struct device *dev)
 {
 	/* We only have one iommu table on the mac for now, which makes
 	 * things simple. Setup all PCI devices to point to this table
 	 */
-	set_iommu_table_base(&dev->dev, &iommu_table_dart);
+	if (get_dma_ops(dev) == &dma_direct_ops)
+		set_dma_offset(dev, DART_U4_BYPASS_BASE);
+	else
+		set_iommu_table_base(dev, &iommu_table_dart);
+}
+
+static void pci_dma_dev_setup_dart(struct pci_dev *dev)
+{
+	dma_dev_setup_dart(&dev->dev);
 }
 
 static void pci_dma_bus_setup_dart(struct pci_bus *bus)
@@ -315,6 +325,45 @@ static void pci_dma_bus_setup_dart(struct pci_bus *bus)
 		PCI_DN(dn)->iommu_table = &iommu_table_dart;
 }
 
+static bool dart_device_on_pcie(struct device *dev)
+{
+	struct device_node *np = of_node_get(dev->of_node);
+
+	while(np) {
+		if (of_device_is_compatible(np, "U4-pcie") ||
+		    of_device_is_compatible(np, "u4-pcie")) {
+			of_node_put(np);
+			return true;
+		}
+		np = of_get_next_parent(np);
+	}
+	return false;
+}
+
+static int dart_dma_set_mask(struct device *dev, u64 dma_mask)
+{
+	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
+		return -EIO;
+
+	/* U4 supports a DART bypass, we use it for 64-bit capable
+	 * devices to improve performances. However, that only works
+	 * for devices connected to U4 own PCIe interface, not bridged
+	 * through hypertransport. We need the device to support at
+	 * least 40 bits of addresses.
+	 */
+	if (dart_device_on_pcie(dev) && dma_mask >= DMA_BIT_MASK(40)) {
+		dev_info(dev, "Using 64-bit DMA iommu bypass\n");
+		set_dma_ops(dev, &dma_direct_ops);
+	} else {
+		dev_info(dev, "Using 32-bit DMA via iommu\n");
+		set_dma_ops(dev, &dma_iommu_ops);
+	}
+	dma_dev_setup_dart(dev);
+
+	*dev->dma_mask = dma_mask;
+	return 0;
+}
+
 void __init iommu_init_early_dart(void)
 {
 	struct device_node *dn;
@@ -328,20 +377,25 @@ void __init iommu_init_early_dart(void)
 		dart_is_u4 = 1;
 	}
 
+	/* Initialize the DART HW */
+	if (dart_init(dn) != 0)
+		goto bail;
+
 	/* Setup low level TCE operations for the core IOMMU code */
 	ppc_md.tce_build = dart_build;
 	ppc_md.tce_free  = dart_free;
 	ppc_md.tce_flush = dart_flush;
 
-	/* Initialize the DART HW */
-	if (dart_init(dn) == 0) {
-		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_dart;
-		ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_dart;
+	/* Setup bypass if supported */
+	if (dart_is_u4)
+		ppc_md.dma_set_mask = dart_dma_set_mask;
 
-		/* Setup pci_dma ops */
-		set_pci_dma_ops(&dma_iommu_ops);
-		return;
-	}
+	ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_dart;
+	ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_dart;
+
+	/* Setup pci_dma ops */
+	set_pci_dma_ops(&dma_iommu_ops);
+	return;
 
  bail:
 	/* If init failed, use direct iommu and null setup functions */

^ permalink raw reply related

* [git pull] Please pull powerpc.git merge branch
From: Benjamin Herrenschmidt @ 2010-08-31  5:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linuxppc-dev list, Andrew Morton, Linux Kernel list

Hi Linus !

Here's a few small fixes, one is an important fix for a nasty regression
breaking pseries machine running under hypervisor... oops !

Cheers,
Ben.

The following changes since commit 2bfc96a127bc1cc94d26bfaa40159966064f9c8c:
  Linus Torvalds (1):
        Linux 2.6.36-rc3

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

Matthew McClintock (1):
      powerpc/kexec: Adds correct calling convention for kexec purgatory

Michael Neuling (1):
      powerpc: Don't use kernel stack with translation off

Paul Mackerras (1):
      powerpc/perf_event: Reduce latency of calling perf_event_do_pending

 arch/powerpc/kernel/head_64.S |   12 +++++++++---
 arch/powerpc/kernel/misc_32.S |    3 +++
 arch/powerpc/kernel/time.c    |   23 +++++++++++------------
 3 files changed, 23 insertions(+), 15 deletions(-)

^ permalink raw reply

* Re: [PATCH] of_mmc_spi: add card detect irq support
From: Esben Haabendal @ 2010-08-31  6:14 UTC (permalink / raw)
  To: Grant Likely; +Cc: linuxppc-dev, Andrew Morton, David Brownell, linux-mmc
In-Reply-To: <AANLkTinhctfE_BTF3xtACMoqrtRiSNApQ3+qCaOJ=w7m@mail.gmail.com>

On Mon, Aug 30, 2010 at 7:46 PM, Grant Likely <grant.likely@secretlab.ca> w=
rote:
> On Mon, Aug 30, 2010 at 10:04 AM, Esben Haabendal
>> Hopefully, arm users will soon enjoy this driver/wrapper soon also.
>
> My hope is that even on ARM when the device tree is used I'll be able
> eliminate IRQs mapped to 0 (but I need to do a lot more research on
> how best to do that though). =A0Are you actually using this on ARM?

No.

> I'm okay with you keeping the NO_IRQ test for the short term though.

Ok.  I personally don't see a big problem with dropping it either, as
I don't actually
use it on ARM. But until at least ARM has NO_IRQ=3D=3D0, I just thought it =
would be
the right thing to try to be compatible.

/Esben

^ permalink raw reply

* Re: [PATCH 13/26] KVM: PPC: Add feature bitmap for magic page
From: Avi Kivity @ 2010-08-31  6:28 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <FAAE3BB9-E58E-4319-903A-B6C25D88D0E3@suse.de>

  On 08/31/2010 03:56 AM, Alexander Graf wrote:
> On 22.08.2010, at 18:42, Avi Kivity wrote:
>
>> On 08/17/2010 04:57 PM, Alexander Graf wrote:
>>> We will soon add SR PV support to the shared page, so we need some
>>> infrastructure that allows the guest to query for features KVM exports.
>>>
>>> This patch adds a second return value to the magic mapping that
>>> indicated to the guest which features are available.
>>>
>> You need to make that feature controllable from userspace, to allow new->old save/restore.
> Good one :). We're still missing too much stuff to even run without losing interrupts yet and you're thinking about new->old save/restore. Who'd want to migrate onto a system that's broken anyways? Besides - we're missing too many register values from the kernel side to even be able to perform a migration.
>
> I'm planning to add migration, probably after SMP. But that will be another CAP and anything before that won't be able to save/restore.

I'm thinking stability and basic functionality and your running around 
adding features (or new archs, depending on mood).  But I agree, this 
can wait until after SMP.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply

* Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
From: Darren Hart @ 2010-08-31  7:12 UTC (permalink / raw)
  To: Ankita Garg
  Cc: Stephen Rothwell, Gautham R Shenoy, Josh Triplett, Steven Rostedt,
	linuxppc-dev, Will Schmidt, Paul Mackerras, Thomas Gleixner
In-Reply-To: <20100819155824.GD2690@in.ibm.com>

On 08/19/2010 08:58 AM, Ankita Garg wrote:
> Hi Darren,
> 
> On Thu, Jul 22, 2010 at 11:24:13AM -0700, Darren Hart wrote:
>>
>> With some instrumentation we were able to determine that the
>> preempt_count() appears to change across the extended_cede_processor()
>> call.  Specifically across the plpar_hcall_norets(H_CEDE) call. On
>> PREEMPT_RT we call this with preempt_count=1 and return with
>> preempt_count=0xffffffff. On mainline with CONFIG_PREEMPT=y, the value
>> is different (0x65) but is still incorrect.
> 
> I was trying to reproduce this issue on a 2.6.33.7-rt29 kernel. I could
> easily reproduce this on the RT kernel and not the non-RT kernel.

This matches my experience.

> However, I hit it every single time I did a cpu online operation. I also
> noticed that the issue persists even when I disable H_CEDE by passing
> the "cede_offline=0" kernel commandline parameter. Could you pl confirm
> if you observe the same in your setup ?

Yes, this is my experience as well.

> 
> However, the issue still remains. Will spend few cycles looking into
> this issue.
> 

I've spent today trying to collect some useful traces. I consistently
see the preempt_count() change to 0xffffffff across the H_CEDE call, but
the irq and sched events (to ftrace) do not indicate anything else
running on the CPU that could change that.

  <idle>-0       1d.h1. 11408us : irq_handler_entry: irq=16 name=IPI
  <idle>-0       1d.h1. 11411us : irq_handler_exit: irq=16 ret=handled
  ...            <other cpus>
  <idle>-0       1d.... 22101us : .pseries_mach_cpu_die: start
  <idle>-0       1d.... 22108us : .pseries_mach_cpu_die: cpu 1 (hwid 1) ceding for offline with hint 2
  ...            <other cpus>
  <idle>-0       1d.Hff. 33804us : .pseries_mach_cpu_die: returned from cede with pcnt: ffffffff
  <idle>-0       1d.Hff. 33805us : .pseries_mach_cpu_die: forcing pcnt to 0
  <idle>-0       1d.... 33807us : .pseries_mach_cpu_die: cpu 1 (hwid 1) returned from cede.
  <idle>-0       1d.... 33808us : .pseries_mach_cpu_die: Decrementer value = 7fa49474 Timebase value = baefee6c88113
  <idle>-0       1d.... 33809us : .pseries_mach_cpu_die: cpu 1 (hwid 1) got prodded to go online
  <idle>-0       1d.... 33816us : .pseries_mach_cpu_die: calling start_seconday, pcnt: 0
  <idle>-0       1d.... 33816us : .pseries_mach_cpu_die: forcing pcnt to 0

---------------------------------
Modules linked in: autofs4 binfmt_misc dm_mirror dm_region_hash dm_log [last unloaded: scsi_wait_scan]
NIP: c00000000006aa50 LR: c00000000006ac40 CTR: c00000000006ac14
REGS: c00000008e79ffc0 TRAP: 0300   Not tainted  (2.6.33-rt-dvhrt16-02358-g61223dd-dirty)
MSR: 8000000000001032 <ME,IR,DR>  CR: 28000022  XER: 00000000
DAR: c000018000ba44c0, DSISR: 0000000040000000
TASK = c00000010e6de040[0] 'swapper' THREAD: c00000008e7a0000 CPU: 1
GPR00: 0000018000000040 c00000008e7a0240 c000000000b55008 c00000010e6de040
GPR04: c000000085d800c0 0000000000000000 0000000000000000 000000000000000f
GPR08: 0000018000000000 c00000008e7a0000 c000000000ba4480 c000000000a32c80
GPR12: 8000000000009032 c000000000ba4680 c00000008e7a08f0 0000000000000001
GPR16: fffffffffffff2fa c00000010e6de040 0000000000000000 7fffffffffffffff
GPR20: 0000000000000000 0000000000000001 0000000000000001 c000000000f42c80
GPR24: c0000000850b7638 0000000000000000 0000000000000000 0000000000000001
GPR28: c000000000f42c80 c00000010e6de040 c000000000ad7698 c00000008e7a0240
NIP [c00000000006aa50] .resched_task+0x48/0xd8
LR [c00000000006ac40] .check_preempt_curr_idle+0x2c/0x44
Call Trace:
Instruction dump:
f821ff71 7c3f0b78 ebc2ab28 7c7d1b78 60000000 60000000 e95e8008 e97e8000
e93d0008 81090010 79084da4 38080040 <7d4a002a> 7c0b502e 7c000074 7800d182
---[ end trace 6d3f1cddaa17382c ]---
Kernel panic - not syncing: Attempted to kill the idle task!
Call Trace:
Rebooting in 180 seconds..



When running with the function plugin I had to stop the trace
immediately before entering start_secondary after an online or my traces
would not include the pseries_mach_cpu_die function, nor the tracing I
added there (possibly buffer size, I am using 2048). The following trace
was collected using "trace-cmd record -p function -e irq -e sched" and
has been filtered to only show CPU [001] (the CPU undergoing the
offline/online test, and the one seeing preempt_count (pcnt) go to
ffffffff after cede. The function tracer does not indicate anything
running on the CPU other than the HCALL - unless the __trace_hcall*
commands might be to blame. Can a POWER guru comment?

          <idle>-0     [001]   417.625286: function:             .cpu_die
          <idle>-0     [001]   417.625287: function:                .pseries_mach_cpu_die
          <idle>-0     [001]   417.625290: bprint:               .pseries_mach_cpu_die : start
          <idle>-0     [001]   417.625291: function:                   .idle_task_exit
          <idle>-0     [001]   417.625292: function:                      .switch_slb
          <idle>-0     [001]   417.625294: function:                   .xics_teardown_cpu
          <idle>-0     [001]   417.625294: function:                      .xics_set_cpu_priority
          <idle>-0     [001]   417.625295: function:             .__trace_hcall_entry
          <idle>-0     [001]   417.625296: function:                .probe_hcall_entry
          <idle>-0     [001]   417.625297: function:             .__trace_hcall_exit
          <idle>-0     [001]   417.625298: function:                .probe_hcall_exit
          <idle>-0     [001]   417.625299: function:             .__trace_hcall_entry
          <idle>-0     [001]   417.625300: function:                .probe_hcall_entry
          <idle>-0     [001]   417.625301: function:             .__trace_hcall_exit
          <idle>-0     [001]   417.625302: function:                .probe_hcall_exit
          <idle>-0     [001]   417.625305: bprint:               .pseries_mach_cpu_die : cpu 1 (hwid 1) ceding for offline with hint 2
          <idle>-0     [001]   417.625307: bprint:               .pseries_mach_cpu_die : calling extended cede, pcnt: 0
          <idle>-0     [001]   417.625308: function:             .__trace_hcall_entry
          <idle>-0     [001]   417.625308: function:                .probe_hcall_entry
          <idle>-0     [001]   417.837734: function:             .__trace_hcall_exit
          <idle>-0     [001]   417.837736: function:                .probe_hcall_exit
          <idle>-0     [001]   417.837740: bprint:               .pseries_mach_cpu_die : returned from cede with pcnt: ffffffff
          <idle>-0     [001]   417.837742: bprint:               .pseries_mach_cpu_die : forcing pcnt to 0
          <idle>-0     [001]   417.837743: bprint:               .pseries_mach_cpu_die : cpu 1 (hwid 1) returned from cede.
          <idle>-0     [001]   417.837745: bprint:               .pseries_mach_cpu_die : Decrementer value = 79844cbf Timebase value = bb3cf7ab2771c
          <idle>-0     [001]   417.837747: bprint:               .pseries_mach_cpu_die : cpu 1 (hwid 1) got prodded to go online
          <idle>-0     [001]   417.837749: function:             .__trace_hcall_entry
          <idle>-0     [001]   417.837749: function:                .probe_hcall_entry
          <idle>-0     [001]   417.837755: function:             .__trace_hcall_exit
          <idle>-0     [001]   417.837755: function:                .probe_hcall_exit
          <idle>-0     [001]   417.837757: bprint:               .pseries_mach_cpu_die : calling start_seconday, pcnt: 0
          <idle>-0     [001]   417.837758: bprint:               .pseries_mach_cpu_die : forcing pcnt to 0



I replaced the extended_cede_processor() call with mdelay(2). When I do
that, I don't see the preempt_count() change across the mdelay(2) call.
However, the system still eventually crashes in sub_preempt_count().
This appears to be independent of if preempt_count changes across
CEDE (since it doesn't occur across mdelay). I will try with larger
values of mdelay tomorrow to see if a longer delay will experience the
preempt_count value change.

Badness at kernel/sched.c:3726
NIP: c000000000698684 LR: c000000000698668 CTR: c00000000007a89c
REGS: c00000008e7a0170 TRAP: 0700   Not tainted  (2.6.33-rt-dvhrt16-02358-g61223dd-dirty)
MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 28000022  XER: 00000000
TASK = c00000010e6de040[0] 'swapper' THREAD: c00000008e7a0000 CPU: 1
GPR00: 0000000000000000 c00000008e7a03f0 c000000000b55008 0000000000000001
GPR04: 0000000000000000 0000000000000000 0000000000000000 000000000000000f
GPR08: 0000000000000200 c000000000ca5420 0000000000000001 c000000000bc22f0
GPR12: 8000000000009032 c000000000ba4680 c00000008e7a0a10 0000000000000001
GPR16: fffffffffffff4d1 c00000010e6de040 0000000000000000 7fffffffffffffff
GPR20: 0000000000000000 0000000000000001 0000000000000001 c000000000f42c80
GPR24: 0000000000000001 0000000000000000 0000000000000000 0000000000000001
GPR28: c000000000f42c80 0000000000000001 c000000000ad7698 c00000008e7a03f0
NIP [c000000000698684] .sub_preempt_count+0xa8/0xdc
LR [c000000000698668] .sub_preempt_count+0x8c/0xdc
Call Trace:
Instruction dump:
41fd0038 78000620 2fa00000 40fe002c 4bd08a61 60000000 2fa30000 419e002c
e93e87e8 80090000 2f800000 409e001c <0fe00000> 48000014 78290464 80090014
Unable to handle kernel paging request for data at address 0xc000018000ba44c0
Faulting instruction address: 0xc00000000006aa1c
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=128 NUMA pSeries
last sysfs file:line



-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply

* Re: [PATCH] gpiolib: Add 'struct gpio_chip' forward declaration for !GPIOLIB case
From: Grant Likely @ 2010-08-31  8:03 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: David Brownell, Andrew Morton, Benjamin Herrenschmidt,
	linuxppc-dev, linux-kernel
In-Reply-To: <20100824092623.GA20334@oksana.dev.rtsoft.ru>

On Tue, Aug 24, 2010 at 01:26:23PM +0400, Anton Vorontsov wrote:
> With CONFIG_GPIOLIB=n, the 'struct gpio_chip' is not declared,
> so the following pops up on PowerPC:
> 
>   cc1: warnings being treated as errors
>   In file included from arch/powerpc/platforms/52xx/mpc52xx_common.c:19:
>   include/linux/of_gpio.h:74: warning: 'struct gpio_chip' declared
>                               inside parameter list
>   include/linux/of_gpio.h:74: warning: its scope is only this definition
>                               or declaration, which is probably not what
> 			      you want
>   include/linux/of_gpio.h:75: warning: 'struct gpio_chip' declared
>                               inside parameter list
>   make[2]: *** [arch/powerpc/platforms/52xx/mpc52xx_common.o] Error 1
> 
> This patch fixes the issue by providing the proper forward declaration.
> 
> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>

This doesn't actually solve the problem, and gpiochip should remain undefined when CONFIG_GPIOLIB=n to catch exactly these build failures.  The real problem is that I merged a change into the mpc5200 code that required CONFIG_GPIOLIB be enabled without reflecting that requirement in Kconfig.

g.

> ---
> 
> On Tue, Aug 24, 2010 at 04:26:08PM +1000, Benjamin Herrenschmidt wrote:
> > I get that with my current stuff:
> > 
> > cc1: warnings being treated as errors
> > In file included from [..]/mpc52xx_common.c:19:
> > of_gpio.h:74: error: ‘struct gpio_chip’ declared inside parameter list
> [...]
> > make[3]: *** Waiting for unfinished jobs....
> 
> That's because with GPIOCHIP=n no one declares struct gpio_chip.
> 
> It should be either of_gpio.h or gpio.h. Let's make it gpio.h, as
> this feels more generic, and we already have some !GPIOLIB handling
> in there.
> 
>  include/linux/gpio.h |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/gpio.h b/include/linux/gpio.h
> index 03f616b..85207d2 100644
> --- a/include/linux/gpio.h
> +++ b/include/linux/gpio.h
> @@ -15,6 +15,12 @@
>  struct device;
>  
>  /*
> + * Some code might rely on the declaration. Still, it is illegal
> + * to dereference it for !GPIOLIB case.
> + */
> +struct gpio_chip;
> +
> +/*
>   * Some platforms don't support the GPIO programming interface.
>   *
>   * In case some driver uses it anyway (it should normally have
> -- 
> 1.7.0.5
> 

^ permalink raw reply

* Re: [PATCH] gpiolib: Add 'struct gpio_chip' forward declaration for !GPIOLIB case
From: Anton Vorontsov @ 2010-08-31  8:37 UTC (permalink / raw)
  To: Grant Likely
  Cc: David Brownell, Andrew Morton, Benjamin Herrenschmidt,
	linuxppc-dev, linux-kernel
In-Reply-To: <20100831080344.GA20515@angua.secretlab.ca>

On Tue, Aug 31, 2010 at 02:03:44AM -0600, Grant Likely wrote:
> On Tue, Aug 24, 2010 at 01:26:23PM +0400, Anton Vorontsov wrote:
> > With CONFIG_GPIOLIB=n, the 'struct gpio_chip' is not declared,
> > so the following pops up on PowerPC:
> > 
> >   cc1: warnings being treated as errors
> >   In file included from arch/powerpc/platforms/52xx/mpc52xx_common.c:19:
> >   include/linux/of_gpio.h:74: warning: 'struct gpio_chip' declared
> >                               inside parameter list
> >   include/linux/of_gpio.h:74: warning: its scope is only this definition
> >                               or declaration, which is probably not what
> > 			      you want
> >   include/linux/of_gpio.h:75: warning: 'struct gpio_chip' declared
> >                               inside parameter list
> >   make[2]: *** [arch/powerpc/platforms/52xx/mpc52xx_common.o] Error 1
> > 
> > This patch fixes the issue by providing the proper forward declaration.
> > 
> > Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
> 
> This doesn't actually solve the problem, and gpiochip should 
> remain undefined when CONFIG_GPIOLIB=n to catch exactly these
> build failures.  The real problem is that I merged a change
> into the mpc5200 code that required CONFIG_GPIOLIB be enabled
> without reflecting that requirement in Kconfig.

No, look closer. The error is in of_gpio.h, and it's perfectly
fine to include it w/o GPIOLIB=y.

> > ---
> > 
> > On Tue, Aug 24, 2010 at 04:26:08PM +1000, Benjamin Herrenschmidt wrote:
> > > I get that with my current stuff:
> > > 
> > > cc1: warnings being treated as errors
> > > In file included from [..]/mpc52xx_common.c:19:
> > > of_gpio.h:74: error: ‘struct gpio_chip’ declared inside parameter list
> > [...]
> > > make[3]: *** Waiting for unfinished jobs....
> > 
> > That's because with GPIOCHIP=n no one declares struct gpio_chip.
> > 
> > It should be either of_gpio.h or gpio.h. Let's make it gpio.h, as
> > this feels more generic, and we already have some !GPIOLIB handling
> > in there.
> > 
> >  include/linux/gpio.h |    6 ++++++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/gpio.h b/include/linux/gpio.h
> > index 03f616b..85207d2 100644
> > --- a/include/linux/gpio.h
> > +++ b/include/linux/gpio.h
> > @@ -15,6 +15,12 @@
> >  struct device;
> >  
> >  /*
> > + * Some code might rely on the declaration. Still, it is illegal
> > + * to dereference it for !GPIOLIB case.
> > + */
> > +struct gpio_chip;
> > +
> > +/*
> >   * Some platforms don't support the GPIO programming interface.
> >   *
> >   * In case some driver uses it anyway (it should normally have
> > -- 
> > 1.7.0.5
> > 

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply

* [PATCH] powerpc/512x: fix clk_get() return value
From: Akinobu Mita @ 2010-08-31  8:43 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Akinobu Mita

clk_get() should return an ERR_PTR value on error, not NULL.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/platforms/512x/clock.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/512x/clock.c b/arch/powerpc/platforms/512x/clock.c
index 5b243bd..3dc2a8d 100644
--- a/arch/powerpc/platforms/512x/clock.c
+++ b/arch/powerpc/platforms/512x/clock.c
@@ -57,7 +57,7 @@ static struct clk *mpc5121_clk_get(struct device *dev, const char *id)
 	int id_match = 0;
 
 	if (dev == NULL || id == NULL)
-		return NULL;
+		return clk;
 
 	mutex_lock(&clocks_mutex);
 	list_for_each_entry(p, &clocks, node) {
-- 
1.7.1.231.gd0b16

^ permalink raw reply related

* [PATCH 2/2 v2] powerpc, pseries: Re-enable dispatch trace log userspace interface
From: Paul Mackerras @ 2010-08-31 11:59 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20100827055732.GB19033@brick.ozlabs.ibm.com>

Since the cpu accounting code uses the hypervisor dispatch trace log
now when CONFIG_VIRT_CPU_ACCOUNTING = y, the previous commit disabled
access to it via files in the /sys/kernel/debug/powerpc/dtl/ directory
in that case.  This restores those files.

To do this, we now have a hook that the cpu accounting code will call
as it processes each entry from the hypervisor dispatch trace log.
The code in dtl.c now uses that to fill up its ring buffer, rather
than having the hypervisor fill the ring buffer directly.

This also fixes dtl_file_read() to handle overflow conditions a bit
better and adds a spinlock to ensure that race conditions (multiple
processes opening or reading the file concurrently) are handled
correctly.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
Removed extraneous increment of dtl->last_idx in this version.
Patch 1/1 of this series is unchanged.

 arch/powerpc/include/asm/lppaca.h    |    8 ++
 arch/powerpc/kernel/time.c           |    6 +-
 arch/powerpc/platforms/pseries/dtl.c |  207 +++++++++++++++++++++++++++-------
 3 files changed, 180 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index cfb85ec..7f5e0fe 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -191,6 +191,14 @@ struct dtl_entry {
 #define DISPATCH_LOG_BYTES	4096	/* bytes per cpu */
 #define N_DISPATCH_LOG		(DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
 
+/*
+ * When CONFIG_VIRT_CPU_ACCOUNTING = y, the cpu accounting code controls
+ * reading from the dispatch trace log.  If other code wants to consume
+ * DTL entries, it can set this pointer to a function that will get
+ * called once for each DTL entry that gets processed.
+ */
+extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index);
+
 #endif /* CONFIG_PPC_BOOK3S */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_LPPACA_H */
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index fca2064..bcb738b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -183,6 +183,8 @@ DEFINE_PER_CPU(unsigned long, cputime_scaled_last_delta);
 
 cputime_t cputime_one_jiffy;
 
+void (*dtl_consumer)(struct dtl_entry *, u64);
+
 static void calc_cputime_factors(void)
 {
 	struct div_result res;
@@ -218,7 +220,7 @@ static u64 read_spurr(u64 tb)
  */
 static u64 scan_dispatch_log(u64 stop_tb)
 {
-	unsigned long i = local_paca->dtl_ridx;
+	u64 i = local_paca->dtl_ridx;
 	struct dtl_entry *dtl = local_paca->dtl_curr;
 	struct dtl_entry *dtl_end = local_paca->dispatch_log_end;
 	struct lppaca *vpa = local_paca->lppaca_ptr;
@@ -229,6 +231,8 @@ static u64 scan_dispatch_log(u64 stop_tb)
 	if (i == vpa->dtl_idx)
 		return 0;
 	while (i < vpa->dtl_idx) {
+		if (dtl_consumer)
+			dtl_consumer(dtl, i);
 		dtb = dtl->timebase;
 		tb_delta = dtl->enqueue_to_dispatch_time +
 			dtl->ready_to_enqueue_time;
diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c
index 0357655..68cb2f2 100644
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -23,6 +23,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/debugfs.h>
+#include <linux/spinlock.h>
 #include <asm/smp.h>
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -37,6 +38,7 @@ struct dtl {
 	int			cpu;
 	int			buf_entries;
 	u64			last_idx;
+	spinlock_t		lock;
 };
 static DEFINE_PER_CPU(struct dtl, cpu_dtl);
 
@@ -55,25 +57,97 @@ static u8 dtl_event_mask = 0x7;
 static int dtl_buf_entries = (16 * 85);
 
 
-static int dtl_enable(struct dtl *dtl)
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+struct dtl_ring {
+	u64	write_index;
+	struct dtl_entry *write_ptr;
+	struct dtl_entry *buf;
+	struct dtl_entry *buf_end;
+	u8	saved_dtl_mask;
+};
+
+static DEFINE_PER_CPU(struct dtl_ring, dtl_rings);
+
+static atomic_t dtl_count;
+
+/*
+ * The cpu accounting code controls the DTL ring buffer, and we get
+ * given entries as they are processed.
+ */
+static void consume_dtle(struct dtl_entry *dtle, u64 index)
 {
-	unsigned long addr;
-	int ret, hwcpu;
+	struct dtl_ring *dtlr = &__get_cpu_var(dtl_rings);
+	struct dtl_entry *wp = dtlr->write_ptr;
+	struct lppaca *vpa = local_paca->lppaca_ptr;
 
-	/* only allow one reader */
-	if (dtl->buf)
-		return -EBUSY;
+	if (!wp)
+		return;
 
-	/* we need to store the original allocation size for use during read */
-	dtl->buf_entries = dtl_buf_entries;
+	*wp = *dtle;
+	barrier();
 
-	dtl->buf = kmalloc_node(dtl->buf_entries * sizeof(struct dtl_entry),
-			GFP_KERNEL, cpu_to_node(dtl->cpu));
-	if (!dtl->buf) {
-		printk(KERN_WARNING "%s: buffer alloc failed for cpu %d\n",
-				__func__, dtl->cpu);
-		return -ENOMEM;
-	}
+	/* check for hypervisor ring buffer overflow, ignore this entry if so */
+	if (index + N_DISPATCH_LOG < vpa->dtl_idx)
+		return;
+
+	++wp;
+	if (wp == dtlr->buf_end)
+		wp = dtlr->buf;
+	dtlr->write_ptr = wp;
+
+	/* incrementing write_index makes the new entry visible */
+	smp_wmb();
+	++dtlr->write_index;
+}
+
+static int dtl_start(struct dtl *dtl)
+{
+	struct dtl_ring *dtlr = &per_cpu(dtl_rings, dtl->cpu);
+
+	dtlr->buf = dtl->buf;
+	dtlr->buf_end = dtl->buf + dtl->buf_entries;
+	dtlr->write_index = 0;
+
+	/* setting write_ptr enables logging into our buffer */
+	smp_wmb();
+	dtlr->write_ptr = dtl->buf;
+
+	/* enable event logging */
+	dtlr->saved_dtl_mask = lppaca_of(dtl->cpu).dtl_enable_mask;
+	lppaca_of(dtl->cpu).dtl_enable_mask |= dtl_event_mask;
+
+	dtl_consumer = consume_dtle;
+	atomic_inc(&dtl_count);
+	return 0;
+}
+
+static void dtl_stop(struct dtl *dtl)
+{
+	struct dtl_ring *dtlr = &per_cpu(dtl_rings, dtl->cpu);
+
+	dtlr->write_ptr = NULL;
+	smp_wmb();
+
+	dtlr->buf = NULL;
+
+	/* restore dtl_enable_mask */
+	lppaca_of(dtl->cpu).dtl_enable_mask = dtlr->saved_dtl_mask;
+
+	if (atomic_dec_and_test(&dtl_count))
+		dtl_consumer = NULL;
+}
+
+static u64 dtl_current_index(struct dtl *dtl)
+{
+	return per_cpu(dtl_rings, dtl->cpu).write_index;
+}
+
+#else /* CONFIG_VIRT_CPU_ACCOUNTING */
+
+static int dtl_start(struct dtl *dtl)
+{
+	unsigned long addr;
+	int ret, hwcpu;
 
 	/* Register our dtl buffer with the hypervisor. The HV expects the
 	 * buffer size to be passed in the second word of the buffer */
@@ -85,12 +159,11 @@ static int dtl_enable(struct dtl *dtl)
 	if (ret) {
 		printk(KERN_WARNING "%s: DTL registration for cpu %d (hw %d) "
 		       "failed with %d\n", __func__, dtl->cpu, hwcpu, ret);
-		kfree(dtl->buf);
 		return -EIO;
 	}
 
 	/* set our initial buffer indices */
-	dtl->last_idx = lppaca_of(dtl->cpu).dtl_idx = 0;
+	lppaca_of(dtl->cpu).dtl_idx = 0;
 
 	/* ensure that our updates to the lppaca fields have occurred before
 	 * we actually enable the logging */
@@ -102,17 +175,66 @@ static int dtl_enable(struct dtl *dtl)
 	return 0;
 }
 
-static void dtl_disable(struct dtl *dtl)
+static void dtl_stop(struct dtl *dtl)
 {
 	int hwcpu = get_hard_smp_processor_id(dtl->cpu);
 
 	lppaca_of(dtl->cpu).dtl_enable_mask = 0x0;
 
 	unregister_dtl(hwcpu, __pa(dtl->buf));
+}
+
+static u64 dtl_current_index(struct dtl *dtl)
+{
+	return lppaca_of(dtl->cpu).dtl_idx;
+}
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 
+static int dtl_enable(struct dtl *dtl)
+{
+	long int n_entries;
+	long int rc;
+	struct dtl_entry *buf = NULL;
+
+	/* only allow one reader */
+	if (dtl->buf)
+		return -EBUSY;
+
+	n_entries = dtl_buf_entries;
+	buf = kmalloc_node(n_entries * sizeof(struct dtl_entry),
+			GFP_KERNEL, cpu_to_node(dtl->cpu));
+	if (!buf) {
+		printk(KERN_WARNING "%s: buffer alloc failed for cpu %d\n",
+				__func__, dtl->cpu);
+		return -ENOMEM;
+	}
+
+	spin_lock(&dtl->lock);
+	rc = -EBUSY;
+	if (!dtl->buf) {
+		/* store the original allocation size for use during read */
+		dtl->buf_entries = n_entries;
+		dtl->buf = buf;
+		dtl->last_idx = 0;
+		rc = dtl_start(dtl);
+		if (rc)
+			dtl->buf = NULL;
+	}
+	spin_unlock(&dtl->lock);
+
+	if (rc)
+		kfree(buf);
+	return rc;
+}
+
+static void dtl_disable(struct dtl *dtl)
+{
+	spin_lock(&dtl->lock);
+	dtl_stop(dtl);
 	kfree(dtl->buf);
 	dtl->buf = NULL;
 	dtl->buf_entries = 0;
+	spin_unlock(&dtl->lock);
 }
 
 /* file interface */
@@ -140,8 +262,9 @@ static int dtl_file_release(struct inode *inode, struct file *filp)
 static ssize_t dtl_file_read(struct file *filp, char __user *buf, size_t len,
 		loff_t *pos)
 {
-	int rc, cur_idx, last_idx, n_read, n_req, read_size;
+	long int rc, n_read, n_req, read_size;
 	struct dtl *dtl;
+	u64 cur_idx, last_idx, i;
 
 	if ((len % sizeof(struct dtl_entry)) != 0)
 		return -EINVAL;
@@ -154,41 +277,48 @@ static ssize_t dtl_file_read(struct file *filp, char __user *buf, size_t len,
 	/* actual number of entries read */
 	n_read = 0;
 
-	cur_idx = lppaca_of(dtl->cpu).dtl_idx;
+	spin_lock(&dtl->lock);
+
+	cur_idx = dtl_current_index(dtl);
 	last_idx = dtl->last_idx;
 
-	if (cur_idx - last_idx > dtl->buf_entries) {
-		pr_debug("%s: hv buffer overflow for cpu %d, samples lost\n",
-				__func__, dtl->cpu);
-	}
+	if (last_idx + dtl->buf_entries <= cur_idx)
+		last_idx = cur_idx - dtl->buf_entries + 1;
+
+	if (last_idx + n_req > cur_idx)
+		n_req = cur_idx - last_idx;
 
-	cur_idx  %= dtl->buf_entries;
-	last_idx %= dtl->buf_entries;
+	if (n_req > 0)
+		dtl->last_idx = last_idx + n_req;
+
+	spin_unlock(&dtl->lock);
+
+	if (n_req <= 0)
+		return 0;
+
+	i = last_idx % dtl->buf_entries;
 
 	/* read the tail of the buffer if we've wrapped */
-	if (last_idx > cur_idx) {
-		read_size = min(n_req, dtl->buf_entries - last_idx);
+	if (i + n_req > dtl->buf_entries) {
+		read_size = dtl->buf_entries - i;
 
-		rc = copy_to_user(buf, &dtl->buf[last_idx],
+		rc = copy_to_user(buf, &dtl->buf[i],
 				read_size * sizeof(struct dtl_entry));
 		if (rc)
 			return -EFAULT;
 
-		last_idx = 0;
+		i = 0;
 		n_req -= read_size;
 		n_read += read_size;
 		buf += read_size * sizeof(struct dtl_entry);
 	}
 
 	/* .. and now the head */
-	read_size = min(n_req, cur_idx - last_idx);
-	rc = copy_to_user(buf, &dtl->buf[last_idx],
-			read_size * sizeof(struct dtl_entry));
+	rc = copy_to_user(buf, &dtl->buf[i], n_req * sizeof(struct dtl_entry));
 	if (rc)
 		return -EFAULT;
 
-	n_read += read_size;
-	dtl->last_idx += n_read;
+	n_read += n_req;
 
 	return n_read * sizeof(struct dtl_entry);
 }
@@ -220,11 +351,6 @@ static int dtl_init(void)
 	struct dentry *event_mask_file, *buf_entries_file;
 	int rc, i;
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-	/* disable this for now */
-	return -ENODEV;
-#endif
-
 	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
 		return -ENODEV;
 
@@ -251,6 +377,7 @@ static int dtl_init(void)
 	/* set up the per-cpu log structures */
 	for_each_possible_cpu(i) {
 		struct dtl *dtl = &per_cpu(cpu_dtl, i);
+		spin_lock_init(&dtl->lock);
 		dtl->cpu = i;
 
 		rc = dtl_setup_file(dtl);
-- 
1.7.1

^ permalink raw reply related

* SPI driver for MPC837xERDB
From: Ravi Gupta @ 2010-08-31 12:30 UTC (permalink / raw)
  To: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1480 bytes --]

Hi all,



I am new to linux device driver development. I have to develop a SPI driver
for MPC8377 processor based board. Right now I don't have a that board
ready(but I have the MPC837xERDB with me), so I was thinking of developing
and testing it on the RDB. But in MPC837xERDB, I don't have an SPI slave
device, the SPI pins on the board is connected to the SD card slot. Is it
possible to use SD card slot as SPI device? Also I have written a sample SPI
driver, but its probe function is not getting called?



#include <linux/module.h>
#include <linux/spi/spi.h>

static int __devinit spi_probe(struct spi_device *spi)
{
        printk(KERN_DEBUG "spi_probe...");
        return 0;
}

static int __devexit spi_remove(struct spi_device *spi)
{
        printk(KERN_DEBUG "spi_remove\n");
        return 0;
}

static struct spi_driver sample_spi_driver = {
        .driver = {
                .name = "sample-spi",
                .bus = &spi_bus_type,
                .owner = THIS_MODULE,
        },
        .probe = spi_probe,
        .remove = spi_remove,
};


static __init int spi_module_init(void)
{
  printk(KERN_INFO "SPI Driver Version 0.1\n");
  return spi_register_driver(&sample_spi_driver);
}

static __exit void spi_module_exit(void)
{
  /* unregister SPI device */
  spi_unregister_driver(&sample_spi_driver);
}

MODULE_LICENSE("GPL");

module_init(spi_module_init);
module_exit(spi_module_exit);



Any help would be greatly appreciated.
Thanks in advance,
Ravi

[-- Attachment #2: Type: text/html, Size: 1944 bytes --]

^ permalink raw reply

* Re: [PATCH] ucc_geth: fix ethtool set ring param bug
From: Ben Hutchings @ 2010-08-31 14:41 UTC (permalink / raw)
  To: Liang Li; +Cc: netdev, avorontsov, linuxppc-dev, davem
In-Reply-To: <1283179677-21262-1-git-send-email-liang.li@windriver.com>

On Mon, 2010-08-30 at 22:47 +0800, Liang Li wrote:
> It's common sense that when we should do change to driver ring
> desc/buffer etc only after 'stop/shutdown' the device. When we
> do change while devices/driver is running, kernel oops occur:
[...]
> diff --git a/drivers/net/ucc_geth_ethtool.c b/drivers/net/ucc_geth_ethtool.c
> index 6f92e48..1b37aaa 100644
> --- a/drivers/net/ucc_geth_ethtool.c
> +++ b/drivers/net/ucc_geth_ethtool.c
> @@ -255,13 +255,18 @@ uec_set_ringparam(struct net_device *netdev,
>  		return -EINVAL;
>  	}
>  
> -	ug_info->bdRingLenRx[queue] = ring->rx_pending;
> -	ug_info->bdRingLenTx[queue] = ring->tx_pending;
> -
>  	if (netif_running(netdev)) {
> -		/* FIXME: restart automatically */
> -		printk(KERN_INFO
> -			"Please re-open the interface.\n");
> +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
> +		ucc_geth_close(netdev);
> +
> +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> +
> +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> +		ucc_geth_open(netdev);

What if ucc_geth_open() fails?

Ben.

> +	} else {
> +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
>  	}
>  
>  	return ret;

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] ucc_geth: fix ethtool set ring param bug
From: Liang Li @ 2010-08-31 15:16 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, avorontsov, linuxppc-dev, davem
In-Reply-To: <1283265682.2244.0.camel@achroite.uk.solarflarecom.com>

On Tue, Aug 31, 2010 at 03:41:22PM +0100, Ben Hutchings wrote:
> On Mon, 2010-08-30 at 22:47 +0800, Liang Li wrote:
> > It's common sense that when we should do change to driver ring
> > desc/buffer etc only after 'stop/shutdown' the device. When we
> > do change while devices/driver is running, kernel oops occur:
> [...]
> > diff --git a/drivers/net/ucc_geth_ethtool.c b/drivers/net/ucc_geth_ethtool.c
> > index 6f92e48..1b37aaa 100644
> > --- a/drivers/net/ucc_geth_ethtool.c
> > +++ b/drivers/net/ucc_geth_ethtool.c
> > @@ -255,13 +255,18 @@ uec_set_ringparam(struct net_device *netdev,
> >  		return -EINVAL;
> >  	}
> >  
> > -	ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > -	ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > -
> >  	if (netif_running(netdev)) {
> > -		/* FIXME: restart automatically */
> > -		printk(KERN_INFO
> > -			"Please re-open the interface.\n");
> > +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
> > +		ucc_geth_close(netdev);
> > +
> > +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > +
> > +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> > +		ucc_geth_open(netdev);
> 
> What if ucc_geth_open() fails?

I did some runtime tests but did not witness the ucc_geth_open fail.
Assume it may fail for some reason, then I tend to think give out
warnings for request user to open/enable it mannually?  Or we may need
to keep the 'FIXME' line?

Thanks,
				-Liang Li

> 
> Ben.
> 
> > +	} else {
> > +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> >  	}
> >  
> >  	return ret;
> 
> -- 
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox