LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Chen, Tiejun @ 2013-05-09 14:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Caraman Mihai Claudiu-B02008, kvm@vger.kernel.org,
	Wood Scott-B07421, agraf@suse.de, kvm-ppc@vger.kernel.org,
	Bhushan Bharat-R65777, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1368103062.25488.193.camel@pasglop>

> -----Original Message-----
> From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]=20
> Sent: Thursday, May 09, 2013 8:38 PM
> To: Chen, Tiejun
> Cc: Bhushan Bharat-R65777; Caraman Mihai Claudiu-B02008; Wood=20
> Scott-B07421; linuxppc-dev@lists.ozlabs.org; agraf@suse.de;=20
> kvm-ppc@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64:=20
> soft-disable interrupts
>=20
> On Thu, 2013-05-09 at 17:44 +0800, tiejun.chen wrote:
> >=20
> > Actually in the case GS=3D1 even if EE=3D0, EXT/DEC/DBELL still=20
> occur as I=20
> > recall.
>=20
> Only if directed to the hypervisor.

Yes, this is our current booehv design with EPCR[EXTGS] =3D 0.

>=20
> > > Case 1)
> > >   -> Local_irq_disable()  will set soft_enabled =3D 0
> > >   -> Now Externel interrupt happens, there we set PACA_IRQ_EE in
> > irq_happened, Also clears EE in SRR1 and rfi. So interrupts=20
> are hard=20
> > disabled. No more other interrupt gated by MSR.EE can happen. Looks=20
> > like the idea here is to not let a device keep on inserting=20
> interrupt=20
> > till the interrupt condition on device is cleared, right?
>=20
> The external interrupt line is level sensitive normally, so=20
> we have to mask MSR:EE in that case or the interrupt would=20
> keep re-occurring (note that FSL has this concept of=20
> auto-acked interrupts via the on die MPIC for which you can=20
> potentially use PACA_IRQ_EE_EDGE instead and avoid having to=20
> mask MSR:EE).
>=20
> > I don't understand "the interrupt condition on device is cleared"
> > here.
>=20
> Well, the external interrupt line will remain asserted until=20
> the device (via the PIC) stops interrupting the processor :-)=20

Yes, the device keeps on interrupting the interrupt until the device clear =
its interrupted condition.

> Either because we mask at the PIC level (or raise the=20
> priority) or because the condition goes away.
> =20
> > I think regardless if you clear the device interrupt status, the=20
> > system still  receive a pending interrupt once EE or GS =3D 1.
>=20
> Why ? Depends on your PIC actually but if it's a sane one, it=20
> won't "remember" a level interrupt that is gone. An edge=20
> interrupt is a different matter and is latched.

But the interrupt controller like MPIC should leave this irq as pending if =
we don't ACK/EOI this irq at all if we just clear the device, right?

>=20
> Some MPIC implementations tend to generate a spurrious IRQ in=20
> the case of level IRQs going away. IE. they still remember an=20
> event occurred and interrupt the processor, but on IACK=20
> return the spurious vector. However that isn't guaranteed to=20

Yes, this is just what I mean. No matter if this vector is still valid, the=
 external interrupt exception always occur once EE =3D 1 again.

> be the case and it is perfectly ok (and a good
> idea) for the PIC to just stop asserting the external=20
> interrupt line if the original device interrupt goes away=20
> (and it's configured as level sensitive).

I don't remember MPIC can work with this way. So I'd like to look the manua=
l again :)

Tiejun=

^ permalink raw reply

* [PATCH] rapidio/switches: remove tsi500 driver
From: Alexandre Bounine @ 2013-05-09 14:20 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linuxppc-dev; +Cc: Alexandre Bounine

Remove the driver for Tsi500 Parallel RapidIO switch because this device is
not available for several years. Since the first introduction of Tsi500,
the parallel RapidIO interface was replaced by the serial RapidIO (sRIO)
and therefore there is no value in keeping this driver.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
---
 drivers/rapidio/switches/Kconfig  |    7 ---
 drivers/rapidio/switches/Makefile |    1 -
 drivers/rapidio/switches/tsi500.c |   78 -------------------------------------
 3 files changed, 0 insertions(+), 86 deletions(-)
 delete mode 100644 drivers/rapidio/switches/tsi500.c

diff --git a/drivers/rapidio/switches/Kconfig b/drivers/rapidio/switches/Kconfig
index f47fee5..62d4a06 100644
--- a/drivers/rapidio/switches/Kconfig
+++ b/drivers/rapidio/switches/Kconfig
@@ -26,10 +26,3 @@ config RAPIDIO_CPS_GEN2
 	default n
 	---help---
 	  Includes support for ITD CPS Gen.2 serial RapidIO switches.
-
-config RAPIDIO_TSI500
-	bool "Tsi500 Parallel RapidIO switch support"
-	depends on RAPIDIO
-	default n
-	---help---
-	  Includes support for IDT Tsi500 parallel RapidIO switch.
diff --git a/drivers/rapidio/switches/Makefile b/drivers/rapidio/switches/Makefile
index c4d3acc..051cc6b 100644
--- a/drivers/rapidio/switches/Makefile
+++ b/drivers/rapidio/switches/Makefile
@@ -5,5 +5,4 @@
 obj-$(CONFIG_RAPIDIO_TSI57X)	+= tsi57x.o
 obj-$(CONFIG_RAPIDIO_CPS_XX)	+= idtcps.o
 obj-$(CONFIG_RAPIDIO_TSI568)	+= tsi568.o
-obj-$(CONFIG_RAPIDIO_TSI500)	+= tsi500.o
 obj-$(CONFIG_RAPIDIO_CPS_GEN2)	+= idt_gen2.o
diff --git a/drivers/rapidio/switches/tsi500.c b/drivers/rapidio/switches/tsi500.c
deleted file mode 100644
index 914eddd..0000000
--- a/drivers/rapidio/switches/tsi500.c
+++ /dev/null
@@ -1,78 +0,0 @@
-/*
- * RapidIO Tsi500 switch support
- *
- * Copyright 2009-2010 Integrated Device Technology, Inc.
- * Alexandre Bounine <alexandre.bounine@idt.com>
- *  - Modified switch operations initialization.
- *
- * Copyright 2005 MontaVista Software, Inc.
- * Matt Porter <mporter@kernel.crashing.org>
- *
- * This program is free software; you can redistribute  it and/or modify it
- * under  the terms of  the GNU General  Public License as published by the
- * Free Software Foundation;  either version 2 of the  License, or (at your
- * option) any later version.
- */
-
-#include <linux/rio.h>
-#include <linux/rio_drv.h>
-#include <linux/rio_ids.h>
-#include "../rio.h"
-
-static int
-tsi500_route_add_entry(struct rio_mport *mport, u16 destid, u8 hopcount, u16 table, u16 route_destid, u8 route_port)
-{
-	int i;
-	u32 offset = 0x10000 + 0xa00 + ((route_destid / 2)&~0x3);
-	u32 result;
-
-	if (table == 0xff) {
-		rio_mport_read_config_32(mport, destid, hopcount, offset, &result);
-		result &= ~(0xf << (4*(route_destid & 0x7)));
-		for (i=0;i<4;i++)
-			rio_mport_write_config_32(mport, destid, hopcount, offset + (0x20000*i), result | (route_port << (4*(route_destid & 0x7))));
-	}
-	else {
-		rio_mport_read_config_32(mport, destid, hopcount, offset + (0x20000*table), &result);
-		result &= ~(0xf << (4*(route_destid & 0x7)));
-		rio_mport_write_config_32(mport, destid, hopcount, offset + (0x20000*table), result | (route_port << (4*(route_destid & 0x7))));
-	}
-
-	return 0;
-}
-
-static int
-tsi500_route_get_entry(struct rio_mport *mport, u16 destid, u8 hopcount, u16 table, u16 route_destid, u8 *route_port)
-{
-	int ret = 0;
-	u32 offset = 0x10000 + 0xa00 + ((route_destid / 2)&~0x3);
-	u32 result;
-
-	if (table == 0xff)
-		rio_mport_read_config_32(mport, destid, hopcount, offset, &result);
-	else
-		rio_mport_read_config_32(mport, destid, hopcount, offset + (0x20000*table), &result);
-
-	result &= 0xf << (4*(route_destid & 0x7));
-	*route_port = result >> (4*(route_destid & 0x7));
-	if (*route_port > 3)
-		ret = -1;
-
-	return ret;
-}
-
-static int tsi500_switch_init(struct rio_dev *rdev, int do_enum)
-{
-	pr_debug("RIO: %s for %s\n", __func__, rio_name(rdev));
-	rdev->rswitch->add_entry = tsi500_route_add_entry;
-	rdev->rswitch->get_entry = tsi500_route_get_entry;
-	rdev->rswitch->clr_table = NULL;
-	rdev->rswitch->set_domain = NULL;
-	rdev->rswitch->get_domain = NULL;
-	rdev->rswitch->em_init = NULL;
-	rdev->rswitch->em_handle = NULL;
-
-	return 0;
-}
-
-DECLARE_RIO_SWITCH_INIT(RIO_VID_TUNDRA, RIO_DID_TSI500, tsi500_switch_init);
-- 
1.7.8.4

^ permalink raw reply related

* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Scott Wood @ 2013-05-09 21:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Wood Scott-B07421, kvm@vger.kernel.org,
	Caraman Mihai Claudiu-B02008, agraf@suse.de,
	kvm-ppc@vger.kernel.org, tiejun.chen, Bhushan Bharat-R65777,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1368103062.25488.193.camel@pasglop>

On 05/09/2013 07:37:42 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-05-09 at 17:44 +0800, tiejun.chen wrote:
> >
> > Actually in the case GS=3D1 even if EE=3D0, EXT/DEC/DBELL still occur =20
> as I
> > recall.
>=20
> Only if directed to the hypervisor.

This is always the case with KVM, right?  At least on booke...

> > > Case 1)
> > >   -> Local_irq_disable()  will set soft_enabled =3D 0
> > >   -> Now Externel interrupt happens, there we set PACA_IRQ_EE in
> > irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard
> > disabled. No more other interrupt gated by MSR.EE can happen. Looks
> > like the idea here is to not let a device keep on inserting =20
> interrupt
> > till the interrupt condition on device is cleared, right?
>=20
> The external interrupt line is level sensitive normally, so we have to
> mask MSR:EE in that case or the interrupt would keep re-occurring =20
> (note
> that FSL has this concept of auto-acked interrupts via the on die MPIC
> for which you can potentially use PACA_IRQ_EE_EDGE instead and avoid
> having to mask MSR:EE).

Note that if we do this, we can no longer leave the interrupt vector in =20
EPR, and would have to track (potentially multiple different) pending =20
external interrupts in software.

-Scott=

^ permalink raw reply

* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Benjamin Herrenschmidt @ 2013-05-09 22:01 UTC (permalink / raw)
  To: David Laight
  Cc: Wood Scott-B07421, kvm, Caraman Mihai Claudiu-B02008, agraf,
	kvm-ppc, tiejun.chen, Bhushan Bharat-R65777, linuxppc-dev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B722F@saturn3.aculab.com>

On Thu, 2013-05-09 at 14:28 +0100, David Laight wrote:
> That will happen if the IRQ goes away while the cpu is performing
> the IACK sequence.
> If the IRQ goes away while the cpu has interrupts masked then
> the cpu won't start the interrupt sequence and then try to
> read a vector when no interrupt is pending.

Right, and get a spurrious vector which shouldn't be a big deal.

We tend to call that a "short interrupt". There have been many other
cases of short interrupts in the past, for example on some MPICs, when
distribution is enabled, they occasionally shoot an interrupt to more
than one CPU at once :-)

Cheers,
Ben.

^ permalink raw reply

* AUTO: Michael Barry is out of the office (returning 13/05/2013)
From: Michael Barry @ 2013-05-09 22:02 UTC (permalink / raw)
  To: linuxppc-dev


I am out of the office until 13/05/2013.




Note: This is an automated response to your message  "Linuxppc-dev Digest,
Vol 105, Issue 55" sent on 09/05/2013 23:01:55.

This is the only notification you will receive while this person is away.

^ permalink raw reply

* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Benjamin Herrenschmidt @ 2013-05-09 22:07 UTC (permalink / raw)
  To: Scott Wood
  Cc: Wood Scott-B07421, kvm@vger.kernel.org,
	Caraman Mihai Claudiu-B02008, agraf@suse.de,
	kvm-ppc@vger.kernel.org, tiejun.chen, Bhushan Bharat-R65777,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1368134856.654.11@snotra>

On Thu, 2013-05-09 at 16:27 -0500, Scott Wood wrote:
> On 05/09/2013 07:37:42 AM, Benjamin Herrenschmidt wrote:
> > On Thu, 2013-05-09 at 17:44 +0800, tiejun.chen wrote:
> > >
> > > Actually in the case GS=1 even if EE=0, EXT/DEC/DBELL still occur  
> > as I
> > > recall.
> > 
> > Only if directed to the hypervisor.
> 
> This is always the case with KVM, right?  At least on booke...

Hrm, on A2 we could choose iirc. Well not DEC but EXT at least, I don't
remember about DBELL.

> > > > Case 1)
> > > >   -> Local_irq_disable()  will set soft_enabled = 0
> > > >   -> Now Externel interrupt happens, there we set PACA_IRQ_EE in
> > > irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard
> > > disabled. No more other interrupt gated by MSR.EE can happen. Looks
> > > like the idea here is to not let a device keep on inserting  
> > interrupt
> > > till the interrupt condition on device is cleared, right?
> > 
> > The external interrupt line is level sensitive normally, so we have to
> > mask MSR:EE in that case or the interrupt would keep re-occurring  
> > (note
> > that FSL has this concept of auto-acked interrupts via the on die MPIC
> > for which you can potentially use PACA_IRQ_EE_EDGE instead and avoid
> > having to mask MSR:EE).
> 
> Note that if we do this, we can no longer leave the interrupt vector in  
> EPR, and would have to track (potentially multiple different) pending  
> external interrupts in software.

Right, you can have a little queue in the PACA and leave EE disabled if
it's full. You can probably get away with a queue of 1 though :-)

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc/pci: Support per-aperture memory offset
From: Benjamin Herrenschmidt @ 2013-05-09 22:07 UTC (permalink / raw)
  To: Sethi Varun-B16395
  Cc: Thomas Petazzoni, Wood Scott-B07421, linux-pci@vger.kernel.org,
	Bjorn Helgaas, Andrew Murray, linuxppc-dev
In-Reply-To: <C5ECD7A89D1DC44195F34B25E172658D4F01D0@039-SN2MPN1-013.039d.mgd.msft.net>

On Thu, 2013-05-09 at 13:47 +0000, Sethi Varun-B16395 wrote:
> [Sethi Varun-B16395] Tested patch on FSL T4240, P4080, P5040 and P1020
> boards.

Thanks.

Cheers,
Ben.

^ permalink raw reply

* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Scott Wood @ 2013-05-09 22:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Wood Scott-B07421, kvm@vger.kernel.org,
	Caraman Mihai Claudiu-B02008, agraf@suse.de,
	kvm-ppc@vger.kernel.org, tiejun.chen, Bhushan Bharat-R65777,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1368137220.3715.14.camel@pasglop>

On 05/09/2013 05:07:00 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-05-09 at 16:27 -0500, Scott Wood wrote:
> > On 05/09/2013 07:37:42 AM, Benjamin Herrenschmidt wrote:
> > > On Thu, 2013-05-09 at 17:44 +0800, tiejun.chen wrote:
> > > >
> > > > Actually in the case GS=3D1 even if EE=3D0, EXT/DEC/DBELL still =20
> occur
> > > as I
> > > > recall.
> > >
> > > Only if directed to the hypervisor.
> >
> > This is always the case with KVM, right?  At least on booke...
>=20
> Hrm, on A2 we could choose iirc. Well not DEC but EXT at least, I =20
> don't
> remember about DBELL.

As for as the hardware goes we can choose on e500mc as well -- we just =20
don't support it in KVM, because we can't distinguish between guest and =20
host interrupts without routing the latter to critical, which is way =20
too much of a pain for Linux (we did it in our standalone hypervisor, =20
though).

-Scott=

^ permalink raw reply

* [PATCH v2 2/4] kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
From: Scott Wood @ 2013-05-10  3:09 UTC (permalink / raw)
  To: Alexander Graf, Benjamin Herrenschmidt
  Cc: Scott Wood, linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1368155384-11035-1-git-send-email-scottwood@freescale.com>

EE is hard-disabled on entry to kvmppc_handle_exit(), so call
hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled
is unset.

Without this, we get warnings such as arch/powerpc/kernel/time.c:300,
and sometimes host kernel hangs.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/kvm/booke.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..705fc5c 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -833,6 +833,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	int r = RESUME_HOST;
 	int s;
 
+#ifdef CONFIG_PPC64
+	WARN_ON(local_paca->irq_happened != 0);
+#endif
+	hard_irq_disable();
+
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup
From: Scott Wood @ 2013-05-10  3:09 UTC (permalink / raw)
  To: Alexander Graf, Benjamin Herrenschmidt
  Cc: Scott Wood, linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1368155384-11035-1-git-send-email-scottwood@freescale.com>

Simplify the handling of lazy EE by going directly from fully-enabled
to hard-disabled.  This replaces the lazy_irq_pending() check
(including its misplaced kvm_guest_exit() call).

As suggested by Tiejun Chen, move the interrupt disabling into
kvmppc_prepare_to_enter() rather than have each caller do it.  Also
move the IRQ enabling on heavyweight exit into
kvmppc_prepare_to_enter().

Don't move kvmppc_fix_ee_before_entry() into kvmppc_prepare_to_enter(),
so that the caller can avoid marking interrupts enabled earlier than
necessary (e.g. book3s_pr waits until after FP save/restore is done).

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/include/asm/kvm_ppc.h |    6 ++++++
 arch/powerpc/kvm/book3s_pr.c       |   12 +++---------
 arch/powerpc/kvm/booke.c           |    9 ++-------
 arch/powerpc/kvm/powerpc.c         |   21 ++++++++-------------
 4 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 6885846..e4474f8 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -404,6 +404,12 @@ static inline void kvmppc_fix_ee_before_entry(void)
 	trace_hardirqs_on();
 
 #ifdef CONFIG_PPC64
+	/*
+	 * To avoid races, the caller must have gone directly from having
+	 * interrupts fully-enabled to hard-disabled.
+	 */
+	WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);
+
 	/* Only need to enable IRQs by hard enabling them after this */
 	local_paca->irq_happened = 0;
 	local_paca->soft_enabled = 1;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 0b97ce4..e61e39e 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -884,14 +884,11 @@ program_interrupt:
 		 * and if we really did time things so badly, then we just exit
 		 * again due to a host external interrupt.
 		 */
-		local_irq_disable();
 		s = kvmppc_prepare_to_enter(vcpu);
-		if (s <= 0) {
-			local_irq_enable();
+		if (s <= 0)
 			r = s;
-		} else {
+		else
 			kvmppc_fix_ee_before_entry();
-		}
 	}
 
 	trace_kvm_book3s_reenter(r, vcpu);
@@ -1121,12 +1118,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	 * really did time things so badly, then we just exit again due to
 	 * a host external interrupt.
 	 */
-	local_irq_disable();
 	ret = kvmppc_prepare_to_enter(vcpu);
-	if (ret <= 0) {
-		local_irq_enable();
+	if (ret <= 0)
 		goto out;
-	}
 
 	/* Save FPU state in stack */
 	if (current->thread.regs->msr & MSR_FP)
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index eb89b83..f7c0111 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -666,10 +666,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		return -EINVAL;
 	}
 
-	local_irq_disable();
 	s = kvmppc_prepare_to_enter(vcpu);
 	if (s <= 0) {
-		local_irq_enable();
 		ret = s;
 		goto out;
 	}
@@ -1148,14 +1146,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 * aren't already exiting to userspace for some other reason.
 	 */
 	if (!(r & RESUME_HOST)) {
-		local_irq_disable();
 		s = kvmppc_prepare_to_enter(vcpu);
-		if (s <= 0) {
-			local_irq_enable();
+		if (s <= 0)
 			r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
-		} else {
+		else
 			kvmppc_fix_ee_before_entry();
-		}
 	}
 
 	return r;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4e05f8c..f8659aa 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 {
 	int r = 1;
 
-	WARN_ON_ONCE(!irqs_disabled());
+	WARN_ON(irqs_disabled());
+	hard_irq_disable();
+
 	while (true) {
 		if (need_resched()) {
 			local_irq_enable();
 			cond_resched();
-			local_irq_disable();
+			hard_irq_disable();
 			continue;
 		}
 
@@ -95,7 +97,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 			local_irq_enable();
 			trace_kvm_check_requests(vcpu);
 			r = kvmppc_core_check_requests(vcpu);
-			local_irq_disable();
+			hard_irq_disable();
 			if (r > 0)
 				continue;
 			break;
@@ -108,21 +110,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 		}
 
 #ifdef CONFIG_PPC64
-		/* lazy EE magic */
-		hard_irq_disable();
-		if (lazy_irq_pending()) {
-			/* Got an interrupt in between, try again */
-			local_irq_enable();
-			local_irq_disable();
-			kvm_guest_exit();
-			continue;
-		}
+		WARN_ON(lazy_irq_pending());
 #endif
 
 		kvm_guest_enter();
-		break;
+		return r;
 	}
 
+	local_irq_enable();
 	return r;
 }
 #endif /* CONFIG_KVM_BOOK3S_64_HV */
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH v2 0/4] kvm/ppc: interrupt disabling fixes
From: Scott Wood @ 2013-05-10  3:09 UTC (permalink / raw)
  To: Alexander Graf, Benjamin Herrenschmidt; +Cc: linuxppc-dev, kvm, kvm-ppc

v2:
 - Split into separate changes
 - Rebase on top of (and fix a bug in) powerpc: Make hard_irq_disable()
   do the right thing vs. irq tracing
 - Move interrupt diasbling/enabling into kvmppc_handle_exit

Based on Ben's next branch.

Scott Wood (4):
  powerpc: hard_irq_disable(): Call trace_hardirqs_off after disabling
  kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
  kvm/ppc: Call trace_hardirqs_on before entry
  kvm/ppc: IRQ disabling cleanup

 arch/powerpc/include/asm/hw_irq.h  |    5 +++--
 arch/powerpc/include/asm/kvm_ppc.h |   17 ++++++++++++++---
 arch/powerpc/kvm/book3s_pr.c       |   16 +++++-----------
 arch/powerpc/kvm/booke.c           |   18 +++++++++---------
 arch/powerpc/kvm/powerpc.c         |   23 ++++++++---------------
 5 files changed, 39 insertions(+), 40 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [PATCH v2 1/4] powerpc: hard_irq_disable(): Call trace_hardirqs_off after disabling
From: Scott Wood @ 2013-05-10  3:09 UTC (permalink / raw)
  To: Alexander Graf, Benjamin Herrenschmidt
  Cc: Scott Wood, linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1368155384-11035-1-git-send-email-scottwood@freescale.com>

lockdep.c has this:
        /*
         * So we're supposed to get called after you mask local IRQs,
         * but for some reason the hardware doesn't quite think you did
         * a proper job.
         */
	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
		return;

Since irqs_disabled() is based on soft_enabled(), that (not just the
hard EE bit) needs to be 0 before we call trace_hardirqs_off.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/include/asm/hw_irq.h |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index d615b28..ba713f1 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -96,11 +96,12 @@ static inline bool arch_irqs_disabled(void)
 #endif
 
 #define hard_irq_disable()	do {			\
+	u8 _was_enabled = get_paca()->soft_enabled;	\
 	__hard_irq_disable();				\
-	if (local_paca->soft_enabled)			\
-		trace_hardirqs_off();			\
 	get_paca()->soft_enabled = 0;			\
 	get_paca()->irq_happened |= PACA_IRQ_HARD_DIS;	\
+	if (_was_enabled)				\
+		trace_hardirqs_off();			\
 } while(0)
 
 static inline bool lazy_irq_pending(void)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry
From: Scott Wood @ 2013-05-10  3:09 UTC (permalink / raw)
  To: Alexander Graf, Benjamin Herrenschmidt
  Cc: Scott Wood, linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1368155384-11035-1-git-send-email-scottwood@freescale.com>

Currently this is only being done on 64-bit.  Rather than just move it
out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is
consistent with lazy ee state, and so that we don't track more host
code as interrupts-enabled than necessary.

Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect
that this function now has a role on 32-bit as well.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/include/asm/kvm_ppc.h |   11 ++++++++---
 arch/powerpc/kvm/book3s_pr.c       |    4 ++--
 arch/powerpc/kvm/booke.c           |    4 ++--
 arch/powerpc/kvm/powerpc.c         |    2 --
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index a5287fe..6885846 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
 	}
 }
 
-/* Please call after prepare_to_enter. This function puts the lazy ee state
-   back to normal mode, without actually enabling interrupts. */
-static inline void kvmppc_lazy_ee_enable(void)
+/*
+ * Please call after prepare_to_enter. This function puts the lazy ee and irq
+ * disabled tracking state back to normal mode, without actually enabling
+ * interrupts.
+ */
+static inline void kvmppc_fix_ee_before_entry(void)
 {
+	trace_hardirqs_on();
+
 #ifdef CONFIG_PPC64
 	/* Only need to enable IRQs by hard enabling them after this */
 	local_paca->irq_happened = 0;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index bdc40b8..0b97ce4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -890,7 +890,7 @@ program_interrupt:
 			local_irq_enable();
 			r = s;
 		} else {
-			kvmppc_lazy_ee_enable();
+			kvmppc_fix_ee_before_entry();
 		}
 	}
 
@@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	if (vcpu->arch.shared->msr & MSR_FP)
 		kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
 
-	kvmppc_lazy_ee_enable();
+	kvmppc_fix_ee_before_entry();
 
 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 705fc5c..eb89b83 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		ret = s;
 		goto out;
 	}
-	kvmppc_lazy_ee_enable();
+	kvmppc_fix_ee_before_entry();
 
 	kvm_guest_enter();
 
@@ -1154,7 +1154,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			local_irq_enable();
 			r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
 		} else {
-			kvmppc_lazy_ee_enable();
+			kvmppc_fix_ee_before_entry();
 		}
 	}
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6316ee3..4e05f8c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 			kvm_guest_exit();
 			continue;
 		}
-
-		trace_hardirqs_on();
 #endif
 
 		kvm_guest_enter();
-- 
1.7.10.4

^ permalink raw reply related

* RE: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry
From: Bhushan Bharat-R65777 @ 2013-05-10  3:34 UTC (permalink / raw)
  To: Wood Scott-B07421, Alexander Graf, Benjamin Herrenschmidt
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org,
	kvm@vger.kernel.org, kvm-ppc@vger.kernel.org
In-Reply-To: <1368155384-11035-4-git-send-email-scottwood@freescale.com>



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Beh=
alf Of
> Scott Wood
> Sent: Friday, May 10, 2013 8:40 AM
> To: Alexander Graf; Benjamin Herrenschmidt
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-dev@lists.ozla=
bs.org;
> Wood Scott-B07421
> Subject: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry
>=20
> Currently this is only being done on 64-bit.  Rather than just move it
> out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is
> consistent with lazy ee state, and so that we don't track more host
> code as interrupts-enabled than necessary.
>=20
> Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect
> that this function now has a role on 32-bit as well.
>=20
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
>  arch/powerpc/include/asm/kvm_ppc.h |   11 ++++++++---
>  arch/powerpc/kvm/book3s_pr.c       |    4 ++--
>  arch/powerpc/kvm/booke.c           |    4 ++--
>  arch/powerpc/kvm/powerpc.c         |    2 --
>  4 files changed, 12 insertions(+), 9 deletions(-)
>=20
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h
> b/arch/powerpc/include/asm/kvm_ppc.h
> index a5287fe..6885846 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pf=
n)
>  	}
>  }
>=20
> -/* Please call after prepare_to_enter. This function puts the lazy ee st=
ate
> -   back to normal mode, without actually enabling interrupts. */
> -static inline void kvmppc_lazy_ee_enable(void)
> +/*
> + * Please call after prepare_to_enter. This function puts the lazy ee an=
d irq
> + * disabled tracking state back to normal mode, without actually enablin=
g
> + * interrupts.
> + */
> +static inline void kvmppc_fix_ee_before_entry(void)
>  {
> +	trace_hardirqs_on();
> +
>  #ifdef CONFIG_PPC64
>  	/* Only need to enable IRQs by hard enabling them after this */
>  	local_paca->irq_happened =3D 0;
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index bdc40b8..0b97ce4 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -890,7 +890,7 @@ program_interrupt:
>  			local_irq_enable();
>  			r =3D s;
>  		} else {
> -			kvmppc_lazy_ee_enable();
> +			kvmppc_fix_ee_before_entry();
>  		}
>  	}
>=20
> @@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct
> kvm_vcpu *vcpu)
>  	if (vcpu->arch.shared->msr & MSR_FP)
>  		kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
>=20
> -	kvmppc_lazy_ee_enable();
> +	kvmppc_fix_ee_before_entry();
>=20
>  	ret =3D __kvmppc_vcpu_run(kvm_run, vcpu);
>=20
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 705fc5c..eb89b83 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct k=
vm_vcpu
> *vcpu)
>  		ret =3D s;
>  		goto out;
>  	}
> -	kvmppc_lazy_ee_enable();
> +	kvmppc_fix_ee_before_entry();

local_irq_disable() is called before kvmppc_prepare_to_enter().
Now we put the irq_happend and soft_enabled back to previous state without =
checking for any interrupt happened in between. If any interrupt happens in=
 between, will not that be lost?

-Bharat

>=20
>  	kvm_guest_enter();
>=20
> @@ -1154,7 +1154,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct
> kvm_vcpu *vcpu,
>  			local_irq_enable();
>  			r =3D (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
>  		} else {
> -			kvmppc_lazy_ee_enable();
> +			kvmppc_fix_ee_before_entry();
>  		}
>  	}
>=20
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 6316ee3..4e05f8c 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>  			kvm_guest_exit();
>  			continue;
>  		}
> -
> -		trace_hardirqs_on();
>  #endif
>=20
>  		kvm_guest_enter();
> --
> 1.7.10.4
>=20
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH 2/2 V8] powerpc/85xx: Add machine check handler to fix PCIe erratum on mpc85xx
From: Jia Hongtao-B38951 @ 2013-05-10  4:00 UTC (permalink / raw)
  To: galak@kernel.crashing.org
  Cc: linuxppc-dev@lists.ozlabs.org, Li Yang-R58472, Jia Hongtao-B38951
In-Reply-To: <1367514224.24411.10@snotra>

> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Friday, May 03, 2013 1:04 AM
> > To: Jia Hongtao-B38951
> > Cc: linuxppc-dev@lists.ozlabs.org; galak@kernel.crashing.org; Wood
> > Scott- B07421; segher@kernel.crashing.org; Li Yang-R58472; Jia
> > Hongtao-B38951
> > Subject: Re: [PATCH 2/2 V8] powerpc/85xx: Add machine check handler to
> > fix PCIe erratum on mpc85xx
> >
> > On 04/28/2013 12:20:08 AM, Jia Hongtao wrote:
> > > A PCIe erratum of mpc85xx may causes a core hang when a link of PCIe
> > > goes down. when the link goes down, Non-posted transactions issued
> > > via the ATMU requiring completion result in an instruction stall.
> > > At the same time a machine-check exception is generated to the core
> > > to allow further processing by the handler. We implements the
> > > handler which skips the instruction caused the stall.
> > >
> > > This patch depends on patch:
> > > powerpc/85xx: Add platform_device declaration to fsl_pci.h
> > >
> > > Signed-off-by: Zhao Chenhui <b35336@freescale.com>
> > > Signed-off-by: Li Yang <leoli@freescale.com>
> > > Signed-off-by: Liu Shuo <soniccat.liu@gmail.com>
> > > Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
> > > ---
> > > V8:
> > > * Add A variant load instruction emulation.
> >
> > ACK
> >
> > -Scott
>=20
> Thanks for the review.
>=20
> Hi Kumar,
>=20
> Could you please review these MSI and PCI hang errata patches?
> http://patchwork.ozlabs.org/patch/233211/
> http://patchwork.ozlabs.org/patch/235276/
> http://patchwork.ozlabs.org/patch/240238/
> http://patchwork.ozlabs.org/patch/240239/ (This patch)
>=20
> Thanks.
> -Hongtao

Hi Kumar,

I'm really appreciated if you have time to review these patches?

Thanks.
-Hongtao

^ permalink raw reply

* Re: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry
From: tiejun.chen @ 2013-05-10  4:40 UTC (permalink / raw)
  To: Bhushan Bharat-R65777
  Cc: Wood Scott-B07421, kvm@vger.kernel.org, Alexander Graf,
	kvm-ppc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D0700F4FF@039-SN2MPN1-011.039d.mgd.msft.net>

On 05/10/2013 11:34 AM, Bhushan Bharat-R65777 wrote:
>
>
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of
>> Scott Wood
>> Sent: Friday, May 10, 2013 8:40 AM
>> To: Alexander Graf; Benjamin Herrenschmidt
>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
>> Wood Scott-B07421
>> Subject: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry
>>
>> Currently this is only being done on 64-bit.  Rather than just move it
>> out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is
>> consistent with lazy ee state, and so that we don't track more host
>> code as interrupts-enabled than necessary.
>>
>> Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect
>> that this function now has a role on 32-bit as well.
>>
>> Signed-off-by: Scott Wood <scottwood@freescale.com>
>> ---
>>   arch/powerpc/include/asm/kvm_ppc.h |   11 ++++++++---
>>   arch/powerpc/kvm/book3s_pr.c       |    4 ++--
>>   arch/powerpc/kvm/booke.c           |    4 ++--
>>   arch/powerpc/kvm/powerpc.c         |    2 --
>>   4 files changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index a5287fe..6885846 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
>>   	}
>>   }
>>
>> -/* Please call after prepare_to_enter. This function puts the lazy ee state
>> -   back to normal mode, without actually enabling interrupts. */
>> -static inline void kvmppc_lazy_ee_enable(void)
>> +/*
>> + * Please call after prepare_to_enter. This function puts the lazy ee and irq
>> + * disabled tracking state back to normal mode, without actually enabling
>> + * interrupts.
>> + */
>> +static inline void kvmppc_fix_ee_before_entry(void)
>>   {
>> +	trace_hardirqs_on();
>> +
>>   #ifdef CONFIG_PPC64
>>   	/* Only need to enable IRQs by hard enabling them after this */
>>   	local_paca->irq_happened = 0;
>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>> index bdc40b8..0b97ce4 100644
>> --- a/arch/powerpc/kvm/book3s_pr.c
>> +++ b/arch/powerpc/kvm/book3s_pr.c
>> @@ -890,7 +890,7 @@ program_interrupt:
>>   			local_irq_enable();
>>   			r = s;
>>   		} else {
>> -			kvmppc_lazy_ee_enable();
>> +			kvmppc_fix_ee_before_entry();
>>   		}
>>   	}
>>
>> @@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct
>> kvm_vcpu *vcpu)
>>   	if (vcpu->arch.shared->msr & MSR_FP)
>>   		kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
>>
>> -	kvmppc_lazy_ee_enable();
>> +	kvmppc_fix_ee_before_entry();
>>
>>   	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
>>
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index 705fc5c..eb89b83 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu
>> *vcpu)
>>   		ret = s;
>>   		goto out;
>>   	}
>> -	kvmppc_lazy_ee_enable();
>> +	kvmppc_fix_ee_before_entry();
>
> local_irq_disable() is called before kvmppc_prepare_to_enter().

In patch 4, we call hard_irq_disable() once enter kvmppc_prepare_to_enter().

Tiejun

> Now we put the irq_happend and soft_enabled back to previous state without checking for any interrupt happened in between. If any interrupt happens in between, will not that be lost?
>
> -Bharat
>
>>
>>   	kvm_guest_enter();
>>
>> @@ -1154,7 +1154,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct
>> kvm_vcpu *vcpu,
>>   			local_irq_enable();
>>   			r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
>>   		} else {
>> -			kvmppc_lazy_ee_enable();
>> +			kvmppc_fix_ee_before_entry();
>>   		}
>>   	}
>>
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 6316ee3..4e05f8c 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>>   			kvm_guest_exit();
>>   			continue;
>>   		}
>> -
>> -		trace_hardirqs_on();
>>   #endif
>>
>>   		kvm_guest_enter();
>> --
>> 1.7.10.4
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>

^ permalink raw reply

* RE: [PATCH v2 2/4] kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
From: Bhushan Bharat-R65777 @ 2013-05-10  5:01 UTC (permalink / raw)
  To: Wood Scott-B07421, Alexander Graf, Benjamin Herrenschmidt
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org,
	kvm@vger.kernel.org, kvm-ppc@vger.kernel.org
In-Reply-To: <1368155384-11035-3-git-send-email-scottwood@freescale.com>



> -----Original Message-----
> From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-owner@vger.kernel.org=
] On
> Behalf Of Scott Wood
> Sent: Friday, May 10, 2013 8:40 AM
> To: Alexander Graf; Benjamin Herrenschmidt
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-dev@lists.ozla=
bs.org;
> Wood Scott-B07421
> Subject: [PATCH v2 2/4] kvm/ppc/booke64: Fix lazy ee handling in
> kvmppc_handle_exit()
>=20
> EE is hard-disabled on entry to kvmppc_handle_exit(), so call
> hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled
> is unset.
>=20
> Without this, we get warnings such as arch/powerpc/kernel/time.c:300,
> and sometimes host kernel hangs.
>=20
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
>  arch/powerpc/kvm/booke.c |    5 +++++
>  1 file changed, 5 insertions(+)
>=20
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 1020119..705fc5c 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -833,6 +833,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct k=
vm_vcpu
> *vcpu,
>  	int r =3D RESUME_HOST;
>  	int s;
>=20
> +#ifdef CONFIG_PPC64
> +	WARN_ON(local_paca->irq_happened !=3D 0);
> +#endif
> +	hard_irq_disable();

It is not actually to hard disable as EE is already clear but to make it lo=
oks like hard_disable to host. Right?
If so, should we write a comment here on why we are doing this?=20

-Bharat

> +
>  	/* update before a new last_exit_type is rewritten */
>  	kvmppc_update_timing_stats(vcpu);
>=20
> --
> 1.7.10.4
>=20
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup
From: Bhushan Bharat-R65777 @ 2013-05-10  5:01 UTC (permalink / raw)
  To: Wood Scott-B07421, Alexander Graf, Benjamin Herrenschmidt
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org,
	kvm@vger.kernel.org, kvm-ppc@vger.kernel.org
In-Reply-To: <1368155384-11035-5-git-send-email-scottwood@freescale.com>



> -----Original Message-----
> From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-owner@vger.kernel.org=
] On
> Behalf Of Scott Wood
> Sent: Friday, May 10, 2013 8:40 AM
> To: Alexander Graf; Benjamin Herrenschmidt
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-dev@lists.ozla=
bs.org;
> Wood Scott-B07421
> Subject: [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup
>=20
> Simplify the handling of lazy EE by going directly from fully-enabled
> to hard-disabled.  This replaces the lazy_irq_pending() check
> (including its misplaced kvm_guest_exit() call).
>=20
> As suggested by Tiejun Chen, move the interrupt disabling into
> kvmppc_prepare_to_enter() rather than have each caller do it.  Also
> move the IRQ enabling on heavyweight exit into
> kvmppc_prepare_to_enter().
>=20
> Don't move kvmppc_fix_ee_before_entry() into kvmppc_prepare_to_enter(),
> so that the caller can avoid marking interrupts enabled earlier than
> necessary (e.g. book3s_pr waits until after FP save/restore is done).
>=20
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
>  arch/powerpc/include/asm/kvm_ppc.h |    6 ++++++
>  arch/powerpc/kvm/book3s_pr.c       |   12 +++---------
>  arch/powerpc/kvm/booke.c           |    9 ++-------
>  arch/powerpc/kvm/powerpc.c         |   21 ++++++++-------------
>  4 files changed, 19 insertions(+), 29 deletions(-)
>=20
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 6885846..e4474f8 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -404,6 +404,12 @@ static inline void kvmppc_fix_ee_before_entry(void)
>  	trace_hardirqs_on();
>=20
>  #ifdef CONFIG_PPC64
> +	/*
> +	 * To avoid races, the caller must have gone directly from having
> +	 * interrupts fully-enabled to hard-disabled.
> +	 */
> +	WARN_ON(local_paca->irq_happened !=3D PACA_IRQ_HARD_DIS);
> +
>  	/* Only need to enable IRQs by hard enabling them after this */
>  	local_paca->irq_happened =3D 0;
>  	local_paca->soft_enabled =3D 1;
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 0b97ce4..e61e39e 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -884,14 +884,11 @@ program_interrupt:
>  		 * and if we really did time things so badly, then we just exit
>  		 * again due to a host external interrupt.
>  		 */
> -		local_irq_disable();
>  		s =3D kvmppc_prepare_to_enter(vcpu);
> -		if (s <=3D 0) {
> -			local_irq_enable();
> +		if (s <=3D 0)
>  			r =3D s;
> -		} else {
> +		else
>  			kvmppc_fix_ee_before_entry();
> -		}
>  	}
>=20
>  	trace_kvm_book3s_reenter(r, vcpu);
> @@ -1121,12 +1118,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struc=
t
> kvm_vcpu *vcpu)
>  	 * really did time things so badly, then we just exit again due to
>  	 * a host external interrupt.
>  	 */
> -	local_irq_disable();
>  	ret =3D kvmppc_prepare_to_enter(vcpu);
> -	if (ret <=3D 0) {
> -		local_irq_enable();
> +	if (ret <=3D 0)
>  		goto out;
> -	}
>=20
>  	/* Save FPU state in stack */
>  	if (current->thread.regs->msr & MSR_FP)
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index eb89b83..f7c0111 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -666,10 +666,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct
> kvm_vcpu *vcpu)
>  		return -EINVAL;
>  	}
>=20
> -	local_irq_disable();
>  	s =3D kvmppc_prepare_to_enter(vcpu);
>  	if (s <=3D 0) {
> -		local_irq_enable();
>  		ret =3D s;
>  		goto out;
>  	}
> @@ -1148,14 +1146,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struc=
t
> kvm_vcpu *vcpu,
>  	 * aren't already exiting to userspace for some other reason.
>  	 */
>  	if (!(r & RESUME_HOST)) {
> -		local_irq_disable();

Ok, Now we do not soft disable before kvmppc_prapare_to_enter().

>  		s =3D kvmppc_prepare_to_enter(vcpu);
> -		if (s <=3D 0) {
> -			local_irq_enable();
> +		if (s <=3D 0)
>  			r =3D (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
> -		} else {
> +		else
>  			kvmppc_fix_ee_before_entry();
> -		}
>  	}
>=20
>  	return r;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 4e05f8c..f8659aa 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>  {
>  	int r =3D 1;
>=20
> -	WARN_ON_ONCE(!irqs_disabled());
> +	WARN_ON(irqs_disabled());
> +	hard_irq_disable();

Here we hard disable in kvmppc_prepare_to_enter(), so my comment in other p=
atch about interrupt loss is no more valid.

So here
  MSR.EE =3D 0
  local_paca->soft_enabled =3D 0
  local_paca->irq_happened |=3D PACA_IRQ_HARD_DIS;

> +
>  	while (true) {
>  		if (need_resched()) {
>  			local_irq_enable();

This will make the state:
  MSR.EE =3D 1
  local_paca->soft_enabled =3D 1
  local_paca->irq_happened =3D PACA_IRQ_HARD_DIS;  //same as before

Is that a valid state where interrupts are fully enabled and irq_happend in=
 not 0?

>  			cond_resched();
> -			local_irq_disable();
> +			hard_irq_disable();
>  			continue;
>  		}
>=20
> @@ -95,7 +97,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>  			local_irq_enable();
>  			trace_kvm_check_requests(vcpu);
>  			r =3D kvmppc_core_check_requests(vcpu);
> -			local_irq_disable();
> +			hard_irq_disable();
>  			if (r > 0)
>  				continue;
>  			break;
> @@ -108,21 +110,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>  		}
>=20
>  #ifdef CONFIG_PPC64
> -		/* lazy EE magic */
> -		hard_irq_disable();
> -		if (lazy_irq_pending()) {
> -			/* Got an interrupt in between, try again */
> -			local_irq_enable();
> -			local_irq_disable();
> -			kvm_guest_exit();
> -			continue;
> -		}
> +		WARN_ON(lazy_irq_pending());
>  #endif
>=20
>  		kvm_guest_enter();
> -		break;
> +		return r;
>  	}
>=20
> +	local_irq_enable();
>  	return r;
>  }


int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
{
        int r =3D 0;
        WARN_ON_ONCE(!irqs_disabled());

        kvmppc_core_check_exceptions(vcpu);

        if (vcpu->requests) {
                /* Exception delivery raised request; start over */
                return 1;
        }

        if (vcpu->arch.shared->msr & MSR_WE) {
                local_irq_enable();
                kvm_vcpu_block(vcpu);
                clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
                local_irq_disable();
^^^
We do not require hard_irq_disable() here?

-Bharat

>  #endif /* CONFIG_KVM_BOOK3S_64_HV */
> --
> 1.7.10.4
>=20
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] powerpc: fix numa distance for form0 device tree
From: Ben Hutchings @ 2013-05-10  4:45 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Anton Blanchard, stable
In-Reply-To: <1367898574-20594-1-git-send-email-michael@ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 325 bytes --]

On Tue, 2013-05-07 at 13:49 +1000, Michael Ellerman wrote:
> From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> 
> Commit 7122beeee7bc1757682049780179d7c216dd1c83 upstream.
[...]

Queued up for 3.2, thanks.

Ben.

-- 
Ben Hutchings
For every action, there is an equal and opposite criticism. - Harrison

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup
From: tiejun.chen @ 2013-05-10  5:31 UTC (permalink / raw)
  To: Bhushan Bharat-R65777
  Cc: Wood Scott-B07421, kvm@vger.kernel.org, Alexander Graf,
	kvm-ppc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D0700F859@039-SN2MPN1-011.039d.mgd.msft.net>

>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 4e05f8c..f8659aa 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>>   {
>>   	int r = 1;
>>
>> -	WARN_ON_ONCE(!irqs_disabled());
>> +	WARN_ON(irqs_disabled());
>> +	hard_irq_disable();
>
> Here we hard disable in kvmppc_prepare_to_enter(), so my comment in other patch about interrupt loss is no more valid.
>
> So here
>    MSR.EE = 0
>    local_paca->soft_enabled = 0
>    local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
>
>> +
>>   	while (true) {
>>   		if (need_resched()) {
>>   			local_irq_enable();
>
> This will make the state:
>    MSR.EE = 1
>    local_paca->soft_enabled = 1
>    local_paca->irq_happened = PACA_IRQ_HARD_DIS;  //same as before

Why is this same the above state? local_irq_enable() can call 
__check_irq_replay() to clear PACA_IRQ_HARD_DIS.

>
> Is that a valid state where interrupts are fully enabled and irq_happend in not 0?
>
>>   			cond_resched();
>> -			local_irq_disable();
>> +			hard_irq_disable();
>>   			continue;
>>   		}
>>
>> @@ -95,7 +97,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>>   			local_irq_enable();
>>   			trace_kvm_check_requests(vcpu);
>>   			r = kvmppc_core_check_requests(vcpu);
>> -			local_irq_disable();
>> +			hard_irq_disable();
>>   			if (r > 0)
>>   				continue;
>>   			break;
>> @@ -108,21 +110,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>>   		}
>>
>>   #ifdef CONFIG_PPC64
>> -		/* lazy EE magic */
>> -		hard_irq_disable();
>> -		if (lazy_irq_pending()) {
>> -			/* Got an interrupt in between, try again */
>> -			local_irq_enable();
>> -			local_irq_disable();
>> -			kvm_guest_exit();
>> -			continue;
>> -		}
>> +		WARN_ON(lazy_irq_pending());
>>   #endif
>>
>>   		kvm_guest_enter();
>> -		break;
>> +		return r;
>>   	}
>>
>> +	local_irq_enable();
>>   	return r;
>>   }
>
>
> int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
> {
>          int r = 0;
>          WARN_ON_ONCE(!irqs_disabled());
>
>          kvmppc_core_check_exceptions(vcpu);
>
>          if (vcpu->requests) {
>                  /* Exception delivery raised request; start over */
>                  return 1;
>          }
>
>          if (vcpu->arch.shared->msr & MSR_WE) {
>                  local_irq_enable();
>                  kvm_vcpu_block(vcpu);
>                  clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
>                  local_irq_disable();
> ^^^
> We do not require hard_irq_disable() here?

Between kvmppc_core_prepare_to_enter() and kvmppc_prepare_to_enter(), as I 
recall Scott had some discussions with Ben earlier.

Tiejun

>

^ permalink raw reply

* Re: [PATCH 2/6] KVM: PPC: Add support for multiple-TCE hcalls
From: David Gibson @ 2013-05-10  6:51 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Joerg Roedel, Alexander Graf, kvm-ppc, Alex Williamson,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <1367825157-27231-3-git-send-email-aik@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 19022 bytes --]

On Mon, May 06, 2013 at 05:25:53PM +1000, Alexey Kardashevskiy wrote:
> This adds real mode handlers for the H_PUT_TCE_INDIRECT and
> H_STUFF_TCE hypercalls for QEMU emulated devices such as virtio
> devices or emulated PCI.  These calls allow adding multiple entries
> (up to 512) into the TCE table in one call which saves time on
> transition to/from real mode.
> 
> This adds a guest physical to host real address converter
> and calls the existing H_PUT_TCE handler. The converting function
> is going to be fully utilized by upcoming VFIO supporting patches.
> 
> This also implements the KVM_CAP_PPC_MULTITCE capability,
> so in order to support the functionality of this patch, QEMU
> needs to query for this capability and set the "hcall-multi-tce"
> hypertas property only if the capability is present, otherwise
> there will be serious performance degradation.


Hrm.  Clearly I didn't read this carefully enough before.  There are
some problems here.

[snip]
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 72ffc89..643ac1e 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -14,6 +14,7 @@
>   *
>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>   */
>  
>  #include <linux/types.h>
> @@ -36,9 +37,14 @@
>  #include <asm/ppc-opcode.h>
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
> +#include <asm/iommu.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> +#define ERROR_ADDR      (~(unsigned long)0x0)
>  
> +/*
> + * TCE tables handlers.
> + */
>  static long kvmppc_stt_npages(unsigned long window_size)
>  {
>  	return ALIGN((window_size >> SPAPR_TCE_SHIFT)
> @@ -148,3 +154,111 @@ fail:
>  	}
>  	return ret;
>  }
> +
> +/*
> + * Virtual mode handling of IOMMU map/unmap.
> + */
> +/* Converts guest physical address into host virtual */
> +static unsigned long get_virt_address(struct kvm_vcpu *vcpu,
> +		unsigned long gpa)

This should probably return a void * rather than an unsigned long.
Well, actually a void __user *.

> +{
> +	unsigned long hva, gfn = gpa >> PAGE_SHIFT;
> +	struct kvm_memory_slot *memslot;
> +
> +	memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
> +	if (!memslot)
> +		return ERROR_ADDR;
> +
> +	/*
> +	 * Convert gfn to hva preserving flags and an offset
> +	 * within a system page
> +	 */
> +	hva = __gfn_to_hva_memslot(memslot, gfn) + (gpa & ~PAGE_MASK);
> +	return hva;
> +}
> +
> +long kvmppc_virtmode_h_put_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to userspace */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	return kvmppc_emulated_h_put_tce(tt, ioba, tce);
> +}
> +
> +long kvmppc_virtmode_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +	long i;
> +	unsigned long tces;
> +
> +	/* The whole table addressed by tce_list resides in 4K page */
> +	if (npages > 512)
> +		return H_PARAMETER;

So, that doesn't actually verify what the comment says it does - only
that the list is < 4K in total.  You need to check the alignment of
tce_list as well.

> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to userspace */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	tces = get_virt_address(vcpu, tce_list);
> +	if (tces == ERROR_ADDR)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */

This comment doesn't seem to have any bearing on the test which
follows it.

> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i) {
> +		unsigned long tce;
> +		unsigned long ptce = tces + i * sizeof(unsigned long);
> +
> +		if (get_user(tce, (unsigned long __user *)ptce))
> +			break;
> +
> +		if (kvmppc_emulated_h_put_tce(tt,
> +				ioba + (i << IOMMU_PAGE_SHIFT),	tce))
> +			break;
> +	}
> +	if (i == npages)
> +		return H_SUCCESS;
> +
> +	/* Failed, do cleanup */
> +	do {
> +		--i;
> +		kvmppc_emulated_h_put_tce(tt, ioba + (i << IOMMU_PAGE_SHIFT),
> +				0);
> +	} while (i);

Hrm, so, actually PAPR specifies that this hcall is supposed to first
copy the given tces to hypervisor memory, then translate (and
validate) them all, and only then touch the actual TCE table.  Rather
more complicated to do, but I guess we should - that would get rid of
the need for this partial cleanup in the failure case.

> +
> +	return H_PARAMETER;
> +}
> +
> +long kvmppc_virtmode_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +	long i;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to userspace */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE)
> +		kvmppc_emulated_h_put_tce(tt, ioba, tce_value);
> +
> +	return H_SUCCESS;
> +}
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 30c2f3b..55fdf7a 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -14,6 +14,7 @@
>   *
>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>   */
>  
>  #include <linux/types.h>
> @@ -35,42 +36,214 @@
>  #include <asm/ppc-opcode.h>
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
> +#include <asm/iommu.h>
> +#include <asm/tce.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> +#define ERROR_ADDR      (~(unsigned long)0x0)
>  
> -/* WARNING: This will be called in real-mode on HV KVM and virtual
> - *          mode on PR KVM
> +/*
> + * Finds a TCE table descriptor by LIOBN.
>   */
> +struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(struct kvm_vcpu *vcpu,
> +		unsigned long liobn)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +
> +	list_for_each_entry(tt, &vcpu->kvm->arch.spapr_tce_tables, list) {
> +		if (tt->liobn == liobn)
> +			return tt;
> +	}
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_find_tce_table);
> +
> +/*
> + * kvmppc_emulated_h_put_tce() handles TCE requests for devices emulated
> + * by QEMU. It puts guest TCE values into the table and expects
> + * the QEMU to convert them later in the QEMU device implementation.
> + * Works in both real and virtual modes.
> + */
> +long kvmppc_emulated_h_put_tce(struct kvmppc_spapr_tce_table *tt,
> +		unsigned long ioba, unsigned long tce)
> +{
> +	unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
> +	struct page *page;
> +	u64 *tbl;
> +
> +	/* udbg_printf("H_PUT_TCE: liobn 0x%lx => tt=%p  window_size=0x%x\n", */
> +	/*	    liobn, tt, tt->window_size); */
> +	if (ioba >= tt->window_size) {
> +		/* pr_err("%s failed on ioba=%lx\n", __func__, ioba); */
> +		return H_PARAMETER;
> +	}
> +	/*
> +	 * Note on the use of page_address() in real mode,
> +	 *
> +	 * It is safe to use page_address() in real mode on ppc64 because
> +	 * page_address() is always defined as lowmem_page_address()
> +	 * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial
> +	 * operation and does not access page struct.
> +	 *
> +	 * Theoretically page_address() could be defined different
> +	 * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
> +	 * should be enabled.
> +	 * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
> +	 * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
> +	 * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
> +	 * is not expected to be enabled on ppc32, page_address()
> +	 * is safe for ppc32 as well.
> +	 */
> +#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
> +#error TODO: fix to avoid page_address() here
> +#endif
> +	page = tt->pages[idx / TCES_PER_PAGE];
> +	tbl = (u64 *)page_address(page);
> +
> +	/*
> +	 * Validate TCE address.
> +	 * At the moment only flags are validated
> +	 * as other check will significantly slow down
> +	 * or can make it even impossible to handle TCE requests
> +	 * in real mode.
> +	 */
> +	if (tce & ~(IOMMU_PAGE_MASK | TCE_PCI_WRITE | TCE_PCI_READ))
> +		return H_PARAMETER;
> +
> +	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
> +	tbl[idx % TCES_PER_PAGE] = tce;
> +
> +	return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_emulated_h_put_tce);
> +
> +#ifdef CONFIG_KVM_BOOK3S_64_HV
> +/*
> + * Converts guest physical address into host real address.
> + * Also returns pte and page size if the page is present in page table.
> + */
> +static unsigned long get_real_address(struct kvm_vcpu *vcpu,
> +		unsigned long gpa, bool writing,
> +		pte_t *ptep, unsigned long *pg_sizep)

The only caller doesn't use the ptep and pg_sizep pointers, so there's
no point implementing them.

> +{
> +	struct kvm_memory_slot *memslot;
> +	pte_t pte;
> +	unsigned long hva, pg_size = 0, hwaddr, offset;
> +	unsigned long gfn = gpa >> PAGE_SHIFT;
> +
> +	/* Find a KVM memslot */
> +	memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
> +	if (!memslot)
> +		return ERROR_ADDR;
> +
> +	/* Convert guest physical address to host virtual */
> +	hva = __gfn_to_hva_memslot(memslot, gfn);
> +
> +	/* Find a PTE and determine the size */
> +	pte = lookup_linux_pte(vcpu->arch.pgdir, hva,
> +			writing, &pg_size);
> +	if (!pte_present(pte))
> +		return ERROR_ADDR;
> +
> +	/* Calculate host phys address keeping flags and offset in the page */
> +	offset = gpa & (pg_size - 1);
> +
> +	/* pte_pfn(pte) should return an address aligned to pg_size */
> +	hwaddr = (pte_pfn(pte) << PAGE_SHIFT) + offset;
> +
> +	/* Copy outer values if required */
> +	if (pg_sizep)
> +		*pg_sizep = pg_size;
> +	if (ptep)
> +		*ptep = pte;
> +
> +	return hwaddr;
> +}
> +
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		      unsigned long ioba, unsigned long tce)
>  {
> -	struct kvm *kvm = vcpu->kvm;
> -	struct kvmppc_spapr_tce_table *stt;
> -
> -	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
> -	/* 	    liobn, ioba, tce); */
> -
> -	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
> -		if (stt->liobn == liobn) {
> -			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
> -			struct page *page;
> -			u64 *tbl;
> -
> -			/* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p  window_size=0x%x\n", */
> -			/* 	    liobn, stt, stt->window_size); */
> -			if (ioba >= stt->window_size)
> -				return H_PARAMETER;
> -
> -			page = stt->pages[idx / TCES_PER_PAGE];
> -			tbl = (u64 *)page_address(page);
> -
> -			/* FIXME: Need to validate the TCE itself */
> -			/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
> -			tbl[idx % TCES_PER_PAGE] = tce;
> -			return H_SUCCESS;
> -		}
> +	struct kvmppc_spapr_tce_table *tt;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to virtual space */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	return kvmppc_emulated_h_put_tce(tt, ioba, tce);
> +}
> +
> +long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list,	unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +	long i;
> +	unsigned long tces;
> +
> +	/* The whole table addressed by tce_list resides in 4K page */
> +	if (npages > 512)
> +		return H_PARAMETER;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to virtual space */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	tces = get_real_address(vcpu, tce_list, false, NULL, NULL);
> +	if (tces == ERROR_ADDR)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i) {
> +		unsigned long tce;
> +		unsigned long ptce = tces + i * sizeof(unsigned long);
> +
> +		if (get_user(tce, (unsigned long __user *)ptce))
> +			break;

Uh.  get_user() should not be used from real mode.  You'll need to
dereference the pointer directly here.

> +
> +		if (kvmppc_emulated_h_put_tce(tt,
> +				ioba + (i << IOMMU_PAGE_SHIFT), tce))
> +			break;
>  	}
> +	if (i == npages)
> +		return H_SUCCESS;
> +
> +	/* Failed, do cleanup */
> +	do {
> +		--i;
> +		kvmppc_emulated_h_put_tce(tt, ioba + (i << IOMMU_PAGE_SHIFT),
> +				0);
> +	} while (i);
> +
> +	return H_PARAMETER;
> +}
>  
> -	/* Didn't find the liobn, punt it to userspace */
> -	return H_TOO_HARD;
> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *tt;
> +	long i;
> +
> +	tt = kvmppc_find_tce_table(vcpu, liobn);
> +	/* Didn't find the liobn, put it to virtual space */
> +	if (!tt)
> +		return H_TOO_HARD;
> +
> +	/* Emulated IO */
> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE)
> +		kvmppc_emulated_h_put_tce(tt, ioba, tce_value);
> +
> +	return H_SUCCESS;
>  }
> +
> +#endif /* CONFIG_KVM_BOOK3S_64_HV */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index e5afdcb..6eb6f44 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -565,6 +565,29 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>  			ret = kvmppc_xics_hcall(vcpu, req);
>  			break;
>  		} /* fallthrough */


As paulus already pointed out, inserting this code here breaks the no
kernel xics case badly because of the fallthrough above.

> +	case H_PUT_TCE:
> +		ret = kvmppc_virtmode_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
> +	case H_PUT_TCE_INDIRECT:
> +		ret = kvmppc_virtmode_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6),
> +						kvmppc_get_gpr(vcpu, 7));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
> +	case H_STUFF_TCE:
> +		ret = kvmppc_virtmode_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6),
> +						kvmppc_get_gpr(vcpu, 7));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
>  	default:
>  		return RESUME_HOST;
>  	}
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index d3e26be..4d73406 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1472,6 +1472,12 @@ hcall_real_table:
>  	.long	0		/* 0x11c */
>  	.long	0		/* 0x120 */
>  	.long	.kvmppc_h_bulk_remove - hcall_real_table
> +	.long	0		/* 0x128 */
> +	.long	0		/* 0x12c */
> +	.long	0		/* 0x130 */
> +	.long	0		/* 0x134 */
> +	.long	.kvmppc_h_stuff_tce - hcall_real_table
> +	.long	.kvmppc_h_put_tce_indirect - hcall_real_table
>  hcall_real_table_end:
>  
>  ignore_hdec:
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> index 8352cac..603e4f2 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -220,7 +220,38 @@ static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu)
>  	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
>  	long rc;
>  
> -	rc = kvmppc_h_put_tce(vcpu, liobn, ioba, tce);
> +	rc = kvmppc_virtmode_h_put_tce(vcpu, liobn, ioba, tce);
> +	if (rc == H_TOO_HARD)
> +		return EMULATE_FAIL;
> +	kvmppc_set_gpr(vcpu, 3, rc);
> +	return EMULATE_DONE;
> +}
> +
> +static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> +	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> +	long rc;
> +
> +	rc = kvmppc_virtmode_h_put_tce_indirect(vcpu, liobn, ioba,
> +			tce, npages);
> +	if (rc == H_TOO_HARD)
> +		return EMULATE_FAIL;
> +	kvmppc_set_gpr(vcpu, 3, rc);
> +	return EMULATE_DONE;
> +}
> +
> +static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> +	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> +	long rc;
> +
> +	rc = kvmppc_virtmode_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
>  	if (rc == H_TOO_HARD)
>  		return EMULATE_FAIL;
>  	kvmppc_set_gpr(vcpu, 3, rc);
> @@ -249,6 +280,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>  		return kvmppc_h_pr_bulk_remove(vcpu);
>  	case H_PUT_TCE:
>  		return kvmppc_h_pr_put_tce(vcpu);
> +	case H_PUT_TCE_INDIRECT:
> +		return kvmppc_h_pr_put_tce_indirect(vcpu);
> +	case H_STUFF_TCE:
> +		return kvmppc_h_pr_stuff_tce(vcpu);
>  	case H_CEDE:
>  		vcpu->arch.shared->msr |= MSR_EE;
>  		kvm_vcpu_block(vcpu);
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index f9c159e..b7ad589 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -384,6 +384,9 @@ int kvm_dev_ioctl_check_extension(long ext)
>  		r = 1;
>  		break;
>  #endif
> +	case KVM_CAP_SPAPR_MULTITCE:
> +		r = 1;
> +		break;
>  	default:
>  		r = 0;
>  		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6f49c87..6c04da1 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -640,6 +640,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_PPC_HTAB_FD 84
>  #define KVM_CAP_PPC_RTAS (0x100000 + 87)
>  #define KVM_CAP_SPAPR_XICS (0x100000 + 88)
> +#define KVM_CAP_SPAPR_MULTITCE (0x110000 + 89)
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH v2 1/4] powerpc: hard_irq_disable(): Call trace_hardirqs_off after disabling
From: Benjamin Herrenschmidt @ 2013-05-10  7:00 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Alexander Graf, kvm-ppc, kvm
In-Reply-To: <1368155384-11035-2-git-send-email-scottwood@freescale.com>

On Thu, 2013-05-09 at 22:09 -0500, Scott Wood wrote:
> lockdep.c has this:
>         /*
>          * So we're supposed to get called after you mask local IRQs,
>          * but for some reason the hardware doesn't quite think you did
>          * a proper job.
>          */
> 	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
> 		return;
> 
> Since irqs_disabled() is based on soft_enabled(), that (not just the
> hard EE bit) needs to be 0 before we call trace_hardirqs_off.
> 
> Signed-off-by: Scott Wood <scottwood@freescale.com>

Oops ;-)

I'll apply that to my tree and will send it to Linus right after -rc1,
the rest will go the normal way for KVM patches.

Cheers,
Ben.

> ---
>  arch/powerpc/include/asm/hw_irq.h |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index d615b28..ba713f1 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -96,11 +96,12 @@ static inline bool arch_irqs_disabled(void)
>  #endif
>  
>  #define hard_irq_disable()	do {			\
> +	u8 _was_enabled = get_paca()->soft_enabled;	\
>  	__hard_irq_disable();				\
> -	if (local_paca->soft_enabled)			\
> -		trace_hardirqs_off();			\
>  	get_paca()->soft_enabled = 0;			\
>  	get_paca()->irq_happened |= PACA_IRQ_HARD_DIS;	\
> +	if (_was_enabled)				\
> +		trace_hardirqs_off();			\
>  } while(0)
>  
>  static inline bool lazy_irq_pending(void)

^ permalink raw reply

* Re: [PATCH 2/6] KVM: PPC: Add support for multiple-TCE hcalls
From: Alexey Kardashevskiy @ 2013-05-10  7:53 UTC (permalink / raw)
  To: David Gibson
  Cc: kvm, Joerg Roedel, Alexander Graf, kvm-ppc, Alex Williamson,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <20130510065101.GF23358@truffula.fritz.box>

On 05/10/2013 04:51 PM, David Gibson wrote:
> On Mon, May 06, 2013 at 05:25:53PM +1000, Alexey Kardashevskiy wrote:
>> This adds real mode handlers for the H_PUT_TCE_INDIRECT and
>> H_STUFF_TCE hypercalls for QEMU emulated devices such as virtio
>> devices or emulated PCI.  These calls allow adding multiple entries
>> (up to 512) into the TCE table in one call which saves time on
>> transition to/from real mode.
>>
>> This adds a guest physical to host real address converter
>> and calls the existing H_PUT_TCE handler. The converting function
>> is going to be fully utilized by upcoming VFIO supporting patches.
>>
>> This also implements the KVM_CAP_PPC_MULTITCE capability,
>> so in order to support the functionality of this patch, QEMU
>> needs to query for this capability and set the "hcall-multi-tce"
>> hypertas property only if the capability is present, otherwise
>> there will be serious performance degradation.
> 
> 
> Hrm.  Clearly I didn't read this carefully enough before.  There are
> some problems here.

?


> [snip]
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 72ffc89..643ac1e 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -14,6 +14,7 @@
>>   *
>>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>   */
>>  
>>  #include <linux/types.h>
>> @@ -36,9 +37,14 @@
>>  #include <asm/ppc-opcode.h>
>>  #include <asm/kvm_host.h>
>>  #include <asm/udbg.h>
>> +#include <asm/iommu.h>
>>  
>>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>> +#define ERROR_ADDR      (~(unsigned long)0x0)
>>  
>> +/*
>> + * TCE tables handlers.
>> + */
>>  static long kvmppc_stt_npages(unsigned long window_size)
>>  {
>>  	return ALIGN((window_size >> SPAPR_TCE_SHIFT)
>> @@ -148,3 +154,111 @@ fail:
>>  	}
>>  	return ret;
>>  }
>> +
>> +/*
>> + * Virtual mode handling of IOMMU map/unmap.
>> + */
>> +/* Converts guest physical address into host virtual */
>> +static unsigned long get_virt_address(struct kvm_vcpu *vcpu,
>> +		unsigned long gpa)
> 
> This should probably return a void * rather than an unsigned long.
> Well, actually a void __user *.
> 
>> +{
>> +	unsigned long hva, gfn = gpa >> PAGE_SHIFT;
>> +	struct kvm_memory_slot *memslot;
>> +
>> +	memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
>> +	if (!memslot)
>> +		return ERROR_ADDR;
>> +
>> +	/*
>> +	 * Convert gfn to hva preserving flags and an offset
>> +	 * within a system page
>> +	 */
>> +	hva = __gfn_to_hva_memslot(memslot, gfn) + (gpa & ~PAGE_MASK);
>> +	return hva;
>> +}
>> +
>> +long kvmppc_virtmode_h_put_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce)
>> +{
>> +	struct kvmppc_spapr_tce_table *tt;
>> +
>> +	tt = kvmppc_find_tce_table(vcpu, liobn);
>> +	/* Didn't find the liobn, put it to userspace */
>> +	if (!tt)
>> +		return H_TOO_HARD;
>> +
>> +	/* Emulated IO */
>> +	return kvmppc_emulated_h_put_tce(tt, ioba, tce);
>> +}
>> +
>> +long kvmppc_virtmode_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *tt;
>> +	long i;
>> +	unsigned long tces;
>> +
>> +	/* The whole table addressed by tce_list resides in 4K page */
>> +	if (npages > 512)
>> +		return H_PARAMETER;
> 
> So, that doesn't actually verify what the comment says it does - only
> that the list is < 4K in total.  You need to check the alignment of
> tce_list as well.



The spec says to return H_PARAMETER if >512. I.e. it takes just 1 page and
I do not need to bother if pages may not lay continuously in RAM (matters
for real mode).

/*
 * As the spec is saying that maximum possible number of TCEs is 512,
 * the whole TCE page is no more than 4K. Therefore we do not have to
 * worry if pages do not lie continuously in the RAM
 */
Any better?...


>> +
>> +	tt = kvmppc_find_tce_table(vcpu, liobn);
>> +	/* Didn't find the liobn, put it to userspace */
>> +	if (!tt)
>> +		return H_TOO_HARD;
>> +
>> +	tces = get_virt_address(vcpu, tce_list);
>> +	if (tces == ERROR_ADDR)
>> +		return H_TOO_HARD;
>> +
>> +	/* Emulated IO */
> 
> This comment doesn't seem to have any bearing on the test which
> follows it.
> 
>> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
>> +		return H_PARAMETER;
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		unsigned long tce;
>> +		unsigned long ptce = tces + i * sizeof(unsigned long);
>> +
>> +		if (get_user(tce, (unsigned long __user *)ptce))
>> +			break;
>> +
>> +		if (kvmppc_emulated_h_put_tce(tt,
>> +				ioba + (i << IOMMU_PAGE_SHIFT),	tce))
>> +			break;
>> +	}
>> +	if (i == npages)
>> +		return H_SUCCESS;
>> +
>> +	/* Failed, do cleanup */
>> +	do {
>> +		--i;
>> +		kvmppc_emulated_h_put_tce(tt, ioba + (i << IOMMU_PAGE_SHIFT),
>> +				0);
>> +	} while (i);
> 
> Hrm, so, actually PAPR specifies that this hcall is supposed to first
> copy the given tces to hypervisor memory, then translate (and
> validate) them all, and only then touch the actual TCE table.  Rather
> more complicated to do, but I guess we should - that would get rid of
> the need for this partial cleanup in the failure case.


So we have to kmalloc(4K) on every PUT_INDIRECT. Or we can put tces on the
stack (4K is quire a lot for the kernel, no)?



>> +
>> +	return H_PARAMETER;
>> +}
>> +
>> +long kvmppc_virtmode_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *tt;
>> +	long i;
>> +
>> +	tt = kvmppc_find_tce_table(vcpu, liobn);
>> +	/* Didn't find the liobn, put it to userspace */
>> +	if (!tt)
>> +		return H_TOO_HARD;
>> +
>> +	/* Emulated IO */
>> +	if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
>> +		return H_PARAMETER;
>> +
>> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE)
>> +		kvmppc_emulated_h_put_tce(tt, ioba, tce_value);
>> +
>> +	return H_SUCCESS;
>> +}
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 30c2f3b..55fdf7a 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -14,6 +14,7 @@
>>   *
>>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>   */
>>  
>>  #include <linux/types.h>
>> @@ -35,42 +36,214 @@
>>  #include <asm/ppc-opcode.h>
>>  #include <asm/kvm_host.h>
>>  #include <asm/udbg.h>
>> +#include <asm/iommu.h>
>> +#include <asm/tce.h>
>>  
>>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>> +#define ERROR_ADDR      (~(unsigned long)0x0)
>>  
>> -/* WARNING: This will be called in real-mode on HV KVM and virtual
>> - *          mode on PR KVM
>> +/*
>> + * Finds a TCE table descriptor by LIOBN.
>>   */
>> +struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn)
>> +{
>> +	struct kvmppc_spapr_tce_table *tt;
>> +
>> +	list_for_each_entry(tt, &vcpu->kvm->arch.spapr_tce_tables, list) {
>> +		if (tt->liobn == liobn)
>> +			return tt;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_find_tce_table);
>> +
>> +/*
>> + * kvmppc_emulated_h_put_tce() handles TCE requests for devices emulated
>> + * by QEMU. It puts guest TCE values into the table and expects
>> + * the QEMU to convert them later in the QEMU device implementation.
>> + * Works in both real and virtual modes.
>> + */
>> +long kvmppc_emulated_h_put_tce(struct kvmppc_spapr_tce_table *tt,
>> +		unsigned long ioba, unsigned long tce)
>> +{
>> +	unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
>> +	struct page *page;
>> +	u64 *tbl;
>> +
>> +	/* udbg_printf("H_PUT_TCE: liobn 0x%lx => tt=%p  window_size=0x%x\n", */
>> +	/*	    liobn, tt, tt->window_size); */
>> +	if (ioba >= tt->window_size) {
>> +		/* pr_err("%s failed on ioba=%lx\n", __func__, ioba); */
>> +		return H_PARAMETER;
>> +	}
>> +	/*
>> +	 * Note on the use of page_address() in real mode,
>> +	 *
>> +	 * It is safe to use page_address() in real mode on ppc64 because
>> +	 * page_address() is always defined as lowmem_page_address()
>> +	 * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial
>> +	 * operation and does not access page struct.
>> +	 *
>> +	 * Theoretically page_address() could be defined different
>> +	 * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
>> +	 * should be enabled.
>> +	 * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
>> +	 * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
>> +	 * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
>> +	 * is not expected to be enabled on ppc32, page_address()
>> +	 * is safe for ppc32 as well.
>> +	 */
>> +#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
>> +#error TODO: fix to avoid page_address() here
>> +#endif
>> +	page = tt->pages[idx / TCES_PER_PAGE];
>> +	tbl = (u64 *)page_address(page);
>> +
>> +	/*
>> +	 * Validate TCE address.
>> +	 * At the moment only flags are validated
>> +	 * as other check will significantly slow down
>> +	 * or can make it even impossible to handle TCE requests
>> +	 * in real mode.
>> +	 */
>> +	if (tce & ~(IOMMU_PAGE_MASK | TCE_PCI_WRITE | TCE_PCI_READ))
>> +		return H_PARAMETER;
>> +
>> +	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
>> +	tbl[idx % TCES_PER_PAGE] = tce;
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_emulated_h_put_tce);
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_64_HV
>> +/*
>> + * Converts guest physical address into host real address.
>> + * Also returns pte and page size if the page is present in page table.
>> + */
>> +static unsigned long get_real_address(struct kvm_vcpu *vcpu,
>> +		unsigned long gpa, bool writing,
>> +		pte_t *ptep, unsigned long *pg_sizep)
> 
> The only caller doesn't use the ptep and pg_sizep pointers, so there's
> no point implementing them.


"KVM: PPC: Add support for IOMMU in-kernel handling" will. Is there much
sense in splitting this quite small function between patches?




-- 
Alexey

^ permalink raw reply

* [PATCH] powerpc/perf: Fix setting of "to" addresses for BHRB
From: Michael Neuling @ 2013-05-10  9:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Anshuman Khandual
In-Reply-To: <1368064009-13255-2-git-send-email-mikey@neuling.org>

Currently we only set the "to" address in the branch stack when the CPU
explicitly gives us a value.  Unfortunately it only does this for XL form
branches (eg blr, bctr, bctar) and not I and B form branches (eg b, bc).

Fortunately if we read the instruction from memory we can extract the offset of
a branch and calculate the target address.

This adds a function power_pmu_bhrb_to() to calculate the target/to address of
the corresponding I and B form branches.  It handles branches in both user and
kernel spaces.  It also plumbs this into the perf brhb reading code.

Signed-off-by: Michael Neuling <mikey@neuling.org>
--- 
Minor update to add __user to addr pointer cast in __get_user_inatomic()
as suggested by milton.

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 2d81372..501e47f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -13,11 +13,13 @@
 #include <linux/perf_event.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
+#include <linux/uaccess.h>
 #include <asm/reg.h>
 #include <asm/pmc.h>
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 #include <asm/ptrace.h>
+#include <asm/code-patching.h>
 
 #define BHRB_MAX_ENTRIES	32
 #define BHRB_TARGET		0x0000000000000002
@@ -1458,6 +1460,33 @@ struct pmu power_pmu = {
 	.flush_branch_stack = power_pmu_flush_branch_stack,
 };
 
+/* Calculate the to address for a branch */
+static __u64 power_pmu_bhrb_to(u64 addr)
+{
+	unsigned int instr;
+	int ret;
+	__u64 target;
+
+	if (is_kernel_addr(addr))
+		return branch_target((unsigned int *)addr);
+
+	/* Userspace: need copy instruction here then translate it */
+	pagefault_disable();
+	ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
+	if (ret) {
+		pagefault_enable();
+		return 0;
+	}
+	pagefault_enable();
+
+	target = branch_target(&instr);
+	if ((!target) || (instr & BRANCH_ABSOLUTE))
+		return target;
+
+	/* Translate relative branch target from kernel to user address */
+	return target - (unsigned long)&instr + addr;
+}
+
 /* Processing BHRB entries */
 void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 {
@@ -1521,7 +1550,8 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 				/* Branches to immediate field 
 				   (ie I or B form) */
 				cpuhw->bhrb_entries[u_index].from = addr;
-				cpuhw->bhrb_entries[u_index].to = 0;
+				cpuhw->bhrb_entries[u_index].to =
+					power_pmu_bhrb_to(addr);
 				cpuhw->bhrb_entries[u_index].mispred = pred;
 				cpuhw->bhrb_entries[u_index].predicted = ~pred;
 			}

^ permalink raw reply related

* Re: Invalid perf_branch_entry.to entries question
From: Peter Zijlstra @ 2013-05-10 10:43 UTC (permalink / raw)
  To: Michael Neuling; +Cc: Linux PPC dev, linux-kernel, eranian, Anshuman Khandual
In-Reply-To: <14691.1368052755@ale.ozlabs.ibm.com>

On Thu, May 09, 2013 at 08:39:15AM +1000, Michael Neuling wrote:
> > Just because I'm curious.. however does that happen? Surely the CPU
> > knows where next to fetch instructions?
> 
> For computed gotos (ie. branch to a register value), the hardware gives
> you the from and to address in the branch history buffer.
> 
> For branches where the branch target address is an immediate encoded in
> the instruction, the hardware only logs the from address.  It assumes
> that software (perf irq handler in this case) can read this branch
> instruction, calculate the corresponding offset and hence the
> to/target address.
> 
> It's entirely possible that when the perf IRQ handler happens, the
> instruction in question is not readable or is no longer a branch (self
> modifying code).  Hence we aren't able to calculate a valid to address.

Ohh how cute! You've gotta love lazy hardware :-)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox